Arm queue; the bulk of this is the VFP decodetree conversion...

thanks
-- PMM

The following changes since commit 4747524f9f243ca5ff1f146d37e423c00e923ee1:

  Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2019-06-12' into staging (2019-06-13 11:58:00 +0100)

are available in the Git repository at:

  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20190613

for you to fetch changes up to 07e4c7f769120c9a5bd6a26c2dc1421f2f838d80:

  target/arm: Fix short-vector increment behaviour (2019-06-13 12:57:37 +0100)

----------------------------------------------------------------
target-arm queue:
 * convert aarch32 VFP decoder to decodetree
   (includes tightening up decode in a few places)
 * fix minor bugs in VFP short-vector handling
 * hw/core/bus.c: Only the main system bus can have no parent
 * smmuv3: Fix decoding of ID register range
 * Implement NSACR gating of floating point
 * Use tcg_gen_gvec_bitsel
 * Vectorize USHL and SSHL

----------------------------------------------------------------
Peter Maydell (44):
      target/arm: Implement NSACR gating of floating point
      hw/arm/smmuv3: Fix decoding of ID register range
      hw/core/bus.c: Only the main system bus can have no parent
      target/arm: Add stubs for AArch32 VFP decodetree
      target/arm: Factor out VFP access checking code
      target/arm: Fix Cortex-R5F MVFR values
      target/arm: Explicitly enable VFP short-vectors for aarch32 -cpu max
      target/arm: Convert the VSEL instructions to decodetree
      target/arm: Convert VMINNM, VMAXNM to decodetree
      target/arm: Convert VRINTA/VRINTN/VRINTP/VRINTM to decodetree
      target/arm: Convert VCVTA/VCVTN/VCVTP/VCVTM to decodetree
      target/arm: Move the VFP trans_* functions to translate-vfp.inc.c
      target/arm: Add helpers for VFP register loads and stores
      target/arm: Convert "double-precision" register moves to decodetree
      target/arm: Convert "single-precision" register moves to decodetree
      target/arm: Convert VFP two-register transfer insns to decodetree
      target/arm: Convert VFP VLDR and VSTR to decodetree
      target/arm: Convert the VFP load/store multiple insns to decodetree
      target/arm: Remove VLDR/VSTR/VLDM/VSTM use of cpu_F0s and cpu_F0d
      target/arm: Convert VFP VMLA to decodetree
      target/arm: Convert VFP VMLS to decodetree
      target/arm: Convert VFP VNMLS to decodetree
      target/arm: Convert VFP VNMLA to decodetree
      target/arm: Convert VMUL to decodetree
      target/arm: Convert VNMUL to decodetree
      target/arm: Convert VADD to decodetree
      target/arm: Convert VSUB to decodetree
      target/arm: Convert VDIV to decodetree
      target/arm: Convert VFP fused multiply-add insns to decodetree
      target/arm: Convert VMOV (imm) to decodetree
      target/arm: Convert VABS to decodetree
      target/arm: Convert VNEG to decodetree
      target/arm: Convert VSQRT to decodetree
      target/arm: Convert VMOV (register) to decodetree
      target/arm: Convert VFP comparison insns to decodetree
      target/arm: Convert the VCVT-from-f16 insns to decodetree
      target/arm: Convert the VCVT-to-f16 insns to decodetree
      target/arm: Convert VFP round insns to decodetree
      target/arm: Convert double-single precision conversion insns to decodetree
      target/arm: Convert integer-to-float insns to decodetree
      target/arm: Convert VJCVT to decodetree
      target/arm: Convert VCVT fp/fixed-point conversion insns to decodetree
      target/arm: Convert float-to-integer VCVT insns to decodetree
      target/arm: Fix short-vector increment behaviour

Richard Henderson (4):
      target/arm: Vectorize USHL and SSHL
      target/arm: Use tcg_gen_gvec_bitsel
      target/arm: Fix output of PAuth Auth
      decodetree: Fix comparison of Field

 target/arm/Makefile.objs          |   13 +
 tests/tcg/aarch64/Makefile.target |    2 +-
 target/arm/cpu.h                  |   11 +
 target/arm/helper.h               |   11 +-
 target/arm/translate-a64.h        |    2 +
 target/arm/translate.h            |    9 +-
 hw/arm/smmuv3.c                   |    2 +-
 hw/core/bus.c                     |   21 +-
 target/arm/cpu.c                  |    6 +
 target/arm/helper.c               |   75 +-
 target/arm/neon_helper.c          |   33 -
 target/arm/pauth_helper.c         |    4 +-
 target/arm/translate-a64.c        |   33 +-
 target/arm/translate-vfp.inc.c    | 2672 +++++++++++++++++++++++++++++++++++++
 target/arm/translate.c            | 1881 +++++---------------------
 target/arm/vec_helper.c           |   88 ++
 tests/tcg/aarch64/pauth-2.c       |   61 +
 scripts/decodetree.py             |    2 +-
 target/arm/vfp-uncond.decode      |   63 +
 target/arm/vfp.decode             |  242 ++++
 20 files changed, 3593 insertions(+), 1638 deletions(-)
 create mode 100644 target/arm/translate-vfp.inc.c
 create mode 100644 tests/tcg/aarch64/pauth-2.c
 create mode 100644 target/arm/vfp-uncond.decode
 create mode 100644 target/arm/vfp.decode

Most of this is the Neon decodetree patches, followed by Edgar's versal cleanups.

thanks
-- PMM

The following changes since commit 2ef486e76d64436be90f7359a3071fb2a56ce835:

  Merge remote-tracking branch 'remotes/marcel/tags/rdma-pull-request' into staging (2020-05-03 14:12:56 +0100)

are available in the Git repository at:

  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20200504

for you to fetch changes up to 9aefc6cf9b73f66062d2f914a0136756e7a28211:

  target/arm: Move gen_ function typedefs to translate.h (2020-05-04 12:59:26 +0100)

----------------------------------------------------------------
target-arm queue:
 * Start of conversion of Neon insns to decodetree
 * versal board: support SD and RTC
 * Implement ARMv8.2-TTS2UXN
 * Make VQDMULL undefined when U=1
 * Some minor code cleanups

----------------------------------------------------------------
Edgar E. Iglesias (11):
      hw/arm: versal: Remove inclusion of arm_gicv3_common.h
      hw/arm: versal: Move misplaced comment
      hw/arm: versal-virt: Fix typo xlnx-ve -> xlnx-versal
      hw/arm: versal: Embed the UARTs into the SoC type
      hw/arm: versal: Embed the GEMs into the SoC type
      hw/arm: versal: Embed the ADMAs into the SoC type
      hw/arm: versal: Embed the APUs into the SoC type
      hw/arm: versal: Add support for SD
      hw/arm: versal: Add support for the RTC
      hw/arm: versal-virt: Add support for SD
      hw/arm: versal-virt: Add support for the RTC

Fredrik Strupe (1):
      target/arm: Make VQDMULL undefined when U=1

Peter Maydell (25):
      target/arm: Don't use a TLB for ARMMMUIdx_Stage2
      target/arm: Use enum constant in get_phys_addr_lpae() call
      target/arm: Add new 's1_is_el0' argument to get_phys_addr_lpae()
      target/arm: Implement ARMv8.2-TTS2UXN
      target/arm: Use correct variable for setting 'max' cpu's ID_AA64DFR0
      target/arm/translate-vfp.inc.c: Remove duplicate simd_r32 check
      target/arm: Don't allow Thumb Neon insns without FEATURE_NEON
      target/arm: Add stubs for AArch32 Neon decodetree
      target/arm: Convert VCMLA (vector) to decodetree
      target/arm: Convert VCADD (vector) to decodetree
      target/arm: Convert V[US]DOT (vector) to decodetree
      target/arm: Convert VFM[AS]L (vector) to decodetree
      target/arm: Convert VCMLA (scalar) to decodetree
      target/arm: Convert V[US]DOT (scalar) to decodetree
      target/arm: Convert VFM[AS]L (scalar) to decodetree
      target/arm: Convert Neon load/store multiple structures to decodetree
      target/arm: Convert Neon 'load single structure to all lanes' to decodetree
      target/arm: Convert Neon 'load/store single structure' to decodetree
      target/arm: Convert Neon 3-reg-same VADD/VSUB to decodetree
      target/arm: Convert Neon 3-reg-same logic ops to decodetree
      target/arm: Convert Neon 3-reg-same VMAX/VMIN to decodetree
      target/arm: Convert Neon 3-reg-same comparisons to decodetree
      target/arm: Convert Neon 3-reg-same VQADD/VQSUB to decodetree
      target/arm: Convert Neon 3-reg-same VMUL, VMLA, VMLS, VSHL to decodetree
      target/arm: Move gen_ function typedefs to translate.h

Philippe Mathieu-Daudé (2):
      hw/arm/mps2-tz: Use TYPE_IOTKIT instead of hardcoded string
      target/arm: Use uint64_t for midr field in CPU state struct

 include/hw/arm/xlnx-versal.h    |  31 +-
 target/arm/cpu-param.h          |   2 +-
 target/arm/cpu.h                |  38 ++-
 target/arm/translate-a64.h      |   9 -
 target/arm/translate.h          |  26 ++
 target/arm/neon-dp.decode       |  86 +++++
 target/arm/neon-ls.decode       |  52 +++
 target/arm/neon-shared.decode   |  66 ++++
 hw/arm/mps2-tz.c                |   2 +-
 hw/arm/xlnx-versal-virt.c       |  74 ++++-
 hw/arm/xlnx-versal.c            | 115 +++++--
 target/arm/cpu.c                |   3 +-
 target/arm/cpu64.c              |   8 +-
 target/arm/helper.c             | 183 ++++------
 target/arm/translate-a64.c      |  17 -
 target/arm/translate-neon.inc.c | 714 +++++++++++++++++++++++++++++++++++++++
 target/arm/translate-vfp.inc.c  |   6 -
 target/arm/translate.c          | 716 +++-------------------------------------
 target/arm/Makefile.objs        |  18 +
 19 files changed, 1302 insertions(+), 864 deletions(-)
 create mode 100644 target/arm/neon-dp.decode
 create mode 100644 target/arm/neon-ls.decode
 create mode 100644 target/arm/neon-shared.decode
 create mode 100644 target/arm/translate-neon.inc.c
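Both queues are built around decodetree conversions, so a word on the
format may help while reading the patches below: a .decode file gives
one line per instruction pattern, mixing fixed bits, don't-care bits
('.') and named fields, and the decodetree.py generator emits a decoder
that calls a trans_NAME() function for each match. A deliberately
invented 32-bit example (not a pattern from either series):

    # 'imm', 'rd' and 'rn' are extracted from the encoding and handed
    # to trans_FOO(DisasContext *s, arg_FOO *a) in the translator
    FOO     1111 0000 imm:8 rd:4 rn:4 .... ....

The real patterns live in the vfp*.decode and neon-*.decode files these
series create.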
From: Richard Henderson <richard.henderson@linaro.org>

These instructions shift left or right depending on the sign
of the input, and 7 bits are significant to the shift. This
requires several masks and selects in addition to the actual
shifts to form the complete answer.

That said, the operation is still a small improvement even for
two 64-bit elements -- 13 vector operations instead of 2 * 7
integer operations.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190603232209.20704-1-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.h        |  11 +-
 target/arm/translate.h     |   6 +
 target/arm/neon_helper.c   |  33 ----
 target/arm/translate-a64.c |  18 +--
 target/arm/translate.c     | 300 +++++++++++++++++++++++++++++++++++--
 target/arm/vec_helper.c    |  88 +++++++++++
 6 files changed, 390 insertions(+), 66 deletions(-)
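Before the diff, a plain-C model of the per-element operation being
vectorized (a sketch that mirrors the gvec_ushl_b/gvec_sshl_b helpers
added below; it is illustrative, not code from the patch):

    #include <stdint.h>

    /* USHL/SSHL take the shift count from the signed low byte of the
     * second operand: positive counts shift left, negative counts
     * shift right, and counts beyond the element width drain to zero
     * (or to the sign bit, for the arithmetic right shift).
     */
    static uint8_t ushl8(uint8_t nn, int8_t mm)
    {
        if (mm >= 8 || mm <= -8) {
            return 0;                        /* shifted completely out */
        }
        return mm >= 0 ? nn << mm : nn >> -mm;
    }

    static int8_t sshl8(int8_t nn, int8_t mm)
    {
        if (mm >= 8) {
            return 0;                        /* left shift out of range */
        }
        /* right shifts of 8 or more saturate to the sign bit */
        return mm >= 0 ? nn << mm : nn >> (mm > -8 ? -mm : 7);
    }

The TCG versions below compute both candidate shifts and then select or
mask the correct result, since vector hosts have no per-lane branches.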
diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_2(neon_abd_s16, i32, i32, i32)
 DEF_HELPER_2(neon_abd_u32, i32, i32, i32)
 DEF_HELPER_2(neon_abd_s32, i32, i32, i32)
 
-DEF_HELPER_2(neon_shl_u8, i32, i32, i32)
-DEF_HELPER_2(neon_shl_s8, i32, i32, i32)
 DEF_HELPER_2(neon_shl_u16, i32, i32, i32)
 DEF_HELPER_2(neon_shl_s16, i32, i32, i32)
-DEF_HELPER_2(neon_shl_u32, i32, i32, i32)
-DEF_HELPER_2(neon_shl_s32, i32, i32, i32)
-DEF_HELPER_2(neon_shl_u64, i64, i64, i64)
-DEF_HELPER_2(neon_shl_s64, i64, i64, i64)
 DEF_HELPER_2(neon_rshl_u8, i32, i32, i32)
 DEF_HELPER_2(neon_rshl_s8, i32, i32, i32)
 DEF_HELPER_2(neon_rshl_u16, i32, i32, i32)
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_2(frint64_s, TCG_CALL_NO_RWG, f32, f32, ptr)
 DEF_HELPER_FLAGS_2(frint32_d, TCG_CALL_NO_RWG, f64, f64, ptr)
 DEF_HELPER_FLAGS_2(frint64_d, TCG_CALL_NO_RWG, f64, f64, ptr)
 
+DEF_HELPER_FLAGS_4(gvec_sshl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_sshl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_ushl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_ushl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 #ifdef TARGET_AARCH64
 #include "helper-a64.h"
 #include "helper-sve.h"
diff --git a/target/arm/translate.h b/target/arm/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ extern const GVecGen3 bif_op;
 extern const GVecGen3 mla_op[4];
 extern const GVecGen3 mls_op[4];
 extern const GVecGen3 cmtst_op[4];
+extern const GVecGen3 sshl_op[4];
+extern const GVecGen3 ushl_op[4];
 extern const GVecGen2i ssra_op[4];
 extern const GVecGen2i usra_op[4];
 extern const GVecGen2i sri_op[4];
@@ -XXX,XX +XXX,XX @@ extern const GVecGen4 sqadd_op[4];
 extern const GVecGen4 uqsub_op[4];
 extern const GVecGen4 sqsub_op[4];
 void gen_cmtst_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
+void gen_ushl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b);
+void gen_sshl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b);
+void gen_ushl_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
+void gen_sshl_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
 
 /*
  * Forward to the isar_feature_* tests given a DisasContext pointer.
diff --git a/target/arm/neon_helper.c b/target/arm/neon_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon_helper.c
+++ b/target/arm/neon_helper.c
@@ -XXX,XX +XXX,XX @@ NEON_VOP(abd_u32, neon_u32, 1)
 } else { \
     dest = src1 << tmp; \
 }} while (0)
-NEON_VOP(shl_u8, neon_u8, 4)
 NEON_VOP(shl_u16, neon_u16, 2)
-NEON_VOP(shl_u32, neon_u32, 1)
 #undef NEON_FN
 
-uint64_t HELPER(neon_shl_u64)(uint64_t val, uint64_t shiftop)
-{
-    int8_t shift = (int8_t)shiftop;
-    if (shift >= 64 || shift <= -64) {
-        val = 0;
-    } else if (shift < 0) {
-        val >>= -shift;
-    } else {
-        val <<= shift;
-    }
-    return val;
-}
-
 #define NEON_FN(dest, src1, src2) do { \
     int8_t tmp; \
     tmp = (int8_t)src2; \
@@ -XXX,XX +XXX,XX @@ uint64_t HELPER(neon_shl_u64)(uint64_t val, uint64_t shiftop)
 } else { \
     dest = src1 << tmp; \
 }} while (0)
-NEON_VOP(shl_s8, neon_s8, 4)
 NEON_VOP(shl_s16, neon_s16, 2)
-NEON_VOP(shl_s32, neon_s32, 1)
 #undef NEON_FN
 
-uint64_t HELPER(neon_shl_s64)(uint64_t valop, uint64_t shiftop)
-{
-    int8_t shift = (int8_t)shiftop;
-    int64_t val = valop;
-    if (shift >= 64) {
-        val = 0;
-    } else if (shift <= -64) {
-        val >>= 63;
-    } else if (shift < 0) {
-        val >>= -shift;
-    } else {
-        val <<= shift;
-    }
-    return val;
-}
-
 #define NEON_FN(dest, src1, src2) do { \
     int8_t tmp; \
     tmp = (int8_t)src2; \
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void handle_3same_64(DisasContext *s, int opcode, bool u,
         break;
     case 0x8: /* SSHL, USHL */
         if (u) {
-            gen_helper_neon_shl_u64(tcg_rd, tcg_rn, tcg_rm);
+            gen_ushl_i64(tcg_rd, tcg_rn, tcg_rm);
         } else {
-            gen_helper_neon_shl_s64(tcg_rd, tcg_rn, tcg_rm);
+            gen_sshl_i64(tcg_rd, tcg_rn, tcg_rm);
         }
         break;
     case 0x9: /* SQSHL, UQSHL */
@@ -XXX,XX +XXX,XX @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
                      is_q ? 16 : 8, vec_full_reg_size(s),
                      (u ? uqsub_op : sqsub_op) + size);
         return;
+    case 0x08: /* SSHL, USHL */
+        gen_gvec_op3(s, is_q, rd, rn, rm,
+                     u ? &ushl_op[size] : &sshl_op[size]);
+        return;
     case 0x0c: /* SMAX, UMAX */
         if (u) {
             gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_umax, size);
@@ -XXX,XX +XXX,XX @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
             genfn = fns[size][u];
             break;
         }
-    case 0x8: /* SSHL, USHL */
-    {
-        static NeonGenTwoOpFn * const fns[3][2] = {
-            { gen_helper_neon_shl_s8, gen_helper_neon_shl_u8 },
-            { gen_helper_neon_shl_s16, gen_helper_neon_shl_u16 },
-            { gen_helper_neon_shl_s32, gen_helper_neon_shl_u32 },
-        };
-        genfn = fns[size][u];
-        break;
-    }
     case 0x9: /* SQSHL, UQSHL */
     {
         static NeonGenTwoOpEnvFn * const fns[3][2] = {
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void gen_neon_shift_narrow(int size, TCGv_i32 var, TCGv_i32 shift,
         if (u) {
             switch (size) {
             case 1: gen_helper_neon_shl_u16(var, var, shift); break;
-            case 2: gen_helper_neon_shl_u32(var, var, shift); break;
+            case 2: gen_ushl_i32(var, var, shift); break;
             default: abort();
             }
         } else {
             switch (size) {
             case 1: gen_helper_neon_shl_s16(var, var, shift); break;
-            case 2: gen_helper_neon_shl_s32(var, var, shift); break;
+            case 2: gen_sshl_i32(var, var, shift); break;
             default: abort();
             }
         }
@@ -XXX,XX +XXX,XX @@ const GVecGen3 cmtst_op[4] = {
       .vece = MO_64 },
 };
 
+void gen_ushl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+{
+    TCGv_i32 lval = tcg_temp_new_i32();
+    TCGv_i32 rval = tcg_temp_new_i32();
+    TCGv_i32 lsh = tcg_temp_new_i32();
+    TCGv_i32 rsh = tcg_temp_new_i32();
+    TCGv_i32 zero = tcg_const_i32(0);
+    TCGv_i32 max = tcg_const_i32(32);
+
+    /*
+     * Rely on the TCG guarantee that out of range shifts produce
+     * unspecified results, not undefined behaviour (i.e. no trap).
+     * Discard out-of-range results after the fact.
+     */
+    tcg_gen_ext8s_i32(lsh, b);
+    tcg_gen_neg_i32(rsh, lsh);
+    tcg_gen_shl_i32(lval, a, lsh);
+    tcg_gen_shr_i32(rval, a, rsh);
+    tcg_gen_movcond_i32(TCG_COND_LTU, d, lsh, max, lval, zero);
+    tcg_gen_movcond_i32(TCG_COND_LTU, d, rsh, max, rval, d);
+
+    tcg_temp_free_i32(lval);
+    tcg_temp_free_i32(rval);
+    tcg_temp_free_i32(lsh);
+    tcg_temp_free_i32(rsh);
+    tcg_temp_free_i32(zero);
+    tcg_temp_free_i32(max);
+}
+
+void gen_ushl_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 lval = tcg_temp_new_i64();
+    TCGv_i64 rval = tcg_temp_new_i64();
+    TCGv_i64 lsh = tcg_temp_new_i64();
+    TCGv_i64 rsh = tcg_temp_new_i64();
+    TCGv_i64 zero = tcg_const_i64(0);
+    TCGv_i64 max = tcg_const_i64(64);
+
+    /*
+     * Rely on the TCG guarantee that out of range shifts produce
+     * unspecified results, not undefined behaviour (i.e. no trap).
+     * Discard out-of-range results after the fact.
+     */
+    tcg_gen_ext8s_i64(lsh, b);
+    tcg_gen_neg_i64(rsh, lsh);
+    tcg_gen_shl_i64(lval, a, lsh);
+    tcg_gen_shr_i64(rval, a, rsh);
+    tcg_gen_movcond_i64(TCG_COND_LTU, d, lsh, max, lval, zero);
+    tcg_gen_movcond_i64(TCG_COND_LTU, d, rsh, max, rval, d);
+
+    tcg_temp_free_i64(lval);
+    tcg_temp_free_i64(rval);
+    tcg_temp_free_i64(lsh);
+    tcg_temp_free_i64(rsh);
+    tcg_temp_free_i64(zero);
+    tcg_temp_free_i64(max);
+}
+
+static void gen_ushl_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec lval = tcg_temp_new_vec_matching(d);
+    TCGv_vec rval = tcg_temp_new_vec_matching(d);
+    TCGv_vec lsh = tcg_temp_new_vec_matching(d);
+    TCGv_vec rsh = tcg_temp_new_vec_matching(d);
+    TCGv_vec msk, max;
+
+    /*
+     * Rely on the TCG guarantee that out of range shifts produce
+     * unspecified results, not undefined behaviour (i.e. no trap).
+     * Discard out-of-range results after the fact.
+     */
+    tcg_gen_neg_vec(vece, rsh, b);
+    if (vece == MO_8) {
+        tcg_gen_mov_vec(lsh, b);
+    } else {
+        msk = tcg_temp_new_vec_matching(d);
+        tcg_gen_dupi_vec(vece, msk, 0xff);
+        tcg_gen_and_vec(vece, lsh, b, msk);
+        tcg_gen_and_vec(vece, rsh, rsh, msk);
+        tcg_temp_free_vec(msk);
+    }
+
+    /*
+     * Perform possibly out of range shifts, trusting that the operation
+     * does not trap. Discard unused results after the fact.
+     */
+    tcg_gen_shlv_vec(vece, lval, a, lsh);
+    tcg_gen_shrv_vec(vece, rval, a, rsh);
+
+    max = tcg_temp_new_vec_matching(d);
+    tcg_gen_dupi_vec(vece, max, 8 << vece);
+
+    /*
+     * The choice of LT (signed) and GEU (unsigned) are biased toward
+     * the instructions of the x86_64 host. For MO_8, the whole byte
+     * is significant so we must use an unsigned compare; otherwise we
+     * have already masked to a byte and so a signed compare works.
+     * Other tcg hosts have a full set of comparisons and do not care.
+     */
+    if (vece == MO_8) {
+        tcg_gen_cmp_vec(TCG_COND_GEU, vece, lsh, lsh, max);
+        tcg_gen_cmp_vec(TCG_COND_GEU, vece, rsh, rsh, max);
+        tcg_gen_andc_vec(vece, lval, lval, lsh);
+        tcg_gen_andc_vec(vece, rval, rval, rsh);
+    } else {
+        tcg_gen_cmp_vec(TCG_COND_LT, vece, lsh, lsh, max);
+        tcg_gen_cmp_vec(TCG_COND_LT, vece, rsh, rsh, max);
+        tcg_gen_and_vec(vece, lval, lval, lsh);
+        tcg_gen_and_vec(vece, rval, rval, rsh);
+    }
+    tcg_gen_or_vec(vece, d, lval, rval);
+
+    tcg_temp_free_vec(max);
+    tcg_temp_free_vec(lval);
+    tcg_temp_free_vec(rval);
+    tcg_temp_free_vec(lsh);
+    tcg_temp_free_vec(rsh);
+}
+
+static const TCGOpcode ushl_list[] = {
+    INDEX_op_neg_vec, INDEX_op_shlv_vec,
+    INDEX_op_shrv_vec, INDEX_op_cmp_vec, 0
+};
+
+const GVecGen3 ushl_op[4] = {
+    { .fniv = gen_ushl_vec,
+      .fno = gen_helper_gvec_ushl_b,
+      .opt_opc = ushl_list,
+      .vece = MO_8 },
+    { .fniv = gen_ushl_vec,
+      .fno = gen_helper_gvec_ushl_h,
+      .opt_opc = ushl_list,
+      .vece = MO_16 },
+    { .fni4 = gen_ushl_i32,
+      .fniv = gen_ushl_vec,
+      .opt_opc = ushl_list,
+      .vece = MO_32 },
+    { .fni8 = gen_ushl_i64,
+      .fniv = gen_ushl_vec,
+      .opt_opc = ushl_list,
+      .vece = MO_64 },
+};
+
+void gen_sshl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+{
+    TCGv_i32 lval = tcg_temp_new_i32();
+    TCGv_i32 rval = tcg_temp_new_i32();
+    TCGv_i32 lsh = tcg_temp_new_i32();
+    TCGv_i32 rsh = tcg_temp_new_i32();
+    TCGv_i32 zero = tcg_const_i32(0);
+    TCGv_i32 max = tcg_const_i32(31);
+
+    /*
+     * Rely on the TCG guarantee that out of range shifts produce
+     * unspecified results, not undefined behaviour (i.e. no trap).
+     * Discard out-of-range results after the fact.
+     */
+    tcg_gen_ext8s_i32(lsh, b);
+    tcg_gen_neg_i32(rsh, lsh);
+    tcg_gen_shl_i32(lval, a, lsh);
+    tcg_gen_umin_i32(rsh, rsh, max);
+    tcg_gen_sar_i32(rval, a, rsh);
+    tcg_gen_movcond_i32(TCG_COND_LEU, lval, lsh, max, lval, zero);
+    tcg_gen_movcond_i32(TCG_COND_LT, d, lsh, zero, rval, lval);
+
+    tcg_temp_free_i32(lval);
+    tcg_temp_free_i32(rval);
+    tcg_temp_free_i32(lsh);
+    tcg_temp_free_i32(rsh);
+    tcg_temp_free_i32(zero);
+    tcg_temp_free_i32(max);
+}
+
+void gen_sshl_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 lval = tcg_temp_new_i64();
+    TCGv_i64 rval = tcg_temp_new_i64();
+    TCGv_i64 lsh = tcg_temp_new_i64();
+    TCGv_i64 rsh = tcg_temp_new_i64();
+    TCGv_i64 zero = tcg_const_i64(0);
+    TCGv_i64 max = tcg_const_i64(63);
+
+    /*
+     * Rely on the TCG guarantee that out of range shifts produce
+     * unspecified results, not undefined behaviour (i.e. no trap).
+     * Discard out-of-range results after the fact.
+     */
+    tcg_gen_ext8s_i64(lsh, b);
+    tcg_gen_neg_i64(rsh, lsh);
+    tcg_gen_shl_i64(lval, a, lsh);
+    tcg_gen_umin_i64(rsh, rsh, max);
+    tcg_gen_sar_i64(rval, a, rsh);
+    tcg_gen_movcond_i64(TCG_COND_LEU, lval, lsh, max, lval, zero);
+    tcg_gen_movcond_i64(TCG_COND_LT, d, lsh, zero, rval, lval);
+
+    tcg_temp_free_i64(lval);
+    tcg_temp_free_i64(rval);
+    tcg_temp_free_i64(lsh);
+    tcg_temp_free_i64(rsh);
+    tcg_temp_free_i64(zero);
+    tcg_temp_free_i64(max);
+}
+
+static void gen_sshl_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec lval = tcg_temp_new_vec_matching(d);
+    TCGv_vec rval = tcg_temp_new_vec_matching(d);
+    TCGv_vec lsh = tcg_temp_new_vec_matching(d);
+    TCGv_vec rsh = tcg_temp_new_vec_matching(d);
+    TCGv_vec tmp = tcg_temp_new_vec_matching(d);
+
+    /*
+     * Rely on the TCG guarantee that out of range shifts produce
+     * unspecified results, not undefined behaviour (i.e. no trap).
+     * Discard out-of-range results after the fact.
+     */
+    tcg_gen_neg_vec(vece, rsh, b);
+    if (vece == MO_8) {
+        tcg_gen_mov_vec(lsh, b);
+    } else {
+        tcg_gen_dupi_vec(vece, tmp, 0xff);
+        tcg_gen_and_vec(vece, lsh, b, tmp);
+        tcg_gen_and_vec(vece, rsh, rsh, tmp);
+    }
+
+    /* Bound rsh so out of bound right shift gets -1. */
+    tcg_gen_dupi_vec(vece, tmp, (8 << vece) - 1);
+    tcg_gen_umin_vec(vece, rsh, rsh, tmp);
+    tcg_gen_cmp_vec(TCG_COND_GT, vece, tmp, lsh, tmp);
+
+    tcg_gen_shlv_vec(vece, lval, a, lsh);
+    tcg_gen_sarv_vec(vece, rval, a, rsh);
+
+    /* Select in-bound left shift. */
+    tcg_gen_andc_vec(vece, lval, lval, tmp);
+
+    /* Select between left and right shift. */
+    if (vece == MO_8) {
+        tcg_gen_dupi_vec(vece, tmp, 0);
+        tcg_gen_cmpsel_vec(TCG_COND_LT, vece, d, lsh, tmp, rval, lval);
+    } else {
+        tcg_gen_dupi_vec(vece, tmp, 0x80);
+        tcg_gen_cmpsel_vec(TCG_COND_LT, vece, d, lsh, tmp, lval, rval);
+    }
+
+    tcg_temp_free_vec(lval);
+    tcg_temp_free_vec(rval);
+    tcg_temp_free_vec(lsh);
+    tcg_temp_free_vec(rsh);
+    tcg_temp_free_vec(tmp);
+}
+
+static const TCGOpcode sshl_list[] = {
+    INDEX_op_neg_vec, INDEX_op_umin_vec, INDEX_op_shlv_vec,
+    INDEX_op_sarv_vec, INDEX_op_cmp_vec, INDEX_op_cmpsel_vec, 0
+};
+
+const GVecGen3 sshl_op[4] = {
+    { .fniv = gen_sshl_vec,
+      .fno = gen_helper_gvec_sshl_b,
+      .opt_opc = sshl_list,
+      .vece = MO_8 },
+    { .fniv = gen_sshl_vec,
+      .fno = gen_helper_gvec_sshl_h,
+      .opt_opc = sshl_list,
+      .vece = MO_16 },
+    { .fni4 = gen_sshl_i32,
+      .fniv = gen_sshl_vec,
+      .opt_opc = sshl_list,
+      .vece = MO_32 },
+    { .fni8 = gen_sshl_i64,
+      .fniv = gen_sshl_vec,
+      .opt_opc = sshl_list,
+      .vece = MO_64 },
+};
+
 static void gen_uqadd_vec(unsigned vece, TCGv_vec t, TCGv_vec sat,
                           TCGv_vec a, TCGv_vec b)
 {
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                                  vec_size, vec_size);
             }
             return 0;
+
+        case NEON_3R_VSHL:
+            tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size,
+                           u ? &ushl_op[size] : &sshl_op[size]);
+            return 0;
         }
 
         if (size == 3) {
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             neon_load_reg64(cpu_V0, rn + pass);
             neon_load_reg64(cpu_V1, rm + pass);
             switch (op) {
-            case NEON_3R_VSHL:
-                if (u) {
-                    gen_helper_neon_shl_u64(cpu_V0, cpu_V1, cpu_V0);
-                } else {
-                    gen_helper_neon_shl_s64(cpu_V0, cpu_V1, cpu_V0);
-                }
-                break;
             case NEON_3R_VQSHL:
                 if (u) {
                     gen_helper_neon_qshl_u64(cpu_V0, cpu_env,
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         }
         pairwise = 0;
         switch (op) {
-        case NEON_3R_VSHL:
         case NEON_3R_VQSHL:
         case NEON_3R_VRSHL:
         case NEON_3R_VQRSHL:
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_VHSUB:
             GEN_NEON_INTEGER_OP(hsub);
             break;
-        case NEON_3R_VSHL:
-            GEN_NEON_INTEGER_OP(shl);
-            break;
         case NEON_3R_VQSHL:
             GEN_NEON_INTEGER_OP_ENV(qshl);
             break;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         }
     } else {
         if (input_unsigned) {
-            gen_helper_neon_shl_u64(cpu_V0, in, tmp64);
+            gen_ushl_i64(cpu_V0, in, tmp64);
         } else {
-            gen_helper_neon_shl_s64(cpu_V0, in, tmp64);
+            gen_sshl_i64(cpu_V0, in, tmp64);
         }
     }
     tmp = tcg_temp_new_i32();
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
     do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status, desc,
                  get_flush_inputs_to_zero(&env->vfp.fp_status_f16));
 }
+
+void HELPER(gvec_sshl_b)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    int8_t *d = vd, *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz; ++i) {
+        int8_t mm = m[i];
+        int8_t nn = n[i];
+        int8_t res = 0;
+        if (mm >= 0) {
+            if (mm < 8) {
+                res = nn << mm;
+            }
+        } else {
+            res = nn >> (mm > -8 ? -mm : 7);
+        }
+        d[i] = res;
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_sshl_h)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    int16_t *d = vd, *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz / 2; ++i) {
+        int8_t mm = m[i];   /* only 8 bits of shift are significant */
+        int16_t nn = n[i];
+        int16_t res = 0;
+        if (mm >= 0) {
+            if (mm < 16) {
+                res = nn << mm;
+            }
+        } else {
+            res = nn >> (mm > -16 ? -mm : 15);
+        }
+        d[i] = res;
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_ushl_b)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    uint8_t *d = vd, *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz; ++i) {
+        int8_t mm = m[i];
+        uint8_t nn = n[i];
+        uint8_t res = 0;
+        if (mm >= 0) {
+            if (mm < 8) {
+                res = nn << mm;
+            }
+        } else {
+            if (mm > -8) {
+                res = nn >> -mm;
+            }
+        }
+        d[i] = res;
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_ushl_h)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    uint16_t *d = vd, *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz / 2; ++i) {
+        int8_t mm = m[i];   /* only 8 bits of shift are significant */
+        uint16_t nn = n[i];
+        uint16_t res = 0;
+        if (mm >= 0) {
+            if (mm < 16) {
+                res = nn << mm;
+            }
+        } else {
+            if (mm > -16) {
+                res = nn >> -mm;
+            }
+        }
+        d[i] = res;
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
--
2.20.1

From: Fredrik Strupe <fredrik@strupe.net>

According to Arm ARM, VQDMULL is only valid when U=0, while having
U=1 is unallocated.

Signed-off-by: Fredrik Strupe <fredrik@strupe.net>
Fixes: 695272dcb976 ("target-arm: Handle UNDEF cases for Neon 3-regs-different-widths")
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
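For context on what the single changed value means: the fourth column
of the opcode table changed below is an "undefreq" bitmask consulted
when decoding the Neon 3-regs-different-widths group. A sketch of the
encoding, paraphrasing the comment in translate.c of this period (worth
double-checking against the actual source):

    /* undefreq: bit 0: UNDEF if size == 0
     *           bit 1: UNDEF if size == 1
     *           bit 2: UNDEF if size == 2
     *           bit 3: UNDEF if U == 1
     * Note that [2:0] all set implies "always UNDEF".
     */

So changing VQDMULL's entry from 1 to 9 keeps "UNDEF if size == 0" and
additionally makes the U=1 encoding UNDEF, matching the other
saturating-doubling entries such as VQDMLSL.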
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         {0, 0, 0, 0}, /* VMLSL */
         {0, 0, 0, 9}, /* VQDMLSL */
         {0, 0, 0, 0}, /* Integer VMULL */
-        {0, 0, 0, 1}, /* VQDMULL */
+        {0, 0, 0, 9}, /* VQDMULL */
         {0, 0, 0, 0xa}, /* Polynomial VMULL */
         {0, 0, 0, 7}, /* Reserved: always UNDEF */
     };
--
2.20.1
For VFP short vectors, the VFP registers are divided into a
series of banks: for single-precision these are s0-s7, s8-s15,
s16-s23 and s24-s31; for double-precision they are d0-d3,
d4-d7, ... d28-d31. Some banks are "scalar" meaning that
use of a register within them triggers a pure-scalar or
mixed vector-scalar operation rather than a full vector
operation. The scalar banks are s0-s7, d0-d3 and d16-d19.
When using a bank as part of a vector operation, we
iterate through it, increasing the register number by
the specified stride each time, and wrapping around to
the beginning of the bank.

Unfortunately our calculation of the "increment" part of this
was incorrect:
   vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask)
will only do the intended thing if bank_mask has exactly
one set high bit. For instance for doubles (bank_mask = 0xc),
if we start with vd = 6 and delta_d = 2 then vd is updated
to 12 rather than the intended 4.

This only causes problems in the unlikely case that the
starting register is not the first in its bank: if the
register number doesn't have to wrap around then the
expression happens to give the right answer.

Fix this bug by abstracting out the "check whether register
is in a scalar bank" and "advance register within bank"
operations to utility functions which use the right
bit masking operations.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 100 ++++++++++++++++++++-------------
 1 file changed, 60 insertions(+), 40 deletions(-)
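A compilable miniature of the failure mode, using the numbers from the
message above (illustrative scaffolding only; the 'fixed' line repeats
the masking that the new vfp_advance_dreg() helper in the diff
performs):

    #include <stdio.h>

    int main(void)
    {
        int vd = 6, delta_d = 2, bank_mask = 0xc;

        /* Old formula: only correct when bank_mask has one set bit */
        int old = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);

        /* Fixed: increment only the low two bits, keep the rest */
        int fixed = ((vd + delta_d) & 0x3) | (vd & ~0x3);

        printf("old=%d fixed=%d\n", old, fixed);  /* old=12 fixed=4 */
        return 0;
    }

d6 lives in the d4-d7 bank, so advancing by 2 must wrap to d4; the old
expression jumps out of the bank to d12.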
diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ typedef void VFPGen3OpDPFn(TCGv_i64 vd,
 typedef void VFPGen2OpSPFn(TCGv_i32 vd, TCGv_i32 vm);
 typedef void VFPGen2OpDPFn(TCGv_i64 vd, TCGv_i64 vm);
 
+/*
+ * Return true if the specified S reg is in a scalar bank
+ * (ie if it is s0..s7)
+ */
+static inline bool vfp_sreg_is_scalar(int reg)
+{
+    return (reg & 0x18) == 0;
+}
+
+/*
+ * Return true if the specified D reg is in a scalar bank
+ * (ie if it is d0..d3 or d16..d19)
+ */
+static inline bool vfp_dreg_is_scalar(int reg)
+{
+    return (reg & 0xc) == 0;
+}
+
+/*
+ * Advance the S reg number forwards by delta within its bank
+ * (ie increment the low 3 bits but leave the rest the same)
+ */
+static inline int vfp_advance_sreg(int reg, int delta)
+{
+    return ((reg + delta) & 0x7) | (reg & ~0x7);
+}
+
+/*
+ * Advance the D reg number forwards by delta within its bank
+ * (ie increment the low 2 bits but leave the rest the same)
+ */
+static inline int vfp_advance_dreg(int reg, int delta)
+{
+    return ((reg + delta) & 0x3) | (reg & ~0x3);
+}
+
 /*
  * Perform a 3-operand VFP data processing instruction. fn is the
  * callback to do the actual operation; this function deals with the
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
 {
     uint32_t delta_m = 0;
     uint32_t delta_d = 0;
-    uint32_t bank_mask = 0;
     int veclen = s->vec_len;
     TCGv_i32 f0, f1, fd;
     TCGv_ptr fpst;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
     }
 
     if (veclen > 0) {
-        bank_mask = 0x18;
-
         /* Figure out what type of vector operation this is. */
-        if ((vd & bank_mask) == 0) {
+        if (vfp_sreg_is_scalar(vd)) {
             /* scalar */
             veclen = 0;
         } else {
             delta_d = s->vec_stride + 1;
 
-            if ((vm & bank_mask) == 0) {
+            if (vfp_sreg_is_scalar(vm)) {
                 /* mixed scalar/vector */
                 delta_m = 0;
             } else {
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
 
         /* Set up the operands for the next iteration */
         veclen--;
-        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
-        vn = ((vn + delta_d) & (bank_mask - 1)) | (vn & bank_mask);
+        vd = vfp_advance_sreg(vd, delta_d);
+        vn = vfp_advance_sreg(vn, delta_d);
         neon_load_reg32(f0, vn);
         if (delta_m) {
-            vm = ((vm + delta_m) & (bank_mask - 1)) | (vm & bank_mask);
+            vm = vfp_advance_sreg(vm, delta_m);
             neon_load_reg32(f1, vm);
         }
     }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
 {
     uint32_t delta_m = 0;
     uint32_t delta_d = 0;
-    uint32_t bank_mask = 0;
     int veclen = s->vec_len;
     TCGv_i64 f0, f1, fd;
     TCGv_ptr fpst;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
     }
 
     if (veclen > 0) {
-        bank_mask = 0xc;
-
         /* Figure out what type of vector operation this is. */
-        if ((vd & bank_mask) == 0) {
+        if (vfp_dreg_is_scalar(vd)) {
             /* scalar */
             veclen = 0;
         } else {
             delta_d = (s->vec_stride >> 1) + 1;
 
-            if ((vm & bank_mask) == 0) {
+            if (vfp_dreg_is_scalar(vm)) {
                 /* mixed scalar/vector */
                 delta_m = 0;
             } else {
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
         }
         /* Set up the operands for the next iteration */
         veclen--;
-        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
-        vn = ((vn + delta_d) & (bank_mask - 1)) | (vn & bank_mask);
+        vd = vfp_advance_dreg(vd, delta_d);
+        vn = vfp_advance_dreg(vn, delta_d);
         neon_load_reg64(f0, vn);
         if (delta_m) {
-            vm = ((vm + delta_m) & (bank_mask - 1)) | (vm & bank_mask);
+            vm = vfp_advance_dreg(vm, delta_m);
             neon_load_reg64(f1, vm);
         }
     }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
 {
     uint32_t delta_m = 0;
     uint32_t delta_d = 0;
-    uint32_t bank_mask = 0;
     int veclen = s->vec_len;
     TCGv_i32 f0, fd;
 
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
     }
 
     if (veclen > 0) {
-        bank_mask = 0x18;
-
         /* Figure out what type of vector operation this is. */
-        if ((vd & bank_mask) == 0) {
+        if (vfp_sreg_is_scalar(vd)) {
             /* scalar */
             veclen = 0;
         } else {
             delta_d = s->vec_stride + 1;
 
-            if ((vm & bank_mask) == 0) {
+            if (vfp_sreg_is_scalar(vm)) {
                 /* mixed scalar/vector */
                 delta_m = 0;
             } else {
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
         if (delta_m == 0) {
             /* single source one-many */
             while (veclen--) {
-                vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
+                vd = vfp_advance_sreg(vd, delta_d);
                 neon_store_reg32(fd, vd);
             }
             break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
 
         /* Set up the operands for the next iteration */
         veclen--;
-        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
-        vm = ((vm + delta_m) & (bank_mask - 1)) | (vm & bank_mask);
+        vd = vfp_advance_sreg(vd, delta_d);
+        vm = vfp_advance_sreg(vm, delta_m);
         neon_load_reg32(f0, vm);
     }
 
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
 {
     uint32_t delta_m = 0;
     uint32_t delta_d = 0;
-    uint32_t bank_mask = 0;
     int veclen = s->vec_len;
     TCGv_i64 f0, fd;
 
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
     }
 
     if (veclen > 0) {
-        bank_mask = 0xc;
-
         /* Figure out what type of vector operation this is. */
-        if ((vd & bank_mask) == 0) {
+        if (vfp_dreg_is_scalar(vd)) {
             /* scalar */
             veclen = 0;
         } else {
             delta_d = (s->vec_stride >> 1) + 1;
 
-            if ((vm & bank_mask) == 0) {
+            if (vfp_dreg_is_scalar(vm)) {
                 /* mixed scalar/vector */
                 delta_m = 0;
             } else {
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
         if (delta_m == 0) {
             /* single source one-many */
             while (veclen--) {
-                vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
+                vd = vfp_advance_dreg(vd, delta_d);
                 neon_store_reg64(fd, vd);
             }
             break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
 
         /* Set up the operands for the next iteration */
         veclen--;
-        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
-        vm = ((vm + delta_m) & (bank_mask - 1)) | (vm & bank_mask);
+        vd = vfp_advance_dreg(vd, delta_d);
+        vm = vfp_advance_dreg(vm, delta_m);
         neon_load_reg64(f0, vm);
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VFM_dp(DisasContext *s, arg_VFM_sp *a)
 static bool trans_VMOV_imm_sp(DisasContext *s, arg_VMOV_imm_sp *a)
 {
     uint32_t delta_d = 0;
-    uint32_t bank_mask = 0;
     int veclen = s->vec_len;
     TCGv_i32 fd;
     uint32_t n, i, vd;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_sp(DisasContext *s, arg_VMOV_imm_sp *a)
     }
 
     if (veclen > 0) {
-        bank_mask = 0x18;
         /* Figure out what type of vector operation this is. */
-        if ((vd & bank_mask) == 0) {
+        if (vfp_sreg_is_scalar(vd)) {
             /* scalar */
             veclen = 0;
         } else {
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_sp(DisasContext *s, arg_VMOV_imm_sp *a)
 
         /* Set up the operands for the next iteration */
         veclen--;
-        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
+        vd = vfp_advance_sreg(vd, delta_d);
     }
 
     tcg_temp_free_i32(fd);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_sp(DisasContext *s, arg_VMOV_imm_sp *a)
 static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
 {
     uint32_t delta_d = 0;
-    uint32_t bank_mask = 0;
     int veclen = s->vec_len;
     TCGv_i64 fd;
     uint32_t n, i, vd;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
     }
 
     if (veclen > 0) {
-        bank_mask = 0xc;
         /* Figure out what type of vector operation this is. */
-        if ((vd & bank_mask) == 0) {
+        if (vfp_dreg_is_scalar(vd)) {
             /* scalar */
             veclen = 0;
         } else {
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
 
         /* Set up the operands for the next iteration */
         veclen--;
-        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
+        vd = vfp_advance_dreg(vd, delta_d);
     }
 
     tcg_temp_free_i64(fd);
--
2.20.1

From: Philippe Mathieu-Daudé <f4bug@amsat.org>

By using the TYPE_* definitions for devices, we can:
- quickly find where devices are used with 'git-grep'
- easily rename a device (one-line change).

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20200428154650.21991-1-f4bug@amsat.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/mps2-tz.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
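The pattern being applied, in miniature (illustrative only; the real
TYPE_IOTKIT macro lives in QEMU's ARMSSE headers):

    /* Define the QOM type name once ... */
    #define TYPE_IOTKIT "iotkit"

    /* ... so every creation/lookup site references the macro:
     * 'git grep TYPE_IOTKIT' then finds them all, and renaming the
     * device touches only the #define. */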
diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/mps2-tz.c
+++ b/hw/arm/mps2-tz.c
@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
         exit(EXIT_FAILURE);
     }
 
-    sysbus_init_child_obj(OBJECT(machine), "iotkit", &mms->iotkit,
+    sysbus_init_child_obj(OBJECT(machine), TYPE_IOTKIT, &mms->iotkit,
                           sizeof(mms->iotkit), mmc->armsse_type);
     iotkitdev = DEVICE(&mms->iotkit);
     object_property_set_link(OBJECT(&mms->iotkit), OBJECT(system_memory),
--
2.20.1
Convert the VSEL instructions to decodetree.
We leave trans_VSEL() in translate.c for now as this allows
the patch to show just the changes from the old handle_vsel().

In the old code the check for "do D16-D31 exist" was hidden in
the VFP_DREG macro, and assumed that VFPv3 always implied that
D16-D31 exist. In the new code we do the correct ID register test.
This gives identical behaviour for most of our CPUs, and fixes
previously incorrect handling for Cortex-R5F, Cortex-M4 and
Cortex-M33, which all implement VFPv3 or better with only 16
double-precision registers.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu.h               |  6 ++++++
 target/arm/translate-vfp.inc.c |  9 +++++++++
 target/arm/translate.c         | 35 ++++++++++++++++++++++++----------
 target/arm/vfp-uncond.decode   | 19 ++++++++++++++++++
 4 files changed, 59 insertions(+), 10 deletions(-)
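For reference while reading the trans_VSEL() hunk: only the cc == 0
case is visible in the diff context below, so here is a scalar sketch
of all four encoded conditions (case 0 matches the diff; the other
three follow the Arm ARM's VSEL definition, so treat them as a
paraphrase rather than quoted code):

    /* VSEL: dest = cond ? frn : frm, where the 2-bit cc field maps to
     *   0 -> EQ (Z set)            1 -> VS (V set)
     *   2 -> GE (N == V)           3 -> GT (!Z && N == V)
     */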
diff --git a/target/arm/cpu-param.h b/target/arm/cpu-param.h
35
index XXXXXXX..XXXXXXX 100644
36
--- a/target/arm/cpu-param.h
37
+++ b/target/arm/cpu-param.h
38
@@ -XXX,XX +XXX,XX @@
39
# define TARGET_PAGE_BITS_MIN 10
40
#endif
41
42
-#define NB_MMU_MODES 12
43
+#define NB_MMU_MODES 11
44
45
#endif
22
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
46
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
23
index XXXXXXX..XXXXXXX 100644
47
index XXXXXXX..XXXXXXX 100644
24
--- a/target/arm/cpu.h
48
--- a/target/arm/cpu.h
25
+++ b/target/arm/cpu.h
49
+++ b/target/arm/cpu.h
26
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_fp16_arith(const ARMISARegisters *id)
50
@@ -XXX,XX +XXX,XX @@ bool write_cpustate_to_list(ARMCPU *cpu, bool kvm_sync);
27
return FIELD_EX64(id->id_aa64pfr0, ID_AA64PFR0, FP) == 1;
51
* handling via the TLB. The only way to do a stage 1 translation without
52
* the immediate stage 2 translation is via the ATS or AT system insns,
53
* which can be slow-pathed and always do a page table walk.
54
+ * The only use of stage 2 translations is either as part of an s1+2
55
+ * lookup or when loading the descriptors during a stage 1 page table walk,
56
+ * and in both those cases we don't use the TLB.
57
* 4. we can also safely fold together the "32 bit EL3" and "64 bit EL3"
58
* translation regimes, because they map reasonably well to each other
59
* and they can't both be active at the same time.
60
@@ -XXX,XX +XXX,XX @@ bool write_cpustate_to_list(ARMCPU *cpu, bool kvm_sync);
61
* NS EL1 EL1&0 stage 1+2 (aka NS PL1)
62
* NS EL1 EL1&0 stage 1+2 +PAN
63
* NS EL0 EL2&0
64
+ * NS EL2 EL2&0
65
* NS EL2 EL2&0 +PAN
66
* NS EL2 (aka NS PL2)
67
* S EL0 EL1&0 (aka S PL0)
68
* S EL1 EL1&0 (not used if EL3 is 32 bit)
69
* S EL1 EL1&0 +PAN
70
* S EL3 (aka S PL1)
71
- * NS EL1&0 stage 2
72
*
73
- * for a total of 12 different mmu_idx.
74
+ * for a total of 11 different mmu_idx.
75
*
76
* R profile CPUs have an MPU, but can use the same set of MMU indexes
77
* as A profile. They only need to distinguish NS EL0 and NS EL1 (and
78
@@ -XXX,XX +XXX,XX @@ bool write_cpustate_to_list(ARMCPU *cpu, bool kvm_sync);
79
* are not quite the same -- different CPU types (most notably M profile
80
* vs A/R profile) would like to use MMU indexes with different semantics,
81
* but since we don't ever need to use all of those in a single CPU we
82
- * can avoid setting NB_MMU_MODES to more than 8. The lower bits of
83
+ * can avoid having to set NB_MMU_MODES to "total number of A profile MMU
84
+ * modes + total number of M profile MMU modes". The lower bits of
85
* ARMMMUIdx are the core TLB mmu index, and the higher bits are always
86
* the same for any particular CPU.
87
* Variables of type ARMMUIdx are always full values, and the core
88
@@ -XXX,XX +XXX,XX @@ typedef enum ARMMMUIdx {
89
ARMMMUIdx_SE10_1_PAN = 9 | ARM_MMU_IDX_A,
90
ARMMMUIdx_SE3 = 10 | ARM_MMU_IDX_A,
91
92
- ARMMMUIdx_Stage2 = 11 | ARM_MMU_IDX_A,
93
-
94
/*
95
* These are not allocated TLBs and are used only for AT system
96
* instructions or for the first stage of an S12 page table walk.
97
@@ -XXX,XX +XXX,XX @@ typedef enum ARMMMUIdx {
98
ARMMMUIdx_Stage1_E0 = 0 | ARM_MMU_IDX_NOTLB,
99
ARMMMUIdx_Stage1_E1 = 1 | ARM_MMU_IDX_NOTLB,
100
ARMMMUIdx_Stage1_E1_PAN = 2 | ARM_MMU_IDX_NOTLB,
101
+ /*
102
+ * Not allocated a TLB: used only for second stage of an S12 page
103
+ * table walk, or for descriptor loads during first stage of an S1
104
+ * page table walk. Note that if we ever want to have a TLB for this
105
+ * then various TLB flush insns which currently are no-ops or flush
106
+ * only stage 1 MMU indexes will need to change to flush stage 2.
107
+ */
108
+ ARMMMUIdx_Stage2 = 3 | ARM_MMU_IDX_NOTLB,
109
110
/*
111
* M-profile.
112
@@ -XXX,XX +XXX,XX @@ typedef enum ARMMMUIdxBit {
113
TO_CORE_BIT(SE10_1),
114
TO_CORE_BIT(SE10_1_PAN),
115
TO_CORE_BIT(SE3),
116
- TO_CORE_BIT(Stage2),
117
118
TO_CORE_BIT(MUser),
119
TO_CORE_BIT(MPriv),
120
diff --git a/target/arm/helper.c b/target/arm/helper.c
121
index XXXXXXX..XXXXXXX 100644
122
--- a/target/arm/helper.c
123
+++ b/target/arm/helper.c
124
@@ -XXX,XX +XXX,XX @@ static void tlbiall_nsnh_write(CPUARMState *env, const ARMCPRegInfo *ri,
125
tlb_flush_by_mmuidx(cs,
126
ARMMMUIdxBit_E10_1 |
127
ARMMMUIdxBit_E10_1_PAN |
128
- ARMMMUIdxBit_E10_0 |
129
- ARMMMUIdxBit_Stage2);
130
+ ARMMMUIdxBit_E10_0);
28
}
131
}
29
132
30
+static inline bool isar_feature_aa32_fp_d32(const ARMISARegisters *id)
133
static void tlbiall_nsnh_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
31
+{
134
@@ -XXX,XX +XXX,XX @@ static void tlbiall_nsnh_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
32
+ /* Return true if D16-D31 are implemented */
135
tlb_flush_by_mmuidx_all_cpus_synced(cs,
33
+ return FIELD_EX64(id->mvfr0, MVFR0, SIMDREG) >= 2;
136
ARMMMUIdxBit_E10_1 |
34
+}
137
ARMMMUIdxBit_E10_1_PAN |
35
+
138
- ARMMMUIdxBit_E10_0 |
36
/*
139
- ARMMMUIdxBit_Stage2);
37
* We always set the FP and SIMD FP16 fields to indicate identical
140
+ ARMMMUIdxBit_E10_0);
38
* levels of support (assuming SIMD is implemented at all), so
39
diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
40
index XXXXXXX..XXXXXXX 100644
41
--- a/target/arm/translate-vfp.inc.c
42
+++ b/target/arm/translate-vfp.inc.c
43
@@ -XXX,XX +XXX,XX @@ static bool full_vfp_access_check(DisasContext *s, bool ignore_vfp_enabled)
44
45
return true;
46
}
141
}
47
+
142
48
+/*
143
-static void tlbiipas2_write(CPUARMState *env, const ARMCPRegInfo *ri,
49
+ * The most usual kind of VFP access check, for everything except
144
- uint64_t value)
50
+ * FMXR/FMRX to the always-available special registers.
145
-{
51
+ */
146
- /* Invalidate by IPA. This has to invalidate any structures that
52
+static bool vfp_access_check(DisasContext *s)
147
- * contain only stage 2 translation information, but does not need
53
+{
148
- * to apply to structures that contain combined stage 1 and stage 2
54
+ return full_vfp_access_check(s, false);
149
- * translation information.
55
+}
150
- * This must NOP if EL2 isn't implemented or SCR_EL3.NS is zero.
56
diff --git a/target/arm/translate.c b/target/arm/translate.c
151
- */
57
index XXXXXXX..XXXXXXX 100644
152
- CPUState *cs = env_cpu(env);
58
--- a/target/arm/translate.c
153
- uint64_t pageaddr;
59
+++ b/target/arm/translate.c
154
-
60
@@ -XXX,XX +XXX,XX @@ static void gen_neon_dup_high16(TCGv_i32 var)
155
- if (!arm_feature(env, ARM_FEATURE_EL2) || !(env->cp15.scr_el3 & SCR_NS)) {
61
tcg_temp_free_i32(tmp);
156
- return;
157
- }
158
-
159
- pageaddr = sextract64(value << 12, 0, 40);
160
-
161
- tlb_flush_page_by_mmuidx(cs, pageaddr, ARMMMUIdxBit_Stage2);
162
-}
163
-
164
-static void tlbiipas2_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
165
- uint64_t value)
166
-{
167
- CPUState *cs = env_cpu(env);
168
- uint64_t pageaddr;
169
-
170
- if (!arm_feature(env, ARM_FEATURE_EL2) || !(env->cp15.scr_el3 & SCR_NS)) {
171
- return;
172
- }
173
-
174
- pageaddr = sextract64(value << 12, 0, 40);
175
-
176
- tlb_flush_page_by_mmuidx_all_cpus_synced(cs, pageaddr,
177
- ARMMMUIdxBit_Stage2);
178
-}
179
180
static void tlbiall_hyp_write(CPUARMState *env, const ARMCPRegInfo *ri,
181
uint64_t value)
182
@@ -XXX,XX +XXX,XX @@ static void vttbr_write(CPUARMState *env, const ARMCPRegInfo *ri,
183
tlb_flush_by_mmuidx(cs,
184
ARMMMUIdxBit_E10_1 |
185
ARMMMUIdxBit_E10_1_PAN |
186
- ARMMMUIdxBit_E10_0 |
187
- ARMMMUIdxBit_Stage2);
188
+ ARMMMUIdxBit_E10_0);
189
raw_write(env, ri, value);
190
}
62
}
191
}
63
192
@@ -XXX,XX +XXX,XX @@ static int alle1_tlbmask(CPUARMState *env)
64
-static int handle_vsel(uint32_t insn, uint32_t rd, uint32_t rn, uint32_t rm,
193
return ARMMMUIdxBit_SE10_1 |
65
- uint32_t dp)
+static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
{
- uint32_t cc = extract32(insn, 20, 2);
+ uint32_t rd, rn, rm;
+ bool dp = a->dp;
+
+ if (!dc_isar_feature(aa32_vsel, s)) {
+ return false;
+ }
+
+ /* UNDEF accesses to D16-D31 if they don't exist */
+ if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
+ ((a->vm | a->vn | a->vd) & 0x10)) {
+ return false;
+ }
+ rd = a->vd;
+ rn = a->vn;
+ rm = a->vm;
+
+ if (!vfp_access_check(s)) {
+ return true;
+ }

if (dp) {
TCGv_i64 frn, frm, dest;
@@ -XXX,XX +XXX,XX @@ static int handle_vsel(uint32_t insn, uint32_t rd, uint32_t rn, uint32_t rm,

tcg_gen_ld_f64(frn, cpu_env, vfp_reg_offset(dp, rn));
tcg_gen_ld_f64(frm, cpu_env, vfp_reg_offset(dp, rm));
- switch (cc) {
+ switch (a->cc) {
case 0: /* eq: Z */
tcg_gen_movcond_i64(TCG_COND_EQ, dest, zf, zero,
frn, frm);
@@ -XXX,XX +XXX,XX @@ static int handle_vsel(uint32_t insn, uint32_t rd, uint32_t rn, uint32_t rm,
dest = tcg_temp_new_i32();
tcg_gen_ld_f32(frn, cpu_env, vfp_reg_offset(dp, rn));
tcg_gen_ld_f32(frm, cpu_env, vfp_reg_offset(dp, rm));
- switch (cc) {
+ switch (a->cc) {
case 0: /* eq: Z */
tcg_gen_movcond_i32(TCG_COND_EQ, dest, cpu_ZF, zero,
frn, frm);
@@ -XXX,XX +XXX,XX @@ static int handle_vsel(uint32_t insn, uint32_t rd, uint32_t rn, uint32_t rm,
tcg_temp_free_i32(zero);
}

- return 0;
+ return true;
}

static int handle_vminmaxnm(uint32_t insn, uint32_t rd, uint32_t rn,
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_misc_insn(DisasContext *s, uint32_t insn)
rm = VFP_SREG_M(insn);
}

- if ((insn & 0x0f800e50) == 0x0e000a00 && dc_isar_feature(aa32_vsel, s)) {
- return handle_vsel(insn, rd, rn, rm, dp);
- } else if ((insn & 0x0fb00e10) == 0x0e800a00 &&
- dc_isar_feature(aa32_vminmaxnm, s)) {
+ if ((insn & 0x0fb00e10) == 0x0e800a00 &&
+ dc_isar_feature(aa32_vminmaxnm, s)) {
return handle_vminmaxnm(insn, rd, rn, rm, dp);
} else if ((insn & 0x0fbc0ed0) == 0x0eb80a40 &&
dc_isar_feature(aa32_vrint, s)) {
diff --git a/target/arm/vfp-uncond.decode b/target/arm/vfp-uncond.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp-uncond.decode
+++ b/target/arm/vfp-uncond.decode
@@ -XXX,XX +XXX,XX @@
# 1111 1110 .... .... .... 101. .... ....
# (but those patterns might also cover some Neon instructions,
# which do not live in this file.)
+
+# VFP registers have an odd encoding with a four-bit field
+# and a one-bit field which are assembled in different orders
+# depending on whether the register is double or single precision.
+# Each individual instruction function must do the checks for
+# "double register selected but CPU does not have double support"
+# and "double register number has bit 4 set but CPU does not
+# support D16-D31" (which should UNDEF).
+%vm_dp 5:1 0:4
+%vm_sp 0:4 5:1
+%vn_dp 7:1 16:4
+%vn_sp 16:4 7:1
+%vd_dp 22:1 12:4
+%vd_sp 12:4 22:1
+
+VSEL 1111 1110 0. cc:2 .... .... 1010 .0.0 .... \
+ vm=%vm_sp vn=%vn_sp vd=%vd_sp dp=0
+VSEL 1111 1110 0. cc:2 .... .... 1011 .0.0 .... \
+ vm=%vm_dp vn=%vn_dp vd=%vd_dp dp=1
--
2.20.1


ARMMMUIdxBit_SE10_1_PAN |
ARMMMUIdxBit_SE10_0;
- } else if (arm_feature(env, ARM_FEATURE_EL2)) {
- return ARMMMUIdxBit_E10_1 |
- ARMMMUIdxBit_E10_1_PAN |
- ARMMMUIdxBit_E10_0 |
- ARMMMUIdxBit_Stage2;
} else {
return ARMMMUIdxBit_E10_1 |
ARMMMUIdxBit_E10_1_PAN |
@@ -XXX,XX +XXX,XX @@ static void tlbi_aa64_vae3is_write(CPUARMState *env, const ARMCPRegInfo *ri,
ARMMMUIdxBit_SE3);
}

-static void tlbi_aa64_ipas2e1_write(CPUARMState *env, const ARMCPRegInfo *ri,
- uint64_t value)
-{
- /* Invalidate by IPA. This has to invalidate any structures that
- * contain only stage 2 translation information, but does not need
- * to apply to structures that contain combined stage 1 and stage 2
- * translation information.
- * This must NOP if EL2 isn't implemented or SCR_EL3.NS is zero.
- */
- ARMCPU *cpu = env_archcpu(env);
- CPUState *cs = CPU(cpu);
- uint64_t pageaddr;
-
- if (!arm_feature(env, ARM_FEATURE_EL2) || !(env->cp15.scr_el3 & SCR_NS)) {
- return;
- }
-
- pageaddr = sextract64(value << 12, 0, 48);
-
- tlb_flush_page_by_mmuidx(cs, pageaddr, ARMMMUIdxBit_Stage2);
-}
-
-static void tlbi_aa64_ipas2e1is_write(CPUARMState *env, const ARMCPRegInfo *ri,
- uint64_t value)
-{
- CPUState *cs = env_cpu(env);
- uint64_t pageaddr;
-
- if (!arm_feature(env, ARM_FEATURE_EL2) || !(env->cp15.scr_el3 & SCR_NS)) {
- return;
- }
-
- pageaddr = sextract64(value << 12, 0, 48);
-
- tlb_flush_page_by_mmuidx_all_cpus_synced(cs, pageaddr,
- ARMMMUIdxBit_Stage2);
-}
-
static CPAccessResult aa64_zva_access(CPUARMState *env, const ARMCPRegInfo *ri,
bool isread)
{
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
.writefn = tlbi_aa64_vae1_write },
{ .name = "TLBI_IPAS2E1IS", .state = ARM_CP_STATE_AA64,
.opc0 = 1, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 1,
- .access = PL2_W, .type = ARM_CP_NO_RAW,
- .writefn = tlbi_aa64_ipas2e1is_write },
+ .access = PL2_W, .type = ARM_CP_NOP },
{ .name = "TLBI_IPAS2LE1IS", .state = ARM_CP_STATE_AA64,
.opc0 = 1, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 5,
- .access = PL2_W, .type = ARM_CP_NO_RAW,
- .writefn = tlbi_aa64_ipas2e1is_write },
+ .access = PL2_W, .type = ARM_CP_NOP },
{ .name = "TLBI_ALLE1IS", .state = ARM_CP_STATE_AA64,
.opc0 = 1, .opc1 = 4, .crn = 8, .crm = 3, .opc2 = 4,
.access = PL2_W, .type = ARM_CP_NO_RAW,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
.writefn = tlbi_aa64_alle1is_write },
{ .name = "TLBI_IPAS2E1", .state = ARM_CP_STATE_AA64,
.opc0 = 1, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 1,
- .access = PL2_W, .type = ARM_CP_NO_RAW,
- .writefn = tlbi_aa64_ipas2e1_write },
+ .access = PL2_W, .type = ARM_CP_NOP },
{ .name = "TLBI_IPAS2LE1", .state = ARM_CP_STATE_AA64,
.opc0 = 1, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 5,
- .access = PL2_W, .type = ARM_CP_NO_RAW,
- .writefn = tlbi_aa64_ipas2e1_write },
+ .access = PL2_W, .type = ARM_CP_NOP },
{ .name = "TLBI_ALLE1", .state = ARM_CP_STATE_AA64,
.opc0 = 1, .opc1 = 4, .crn = 8, .crm = 7, .opc2 = 4,
.access = PL2_W, .type = ARM_CP_NO_RAW,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
.writefn = tlbimva_hyp_is_write },
{ .name = "TLBIIPAS2",
.cp = 15, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 1,
- .type = ARM_CP_NO_RAW, .access = PL2_W,
- .writefn = tlbiipas2_write },
+ .type = ARM_CP_NOP, .access = PL2_W },
{ .name = "TLBIIPAS2IS",
.cp = 15, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 1,
- .type = ARM_CP_NO_RAW, .access = PL2_W,
- .writefn = tlbiipas2_is_write },
+ .type = ARM_CP_NOP, .access = PL2_W },
{ .name = "TLBIIPAS2L",
.cp = 15, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 5,
- .type = ARM_CP_NO_RAW, .access = PL2_W,
- .writefn = tlbiipas2_write },
+ .type = ARM_CP_NOP, .access = PL2_W },
{ .name = "TLBIIPAS2LIS",
.cp = 15, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 5,
- .type = ARM_CP_NO_RAW, .access = PL2_W,
- .writefn = tlbiipas2_is_write },
+ .type = ARM_CP_NOP, .access = PL2_W },
/* 32 bit cache operations */
{ .name = "ICIALLUIS", .cp = 15, .opc1 = 0, .crn = 7, .crm = 1, .opc2 = 0,
.type = ARM_CP_NOP, .access = PL1_W, .accessfn = aa64_cacheop_pou_access },
--
2.20.1

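The %vm_dp/%vm_sp field definitions above are easy to misread, so here is a
standalone sanity check of what they assemble (illustrative only; these helper
names are invented and are not part of the patch):

    #include <stdint.h>
    #include <stdio.h>

    /* %vm_dp 5:1 0:4 -- insn bit 5 becomes the high bit of the register number */
    static unsigned vm_dp(uint32_t insn)
    {
        return (((insn >> 5) & 1) << 4) | (insn & 0xf);
    }

    /* %vm_sp 0:4 5:1 -- the 4-bit field is the high part, bit 5 the low bit */
    static unsigned vm_sp(uint32_t insn)
    {
        return ((insn & 0xf) << 1) | ((insn >> 5) & 1);
    }

    int main(void)
    {
        uint32_t insn = 0x2a;            /* bits [5:0] = 101010 */
        printf("dp: d%u, sp: s%u\n", vm_dp(insn), vm_sp(insn));
        return 0;
    }

This matches the convention described in the decode comment: for doubles the M
bit extends the register number up to D16-D31, which is why trans_VSEL() must
check bit 4 against the aa32_fp_d32 feature.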
The NSACR register allows secure code to configure the FPU
to be inaccessible to non-secure code. If the NSACR.CP10
bit is set then:
 * NS accesses to the FPU trap as UNDEF (ie to NS EL1 or EL2)
 * CPACR.{CP10,CP11} behave as if RAZ/WI
 * HCPTR.{TCP11,TCP10} behave as if RAO/WI

Note that we do not implement the NSACR.NSASEDIS bit which
gates only access to Advanced SIMD, in the same way that
we don't implement the equivalent CPACR.ASEDIS and HCPTR.TASE.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20190510110357.18825-1-peter.maydell@linaro.org
---
target/arm/helper.c | 75 +++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 73 insertions(+), 2 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void cpacr_write(CPUARMState *env, const ARMCPRegInfo *ri,
}
value &= mask;
}
+
+ /*
+ * For A-profile AArch32 EL3 (but not M-profile secure mode), if NSACR.CP10
+ * is 0 then CPACR.{CP11,CP10} ignore writes and read as 0b00.
+ */
+ if (arm_feature(env, ARM_FEATURE_EL3) && !arm_el_is_aa64(env, 3) &&
+ !arm_is_secure(env) && !extract32(env->cp15.nsacr, 10, 1)) {
+ value &= ~(0xf << 20);
+ value |= env->cp15.cpacr_el1 & (0xf << 20);
+ }
+
env->cp15.cpacr_el1 = value;
}

+static uint64_t cpacr_read(CPUARMState *env, const ARMCPRegInfo *ri)
+{
+ /*
+ * For A-profile AArch32 EL3 (but not M-profile secure mode), if NSACR.CP10
+ * is 0 then CPACR.{CP11,CP10} ignore writes and read as 0b00.
+ */
+ uint64_t value = env->cp15.cpacr_el1;
+
+ if (arm_feature(env, ARM_FEATURE_EL3) && !arm_el_is_aa64(env, 3) &&
+ !arm_is_secure(env) && !extract32(env->cp15.nsacr, 10, 1)) {
+ value &= ~(0xf << 20);
+ }
+ return value;
+}
+
+
static void cpacr_reset(CPUARMState *env, const ARMCPRegInfo *ri)
{
/* Call cpacr_write() so that we reset with the correct RAO bits set
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v6_cp_reginfo[] = {
{ .name = "CPACR", .state = ARM_CP_STATE_BOTH, .opc0 = 3,
.crn = 1, .crm = 0, .opc1 = 0, .opc2 = 2, .accessfn = cpacr_access,
.access = PL1_RW, .fieldoffset = offsetof(CPUARMState, cp15.cpacr_el1),
- .resetfn = cpacr_reset, .writefn = cpacr_write },
+ .resetfn = cpacr_reset, .writefn = cpacr_write, .readfn = cpacr_read },
REGINFO_SENTINEL
};

@@ -XXX,XX +XXX,XX @@ uint64_t arm_hcr_el2_eff(CPUARMState *env)
return ret;
}

+static void cptr_el2_write(CPUARMState *env, const ARMCPRegInfo *ri,
+ uint64_t value)
+{
+ /*
+ * For A-profile AArch32 EL3, if NSACR.CP10
+ * is 0 then HCPTR.{TCP11,TCP10} ignore writes and read as 1.
+ */
+ if (arm_feature(env, ARM_FEATURE_EL3) && !arm_el_is_aa64(env, 3) &&
+ !arm_is_secure(env) && !extract32(env->cp15.nsacr, 10, 1)) {
+ value &= ~(0x3 << 10);
+ value |= env->cp15.cptr_el[2] & (0x3 << 10);
+ }
+ env->cp15.cptr_el[2] = value;
+}
+
+static uint64_t cptr_el2_read(CPUARMState *env, const ARMCPRegInfo *ri)
+{
+ /*
+ * For A-profile AArch32 EL3, if NSACR.CP10
+ * is 0 then HCPTR.{TCP11,TCP10} ignore writes and read as 1.
+ */
+ uint64_t value = env->cp15.cptr_el[2];
+
+ if (arm_feature(env, ARM_FEATURE_EL3) && !arm_el_is_aa64(env, 3) &&
+ !arm_is_secure(env) && !extract32(env->cp15.nsacr, 10, 1)) {
+ value |= 0x3 << 10;
+ }
+ return value;
+}
+
static const ARMCPRegInfo el2_cp_reginfo[] = {
{ .name = "HCR_EL2", .state = ARM_CP_STATE_AA64,
.type = ARM_CP_IO,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo el2_cp_reginfo[] = {
{ .name = "CPTR_EL2", .state = ARM_CP_STATE_BOTH,
.opc0 = 3, .opc1 = 4, .crn = 1, .crm = 1, .opc2 = 2,
.access = PL2_RW, .accessfn = cptr_access, .resetvalue = 0,
- .fieldoffset = offsetof(CPUARMState, cp15.cptr_el[2]) },
+ .fieldoffset = offsetof(CPUARMState, cp15.cptr_el[2]),
+ .readfn = cptr_el2_read, .writefn = cptr_el2_write },
{ .name = "MAIR_EL2", .state = ARM_CP_STATE_BOTH,
.opc0 = 3, .opc1 = 4, .crn = 10, .crm = 2, .opc2 = 0,
.access = PL2_RW, .fieldoffset = offsetof(CPUARMState, cp15.mair_el[2]),
@@ -XXX,XX +XXX,XX @@ int fp_exception_el(CPUARMState *env, int cur_el)
break;
}

+ /*
+ * The NSACR allows A-profile AArch32 EL3 and M-profile secure mode
+ * to control non-secure access to the FPU. It doesn't have any
+ * effect if EL3 is AArch64 or if EL3 doesn't exist at all.
+ */
+ if ((arm_feature(env, ARM_FEATURE_EL3) && !arm_el_is_aa64(env, 3) &&
+ cur_el <= 2 && !arm_is_secure_below_el3(env))) {
+ if (!extract32(env->cp15.nsacr, 10, 1)) {
+ /* FP insns act as UNDEF */
+ return cur_el == 2 ? 2 : 1;
+ }
+ }
+
/* For the CPTR registers we don't need to guard with an ARM_FEATURE
* check because zero bits in the registers mean "don't trap".
*/
--
2.20.1


The access_type argument to get_phys_addr_lpae() is an MMUAccessType;
use the enum constant MMU_DATA_LOAD rather than a literal 0 when we
call it in S1_ptw_translate().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200330210400.11724-3-peter.maydell@linaro.org
---
target/arm/helper.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static hwaddr S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx,
pcacheattrs = &cacheattrs;
}

- ret = get_phys_addr_lpae(env, addr, 0, ARMMMUIdx_Stage2, &s2pa,
- &txattrs, &s2prot, &s2size, fi, pcacheattrs);
+ ret = get_phys_addr_lpae(env, addr, MMU_DATA_LOAD, ARMMMUIdx_Stage2,
+ &s2pa, &txattrs, &s2prot, &s2size, fi,
+ pcacheattrs);
if (ret) {
assert(fi->type != ARMFault_None);
fi->s2addr = addr;
--
2.20.1

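The write/read pair above follows a reusable pattern for "gated" register
bits; a minimal standalone sketch of the same idea (generic names, not QEMU
code):

    #include <stdint.h>
    #include <stdbool.h>

    #define GATED_BITS (0xfu << 20)   /* e.g. CPACR.{CP11,CP10} */

    /* When the gate is closed the gated bits ignore writes... */
    static uint64_t gated_write(uint64_t old, uint64_t val, bool gate_closed)
    {
        if (gate_closed) {
            val &= ~GATED_BITS;          /* drop the newly written bits */
            val |= old & GATED_BITS;     /* keep the old ones instead */
        }
        return val;
    }

    /* ...and read back as zero (RAZ); a RAO gate would OR the bits in instead. */
    static uint64_t gated_read(uint64_t val, bool gate_closed)
    {
        return gate_closed ? (val & ~GATED_BITS) : val;
    }

The CPTR_EL2 accessors are the RAO/WI variant of the same shape: reads force
HCPTR.{TCP11,TCP10} to 1 rather than 0.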
Convert the VCVTA/VCVTN/VCVTP/VCVTM instructions to decodetree.
trans_VCVT() is temporarily left in translate.c.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/translate.c | 72 +++++++++++++++++-------------------
target/arm/vfp-uncond.decode | 6 +++
2 files changed, 39 insertions(+), 39 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
return true;
}

-static int handle_vcvt(uint32_t insn, uint32_t rd, uint32_t rm, uint32_t dp,
- int rounding)
+static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
{
- bool is_signed = extract32(insn, 7, 1);
- TCGv_ptr fpst = get_fpstatus_ptr(0);
+ uint32_t rd, rm;
+ bool dp = a->dp;
+ TCGv_ptr fpst;
TCGv_i32 tcg_rmode, tcg_shift;
+ int rounding = fp_decode_rm[a->rm];
+ bool is_signed = a->op;
+
+ if (!dc_isar_feature(aa32_vcvt_dr, s)) {
+ return false;
+ }
+
+ /* UNDEF accesses to D16-D31 if they don't exist */
+ if (dp && !dc_isar_feature(aa32_fp_d32, s) && (a->vm & 0x10)) {
+ return false;
+ }
+ rd = a->vd;
+ rm = a->vm;
+
+ if (!vfp_access_check(s)) {
+ return true;
+ }
+
+ fpst = get_fpstatus_ptr(0);

tcg_shift = tcg_const_i32(0);

@@ -XXX,XX +XXX,XX @@ static int handle_vcvt(uint32_t insn, uint32_t rd, uint32_t rm, uint32_t dp,
if (dp) {
TCGv_i64 tcg_double, tcg_res;
TCGv_i32 tcg_tmp;
- /* Rd is encoded as a single precision register even when the source
- * is double precision.
- */
- rd = ((rd << 1) & 0x1e) | ((rd >> 4) & 0x1);
tcg_double = tcg_temp_new_i64();
tcg_res = tcg_temp_new_i64();
tcg_tmp = tcg_temp_new_i32();
@@ -XXX,XX +XXX,XX @@ static int handle_vcvt(uint32_t insn, uint32_t rd, uint32_t rm, uint32_t dp,

tcg_temp_free_ptr(fpst);

- return 0;
-}
-
-static int disas_vfp_misc_insn(DisasContext *s, uint32_t insn)
-{
- uint32_t rd, rm, dp = extract32(insn, 8, 1);
-
- if (dp) {
- VFP_DREG_D(rd, insn);
- VFP_DREG_M(rm, insn);
- } else {
- rd = VFP_SREG_D(insn);
- rm = VFP_SREG_M(insn);
- }
-
- if ((insn & 0x0fbc0e50) == 0x0ebc0a40 &&
- dc_isar_feature(aa32_vcvt_dr, s)) {
- /* VCVTA, VCVTN, VCVTP, VCVTM */
- int rounding = fp_decode_rm[extract32(insn, 16, 2)];
- return handle_vcvt(insn, rd, rm, dp, rounding);
- }
- return 1;
+ return true;
}

/*
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
}
}

+ if (extract32(insn, 28, 4) == 0xf) {
+ /*
+ * Encodings with T=1 (Thumb) or unconditional (ARM): these
+ * were all handled by the decodetree decoder, so any insn
+ * patterns which get here must be UNDEF.
+ */
+ return 1;
+ }
+
/*
* FIXME: this access check should not take precedence over UNDEF
* for invalid encodings; we will generate incorrect syndrome information
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
return 0;
}

- if (extract32(insn, 28, 4) == 0xf) {
- /*
- * Encodings with T=1 (Thumb) or unconditional (ARM):
- * only used for the "miscellaneous VFP features" added in v8A
- * and v7M (and gated on the MVFR2.FPMisc field).
- */
- return disas_vfp_misc_insn(s, insn);
- }
-
dp = ((insn & 0xf00) == 0xb00);
switch ((insn >> 24) & 0xf) {
case 0xe:
diff --git a/target/arm/vfp-uncond.decode b/target/arm/vfp-uncond.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp-uncond.decode
+++ b/target/arm/vfp-uncond.decode
@@ -XXX,XX +XXX,XX @@ VRINT 1111 1110 1.11 10 rm:2 .... 1010 01.0 .... \
vm=%vm_sp vd=%vd_sp dp=0
VRINT 1111 1110 1.11 10 rm:2 .... 1011 01.0 .... \
vm=%vm_dp vd=%vd_dp dp=1
+
+# VCVT float to int with specified rounding mode; Vd is always single-precision
+VCVT 1111 1110 1.11 11 rm:2 .... 1010 op:1 1.0 .... \
+ vm=%vm_sp vd=%vd_sp dp=0
+VCVT 1111 1110 1.11 11 rm:2 .... 1011 op:1 1.0 .... \
+ vm=%vm_dp vd=%vd_sp dp=1
--
2.20.1


For ARMv8.2-TTS2UXN, the stage 2 page table walk wants to know
whether the stage 1 access is for EL0 or not, because whether
exec permission is given can depend on whether this is an EL0
or EL1 access. Add a new argument to get_phys_addr_lpae() so
the call sites can pass this information in.

Since get_phys_addr_lpae() doesn't already have a doc comment,
add one so we have a place to put the documentation of the
semantics of the new s1_is_el0 argument.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200330210400.11724-4-peter.maydell@linaro.org
---
target/arm/helper.c | 29 ++++++++++++++++++++++++++++-
1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@

static bool get_phys_addr_lpae(CPUARMState *env, target_ulong address,
MMUAccessType access_type, ARMMMUIdx mmu_idx,
+ bool s1_is_el0,
hwaddr *phys_ptr, MemTxAttrs *txattrs, int *prot,
target_ulong *page_size_ptr,
ARMMMUFaultInfo *fi, ARMCacheAttrs *cacheattrs);
@@ -XXX,XX +XXX,XX @@ static hwaddr S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx,
}

ret = get_phys_addr_lpae(env, addr, MMU_DATA_LOAD, ARMMMUIdx_Stage2,
+ false,
&s2pa, &txattrs, &s2prot, &s2size, fi,
pcacheattrs);
if (ret) {
@@ -XXX,XX +XXX,XX @@ static ARMVAParameters aa32_va_parameters(CPUARMState *env, uint32_t va,
};
}

+/**
+ * get_phys_addr_lpae: perform one stage of page table walk, LPAE format
+ *
+ * Returns false if the translation was successful. Otherwise, phys_ptr, attrs,
+ * prot and page_size may not be filled in, and the populated fsr value provides
+ * information on why the translation aborted, in the format of a long-format
+ * DFSR/IFSR fault register, with the following caveats:
+ * * the WnR bit is never set (the caller must do this).
+ *
+ * @env: CPUARMState
+ * @address: virtual address to get physical address for
+ * @access_type: MMU_DATA_LOAD, MMU_DATA_STORE or MMU_INST_FETCH
+ * @mmu_idx: MMU index indicating required translation regime
+ * @s1_is_el0: if @mmu_idx is ARMMMUIdx_Stage2 (so this is a stage 2 page table
+ * walk), must be true if this is stage 2 of a stage 1+2 walk for an
+ * EL0 access. If @mmu_idx is anything else, @s1_is_el0 is ignored.
+ * @phys_ptr: set to the physical address corresponding to the virtual address
+ * @attrs: set to the memory transaction attributes to use
+ * @prot: set to the permissions for the page containing phys_ptr
+ * @page_size_ptr: set to the size of the page containing phys_ptr
+ * @fi: set to fault info if the translation fails
+ * @cacheattrs: (if non-NULL) set to the cacheability/shareability attributes
+ */
static bool get_phys_addr_lpae(CPUARMState *env, target_ulong address,
MMUAccessType access_type, ARMMMUIdx mmu_idx,
+ bool s1_is_el0,
hwaddr *phys_ptr, MemTxAttrs *txattrs, int *prot,
target_ulong *page_size_ptr,
ARMMMUFaultInfo *fi, ARMCacheAttrs *cacheattrs)
@@ -XXX,XX +XXX,XX @@ bool get_phys_addr(CPUARMState *env, target_ulong address,

/* S1 is done. Now do S2 translation. */
ret = get_phys_addr_lpae(env, ipa, access_type, ARMMMUIdx_Stage2,
+ mmu_idx == ARMMMUIdx_E10_0,
phys_ptr, attrs, &s2_prot,
page_size, fi,
cacheattrs != NULL ? &cacheattrs2 : NULL);
@@ -XXX,XX +XXX,XX @@ bool get_phys_addr(CPUARMState *env, target_ulong address,
}

if (regime_using_lpae_format(env, mmu_idx)) {
- return get_phys_addr_lpae(env, address, access_type, mmu_idx,
+ return get_phys_addr_lpae(env, address, access_type, mmu_idx, false,
phys_ptr, attrs, prot, page_size,
fi, cacheattrs);
} else if (regime_sctlr(env, mmu_idx) & SCTLR_XP) {
--
2.20.1

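For reference, the rounding-mode table that trans_VCVT() now indexes directly
with a->rm follows the architectural FPDecodeRM() mapping; a sketch of its
shape using softfloat constants (treat the exact table as an assumption rather
than a quote of translate.c):

    static const int fp_decode_rm_sketch[4] = {
        float_round_ties_away,      /* 00: VCVTA */
        float_round_nearest_even,   /* 01: VCVTN */
        float_round_up,             /* 10: VCVTP */
        float_round_down,           /* 11: VCVTM */
    };

so decodetree's rm:2 field can be used as-is, with op:1 selecting the signed
conversion.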
Convert the VFP VMLA instruction to decodetree.

This is the first of the VFP 3-operand data processing instructions,
so we include in this patch the code which loops over the elements
for an old-style VFP vector operation. The existing code to do this
looping uses the deprecated cpu_F0s/F0d/F1s/F1d TCG globals; since
we are going to be converting instructions one at a time anyway
we can take the opportunity to make the new loop use TCG temporaries,
which means we can do that conversion one operation at a time
rather than needing to do it all in one go.

We include an UNDEF check which was missing in the old code:
short-vector operations (with stride or length non-zero) were
deprecated in v7A and must UNDEF in v8A, so if the MVFR0 FPShVec
field does not indicate that support for short vectors is present
we UNDEF the operations that would use them. (This is a change
of behaviour for Cortex-A7, Cortex-A15 and the v8 CPUs, which
previously were all incorrectly allowing short-vector operations.)

Note that the conversion fixes a bug in the old code for the
case of VFP short-vector "mixed scalar/vector operations". These
happen where the destination register is in a vector bank but
the second operand is in a scalar bank. For example
  vmla.f64 d10, d1, d16 with length 2 stride 2
is equivalent to the pair of scalar operations
  vmla.f64 d10, d1, d16
  vmla.f64 d8, d3, d16
where the destination and first input register cycle through
their vector but the second input is scalar (d16). In the
old decoder the gen_vfp_F1_mul() operation uses cpu_F1{s,d}
as a temporary output for the multiply, which trashes the
second input operand. For the fully-scalar case (where we
never do a second iteration) and the fully-vector case
(where the loop loads the new second input operand) this
doesn't matter, but for the mixed scalar/vector case we
will end up using the wrong value for later loop iterations.
In the new code we use TCG temporaries and so avoid the bug.
This bug is present for all the multiply-accumulate insns
that operate on short vectors: VMLA, VMLS, VNMLA, VNMLS.

Note 2: the expression used to calculate the next register
number in the vector bank is not in fact correct; we leave
this behaviour unchanged from the old decoder and will
fix this bug later in the series.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/cpu.h | 5 +
target/arm/translate-vfp.inc.c | 205 +++++++++++++++++++++++++++++++++
target/arm/translate.c | 14 ++-
target/arm/vfp.decode | 6 +
4 files changed, 224 insertions(+), 6 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_fp_d32(const ARMISARegisters *id)
return FIELD_EX64(id->mvfr0, MVFR0, SIMDREG) >= 2;
}

+static inline bool isar_feature_aa32_fpshvec(const ARMISARegisters *id)
+{
+ return FIELD_EX64(id->mvfr0, MVFR0, FPSHVEC) > 0;
+}
+
/*
* We always set the FP and SIMD FP16 fields to indicate identical
* levels of support (assuming SIMD is implemented at all), so
diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)

return true;
}
+
+/*
+ * Types for callbacks for do_vfp_3op_sp() and do_vfp_3op_dp().
+ * The callback should emit code to write a value to vd. If
+ * do_vfp_3op_{sp,dp}() was passed reads_vd then the TCGv vd
+ * will contain the old value of the relevant VFP register;
+ * otherwise it must be written to only.
+ */
+typedef void VFPGen3OpSPFn(TCGv_i32 vd,
+ TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst);
+typedef void VFPGen3OpDPFn(TCGv_i64 vd,
+ TCGv_i64 vn, TCGv_i64 vm, TCGv_ptr fpst);
+
+/*
+ * Perform a 3-operand VFP data processing instruction. fn is the
+ * callback to do the actual operation; this function deals with the
+ * code to handle looping around for VFP vector processing.
+ */
+static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
+ int vd, int vn, int vm, bool reads_vd)
+{
+ uint32_t delta_m = 0;
+ uint32_t delta_d = 0;
+ uint32_t bank_mask = 0;
+ int veclen = s->vec_len;
+ TCGv_i32 f0, f1, fd;
+ TCGv_ptr fpst;
+
+ if (!dc_isar_feature(aa32_fpshvec, s) &&
+ (veclen != 0 || s->vec_stride != 0)) {
+ return false;
+ }
+
+ if (!vfp_access_check(s)) {
+ return true;
+ }
+
+ if (veclen > 0) {
+ bank_mask = 0x18;
+
+ /* Figure out what type of vector operation this is. */
+ if ((vd & bank_mask) == 0) {
+ /* scalar */
+ veclen = 0;
+ } else {
+ delta_d = s->vec_stride + 1;
+
+ if ((vm & bank_mask) == 0) {
+ /* mixed scalar/vector */
+ delta_m = 0;
+ } else {
+ /* vector */
+ delta_m = delta_d;
+ }
+ }
+ }
+
+ f0 = tcg_temp_new_i32();
+ f1 = tcg_temp_new_i32();
+ fd = tcg_temp_new_i32();
+ fpst = get_fpstatus_ptr(0);
+
+ neon_load_reg32(f0, vn);
+ neon_load_reg32(f1, vm);
+
+ for (;;) {
+ if (reads_vd) {
+ neon_load_reg32(fd, vd);
+ }
+ fn(fd, f0, f1, fpst);
+ neon_store_reg32(fd, vd);
+
+ if (veclen == 0) {
+ break;
+ }
+
+ /* Set up the operands for the next iteration */
+ veclen--;
+ vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
+ vn = ((vn + delta_d) & (bank_mask - 1)) | (vn & bank_mask);
+ neon_load_reg32(f0, vn);
+ if (delta_m) {
+ vm = ((vm + delta_m) & (bank_mask - 1)) | (vm & bank_mask);
+ neon_load_reg32(f1, vm);
+ }
+ }
+
+ tcg_temp_free_i32(f0);
+ tcg_temp_free_i32(f1);
+ tcg_temp_free_i32(fd);
+ tcg_temp_free_ptr(fpst);
+
+ return true;
+}
+
+static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
+ int vd, int vn, int vm, bool reads_vd)
+{
+ uint32_t delta_m = 0;
+ uint32_t delta_d = 0;
+ uint32_t bank_mask = 0;
+ int veclen = s->vec_len;
+ TCGv_i64 f0, f1, fd;
+ TCGv_ptr fpst;
+
+ /* UNDEF accesses to D16-D31 if they don't exist */
+ if (!dc_isar_feature(aa32_fp_d32, s) && ((vd | vn | vm) & 0x10)) {
+ return false;
+ }
+
+ if (!dc_isar_feature(aa32_fpshvec, s) &&
+ (veclen != 0 || s->vec_stride != 0)) {
+ return false;
+ }
+
+ if (!vfp_access_check(s)) {
+ return true;
+ }
+
+ if (veclen > 0) {
+ bank_mask = 0xc;
+
+ /* Figure out what type of vector operation this is. */
+ if ((vd & bank_mask) == 0) {
+ /* scalar */
+ veclen = 0;
+ } else {
+ delta_d = (s->vec_stride >> 1) + 1;
+
+ if ((vm & bank_mask) == 0) {
+ /* mixed scalar/vector */
+ delta_m = 0;
+ } else {
+ /* vector */
+ delta_m = delta_d;
+ }
+ }
+ }
+
+ f0 = tcg_temp_new_i64();
+ f1 = tcg_temp_new_i64();
+ fd = tcg_temp_new_i64();
+ fpst = get_fpstatus_ptr(0);
+
+ neon_load_reg64(f0, vn);
+ neon_load_reg64(f1, vm);
+
+ for (;;) {
+ if (reads_vd) {
+ neon_load_reg64(fd, vd);
+ }
+ fn(fd, f0, f1, fpst);
+ neon_store_reg64(fd, vd);
+
+ if (veclen == 0) {
+ break;
+ }
+ /* Set up the operands for the next iteration */
+ veclen--;
+ vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
+ vn = ((vn + delta_d) & (bank_mask - 1)) | (vn & bank_mask);
+ neon_load_reg64(f0, vn);
+ if (delta_m) {
+ vm = ((vm + delta_m) & (bank_mask - 1)) | (vm & bank_mask);
+ neon_load_reg64(f1, vm);
+ }
+ }
+
+ tcg_temp_free_i64(f0);
+ tcg_temp_free_i64(f1);
+ tcg_temp_free_i64(fd);
+ tcg_temp_free_ptr(fpst);
+
+ return true;
+}
+
+static void gen_VMLA_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
+{
+ /* Note that order of inputs to the add matters for NaNs */
+ TCGv_i32 tmp = tcg_temp_new_i32();
+
+ gen_helper_vfp_muls(tmp, vn, vm, fpst);
+ gen_helper_vfp_adds(vd, vd, tmp, fpst);
+ tcg_temp_free_i32(tmp);
+}
+
+static bool trans_VMLA_sp(DisasContext *s, arg_VMLA_sp *a)
+{
+ return do_vfp_3op_sp(s, gen_VMLA_sp, a->vd, a->vn, a->vm, true);
+}
+
+static void gen_VMLA_dp(TCGv_i64 vd, TCGv_i64 vn, TCGv_i64 vm, TCGv_ptr fpst)
+{
+ /* Note that order of inputs to the add matters for NaNs */
+ TCGv_i64 tmp = tcg_temp_new_i64();
+
+ gen_helper_vfp_muld(tmp, vn, vm, fpst);
+ gen_helper_vfp_addd(vd, vd, tmp, fpst);
+ tcg_temp_free_i64(tmp);
+}
+
+static bool trans_VMLA_dp(DisasContext *s, arg_VMLA_sp *a)
+{
+ return do_vfp_3op_dp(s, gen_VMLA_dp, a->vd, a->vn, a->vm, true);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
op = ((insn >> 20) & 8) | ((insn >> 19) & 6) | ((insn >> 6) & 1);
rn = VFP_SREG_N(insn);

+ switch (op) {
+ case 0:
+ /* Already handled by decodetree */
+ return 1;
+ default:
+ break;
+ }
+
if (op == 15) {
/* rn is opcode, encoded as per VFP_SREG_N. */
switch (rn) {
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
for (;;) {
/* Perform the calculation. */
switch (op) {
- case 0: /* VMLA: fd + (fn * fm) */
- /* Note that order of inputs to the add matters for NaNs */
- gen_vfp_F1_mul(dp);
- gen_mov_F0_vreg(dp, rd);
- gen_vfp_add(dp);
- break;
case 1: /* VMLS: fd + -(fn * fm) */
gen_vfp_mul(dp);
gen_vfp_F1_neg(dp);
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VLDM_VSTM_sp ---- 1101 0.1 l:1 rn:4 .... 1010 imm:8 \
vd=%vd_sp p=1 u=0 w=1
VLDM_VSTM_dp ---- 1101 0.1 l:1 rn:4 .... 1011 imm:8 \
vd=%vd_dp p=1 u=0 w=1
+
+# 3-register VFP data-processing; bits [23,21:20,6] identify the operation.
+VMLA_sp ---- 1110 0.00 .... .... 1010 .0.0 .... \
+ vm=%vm_sp vn=%vn_sp vd=%vd_sp
+VMLA_dp ---- 1110 0.00 .... .... 1011 .0.0 .... \
+ vm=%vm_dp vn=%vn_dp vd=%vd_dp
--
2.20.1


The ARMv8.2-TTS2UXN feature extends the XN field in stage 2
translation table descriptors from just bit [54] to bits [54:53],
allowing stage 2 to control execution permissions separately for EL0
and EL1. Implement the new semantics of the XN field and enable
the feature for our 'max' CPU.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200330210400.11724-5-peter.maydell@linaro.org
---
target/arm/cpu.h | 15 +++++++++++++++
target/arm/cpu.c | 1 +
target/arm/cpu64.c | 2 ++
target/arm/helper.c | 37 +++++++++++++++++++++++++++++++------
4 files changed, 49 insertions(+), 6 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_ccidx(const ARMISARegisters *id)
return FIELD_EX32(id->id_mmfr4, ID_MMFR4, CCIDX) != 0;
}

+static inline bool isar_feature_aa32_tts2uxn(const ARMISARegisters *id)
+{
+ return FIELD_EX32(id->id_mmfr4, ID_MMFR4, XNX) != 0;
+}
+
/*
* 64-bit feature tests via id registers.
*/
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_ccidx(const ARMISARegisters *id)
return FIELD_EX64(id->id_aa64mmfr2, ID_AA64MMFR2, CCIDX) != 0;
}

+static inline bool isar_feature_aa64_tts2uxn(const ARMISARegisters *id)
+{
+ return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, XNX) != 0;
+}
+
/*
* Feature tests for "does this exist in either 32-bit or 64-bit?"
*/
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_any_ccidx(const ARMISARegisters *id)
return isar_feature_aa64_ccidx(id) || isar_feature_aa32_ccidx(id);
}

+static inline bool isar_feature_any_tts2uxn(const ARMISARegisters *id)
+{
+ return isar_feature_aa64_tts2uxn(id) || isar_feature_aa32_tts2uxn(id);
+}
+
/*
* Forward to the above feature tests given an ARMCPU pointer.
*/
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_max_initfn(Object *obj)
t = FIELD_DP32(t, ID_MMFR4, HPDS, 1); /* AA32HPD */
t = FIELD_DP32(t, ID_MMFR4, AC2, 1); /* ACTLR2, HACTLR2 */
t = FIELD_DP32(t, ID_MMFR4, CNP, 1); /* TTCNP */
+ t = FIELD_DP32(t, ID_MMFR4, XNX, 1); /* TTS2UXN */
cpu->isar.id_mmfr4 = t;
}
#endif
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
t = FIELD_DP64(t, ID_AA64MMFR1, VH, 1);
t = FIELD_DP64(t, ID_AA64MMFR1, PAN, 2); /* ATS1E1 */
t = FIELD_DP64(t, ID_AA64MMFR1, VMIDBITS, 2); /* VMID16 */
+ t = FIELD_DP64(t, ID_AA64MMFR1, XNX, 1); /* TTS2UXN */
cpu->isar.id_aa64mmfr1 = t;

t = cpu->isar.id_aa64mmfr2;
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
u = FIELD_DP32(u, ID_MMFR4, HPDS, 1); /* AA32HPD */
u = FIELD_DP32(u, ID_MMFR4, AC2, 1); /* ACTLR2, HACTLR2 */
u = FIELD_DP32(u, ID_MMFR4, CNP, 1); /* TTCNP */
+ u = FIELD_DP32(u, ID_MMFR4, XNX, 1); /* TTS2UXN */
cpu->isar.id_mmfr4 = u;

u = cpu->isar.id_aa64dfr0;
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ simple_ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx, int ap)
*
* @env: CPUARMState
* @s2ap: The 2-bit stage2 access permissions (S2AP)
- * @xn: XN (execute-never) bit
+ * @xn: XN (execute-never) bits
+ * @s1_is_el0: true if this is S2 of an S1+2 walk for EL0
*/
-static int get_S2prot(CPUARMState *env, int s2ap, int xn)
+static int get_S2prot(CPUARMState *env, int s2ap, int xn, bool s1_is_el0)
{
int prot = 0;

@@ -XXX,XX +XXX,XX @@ static int get_S2prot(CPUARMState *env, int s2ap, int xn)
if (s2ap & 2) {
prot |= PAGE_WRITE;
}
- if (!xn) {
- if (arm_el_is_aa64(env, 2) || prot & PAGE_READ) {
+
+ if (cpu_isar_feature(any_tts2uxn, env_archcpu(env))) {
+ switch (xn) {
+ case 0:
prot |= PAGE_EXEC;
+ break;
+ case 1:
+ if (s1_is_el0) {
+ prot |= PAGE_EXEC;
+ }
+ break;
+ case 2:
+ break;
+ case 3:
+ if (!s1_is_el0) {
+ prot |= PAGE_EXEC;
+ }
+ break;
+ default:
+ g_assert_not_reached();
+ }
+ } else {
+ if (!extract32(xn, 1, 1)) {
+ if (arm_el_is_aa64(env, 2) || prot & PAGE_READ) {
+ prot |= PAGE_EXEC;
+ }
}
}
return prot;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, target_ulong address,
}

ap = extract32(attrs, 4, 2);
- xn = extract32(attrs, 12, 1);

if (mmu_idx == ARMMMUIdx_Stage2) {
ns = true;
- *prot = get_S2prot(env, ap, xn);
+ xn = extract32(attrs, 11, 2);
+ *prot = get_S2prot(env, ap, xn, s1_is_el0);
} else {
ns = extract32(attrs, 3, 1);
+ xn = extract32(attrs, 12, 1);
pxn = extract32(attrs, 11, 1);
*prot = get_S1prot(env, mmu_idx, aarch64, ap, ns, xn, pxn);
}
--
2.20.1

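The bank-and-stride stepping in do_vfp_3op_sp() can be tried out in isolation;
this standalone program reproduces the expression from the patch (including
the not-quite-correct next-register calculation that the commit message's
"Note 2" promises to fix later in the series):

    #include <stdio.h>

    int main(void)
    {
        unsigned vd = 16;              /* s16: inside the second bank */
        unsigned delta_d = 2;          /* decoded stride, as in the patch */
        unsigned bank_mask = 0x18;     /* single-precision banks of 8 regs */

        for (int veclen = 2; ; veclen--) {
            printf("s%u\n", vd);
            if (veclen == 0) {
                break;
            }
            vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
        }
        return 0;
    }

This visits s16, s18, s20: the destination cycles through its bank while a
scalar operand (delta_m == 0) stays fixed, which is exactly the mixed
scalar/vector case whose operand-trashing bug is described above.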
Convert the VFP round-to-integer instructions VRINTR, VRINTZ and
VRINTX to decodetree.

These instructions were only introduced as part of the "VFP misc"
additions in v8A, so we check this. The old decoder's implementation
was incorrectly providing them even for v7A CPUs.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/translate-vfp.inc.c | 163 +++++++++++++++++++++++++++++++++
target/arm/translate.c | 45 +--------
target/arm/vfp.decode | 15 +++
3 files changed, 179 insertions(+), 44 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f16_f64(DisasContext *s, arg_VCVT_f16_f64 *a)
tcg_temp_free_i32(tmp);
return true;
}
+
+static bool trans_VRINTR_sp(DisasContext *s, arg_VRINTR_sp *a)
+{
+ TCGv_ptr fpst;
+ TCGv_i32 tmp;
+
+ if (!dc_isar_feature(aa32_vrint, s)) {
+ return false;
+ }
+
+ if (!vfp_access_check(s)) {
+ return true;
+ }
+
+ tmp = tcg_temp_new_i32();
+ neon_load_reg32(tmp, a->vm);
+ fpst = get_fpstatus_ptr(false);
+ gen_helper_rints(tmp, tmp, fpst);
+ neon_store_reg32(tmp, a->vd);
+ tcg_temp_free_ptr(fpst);
+ tcg_temp_free_i32(tmp);
+ return true;
+}
+
+static bool trans_VRINTR_dp(DisasContext *s, arg_VRINTR_sp *a)
+{
+ TCGv_ptr fpst;
+ TCGv_i64 tmp;
+
+ if (!dc_isar_feature(aa32_vrint, s)) {
+ return false;
+ }
+
+ /* UNDEF accesses to D16-D31 if they don't exist. */
+ if (!dc_isar_feature(aa32_fp_d32, s) && ((a->vd | a->vm) & 0x10)) {
+ return false;
+ }
+
+ if (!vfp_access_check(s)) {
+ return true;
+ }
+
+ tmp = tcg_temp_new_i64();
+ neon_load_reg64(tmp, a->vm);
+ fpst = get_fpstatus_ptr(false);
+ gen_helper_rintd(tmp, tmp, fpst);
+ neon_store_reg64(tmp, a->vd);
+ tcg_temp_free_ptr(fpst);
+ tcg_temp_free_i64(tmp);
+ return true;
+}
+
+static bool trans_VRINTZ_sp(DisasContext *s, arg_VRINTZ_sp *a)
+{
+ TCGv_ptr fpst;
+ TCGv_i32 tmp;
+ TCGv_i32 tcg_rmode;
+
+ if (!dc_isar_feature(aa32_vrint, s)) {
+ return false;
+ }
+
+ if (!vfp_access_check(s)) {
+ return true;
+ }
+
+ tmp = tcg_temp_new_i32();
+ neon_load_reg32(tmp, a->vm);
+ fpst = get_fpstatus_ptr(false);
+ tcg_rmode = tcg_const_i32(float_round_to_zero);
+ gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
+ gen_helper_rints(tmp, tmp, fpst);
+ gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
+ neon_store_reg32(tmp, a->vd);
+ tcg_temp_free_ptr(fpst);
+ tcg_temp_free_i32(tcg_rmode);
+ tcg_temp_free_i32(tmp);
+ return true;
+}
+
+static bool trans_VRINTZ_dp(DisasContext *s, arg_VRINTZ_sp *a)
+{
+ TCGv_ptr fpst;
+ TCGv_i64 tmp;
+ TCGv_i32 tcg_rmode;
+
+ if (!dc_isar_feature(aa32_vrint, s)) {
+ return false;
+ }
+
+ /* UNDEF accesses to D16-D31 if they don't exist. */
+ if (!dc_isar_feature(aa32_fp_d32, s) && ((a->vd | a->vm) & 0x10)) {
+ return false;
+ }
+
+ if (!vfp_access_check(s)) {
+ return true;
+ }
+
+ tmp = tcg_temp_new_i64();
+ neon_load_reg64(tmp, a->vm);
+ fpst = get_fpstatus_ptr(false);
+ tcg_rmode = tcg_const_i32(float_round_to_zero);
+ gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
+ gen_helper_rintd(tmp, tmp, fpst);
+ gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
+ neon_store_reg64(tmp, a->vd);
+ tcg_temp_free_ptr(fpst);
+ tcg_temp_free_i64(tmp);
+ tcg_temp_free_i32(tcg_rmode);
+ return true;
+}
+
+static bool trans_VRINTX_sp(DisasContext *s, arg_VRINTX_sp *a)
+{
+ TCGv_ptr fpst;
+ TCGv_i32 tmp;
+
+ if (!dc_isar_feature(aa32_vrint, s)) {
+ return false;
+ }
+
+ if (!vfp_access_check(s)) {
+ return true;
+ }
+
+ tmp = tcg_temp_new_i32();
+ neon_load_reg32(tmp, a->vm);
+ fpst = get_fpstatus_ptr(false);
+ gen_helper_rints_exact(tmp, tmp, fpst);
+ neon_store_reg32(tmp, a->vd);
+ tcg_temp_free_ptr(fpst);
+ tcg_temp_free_i32(tmp);
+ return true;
+}
+
+static bool trans_VRINTX_dp(DisasContext *s, arg_VRINTX_dp *a)
+{
+ TCGv_ptr fpst;
+ TCGv_i64 tmp;
+
+ if (!dc_isar_feature(aa32_vrint, s)) {
+ return false;
+ }
+
+ /* UNDEF accesses to D16-D31 if they don't exist. */
+ if (!dc_isar_feature(aa32_fp_d32, s) && ((a->vd | a->vm) & 0x10)) {
+ return false;
+ }
+
+ if (!vfp_access_check(s)) {
+ return true;
+ }
+
+ tmp = tcg_temp_new_i64();
+ neon_load_reg64(tmp, a->vm);
+ fpst = get_fpstatus_ptr(false);
+ gen_helper_rintd_exact(tmp, tmp, fpst);
+ neon_store_reg64(tmp, a->vd);
+ tcg_temp_free_ptr(fpst);
+ tcg_temp_free_i64(tmp);
+ return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
return 1;
case 15:
switch (rn) {
- case 0 ... 11:
+ case 0 ... 14:
/* Already handled by decodetree */
return 1;
default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
if (op == 15) {
/* rn is opcode, encoded as per VFP_SREG_N. */
switch (rn) {
- case 0x0c: /* vrintr */
- case 0x0d: /* vrintz */
- case 0x0e: /* vrintx */
- break;
-
case 0x0f: /* vcvt double<->single */
rd_is_dp = !dp;
break;
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
switch (op) {
case 15: /* extension space */
switch (rn) {
- case 12: /* vrintr */
- {
- TCGv_ptr fpst = get_fpstatus_ptr(0);
- if (dp) {
- gen_helper_rintd(cpu_F0d, cpu_F0d, fpst);
- } else {
- gen_helper_rints(cpu_F0s, cpu_F0s, fpst);
- }
- tcg_temp_free_ptr(fpst);
- break;
- }
- case 13: /* vrintz */
- {
- TCGv_ptr fpst = get_fpstatus_ptr(0);
- TCGv_i32 tcg_rmode;
- tcg_rmode = tcg_const_i32(float_round_to_zero);
- gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
- if (dp) {
- gen_helper_rintd(cpu_F0d, cpu_F0d, fpst);
- } else {
- gen_helper_rints(cpu_F0s, cpu_F0s, fpst);
- }
- gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
- tcg_temp_free_i32(tcg_rmode);
- tcg_temp_free_ptr(fpst);
- break;
- }
- case 14: /* vrintx */
- {
- TCGv_ptr fpst = get_fpstatus_ptr(0);
- if (dp) {
- gen_helper_rintd_exact(cpu_F0d, cpu_F0d, fpst);
- } else {
- gen_helper_rints_exact(cpu_F0s, cpu_F0s, fpst);
- }
- tcg_temp_free_ptr(fpst);
- break;
- }
case 15: /* single<->double conversion */
if (dp) {
gen_helper_vfp_fcvtsd(cpu_F0s, cpu_F0d, cpu_env);
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VCVT_f16_f32 ---- 1110 1.11 0011 .... 1010 t:1 1.0 .... \
vd=%vd_sp vm=%vm_sp
VCVT_f16_f64 ---- 1110 1.11 0011 .... 1011 t:1 1.0 .... \
vd=%vd_sp vm=%vm_dp
+
+VRINTR_sp ---- 1110 1.11 0110 .... 1010 01.0 .... \
+ vd=%vd_sp vm=%vm_sp
+VRINTR_dp ---- 1110 1.11 0110 .... 1011 01.0 .... \
+ vd=%vd_dp vm=%vm_dp
+
+VRINTZ_sp ---- 1110 1.11 0110 .... 1010 11.0 .... \
+ vd=%vd_sp vm=%vm_sp
+VRINTZ_dp ---- 1110 1.11 0110 .... 1011 11.0 .... \
+ vd=%vd_dp vm=%vm_dp
+
+VRINTX_sp ---- 1110 1.11 0111 .... 1010 01.0 .... \
+ vd=%vd_sp vm=%vm_sp
+VRINTX_dp ---- 1110 1.11 0111 .... 1011 01.0 .... \
+ vd=%vd_dp vm=%vm_dp
--
2.20.1


In aarch64_max_initfn() we update both 32-bit and 64-bit ID
registers. The intended pattern is that for 64-bit ID registers we
use FIELD_DP64 and the uint64_t 't' register, while 32-bit ID
registers use FIELD_DP32 and the uint32_t 'u' register. For
ID_AA64DFR0 we accidentally used 'u', meaning that the top 32 bits of
this 64-bit ID register would end up always zero. Luckily at the
moment that's what they should be anyway, so this bug has no visible
effects.

Use the right-sized variable.

Fixes: 3bec78447a958d481991
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20200423110915.10527-1-peter.maydell@linaro.org
---
target/arm/cpu64.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
u = FIELD_DP32(u, ID_MMFR4, XNX, 1); /* TTS2UXN */
cpu->isar.id_mmfr4 = u;

- u = cpu->isar.id_aa64dfr0;
- u = FIELD_DP64(u, ID_AA64DFR0, PMUVER, 5); /* v8.4-PMU */
- cpu->isar.id_aa64dfr0 = u;
+ t = cpu->isar.id_aa64dfr0;
+ t = FIELD_DP64(t, ID_AA64DFR0, PMUVER, 5); /* v8.4-PMU */
+ cpu->isar.id_aa64dfr0 = t;

u = cpu->isar.id_dfr0;
u = FIELD_DP32(u, ID_DFR0, PERFMON, 5); /* v8.4-PMU */
--
2.20.1

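The t/u mixup is worth a standalone illustration, since without -Wconversion
the compiler accepts it silently:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t id = (uint64_t)5 << 36;   /* some field above bit 31 */

        uint32_t u = id;   /* wrong-sized temporary: the field is lost */
        uint64_t t = id;   /* right-sized temporary */

        printf("via u: %#llx, via t: %#llx\n",
               (unsigned long long)u, (unsigned long long)t);
        return 0;
    }

PMUVER itself lives in the low half of ID_AA64DFR0, which is why the bug
happened to be harmless; the example uses a high field to show what would go
wrong once one is set.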
At the moment our -cpu max for AArch32 supports VFP short-vectors
because we always implement them, even for CPUs which should
not have them. The following commits are going to switch to
using the correct ID-register-check to enable or disable short
vector support, so we need to turn it on explicitly for -cpu max,
because Cortex-A15 doesn't implement it.

We don't enable this for the AArch64 -cpu max, because the v8A
architecture never supports short-vectors.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/cpu.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_max_initfn(Object *obj)
kvm_arm_set_cpu_features_from_host(cpu);
} else {
cortex_a15_initfn(obj);
+
+ /* old-style VFP short-vector support */
+ cpu->isar.mvfr0 = FIELD_DP32(cpu->isar.mvfr0, MVFR0, FPSHVEC, 1);
+
#ifdef CONFIG_USER_ONLY
/* We don't set these in system emulation mode for the moment,
* since we don't correctly set (all of) the ID registers to
--
2.20.1


From: Philippe Mathieu-Daudé <f4bug@amsat.org>

MIDR_EL1 is a 64-bit system register with the top 32-bit being RES0.
Represent it in QEMU's ARMCPU struct with a uint64_t, not a
uint32_t.

This fixes an error when compiling with -Werror=conversion
because we were manipulating the register value using a
local uint64_t variable:

  target/arm/cpu64.c: In function ‘aarch64_max_initfn’:
  target/arm/cpu64.c:628:21: error: conversion from ‘uint64_t’ {aka ‘long unsigned int’} to ‘uint32_t’ {aka ‘unsigned int’} may change value [-Werror=conversion]
  628 | cpu->midr = t;
  | ^

and future-proofs us against a possible future architecture
change using some of the top 32 bits.

Suggested-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Message-id: 20200428172634.29707-1-f4bug@amsat.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target/arm/cpu.h | 2 +-
target/arm/cpu.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ struct ARMCPU {
uint64_t id_aa64dfr0;
uint64_t id_aa64dfr1;
} isar;
- uint32_t midr;
+ uint64_t midr;
uint32_t revidr;
uint32_t reset_fpsid;
uint32_t ctr;
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static const ARMCPUInfo arm_cpus[] = {
static Property arm_cpu_properties[] = {
DEFINE_PROP_BOOL("start-powered-off", ARMCPU, start_powered_off, false),
DEFINE_PROP_UINT32("psci-conduit", ARMCPU, psci_conduit, 0),
- DEFINE_PROP_UINT32("midr", ARMCPU, midr, 0),
+ DEFINE_PROP_UINT64("midr", ARMCPU, midr, 0),
DEFINE_PROP_UINT64("mp-affinity", ARMCPU,
mp_affinity, ARM64_AFFINITY_INVALID),
DEFINE_PROP_INT32("node-id", ARMCPU, node_id, CPU_UNSET_NUMA_NODE_ID),
--
2.20.1

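The one-liner in the first patch above is doing a field deposit; here is a
generic standalone sketch of the deposit/extract pair (plain C, not the
registerfields.h macros, and the FPShVec position is an illustrative
assumption):

    #include <stdint.h>
    #include <assert.h>

    static uint32_t dp32(uint32_t reg, unsigned pos, unsigned len, uint32_t val)
    {
        uint32_t mask = ((1u << len) - 1) << pos;
        return (reg & ~mask) | ((val << pos) & mask);
    }

    static uint32_t ex32(uint32_t reg, unsigned pos, unsigned len)
    {
        return (reg >> pos) & ((1u << len) - 1);
    }

    int main(void)
    {
        uint32_t mvfr0 = 0;
        mvfr0 = dp32(mvfr0, 24, 4, 1);    /* FPShVec = 1 (assumed at [27:24]) */
        assert(ex32(mvfr0, 24, 4) > 0);   /* what isar_feature_aa32_fpshvec tests */
        return 0;
    }

The following patches then make short-vector support purely a function of
this ID register field instead of being hard-coded on.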
1
Convert the VSQRT instruction to decodetree.
1
From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
2
2
3
Remove inclusion of arm_gicv3_common.h, this already gets
4
included via xlnx-versal.h.
5
6
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
7
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
8
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
9
Message-id: 20200427181649.26851-2-edgar.iglesias@gmail.com
3
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
10
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
4
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
5
---
11
---
6
target/arm/translate-vfp.inc.c | 20 ++++++++++++++++++++
12
hw/arm/xlnx-versal.c | 1 -
7
target/arm/translate.c | 14 +-------------
13
1 file changed, 1 deletion(-)
8
target/arm/vfp.decode | 5 +++++
9
3 files changed, 26 insertions(+), 13 deletions(-)
10
14
11
diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
15
diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
12
index XXXXXXX..XXXXXXX 100644
16
index XXXXXXX..XXXXXXX 100644
13
--- a/target/arm/translate-vfp.inc.c
17
--- a/hw/arm/xlnx-versal.c
14
+++ b/target/arm/translate-vfp.inc.c
18
+++ b/hw/arm/xlnx-versal.c
15
@@ -XXX,XX +XXX,XX @@ static bool trans_VNEG_dp(DisasContext *s, arg_VNEG_dp *a)
19
@@ -XXX,XX +XXX,XX @@
16
{
20
#include "hw/arm/boot.h"
17
return do_vfp_2op_dp(s, gen_helper_vfp_negd, a->vd, a->vm);
21
#include "kvm_arm.h"
18
}
22
#include "hw/misc/unimp.h"
19
+
23
-#include "hw/intc/arm_gicv3_common.h"
20
+static void gen_VSQRT_sp(TCGv_i32 vd, TCGv_i32 vm)
24
#include "hw/arm/xlnx-versal.h"
21
+{
25
#include "hw/char/pl011.h"
22
+ gen_helper_vfp_sqrts(vd, vm, cpu_env);
26
23
+}
24
+
25
+static bool trans_VSQRT_sp(DisasContext *s, arg_VSQRT_sp *a)
26
+{
27
+ return do_vfp_2op_sp(s, gen_VSQRT_sp, a->vd, a->vm);
28
+}
29
+
30
+static void gen_VSQRT_dp(TCGv_i64 vd, TCGv_i64 vm)
31
+{
32
+ gen_helper_vfp_sqrtd(vd, vm, cpu_env);
33
+}
34
+
35
+static bool trans_VSQRT_dp(DisasContext *s, arg_VSQRT_dp *a)
36
+{
37
+ return do_vfp_2op_dp(s, gen_VSQRT_dp, a->vd, a->vm);
38
+}
39
diff --git a/target/arm/translate.c b/target/arm/translate.c
40
index XXXXXXX..XXXXXXX 100644
41
--- a/target/arm/translate.c
42
+++ b/target/arm/translate.c
43
@@ -XXX,XX +XXX,XX @@ static inline void gen_vfp_neg(int dp)
44
gen_helper_vfp_negs(cpu_F0s, cpu_F0s);
45
}
46
47
-static inline void gen_vfp_sqrt(int dp)
48
-{
49
- if (dp)
50
- gen_helper_vfp_sqrtd(cpu_F0d, cpu_F0d, cpu_env);
51
- else
52
- gen_helper_vfp_sqrts(cpu_F0s, cpu_F0s, cpu_env);
53
-}
54
-
55
static inline void gen_vfp_cmp(int dp)
56
{
57
if (dp)
58
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
59
return 1;
60
case 15:
61
switch (rn) {
62
- case 1 ... 2:
63
+ case 1 ... 3:
64
/* Already handled by decodetree */
65
return 1;
66
default:
67
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
68
/* rn is opcode, encoded as per VFP_SREG_N. */
69
switch (rn) {
70
case 0x00: /* vmov */
71
- case 0x03: /* vsqrt */
72
break;
73
74
case 0x04: /* vcvtb.f64.f16, vcvtb.f32.f16 */
75
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
76
case 0: /* cpy */
77
/* no-op */
78
break;
79
- case 3: /* sqrt */
80
- gen_vfp_sqrt(dp);
81
- break;
82
case 4: /* vcvtb.f32.f16, vcvtb.f64.f16 */
83
{
84
TCGv_ptr fpst = get_fpstatus_ptr(false);
85
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
86
index XXXXXXX..XXXXXXX 100644
87
--- a/target/arm/vfp.decode
88
+++ b/target/arm/vfp.decode
89
@@ -XXX,XX +XXX,XX @@ VNEG_sp ---- 1110 1.11 0001 .... 1010 01.0 .... \
90
vd=%vd_sp vm=%vm_sp
91
VNEG_dp ---- 1110 1.11 0001 .... 1011 01.0 .... \
92
vd=%vd_dp vm=%vm_dp
93
+
94
+VSQRT_sp ---- 1110 1.11 0001 .... 1010 11.0 .... \
95
+ vd=%vd_sp vm=%vm_sp
96
+VSQRT_dp ---- 1110 1.11 0001 .... 1011 11.0 .... \
97
+ vd=%vd_dp vm=%vm_dp
98
--
27
--
99
2.20.1
28
2.20.1
100
29
101
30
diff view generated by jsdifflib
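The gen_/trans_/do_ layering used for VSQRT generalises to every two-operand
VFP op; here is a standalone analogue of the design, with ordinary C floats
standing in for TCG values (the names are invented for illustration):

    #include <math.h>
    #include <stdio.h>

    typedef float (*vfp_2op_sp_fn)(float vm);

    static float op_sqrt(float vm) { return sqrtf(vm); }
    static float op_neg(float vm)  { return -vm; }

    /* in the real code this driver also does the access checks and
     * short-vector looping once, on behalf of every instruction */
    static void do_2op(vfp_2op_sp_fn fn, float *vd, float vm)
    {
        *vd = fn(vm);
    }

    int main(void)
    {
        float d;
        do_2op(op_sqrt, &d, 2.0f);
        printf("sqrt: %f\n", d);
        do_2op(op_neg, &d, 2.0f);
        printf("neg: %f\n", d);
        return 0;
    }

VSQRT needs the small gen_VSQRT_* wrappers only because its helper takes
cpu_env as an extra argument; ops whose helpers already match the callback
signature (like VNEG's) can be passed straight through.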
From: Richard Henderson <richard.henderson@linaro.org>

Typo comparing the sign of the field, twice, instead of also comparing
the mask of the field (which itself encodes both position and length).

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190604154225.26992-1-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
scripts/decodetree.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/decodetree.py b/scripts/decodetree.py
index XXXXXXX..XXXXXXX 100755
--- a/scripts/decodetree.py
+++ b/scripts/decodetree.py
@@ -XXX,XX +XXX,XX @@ class Field:
return '{0}(insn, {1}, {2})'.format(extr, self.pos, self.len)

def __eq__(self, other):
- return self.sign == other.sign and self.sign == other.sign
+ return self.sign == other.sign and self.mask == other.mask

def __ne__(self, other):
return not self.__eq__(other)
--
2.20.1


From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>

Move misplaced comment.

Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
Message-id: 20200427181649.26851-3-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
hw/arm/xlnx-versal.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/xlnx-versal.c
+++ b/hw/arm/xlnx-versal.c
@@ -XXX,XX +XXX,XX @@ static void versal_create_apu_cpus(Versal *s)

obj = object_new(XLNX_VERSAL_ACPU_TYPE);
if (!obj) {
- /* Secondary CPUs start in PSCI powered-down state */
error_report("Unable to create apu.cpu[%d] of type %s",
i, XLNX_VERSAL_ACPU_TYPE);
exit(EXIT_FAILURE);
@@ -XXX,XX +XXX,XX @@ static void versal_create_apu_cpus(Versal *s)
object_property_set_int(obj, s->cfg.psci_conduit,
"psci-conduit", &error_abort);
if (i) {
+ /* Secondary CPUs start in PSCI powered-down state */
object_property_set_bool(obj, true,
"start-powered-off", &error_abort);
}
--
2.20.1

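The decodetree typo is an instance of a classic self-comparison bug; in C the
same mistake looks like this (standalone illustration, not project code):

    #include <stdbool.h>
    #include <stdio.h>

    struct field { int sign; unsigned mask; };

    static bool eq_buggy(struct field a, struct field b)
    {
        return a.sign == b.sign && a.sign == b.sign;   /* mask never compared */
    }

    static bool eq_fixed(struct field a, struct field b)
    {
        return a.sign == b.sign && a.mask == b.mask;
    }

    int main(void)
    {
        struct field x = {0, 0x00f0}, y = {0, 0x0f00};
        printf("buggy: %d, fixed: %d\n", eq_buggy(x, y), eq_fixed(x, y));
        return 0;
    }

Static analysers usually flag comparisons of a variable with itself, though a
duplicated subexpression like this one is harder for them to catch.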
1
The SMMUv3 ID registers cover an area 0x30 bytes in size
1
From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
2
(12 registers, 4 bytes each). We were incorrectly decoding
3
only the first 0x20 bytes.
4
2
3
Fix typo xlnx-ve -> xlnx-versal.
4
5
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
6
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
7
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
8
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
9
Message-id: 20200427181649.26851-4-edgar.iglesias@gmail.com
5
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
10
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
6
Reviewed-by: Eric Auger <eric.auger@redhat.com>
7
Message-id: 20190524124829.2589-1-peter.maydell@linaro.org
8
---
11
---
9
hw/arm/smmuv3.c | 2 +-
12
hw/arm/xlnx-versal-virt.c | 2 +-
10
1 file changed, 1 insertion(+), 1 deletion(-)
13
1 file changed, 1 insertion(+), 1 deletion(-)
11
14
12
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
15
diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c
13
index XXXXXXX..XXXXXXX 100644
16
index XXXXXXX..XXXXXXX 100644
14
--- a/hw/arm/smmuv3.c
17
--- a/hw/arm/xlnx-versal-virt.c
15
+++ b/hw/arm/smmuv3.c
18
+++ b/hw/arm/xlnx-versal-virt.c
16
@@ -XXX,XX +XXX,XX @@ static MemTxResult smmu_readl(SMMUv3State *s, hwaddr offset,
19
@@ -XXX,XX +XXX,XX @@ static void versal_virt_init(MachineState *machine)
17
uint64_t *data, MemTxAttrs attrs)
20
psci_conduit = QEMU_PSCI_CONDUIT_SMC;
18
{
21
}
19
switch (offset) {
22
20
- case A_IDREGS ... A_IDREGS + 0x1f:
23
- sysbus_init_child_obj(OBJECT(machine), "xlnx-ve", &s->soc,
21
+ case A_IDREGS ... A_IDREGS + 0x2f:
24
+ sysbus_init_child_obj(OBJECT(machine), "xlnx-versal", &s->soc,
22
*data = smmuv3_idreg(offset - A_IDREGS);
25
sizeof(s->soc), TYPE_XLNX_VERSAL);
23
return MEMTX_OK;
26
object_property_set_link(OBJECT(&s->soc), OBJECT(machine->ram),
24
case A_IDR0 ... A_IDR5:
27
"ddr", &error_abort);
25
--
28
--
26
2.20.1
29
2.20.1
27
30
28
31
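To make the SMMUv3 range arithmetic above concrete, a standalone sketch (the A_IDREGS value is assumed here purely for illustration; only the 12-register span matters):

    #define A_IDREGS    0xfd0   /* assumed base, for illustration only */
    #define NUM_IDREGS  12

    /* Twelve 32-bit registers cover byte offsets
     * [A_IDREGS, A_IDREGS + 12 * 4 - 1], i.e. up to base + 0x2f, so a
     * case range ending at base + 0x1f missed the last four registers. */
    static int idreg_index(unsigned int offset)
    {
        return (offset - A_IDREGS) / 4;    /* 0 .. 11 */
    }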
1
Convert the VNEG instruction to decodetree.
1
From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
2
2
3
Embed the UARTs into the SoC type.
4
5
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
6
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
7
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
8
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
9
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
10
Message-id: 20200427181649.26851-5-edgar.iglesias@gmail.com
3
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
11
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
4
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
5
---
12
---
6
target/arm/translate-vfp.inc.c | 10 ++++++++++
13
include/hw/arm/xlnx-versal.h | 3 ++-
7
target/arm/translate.c | 6 +-----
14
hw/arm/xlnx-versal.c | 12 ++++++------
8
target/arm/vfp.decode | 5 +++++
15
2 files changed, 8 insertions(+), 7 deletions(-)
9
3 files changed, 16 insertions(+), 5 deletions(-)
10
16
11
diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
17
diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
12
index XXXXXXX..XXXXXXX 100644
18
index XXXXXXX..XXXXXXX 100644
13
--- a/target/arm/translate-vfp.inc.c
19
--- a/include/hw/arm/xlnx-versal.h
14
+++ b/target/arm/translate-vfp.inc.c
20
+++ b/include/hw/arm/xlnx-versal.h
15
@@ -XXX,XX +XXX,XX @@ static bool trans_VABS_dp(DisasContext *s, arg_VABS_dp *a)
21
@@ -XXX,XX +XXX,XX @@
16
{
22
#include "hw/sysbus.h"
17
return do_vfp_2op_dp(s, gen_helper_vfp_absd, a->vd, a->vm);
23
#include "hw/arm/boot.h"
24
#include "hw/intc/arm_gicv3.h"
25
+#include "hw/char/pl011.h"
26
27
#define TYPE_XLNX_VERSAL "xlnx-versal"
28
#define XLNX_VERSAL(obj) OBJECT_CHECK(Versal, (obj), TYPE_XLNX_VERSAL)
29
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
30
MemoryRegion mr_ocm;
31
32
struct {
33
- SysBusDevice *uart[XLNX_VERSAL_NR_UARTS];
34
+ PL011State uart[XLNX_VERSAL_NR_UARTS];
35
SysBusDevice *gem[XLNX_VERSAL_NR_GEMS];
36
SysBusDevice *adma[XLNX_VERSAL_NR_ADMAS];
37
} iou;
38
diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
39
index XXXXXXX..XXXXXXX 100644
40
--- a/hw/arm/xlnx-versal.c
41
+++ b/hw/arm/xlnx-versal.c
42
@@ -XXX,XX +XXX,XX @@
43
#include "kvm_arm.h"
44
#include "hw/misc/unimp.h"
45
#include "hw/arm/xlnx-versal.h"
46
-#include "hw/char/pl011.h"
47
48
#define XLNX_VERSAL_ACPU_TYPE ARM_CPU_TYPE_NAME("cortex-a72")
49
#define GEM_REVISION 0x40070106
50
@@ -XXX,XX +XXX,XX @@ static void versal_create_uarts(Versal *s, qemu_irq *pic)
51
DeviceState *dev;
52
MemoryRegion *mr;
53
54
- dev = qdev_create(NULL, TYPE_PL011);
55
- s->lpd.iou.uart[i] = SYS_BUS_DEVICE(dev);
56
+ sysbus_init_child_obj(OBJECT(s), name,
57
+ &s->lpd.iou.uart[i], sizeof(s->lpd.iou.uart[i]),
58
+ TYPE_PL011);
59
+ dev = DEVICE(&s->lpd.iou.uart[i]);
60
qdev_prop_set_chr(dev, "chardev", serial_hd(i));
61
- object_property_add_child(OBJECT(s), name, OBJECT(dev), &error_fatal);
62
qdev_init_nofail(dev);
63
64
- mr = sysbus_mmio_get_region(s->lpd.iou.uart[i], 0);
65
+ mr = sysbus_mmio_get_region(SYS_BUS_DEVICE(dev), 0);
66
memory_region_add_subregion(&s->mr_ps, addrs[i], mr);
67
68
- sysbus_connect_irq(s->lpd.iou.uart[i], 0, pic[irqs[i]]);
69
+ sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0, pic[irqs[i]]);
70
g_free(name);
71
}
18
}
72
}
19
+
20
+static bool trans_VNEG_sp(DisasContext *s, arg_VNEG_sp *a)
21
+{
22
+ return do_vfp_2op_sp(s, gen_helper_vfp_negs, a->vd, a->vm);
23
+}
24
+
25
+static bool trans_VNEG_dp(DisasContext *s, arg_VNEG_dp *a)
26
+{
27
+ return do_vfp_2op_dp(s, gen_helper_vfp_negd, a->vd, a->vm);
28
+}
29
diff --git a/target/arm/translate.c b/target/arm/translate.c
30
index XXXXXXX..XXXXXXX 100644
31
--- a/target/arm/translate.c
32
+++ b/target/arm/translate.c
33
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
34
return 1;
35
case 15:
36
switch (rn) {
37
- case 1:
38
+ case 1 ... 2:
39
/* Already handled by decodetree */
40
return 1;
41
default:
42
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
43
/* rn is opcode, encoded as per VFP_SREG_N. */
44
switch (rn) {
45
case 0x00: /* vmov */
46
- case 0x02: /* vneg */
47
case 0x03: /* vsqrt */
48
break;
49
50
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
51
case 0: /* cpy */
52
/* no-op */
53
break;
54
- case 2: /* neg */
55
- gen_vfp_neg(dp);
56
- break;
57
case 3: /* sqrt */
58
gen_vfp_sqrt(dp);
59
break;
60
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
61
index XXXXXXX..XXXXXXX 100644
62
--- a/target/arm/vfp.decode
63
+++ b/target/arm/vfp.decode
64
@@ -XXX,XX +XXX,XX @@ VABS_sp ---- 1110 1.11 0000 .... 1010 11.0 .... \
65
vd=%vd_sp vm=%vm_sp
66
VABS_dp ---- 1110 1.11 0000 .... 1011 11.0 .... \
67
vd=%vd_dp vm=%vm_dp
68
+
69
+VNEG_sp ---- 1110 1.11 0001 .... 1010 01.0 .... \
70
+ vd=%vd_sp vm=%vm_sp
71
+VNEG_dp ---- 1110 1.11 0001 .... 1011 01.0 .... \
72
+ vd=%vd_dp vm=%vm_dp
73
--
73
--
74
2.20.1
74
2.20.1
75
75
76
76
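The UART change above is the first of several applications of the same embed-in-parent idiom in this series (the GEMs, ADMAs and APUs follow). A minimal sketch of the pattern, using the QOM APIs of this era; MySoC and my_soc_init() are placeholders, not code from the patch:

    #include "hw/sysbus.h"
    #include "hw/char/pl011.h"

    typedef struct MySoC {
        SysBusDevice parent_obj;
        PL011State uart;              /* embedded struct, not a pointer */
    } MySoC;

    static void my_soc_init(MySoC *s)
    {
        /* Initializes the child in place: no heap allocation, and the
         * child's lifetime is tied to the SoC object containing it. */
        sysbus_init_child_obj(OBJECT(s), "uart", &s->uart, sizeof(s->uart),
                              TYPE_PL011);
        qdev_init_nofail(DEVICE(&s->uart));
    }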
1
Convert the "double-precision" register moves to decodetree:
1
From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
2
this covers VMOV scalar-to-gpreg, VMOV gpreg-to-scalar and VDUP.
3
2
4
Note that the conversion process has tightened up a few of the
3
Embed the GEMs into the SoC type.
5
UNDEF encoding checks: we now correctly forbid:
6
* VMOV-to-gpr with U:opc1:opc2 == 10x00 or x0x10
7
* VMOV-from-gpr with opc1:opc2 == 0x10
8
* VDUP with B:E == 11
9
* VDUP with Q == 1 and Vn<0> == 1
10
4
5
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
6
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
7
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
8
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
9
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
10
Message-id: 20200427181649.26851-6-edgar.iglesias@gmail.com
11
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
11
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
12
---
12
---
13
The accesses of elements < 32 bits could be improved by doing
13
include/hw/arm/xlnx-versal.h | 3 ++-
14
direct ld/st of the right size rather than 32-bit read-and-shift
14
hw/arm/xlnx-versal.c | 15 ++++++++-------
15
or read-modify-write, but we leave this for later cleanup,
15
2 files changed, 10 insertions(+), 8 deletions(-)
16
since this series is generally trying to stick to fixing
17
the decode.
18
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
19
---
20
target/arm/translate-vfp.inc.c | 147 +++++++++++++++++++++++++++++++++
21
target/arm/translate.c | 83 +------------------
22
target/arm/vfp.decode | 36 ++++++++
23
3 files changed, 185 insertions(+), 81 deletions(-)
24
16
25
diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
17
diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
26
index XXXXXXX..XXXXXXX 100644
18
index XXXXXXX..XXXXXXX 100644
27
--- a/target/arm/translate-vfp.inc.c
19
--- a/include/hw/arm/xlnx-versal.h
28
+++ b/target/arm/translate-vfp.inc.c
20
+++ b/include/hw/arm/xlnx-versal.h
29
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
21
@@ -XXX,XX +XXX,XX @@
30
22
#include "hw/arm/boot.h"
31
return true;
23
#include "hw/intc/arm_gicv3.h"
24
#include "hw/char/pl011.h"
25
+#include "hw/net/cadence_gem.h"
26
27
#define TYPE_XLNX_VERSAL "xlnx-versal"
28
#define XLNX_VERSAL(obj) OBJECT_CHECK(Versal, (obj), TYPE_XLNX_VERSAL)
29
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
30
31
struct {
32
PL011State uart[XLNX_VERSAL_NR_UARTS];
33
- SysBusDevice *gem[XLNX_VERSAL_NR_GEMS];
34
+ CadenceGEMState gem[XLNX_VERSAL_NR_GEMS];
35
SysBusDevice *adma[XLNX_VERSAL_NR_ADMAS];
36
} iou;
37
} lpd;
38
diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
39
index XXXXXXX..XXXXXXX 100644
40
--- a/hw/arm/xlnx-versal.c
41
+++ b/hw/arm/xlnx-versal.c
42
@@ -XXX,XX +XXX,XX @@ static void versal_create_gems(Versal *s, qemu_irq *pic)
43
DeviceState *dev;
44
MemoryRegion *mr;
45
46
- dev = qdev_create(NULL, "cadence_gem");
47
- s->lpd.iou.gem[i] = SYS_BUS_DEVICE(dev);
48
- object_property_add_child(OBJECT(s), name, OBJECT(dev), &error_fatal);
49
+ sysbus_init_child_obj(OBJECT(s), name,
50
+ &s->lpd.iou.gem[i], sizeof(s->lpd.iou.gem[i]),
51
+ TYPE_CADENCE_GEM);
52
+ dev = DEVICE(&s->lpd.iou.gem[i]);
53
if (nd->used) {
54
qemu_check_nic_model(nd, "cadence_gem");
55
qdev_set_nic_properties(dev, nd);
56
}
57
- object_property_set_int(OBJECT(s->lpd.iou.gem[i]),
58
+ object_property_set_int(OBJECT(dev),
59
2, "num-priority-queues",
60
&error_abort);
61
- object_property_set_link(OBJECT(s->lpd.iou.gem[i]),
62
+ object_property_set_link(OBJECT(dev),
63
OBJECT(&s->mr_ps), "dma",
64
&error_abort);
65
qdev_init_nofail(dev);
66
67
- mr = sysbus_mmio_get_region(s->lpd.iou.gem[i], 0);
68
+ mr = sysbus_mmio_get_region(SYS_BUS_DEVICE(dev), 0);
69
memory_region_add_subregion(&s->mr_ps, addrs[i], mr);
70
71
- sysbus_connect_irq(s->lpd.iou.gem[i], 0, pic[irqs[i]]);
72
+ sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0, pic[irqs[i]]);
73
g_free(name);
74
}
32
}
75
}
33
+
34
+static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
35
+{
36
+ /* VMOV scalar to general purpose register */
37
+ TCGv_i32 tmp;
38
+ int pass;
39
+ uint32_t offset;
40
+
41
+ /* UNDEF accesses to D16-D31 if they don't exist */
42
+ if (!dc_isar_feature(aa32_fp_d32, s) && (a->vn & 0x10)) {
43
+ return false;
44
+ }
45
+
46
+ offset = a->index << a->size;
47
+ pass = extract32(offset, 2, 1);
48
+ offset = extract32(offset, 0, 2) * 8;
49
+
50
+ if (a->size != 2 && !arm_dc_feature(s, ARM_FEATURE_NEON)) {
51
+ return false;
52
+ }
53
+
54
+ if (!vfp_access_check(s)) {
55
+ return true;
56
+ }
57
+
58
+ tmp = neon_load_reg(a->vn, pass);
59
+ switch (a->size) {
60
+ case 0:
61
+ if (offset) {
62
+ tcg_gen_shri_i32(tmp, tmp, offset);
63
+ }
64
+ if (a->u) {
65
+ gen_uxtb(tmp);
66
+ } else {
67
+ gen_sxtb(tmp);
68
+ }
69
+ break;
70
+ case 1:
71
+ if (a->u) {
72
+ if (offset) {
73
+ tcg_gen_shri_i32(tmp, tmp, 16);
74
+ } else {
75
+ gen_uxth(tmp);
76
+ }
77
+ } else {
78
+ if (offset) {
79
+ tcg_gen_sari_i32(tmp, tmp, 16);
80
+ } else {
81
+ gen_sxth(tmp);
82
+ }
83
+ }
84
+ break;
85
+ case 2:
86
+ break;
87
+ }
88
+ store_reg(s, a->rt, tmp);
89
+
90
+ return true;
91
+}
92
+
93
+static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
94
+{
95
+ /* VMOV general purpose register to scalar */
96
+ TCGv_i32 tmp, tmp2;
97
+ int pass;
98
+ uint32_t offset;
99
+
100
+ /* UNDEF accesses to D16-D31 if they don't exist */
101
+ if (!dc_isar_feature(aa32_fp_d32, s) && (a->vn & 0x10)) {
102
+ return false;
103
+ }
104
+
105
+ offset = a->index << a->size;
106
+ pass = extract32(offset, 2, 1);
107
+ offset = extract32(offset, 0, 2) * 8;
108
+
109
+ if (a->size != 2 && !arm_dc_feature(s, ARM_FEATURE_NEON)) {
110
+ return false;
111
+ }
112
+
113
+ if (!vfp_access_check(s)) {
114
+ return true;
115
+ }
116
+
117
+ tmp = load_reg(s, a->rt);
118
+ switch (a->size) {
119
+ case 0:
120
+ tmp2 = neon_load_reg(a->vn, pass);
121
+ tcg_gen_deposit_i32(tmp, tmp2, tmp, offset, 8);
122
+ tcg_temp_free_i32(tmp2);
123
+ break;
124
+ case 1:
125
+ tmp2 = neon_load_reg(a->vn, pass);
126
+ tcg_gen_deposit_i32(tmp, tmp2, tmp, offset, 16);
127
+ tcg_temp_free_i32(tmp2);
128
+ break;
129
+ case 2:
130
+ break;
131
+ }
132
+ neon_store_reg(a->vn, pass, tmp);
133
+
134
+ return true;
135
+}
136
+
137
+static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
138
+{
139
+ /* VDUP (general purpose register) */
140
+ TCGv_i32 tmp;
141
+ int size, vec_size;
142
+
143
+ if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
144
+ return false;
145
+ }
146
+
147
+ /* UNDEF accesses to D16-D31 if they don't exist */
148
+ if (!dc_isar_feature(aa32_fp_d32, s) && (a->vn & 0x10)) {
149
+ return false;
150
+ }
151
+
152
+ if (a->b && a->e) {
153
+ return false;
154
+ }
155
+
156
+ if (a->q && (a->vn & 1)) {
157
+ return false;
158
+ }
159
+
160
+ vec_size = a->q ? 16 : 8;
161
+ if (a->b) {
162
+ size = 0;
163
+ } else if (a->e) {
164
+ size = 1;
165
+ } else {
166
+ size = 2;
167
+ }
168
+
169
+ if (!vfp_access_check(s)) {
170
+ return true;
171
+ }
172
+
173
+ tmp = load_reg(s, a->rt);
174
+ tcg_gen_gvec_dup_i32(size, neon_reg_offset(a->vn, 0),
175
+ vec_size, vec_size, tmp);
176
+ tcg_temp_free_i32(tmp);
177
+
178
+ return true;
179
+}
180
diff --git a/target/arm/translate.c b/target/arm/translate.c
181
index XXXXXXX..XXXXXXX 100644
182
--- a/target/arm/translate.c
183
+++ b/target/arm/translate.c
184
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
185
/* single register transfer */
186
rd = (insn >> 12) & 0xf;
187
if (dp) {
188
- int size;
189
- int pass;
190
-
191
- VFP_DREG_N(rn, insn);
192
- if (insn & 0xf)
193
- return 1;
194
- if (insn & 0x00c00060
195
- && !arm_dc_feature(s, ARM_FEATURE_NEON)) {
196
- return 1;
197
- }
198
-
199
- pass = (insn >> 21) & 1;
200
- if (insn & (1 << 22)) {
201
- size = 0;
202
- offset = ((insn >> 5) & 3) * 8;
203
- } else if (insn & (1 << 5)) {
204
- size = 1;
205
- offset = (insn & (1 << 6)) ? 16 : 0;
206
- } else {
207
- size = 2;
208
- offset = 0;
209
- }
210
- if (insn & ARM_CP_RW_BIT) {
211
- /* vfp->arm */
212
- tmp = neon_load_reg(rn, pass);
213
- switch (size) {
214
- case 0:
215
- if (offset)
216
- tcg_gen_shri_i32(tmp, tmp, offset);
217
- if (insn & (1 << 23))
218
- gen_uxtb(tmp);
219
- else
220
- gen_sxtb(tmp);
221
- break;
222
- case 1:
223
- if (insn & (1 << 23)) {
224
- if (offset) {
225
- tcg_gen_shri_i32(tmp, tmp, 16);
226
- } else {
227
- gen_uxth(tmp);
228
- }
229
- } else {
230
- if (offset) {
231
- tcg_gen_sari_i32(tmp, tmp, 16);
232
- } else {
233
- gen_sxth(tmp);
234
- }
235
- }
236
- break;
237
- case 2:
238
- break;
239
- }
240
- store_reg(s, rd, tmp);
241
- } else {
242
- /* arm->vfp */
243
- tmp = load_reg(s, rd);
244
- if (insn & (1 << 23)) {
245
- /* VDUP */
246
- int vec_size = pass ? 16 : 8;
247
- tcg_gen_gvec_dup_i32(size, neon_reg_offset(rn, 0),
248
- vec_size, vec_size, tmp);
249
- tcg_temp_free_i32(tmp);
250
- } else {
251
- /* VMOV */
252
- switch (size) {
253
- case 0:
254
- tmp2 = neon_load_reg(rn, pass);
255
- tcg_gen_deposit_i32(tmp, tmp2, tmp, offset, 8);
256
- tcg_temp_free_i32(tmp2);
257
- break;
258
- case 1:
259
- tmp2 = neon_load_reg(rn, pass);
260
- tcg_gen_deposit_i32(tmp, tmp2, tmp, offset, 16);
261
- tcg_temp_free_i32(tmp2);
262
- break;
263
- case 2:
264
- break;
265
- }
266
- neon_store_reg(rn, pass, tmp);
267
- }
268
- }
269
+ /* already handled by decodetree */
270
+ return 1;
271
} else { /* !dp */
272
bool is_sysreg;
273
274
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
275
index XXXXXXX..XXXXXXX 100644
276
--- a/target/arm/vfp.decode
277
+++ b/target/arm/vfp.decode
278
@@ -XXX,XX +XXX,XX @@
279
# 1110 1110 .... .... .... 101. .... ....
280
# (but those patterns might also cover some Neon instructions,
281
# which do not live in this file.)
282
+
283
+# VFP registers have an odd encoding with a four-bit field
284
+# and a one-bit field which are assembled in different orders
285
+# depending on whether the register is double or single precision.
286
+# Each individual instruction function must do the checks for
287
+# "double register selected but CPU does not have double support"
288
+# and "double register number has bit 4 set but CPU does not
289
+# support D16-D31" (which should UNDEF).
290
+%vm_dp 5:1 0:4
291
+%vm_sp 0:4 5:1
292
+%vn_dp 7:1 16:4
293
+%vn_sp 16:4 7:1
294
+%vd_dp 22:1 12:4
295
+%vd_sp 12:4 22:1
296
+
297
+%vmov_idx_b 21:1 5:2
298
+%vmov_idx_h 21:1 6:1
299
+
300
+# VMOV scalar to general-purpose register; note that this does
301
+# include some Neon cases.
302
+VMOV_to_gp ---- 1110 u:1 1. 1 .... rt:4 1011 ... 1 0000 \
303
+ vn=%vn_dp size=0 index=%vmov_idx_b
304
+VMOV_to_gp ---- 1110 u:1 0. 1 .... rt:4 1011 ..1 1 0000 \
305
+ vn=%vn_dp size=1 index=%vmov_idx_h
306
+VMOV_to_gp ---- 1110 0 0 index:1 1 .... rt:4 1011 .00 1 0000 \
307
+ vn=%vn_dp size=2 u=0
308
+
309
+VMOV_from_gp ---- 1110 0 1. 0 .... rt:4 1011 ... 1 0000 \
310
+ vn=%vn_dp size=0 index=%vmov_idx_b
311
+VMOV_from_gp ---- 1110 0 0. 0 .... rt:4 1011 ..1 1 0000 \
312
+ vn=%vn_dp size=1 index=%vmov_idx_h
313
+VMOV_from_gp ---- 1110 0 0 index:1 0 .... rt:4 1011 .00 1 0000 \
314
+ vn=%vn_dp size=2
315
+
316
+VDUP ---- 1110 1 b:1 q:1 0 .... rt:4 1011 . 0 e:1 1 0000 \
317
+ vn=%vn_dp
318
--
76
--
319
2.20.1
77
2.20.1
320
78
321
79
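A worked example of the scalar addressing used by trans_VMOV_to_gp() and trans_VMOV_from_gp() above, restated as a standalone sketch:

    /* A Neon scalar index is scaled to a byte offset, then split into a
     * 32-bit "pass" (which half of the D register) and a shift within
     * that word.  E.g. byte element 5 (size == 0): offset 5, pass 1,
     * shift 8, i.e. bits [15:8] of the second word. */
    static void scalar_coords(int index, int size, int *pass, int *shift)
    {
        int offset = index << size;   /* byte offset into Dn */

        *pass  = (offset >> 2) & 1;
        *shift = (offset & 3) * 8;
    }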
1
In commit 80376c3fc2c38fdd453 in 2010 we added a workaround for
1
From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
2
some qbus buses not being connected to qdev devices -- if the
3
bus has no parent object then we register a reset function which
4
resets the bus on system reset (and unregister it when the
5
bus is unparented).
6
2
7
Nearly a decade later, we now have no buses in the tree which
3
Embed the ADMAs into the SoC type.
8
are created with NULL parents, so we can remove the
9
workaround and instead just assert that if the bus has a NULL
10
parent then it is the main system bus.
11
4
12
(The absence of other parentless buses was confirmed by
5
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
13
code inspection of all the callsites of qbus_create() and
6
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
14
qbus_create_inplace() and cross-checked by 'make check'.)
7
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
8
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
9
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
10
Message-id: 20200427181649.26851-7-edgar.iglesias@gmail.com
11
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
12
---
13
include/hw/arm/xlnx-versal.h | 3 ++-
14
hw/arm/xlnx-versal.c | 14 +++++++-------
15
2 files changed, 9 insertions(+), 8 deletions(-)
15
16
16
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
17
diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
17
Reviewed-by: Markus Armbruster <armbru@redhat.com>
18
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
19
Reviewed-by: Damien Hedde <damien.hedde@greensocs.com>
20
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
21
Message-id: 20190523150543.22676-1-peter.maydell@linaro.org
22
---
23
hw/core/bus.c | 21 +++++++++------------
24
1 file changed, 9 insertions(+), 12 deletions(-)
25
26
diff --git a/hw/core/bus.c b/hw/core/bus.c
27
index XXXXXXX..XXXXXXX 100644
18
index XXXXXXX..XXXXXXX 100644
28
--- a/hw/core/bus.c
19
--- a/include/hw/arm/xlnx-versal.h
29
+++ b/hw/core/bus.c
20
+++ b/include/hw/arm/xlnx-versal.h
30
@@ -XXX,XX +XXX,XX @@ static void qbus_realize(BusState *bus, DeviceState *parent, const char *name)
21
@@ -XXX,XX +XXX,XX @@
31
bus->parent->num_child_bus++;
22
#include "hw/arm/boot.h"
32
object_property_add_child(OBJECT(bus->parent), bus->name, OBJECT(bus), NULL);
23
#include "hw/intc/arm_gicv3.h"
33
object_unref(OBJECT(bus));
24
#include "hw/char/pl011.h"
34
- } else if (bus != sysbus_get_default()) {
25
+#include "hw/dma/xlnx-zdma.h"
35
- /* TODO: once all bus devices are qdevified,
26
#include "hw/net/cadence_gem.h"
36
- only reset handler for main_system_bus should be registered here. */
27
37
- qemu_register_reset(qbus_reset_all_fn, bus);
28
#define TYPE_XLNX_VERSAL "xlnx-versal"
38
+ } else {
29
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
39
+ /* The only bus without a parent is the main system bus */
30
struct {
40
+ assert(bus == sysbus_get_default());
31
PL011State uart[XLNX_VERSAL_NR_UARTS];
32
CadenceGEMState gem[XLNX_VERSAL_NR_GEMS];
33
- SysBusDevice *adma[XLNX_VERSAL_NR_ADMAS];
34
+ XlnxZDMA adma[XLNX_VERSAL_NR_ADMAS];
35
} iou;
36
} lpd;
37
38
diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
39
index XXXXXXX..XXXXXXX 100644
40
--- a/hw/arm/xlnx-versal.c
41
+++ b/hw/arm/xlnx-versal.c
42
@@ -XXX,XX +XXX,XX @@ static void versal_create_admas(Versal *s, qemu_irq *pic)
43
DeviceState *dev;
44
MemoryRegion *mr;
45
46
- dev = qdev_create(NULL, "xlnx.zdma");
47
- s->lpd.iou.adma[i] = SYS_BUS_DEVICE(dev);
48
- object_property_set_int(OBJECT(s->lpd.iou.adma[i]), 128, "bus-width",
49
- &error_abort);
50
- object_property_add_child(OBJECT(s), name, OBJECT(dev), &error_fatal);
51
+ sysbus_init_child_obj(OBJECT(s), name,
52
+ &s->lpd.iou.adma[i], sizeof(s->lpd.iou.adma[i]),
53
+ TYPE_XLNX_ZDMA);
54
+ dev = DEVICE(&s->lpd.iou.adma[i]);
55
+ object_property_set_int(OBJECT(dev), 128, "bus-width", &error_abort);
56
qdev_init_nofail(dev);
57
58
- mr = sysbus_mmio_get_region(s->lpd.iou.adma[i], 0);
59
+ mr = sysbus_mmio_get_region(SYS_BUS_DEVICE(dev), 0);
60
memory_region_add_subregion(&s->mr_ps,
61
MM_ADMA_CH0 + i * MM_ADMA_CH0_SIZE, mr);
62
63
- sysbus_connect_irq(s->lpd.iou.adma[i], 0, pic[VERSAL_ADMA_IRQ_0 + i]);
64
+ sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0, pic[VERSAL_ADMA_IRQ_0 + i]);
65
g_free(name);
41
}
66
}
42
}
67
}
43
44
@@ -XXX,XX +XXX,XX @@ static void bus_unparent(Object *obj)
45
BusState *bus = BUS(obj);
46
BusChild *kid;
47
48
+ /* Only the main system bus has no parent, and that bus is never freed */
49
+ assert(bus->parent);
50
+
51
while ((kid = QTAILQ_FIRST(&bus->children)) != NULL) {
52
DeviceState *dev = kid->child;
53
object_unparent(OBJECT(dev));
54
}
55
- if (bus->parent) {
56
- QLIST_REMOVE(bus, sibling);
57
- bus->parent->num_child_bus--;
58
- bus->parent = NULL;
59
- } else {
60
- assert(bus != sysbus_get_default()); /* main_system_bus is never freed */
61
- qemu_unregister_reset(qbus_reset_all_fn, bus);
62
- }
63
+ QLIST_REMOVE(bus, sibling);
64
+ bus->parent->num_child_bus--;
65
+ bus->parent = NULL;
66
}
67
68
void qbus_create_inplace(void *bus, size_t size, const char *typename,
69
--
68
--
70
2.20.1
69
2.20.1
71
70
72
71
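The practical consequence of the bus.c change above for device code, sketched with a hypothetical bus type name:

    #include "hw/qdev-core.h"

    /* With the new assertion, any bus other than the main system bus
     * must be created with a parent device; a NULL parent would now
     * trip the assert at realize time. */
    static BusState *make_child_bus(DeviceState *parent_dev)
    {
        return qbus_create("my-bus-type" /* placeholder */, parent_dev,
                           "child-bus");
    }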
1
Convert the VRINTA/VRINTN/VRINTP/VRINTM instructions to decodetree.
1
From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
2
Again, trans_VRINT() is temporarily left in translate.c.
3
2
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
3
Embed the APUs into the SoC type.
4
5
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
6
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
7
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
8
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
9
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
10
Message-id: 20200427181649.26851-8-edgar.iglesias@gmail.com
11
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
---
12
---
6
target/arm/translate.c | 60 +++++++++++++++++++++++-------------
13
include/hw/arm/xlnx-versal.h | 2 +-
7
target/arm/vfp-uncond.decode | 5 +++
14
hw/arm/xlnx-versal-virt.c | 4 ++--
8
2 files changed, 43 insertions(+), 22 deletions(-)
15
hw/arm/xlnx-versal.c | 19 +++++--------------
16
3 files changed, 8 insertions(+), 17 deletions(-)
9
17
10
diff --git a/target/arm/translate.c b/target/arm/translate.c
18
diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
11
index XXXXXXX..XXXXXXX 100644
19
index XXXXXXX..XXXXXXX 100644
12
--- a/target/arm/translate.c
20
--- a/include/hw/arm/xlnx-versal.h
13
+++ b/target/arm/translate.c
21
+++ b/include/hw/arm/xlnx-versal.h
14
@@ -XXX,XX +XXX,XX @@ static bool trans_VMINMAXNM(DisasContext *s, arg_VMINMAXNM *a)
22
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
15
return true;
23
struct {
24
struct {
25
MemoryRegion mr;
26
- ARMCPU *cpu[XLNX_VERSAL_NR_ACPUS];
27
+ ARMCPU cpu[XLNX_VERSAL_NR_ACPUS];
28
GICv3State gic;
29
} apu;
30
} fpd;
31
diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c
32
index XXXXXXX..XXXXXXX 100644
33
--- a/hw/arm/xlnx-versal-virt.c
34
+++ b/hw/arm/xlnx-versal-virt.c
35
@@ -XXX,XX +XXX,XX @@ static void versal_virt_init(MachineState *machine)
36
s->binfo.get_dtb = versal_virt_get_dtb;
37
s->binfo.modify_dtb = versal_virt_modify_dtb;
38
if (machine->kernel_filename) {
39
- arm_load_kernel(s->soc.fpd.apu.cpu[0], machine, &s->binfo);
40
+ arm_load_kernel(&s->soc.fpd.apu.cpu[0], machine, &s->binfo);
41
} else {
42
- AddressSpace *as = arm_boot_address_space(s->soc.fpd.apu.cpu[0],
43
+ AddressSpace *as = arm_boot_address_space(&s->soc.fpd.apu.cpu[0],
44
&s->binfo);
45
/* Some boot-loaders (e.g u-boot) don't like blobs at address 0 (NULL).
46
* Offset things by 4K. */
47
diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
48
index XXXXXXX..XXXXXXX 100644
49
--- a/hw/arm/xlnx-versal.c
50
+++ b/hw/arm/xlnx-versal.c
51
@@ -XXX,XX +XXX,XX @@ static void versal_create_apu_cpus(Versal *s)
52
53
for (i = 0; i < ARRAY_SIZE(s->fpd.apu.cpu); i++) {
54
Object *obj;
55
- char *name;
56
-
57
- obj = object_new(XLNX_VERSAL_ACPU_TYPE);
58
- if (!obj) {
59
- error_report("Unable to create apu.cpu[%d] of type %s",
60
- i, XLNX_VERSAL_ACPU_TYPE);
61
- exit(EXIT_FAILURE);
62
- }
63
-
64
- name = g_strdup_printf("apu-cpu[%d]", i);
65
- object_property_add_child(OBJECT(s), name, obj, &error_fatal);
66
- g_free(name);
67
68
+ object_initialize_child(OBJECT(s), "apu-cpu[*]",
69
+ &s->fpd.apu.cpu[i], sizeof(s->fpd.apu.cpu[i]),
70
+ XLNX_VERSAL_ACPU_TYPE, &error_abort, NULL);
71
+ obj = OBJECT(&s->fpd.apu.cpu[i]);
72
object_property_set_int(obj, s->cfg.psci_conduit,
73
"psci-conduit", &error_abort);
74
if (i) {
75
@@ -XXX,XX +XXX,XX @@ static void versal_create_apu_cpus(Versal *s)
76
object_property_set_link(obj, OBJECT(&s->fpd.apu.mr), "memory",
77
&error_abort);
78
object_property_set_bool(obj, true, "realized", &error_fatal);
79
- s->fpd.apu.cpu[i] = ARM_CPU(obj);
80
}
16
}
81
}
17
82
18
-static int handle_vrint(uint32_t insn, uint32_t rd, uint32_t rm, uint32_t dp,
83
@@ -XXX,XX +XXX,XX @@ static void versal_create_apu_gic(Versal *s, qemu_irq *pic)
19
- int rounding)
20
+/*
21
+ * Table for converting the most common AArch32 encoding of
22
+ * rounding mode to arm_fprounding order (which matches the
23
+ * common AArch64 order); see ARM ARM pseudocode FPDecodeRM().
24
+ */
25
+static const uint8_t fp_decode_rm[] = {
26
+ FPROUNDING_TIEAWAY,
27
+ FPROUNDING_TIEEVEN,
28
+ FPROUNDING_POSINF,
29
+ FPROUNDING_NEGINF,
30
+};
31
+
32
+static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
33
{
34
- TCGv_ptr fpst = get_fpstatus_ptr(0);
35
+ uint32_t rd, rm;
36
+ bool dp = a->dp;
37
+ TCGv_ptr fpst;
38
TCGv_i32 tcg_rmode;
39
+ int rounding = fp_decode_rm[a->rm];
40
+
41
+ if (!dc_isar_feature(aa32_vrint, s)) {
42
+ return false;
43
+ }
44
+
45
+ /* UNDEF accesses to D16-D31 if they don't exist */
46
+ if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
47
+ ((a->vm | a->vd) & 0x10)) {
48
+ return false;
49
+ }
50
+ rd = a->vd;
51
+ rm = a->vm;
52
+
53
+ if (!vfp_access_check(s)) {
54
+ return true;
55
+ }
56
+
57
+ fpst = get_fpstatus_ptr(0);
58
59
tcg_rmode = tcg_const_i32(arm_rmode_to_sf(rounding));
60
gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
61
@@ -XXX,XX +XXX,XX @@ static int handle_vrint(uint32_t insn, uint32_t rd, uint32_t rm, uint32_t dp,
62
tcg_temp_free_i32(tcg_rmode);
63
64
tcg_temp_free_ptr(fpst);
65
- return 0;
66
+ return true;
67
}
68
69
static int handle_vcvt(uint32_t insn, uint32_t rd, uint32_t rm, uint32_t dp,
70
@@ -XXX,XX +XXX,XX @@ static int handle_vcvt(uint32_t insn, uint32_t rd, uint32_t rm, uint32_t dp,
71
return 0;
72
}
73
74
-/* Table for converting the most common AArch32 encoding of
75
- * rounding mode to arm_fprounding order (which matches the
76
- * common AArch64 order); see ARM ARM pseudocode FPDecodeRM().
77
- */
78
-static const uint8_t fp_decode_rm[] = {
79
- FPROUNDING_TIEAWAY,
80
- FPROUNDING_TIEEVEN,
81
- FPROUNDING_POSINF,
82
- FPROUNDING_NEGINF,
83
-};
84
-
85
static int disas_vfp_misc_insn(DisasContext *s, uint32_t insn)
86
{
87
uint32_t rd, rm, dp = extract32(insn, 8, 1);
88
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_misc_insn(DisasContext *s, uint32_t insn)
89
rm = VFP_SREG_M(insn);
90
}
84
}
91
85
92
- if ((insn & 0x0fbc0ed0) == 0x0eb80a40 &&
86
for (i = 0; i < nr_apu_cpus; i++) {
93
- dc_isar_feature(aa32_vrint, s)) {
87
- DeviceState *cpudev = DEVICE(s->fpd.apu.cpu[i]);
94
- /* VRINTA, VRINTN, VRINTP, VRINTM */
88
+ DeviceState *cpudev = DEVICE(&s->fpd.apu.cpu[i]);
95
- int rounding = fp_decode_rm[extract32(insn, 16, 2)];
89
int ppibase = XLNX_VERSAL_NR_IRQS + i * GIC_INTERNAL + GIC_NR_SGIS;
96
- return handle_vrint(insn, rd, rm, dp, rounding);
90
qemu_irq maint_irq;
97
- } else if ((insn & 0x0fbc0e50) == 0x0ebc0a40 &&
91
int ti;
98
- dc_isar_feature(aa32_vcvt_dr, s)) {
99
+ if ((insn & 0x0fbc0e50) == 0x0ebc0a40 &&
100
+ dc_isar_feature(aa32_vcvt_dr, s)) {
101
/* VCVTA, VCVTN, VCVTP, VCVTM */
102
int rounding = fp_decode_rm[extract32(insn, 16, 2)];
103
return handle_vcvt(insn, rd, rm, dp, rounding);
104
diff --git a/target/arm/vfp-uncond.decode b/target/arm/vfp-uncond.decode
105
index XXXXXXX..XXXXXXX 100644
106
--- a/target/arm/vfp-uncond.decode
107
+++ b/target/arm/vfp-uncond.decode
108
@@ -XXX,XX +XXX,XX @@ VMINMAXNM 1111 1110 1.00 .... .... 1010 . op:1 .0 .... \
109
vm=%vm_sp vn=%vn_sp vd=%vd_sp dp=0
110
VMINMAXNM 1111 1110 1.00 .... .... 1011 . op:1 .0 .... \
111
vm=%vm_dp vn=%vn_dp vd=%vd_dp dp=1
112
+
113
+VRINT 1111 1110 1.11 10 rm:2 .... 1010 01.0 .... \
114
+ vm=%vm_sp vd=%vd_sp dp=0
115
+VRINT 1111 1110 1.11 10 rm:2 .... 1011 01.0 .... \
116
+ vm=%vm_dp vd=%vd_dp dp=1
117
--
92
--
118
2.20.1
93
2.20.1
119
94
120
95
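For reference, the fp_decode_rm[] table above mirrors the ARM ARM FPDecodeRM() mapping; a self-contained restatement with stand-in enum values (arm_fprounding itself lives in QEMU's headers):

    enum { TIEAWAY, TIEEVEN, POSINF, NEGINF };  /* stand-ins */

    /* Instruction RM field: 0 = VRINTA/VCVTA, 1 = VRINTN/VCVTN,
     * 2 = VRINTP/VCVTP, 3 = VRINTM/VCVTM. */
    static const unsigned char rm_table[4] = {
        TIEAWAY, TIEEVEN, POSINF, NEGINF,
    };

    static int decode_rm(unsigned int rm)
    {
        return rm_table[rm & 3];
    }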
1
Convert the VFP VABS instruction to decodetree.
1
From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
2
2
3
Unlike the 3-op versions, we don't pass fpst to the VFPGen2OpSPFn or
3
Add support for SD.
4
VFPGen2OpDPFn because none of the operations which use this format
5
and support short vectors will need it.
6
4
5
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
6
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
7
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
8
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
9
Message-id: 20200427181649.26851-9-edgar.iglesias@gmail.com
7
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
10
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
9
---
11
---
10
target/arm/translate-vfp.inc.c | 167 +++++++++++++++++++++++++++++++++
12
include/hw/arm/xlnx-versal.h | 12 ++++++++++++
11
target/arm/translate.c | 12 ++-
13
hw/arm/xlnx-versal.c | 31 +++++++++++++++++++++++++++++++
12
target/arm/vfp.decode | 5 +
14
2 files changed, 43 insertions(+)
13
3 files changed, 180 insertions(+), 4 deletions(-)
14
15
15
diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
16
diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
16
index XXXXXXX..XXXXXXX 100644
17
index XXXXXXX..XXXXXXX 100644
17
--- a/target/arm/translate-vfp.inc.c
18
--- a/include/hw/arm/xlnx-versal.h
18
+++ b/target/arm/translate-vfp.inc.c
19
+++ b/include/hw/arm/xlnx-versal.h
19
@@ -XXX,XX +XXX,XX @@ typedef void VFPGen3OpSPFn(TCGv_i32 vd,
20
@@ -XXX,XX +XXX,XX @@
20
typedef void VFPGen3OpDPFn(TCGv_i64 vd,
21
21
TCGv_i64 vn, TCGv_i64 vm, TCGv_ptr fpst);
22
#include "hw/sysbus.h"
22
23
#include "hw/arm/boot.h"
23
+/*
24
+#include "hw/sd/sdhci.h"
24
+ * Types for callbacks for do_vfp_2op_sp() and do_vfp_2op_dp().
25
#include "hw/intc/arm_gicv3.h"
25
+ * The callback should emit code to write a value to vd (which
26
#include "hw/char/pl011.h"
26
+ * should be written to only).
27
#include "hw/dma/xlnx-zdma.h"
27
+ */
28
@@ -XXX,XX +XXX,XX @@
28
+typedef void VFPGen2OpSPFn(TCGv_i32 vd, TCGv_i32 vm);
29
#define XLNX_VERSAL_NR_UARTS 2
29
+typedef void VFPGen2OpDPFn(TCGv_i64 vd, TCGv_i64 vm);
30
#define XLNX_VERSAL_NR_GEMS 2
31
#define XLNX_VERSAL_NR_ADMAS 8
32
+#define XLNX_VERSAL_NR_SDS 2
33
#define XLNX_VERSAL_NR_IRQS 192
34
35
typedef struct Versal {
36
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
37
} iou;
38
} lpd;
39
40
+ /* The Platform Management Controller subsystem. */
41
+ struct {
42
+ struct {
43
+ SDHCIState sd[XLNX_VERSAL_NR_SDS];
44
+ } iou;
45
+ } pmc;
30
+
46
+
31
/*
47
struct {
32
* Perform a 3-operand VFP data processing instruction. fn is the
48
MemoryRegion *mr_ddr;
33
* callback to do the actual operation; this function deals with the
49
uint32_t psci_conduit;
34
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
50
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
35
return true;
51
#define VERSAL_GEM1_IRQ_0 58
52
#define VERSAL_GEM1_WAKE_IRQ_0 59
53
#define VERSAL_ADMA_IRQ_0 60
54
+#define VERSAL_SD0_IRQ_0 126
55
56
/* Architecturally reserved IRQs suitable for virtualization. */
57
#define VERSAL_RSVD_IRQ_FIRST 111
58
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
59
#define MM_FPD_CRF 0xfd1a0000U
60
#define MM_FPD_CRF_SIZE 0x140000
61
62
+#define MM_PMC_SD0 0xf1040000U
63
+#define MM_PMC_SD0_SIZE 0x10000
64
#define MM_PMC_CRP 0xf1260000U
65
#define MM_PMC_CRP_SIZE 0x10000
66
#endif
67
diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
68
index XXXXXXX..XXXXXXX 100644
69
--- a/hw/arm/xlnx-versal.c
70
+++ b/hw/arm/xlnx-versal.c
71
@@ -XXX,XX +XXX,XX @@ static void versal_create_admas(Versal *s, qemu_irq *pic)
72
}
36
}
73
}
37
74
38
+static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
75
+#define SDHCI_CAPABILITIES 0x280737ec6481 /* Same as on ZynqMP. */
76
+static void versal_create_sds(Versal *s, qemu_irq *pic)
39
+{
77
+{
40
+ uint32_t delta_m = 0;
78
+ int i;
41
+ uint32_t delta_d = 0;
42
+ uint32_t bank_mask = 0;
43
+ int veclen = s->vec_len;
44
+ TCGv_i32 f0, fd;
45
+
79
+
46
+ if (!dc_isar_feature(aa32_fpshvec, s) &&
80
+ for (i = 0; i < ARRAY_SIZE(s->pmc.iou.sd); i++) {
47
+ (veclen != 0 || s->vec_stride != 0)) {
81
+ DeviceState *dev;
48
+ return false;
82
+ MemoryRegion *mr;
83
+
84
+ sysbus_init_child_obj(OBJECT(s), "sd[*]",
85
+ &s->pmc.iou.sd[i], sizeof(s->pmc.iou.sd[i]),
86
+ TYPE_SYSBUS_SDHCI);
87
+ dev = DEVICE(&s->pmc.iou.sd[i]);
88
+
89
+ object_property_set_uint(OBJECT(dev),
90
+ 3, "sd-spec-version", &error_fatal);
91
+ object_property_set_uint(OBJECT(dev), SDHCI_CAPABILITIES, "capareg",
92
+ &error_fatal);
93
+ object_property_set_uint(OBJECT(dev), UHS_I, "uhs", &error_fatal);
94
+ qdev_init_nofail(dev);
95
+
96
+ mr = sysbus_mmio_get_region(SYS_BUS_DEVICE(dev), 0);
97
+ memory_region_add_subregion(&s->mr_ps,
98
+ MM_PMC_SD0 + i * MM_PMC_SD0_SIZE, mr);
99
+
100
+ sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0,
101
+ pic[VERSAL_SD0_IRQ_0 + i * 2]);
49
+ }
102
+ }
50
+
51
+ if (!vfp_access_check(s)) {
52
+ return true;
53
+ }
54
+
55
+ if (veclen > 0) {
56
+ bank_mask = 0x18;
57
+
58
+ /* Figure out what type of vector operation this is. */
59
+ if ((vd & bank_mask) == 0) {
60
+ /* scalar */
61
+ veclen = 0;
62
+ } else {
63
+ delta_d = s->vec_stride + 1;
64
+
65
+ if ((vm & bank_mask) == 0) {
66
+ /* mixed scalar/vector */
67
+ delta_m = 0;
68
+ } else {
69
+ /* vector */
70
+ delta_m = delta_d;
71
+ }
72
+ }
73
+ }
74
+
75
+ f0 = tcg_temp_new_i32();
76
+ fd = tcg_temp_new_i32();
77
+
78
+ neon_load_reg32(f0, vm);
79
+
80
+ for (;;) {
81
+ fn(fd, f0);
82
+ neon_store_reg32(fd, vd);
83
+
84
+ if (veclen == 0) {
85
+ break;
86
+ }
87
+
88
+ if (delta_m == 0) {
89
+ /* single source one-many */
90
+ while (veclen--) {
91
+ vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
92
+ neon_store_reg32(fd, vd);
93
+ }
94
+ break;
95
+ }
96
+
97
+ /* Set up the operands for the next iteration */
98
+ veclen--;
99
+ vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
100
+ vm = ((vm + delta_m) & (bank_mask - 1)) | (vm & bank_mask);
101
+ neon_load_reg32(f0, vm);
102
+ }
103
+
104
+ tcg_temp_free_i32(f0);
105
+ tcg_temp_free_i32(fd);
106
+
107
+ return true;
108
+}
103
+}
109
+
104
+
110
+static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
105
/* This takes the board allocated linear DDR memory and creates aliases
111
+{
106
* for each split DDR range/aperture on the Versal address map.
112
+ uint32_t delta_m = 0;
107
*/
113
+ uint32_t delta_d = 0;
108
@@ -XXX,XX +XXX,XX @@ static void versal_realize(DeviceState *dev, Error **errp)
114
+ uint32_t bank_mask = 0;
109
versal_create_uarts(s, pic);
115
+ int veclen = s->vec_len;
110
versal_create_gems(s, pic);
116
+ TCGv_i64 f0, fd;
111
versal_create_admas(s, pic);
117
+
112
+ versal_create_sds(s, pic);
118
+ /* UNDEF accesses to D16-D31 if they don't exist */
113
versal_map_ddr(s);
119
+ if (!dc_isar_feature(aa32_fp_d32, s) && ((vd | vm) & 0x10)) {
114
versal_unimp(s);
120
+ return false;
115
121
+ }
122
+
123
+ if (!dc_isar_feature(aa32_fpshvec, s) &&
124
+ (veclen != 0 || s->vec_stride != 0)) {
125
+ return false;
126
+ }
127
+
128
+ if (!vfp_access_check(s)) {
129
+ return true;
130
+ }
131
+
132
+ if (veclen > 0) {
133
+ bank_mask = 0xc;
134
+
135
+ /* Figure out what type of vector operation this is. */
136
+ if ((vd & bank_mask) == 0) {
137
+ /* scalar */
138
+ veclen = 0;
139
+ } else {
140
+ delta_d = (s->vec_stride >> 1) + 1;
141
+
142
+ if ((vm & bank_mask) == 0) {
143
+ /* mixed scalar/vector */
144
+ delta_m = 0;
145
+ } else {
146
+ /* vector */
147
+ delta_m = delta_d;
148
+ }
149
+ }
150
+ }
151
+
152
+ f0 = tcg_temp_new_i64();
153
+ fd = tcg_temp_new_i64();
154
+
155
+ neon_load_reg64(f0, vm);
156
+
157
+ for (;;) {
158
+ fn(fd, f0);
159
+ neon_store_reg64(fd, vd);
160
+
161
+ if (veclen == 0) {
162
+ break;
163
+ }
164
+
165
+ if (delta_m == 0) {
166
+ /* single source one-many */
167
+ while (veclen--) {
168
+ vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
169
+ neon_store_reg64(fd, vd);
170
+ }
171
+ break;
172
+ }
173
+
174
+ /* Set up the operands for the next iteration */
175
+ veclen--;
176
+ vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
177
+ vm = ((vm + delta_m) & (bank_mask - 1)) | (vm & bank_mask);
178
+ neon_load_reg64(f0, vm);
179
+ }
180
+
181
+ tcg_temp_free_i64(f0);
182
+ tcg_temp_free_i64(fd);
183
+
184
+ return true;
185
+}
186
+
187
static void gen_VMLA_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
188
{
189
/* Note that order of inputs to the add matters for NaNs */
190
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
191
tcg_temp_free_i64(fd);
192
return true;
193
}
194
+
195
+static bool trans_VABS_sp(DisasContext *s, arg_VABS_sp *a)
196
+{
197
+ return do_vfp_2op_sp(s, gen_helper_vfp_abss, a->vd, a->vm);
198
+}
199
+
200
+static bool trans_VABS_dp(DisasContext *s, arg_VABS_dp *a)
201
+{
202
+ return do_vfp_2op_dp(s, gen_helper_vfp_absd, a->vd, a->vm);
203
+}
204
diff --git a/target/arm/translate.c b/target/arm/translate.c
205
index XXXXXXX..XXXXXXX 100644
206
--- a/target/arm/translate.c
207
+++ b/target/arm/translate.c
208
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
209
case 0 ... 14:
210
/* Already handled by decodetree */
211
return 1;
212
+ case 15:
213
+ switch (rn) {
214
+ case 1:
215
+ /* Already handled by decodetree */
216
+ return 1;
217
+ default:
218
+ break;
219
+ }
220
default:
221
break;
222
}
223
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
224
/* rn is opcode, encoded as per VFP_SREG_N. */
225
switch (rn) {
226
case 0x00: /* vmov */
227
- case 0x01: /* vabs */
228
case 0x02: /* vneg */
229
case 0x03: /* vsqrt */
230
break;
231
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
232
case 0: /* cpy */
233
/* no-op */
234
break;
235
- case 1: /* abs */
236
- gen_vfp_abs(dp);
237
- break;
238
case 2: /* neg */
239
gen_vfp_neg(dp);
240
break;
241
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
242
index XXXXXXX..XXXXXXX 100644
243
--- a/target/arm/vfp.decode
244
+++ b/target/arm/vfp.decode
245
@@ -XXX,XX +XXX,XX @@ VMOV_imm_sp ---- 1110 1.11 imm4h:4 .... 1010 0000 imm4l:4 \
246
vd=%vd_sp
247
VMOV_imm_dp ---- 1110 1.11 imm4h:4 .... 1011 0000 imm4l:4 \
248
vd=%vd_dp
249
+
250
+VABS_sp ---- 1110 1.11 0000 .... 1010 11.0 .... \
251
+ vd=%vd_sp vm=%vm_sp
252
+VABS_dp ---- 1110 1.11 0000 .... 1011 11.0 .... \
253
+ vd=%vd_dp vm=%vm_dp
254
--
116
--
255
2.20.1
117
2.20.1
256
118
257
119
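The register stepping in do_vfp_2op_sp()/do_vfp_2op_dp() above can be puzzling at first sight; a standalone restatement of the single-precision case (bank_mask 0x18, i.e. four banks of eight registers):

    /* Starting at s9 with delta 2 (stride 1), the walk visits
     * s9 -> s11 -> s13 -> s15, staying within the s8..s15 bank; a
     * destination in bank 0 (s0..s7) makes the operation scalar. */
    static int vfp_step_sreg(int vd, int delta)
    {
        const int bank_mask = 0x18;

        return ((vd + delta) & (bank_mask - 1)) | (vd & bank_mask);
    }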
1
From: Richard Henderson <richard.henderson@linaro.org>
1
From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
2
2
3
The ARM pseudocode installs the error_code into the original
3
hw/arm: versal: Add support for the RTC.
4
pointer, not the encrypted pointer. The difference applies
5
within the 7 bits of pac data; the result should be the sign
6
extension of bit 55.
7
4
8
Add a testcase to that effect.
5
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
9
6
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
10
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
7
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
11
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
8
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
9
Message-id: 20200427181649.26851-10-edgar.iglesias@gmail.com
12
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
10
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
13
---
11
---
14
tests/tcg/aarch64/Makefile.target | 2 +-
12
include/hw/arm/xlnx-versal.h | 8 ++++++++
15
target/arm/pauth_helper.c | 4 +-
13
hw/arm/xlnx-versal.c | 21 +++++++++++++++++++++
16
tests/tcg/aarch64/pauth-2.c | 61 +++++++++++++++++++++++++++++++
14
2 files changed, 29 insertions(+)
17
3 files changed, 64 insertions(+), 3 deletions(-)
18
create mode 100644 tests/tcg/aarch64/pauth-2.c
19
15
20
diff --git a/tests/tcg/aarch64/Makefile.target b/tests/tcg/aarch64/Makefile.target
16
diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
21
index XXXXXXX..XXXXXXX 100644
17
index XXXXXXX..XXXXXXX 100644
22
--- a/tests/tcg/aarch64/Makefile.target
18
--- a/include/hw/arm/xlnx-versal.h
23
+++ b/tests/tcg/aarch64/Makefile.target
19
+++ b/include/hw/arm/xlnx-versal.h
24
@@ -XXX,XX +XXX,XX @@ run-fcvt: fcvt
20
@@ -XXX,XX +XXX,XX @@
25
    $(call run-test,$<,$(QEMU) $<, "$< on $(TARGET_NAME)")
21
#include "hw/char/pl011.h"
26
    $(call diff-out,$<,$(AARCH64_SRC)/fcvt.ref)
22
#include "hw/dma/xlnx-zdma.h"
27
23
#include "hw/net/cadence_gem.h"
28
-AARCH64_TESTS += pauth-1
24
+#include "hw/rtc/xlnx-zynqmp-rtc.h"
29
+AARCH64_TESTS += pauth-1 pauth-2
25
30
run-pauth-%: QEMU += -cpu max
26
#define TYPE_XLNX_VERSAL "xlnx-versal"
31
27
#define XLNX_VERSAL(obj) OBJECT_CHECK(Versal, (obj), TYPE_XLNX_VERSAL)
32
TESTS:=$(AARCH64_TESTS)
28
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
33
diff --git a/target/arm/pauth_helper.c b/target/arm/pauth_helper.c
29
struct {
30
SDHCIState sd[XLNX_VERSAL_NR_SDS];
31
} iou;
32
+
33
+ XlnxZynqMPRTC rtc;
34
} pmc;
35
36
struct {
37
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
38
#define VERSAL_GEM1_IRQ_0 58
39
#define VERSAL_GEM1_WAKE_IRQ_0 59
40
#define VERSAL_ADMA_IRQ_0 60
41
+#define VERSAL_RTC_APB_ERR_IRQ 121
42
#define VERSAL_SD0_IRQ_0 126
43
+#define VERSAL_RTC_ALARM_IRQ 142
44
+#define VERSAL_RTC_SECONDS_IRQ 143
45
46
/* Architecturally reserved IRQs suitable for virtualization. */
47
#define VERSAL_RSVD_IRQ_FIRST 111
48
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
49
#define MM_PMC_SD0_SIZE 0x10000
50
#define MM_PMC_CRP 0xf1260000U
51
#define MM_PMC_CRP_SIZE 0x10000
52
+#define MM_PMC_RTC 0xf12a0000
53
+#define MM_PMC_RTC_SIZE 0x10000
54
#endif
55
diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
34
index XXXXXXX..XXXXXXX 100644
56
index XXXXXXX..XXXXXXX 100644
35
--- a/target/arm/pauth_helper.c
57
--- a/hw/arm/xlnx-versal.c
36
+++ b/target/arm/pauth_helper.c
58
+++ b/hw/arm/xlnx-versal.c
37
@@ -XXX,XX +XXX,XX @@ static uint64_t pauth_auth(CPUARMState *env, uint64_t ptr, uint64_t modifier,
59
@@ -XXX,XX +XXX,XX @@ static void versal_create_sds(Versal *s, qemu_irq *pic)
38
if (unlikely(extract64(test, bot_bit, top_bit - bot_bit))) {
39
int error_code = (keynumber << 1) | (keynumber ^ 1);
40
if (param.tbi) {
41
- return deposit64(ptr, 53, 2, error_code);
42
+ return deposit64(orig_ptr, 53, 2, error_code);
43
} else {
44
- return deposit64(ptr, 61, 2, error_code);
45
+ return deposit64(orig_ptr, 61, 2, error_code);
46
}
47
}
60
}
48
return orig_ptr;
61
}
49
diff --git a/tests/tcg/aarch64/pauth-2.c b/tests/tcg/aarch64/pauth-2.c
62
50
new file mode 100644
63
+static void versal_create_rtc(Versal *s, qemu_irq *pic)
51
index XXXXXXX..XXXXXXX
64
+{
52
--- /dev/null
65
+ SysBusDevice *sbd;
53
+++ b/tests/tcg/aarch64/pauth-2.c
66
+ MemoryRegion *mr;
54
@@ -XXX,XX +XXX,XX @@
55
+#include <stdint.h>
56
+#include <assert.h>
57
+
67
+
58
+asm(".arch armv8.4-a");
68
+ sysbus_init_child_obj(OBJECT(s), "rtc", &s->pmc.rtc, sizeof(s->pmc.rtc),
69
+ TYPE_XLNX_ZYNQMP_RTC);
70
+ sbd = SYS_BUS_DEVICE(&s->pmc.rtc);
71
+ qdev_init_nofail(DEVICE(sbd));
59
+
72
+
60
+void do_test(uint64_t value)
73
+ mr = sysbus_mmio_get_region(sbd, 0);
61
+{
74
+ memory_region_add_subregion(&s->mr_ps, MM_PMC_RTC, mr);
62
+ uint64_t salt1, salt2;
63
+ uint64_t encode, decode;
64
+
75
+
65
+ /*
76
+ /*
66
+ * With TBI enabled and a 48-bit VA, there are 7 bits of auth,
77
+ * TODO: Connect the ALARM and SECONDS interrupts once our RTC model
67
+ * and so a 1/128 chance of encode = pac(value,key,salt) producing
78
+ * supports them.
68
+ * an auth for which leaves value unchanged.
69
+ * Iterate until we find a salt for which encode != value.
70
+ */
79
+ */
71
+ for (salt1 = 1; ; salt1++) {
80
+ sysbus_connect_irq(sbd, 1, pic[VERSAL_RTC_APB_ERR_IRQ]);
72
+ asm volatile("pacda %0, %2" : "=r"(encode) : "0"(value), "r"(salt1));
73
+ if (encode != value) {
74
+ break;
75
+ }
76
+ }
77
+
78
+ /* A valid salt must produce a valid authorization. */
79
+ asm volatile("autda %0, %2" : "=r"(decode) : "0"(encode), "r"(salt1));
80
+ assert(decode == value);
81
+
82
+ /*
83
+ * An invalid salt usually fails authorization, but again there
84
+ * is a chance of choosing another salt that works.
85
+ * Iterate until we find another salt which does fail.
86
+ */
87
+ for (salt2 = salt1 + 1; ; salt2++) {
88
+ asm volatile("autda %0, %2" : "=r"(decode) : "0"(encode), "r"(salt2));
89
+ if (decode != value) {
90
+ break;
91
+ }
92
+ }
93
+
94
+ /* The VA bits, bit 55, and the TBI bits, should be unchanged. */
95
+ assert(((decode ^ value) & 0xff80ffffffffffffull) == 0);
96
+
97
+ /*
98
+ * Bits [54:53] are an error indicator based on the key used;
99
+ * the DA key above is keynumber 0, so error == 0b01. Otherwise
100
+ * bit 55 of the original is sign-extended into the rest of the auth.
101
+ */
102
+ if ((value >> 55) & 1) {
103
+ assert(((decode >> 48) & 0xff) == 0b10111111);
104
+ } else {
105
+ assert(((decode >> 48) & 0xff) == 0b00100000);
106
+ }
107
+}
81
+}
108
+
82
+
109
+int main()
83
/* This takes the board allocated linear DDR memory and creates aliases
110
+{
84
* for each split DDR range/aperture on the Versal address map.
111
+ do_test(0);
85
*/
112
+ do_test(-1);
86
@@ -XXX,XX +XXX,XX @@ static void versal_realize(DeviceState *dev, Error **errp)
113
+ do_test(0xda004acedeadbeefull);
87
versal_create_gems(s, pic);
114
+ return 0;
88
versal_create_admas(s, pic);
115
+}
89
versal_create_sds(s, pic);
90
+ versal_create_rtc(s, pic);
91
versal_map_ddr(s);
92
versal_unimp(s);
93
116
--
94
--
117
2.20.1
95
2.20.1
118
96
119
97
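A condensed model of the pauth fix above (bit positions from the commit message; the helper below is illustrative, not QEMU's):

    #include <stdint.h>

    /* TBI case: on authentication failure the 2-bit error code is
     * deposited into bits [54:53] of the *original* pointer, whose PAC
     * field is already the sign-extension of bit 55 -- not into the
     * encrypted pointer.  (Without TBI the code lands in bits [62:61].) */
    static uint64_t auth_fail_result(uint64_t orig_ptr, int error_code)
    {
        uint64_t mask = 3ULL << 53;

        return (orig_ptr & ~mask) | ((uint64_t)error_code << 53);
    }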
1
From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
2
3
Add support for SD.
4
5
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
6
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
7
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
8
Message-id: 20200427181649.26851-11-edgar.iglesias@gmail.com
1
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
3
---
10
---
4
target/arm/translate-vfp.inc.c | 10 ++++++++++
11
hw/arm/xlnx-versal-virt.c | 46 +++++++++++++++++++++++++++++++++++++++
5
target/arm/translate.c | 8 +-------
12
1 file changed, 46 insertions(+)
6
target/arm/vfp.decode | 5 +++++
7
3 files changed, 16 insertions(+), 7 deletions(-)
8
13
9
diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
14
diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c
10
index XXXXXXX..XXXXXXX 100644
15
index XXXXXXX..XXXXXXX 100644
11
--- a/target/arm/translate-vfp.inc.c
16
--- a/hw/arm/xlnx-versal-virt.c
12
+++ b/target/arm/translate-vfp.inc.c
17
+++ b/hw/arm/xlnx-versal-virt.c
13
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
18
@@ -XXX,XX +XXX,XX @@
14
return true;
19
#include "hw/arm/sysbus-fdt.h"
20
#include "hw/arm/fdt.h"
21
#include "cpu.h"
22
+#include "hw/qdev-properties.h"
23
#include "hw/arm/xlnx-versal.h"
24
25
#define TYPE_XLNX_VERSAL_VIRT_MACHINE MACHINE_TYPE_NAME("xlnx-versal-virt")
26
@@ -XXX,XX +XXX,XX @@ static void fdt_add_zdma_nodes(VersalVirt *s)
27
}
15
}
28
}
16
29
17
+static bool trans_VMOV_reg_sp(DisasContext *s, arg_VMOV_reg_sp *a)
30
+static void fdt_add_sd_nodes(VersalVirt *s)
18
+{
31
+{
19
+ return do_vfp_2op_sp(s, tcg_gen_mov_i32, a->vd, a->vm);
32
+ const char clocknames[] = "clk_xin\0clk_ahb";
33
+ const char compat[] = "arasan,sdhci-8.9a";
34
+ int i;
35
+
36
+ for (i = ARRAY_SIZE(s->soc.pmc.iou.sd) - 1; i >= 0; i--) {
37
+ uint64_t addr = MM_PMC_SD0 + MM_PMC_SD0_SIZE * i;
38
+ char *name = g_strdup_printf("/sdhci@%" PRIx64, addr);
39
+
40
+ qemu_fdt_add_subnode(s->fdt, name);
41
+
42
+ qemu_fdt_setprop_cells(s->fdt, name, "clocks",
43
+ s->phandle.clk_25Mhz, s->phandle.clk_25Mhz);
44
+ qemu_fdt_setprop(s->fdt, name, "clock-names",
45
+ clocknames, sizeof(clocknames));
46
+ qemu_fdt_setprop_cells(s->fdt, name, "interrupts",
47
+ GIC_FDT_IRQ_TYPE_SPI, VERSAL_SD0_IRQ_0 + i * 2,
48
+ GIC_FDT_IRQ_FLAGS_LEVEL_HI);
49
+ qemu_fdt_setprop_sized_cells(s->fdt, name, "reg",
50
+ 2, addr, 2, MM_PMC_SD0_SIZE);
51
+ qemu_fdt_setprop(s->fdt, name, "compatible", compat, sizeof(compat));
52
+ g_free(name);
53
+ }
20
+}
54
+}
21
+
55
+
22
+static bool trans_VMOV_reg_dp(DisasContext *s, arg_VMOV_reg_dp *a)
56
static void fdt_nop_memory_nodes(void *fdt, Error **errp)
57
{
58
Error *err = NULL;
59
@@ -XXX,XX +XXX,XX @@ static void create_virtio_regions(VersalVirt *s)
60
}
61
}
62
63
+static void sd_plugin_card(SDHCIState *sd, DriveInfo *di)
23
+{
64
+{
24
+ return do_vfp_2op_dp(s, tcg_gen_mov_i64, a->vd, a->vm);
65
+ BlockBackend *blk = di ? blk_by_legacy_dinfo(di) : NULL;
66
+ DeviceState *card;
67
+
68
+ card = qdev_create(qdev_get_child_bus(DEVICE(sd), "sd-bus"), TYPE_SD_CARD);
69
+ object_property_add_child(OBJECT(sd), "card[*]", OBJECT(card),
70
+ &error_fatal);
71
+ qdev_prop_set_drive(card, "drive", blk, &error_fatal);
72
+ object_property_set_bool(OBJECT(card), true, "realized", &error_fatal);
25
+}
73
+}
26
+
74
+
27
static bool trans_VABS_sp(DisasContext *s, arg_VABS_sp *a)
75
static void versal_virt_init(MachineState *machine)
28
{
76
{
29
return do_vfp_2op_sp(s, gen_helper_vfp_abss, a->vd, a->vm);
77
VersalVirt *s = XLNX_VERSAL_VIRT_MACHINE(machine);
30
diff --git a/target/arm/translate.c b/target/arm/translate.c
78
int psci_conduit = QEMU_PSCI_CONDUIT_DISABLED;
31
index XXXXXXX..XXXXXXX 100644
79
+ int i;
32
--- a/target/arm/translate.c
80
33
+++ b/target/arm/translate.c
81
/*
34
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
82
* If the user provides an Operating System to be loaded, we expect them
35
return 1;
83
@@ -XXX,XX +XXX,XX @@ static void versal_virt_init(MachineState *machine)
36
case 15:
84
fdt_add_gic_nodes(s);
37
switch (rn) {
85
fdt_add_timer_nodes(s);
38
- case 1 ... 3:
86
fdt_add_zdma_nodes(s);
39
+ case 0 ... 3:
87
+ fdt_add_sd_nodes(s);
40
/* Already handled by decodetree */
88
fdt_add_cpu_nodes(s, psci_conduit);
41
return 1;
89
fdt_add_clk_node(s, "/clk125", 125000000, s->phandle.clk_125Mhz);
42
default:
90
fdt_add_clk_node(s, "/clk25", 25000000, s->phandle.clk_25Mhz);
43
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
91
@@ -XXX,XX +XXX,XX @@ static void versal_virt_init(MachineState *machine)
44
if (op == 15) {
92
memory_region_add_subregion_overlap(get_system_memory(),
45
/* rn is opcode, encoded as per VFP_SREG_N. */
93
0, &s->soc.fpd.apu.mr, 0);
46
switch (rn) {
94
47
- case 0x00: /* vmov */
95
+ /* Plugin SD cards. */
48
- break;
96
+ for (i = 0; i < ARRAY_SIZE(s->soc.pmc.iou.sd); i++) {
49
-
97
+ sd_plugin_card(&s->soc.pmc.iou.sd[i], drive_get_next(IF_SD));
50
case 0x04: /* vcvtb.f64.f16, vcvtb.f32.f16 */
98
+ }
51
case 0x05: /* vcvtt.f64.f16, vcvtt.f32.f16 */
52
/*
53
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
54
switch (op) {
55
case 15: /* extension space */
56
switch (rn) {
57
- case 0: /* cpy */
58
- /* no-op */
59
- break;
60
case 4: /* vcvtb.f32.f16, vcvtb.f64.f16 */
61
{
62
TCGv_ptr fpst = get_fpstatus_ptr(false);
63
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
64
index XXXXXXX..XXXXXXX 100644
65
--- a/target/arm/vfp.decode
66
+++ b/target/arm/vfp.decode
67
@@ -XXX,XX +XXX,XX @@ VMOV_imm_sp ---- 1110 1.11 imm4h:4 .... 1010 0000 imm4l:4 \
68
VMOV_imm_dp ---- 1110 1.11 imm4h:4 .... 1011 0000 imm4l:4 \
69
vd=%vd_dp
70
71
+VMOV_reg_sp ---- 1110 1.11 0000 .... 1010 01.0 .... \
72
+ vd=%vd_sp vm=%vm_sp
73
+VMOV_reg_dp ---- 1110 1.11 0000 .... 1011 01.0 .... \
74
+ vd=%vd_dp vm=%vm_dp
75
+
99
+
76
VABS_sp ---- 1110 1.11 0000 .... 1010 11.0 .... \
100
s->binfo.ram_size = machine->ram_size;
77
vd=%vd_sp vm=%vm_sp
101
s->binfo.loader_start = 0x0;
78
VABS_dp ---- 1110 1.11 0000 .... 1011 11.0 .... \
102
s->binfo.get_dtb = versal_virt_get_dtb;
79
--
103
--
80
2.20.1
104
2.20.1
81
105
82
106
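Tying together the SD hunks above, the per-instance layout as two small helpers (constants as defined in the Versal memory map earlier in the series; the helper names here are illustrative):

    #include <stdint.h>

    static uint64_t versal_sd_base(int i)
    {
        return MM_PMC_SD0 + (uint64_t)i * MM_PMC_SD0_SIZE;
    }

    static int versal_sd_irq(int i)
    {
        /* instances are spaced two SPI numbers apart in the IRQ map */
        return VERSAL_SD0_IRQ_0 + i * 2;
    }

    /* i == 0 -> 0xf1040000, SPI 126;  i == 1 -> 0xf1050000, SPI 128 */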
1
The current VFP code has two different idioms for
1
From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
2
loading and storing from the VFP register file:
3
1. using the gen_mov_F0_vreg() and similar functions,
4
which load and store to a fixed set of TCG globals
5
cpu_F0s, cpu_F0d, etc.
6
2. using direct calls to tcg_gen_ld_f64() and friends
7
2
8
We want to phase out idiom 1 (because the use of the
3
Add support for the RTC.
9
fixed globals is a relic of a much older version of TCG),
10
but idiom 2 is quite longwinded:
11
tcg_gen_ld_f64(tmp, cpu_env, vfp_reg_offset(true, reg))
12
requires us to specify the 64-bitness twice, once in
13
the function name and once by passing 'true' to
14
vfp_reg_offset(). There's no guard against accidentally
15
passing the wrong flag.
16
4
17
Instead, let's move to a convention of accessing 64-bit
5
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
18
registers via the existing neon_load_reg64() and
6
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
19
neon_store_reg64(), and provide new neon_load_reg32()
7
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
20
and neon_store_reg32() for the 32-bit equivalents.
8
Message-id: 20200427181649.26851-12-edgar.iglesias@gmail.com
9
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
10
---
11
hw/arm/xlnx-versal-virt.c | 22 ++++++++++++++++++++++
12
1 file changed, 22 insertions(+)
21
13
22
Implement the new functions and use them in the code in
14
diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c
23
translate-vfp.inc.c. We will convert the rest of the VFP
24
code as we do the decodetree conversion in subsequent
25
commits.
26
27
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
28
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
29
---
30
target/arm/translate-vfp.inc.c | 40 +++++++++++++++++-----------------
31
target/arm/translate.c | 10 +++++++++
32
2 files changed, 30 insertions(+), 20 deletions(-)
33
34
diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
35
index XXXXXXX..XXXXXXX 100644
15
index XXXXXXX..XXXXXXX 100644
36
--- a/target/arm/translate-vfp.inc.c
16
--- a/hw/arm/xlnx-versal-virt.c
37
+++ b/target/arm/translate-vfp.inc.c
17
+++ b/hw/arm/xlnx-versal-virt.c
38
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
18
@@ -XXX,XX +XXX,XX @@ static void fdt_add_sd_nodes(VersalVirt *s)
39
tcg_gen_ext_i32_i64(nf, cpu_NF);
40
tcg_gen_ext_i32_i64(vf, cpu_VF);
41
42
- tcg_gen_ld_f64(frn, cpu_env, vfp_reg_offset(dp, rn));
43
- tcg_gen_ld_f64(frm, cpu_env, vfp_reg_offset(dp, rm));
44
+ neon_load_reg64(frn, rn);
45
+ neon_load_reg64(frm, rm);
46
switch (a->cc) {
47
case 0: /* eq: Z */
48
tcg_gen_movcond_i64(TCG_COND_EQ, dest, zf, zero,
49
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
50
tcg_temp_free_i64(tmp);
51
break;
52
}
53
- tcg_gen_st_f64(dest, cpu_env, vfp_reg_offset(dp, rd));
54
+ neon_store_reg64(dest, rd);
55
tcg_temp_free_i64(frn);
56
tcg_temp_free_i64(frm);
57
tcg_temp_free_i64(dest);
58
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
59
frn = tcg_temp_new_i32();
60
frm = tcg_temp_new_i32();
61
dest = tcg_temp_new_i32();
62
- tcg_gen_ld_f32(frn, cpu_env, vfp_reg_offset(dp, rn));
63
- tcg_gen_ld_f32(frm, cpu_env, vfp_reg_offset(dp, rm));
64
+ neon_load_reg32(frn, rn);
65
+ neon_load_reg32(frm, rm);
66
switch (a->cc) {
67
case 0: /* eq: Z */
68
tcg_gen_movcond_i32(TCG_COND_EQ, dest, cpu_ZF, zero,
69
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
70
tcg_temp_free_i32(tmp);
71
break;
72
}
73
- tcg_gen_st_f32(dest, cpu_env, vfp_reg_offset(dp, rd));
74
+ neon_store_reg32(dest, rd);
75
tcg_temp_free_i32(frn);
76
tcg_temp_free_i32(frm);
77
tcg_temp_free_i32(dest);
78
@@ -XXX,XX +XXX,XX @@ static bool trans_VMINMAXNM(DisasContext *s, arg_VMINMAXNM *a)
79
frm = tcg_temp_new_i64();
80
dest = tcg_temp_new_i64();
81
82
- tcg_gen_ld_f64(frn, cpu_env, vfp_reg_offset(dp, rn));
83
- tcg_gen_ld_f64(frm, cpu_env, vfp_reg_offset(dp, rm));
84
+ neon_load_reg64(frn, rn);
85
+ neon_load_reg64(frm, rm);
86
if (vmin) {
87
gen_helper_vfp_minnumd(dest, frn, frm, fpst);
88
} else {
89
gen_helper_vfp_maxnumd(dest, frn, frm, fpst);
90
}
91
- tcg_gen_st_f64(dest, cpu_env, vfp_reg_offset(dp, rd));
92
+ neon_store_reg64(dest, rd);
93
tcg_temp_free_i64(frn);
94
tcg_temp_free_i64(frm);
95
tcg_temp_free_i64(dest);
96
@@ -XXX,XX +XXX,XX @@ static bool trans_VMINMAXNM(DisasContext *s, arg_VMINMAXNM *a)
97
frm = tcg_temp_new_i32();
98
dest = tcg_temp_new_i32();
99
100
- tcg_gen_ld_f32(frn, cpu_env, vfp_reg_offset(dp, rn));
101
- tcg_gen_ld_f32(frm, cpu_env, vfp_reg_offset(dp, rm));
102
+ neon_load_reg32(frn, rn);
103
+ neon_load_reg32(frm, rm);
104
if (vmin) {
105
gen_helper_vfp_minnums(dest, frn, frm, fpst);
106
} else {
107
gen_helper_vfp_maxnums(dest, frn, frm, fpst);
108
}
109
- tcg_gen_st_f32(dest, cpu_env, vfp_reg_offset(dp, rd));
110
+ neon_store_reg32(dest, rd);
111
tcg_temp_free_i32(frn);
112
tcg_temp_free_i32(frm);
113
tcg_temp_free_i32(dest);
114
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
115
TCGv_i64 tcg_res;
116
tcg_op = tcg_temp_new_i64();
117
tcg_res = tcg_temp_new_i64();
118
- tcg_gen_ld_f64(tcg_op, cpu_env, vfp_reg_offset(dp, rm));
119
+ neon_load_reg64(tcg_op, rm);
120
gen_helper_rintd(tcg_res, tcg_op, fpst);
121
- tcg_gen_st_f64(tcg_res, cpu_env, vfp_reg_offset(dp, rd));
122
+ neon_store_reg64(tcg_res, rd);
123
tcg_temp_free_i64(tcg_op);
124
tcg_temp_free_i64(tcg_res);
125
} else {
126
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
127
TCGv_i32 tcg_res;
128
tcg_op = tcg_temp_new_i32();
129
tcg_res = tcg_temp_new_i32();
130
- tcg_gen_ld_f32(tcg_op, cpu_env, vfp_reg_offset(dp, rm));
131
+ neon_load_reg32(tcg_op, rm);
132
gen_helper_rints(tcg_res, tcg_op, fpst);
133
- tcg_gen_st_f32(tcg_res, cpu_env, vfp_reg_offset(dp, rd));
134
+ neon_store_reg32(tcg_res, rd);
135
tcg_temp_free_i32(tcg_op);
136
tcg_temp_free_i32(tcg_res);
137
}
19
}
138
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
139
tcg_double = tcg_temp_new_i64();
140
tcg_res = tcg_temp_new_i64();
141
tcg_tmp = tcg_temp_new_i32();
142
- tcg_gen_ld_f64(tcg_double, cpu_env, vfp_reg_offset(1, rm));
143
+ neon_load_reg64(tcg_double, rm);
144
if (is_signed) {
145
gen_helper_vfp_tosld(tcg_res, tcg_double, tcg_shift, fpst);
146
} else {
147
gen_helper_vfp_tould(tcg_res, tcg_double, tcg_shift, fpst);
148
}
149
tcg_gen_extrl_i64_i32(tcg_tmp, tcg_res);
150
- tcg_gen_st_f32(tcg_tmp, cpu_env, vfp_reg_offset(0, rd));
151
+ neon_store_reg32(tcg_tmp, rd);
152
tcg_temp_free_i32(tcg_tmp);
153
tcg_temp_free_i64(tcg_res);
154
tcg_temp_free_i64(tcg_double);
155
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
156
TCGv_i32 tcg_single, tcg_res;
157
tcg_single = tcg_temp_new_i32();
158
tcg_res = tcg_temp_new_i32();
159
- tcg_gen_ld_f32(tcg_single, cpu_env, vfp_reg_offset(0, rm));
160
+ neon_load_reg32(tcg_single, rm);
161
if (is_signed) {
162
gen_helper_vfp_tosls(tcg_res, tcg_single, tcg_shift, fpst);
163
} else {
164
gen_helper_vfp_touls(tcg_res, tcg_single, tcg_shift, fpst);
165
}
166
- tcg_gen_st_f32(tcg_res, cpu_env, vfp_reg_offset(0, rd));
167
+ neon_store_reg32(tcg_res, rd);
168
tcg_temp_free_i32(tcg_res);
169
tcg_temp_free_i32(tcg_single);
170
}
171
diff --git a/target/arm/translate.c b/target/arm/translate.c
172
index XXXXXXX..XXXXXXX 100644
173
--- a/target/arm/translate.c
174
+++ b/target/arm/translate.c
175
@@ -XXX,XX +XXX,XX @@ static inline void neon_store_reg64(TCGv_i64 var, int reg)
176
tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(1, reg));
177
}
20
}
178
21
179
+static inline void neon_load_reg32(TCGv_i32 var, int reg)
22
+static void fdt_add_rtc_node(VersalVirt *s)
180
+{
23
+{
181
+ tcg_gen_ld_i32(var, cpu_env, vfp_reg_offset(false, reg));
24
+ const char compat[] = "xlnx,zynqmp-rtc";
25
+ const char interrupt_names[] = "alarm\0sec";
26
+ char *name = g_strdup_printf("/rtc@%x", MM_PMC_RTC);
27
+
28
+ qemu_fdt_add_subnode(s->fdt, name);
29
+
30
+ qemu_fdt_setprop_cells(s->fdt, name, "interrupts",
31
+ GIC_FDT_IRQ_TYPE_SPI, VERSAL_RTC_ALARM_IRQ,
32
+ GIC_FDT_IRQ_FLAGS_LEVEL_HI,
33
+ GIC_FDT_IRQ_TYPE_SPI, VERSAL_RTC_SECONDS_IRQ,
34
+ GIC_FDT_IRQ_FLAGS_LEVEL_HI);
35
+ qemu_fdt_setprop(s->fdt, name, "interrupt-names",
36
+ interrupt_names, sizeof(interrupt_names));
37
+ qemu_fdt_setprop_sized_cells(s->fdt, name, "reg",
38
+ 2, MM_PMC_RTC, 2, MM_PMC_RTC_SIZE);
39
+ qemu_fdt_setprop(s->fdt, name, "compatible", compat, sizeof(compat));
40
+ g_free(name);
182
+}
41
+}
183
+
42
+
184
+static inline void neon_store_reg32(TCGv_i32 var, int reg)
43
static void fdt_nop_memory_nodes(void *fdt, Error **errp)
185
+{
186
+ tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
187
+}
188
+
189
static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
190
{
44
{
191
TCGv_ptr ret = tcg_temp_new_ptr();
45
Error *err = NULL;
46
@@ -XXX,XX +XXX,XX @@ static void versal_virt_init(MachineState *machine)
47
fdt_add_timer_nodes(s);
48
fdt_add_zdma_nodes(s);
49
fdt_add_sd_nodes(s);
50
+ fdt_add_rtc_node(s);
51
fdt_add_cpu_nodes(s, psci_conduit);
52
fdt_add_clk_node(s, "/clk125", 125000000, s->phandle.clk_125Mhz);
53
fdt_add_clk_node(s, "/clk25", 25000000, s->phandle.clk_25Mhz);
192
--
54
--
193
2.20.1
55
2.20.1
194
56
195
57
diff view generated by jsdifflib
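To make the difference between the two idioms concrete, here is the same 64-bit register load written both ways (a fragment for illustration only, not code from the patch; 'tmp' and 'reg' are hypothetical locals and this does not compile outside QEMU's translator):

/* Idiom 2: the 64-bitness must be stated twice, with no cross-check */
TCGv_i64 tmp = tcg_temp_new_i64();
tcg_gen_ld_f64(tmp, cpu_env, vfp_reg_offset(true, reg));

/* New convention: the width is implied by the helper name alone */
neon_load_reg64(tmp, reg);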
Factor out the VFP access checking code so that we can use it in the
leaf functions of the decodetree decoder.

We call the function full_vfp_access_check() so we can keep
the more natural vfp_access_check() for a version which doesn't
have the 'ignore_vfp_enabled' flag -- that way almost all VFP
insns will be able to use vfp_access_check(s) and only the
special-register access function will have to use
full_vfp_access_check(s, ignore_vfp_enabled).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/translate-vfp.inc.c | 100 ++++++++++++++++++++++++++++++++
target/arm/translate.c | 101 +++++----------------------------
2 files changed, 113 insertions(+), 88 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@
/* Include the generated VFP decoder */
#include "decode-vfp.inc.c"
#include "decode-vfp-uncond.inc.c"
+
+/*
+ * Check that VFP access is enabled. If it is, do the necessary
+ * M-profile lazy-FP handling and then return true.
+ * If not, emit code to generate an appropriate exception and
+ * return false.
+ * The ignore_vfp_enabled argument specifies that we should ignore
+ * whether VFP is enabled via FPEXC[EN]: this should be true for FMXR/FMRX
+ * accesses to FPSID, FPEXC, MVFR0, MVFR1, MVFR2, and false for all other insns.
+ */
+static bool full_vfp_access_check(DisasContext *s, bool ignore_vfp_enabled)
+{
+ if (s->fp_excp_el) {
+ if (arm_dc_feature(s, ARM_FEATURE_M)) {
+ gen_exception_insn(s, 4, EXCP_NOCP, syn_uncategorized(),
+ s->fp_excp_el);
+ } else {
+ gen_exception_insn(s, 4, EXCP_UDEF,
+ syn_fp_access_trap(1, 0xe, false),
+ s->fp_excp_el);
+ }
+ return false;
+ }
+
+ if (!s->vfp_enabled && !ignore_vfp_enabled) {
+ assert(!arm_dc_feature(s, ARM_FEATURE_M));
+ gen_exception_insn(s, 4, EXCP_UDEF, syn_uncategorized(),
+ default_exception_el(s));
+ return false;
+ }
+
+ if (arm_dc_feature(s, ARM_FEATURE_M)) {
+ /* Handle M-profile lazy FP state mechanics */
+
+ /* Trigger lazy-state preservation if necessary */
+ if (s->v7m_lspact) {
+ /*
+ * Lazy state saving affects external memory and also the NVIC,
+ * so we must mark it as an IO operation for icount.
+ */
+ if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
+ gen_io_start();
+ }
+ gen_helper_v7m_preserve_fp_state(cpu_env);
+ if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
+ gen_io_end();
+ }
+ /*
+ * If the preserve_fp_state helper doesn't throw an exception
+ * then it will clear LSPACT; we don't need to repeat this for
+ * any further FP insns in this TB.
+ */
+ s->v7m_lspact = false;
+ }
+
+ /* Update ownership of FP context: set FPCCR.S to match current state */
+ if (s->v8m_fpccr_s_wrong) {
+ TCGv_i32 tmp;
+
+ tmp = load_cpu_field(v7m.fpccr[M_REG_S]);
+ if (s->v8m_secure) {
+ tcg_gen_ori_i32(tmp, tmp, R_V7M_FPCCR_S_MASK);
+ } else {
+ tcg_gen_andi_i32(tmp, tmp, ~R_V7M_FPCCR_S_MASK);
+ }
+ store_cpu_field(tmp, v7m.fpccr[M_REG_S]);
+ /* Don't need to do this for any further FP insns in this TB */
+ s->v8m_fpccr_s_wrong = false;
+ }
+
+ if (s->v7m_new_fp_ctxt_needed) {
+ /*
+ * Create new FP context by updating CONTROL.FPCA, CONTROL.SFPA
+ * and the FPSCR.
+ */
+ TCGv_i32 control, fpscr;
+ uint32_t bits = R_V7M_CONTROL_FPCA_MASK;
+
+ fpscr = load_cpu_field(v7m.fpdscr[s->v8m_secure]);
+ gen_helper_vfp_set_fpscr(cpu_env, fpscr);
+ tcg_temp_free_i32(fpscr);
+ /*
+ * We don't need to arrange to end the TB, because the only
+ * parts of FPSCR which we cache in the TB flags are the VECLEN
+ * and VECSTRIDE, and those don't exist for M-profile.
+ */
+
+ if (s->v8m_secure) {
+ bits |= R_V7M_CONTROL_SFPA_MASK;
+ }
+ control = load_cpu_field(v7m.control[M_REG_S]);
+ tcg_gen_ori_i32(control, control, bits);
+ store_cpu_field(control, v7m.control[M_REG_S]);
+ /* Don't need to do this for any further FP insns in this TB */
+ s->v7m_new_fp_ctxt_needed = false;
+ }
+ }
+
+ return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_misc_insn(DisasContext *s, uint32_t insn)
return 1;
}

-/* Disassemble a VFP instruction. Returns nonzero if an error occurred
- (ie. an undefined instruction). */
+/*
+ * Disassemble a VFP instruction. Returns nonzero if an error occurred
+ * (ie. an undefined instruction).
+ */
static int disas_vfp_insn(DisasContext *s, uint32_t insn)
{
uint32_t rd, rn, rm, op, i, n, offset, delta_d, delta_m, bank_mask;
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
TCGv_i32 addr;
TCGv_i32 tmp;
TCGv_i32 tmp2;
+ bool ignore_vfp_enabled = false;

if (!arm_dc_feature(s, ARM_FEATURE_VFP)) {
return 1;
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
}
}

- /* FIXME: this access check should not take precedence over UNDEF
+ /*
+ * FIXME: this access check should not take precedence over UNDEF
* for invalid encodings; we will generate incorrect syndrome information
* for attempts to execute invalid vfp/neon encodings with FP disabled.
*/
- if (s->fp_excp_el) {
- if (arm_dc_feature(s, ARM_FEATURE_M)) {
- gen_exception_insn(s, 4, EXCP_NOCP, syn_uncategorized(),
- s->fp_excp_el);
- } else {
- gen_exception_insn(s, 4, EXCP_UDEF,
- syn_fp_access_trap(1, 0xe, false),
- s->fp_excp_el);
- }
- return 0;
- }
-
- if (!s->vfp_enabled) {
- /* VFP disabled. Only allow fmxr/fmrx to/from some control regs. */
- if ((insn & 0x0fe00fff) != 0x0ee00a10)
- return 1;
+ if ((insn & 0x0fe00fff) == 0x0ee00a10) {
rn = (insn >> 16) & 0xf;
- if (rn != ARM_VFP_FPSID && rn != ARM_VFP_FPEXC && rn != ARM_VFP_MVFR2
- && rn != ARM_VFP_MVFR1 && rn != ARM_VFP_MVFR0) {
- return 1;
+ if (rn == ARM_VFP_FPSID || rn == ARM_VFP_FPEXC || rn == ARM_VFP_MVFR2
+ || rn == ARM_VFP_MVFR1 || rn == ARM_VFP_MVFR0) {
+ ignore_vfp_enabled = true;
}
}
-
- if (arm_dc_feature(s, ARM_FEATURE_M)) {
- /* Handle M-profile lazy FP state mechanics */
-
- /* Trigger lazy-state preservation if necessary */
- if (s->v7m_lspact) {
- /*
- * Lazy state saving affects external memory and also the NVIC,
- * so we must mark it as an IO operation for icount.
- */
- if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
- gen_io_start();
- }
- gen_helper_v7m_preserve_fp_state(cpu_env);
- if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
- gen_io_end();
- }
- /*
- * If the preserve_fp_state helper doesn't throw an exception
- * then it will clear LSPACT; we don't need to repeat this for
- * any further FP insns in this TB.
- */
- s->v7m_lspact = false;
- }
-
- /* Update ownership of FP context: set FPCCR.S to match current state */
- if (s->v8m_fpccr_s_wrong) {
- TCGv_i32 tmp;
-
- tmp = load_cpu_field(v7m.fpccr[M_REG_S]);
- if (s->v8m_secure) {
- tcg_gen_ori_i32(tmp, tmp, R_V7M_FPCCR_S_MASK);
- } else {
- tcg_gen_andi_i32(tmp, tmp, ~R_V7M_FPCCR_S_MASK);
- }
- store_cpu_field(tmp, v7m.fpccr[M_REG_S]);
- /* Don't need to do this for any further FP insns in this TB */
- s->v8m_fpccr_s_wrong = false;
- }
-
- if (s->v7m_new_fp_ctxt_needed) {
- /*
- * Create new FP context by updating CONTROL.FPCA, CONTROL.SFPA
- * and the FPSCR.
- */
- TCGv_i32 control, fpscr;
- uint32_t bits = R_V7M_CONTROL_FPCA_MASK;
-
- fpscr = load_cpu_field(v7m.fpdscr[s->v8m_secure]);
- gen_helper_vfp_set_fpscr(cpu_env, fpscr);
- tcg_temp_free_i32(fpscr);
- /*
- * We don't need to arrange to end the TB, because the only
- * parts of FPSCR which we cache in the TB flags are the VECLEN
- * and VECSTRIDE, and those don't exist for M-profile.
- */
-
- if (s->v8m_secure) {
- bits |= R_V7M_CONTROL_SFPA_MASK;
- }
- control = load_cpu_field(v7m.control[M_REG_S]);
- tcg_gen_ori_i32(control, control, bits);
- store_cpu_field(control, v7m.control[M_REG_S]);
- /* Don't need to do this for any further FP insns in this TB */
- s->v7m_new_fp_ctxt_needed = false;
- }
+ if (!full_vfp_access_check(s, ignore_vfp_enabled)) {
+ return 0;
}

if (extract32(insn, 28, 4) == 0xf) {
--
2.20.1

Somewhere along the line we accidentally added a duplicate
"using D16-D31 when they don't exist" check to do_vfm_dp()
(probably an artifact of a patch series rebase). Remove it.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20200430181003.21682-2-peter.maydell@linaro.org
---
target/arm/translate-vfp.inc.c | 6 ------
1 file changed, 6 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_dp(DisasContext *s, arg_VFMA_dp *a, bool neg_n, bool neg_d)
return false;
}

- /* UNDEF accesses to D16-D31 if they don't exist. */
- if (!dc_isar_feature(aa32_simd_r32, s) &&
- ((a->vd | a->vn | a->vm) & 0x10)) {
- return false;
- }
-
if (!vfp_access_check(s)) {
return true;
}
--
2.20.1
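For clarity, the relationship between the two checkers that the commit message describes can be sketched as a one-line wrapper (this wrapper is not part of the patch above; it is simply the shape the commit message implies, with vfp_access_check(s) meaning "the common case, with no FPEXC[EN] exemption"):

static bool vfp_access_check(DisasContext *s)
{
    return full_vfp_access_check(s, false);
}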
Convert the VMINNM and VMAXNM instructions to decodetree.
As with VSEL, we leave the trans_VMINMAXNM() function
in translate.c for the moment.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/translate.c | 41 ++++++++++++++++++++++++------------
target/arm/vfp-uncond.decode | 5 +++++
2 files changed, 33 insertions(+), 13 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
return true;
}

-static int handle_vminmaxnm(uint32_t insn, uint32_t rd, uint32_t rn,
- uint32_t rm, uint32_t dp)
+static bool trans_VMINMAXNM(DisasContext *s, arg_VMINMAXNM *a)
{
- uint32_t vmin = extract32(insn, 6, 1);
- TCGv_ptr fpst = get_fpstatus_ptr(0);
+ uint32_t rd, rn, rm;
+ bool dp = a->dp;
+ bool vmin = a->op;
+ TCGv_ptr fpst;
+
+ if (!dc_isar_feature(aa32_vminmaxnm, s)) {
+ return false;
+ }
+
+ /* UNDEF accesses to D16-D31 if they don't exist */
+ if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
+ ((a->vm | a->vn | a->vd) & 0x10)) {
+ return false;
+ }
+ rd = a->vd;
+ rn = a->vn;
+ rm = a->vm;
+
+ if (!vfp_access_check(s)) {
+ return true;
+ }
+
+ fpst = get_fpstatus_ptr(0);

if (dp) {
TCGv_i64 frn, frm, dest;
@@ -XXX,XX +XXX,XX @@ static int handle_vminmaxnm(uint32_t insn, uint32_t rd, uint32_t rn,
}

tcg_temp_free_ptr(fpst);
- return 0;
+ return true;
}

static int handle_vrint(uint32_t insn, uint32_t rd, uint32_t rm, uint32_t dp,
@@ -XXX,XX +XXX,XX @@ static const uint8_t fp_decode_rm[] = {

static int disas_vfp_misc_insn(DisasContext *s, uint32_t insn)
{
- uint32_t rd, rn, rm, dp = extract32(insn, 8, 1);
+ uint32_t rd, rm, dp = extract32(insn, 8, 1);

if (dp) {
VFP_DREG_D(rd, insn);
- VFP_DREG_N(rn, insn);
VFP_DREG_M(rm, insn);
} else {
rd = VFP_SREG_D(insn);
- rn = VFP_SREG_N(insn);
rm = VFP_SREG_M(insn);
}

- if ((insn & 0x0fb00e10) == 0x0e800a00 &&
- dc_isar_feature(aa32_vminmaxnm, s)) {
- return handle_vminmaxnm(insn, rd, rn, rm, dp);
- } else if ((insn & 0x0fbc0ed0) == 0x0eb80a40 &&
- dc_isar_feature(aa32_vrint, s)) {
+ if ((insn & 0x0fbc0ed0) == 0x0eb80a40 &&
+ dc_isar_feature(aa32_vrint, s)) {
/* VRINTA, VRINTN, VRINTP, VRINTM */
int rounding = fp_decode_rm[extract32(insn, 16, 2)];
return handle_vrint(insn, rd, rm, dp, rounding);
diff --git a/target/arm/vfp-uncond.decode b/target/arm/vfp-uncond.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp-uncond.decode
+++ b/target/arm/vfp-uncond.decode
@@ -XXX,XX +XXX,XX @@ VSEL 1111 1110 0. cc:2 .... .... 1010 .0.0 .... \
vm=%vm_sp vn=%vn_sp vd=%vd_sp dp=0
VSEL 1111 1110 0. cc:2 .... .... 1011 .0.0 .... \
vm=%vm_dp vn=%vn_dp vd=%vd_dp dp=1
+
+VMINMAXNM 1111 1110 1.00 .... .... 1010 . op:1 .0 .... \
+ vm=%vm_sp vn=%vn_sp vd=%vd_sp dp=0
+VMINMAXNM 1111 1110 1.00 .... .... 1011 . op:1 .0 .... \
+ vm=%vm_dp vn=%vn_dp vd=%vd_dp dp=1
--
2.20.1

We were accidentally permitting decode of Thumb Neon insns even if
the CPU didn't have the FEATURE_NEON bit set, because the feature
check was being done before the call to disas_neon_data_insn() and
disas_neon_ls_insn() in the Arm decoder but was omitted from the
Thumb decoder. Push the feature bit check down into the called
functions so it is done for both Arm and Thumb encodings.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20200430181003.21682-3-peter.maydell@linaro.org
---
target/arm/translate.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_ls_insn(DisasContext *s, uint32_t insn)
TCGv_i32 tmp2;
TCGv_i64 tmp64;

+ if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+ return 1;
+ }
+
/* FIXME: this access check should not take precedence over UNDEF
* for invalid encodings; we will generate incorrect syndrome information
* for attempts to execute invalid vfp/neon encodings with FP disabled.
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
TCGv_ptr ptr1, ptr2, ptr3;
TCGv_i64 tmp64;

+ if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+ return 1;
+ }
+
/* FIXME: this access check should not take precedence over UNDEF
* for invalid encodings; we will generate incorrect syndrome information
* for attempts to execute invalid vfp/neon encodings with FP disabled.
@@ -XXX,XX +XXX,XX @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)

if (((insn >> 25) & 7) == 1) {
/* NEON Data processing. */
- if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
- goto illegal_op;
- }
-
if (disas_neon_data_insn(s, insn)) {
goto illegal_op;
}
@@ -XXX,XX +XXX,XX @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
}
if ((insn & 0x0f100000) == 0x04000000) {
/* NEON load/store. */
- if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
- goto illegal_op;
- }
-
if (disas_neon_ls_insn(s, insn)) {
goto illegal_op;
}
--
2.20.1
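For readers new to decodetree: the VMINMAXNM patterns added above cause scripts/decodetree.py to emit an argument struct and call trans_VMINMAXNM() with it. Roughly, the struct looks like the following (an illustrative sketch only; the exact generated layout may differ):

typedef struct {
    int vm;
    int vn;
    int vd;
    int dp; /* 0: the single-precision pattern matched, 1: double-precision */
    int op; /* 1 selects VMINNM, 0 selects VMAXNM */
} arg_VMINMAXNM;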
Add the infrastructure for building and invoking a decodetree decoder
for the AArch32 VFP encodings. At the moment the new decoder covers
nothing, so we always fall back to the existing hand-written decode.

We need to have one decoder for the unconditional insns and one for
the conditional insns, as otherwise the patterns for conditional
insns would incorrectly match against the unconditional ones too.

Since translate.c is over 14,000 lines long and we're going to be
touching pretty much every line of the VFP code as part of the
decodetree conversion, we create a new translate-vfp.inc.c to hold
the code which deals with VFP in the new scheme. It should be
possible to convert this into a standalone translation unit
eventually, but the conversion process will be much simpler if we
simply #include it midway through translate.c to start with.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/Makefile.objs | 13 +++++++++++++
target/arm/translate-vfp.inc.c | 31 +++++++++++++++++++++++++++++++
target/arm/translate.c | 19 +++++++++++++++++++
target/arm/vfp-uncond.decode | 28 ++++++++++++++++++++++++++++
target/arm/vfp.decode | 28 ++++++++++++++++++++++++++++
5 files changed, 119 insertions(+)
create mode 100644 target/arm/translate-vfp.inc.c
create mode 100644 target/arm/vfp-uncond.decode
create mode 100644 target/arm/vfp.decode

diff --git a/target/arm/Makefile.objs b/target/arm/Makefile.objs
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/Makefile.objs
+++ b/target/arm/Makefile.objs
@@ -XXX,XX +XXX,XX @@ target/arm/decode-sve.inc.c: $(SRC_PATH)/target/arm/sve.decode $(DECODETREE)
     $(PYTHON) $(DECODETREE) --decode disas_sve -o $@ $<,\
     "GEN", $(TARGET_DIR)$@)

+target/arm/decode-vfp.inc.c: $(SRC_PATH)/target/arm/vfp.decode $(DECODETREE)
+    $(call quiet-command,\
+     $(PYTHON) $(DECODETREE) --static-decode disas_vfp -o $@ $<,\
+     "GEN", $(TARGET_DIR)$@)
+
+target/arm/decode-vfp-uncond.inc.c: $(SRC_PATH)/target/arm/vfp-uncond.decode $(DECODETREE)
+    $(call quiet-command,\
+     $(PYTHON) $(DECODETREE) --static-decode disas_vfp_uncond -o $@ $<,\
+     "GEN", $(TARGET_DIR)$@)
+
target/arm/translate-sve.o: target/arm/decode-sve.inc.c
+target/arm/translate.o: target/arm/decode-vfp.inc.c
+target/arm/translate.o: target/arm/decode-vfp-uncond.inc.c
+
obj-$(TARGET_AARCH64) += translate-sve.o sve_helper.o
diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * ARM translation: AArch32 VFP instructions
+ *
+ * Copyright (c) 2003 Fabrice Bellard
+ * Copyright (c) 2005-2007 CodeSourcery
+ * Copyright (c) 2007 OpenedHand, Ltd.
+ * Copyright (c) 2019 Linaro, Ltd.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * This file is intended to be included from translate.c; it uses
+ * some macros and definitions provided by that file.
+ * It might be possible to convert it to a standalone .c file eventually.
+ */
+
+/* Include the generated VFP decoder */
+#include "decode-vfp.inc.c"
+#include "decode-vfp-uncond.inc.c"
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void gen_mov_vreg_F0(int dp, int reg)

#define ARM_CP_RW_BIT (1 << 20)

+/* Include the VFP decoder */
+#include "translate-vfp.inc.c"
+
static inline void iwmmxt_load_reg(TCGv_i64 var, int reg)
{
tcg_gen_ld_i64(var, cpu_env, offsetof(CPUARMState, iwmmxt.regs[reg]));
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
return 1;
}

+ /*
+ * If the decodetree decoder handles this insn it will always
+ * emit code to either execute the insn or generate an appropriate
+ * exception; so we don't need to ever return non-zero to tell
+ * the calling code to emit an UNDEF exception.
+ */
+ if (extract32(insn, 28, 4) == 0xf) {
+ if (disas_vfp_uncond(s, insn)) {
+ return 0;
+ }
+ } else {
+ if (disas_vfp(s, insn)) {
+ return 0;
+ }
+ }
+
/* FIXME: this access check should not take precedence over UNDEF
* for invalid encodings; we will generate incorrect syndrome information
* for attempts to execute invalid vfp/neon encodings with FP disabled.
diff --git a/target/arm/vfp-uncond.decode b/target/arm/vfp-uncond.decode
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/target/arm/vfp-uncond.decode
@@ -XXX,XX +XXX,XX @@
+# AArch32 VFP instruction descriptions (unconditional insns)
+#
+# Copyright (c) 2019 Linaro, Ltd
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
+
+#
+# This file is processed by scripts/decodetree.py
+#
+# Encodings for the unconditional VFP instructions are here:
+# generally anything matching A32
+# 1111 1110 .... .... .... 101. ...0 ....
+# and T32
+# 1111 110. .... .... .... 101. .... ....
+# 1111 1110 .... .... .... 101. .... ....
+# (but those patterns might also cover some Neon instructions,
+# which do not live in this file.)
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@
+# AArch32 VFP instruction descriptions (conditional insns)
+#
+# Copyright (c) 2019 Linaro, Ltd
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
+
+#
+# This file is processed by scripts/decodetree.py
+#
+# Encodings for the conditional VFP instructions are here:
+# generally anything matching A32
+# cccc 11.. .... .... .... 101. .... ....
+# and T32
+# 1110 110. .... .... .... 101. .... ....
+# 1110 1110 .... .... .... 101. .... ....
+# (but those patterns might also cover some Neon instructions,
+# which do not live in this file.)
--
2.20.1

Add the infrastructure for building and invoking a decodetree decoder
for the AArch32 Neon encodings. At the moment the new decoder covers
nothing, so we always fall back to the existing hand-written decode.

We follow the same pattern we did for the VFP decodetree conversion
(commit 78e138bc1f672c145ef6ace74617d and following): code that deals
with Neon will be moving gradually out to translate-neon.inc.c,
which we #include into translate.c.

In order to share the decode files between A32 and T32, we
split Neon into 3 parts:
 * data-processing
 * load-store
 * 'shared' encodings

The first two groups of instructions have similar but not identical
A32 and T32 encodings, so we need to manually transform the T32
encoding into the A32 one before calling the decoder; the third group
covers the Neon instructions which are identical in A32 and T32.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-4-peter.maydell@linaro.org
---
target/arm/neon-dp.decode | 29 ++++++++++++++++++++++++++
target/arm/neon-ls.decode | 29 ++++++++++++++++++++++++++
target/arm/neon-shared.decode | 27 +++++++++++++++++++++++++
target/arm/translate-neon.inc.c | 32 +++++++++++++++++++++++++++++
target/arm/translate.c | 36 +++++++++++++++++++++++++++++++--
target/arm/Makefile.objs | 18 +++++++++++++++++
6 files changed, 169 insertions(+), 2 deletions(-)
create mode 100644 target/arm/neon-dp.decode
create mode 100644 target/arm/neon-ls.decode
create mode 100644 target/arm/neon-shared.decode
create mode 100644 target/arm/translate-neon.inc.c

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@
+# AArch32 Neon data-processing instruction descriptions
+#
+# Copyright (c) 2020 Linaro, Ltd
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
+
+#
+# This file is processed by scripts/decodetree.py
+#
+
+# Encodings for Neon data processing instructions where the T32 encoding
+# is a simple transformation of the A32 encoding.
+# More specifically, this file covers instructions where the A32 encoding is
+# 0b1111_001p_qqqq_qqqq_qqqq_qqqq_qqqq_qqqq
+# and the T32 encoding is
+# 0b111p_1111_qqqq_qqqq_qqqq_qqqq_qqqq_qqqq
+# This file works on the A32 encoding only; calling code for T32 has to
+# transform the insn into the A32 version first.
diff --git a/target/arm/neon-ls.decode b/target/arm/neon-ls.decode
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/target/arm/neon-ls.decode
@@ -XXX,XX +XXX,XX @@
+# AArch32 Neon load/store instruction descriptions
+#
+# Copyright (c) 2020 Linaro, Ltd
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
+
+#
+# This file is processed by scripts/decodetree.py
+#
+
+# Encodings for Neon load/store instructions where the T32 encoding
+# is a simple transformation of the A32 encoding.
+# More specifically, this file covers instructions where the A32 encoding is
+# 0b1111_0100_xxx0_xxxx_xxxx_xxxx_xxxx_xxxx
+# and the T32 encoding is
+# 0b1111_1001_xxx0_xxxx_xxxx_xxxx_xxxx_xxxx
+# This file works on the A32 encoding only; calling code for T32 has to
+# transform the insn into the A32 version first.
diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/target/arm/neon-shared.decode
@@ -XXX,XX +XXX,XX @@
+# AArch32 Neon instruction descriptions
+#
+# Copyright (c) 2020 Linaro, Ltd
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
+
+#
+# This file is processed by scripts/decodetree.py
+#
+
+# Encodings for Neon instructions whose encoding is the same for
+# both A32 and T32.
+
+# More specifically, this covers:
+# 2reg scalar ext: 0b1111_1110_xxxx_xxxx_xxxx_1x0x_xxxx_xxxx
+# 3same ext: 0b1111_110x_xxxx_xxxx_xxxx_1x0x_xxxx_xxxx
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * ARM translation: AArch32 Neon instructions
+ *
+ * Copyright (c) 2003 Fabrice Bellard
+ * Copyright (c) 2005-2007 CodeSourcery
+ * Copyright (c) 2007 OpenedHand, Ltd.
+ * Copyright (c) 2020 Linaro, Ltd.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * This file is intended to be included from translate.c; it uses
+ * some macros and definitions provided by that file.
+ * It might be possible to convert it to a standalone .c file eventually.
+ */
+
+/* Include the generated Neon decoder */
+#include "decode-neon-dp.inc.c"
+#include "decode-neon-ls.inc.c"
+#include "decode-neon-shared.inc.c"
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static TCGv_ptr vfp_reg_ptr(bool dp, int reg)

#define ARM_CP_RW_BIT (1 << 20)

-/* Include the VFP decoder */
+/* Include the VFP and Neon decoders */
#include "translate-vfp.inc.c"
+#include "translate-neon.inc.c"

static inline void iwmmxt_load_reg(TCGv_i64 var, int reg)
{
@@ -XXX,XX +XXX,XX @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
/* Unconditional instructions. */
/* TODO: Perhaps merge these into one decodetree output file. */
if (disas_a32_uncond(s, insn) ||
- disas_vfp_uncond(s, insn)) {
+ disas_vfp_uncond(s, insn) ||
+ disas_neon_dp(s, insn) ||
+ disas_neon_ls(s, insn) ||
+ disas_neon_shared(s, insn)) {
return;
}
/* fall back to legacy decoder */
@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
ARCH(6T2);
}

+ if ((insn & 0xef000000) == 0xef000000) {
+ /*
+ * T32 encodings 0b111p_1111_qqqq_qqqq_qqqq_qqqq_qqqq_qqqq
+ * transform into
+ * A32 encodings 0b1111_001p_qqqq_qqqq_qqqq_qqqq_qqqq_qqqq
+ */
+ uint32_t a32_insn = (insn & 0xe2ffffff) |
+ ((insn & (1 << 28)) >> 4) | (1 << 28);
+
+ if (disas_neon_dp(s, a32_insn)) {
+ return;
+ }
+ }
+
+ if ((insn & 0xff100000) == 0xf9000000) {
+ /*
+ * T32 encodings 0b1111_1001_ppp0_qqqq_qqqq_qqqq_qqqq_qqqq
+ * transform into
+ * A32 encodings 0b1111_0100_ppp0_qqqq_qqqq_qqqq_qqqq_qqqq
+ */
+ uint32_t a32_insn = (insn & 0x00ffffff) | 0xf4000000;
+
+ if (disas_neon_ls(s, a32_insn)) {
+ return;
+ }
+ }
+
/*
* TODO: Perhaps merge these into one decodetree output file.
* Note disas_vfp is written for a32 with cond field in the
@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
*/
if (disas_t32(s, insn) ||
disas_vfp_uncond(s, insn) ||
+ disas_neon_shared(s, insn) ||
((insn >> 28) == 0xe && disas_vfp(s, insn))) {
return;
}
diff --git a/target/arm/Makefile.objs b/target/arm/Makefile.objs
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/Makefile.objs
+++ b/target/arm/Makefile.objs
@@ -XXX,XX +XXX,XX @@ target/arm/decode-sve.inc.c: $(SRC_PATH)/target/arm/sve.decode $(DECODETREE)
     $(PYTHON) $(DECODETREE) --decode disas_sve -o $@ $<,\
     "GEN", $(TARGET_DIR)$@)

+target/arm/decode-neon-shared.inc.c: $(SRC_PATH)/target/arm/neon-shared.decode $(DECODETREE)
+    $(call quiet-command,\
+     $(PYTHON) $(DECODETREE) --static-decode disas_neon_shared -o $@ $<,\
+     "GEN", $(TARGET_DIR)$@)
+
+target/arm/decode-neon-dp.inc.c: $(SRC_PATH)/target/arm/neon-dp.decode $(DECODETREE)
+    $(call quiet-command,\
+     $(PYTHON) $(DECODETREE) --static-decode disas_neon_dp -o $@ $<,\
+     "GEN", $(TARGET_DIR)$@)
+
+target/arm/decode-neon-ls.inc.c: $(SRC_PATH)/target/arm/neon-ls.decode $(DECODETREE)
+    $(call quiet-command,\
+     $(PYTHON) $(DECODETREE) --static-decode disas_neon_ls -o $@ $<,\
+     "GEN", $(TARGET_DIR)$@)
+
target/arm/decode-vfp.inc.c: $(SRC_PATH)/target/arm/vfp.decode $(DECODETREE)
    $(call quiet-command,\
     $(PYTHON) $(DECODETREE) --static-decode disas_vfp -o $@ $<,\
@@ -XXX,XX +XXX,XX @@ target/arm/decode-t16.inc.c: $(SRC_PATH)/target/arm/t16.decode $(DECODETREE)
     "GEN", $(TARGET_DIR)$@)

target/arm/translate-sve.o: target/arm/decode-sve.inc.c
+target/arm/translate.o: target/arm/decode-neon-shared.inc.c
+target/arm/translate.o: target/arm/decode-neon-dp.inc.c
+target/arm/translate.o: target/arm/decode-neon-ls.inc.c
target/arm/translate.o: target/arm/decode-vfp.inc.c
target/arm/translate.o: target/arm/decode-vfp-uncond.inc.c
target/arm/translate.o: target/arm/decode-a32.inc.c
--
2.20.1
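The T32-to-A32 transformation used for the Neon data-processing case above is easy to check in isolation; the following standalone snippet (example values only, not code from the patch) asserts that top byte 0b111p_1111 becomes 0b1111_001p for both values of p:

#include <assert.h>
#include <stdint.h>

int main(void)
{
    /* p == 1: a T32 insn with top byte 0xff must map to A32 top byte 0xf3 */
    uint32_t insn = 0xff000000;
    uint32_t a32 = (insn & 0xe2ffffff) | ((insn & (1 << 28)) >> 4) | (1 << 28);
    assert((a32 >> 24) == 0xf3);

    /* p == 0: T32 top byte 0xef must map to A32 top byte 0xf2 */
    insn = 0xef000000;
    a32 = (insn & 0xe2ffffff) | ((insn & (1 << 28)) >> 4) | (1 << 28);
    assert((a32 >> 24) == 0xf2);
    return 0;
}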
Convert the VCVT double/single precision conversion insns to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/translate-vfp.inc.c | 48 ++++++++++++++++++++++++++++++++++
target/arm/translate.c | 13 +--------
target/arm/vfp.decode | 6 +++++
3 files changed, 55 insertions(+), 12 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_dp(DisasContext *s, arg_VRINTX_dp *a)
tcg_temp_free_i64(tmp);
return true;
}
+
+static bool trans_VCVT_sp(DisasContext *s, arg_VCVT_sp *a)
+{
+ TCGv_i64 vd;
+ TCGv_i32 vm;
+
+ /* UNDEF accesses to D16-D31 if they don't exist. */
+ if (!dc_isar_feature(aa32_fp_d32, s) && (a->vd & 0x10)) {
+ return false;
+ }
+
+ if (!vfp_access_check(s)) {
+ return true;
+ }
+
+ vm = tcg_temp_new_i32();
+ vd = tcg_temp_new_i64();
+ neon_load_reg32(vm, a->vm);
+ gen_helper_vfp_fcvtds(vd, vm, cpu_env);
+ neon_store_reg64(vd, a->vd);
+ tcg_temp_free_i32(vm);
+ tcg_temp_free_i64(vd);
+ return true;
+}
+
+static bool trans_VCVT_dp(DisasContext *s, arg_VCVT_dp *a)
+{
+ TCGv_i64 vm;
+ TCGv_i32 vd;
+
+ /* UNDEF accesses to D16-D31 if they don't exist. */
+ if (!dc_isar_feature(aa32_fp_d32, s) && (a->vm & 0x10)) {
+ return false;
+ }
+
+ if (!vfp_access_check(s)) {
+ return true;
+ }
+
+ vd = tcg_temp_new_i32();
+ vm = tcg_temp_new_i64();
+ neon_load_reg64(vm, a->vm);
+ gen_helper_vfp_fcvtsd(vd, vm, cpu_env);
+ neon_store_reg32(vd, a->vd);
+ tcg_temp_free_i32(vd);
+ tcg_temp_free_i64(vm);
+ return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
return 1;
case 15:
switch (rn) {
- case 0 ... 14:
+ case 0 ... 15:
/* Already handled by decodetree */
return 1;
default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
if (op == 15) {
/* rn is opcode, encoded as per VFP_SREG_N. */
switch (rn) {
- case 0x0f: /* vcvt double<->single */
- rd_is_dp = !dp;
- break;
-
case 0x10: /* vcvt.fxx.u32 */
case 0x11: /* vcvt.fxx.s32 */
rm_is_dp = false;
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
switch (op) {
case 15: /* extension space */
switch (rn) {
- case 15: /* single<->double conversion */
- if (dp) {
- gen_helper_vfp_fcvtsd(cpu_F0s, cpu_F0d, cpu_env);
- } else {
- gen_helper_vfp_fcvtds(cpu_F0d, cpu_F0s, cpu_env);
- }
- break;
case 16: /* fuito */
gen_vfp_uito(dp, 0);
break;
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VRINTX_sp ---- 1110 1.11 0111 .... 1010 01.0 .... \
vd=%vd_sp vm=%vm_sp
VRINTX_dp ---- 1110 1.11 0111 .... 1011 01.0 .... \
vd=%vd_dp vm=%vm_dp
+
+# VCVT between single and double: Vm precision depends on size; Vd is its reverse
+VCVT_sp ---- 1110 1.11 0111 .... 1010 11.0 .... \
+ vd=%vd_dp vm=%vm_sp
+VCVT_dp ---- 1110 1.11 0111 .... 1011 11.0 .... \
+ vd=%vd_sp vm=%vm_dp
--
2.20.1

Convert the VCMLA (vector) insns in the 3same extension group to
decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-5-peter.maydell@linaro.org
---
target/arm/neon-shared.decode | 11 ++++++++++
target/arm/translate-neon.inc.c | 37 +++++++++++++++++++++++++++++++++
target/arm/translate.c | 11 +---------
3 files changed, 49 insertions(+), 10 deletions(-)

diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-shared.decode
+++ b/target/arm/neon-shared.decode
@@ -XXX,XX +XXX,XX @@
# More specifically, this covers:
# 2reg scalar ext: 0b1111_1110_xxxx_xxxx_xxxx_1x0x_xxxx_xxxx
# 3same ext: 0b1111_110x_xxxx_xxxx_xxxx_1x0x_xxxx_xxxx
+
+# VFP/Neon register fields; same as vfp.decode
+%vm_dp 5:1 0:4
+%vm_sp 0:4 5:1
+%vn_dp 7:1 16:4
+%vn_sp 16:4 7:1
+%vd_dp 22:1 12:4
+%vd_sp 12:4 22:1
+
+VCMLA 1111 110 rot:2 . 1 size:1 .... .... 1000 . q:1 . 0 .... \
+ vm=%vm_dp vn=%vn_dp vd=%vd_dp
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@
#include "decode-neon-dp.inc.c"
#include "decode-neon-ls.inc.c"
#include "decode-neon-shared.inc.c"
+
+static bool trans_VCMLA(DisasContext *s, arg_VCMLA *a)
+{
+ int opr_sz;
+ TCGv_ptr fpst;
+ gen_helper_gvec_3_ptr *fn_gvec_ptr;
+
+ if (!dc_isar_feature(aa32_vcma, s)
+ || (!a->size && !dc_isar_feature(aa32_fp16_arith, s))) {
+ return false;
+ }
+
+ /* UNDEF accesses to D16-D31 if they don't exist. */
+ if (!dc_isar_feature(aa32_simd_r32, s) &&
+ ((a->vd | a->vn | a->vm) & 0x10)) {
+ return false;
+ }
+
+ if ((a->vn | a->vm | a->vd) & a->q) {
+ return false;
+ }
+
+ if (!vfp_access_check(s)) {
+ return true;
+ }
+
+ opr_sz = (1 + a->q) * 8;
+ fpst = get_fpstatus_ptr(1);
+ fn_gvec_ptr = a->size ? gen_helper_gvec_fcmlas : gen_helper_gvec_fcmlah;
+ tcg_gen_gvec_3_ptr(vfp_reg_offset(1, a->vd),
+ vfp_reg_offset(1, a->vn),
+ vfp_reg_offset(1, a->vm),
+ fpst, opr_sz, opr_sz, a->rot,
+ fn_gvec_ptr);
+ tcg_temp_free_ptr(fpst);
+ return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_insn_3same_ext(DisasContext *s, uint32_t insn)
bool is_long = false, q = extract32(insn, 6, 1);
bool ptr_is_env = false;

- if ((insn & 0xfe200f10) == 0xfc200800) {
- /* VCMLA -- 1111 110R R.1S .... .... 1000 ...0 .... */
- int size = extract32(insn, 20, 1);
- data = extract32(insn, 23, 2); /* rot */
- if (!dc_isar_feature(aa32_vcma, s)
- || (!size && !dc_isar_feature(aa32_fp16_arith, s))) {
- return 1;
- }
- fn_gvec_ptr = size ? gen_helper_gvec_fcmlas : gen_helper_gvec_fcmlah;
- } else if ((insn & 0xfea00f10) == 0xfc800800) {
+ if ((insn & 0xfea00f10) == 0xfc800800) {
/* VCADD -- 1111 110R 1.0S .... .... 1000 ...0 .... */
int size = extract32(insn, 20, 1);
data = extract32(insn, 24, 1); /* rot */
--
2.20.1
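The %vm_dp/%vd_sp field definitions copied into neon-shared.decode above concatenate two bitfields, most-significant part first. In plain C the unpacking looks like this (the helper names are mine, for illustration; the generated decoder does the equivalent):

#include <stdint.h>

static inline uint32_t field(uint32_t insn, int pos, int len)
{
    return (insn >> pos) & ((1u << len) - 1);
}

/* "%vd_dp 22:1 12:4": the D bit (insn[22]) is the MSB of a D-register number */
static int vd_dp(uint32_t insn)
{
    return (int)((field(insn, 22, 1) << 4) | field(insn, 12, 4));
}

/* "%vd_sp 12:4 22:1": for S registers the D bit is the LSB instead */
static int vd_sp(uint32_t insn)
{
    return (int)((field(insn, 12, 4) << 1) | field(insn, 22, 1));
}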
Convert the VFP single load/store insns VLDR and VSTR to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/translate-vfp.inc.c | 73 ++++++++++++++++++++++++++++++++++
target/arm/translate.c | 22 +---------
target/arm/vfp.decode | 7 ++++
3 files changed, 82 insertions(+), 20 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_dp(DisasContext *s, arg_VMOV_64_sp *a)
return true;
}
+
+static bool trans_VLDR_VSTR_sp(DisasContext *s, arg_VLDR_VSTR_sp *a)
+{
+ uint32_t offset;
+ TCGv_i32 addr;
+
+ if (!vfp_access_check(s)) {
+ return true;
+ }
+
+ offset = a->imm << 2;
+ if (!a->u) {
+ offset = -offset;
+ }
+
+ if (s->thumb && a->rn == 15) {
+ /* This is actually UNPREDICTABLE */
+ addr = tcg_temp_new_i32();
+ tcg_gen_movi_i32(addr, s->pc & ~2);
+ } else {
+ addr = load_reg(s, a->rn);
+ }
+ tcg_gen_addi_i32(addr, addr, offset);
+ if (a->l) {
+ gen_vfp_ld(s, false, addr);
+ gen_mov_vreg_F0(false, a->vd);
+ } else {
+ gen_mov_F0_vreg(false, a->vd);
+ gen_vfp_st(s, false, addr);
+ }
+ tcg_temp_free_i32(addr);
+
+ return true;
+}
+
+static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_sp *a)
+{
+ uint32_t offset;
+ TCGv_i32 addr;
+
+ /* UNDEF accesses to D16-D31 if they don't exist */
+ if (!dc_isar_feature(aa32_fp_d32, s) && (a->vd & 0x10)) {
+ return false;
+ }
+
+ if (!vfp_access_check(s)) {
+ return true;
+ }
+
+ offset = a->imm << 2;
+ if (!a->u) {
+ offset = -offset;
+ }
+
+ if (s->thumb && a->rn == 15) {
+ /* This is actually UNPREDICTABLE */
+ addr = tcg_temp_new_i32();
+ tcg_gen_movi_i32(addr, s->pc & ~2);
+ } else {
+ addr = load_reg(s, a->rn);
+ }
+ tcg_gen_addi_i32(addr, addr, offset);
+ if (a->l) {
+ gen_vfp_ld(s, true, addr);
+ gen_mov_vreg_F0(true, a->vd);
+ } else {
+ gen_mov_F0_vreg(true, a->vd);
+ gen_vfp_st(s, true, addr);
+ }
+ tcg_temp_free_i32(addr);
+
+ return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
else
rd = VFP_SREG_D(insn);
if ((insn & 0x01200000) == 0x01000000) {
- /* Single load/store */
- offset = (insn & 0xff) << 2;
- if ((insn & (1 << 23)) == 0)
- offset = -offset;
- if (s->thumb && rn == 15) {
- /* This is actually UNPREDICTABLE */
- addr = tcg_temp_new_i32();
- tcg_gen_movi_i32(addr, s->pc & ~2);
- } else {
- addr = load_reg(s, rn);
- }
- tcg_gen_addi_i32(addr, addr, offset);
- if (insn & (1 << 20)) {
- gen_vfp_ld(s, dp, addr);
- gen_mov_vreg_F0(dp, rd);
- } else {
- gen_mov_F0_vreg(dp, rd);
- gen_vfp_st(s, dp, addr);
- }
- tcg_temp_free_i32(addr);
+ /* Already handled by decodetree */
+ return 1;
} else {
/* load/store multiple */
int w = insn & (1 << 21);
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VMOV_64_sp ---- 1100 010 op:1 rt2:4 rt:4 1010 00.1 .... \
vm=%vm_sp
VMOV_64_dp ---- 1100 010 op:1 rt2:4 rt:4 1011 00.1 .... \
vm=%vm_dp
+
+# Note that the half-precision variants of VLDR and VSTR are
+# not part of this decodetree at all because they have bits [9:8] == 0b01
+VLDR_VSTR_sp ---- 1101 u:1 .0 l:1 rn:4 .... 1010 imm:8 \
+ vd=%vd_sp
+VLDR_VSTR_dp ---- 1101 u:1 .0 l:1 rn:4 .... 1011 imm:8 \
+ vd=%vd_dp
--
2.20.1

Convert the VCADD (vector) insns to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-6-peter.maydell@linaro.org
---
target/arm/neon-shared.decode | 3 +++
target/arm/translate-neon.inc.c | 37 +++++++++++++++++++++++++++++++++
target/arm/translate.c | 11 +---------
3 files changed, 41 insertions(+), 10 deletions(-)

diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-shared.decode
+++ b/target/arm/neon-shared.decode
@@ -XXX,XX +XXX,XX @@

VCMLA 1111 110 rot:2 . 1 size:1 .... .... 1000 . q:1 . 0 .... \
vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+VCADD 1111 110 rot:1 1 . 0 size:1 .... .... 1000 . q:1 . 0 .... \
+ vm=%vm_dp vn=%vn_dp vd=%vd_dp
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMLA(DisasContext *s, arg_VCMLA *a)
tcg_temp_free_ptr(fpst);
return true;
}
+
+static bool trans_VCADD(DisasContext *s, arg_VCADD *a)
+{
+ int opr_sz;
+ TCGv_ptr fpst;
+ gen_helper_gvec_3_ptr *fn_gvec_ptr;
+
+ if (!dc_isar_feature(aa32_vcma, s)
+ || (!a->size && !dc_isar_feature(aa32_fp16_arith, s))) {
+ return false;
+ }
+
+ /* UNDEF accesses to D16-D31 if they don't exist. */
+ if (!dc_isar_feature(aa32_simd_r32, s) &&
+ ((a->vd | a->vn | a->vm) & 0x10)) {
+ return false;
+ }
+
+ if ((a->vn | a->vm | a->vd) & a->q) {
+ return false;
+ }
+
+ if (!vfp_access_check(s)) {
+ return true;
+ }
+
+ opr_sz = (1 + a->q) * 8;
+ fpst = get_fpstatus_ptr(1);
+ fn_gvec_ptr = a->size ? gen_helper_gvec_fcadds : gen_helper_gvec_fcaddh;
+ tcg_gen_gvec_3_ptr(vfp_reg_offset(1, a->vd),
+ vfp_reg_offset(1, a->vn),
+ vfp_reg_offset(1, a->vm),
+ fpst, opr_sz, opr_sz, a->rot,
+ fn_gvec_ptr);
+ tcg_temp_free_ptr(fpst);
+ return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_insn_3same_ext(DisasContext *s, uint32_t insn)
bool is_long = false, q = extract32(insn, 6, 1);
bool ptr_is_env = false;

- if ((insn & 0xfea00f10) == 0xfc800800) {
- /* VCADD -- 1111 110R 1.0S .... .... 1000 ...0 .... */
- int size = extract32(insn, 20, 1);
- data = extract32(insn, 24, 1); /* rot */
- if (!dc_isar_feature(aa32_vcma, s)
- || (!size && !dc_isar_feature(aa32_fp16_arith, s))) {
- return 1;
- }
- fn_gvec_ptr = size ? gen_helper_gvec_fcadds : gen_helper_gvec_fcaddh;
- } else if ((insn & 0xfeb00f00) == 0xfc200d00) {
+ if ((insn & 0xfeb00f00) == 0xfc200d00) {
/* V[US]DOT -- 1111 1100 0.10 .... .... 1101 .Q.U .... */
bool u = extract32(insn, 4, 1);
if (!dc_isar_feature(aa32_dp, s)) {
--
2.20.1
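The addressing arithmetic in trans_VLDR_VSTR_sp()/trans_VLDR_VSTR_dp() above, written out as plain C (a sketch only; 'base' stands in for the value of Rn, or of the PC when the Thumb Rn==15 quirk applies):

#include <stdbool.h>
#include <stdint.h>

static uint32_t vldr_vstr_addr(uint32_t base, uint32_t imm8, bool u,
                               bool thumb_pc_base)
{
    uint32_t offset = imm8 << 2;   /* imm:8 scaled to a byte offset */
    if (!u) {
        offset = -offset;          /* the U bit selects add vs subtract */
    }
    if (thumb_pc_base) {
        base &= ~2u;               /* Thumb PC base is word-aligned first */
    }
    return base + offset;
}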
Convert the VJCVT instruction to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 28 ++++++++++++++++++++++++++++
 target/arm/translate.c         | 12 +-----------
 target/arm/vfp.decode          |  4 ++++
 3 files changed, 33 insertions(+), 11 deletions(-)

Convert the V[US]DOT (vector) insns to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-7-peter.maydell@linaro.org
---
 target/arm/neon-shared.decode   |  4 ++++
 target/arm/translate-neon.inc.c | 32 ++++++++++++++++++++++++++++++++
 target/arm/translate.c          |  9 +--------
 3 files changed, 37 insertions(+), 8 deletions(-)
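A minimal sketch of the per-element arithmetic the V[US]DOT conversion
targets, assuming the architectural definition: each 32-bit lane of Vd
accumulates the dot product of four signed (VSDOT) or unsigned (VUDOT)
bytes from the corresponding lanes of Vn and Vm. Illustrative plain C
only; the real work happens in the gvec_sdot_b/gvec_udot_b helpers.

#include <stdint.h>
#include <stdio.h>

/* One 32-bit lane of VSDOT: acc += sum of four signed byte products. */
static int32_t sdot_lane(int32_t acc, uint32_t n, uint32_t m)
{
    for (int i = 0; i < 4; i++) {
        int8_t nb = (int8_t)(n >> (8 * i));
        int8_t mb = (int8_t)(m >> (8 * i));
        acc += nb * mb;
    }
    return acc;
}

int main(void)
{
    /* bytes (1,2,3,4) . (5,6,7,8) = 5 + 12 + 21 + 32 = 70 */
    printf("%d\n", sdot_lane(0, 0x04030201u, 0x08070605u));
    return 0;
}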
11
diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
12
diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
12
index XXXXXXX..XXXXXXX 100644
13
index XXXXXXX..XXXXXXX 100644
13
--- a/target/arm/translate-vfp.inc.c
14
--- a/target/arm/neon-shared.decode
14
+++ b/target/arm/translate-vfp.inc.c
15
+++ b/target/arm/neon-shared.decode
15
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_dp(DisasContext *s, arg_VCVT_int_dp *a)
16
@@ -XXX,XX +XXX,XX @@ VCMLA 1111 110 rot:2 . 1 size:1 .... .... 1000 . q:1 . 0 .... \
17
18
VCADD 1111 110 rot:1 1 . 0 size:1 .... .... 1000 . q:1 . 0 .... \
19
vm=%vm_dp vn=%vn_dp vd=%vd_dp
20
+
21
+# VUDOT and VSDOT
22
+VDOT 1111 110 00 . 10 .... .... 1101 . q:1 . u:1 .... \
23
+ vm=%vm_dp vn=%vn_dp vd=%vd_dp
24
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
25
index XXXXXXX..XXXXXXX 100644
26
--- a/target/arm/translate-neon.inc.c
27
+++ b/target/arm/translate-neon.inc.c
28
@@ -XXX,XX +XXX,XX @@ static bool trans_VCADD(DisasContext *s, arg_VCADD *a)
16
tcg_temp_free_ptr(fpst);
29
tcg_temp_free_ptr(fpst);
17
return true;
30
return true;
18
}
31
}
19
+
32
+
20
+static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
33
+static bool trans_VDOT(DisasContext *s, arg_VDOT *a)
21
+{
34
+{
22
+ TCGv_i32 vd;
35
+ int opr_sz;
23
+ TCGv_i64 vm;
36
+ gen_helper_gvec_3 *fn_gvec;
24
+
37
+
25
+ if (!dc_isar_feature(aa32_jscvt, s)) {
38
+ if (!dc_isar_feature(aa32_dp, s)) {
26
+ return false;
39
+ return false;
27
+ }
40
+ }
28
+
41
+
29
+ /* UNDEF accesses to D16-D31 if they don't exist. */
42
+ /* UNDEF accesses to D16-D31 if they don't exist. */
30
+ if (!dc_isar_feature(aa32_fp_d32, s) && (a->vm & 0x10)) {
43
+ if (!dc_isar_feature(aa32_simd_r32, s) &&
44
+ ((a->vd | a->vn | a->vm) & 0x10)) {
45
+ return false;
46
+ }
47
+
48
+ if ((a->vn | a->vm | a->vd) & a->q) {
31
+ return false;
49
+ return false;
32
+ }
50
+ }
33
+
51
+
34
+ if (!vfp_access_check(s)) {
52
+ if (!vfp_access_check(s)) {
35
+ return true;
53
+ return true;
36
+ }
54
+ }
37
+
55
+
38
+ vm = tcg_temp_new_i64();
56
+ opr_sz = (1 + a->q) * 8;
39
+ vd = tcg_temp_new_i32();
57
+ fn_gvec = a->u ? gen_helper_gvec_udot_b : gen_helper_gvec_sdot_b;
40
+ neon_load_reg64(vm, a->vm);
58
+ tcg_gen_gvec_3_ool(vfp_reg_offset(1, a->vd),
41
+ gen_helper_vjcvt(vd, vm, cpu_env);
59
+ vfp_reg_offset(1, a->vn),
42
+ neon_store_reg32(vd, a->vd);
60
+ vfp_reg_offset(1, a->vm),
43
+ tcg_temp_free_i64(vm);
61
+ opr_sz, opr_sz, 0, fn_gvec);
44
+ tcg_temp_free_i32(vd);
45
+ return true;
62
+ return true;
46
+}
63
+}
47
diff --git a/target/arm/translate.c b/target/arm/translate.c
64
diff --git a/target/arm/translate.c b/target/arm/translate.c
48
index XXXXXXX..XXXXXXX 100644
65
index XXXXXXX..XXXXXXX 100644
49
--- a/target/arm/translate.c
66
--- a/target/arm/translate.c
50
+++ b/target/arm/translate.c
67
+++ b/target/arm/translate.c
51
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
68
@@ -XXX,XX +XXX,XX @@ static int disas_neon_insn_3same_ext(DisasContext *s, uint32_t insn)
52
return 1;
69
bool is_long = false, q = extract32(insn, 6, 1);
53
case 15:
70
bool ptr_is_env = false;
54
switch (rn) {
71
55
- case 0 ... 17:
72
- if ((insn & 0xfeb00f00) == 0xfc200d00) {
56
+ case 0 ... 19:
73
- /* V[US]DOT -- 1111 1100 0.10 .... .... 1101 .Q.U .... */
57
/* Already handled by decodetree */
74
- bool u = extract32(insn, 4, 1);
58
return 1;
75
- if (!dc_isar_feature(aa32_dp, s)) {
59
default:
76
- return 1;
60
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
77
- }
61
rm_is_dp = false;
78
- fn_gvec = u ? gen_helper_gvec_udot_b : gen_helper_gvec_sdot_b;
62
break;
79
- } else if ((insn & 0xff300f10) == 0xfc200810) {
63
80
+ if ((insn & 0xff300f10) == 0xfc200810) {
64
- case 0x13: /* vjcvt */
81
/* VFM[AS]L -- 1111 1100 S.10 .... .... 1000 .Q.1 .... */
65
- if (!dp || !dc_isar_feature(aa32_jscvt, s)) {
82
int is_s = extract32(insn, 23, 1);
66
- return 1;
83
if (!dc_isar_feature(aa32_fhm, s)) {
67
- }
68
- rd_is_dp = false;
69
- break;
70
-
71
default:
72
return 1;
73
}
74
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
75
switch (op) {
76
case 15: /* extension space */
77
switch (rn) {
78
- case 19: /* vjcvt */
79
- gen_helper_vjcvt(cpu_F0s, cpu_F0d, cpu_env);
80
- break;
81
case 20: /* fshto */
82
gen_vfp_shto(dp, 16 - rm, 0);
83
break;
84
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
85
index XXXXXXX..XXXXXXX 100644
86
--- a/target/arm/vfp.decode
87
+++ b/target/arm/vfp.decode
88
@@ -XXX,XX +XXX,XX @@ VCVT_int_sp ---- 1110 1.11 1000 .... 1010 s:1 1.0 .... \
89
vd=%vd_sp vm=%vm_sp
90
VCVT_int_dp ---- 1110 1.11 1000 .... 1011 s:1 1.0 .... \
91
vd=%vd_dp vm=%vm_sp
92
+
93
+# VJCVT is always dp to sp
94
+VJCVT ---- 1110 1.11 1001 .... 1011 11.0 .... \
95
+ vd=%vd_sp vm=%vm_dp
96
--
84
--
97
2.20.1
85
2.20.1
98
86
99
87
Move the trans_*() functions we've just created from translate.c
to translate-vfp.inc.c. This is pure code motion with no textual
changes (this can be checked with 'git show --color-moved').

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 337 +++++++++++++++++++++++++++++++++
 target/arm/translate.c         | 337 ---------------------------------
 2 files changed, 337 insertions(+), 337 deletions(-)

Convert the VFM[AS]L (vector) insns to decodetree. This is the last
insn in the legacy decoder for the 3same_ext group, so we can
delete the legacy decoder function for the group entirely.

Note that in disas_thumb2_insn() the parts of this encoding space
where the decodetree decoder returns false will correctly be directed
to illegal_op by the "(insn & (1 << 28))" check so they won't fall
into disas_coproc_insn() by mistake.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-8-peter.maydell@linaro.org
---
 target/arm/neon-shared.decode   |  6 +++
 target/arm/translate-neon.inc.c | 31 +++++++++++
 target/arm/translate.c          | 92 +--------------------------------
 3 files changed, 38 insertions(+), 91 deletions(-)
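Among the trans functions being moved is the VSEL implementation, which
selects between Vn and Vm on the cached NZCV flags via movcond. A rough
standalone model of that selection logic, assuming architectural flag
values rather than the TCG flag encoding (hypothetical code, invented
names):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* cc selects EQ/VS/GE/GT; result is vn when the condition holds. */
static uint64_t vsel(int cc, bool n, bool z, bool v,
                     uint64_t vn, uint64_t vm)
{
    bool take;
    switch (cc) {
    case 0:  take = z; break;               /* eq: Z set */
    case 1:  take = v; break;               /* vs: V set */
    case 2:  take = n == v; break;          /* ge: N == V */
    default: take = !z && (n == v); break;  /* gt: !Z && N == V */
    }
    return take ? vn : vm;
}

int main(void)
{
    /* ge with N == V == 1 takes the first operand */
    printf("%llu\n", (unsigned long long)vsel(2, true, false, true, 1, 2));
    return 0;
}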
diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
19
diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
13
index XXXXXXX..XXXXXXX 100644
20
index XXXXXXX..XXXXXXX 100644
14
--- a/target/arm/translate-vfp.inc.c
21
--- a/target/arm/neon-shared.decode
15
+++ b/target/arm/translate-vfp.inc.c
22
+++ b/target/arm/neon-shared.decode
16
@@ -XXX,XX +XXX,XX @@ static bool vfp_access_check(DisasContext *s)
23
@@ -XXX,XX +XXX,XX @@ VCADD 1111 110 rot:1 1 . 0 size:1 .... .... 1000 . q:1 . 0 .... \
17
{
24
# VUDOT and VSDOT
18
return full_vfp_access_check(s, false);
25
VDOT 1111 110 00 . 10 .... .... 1101 . q:1 . u:1 .... \
26
vm=%vm_dp vn=%vn_dp vd=%vd_dp
27
+
28
+# VFM[AS]L
29
+VFML 1111 110 0 s:1 . 10 .... .... 1000 . 0 . 1 .... \
30
+ vm=%vm_sp vn=%vn_sp vd=%vd_dp q=0
31
+VFML 1111 110 0 s:1 . 10 .... .... 1000 . 1 . 1 .... \
32
+ vm=%vm_dp vn=%vn_dp vd=%vd_dp q=1
33
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
34
index XXXXXXX..XXXXXXX 100644
35
--- a/target/arm/translate-neon.inc.c
36
+++ b/target/arm/translate-neon.inc.c
37
@@ -XXX,XX +XXX,XX @@ static bool trans_VDOT(DisasContext *s, arg_VDOT *a)
38
opr_sz, opr_sz, 0, fn_gvec);
39
return true;
19
}
40
}
20
+
41
+
21
+static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
42
+static bool trans_VFML(DisasContext *s, arg_VFML *a)
22
+{
43
+{
23
+ uint32_t rd, rn, rm;
44
+ int opr_sz;
24
+ bool dp = a->dp;
25
+
45
+
26
+ if (!dc_isar_feature(aa32_vsel, s)) {
46
+ if (!dc_isar_feature(aa32_fhm, s)) {
27
+ return false;
47
+ return false;
28
+ }
48
+ }
29
+
49
+
30
+ /* UNDEF accesses to D16-D31 if they don't exist */
50
+ /* UNDEF accesses to D16-D31 if they don't exist. */
31
+ if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
51
+ if (!dc_isar_feature(aa32_simd_r32, s) &&
32
+ ((a->vm | a->vn | a->vd) & 0x10)) {
52
+ (a->vd & 0x10)) {
33
+ return false;
53
+ return false;
34
+ }
54
+ }
35
+ rd = a->vd;
55
+
36
+ rn = a->vn;
56
+ if (a->vd & a->q) {
37
+ rm = a->vm;
57
+ return false;
58
+ }
38
+
59
+
39
+ if (!vfp_access_check(s)) {
60
+ if (!vfp_access_check(s)) {
40
+ return true;
61
+ return true;
41
+ }
62
+ }
42
+
63
+
43
+ if (dp) {
64
+ opr_sz = (1 + a->q) * 8;
44
+ TCGv_i64 frn, frm, dest;
65
+ tcg_gen_gvec_3_ptr(vfp_reg_offset(1, a->vd),
45
+ TCGv_i64 tmp, zero, zf, nf, vf;
66
+ vfp_reg_offset(a->q, a->vn),
46
+
67
+ vfp_reg_offset(a->q, a->vm),
47
+ zero = tcg_const_i64(0);
68
+ cpu_env, opr_sz, opr_sz, a->s, /* is_2 == 0 */
48
+
69
+ gen_helper_gvec_fmlal_a32);
49
+ frn = tcg_temp_new_i64();
50
+ frm = tcg_temp_new_i64();
51
+ dest = tcg_temp_new_i64();
52
+
53
+ zf = tcg_temp_new_i64();
54
+ nf = tcg_temp_new_i64();
55
+ vf = tcg_temp_new_i64();
56
+
57
+ tcg_gen_extu_i32_i64(zf, cpu_ZF);
58
+ tcg_gen_ext_i32_i64(nf, cpu_NF);
59
+ tcg_gen_ext_i32_i64(vf, cpu_VF);
60
+
61
+ tcg_gen_ld_f64(frn, cpu_env, vfp_reg_offset(dp, rn));
62
+ tcg_gen_ld_f64(frm, cpu_env, vfp_reg_offset(dp, rm));
63
+ switch (a->cc) {
64
+ case 0: /* eq: Z */
65
+ tcg_gen_movcond_i64(TCG_COND_EQ, dest, zf, zero,
66
+ frn, frm);
67
+ break;
68
+ case 1: /* vs: V */
69
+ tcg_gen_movcond_i64(TCG_COND_LT, dest, vf, zero,
70
+ frn, frm);
71
+ break;
72
+ case 2: /* ge: N == V -> N ^ V == 0 */
73
+ tmp = tcg_temp_new_i64();
74
+ tcg_gen_xor_i64(tmp, vf, nf);
75
+ tcg_gen_movcond_i64(TCG_COND_GE, dest, tmp, zero,
76
+ frn, frm);
77
+ tcg_temp_free_i64(tmp);
78
+ break;
79
+ case 3: /* gt: !Z && N == V */
80
+ tcg_gen_movcond_i64(TCG_COND_NE, dest, zf, zero,
81
+ frn, frm);
82
+ tmp = tcg_temp_new_i64();
83
+ tcg_gen_xor_i64(tmp, vf, nf);
84
+ tcg_gen_movcond_i64(TCG_COND_GE, dest, tmp, zero,
85
+ dest, frm);
86
+ tcg_temp_free_i64(tmp);
87
+ break;
88
+ }
89
+ tcg_gen_st_f64(dest, cpu_env, vfp_reg_offset(dp, rd));
90
+ tcg_temp_free_i64(frn);
91
+ tcg_temp_free_i64(frm);
92
+ tcg_temp_free_i64(dest);
93
+
94
+ tcg_temp_free_i64(zf);
95
+ tcg_temp_free_i64(nf);
96
+ tcg_temp_free_i64(vf);
97
+
98
+ tcg_temp_free_i64(zero);
99
+ } else {
100
+ TCGv_i32 frn, frm, dest;
101
+ TCGv_i32 tmp, zero;
102
+
103
+ zero = tcg_const_i32(0);
104
+
105
+ frn = tcg_temp_new_i32();
106
+ frm = tcg_temp_new_i32();
107
+ dest = tcg_temp_new_i32();
108
+ tcg_gen_ld_f32(frn, cpu_env, vfp_reg_offset(dp, rn));
109
+ tcg_gen_ld_f32(frm, cpu_env, vfp_reg_offset(dp, rm));
110
+ switch (a->cc) {
111
+ case 0: /* eq: Z */
112
+ tcg_gen_movcond_i32(TCG_COND_EQ, dest, cpu_ZF, zero,
113
+ frn, frm);
114
+ break;
115
+ case 1: /* vs: V */
116
+ tcg_gen_movcond_i32(TCG_COND_LT, dest, cpu_VF, zero,
117
+ frn, frm);
118
+ break;
119
+ case 2: /* ge: N == V -> N ^ V == 0 */
120
+ tmp = tcg_temp_new_i32();
121
+ tcg_gen_xor_i32(tmp, cpu_VF, cpu_NF);
122
+ tcg_gen_movcond_i32(TCG_COND_GE, dest, tmp, zero,
123
+ frn, frm);
124
+ tcg_temp_free_i32(tmp);
125
+ break;
126
+ case 3: /* gt: !Z && N == V */
127
+ tcg_gen_movcond_i32(TCG_COND_NE, dest, cpu_ZF, zero,
128
+ frn, frm);
129
+ tmp = tcg_temp_new_i32();
130
+ tcg_gen_xor_i32(tmp, cpu_VF, cpu_NF);
131
+ tcg_gen_movcond_i32(TCG_COND_GE, dest, tmp, zero,
132
+ dest, frm);
133
+ tcg_temp_free_i32(tmp);
134
+ break;
135
+ }
136
+ tcg_gen_st_f32(dest, cpu_env, vfp_reg_offset(dp, rd));
137
+ tcg_temp_free_i32(frn);
138
+ tcg_temp_free_i32(frm);
139
+ tcg_temp_free_i32(dest);
140
+
141
+ tcg_temp_free_i32(zero);
142
+ }
143
+
144
+ return true;
145
+}
146
+
147
+static bool trans_VMINMAXNM(DisasContext *s, arg_VMINMAXNM *a)
148
+{
149
+ uint32_t rd, rn, rm;
150
+ bool dp = a->dp;
151
+ bool vmin = a->op;
152
+ TCGv_ptr fpst;
153
+
154
+ if (!dc_isar_feature(aa32_vminmaxnm, s)) {
155
+ return false;
156
+ }
157
+
158
+ /* UNDEF accesses to D16-D31 if they don't exist */
159
+ if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
160
+ ((a->vm | a->vn | a->vd) & 0x10)) {
161
+ return false;
162
+ }
163
+ rd = a->vd;
164
+ rn = a->vn;
165
+ rm = a->vm;
166
+
167
+ if (!vfp_access_check(s)) {
168
+ return true;
169
+ }
170
+
171
+ fpst = get_fpstatus_ptr(0);
172
+
173
+ if (dp) {
174
+ TCGv_i64 frn, frm, dest;
175
+
176
+ frn = tcg_temp_new_i64();
177
+ frm = tcg_temp_new_i64();
178
+ dest = tcg_temp_new_i64();
179
+
180
+ tcg_gen_ld_f64(frn, cpu_env, vfp_reg_offset(dp, rn));
181
+ tcg_gen_ld_f64(frm, cpu_env, vfp_reg_offset(dp, rm));
182
+ if (vmin) {
183
+ gen_helper_vfp_minnumd(dest, frn, frm, fpst);
184
+ } else {
185
+ gen_helper_vfp_maxnumd(dest, frn, frm, fpst);
186
+ }
187
+ tcg_gen_st_f64(dest, cpu_env, vfp_reg_offset(dp, rd));
188
+ tcg_temp_free_i64(frn);
189
+ tcg_temp_free_i64(frm);
190
+ tcg_temp_free_i64(dest);
191
+ } else {
192
+ TCGv_i32 frn, frm, dest;
193
+
194
+ frn = tcg_temp_new_i32();
195
+ frm = tcg_temp_new_i32();
196
+ dest = tcg_temp_new_i32();
197
+
198
+ tcg_gen_ld_f32(frn, cpu_env, vfp_reg_offset(dp, rn));
199
+ tcg_gen_ld_f32(frm, cpu_env, vfp_reg_offset(dp, rm));
200
+ if (vmin) {
201
+ gen_helper_vfp_minnums(dest, frn, frm, fpst);
202
+ } else {
203
+ gen_helper_vfp_maxnums(dest, frn, frm, fpst);
204
+ }
205
+ tcg_gen_st_f32(dest, cpu_env, vfp_reg_offset(dp, rd));
206
+ tcg_temp_free_i32(frn);
207
+ tcg_temp_free_i32(frm);
208
+ tcg_temp_free_i32(dest);
209
+ }
210
+
211
+ tcg_temp_free_ptr(fpst);
212
+ return true;
213
+}
214
+
215
+/*
216
+ * Table for converting the most common AArch32 encoding of
217
+ * rounding mode to arm_fprounding order (which matches the
218
+ * common AArch64 order); see ARM ARM pseudocode FPDecodeRM().
219
+ */
220
+static const uint8_t fp_decode_rm[] = {
221
+ FPROUNDING_TIEAWAY,
222
+ FPROUNDING_TIEEVEN,
223
+ FPROUNDING_POSINF,
224
+ FPROUNDING_NEGINF,
225
+};
226
+
227
+static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
228
+{
229
+ uint32_t rd, rm;
230
+ bool dp = a->dp;
231
+ TCGv_ptr fpst;
232
+ TCGv_i32 tcg_rmode;
233
+ int rounding = fp_decode_rm[a->rm];
234
+
235
+ if (!dc_isar_feature(aa32_vrint, s)) {
236
+ return false;
237
+ }
238
+
239
+ /* UNDEF accesses to D16-D31 if they don't exist */
240
+ if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
241
+ ((a->vm | a->vd) & 0x10)) {
242
+ return false;
243
+ }
244
+ rd = a->vd;
245
+ rm = a->vm;
246
+
247
+ if (!vfp_access_check(s)) {
248
+ return true;
249
+ }
250
+
251
+ fpst = get_fpstatus_ptr(0);
252
+
253
+ tcg_rmode = tcg_const_i32(arm_rmode_to_sf(rounding));
254
+ gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
255
+
256
+ if (dp) {
257
+ TCGv_i64 tcg_op;
258
+ TCGv_i64 tcg_res;
259
+ tcg_op = tcg_temp_new_i64();
260
+ tcg_res = tcg_temp_new_i64();
261
+ tcg_gen_ld_f64(tcg_op, cpu_env, vfp_reg_offset(dp, rm));
262
+ gen_helper_rintd(tcg_res, tcg_op, fpst);
263
+ tcg_gen_st_f64(tcg_res, cpu_env, vfp_reg_offset(dp, rd));
264
+ tcg_temp_free_i64(tcg_op);
265
+ tcg_temp_free_i64(tcg_res);
266
+ } else {
267
+ TCGv_i32 tcg_op;
268
+ TCGv_i32 tcg_res;
269
+ tcg_op = tcg_temp_new_i32();
270
+ tcg_res = tcg_temp_new_i32();
271
+ tcg_gen_ld_f32(tcg_op, cpu_env, vfp_reg_offset(dp, rm));
272
+ gen_helper_rints(tcg_res, tcg_op, fpst);
273
+ tcg_gen_st_f32(tcg_res, cpu_env, vfp_reg_offset(dp, rd));
274
+ tcg_temp_free_i32(tcg_op);
275
+ tcg_temp_free_i32(tcg_res);
276
+ }
277
+
278
+ gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
279
+ tcg_temp_free_i32(tcg_rmode);
280
+
281
+ tcg_temp_free_ptr(fpst);
282
+ return true;
283
+}
284
+
285
+static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
286
+{
287
+ uint32_t rd, rm;
288
+ bool dp = a->dp;
289
+ TCGv_ptr fpst;
290
+ TCGv_i32 tcg_rmode, tcg_shift;
291
+ int rounding = fp_decode_rm[a->rm];
292
+ bool is_signed = a->op;
293
+
294
+ if (!dc_isar_feature(aa32_vcvt_dr, s)) {
295
+ return false;
296
+ }
297
+
298
+ /* UNDEF accesses to D16-D31 if they don't exist */
299
+ if (dp && !dc_isar_feature(aa32_fp_d32, s) && (a->vm & 0x10)) {
300
+ return false;
301
+ }
302
+ rd = a->vd;
303
+ rm = a->vm;
304
+
305
+ if (!vfp_access_check(s)) {
306
+ return true;
307
+ }
308
+
309
+ fpst = get_fpstatus_ptr(0);
310
+
311
+ tcg_shift = tcg_const_i32(0);
312
+
313
+ tcg_rmode = tcg_const_i32(arm_rmode_to_sf(rounding));
314
+ gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
315
+
316
+ if (dp) {
317
+ TCGv_i64 tcg_double, tcg_res;
318
+ TCGv_i32 tcg_tmp;
319
+ tcg_double = tcg_temp_new_i64();
320
+ tcg_res = tcg_temp_new_i64();
321
+ tcg_tmp = tcg_temp_new_i32();
322
+ tcg_gen_ld_f64(tcg_double, cpu_env, vfp_reg_offset(1, rm));
323
+ if (is_signed) {
324
+ gen_helper_vfp_tosld(tcg_res, tcg_double, tcg_shift, fpst);
325
+ } else {
326
+ gen_helper_vfp_tould(tcg_res, tcg_double, tcg_shift, fpst);
327
+ }
328
+ tcg_gen_extrl_i64_i32(tcg_tmp, tcg_res);
329
+ tcg_gen_st_f32(tcg_tmp, cpu_env, vfp_reg_offset(0, rd));
330
+ tcg_temp_free_i32(tcg_tmp);
331
+ tcg_temp_free_i64(tcg_res);
332
+ tcg_temp_free_i64(tcg_double);
333
+ } else {
334
+ TCGv_i32 tcg_single, tcg_res;
335
+ tcg_single = tcg_temp_new_i32();
336
+ tcg_res = tcg_temp_new_i32();
337
+ tcg_gen_ld_f32(tcg_single, cpu_env, vfp_reg_offset(0, rm));
338
+ if (is_signed) {
339
+ gen_helper_vfp_tosls(tcg_res, tcg_single, tcg_shift, fpst);
340
+ } else {
341
+ gen_helper_vfp_touls(tcg_res, tcg_single, tcg_shift, fpst);
342
+ }
343
+ tcg_gen_st_f32(tcg_res, cpu_env, vfp_reg_offset(0, rd));
344
+ tcg_temp_free_i32(tcg_res);
345
+ tcg_temp_free_i32(tcg_single);
346
+ }
347
+
348
+ gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
349
+ tcg_temp_free_i32(tcg_rmode);
350
+
351
+ tcg_temp_free_i32(tcg_shift);
352
+
353
+ tcg_temp_free_ptr(fpst);
354
+
355
+ return true;
70
+ return true;
356
+}
71
+}
357
diff --git a/target/arm/translate.c b/target/arm/translate.c
72
diff --git a/target/arm/translate.c b/target/arm/translate.c
358
index XXXXXXX..XXXXXXX 100644
73
index XXXXXXX..XXXXXXX 100644
359
--- a/target/arm/translate.c
74
--- a/target/arm/translate.c
360
+++ b/target/arm/translate.c
75
+++ b/target/arm/translate.c
361
@@ -XXX,XX +XXX,XX @@ static void gen_neon_dup_high16(TCGv_i32 var)
76
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
362
tcg_temp_free_i32(tmp);
77
return 0;
363
}
78
}
364
79
365
-static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
80
-/* Advanced SIMD three registers of the same length extension.
81
- * 31 25 23 22 20 16 12 11 10 9 8 3 0
82
- * +---------------+-----+---+-----+----+----+---+----+---+----+---------+----+
83
- * | 1 1 1 1 1 1 0 | op1 | D | op2 | Vn | Vd | 1 | o3 | 0 | o4 | N Q M U | Vm |
84
- * +---------------+-----+---+-----+----+----+---+----+---+----+---------+----+
85
- */
86
-static int disas_neon_insn_3same_ext(DisasContext *s, uint32_t insn)
366
-{
87
-{
367
- uint32_t rd, rn, rm;
88
- gen_helper_gvec_3 *fn_gvec = NULL;
368
- bool dp = a->dp;
89
- gen_helper_gvec_3_ptr *fn_gvec_ptr = NULL;
90
- int rd, rn, rm, opr_sz;
91
- int data = 0;
92
- int off_rn, off_rm;
93
- bool is_long = false, q = extract32(insn, 6, 1);
94
- bool ptr_is_env = false;
369
-
95
-
370
- if (!dc_isar_feature(aa32_vsel, s)) {
96
- if ((insn & 0xff300f10) == 0xfc200810) {
371
- return false;
97
- /* VFM[AS]L -- 1111 1100 S.10 .... .... 1000 .Q.1 .... */
98
- int is_s = extract32(insn, 23, 1);
99
- if (!dc_isar_feature(aa32_fhm, s)) {
100
- return 1;
101
- }
102
- is_long = true;
103
- data = is_s; /* is_2 == 0 */
104
- fn_gvec_ptr = gen_helper_gvec_fmlal_a32;
105
- ptr_is_env = true;
106
- } else {
107
- return 1;
372
- }
108
- }
373
-
109
-
374
- /* UNDEF accesses to D16-D31 if they don't exist */
110
- VFP_DREG_D(rd, insn);
375
- if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
111
- if (rd & q) {
376
- ((a->vm | a->vn | a->vd) & 0x10)) {
112
- return 1;
377
- return false;
378
- }
113
- }
379
- rd = a->vd;
114
- if (q || !is_long) {
380
- rn = a->vn;
115
- VFP_DREG_N(rn, insn);
381
- rm = a->vm;
116
- VFP_DREG_M(rm, insn);
382
-
117
- if ((rn | rm) & q & !is_long) {
383
- if (!vfp_access_check(s)) {
118
- return 1;
384
- return true;
119
- }
120
- off_rn = vfp_reg_offset(1, rn);
121
- off_rm = vfp_reg_offset(1, rm);
122
- } else {
123
- rn = VFP_SREG_N(insn);
124
- rm = VFP_SREG_M(insn);
125
- off_rn = vfp_reg_offset(0, rn);
126
- off_rm = vfp_reg_offset(0, rm);
385
- }
127
- }
386
-
128
-
387
- if (dp) {
129
- if (s->fp_excp_el) {
388
- TCGv_i64 frn, frm, dest;
130
- gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
389
- TCGv_i64 tmp, zero, zf, nf, vf;
131
- syn_simd_access_trap(1, 0xe, false), s->fp_excp_el);
390
-
132
- return 0;
391
- zero = tcg_const_i64(0);
133
- }
392
-
134
- if (!s->vfp_enabled) {
393
- frn = tcg_temp_new_i64();
135
- return 1;
394
- frm = tcg_temp_new_i64();
395
- dest = tcg_temp_new_i64();
396
-
397
- zf = tcg_temp_new_i64();
398
- nf = tcg_temp_new_i64();
399
- vf = tcg_temp_new_i64();
400
-
401
- tcg_gen_extu_i32_i64(zf, cpu_ZF);
402
- tcg_gen_ext_i32_i64(nf, cpu_NF);
403
- tcg_gen_ext_i32_i64(vf, cpu_VF);
404
-
405
- tcg_gen_ld_f64(frn, cpu_env, vfp_reg_offset(dp, rn));
406
- tcg_gen_ld_f64(frm, cpu_env, vfp_reg_offset(dp, rm));
407
- switch (a->cc) {
408
- case 0: /* eq: Z */
409
- tcg_gen_movcond_i64(TCG_COND_EQ, dest, zf, zero,
410
- frn, frm);
411
- break;
412
- case 1: /* vs: V */
413
- tcg_gen_movcond_i64(TCG_COND_LT, dest, vf, zero,
414
- frn, frm);
415
- break;
416
- case 2: /* ge: N == V -> N ^ V == 0 */
417
- tmp = tcg_temp_new_i64();
418
- tcg_gen_xor_i64(tmp, vf, nf);
419
- tcg_gen_movcond_i64(TCG_COND_GE, dest, tmp, zero,
420
- frn, frm);
421
- tcg_temp_free_i64(tmp);
422
- break;
423
- case 3: /* gt: !Z && N == V */
424
- tcg_gen_movcond_i64(TCG_COND_NE, dest, zf, zero,
425
- frn, frm);
426
- tmp = tcg_temp_new_i64();
427
- tcg_gen_xor_i64(tmp, vf, nf);
428
- tcg_gen_movcond_i64(TCG_COND_GE, dest, tmp, zero,
429
- dest, frm);
430
- tcg_temp_free_i64(tmp);
431
- break;
432
- }
433
- tcg_gen_st_f64(dest, cpu_env, vfp_reg_offset(dp, rd));
434
- tcg_temp_free_i64(frn);
435
- tcg_temp_free_i64(frm);
436
- tcg_temp_free_i64(dest);
437
-
438
- tcg_temp_free_i64(zf);
439
- tcg_temp_free_i64(nf);
440
- tcg_temp_free_i64(vf);
441
-
442
- tcg_temp_free_i64(zero);
443
- } else {
444
- TCGv_i32 frn, frm, dest;
445
- TCGv_i32 tmp, zero;
446
-
447
- zero = tcg_const_i32(0);
448
-
449
- frn = tcg_temp_new_i32();
450
- frm = tcg_temp_new_i32();
451
- dest = tcg_temp_new_i32();
452
- tcg_gen_ld_f32(frn, cpu_env, vfp_reg_offset(dp, rn));
453
- tcg_gen_ld_f32(frm, cpu_env, vfp_reg_offset(dp, rm));
454
- switch (a->cc) {
455
- case 0: /* eq: Z */
456
- tcg_gen_movcond_i32(TCG_COND_EQ, dest, cpu_ZF, zero,
457
- frn, frm);
458
- break;
459
- case 1: /* vs: V */
460
- tcg_gen_movcond_i32(TCG_COND_LT, dest, cpu_VF, zero,
461
- frn, frm);
462
- break;
463
- case 2: /* ge: N == V -> N ^ V == 0 */
464
- tmp = tcg_temp_new_i32();
465
- tcg_gen_xor_i32(tmp, cpu_VF, cpu_NF);
466
- tcg_gen_movcond_i32(TCG_COND_GE, dest, tmp, zero,
467
- frn, frm);
468
- tcg_temp_free_i32(tmp);
469
- break;
470
- case 3: /* gt: !Z && N == V */
471
- tcg_gen_movcond_i32(TCG_COND_NE, dest, cpu_ZF, zero,
472
- frn, frm);
473
- tmp = tcg_temp_new_i32();
474
- tcg_gen_xor_i32(tmp, cpu_VF, cpu_NF);
475
- tcg_gen_movcond_i32(TCG_COND_GE, dest, tmp, zero,
476
- dest, frm);
477
- tcg_temp_free_i32(tmp);
478
- break;
479
- }
480
- tcg_gen_st_f32(dest, cpu_env, vfp_reg_offset(dp, rd));
481
- tcg_temp_free_i32(frn);
482
- tcg_temp_free_i32(frm);
483
- tcg_temp_free_i32(dest);
484
-
485
- tcg_temp_free_i32(zero);
486
- }
136
- }
487
-
137
-
488
- return true;
138
- opr_sz = (1 + q) * 8;
139
- if (fn_gvec_ptr) {
140
- TCGv_ptr ptr;
141
- if (ptr_is_env) {
142
- ptr = cpu_env;
143
- } else {
144
- ptr = get_fpstatus_ptr(1);
145
- }
146
- tcg_gen_gvec_3_ptr(vfp_reg_offset(1, rd), off_rn, off_rm, ptr,
147
- opr_sz, opr_sz, data, fn_gvec_ptr);
148
- if (!ptr_is_env) {
149
- tcg_temp_free_ptr(ptr);
150
- }
151
- } else {
152
- tcg_gen_gvec_3_ool(vfp_reg_offset(1, rd), off_rn, off_rm,
153
- opr_sz, opr_sz, data, fn_gvec);
154
- }
155
- return 0;
489
-}
156
-}
490
-
157
-
491
-static bool trans_VMINMAXNM(DisasContext *s, arg_VMINMAXNM *a)
158
/* Advanced SIMD two registers and a scalar extension.
492
-{
159
* 31 24 23 22 20 16 12 11 10 9 8 3 0
493
- uint32_t rd, rn, rm;
160
* +-----------------+----+---+----+----+----+---+----+---+----+---------+----+
494
- bool dp = a->dp;
161
@@ -XXX,XX +XXX,XX @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
495
- bool vmin = a->op;
162
}
496
- TCGv_ptr fpst;
163
}
497
-
164
}
498
- if (!dc_isar_feature(aa32_vminmaxnm, s)) {
165
- } else if ((insn & 0x0e000a00) == 0x0c000800
499
- return false;
166
- && arm_dc_feature(s, ARM_FEATURE_V8)) {
500
- }
167
- if (disas_neon_insn_3same_ext(s, insn)) {
501
-
168
- goto illegal_op;
502
- /* UNDEF accesses to D16-D31 if they don't exist */
169
- }
503
- if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
170
- return;
504
- ((a->vm | a->vn | a->vd) & 0x10)) {
171
} else if ((insn & 0x0f000a00) == 0x0e000800
505
- return false;
172
&& arm_dc_feature(s, ARM_FEATURE_V8)) {
506
- }
173
if (disas_neon_insn_2reg_scalar_ext(s, insn)) {
507
- rd = a->vd;
174
@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
508
- rn = a->vn;
175
}
509
- rm = a->vm;
176
break;
510
-
177
}
511
- if (!vfp_access_check(s)) {
178
- if ((insn & 0xfe000a00) == 0xfc000800
512
- return true;
179
+ if ((insn & 0xff000a00) == 0xfe000800
513
- }
180
&& arm_dc_feature(s, ARM_FEATURE_V8)) {
514
-
181
/* The Thumb2 and ARM encodings are identical. */
515
- fpst = get_fpstatus_ptr(0);
182
- if (disas_neon_insn_3same_ext(s, insn)) {
516
-
183
- goto illegal_op;
517
- if (dp) {
184
- }
518
- TCGv_i64 frn, frm, dest;
185
- } else if ((insn & 0xff000a00) == 0xfe000800
519
-
186
- && arm_dc_feature(s, ARM_FEATURE_V8)) {
520
- frn = tcg_temp_new_i64();
187
- /* The Thumb2 and ARM encodings are identical. */
521
- frm = tcg_temp_new_i64();
188
if (disas_neon_insn_2reg_scalar_ext(s, insn)) {
522
- dest = tcg_temp_new_i64();
189
goto illegal_op;
523
-
190
}
524
- tcg_gen_ld_f64(frn, cpu_env, vfp_reg_offset(dp, rn));
525
- tcg_gen_ld_f64(frm, cpu_env, vfp_reg_offset(dp, rm));
526
- if (vmin) {
527
- gen_helper_vfp_minnumd(dest, frn, frm, fpst);
528
- } else {
529
- gen_helper_vfp_maxnumd(dest, frn, frm, fpst);
530
- }
531
- tcg_gen_st_f64(dest, cpu_env, vfp_reg_offset(dp, rd));
532
- tcg_temp_free_i64(frn);
533
- tcg_temp_free_i64(frm);
534
- tcg_temp_free_i64(dest);
535
- } else {
536
- TCGv_i32 frn, frm, dest;
537
-
538
- frn = tcg_temp_new_i32();
539
- frm = tcg_temp_new_i32();
540
- dest = tcg_temp_new_i32();
541
-
542
- tcg_gen_ld_f32(frn, cpu_env, vfp_reg_offset(dp, rn));
543
- tcg_gen_ld_f32(frm, cpu_env, vfp_reg_offset(dp, rm));
544
- if (vmin) {
545
- gen_helper_vfp_minnums(dest, frn, frm, fpst);
546
- } else {
547
- gen_helper_vfp_maxnums(dest, frn, frm, fpst);
548
- }
549
- tcg_gen_st_f32(dest, cpu_env, vfp_reg_offset(dp, rd));
550
- tcg_temp_free_i32(frn);
551
- tcg_temp_free_i32(frm);
552
- tcg_temp_free_i32(dest);
553
- }
554
-
555
- tcg_temp_free_ptr(fpst);
556
- return true;
557
-}
558
-
559
-/*
560
- * Table for converting the most common AArch32 encoding of
561
- * rounding mode to arm_fprounding order (which matches the
562
- * common AArch64 order); see ARM ARM pseudocode FPDecodeRM().
563
- */
564
-static const uint8_t fp_decode_rm[] = {
565
- FPROUNDING_TIEAWAY,
566
- FPROUNDING_TIEEVEN,
567
- FPROUNDING_POSINF,
568
- FPROUNDING_NEGINF,
569
-};
570
-
571
-static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
572
-{
573
- uint32_t rd, rm;
574
- bool dp = a->dp;
575
- TCGv_ptr fpst;
576
- TCGv_i32 tcg_rmode;
577
- int rounding = fp_decode_rm[a->rm];
578
-
579
- if (!dc_isar_feature(aa32_vrint, s)) {
580
- return false;
581
- }
582
-
583
- /* UNDEF accesses to D16-D31 if they don't exist */
584
- if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
585
- ((a->vm | a->vd) & 0x10)) {
586
- return false;
587
- }
588
- rd = a->vd;
589
- rm = a->vm;
590
-
591
- if (!vfp_access_check(s)) {
592
- return true;
593
- }
594
-
595
- fpst = get_fpstatus_ptr(0);
596
-
597
- tcg_rmode = tcg_const_i32(arm_rmode_to_sf(rounding));
598
- gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
599
-
600
- if (dp) {
601
- TCGv_i64 tcg_op;
602
- TCGv_i64 tcg_res;
603
- tcg_op = tcg_temp_new_i64();
604
- tcg_res = tcg_temp_new_i64();
605
- tcg_gen_ld_f64(tcg_op, cpu_env, vfp_reg_offset(dp, rm));
606
- gen_helper_rintd(tcg_res, tcg_op, fpst);
607
- tcg_gen_st_f64(tcg_res, cpu_env, vfp_reg_offset(dp, rd));
608
- tcg_temp_free_i64(tcg_op);
609
- tcg_temp_free_i64(tcg_res);
610
- } else {
611
- TCGv_i32 tcg_op;
612
- TCGv_i32 tcg_res;
613
- tcg_op = tcg_temp_new_i32();
614
- tcg_res = tcg_temp_new_i32();
615
- tcg_gen_ld_f32(tcg_op, cpu_env, vfp_reg_offset(dp, rm));
616
- gen_helper_rints(tcg_res, tcg_op, fpst);
617
- tcg_gen_st_f32(tcg_res, cpu_env, vfp_reg_offset(dp, rd));
618
- tcg_temp_free_i32(tcg_op);
619
- tcg_temp_free_i32(tcg_res);
620
- }
621
-
622
- gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
623
- tcg_temp_free_i32(tcg_rmode);
624
-
625
- tcg_temp_free_ptr(fpst);
626
- return true;
627
-}
628
-
629
-static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
630
-{
631
- uint32_t rd, rm;
632
- bool dp = a->dp;
633
- TCGv_ptr fpst;
634
- TCGv_i32 tcg_rmode, tcg_shift;
635
- int rounding = fp_decode_rm[a->rm];
636
- bool is_signed = a->op;
637
-
638
- if (!dc_isar_feature(aa32_vcvt_dr, s)) {
639
- return false;
640
- }
641
-
642
- /* UNDEF accesses to D16-D31 if they don't exist */
643
- if (dp && !dc_isar_feature(aa32_fp_d32, s) && (a->vm & 0x10)) {
644
- return false;
645
- }
646
- rd = a->vd;
647
- rm = a->vm;
648
-
649
- if (!vfp_access_check(s)) {
650
- return true;
651
- }
652
-
653
- fpst = get_fpstatus_ptr(0);
654
-
655
- tcg_shift = tcg_const_i32(0);
656
-
657
- tcg_rmode = tcg_const_i32(arm_rmode_to_sf(rounding));
658
- gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
659
-
660
- if (dp) {
661
- TCGv_i64 tcg_double, tcg_res;
662
- TCGv_i32 tcg_tmp;
663
- tcg_double = tcg_temp_new_i64();
664
- tcg_res = tcg_temp_new_i64();
665
- tcg_tmp = tcg_temp_new_i32();
666
- tcg_gen_ld_f64(tcg_double, cpu_env, vfp_reg_offset(1, rm));
667
- if (is_signed) {
668
- gen_helper_vfp_tosld(tcg_res, tcg_double, tcg_shift, fpst);
669
- } else {
670
- gen_helper_vfp_tould(tcg_res, tcg_double, tcg_shift, fpst);
671
- }
672
- tcg_gen_extrl_i64_i32(tcg_tmp, tcg_res);
673
- tcg_gen_st_f32(tcg_tmp, cpu_env, vfp_reg_offset(0, rd));
674
- tcg_temp_free_i32(tcg_tmp);
675
- tcg_temp_free_i64(tcg_res);
676
- tcg_temp_free_i64(tcg_double);
677
- } else {
678
- TCGv_i32 tcg_single, tcg_res;
679
- tcg_single = tcg_temp_new_i32();
680
- tcg_res = tcg_temp_new_i32();
681
- tcg_gen_ld_f32(tcg_single, cpu_env, vfp_reg_offset(0, rm));
682
- if (is_signed) {
683
- gen_helper_vfp_tosls(tcg_res, tcg_single, tcg_shift, fpst);
684
- } else {
685
- gen_helper_vfp_touls(tcg_res, tcg_single, tcg_shift, fpst);
686
- }
687
- tcg_gen_st_f32(tcg_res, cpu_env, vfp_reg_offset(0, rd));
688
- tcg_temp_free_i32(tcg_res);
689
- tcg_temp_free_i32(tcg_single);
690
- }
691
-
692
- gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
693
- tcg_temp_free_i32(tcg_rmode);
694
-
695
- tcg_temp_free_i32(tcg_shift);
696
-
697
- tcg_temp_free_ptr(fpst);
698
-
699
- return true;
700
-}
701
-
702
/*
703
* Disassemble a VFP instruction. Returns nonzero if an error occurred
704
* (ie. an undefined instruction).
705
--
191
--
706
2.20.1
192
2.20.1
707
193
708
194
Convert the VCVT integer-to-float instructions to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 58 ++++++++++++++++++++++++++++++++++
 target/arm/translate.c         | 12 +------
 target/arm/vfp.decode          |  6 ++++
 3 files changed, 65 insertions(+), 11 deletions(-)

Convert VCMLA (scalar) in the 2reg-scalar-ext group to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-9-peter.maydell@linaro.org
---
 target/arm/neon-shared.decode   |  5 +++++
 target/arm/translate-neon.inc.c | 40 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 26 +--------------------
 3 files changed, 46 insertions(+), 25 deletions(-)
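For reference, the integer-to-float patterns being converted read the
source as a 32-bit integer from a single-precision register (signed
when s=1) and write a result whose width follows the size bit. A plain
C stand-in for the sitos/uitos/sitod/uitod helpers, ignoring the guest
FPSCR rounding mode that the real helpers honour (hypothetical names):

#include <stdint.h>
#include <stdio.h>

static float vcvt_int_sp(uint32_t bits, int is_signed)
{
    /* i32 -> f32 or u32 -> f32 */
    return is_signed ? (float)(int32_t)bits : (float)bits;
}

static double vcvt_int_dp(uint32_t bits, int is_signed)
{
    /* i32 -> f64 or u32 -> f64 */
    return is_signed ? (double)(int32_t)bits : (double)bits;
}

int main(void)
{
    /* the same bit pattern gives -1.0 signed but ~4.29e9 unsigned */
    printf("%f %f\n", vcvt_int_sp(0xffffffffu, 1),
           vcvt_int_dp(0xffffffffu, 0));
    return 0;
}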
11
diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
12
diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
12
index XXXXXXX..XXXXXXX 100644
13
index XXXXXXX..XXXXXXX 100644
13
--- a/target/arm/translate-vfp.inc.c
14
--- a/target/arm/neon-shared.decode
14
+++ b/target/arm/translate-vfp.inc.c
15
+++ b/target/arm/neon-shared.decode
15
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp(DisasContext *s, arg_VCVT_dp *a)
16
@@ -XXX,XX +XXX,XX @@ VFML 1111 110 0 s:1 . 10 .... .... 1000 . 0 . 1 .... \
16
tcg_temp_free_i64(vm);
17
vm=%vm_sp vn=%vn_sp vd=%vd_dp q=0
18
VFML 1111 110 0 s:1 . 10 .... .... 1000 . 1 . 1 .... \
19
vm=%vm_dp vn=%vn_dp vd=%vd_dp q=1
20
+
21
+VCMLA_scalar 1111 1110 0 . rot:2 .... .... 1000 . q:1 index:1 0 vm:4 \
22
+ vn=%vn_dp vd=%vd_dp size=0
23
+VCMLA_scalar 1111 1110 1 . rot:2 .... .... 1000 . q:1 . 0 .... \
24
+ vm=%vm_dp vn=%vn_dp vd=%vd_dp size=1 index=0
25
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
26
index XXXXXXX..XXXXXXX 100644
27
--- a/target/arm/translate-neon.inc.c
28
+++ b/target/arm/translate-neon.inc.c
29
@@ -XXX,XX +XXX,XX @@ static bool trans_VFML(DisasContext *s, arg_VFML *a)
30
gen_helper_gvec_fmlal_a32);
17
return true;
31
return true;
18
}
32
}
19
+
33
+
20
+static bool trans_VCVT_int_sp(DisasContext *s, arg_VCVT_int_sp *a)
34
+static bool trans_VCMLA_scalar(DisasContext *s, arg_VCMLA_scalar *a)
21
+{
35
+{
22
+ TCGv_i32 vm;
36
+ gen_helper_gvec_3_ptr *fn_gvec_ptr;
37
+ int opr_sz;
23
+ TCGv_ptr fpst;
38
+ TCGv_ptr fpst;
24
+
39
+
25
+ if (!vfp_access_check(s)) {
40
+ if (!dc_isar_feature(aa32_vcma, s)) {
26
+ return true;
41
+ return false;
42
+ }
43
+ if (a->size == 0 && !dc_isar_feature(aa32_fp16_arith, s)) {
44
+ return false;
27
+ }
45
+ }
28
+
46
+
29
+ vm = tcg_temp_new_i32();
47
+ /* UNDEF accesses to D16-D31 if they don't exist. */
30
+ neon_load_reg32(vm, a->vm);
48
+ if (!dc_isar_feature(aa32_simd_r32, s) &&
31
+ fpst = get_fpstatus_ptr(false);
49
+ ((a->vd | a->vn | a->vm) & 0x10)) {
32
+ if (a->s) {
50
+ return false;
33
+ /* i32 -> f32 */
34
+ gen_helper_vfp_sitos(vm, vm, fpst);
35
+ } else {
36
+ /* u32 -> f32 */
37
+ gen_helper_vfp_uitos(vm, vm, fpst);
38
+ }
51
+ }
39
+ neon_store_reg32(vm, a->vd);
40
+ tcg_temp_free_i32(vm);
41
+ tcg_temp_free_ptr(fpst);
42
+ return true;
43
+}
44
+
52
+
45
+static bool trans_VCVT_int_dp(DisasContext *s, arg_VCVT_int_dp *a)
53
+ if ((a->vd | a->vn) & a->q) {
46
+{
47
+ TCGv_i32 vm;
48
+ TCGv_i64 vd;
49
+ TCGv_ptr fpst;
50
+
51
+ /* UNDEF accesses to D16-D31 if they don't exist. */
52
+ if (!dc_isar_feature(aa32_fp_d32, s) && (a->vd & 0x10)) {
53
+ return false;
54
+ return false;
54
+ }
55
+ }
55
+
56
+
56
+ if (!vfp_access_check(s)) {
57
+ if (!vfp_access_check(s)) {
57
+ return true;
58
+ return true;
58
+ }
59
+ }
59
+
60
+
60
+ vm = tcg_temp_new_i32();
61
+ fn_gvec_ptr = (a->size ? gen_helper_gvec_fcmlas_idx
61
+ vd = tcg_temp_new_i64();
62
+ : gen_helper_gvec_fcmlah_idx);
62
+ neon_load_reg32(vm, a->vm);
63
+ opr_sz = (1 + a->q) * 8;
63
+ fpst = get_fpstatus_ptr(false);
64
+ fpst = get_fpstatus_ptr(1);
64
+ if (a->s) {
65
+ tcg_gen_gvec_3_ptr(vfp_reg_offset(1, a->vd),
65
+ /* i32 -> f64 */
66
+ vfp_reg_offset(1, a->vn),
66
+ gen_helper_vfp_sitod(vd, vm, fpst);
67
+ vfp_reg_offset(1, a->vm),
67
+ } else {
68
+ fpst, opr_sz, opr_sz,
68
+ /* u32 -> f64 */
69
+ (a->index << 2) | a->rot, fn_gvec_ptr);
69
+ gen_helper_vfp_uitod(vd, vm, fpst);
70
+ }
71
+ neon_store_reg64(vd, a->vd);
72
+ tcg_temp_free_i32(vm);
73
+ tcg_temp_free_i64(vd);
74
+ tcg_temp_free_ptr(fpst);
70
+ tcg_temp_free_ptr(fpst);
75
+ return true;
71
+ return true;
76
+}
72
+}
77
diff --git a/target/arm/translate.c b/target/arm/translate.c
73
diff --git a/target/arm/translate.c b/target/arm/translate.c
78
index XXXXXXX..XXXXXXX 100644
74
index XXXXXXX..XXXXXXX 100644
79
--- a/target/arm/translate.c
75
--- a/target/arm/translate.c
80
+++ b/target/arm/translate.c
76
+++ b/target/arm/translate.c
81
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
77
@@ -XXX,XX +XXX,XX @@ static int disas_neon_insn_2reg_scalar_ext(DisasContext *s, uint32_t insn)
82
return 1;
78
bool is_long = false, q = extract32(insn, 6, 1);
83
case 15:
79
bool ptr_is_env = false;
84
switch (rn) {
80
85
- case 0 ... 15:
81
- if ((insn & 0xff000f10) == 0xfe000800) {
86
+ case 0 ... 17:
82
- /* VCMLA (indexed) -- 1111 1110 S.RR .... .... 1000 ...0 .... */
87
/* Already handled by decodetree */
83
- int rot = extract32(insn, 20, 2);
88
return 1;
84
- int size = extract32(insn, 23, 1);
89
default:
85
- int index;
90
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
86
-
91
if (op == 15) {
87
- if (!dc_isar_feature(aa32_vcma, s)) {
92
/* rn is opcode, encoded as per VFP_SREG_N. */
88
- return 1;
93
switch (rn) {
89
- }
94
- case 0x10: /* vcvt.fxx.u32 */
90
- if (size == 0) {
95
- case 0x11: /* vcvt.fxx.s32 */
91
- if (!dc_isar_feature(aa32_fp16_arith, s)) {
96
- rm_is_dp = false;
92
- return 1;
97
- break;
93
- }
98
case 0x18: /* vcvtr.u32.fxx */
94
- /* For fp16, rm is just Vm, and index is M. */
99
case 0x19: /* vcvtz.u32.fxx */
95
- rm = extract32(insn, 0, 4);
100
case 0x1a: /* vcvtr.s32.fxx */
96
- index = extract32(insn, 5, 1);
101
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
97
- } else {
102
switch (op) {
98
- /* For fp32, rm is the usual M:Vm, and index is 0. */
103
case 15: /* extension space */
99
- VFP_DREG_M(rm, insn);
104
switch (rn) {
100
- index = 0;
105
- case 16: /* fuito */
101
- }
106
- gen_vfp_uito(dp, 0);
102
- data = (index << 2) | rot;
107
- break;
103
- fn_gvec_ptr = (size ? gen_helper_gvec_fcmlas_idx
108
- case 17: /* fsito */
104
- : gen_helper_gvec_fcmlah_idx);
109
- gen_vfp_sito(dp, 0);
105
- } else if ((insn & 0xffb00f00) == 0xfe200d00) {
110
- break;
106
+ if ((insn & 0xffb00f00) == 0xfe200d00) {
111
case 19: /* vjcvt */
107
/* V[US]DOT -- 1111 1110 0.10 .... .... 1101 .Q.U .... */
112
gen_helper_vjcvt(cpu_F0s, cpu_F0d, cpu_env);
108
int u = extract32(insn, 4, 1);
113
break;
109
114
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
115
index XXXXXXX..XXXXXXX 100644
116
--- a/target/arm/vfp.decode
117
+++ b/target/arm/vfp.decode
118
@@ -XXX,XX +XXX,XX @@ VCVT_sp ---- 1110 1.11 0111 .... 1010 11.0 .... \
119
vd=%vd_dp vm=%vm_sp
120
VCVT_dp ---- 1110 1.11 0111 .... 1011 11.0 .... \
121
vd=%vd_sp vm=%vm_dp
122
+
123
+# VCVT from integer to floating point: Vm always single; Vd depends on size
124
+VCVT_int_sp ---- 1110 1.11 1000 .... 1010 s:1 1.0 .... \
125
+ vd=%vd_sp vm=%vm_sp
126
+VCVT_int_dp ---- 1110 1.11 1000 .... 1011 s:1 1.0 .... \
127
+ vd=%vd_dp vm=%vm_sp
128
--
110
--
129
2.20.1
111
2.20.1
130
112
131
113
Convert the float-to-integer VCVT instructions to decodetree.
Since these are the last unconverted instructions, we can
delete the old decoder structure entirely now.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c |  72 ++++++++++
 target/arm/translate.c         | 241 +--------------------------------
 target/arm/vfp.decode          |   6 +
 3 files changed, 80 insertions(+), 239 deletions(-)

Convert the V[US]DOT (scalar) insns in the 2reg-scalar-ext group
to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-10-peter.maydell@linaro.org
---
 target/arm/neon-shared.decode   |  3 +++
 target/arm/translate-neon.inc.c | 35 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 13 +-----------
 3 files changed, 39 insertions(+), 12 deletions(-)
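The float-to-integer patterns distinguish VCVT (rz=1, always round
towards zero) from VCVTR (rz=0, round under the current FPSCR mode). A
rough model using the host rounding mode as a stand-in; the QEMU
helpers additionally saturate and raise the proper FP exceptions
(hypothetical code, link with -lm):

#include <fenv.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>

static int32_t vcvt_f32_to_s32(float f, int rz)
{
    if (rz) {
        return (int32_t)truncf(f);   /* VCVT: round to zero */
    }
    return (int32_t)nearbyintf(f);   /* VCVTR: current rounding mode */
}

int main(void)
{
    fesetround(FE_TONEAREST);
    /* 2.7 converts to 2 with VCVT but 3 with VCVTR under round-to-nearest */
    printf("%d %d\n", vcvt_f32_to_s32(2.7f, 1), vcvt_f32_to_s32(2.7f, 0));
    return 0;
}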
13
diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
13
diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
14
index XXXXXXX..XXXXXXX 100644
14
index XXXXXXX..XXXXXXX 100644
15
--- a/target/arm/translate-vfp.inc.c
15
--- a/target/arm/neon-shared.decode
16
+++ b/target/arm/translate-vfp.inc.c
16
+++ b/target/arm/neon-shared.decode
17
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
17
@@ -XXX,XX +XXX,XX @@ VCMLA_scalar 1111 1110 0 . rot:2 .... .... 1000 . q:1 index:1 0 vm:4 \
18
vn=%vn_dp vd=%vd_dp size=0
19
VCMLA_scalar 1111 1110 1 . rot:2 .... .... 1000 . q:1 . 0 .... \
20
vm=%vm_dp vn=%vn_dp vd=%vd_dp size=1 index=0
21
+
22
+VDOT_scalar 1111 1110 0 . 10 .... .... 1101 . q:1 index:1 u:1 rm:4 \
23
+ vm=%vm_dp vn=%vn_dp vd=%vd_dp
24
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
25
index XXXXXXX..XXXXXXX 100644
26
--- a/target/arm/translate-neon.inc.c
27
+++ b/target/arm/translate-neon.inc.c
28
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMLA_scalar(DisasContext *s, arg_VCMLA_scalar *a)
18
tcg_temp_free_ptr(fpst);
29
tcg_temp_free_ptr(fpst);
19
return true;
30
return true;
20
}
31
}
21
+
32
+
22
+static bool trans_VCVT_sp_int(DisasContext *s, arg_VCVT_sp_int *a)
33
+static bool trans_VDOT_scalar(DisasContext *s, arg_VDOT_scalar *a)
23
+{
34
+{
24
+ TCGv_i32 vm;
35
+ gen_helper_gvec_3 *fn_gvec;
36
+ int opr_sz;
25
+ TCGv_ptr fpst;
37
+ TCGv_ptr fpst;
26
+
38
+
27
+ if (!vfp_access_check(s)) {
39
+ if (!dc_isar_feature(aa32_dp, s)) {
28
+ return true;
40
+ return false;
29
+ }
41
+ }
30
+
42
+
31
+ fpst = get_fpstatus_ptr(false);
43
+ /* UNDEF accesses to D16-D31 if they don't exist. */
32
+ vm = tcg_temp_new_i32();
44
+ if (!dc_isar_feature(aa32_simd_r32, s) &&
33
+ neon_load_reg32(vm, a->vm);
45
+ ((a->vd | a->vn) & 0x10)) {
46
+ return false;
47
+ }
34
+
48
+
35
+ if (a->s) {
49
+ if ((a->vd | a->vn) & a->q) {
36
+ if (a->rz) {
37
+ gen_helper_vfp_tosizs(vm, vm, fpst);
38
+ } else {
39
+ gen_helper_vfp_tosis(vm, vm, fpst);
40
+ }
41
+ } else {
42
+ if (a->rz) {
43
+ gen_helper_vfp_touizs(vm, vm, fpst);
44
+ } else {
45
+ gen_helper_vfp_touis(vm, vm, fpst);
46
+ }
47
+ }
48
+ neon_store_reg32(vm, a->vd);
49
+ tcg_temp_free_i32(vm);
50
+ tcg_temp_free_ptr(fpst);
51
+ return true;
52
+}
53
+
54
+static bool trans_VCVT_dp_int(DisasContext *s, arg_VCVT_dp_int *a)
55
+{
56
+ TCGv_i32 vd;
57
+ TCGv_i64 vm;
58
+ TCGv_ptr fpst;
59
+
60
+ /* UNDEF accesses to D16-D31 if they don't exist. */
61
+ if (!dc_isar_feature(aa32_fp_d32, s) && (a->vm & 0x10)) {
62
+ return false;
50
+ return false;
63
+ }
51
+ }
64
+
52
+
65
+ if (!vfp_access_check(s)) {
53
+ if (!vfp_access_check(s)) {
66
+ return true;
54
+ return true;
67
+ }
55
+ }
68
+
56
+
69
+ fpst = get_fpstatus_ptr(false);
57
+ fn_gvec = a->u ? gen_helper_gvec_udot_idx_b : gen_helper_gvec_sdot_idx_b;
70
+ vm = tcg_temp_new_i64();
58
+ opr_sz = (1 + a->q) * 8;
71
+ vd = tcg_temp_new_i32();
59
+ fpst = get_fpstatus_ptr(1);
72
+ neon_load_reg64(vm, a->vm);
60
+ tcg_gen_gvec_3_ool(vfp_reg_offset(1, a->vd),
73
+
61
+ vfp_reg_offset(1, a->vn),
74
+ if (a->s) {
62
+ vfp_reg_offset(1, a->rm),
75
+ if (a->rz) {
63
+ opr_sz, opr_sz, a->index, fn_gvec);
76
+ gen_helper_vfp_tosizd(vd, vm, fpst);
77
+ } else {
78
+ gen_helper_vfp_tosid(vd, vm, fpst);
79
+ }
80
+ } else {
81
+ if (a->rz) {
82
+ gen_helper_vfp_touizd(vd, vm, fpst);
83
+ } else {
84
+ gen_helper_vfp_touid(vd, vm, fpst);
85
+ }
86
+ }
87
+ neon_store_reg32(vd, a->vd);
88
+ tcg_temp_free_i32(vd);
89
+ tcg_temp_free_i64(vm);
90
+ tcg_temp_free_ptr(fpst);
64
+ tcg_temp_free_ptr(fpst);
91
+ return true;
65
+ return true;
92
+}
66
+}
93
diff --git a/target/arm/translate.c b/target/arm/translate.c
67
diff --git a/target/arm/translate.c b/target/arm/translate.c
94
index XXXXXXX..XXXXXXX 100644
68
index XXXXXXX..XXXXXXX 100644
95
--- a/target/arm/translate.c
69
--- a/target/arm/translate.c
96
+++ b/target/arm/translate.c
70
+++ b/target/arm/translate.c
97
@@ -XXX,XX +XXX,XX @@ static inline void gen_vfp_##name(int dp, int neon) \
71
@@ -XXX,XX +XXX,XX @@ static int disas_neon_insn_2reg_scalar_ext(DisasContext *s, uint32_t insn)
98
tcg_temp_free_ptr(statusptr); \
72
bool is_long = false, q = extract32(insn, 6, 1);
99
}
73
bool ptr_is_env = false;
100
74
101
-VFP_GEN_FTOI(toui)
75
- if ((insn & 0xffb00f00) == 0xfe200d00) {
102
VFP_GEN_FTOI(touiz)
76
- /* V[US]DOT -- 1111 1110 0.10 .... .... 1101 .Q.U .... */
103
-VFP_GEN_FTOI(tosi)
77
- int u = extract32(insn, 4, 1);
104
VFP_GEN_FTOI(tosiz)
105
#undef VFP_GEN_FTOI
106
107
@@ -XXX,XX +XXX,XX @@ static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
108
}
109
110
#define tcg_gen_ld_f32 tcg_gen_ld_i32
111
-#define tcg_gen_ld_f64 tcg_gen_ld_i64
112
#define tcg_gen_st_f32 tcg_gen_st_i32
113
-#define tcg_gen_st_f64 tcg_gen_st_i64
114
-
78
-
115
-static inline void gen_mov_F0_vreg(int dp, int reg)
79
- if (!dc_isar_feature(aa32_dp, s)) {
116
-{
117
- if (dp)
118
- tcg_gen_ld_f64(cpu_F0d, cpu_env, vfp_reg_offset(dp, reg));
119
- else
120
- tcg_gen_ld_f32(cpu_F0s, cpu_env, vfp_reg_offset(dp, reg));
121
-}
122
-
123
-static inline void gen_mov_F1_vreg(int dp, int reg)
124
-{
125
- if (dp)
126
- tcg_gen_ld_f64(cpu_F1d, cpu_env, vfp_reg_offset(dp, reg));
127
- else
128
- tcg_gen_ld_f32(cpu_F1s, cpu_env, vfp_reg_offset(dp, reg));
129
-}
130
-
131
-static inline void gen_mov_vreg_F0(int dp, int reg)
132
-{
133
- if (dp)
134
- tcg_gen_st_f64(cpu_F0d, cpu_env, vfp_reg_offset(dp, reg));
135
- else
136
- tcg_gen_st_f32(cpu_F0s, cpu_env, vfp_reg_offset(dp, reg));
137
-}
138
139
#define ARM_CP_RW_BIT (1 << 20)
140
141
@@ -XXX,XX +XXX,XX @@ static void gen_neon_dup_high16(TCGv_i32 var)
142
*/
143
static int disas_vfp_insn(DisasContext *s, uint32_t insn)
144
{
145
- uint32_t rd, rn, rm, op, delta_d, delta_m, bank_mask;
146
- int dp, veclen;
147
-
148
if (!arm_dc_feature(s, ARM_FEATURE_VFP)) {
149
return 1;
150
}
151
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
152
return 0;
153
}
154
}
155
-
156
- if (extract32(insn, 28, 4) == 0xf) {
157
- /*
158
- * Encodings with T=1 (Thumb) or unconditional (ARM): these
159
- * were all handled by the decodetree decoder, so any insn
160
- * patterns which get here must be UNDEF.
161
- */
162
- return 1;
163
- }
164
-
165
- /*
166
- * FIXME: this access check should not take precedence over UNDEF
167
- * for invalid encodings; we will generate incorrect syndrome information
168
- * for attempts to execute invalid vfp/neon encodings with FP disabled.
169
- */
170
- if (!vfp_access_check(s)) {
171
- return 0;
172
- }
173
-
174
- dp = ((insn & 0xf00) == 0xb00);
175
- switch ((insn >> 24) & 0xf) {
176
- case 0xe:
177
- if (insn & (1 << 4)) {
178
- /* already handled by decodetree */
179
- return 1;
80
- return 1;
180
- } else {
181
- /* data processing */
182
- bool rd_is_dp = dp;
183
- bool rm_is_dp = dp;
184
- bool no_output = false;
185
-
186
- /* The opcode is in bits 23, 21, 20 and 6. */
187
- op = ((insn >> 20) & 8) | ((insn >> 19) & 6) | ((insn >> 6) & 1);
188
- rn = VFP_SREG_N(insn);
189
-
190
- switch (op) {
191
- case 0 ... 14:
192
- /* Already handled by decodetree */
193
- return 1;
194
- case 15:
195
- switch (rn) {
196
- case 0 ... 23:
197
- case 28 ... 31:
198
- /* Already handled by decodetree */
199
- return 1;
200
- default:
201
- break;
202
- }
203
- default:
204
- break;
205
- }
206
-
207
- if (op == 15) {
208
- /* rn is opcode, encoded as per VFP_SREG_N. */
209
- switch (rn) {
210
- case 0x18: /* vcvtr.u32.fxx */
211
- case 0x19: /* vcvtz.u32.fxx */
212
- case 0x1a: /* vcvtr.s32.fxx */
213
- case 0x1b: /* vcvtz.s32.fxx */
214
- rd_is_dp = false;
215
- break;
216
-
217
- default:
218
- return 1;
219
- }
220
- } else if (dp) {
221
- /* rn is register number */
222
- VFP_DREG_N(rn, insn);
223
- }
224
-
225
- if (rd_is_dp) {
226
- VFP_DREG_D(rd, insn);
227
- } else {
228
- rd = VFP_SREG_D(insn);
229
- }
230
- if (rm_is_dp) {
231
- VFP_DREG_M(rm, insn);
232
- } else {
233
- rm = VFP_SREG_M(insn);
234
- }
235
-
236
- veclen = s->vec_len;
237
- if (op == 15 && rn > 3) {
238
- veclen = 0;
239
- }
240
-
241
- /* Shut up compiler warnings. */
242
- delta_m = 0;
243
- delta_d = 0;
244
- bank_mask = 0;
245
-
246
- if (veclen > 0) {
247
- if (dp)
248
- bank_mask = 0xc;
249
- else
250
- bank_mask = 0x18;
251
-
252
- /* Figure out what type of vector operation this is. */
253
- if ((rd & bank_mask) == 0) {
254
- /* scalar */
255
- veclen = 0;
256
- } else {
257
- if (dp)
258
- delta_d = (s->vec_stride >> 1) + 1;
259
- else
260
- delta_d = s->vec_stride + 1;
261
-
262
- if ((rm & bank_mask) == 0) {
263
- /* mixed scalar/vector */
264
- delta_m = 0;
265
- } else {
266
- /* vector */
267
- delta_m = delta_d;
268
- }
269
- }
270
- }
271
-
272
- /* Load the initial operands. */
273
- if (op == 15) {
274
- switch (rn) {
275
- default:
276
- /* One source operand. */
277
- gen_mov_F0_vreg(rm_is_dp, rm);
278
- break;
279
- }
280
- } else {
281
- /* Two source operands. */
282
- gen_mov_F0_vreg(dp, rn);
283
- gen_mov_F1_vreg(dp, rm);
284
- }
285
-
286
- for (;;) {
287
- /* Perform the calculation. */
288
- switch (op) {
289
- case 15: /* extension space */
290
- switch (rn) {
291
- case 24: /* ftoui */
292
- gen_vfp_toui(dp, 0);
293
- break;
294
- case 25: /* ftouiz */
295
- gen_vfp_touiz(dp, 0);
296
- break;
297
- case 26: /* ftosi */
298
- gen_vfp_tosi(dp, 0);
299
- break;
300
- case 27: /* ftosiz */
301
- gen_vfp_tosiz(dp, 0);
302
- break;
303
- default: /* undefined */
304
- g_assert_not_reached();
305
- }
306
- break;
307
- default: /* undefined */
308
- return 1;
309
- }
310
-
311
- /* Write back the result, if any. */
312
- if (!no_output) {
313
- gen_mov_vreg_F0(rd_is_dp, rd);
314
- }
315
-
316
- /* break out of the loop if we have finished */
317
- if (veclen == 0) {
318
- break;
319
- }
320
-
321
- if (op == 15 && delta_m == 0) {
322
- /* single source one-many */
323
- while (veclen--) {
324
- rd = ((rd + delta_d) & (bank_mask - 1))
325
- | (rd & bank_mask);
326
- gen_mov_vreg_F0(dp, rd);
327
- }
328
- break;
329
- }
330
- /* Setup the next operands. */
331
- veclen--;
332
- rd = ((rd + delta_d) & (bank_mask - 1))
333
- | (rd & bank_mask);
334
-
335
- if (op == 15) {
336
- /* One source operand. */
337
- rm = ((rm + delta_m) & (bank_mask - 1))
338
- | (rm & bank_mask);
339
- gen_mov_F0_vreg(dp, rm);
340
- } else {
341
- /* Two source operands. */
342
- rn = ((rn + delta_d) & (bank_mask - 1))
343
- | (rn & bank_mask);
344
- gen_mov_F0_vreg(dp, rn);
345
- if (delta_m) {
346
- rm = ((rm + delta_m) & (bank_mask - 1))
347
- | (rm & bank_mask);
348
- gen_mov_F1_vreg(dp, rm);
349
- }
350
- }
351
- }
352
- }
81
- }
353
- break;
82
- fn_gvec = u ? gen_helper_gvec_udot_idx_b : gen_helper_gvec_sdot_idx_b;
354
- case 0xc:
83
- /* rm is just Vm, and index is M. */
355
- case 0xd:
84
- data = extract32(insn, 5, 1); /* index */
356
- /* Already handled by decodetree */
85
- rm = extract32(insn, 0, 4);
357
- return 1;
86
- } else if ((insn & 0xffa00f10) == 0xfe000810) {
358
- default:
87
+ if ((insn & 0xffa00f10) == 0xfe000810) {
359
- /* Should never happen. */
88
/* VFM[AS]L -- 1111 1110 0.0S .... .... 1000 .Q.1 .... */
360
- return 1;
89
int is_s = extract32(insn, 20, 1);
361
- }
90
int vm20 = extract32(insn, 0, 3);
362
- return 0;
363
+ /* If the decodetree decoder didn't handle this insn, it must be UNDEF */
364
+ return 1;
365
}
366
367
static inline bool use_goto_tb(DisasContext *s, target_ulong dest)
368
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
369
index XXXXXXX..XXXXXXX 100644
370
--- a/target/arm/vfp.decode
371
+++ b/target/arm/vfp.decode
372
@@ -XXX,XX +XXX,XX @@ VCVT_fix_sp ---- 1110 1.11 1.1. .... 1010 .1.0 .... \
373
vd=%vd_sp imm=%vm_sp opc=%vcvt_fix_op
374
VCVT_fix_dp ---- 1110 1.11 1.1. .... 1011 .1.0 .... \
375
vd=%vd_dp imm=%vm_sp opc=%vcvt_fix_op
376
+
377
+# VCVT float to integer (VCVT and VCVTR): Vd always single; Vd depends on size
378
+VCVT_sp_int ---- 1110 1.11 110 s:1 .... 1010 rz:1 1.0 .... \
379
+ vd=%vd_sp vm=%vm_sp
380
+VCVT_dp_int ---- 1110 1.11 110 s:1 .... 1011 rz:1 1.0 .... \
381
+ vd=%vd_sp vm=%vm_dp
382
--
91
--
383
2.20.1
92
2.20.1
384
93
385
94
Convert the VCVTT and VCVTB instructions which convert from
f32 and f64 to f16 to decodetree.

Since we're no longer constrained to the old decoder's style
using cpu_F0s and cpu_F0d we can perform a direct 16 bit
store of the right half of the input single-precision register
rather than doing a load/modify/store sequence on the full
32 bits.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c |  62 ++++++++++++++++++++++++++
 target/arm/translate.c         |  79 +---------------------------------
 target/arm/vfp.decode          |   6 +++
 3 files changed, 69 insertions(+), 78 deletions(-)

Convert the VFM[AS]L (scalar) insns in the 2reg-scalar-ext group
to decodetree. These are the last ones in the group so we can remove
all the legacy decode for the group.

Note that in disas_thumb2_insn() the parts of this encoding space
where the decodetree decoder returns false will correctly be directed
to illegal_op by the "(insn & (1 << 28))" check so they won't fall
into disas_coproc_insn() by mistake.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-11-peter.maydell@linaro.org
---
 target/arm/neon-shared.decode   |   7 +++
 target/arm/translate-neon.inc.c |  32 ++++
 target/arm/translate.c          | 107 +-------------------------------
 3 files changed, 40 insertions(+), 106 deletions(-)
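The direct 16-bit store works because a VCVTT/VCVTB result only ever
touches one half of the 32-bit Sd register, leaving the other half
untouched. A hypothetical standalone illustration of that write-back
(invented helper, not QEMU code):

#include <stdint.h>
#include <stdio.h>

/* Insert an f16 bit pattern into the top (t=1) or bottom (t=0) half. */
static uint32_t write_f16(uint32_t sd, uint16_t h, int t)
{
    if (t) {
        return (sd & 0x0000ffffu) | ((uint32_t)h << 16); /* VCVTT */
    }
    return (sd & 0xffff0000u) | h;                       /* VCVTB */
}

int main(void)
{
    /* writing f16 1.0 (0x3c00) to the top half preserves the bottom */
    printf("0x%08x\n", (unsigned)write_f16(0xdeadbeefu, 0x3c00, 1));
    return 0;
}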
18
diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
19
diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
19
index XXXXXXX..XXXXXXX 100644
20
index XXXXXXX..XXXXXXX 100644
20
--- a/target/arm/translate-vfp.inc.c
21
--- a/target/arm/neon-shared.decode
21
+++ b/target/arm/translate-vfp.inc.c
22
+++ b/target/arm/neon-shared.decode
22
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f64_f16(DisasContext *s, arg_VCVT_f64_f16 *a)
23
@@ -XXX,XX +XXX,XX @@ VCMLA_scalar 1111 1110 1 . rot:2 .... .... 1000 . q:1 . 0 .... \
23
tcg_temp_free_i64(vd);
24
25
VDOT_scalar 1111 1110 0 . 10 .... .... 1101 . q:1 index:1 u:1 rm:4 \
26
vm=%vm_dp vn=%vn_dp vd=%vd_dp
27
+
28
+%vfml_scalar_q0_rm 0:3 5:1
29
+%vfml_scalar_q1_index 5:1 3:1
30
+VFML_scalar 1111 1110 0 . 0 s:1 .... .... 1000 . 0 . 1 index:1 ... \
31
+ rm=%vfml_scalar_q0_rm vn=%vn_sp vd=%vd_dp q=0
32
+VFML_scalar 1111 1110 0 . 0 s:1 .... .... 1000 . 1 . 1 . rm:3 \
33
+ index=%vfml_scalar_q1_index vn=%vn_dp vd=%vd_dp q=1
34
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
35
index XXXXXXX..XXXXXXX 100644
36
--- a/target/arm/translate-neon.inc.c
37
+++ b/target/arm/translate-neon.inc.c
38
@@ -XXX,XX +XXX,XX @@ static bool trans_VDOT_scalar(DisasContext *s, arg_VDOT_scalar *a)
39
tcg_temp_free_ptr(fpst);
24
return true;
40
return true;
25
}
41
}
26
+
42
+
27
+static bool trans_VCVT_f16_f32(DisasContext *s, arg_VCVT_f16_f32 *a)
43
+static bool trans_VFML_scalar(DisasContext *s, arg_VFML_scalar *a)
28
+{
44
+{
29
+ TCGv_ptr fpst;
45
+ int opr_sz;
30
+ TCGv_i32 ahp_mode;
46
+
31
+ TCGv_i32 tmp;
47
+ if (!dc_isar_feature(aa32_fhm, s)) {
32
+
48
+ return false;
33
+ if (!dc_isar_feature(aa32_fp16_spconv, s)) {
49
+ }
50
+
51
+ /* UNDEF accesses to D16-D31 if they don't exist. */
52
+ if (!dc_isar_feature(aa32_simd_r32, s) &&
53
+ ((a->vd & 0x10) || (a->q && (a->vn & 0x10)))) {
54
+ return false;
55
+ }
56
+
57
+ if (a->vd & a->q) {
34
+ return false;
58
+ return false;
35
+ }
59
+ }
36
+
60
+
37
+ if (!vfp_access_check(s)) {
61
+ if (!vfp_access_check(s)) {
38
+ return true;
62
+ return true;
39
+ }
63
+ }
40
+
64
+
41
+ fpst = get_fpstatus_ptr(false);
65
+ opr_sz = (1 + a->q) * 8;
42
+ ahp_mode = get_ahp_flag();
66
+ tcg_gen_gvec_3_ptr(vfp_reg_offset(1, a->vd),
43
+ tmp = tcg_temp_new_i32();
67
+ vfp_reg_offset(a->q, a->vn),
44
+
68
+ vfp_reg_offset(a->q, a->rm),
45
+ neon_load_reg32(tmp, a->vm);
69
+ cpu_env, opr_sz, opr_sz,
46
+ gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp_mode);
70
+ (a->index << 2) | a->s, /* is_2 == 0 */
47
+ tcg_gen_st16_i32(tmp, cpu_env, vfp_f16_offset(a->vd, a->t));
71
+ gen_helper_gvec_fmlal_idx_a32);
48
+ tcg_temp_free_i32(ahp_mode);
49
+ tcg_temp_free_ptr(fpst);
50
+ tcg_temp_free_i32(tmp);
51
+ return true;
52
+}
53
+
54
+static bool trans_VCVT_f16_f64(DisasContext *s, arg_VCVT_f16_f64 *a)
55
+{
56
+ TCGv_ptr fpst;
57
+ TCGv_i32 ahp_mode;
58
+ TCGv_i32 tmp;
59
+ TCGv_i64 vm;
60
+
61
+ if (!dc_isar_feature(aa32_fp16_dpconv, s)) {
62
+ return false;
63
+ }
64
+
65
+ /* UNDEF accesses to D16-D31 if they don't exist. */
66
+ if (!dc_isar_feature(aa32_fp_d32, s) && (a->vm & 0x10)) {
67
+ return false;
68
+ }
69
+
70
+ if (!vfp_access_check(s)) {
71
+ return true;
72
+ }
73
+
74
+ fpst = get_fpstatus_ptr(false);
75
+ ahp_mode = get_ahp_flag();
76
+ tmp = tcg_temp_new_i32();
77
+ vm = tcg_temp_new_i64();
78
+
79
+ neon_load_reg64(vm, a->vm);
80
+ gen_helper_vfp_fcvt_f64_to_f16(tmp, vm, fpst, ahp_mode);
81
+ tcg_temp_free_i64(vm);
82
+ tcg_gen_st16_i32(tmp, cpu_env, vfp_f16_offset(a->vd, a->t));
83
+ tcg_temp_free_i32(ahp_mode);
84
+ tcg_temp_free_ptr(fpst);
85
+ tcg_temp_free_i32(tmp);
86
+ return true;
72
+ return true;
87
+}
73
+}
88
diff --git a/target/arm/translate.c b/target/arm/translate.c
74
diff --git a/target/arm/translate.c b/target/arm/translate.c
89
index XXXXXXX..XXXXXXX 100644
75
index XXXXXXX..XXXXXXX 100644
90
--- a/target/arm/translate.c
76
--- a/target/arm/translate.c
91
+++ b/target/arm/translate.c
77
+++ b/target/arm/translate.c
92
@@ -XXX,XX +XXX,XX @@ static int disas_dsp_insn(DisasContext *s, uint32_t insn)
78
@@ -XXX,XX +XXX,XX @@ static int disas_dsp_insn(DisasContext *s, uint32_t insn)
93
#define VFP_SREG_M(insn) VFP_SREG(insn, 0, 5)
79
}
80
81
#define VFP_REG_SHR(x, n) (((n) > 0) ? (x) >> (n) : (x) << -(n))
82
-#define VFP_SREG(insn, bigbit, smallbit) \
83
- ((VFP_REG_SHR(insn, bigbit - 1) & 0x1e) | (((insn) >> (smallbit)) & 1))
84
#define VFP_DREG(reg, insn, bigbit, smallbit) do { \
85
if (dc_isar_feature(aa32_simd_r32, s)) { \
86
reg = (((insn) >> (bigbit)) & 0x0f) \
87
@@ -XXX,XX +XXX,XX @@ static int disas_dsp_insn(DisasContext *s, uint32_t insn)
88
reg = ((insn) >> (bigbit)) & 0x0f; \
89
}} while (0)
90
91
-#define VFP_SREG_D(insn) VFP_SREG(insn, 12, 22)
92
#define VFP_DREG_D(reg, insn) VFP_DREG(reg, insn, 12, 22)
93
-#define VFP_SREG_N(insn) VFP_SREG(insn, 16, 7)
94
#define VFP_DREG_N(reg, insn) VFP_DREG(reg, insn, 16, 7)
95
-#define VFP_SREG_M(insn) VFP_SREG(insn, 0, 5)
94
#define VFP_DREG_M(reg, insn) VFP_DREG(reg, insn, 0, 5)
96
#define VFP_DREG_M(reg, insn) VFP_DREG(reg, insn, 0, 5)
95
97
96
-/* Move between integer and VFP cores. */
98
static void gen_neon_dup_low16(TCGv_i32 var)
97
-static TCGv_i32 gen_vfp_mrs(void)
99
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
100
return 0;
101
}
102
103
-/* Advanced SIMD two registers and a scalar extension.
104
- * 31 24 23 22 20 16 12 11 10 9 8 3 0
105
- * +-----------------+----+---+----+----+----+---+----+---+----+---------+----+
106
- * | 1 1 1 1 1 1 1 0 | o1 | D | o2 | Vn | Vd | 1 | o3 | 0 | o4 | N Q M U | Vm |
107
- * +-----------------+----+---+----+----+----+---+----+---+----+---------+----+
108
- *
109
- */
110
-
111
-static int disas_neon_insn_2reg_scalar_ext(DisasContext *s, uint32_t insn)
98
-{
112
-{
99
- TCGv_i32 tmp = tcg_temp_new_i32();
113
- gen_helper_gvec_3 *fn_gvec = NULL;
100
- tcg_gen_mov_i32(tmp, cpu_F0s);
114
- gen_helper_gvec_3_ptr *fn_gvec_ptr = NULL;
101
- return tmp;
115
- int rd, rn, rm, opr_sz, data;
116
- int off_rn, off_rm;
117
- bool is_long = false, q = extract32(insn, 6, 1);
118
- bool ptr_is_env = false;
119
-
120
- if ((insn & 0xffa00f10) == 0xfe000810) {
121
- /* VFM[AS]L -- 1111 1110 0.0S .... .... 1000 .Q.1 .... */
122
- int is_s = extract32(insn, 20, 1);
123
- int vm20 = extract32(insn, 0, 3);
124
- int vm3 = extract32(insn, 3, 1);
125
- int m = extract32(insn, 5, 1);
126
- int index;
127
-
128
- if (!dc_isar_feature(aa32_fhm, s)) {
129
- return 1;
130
- }
131
- if (q) {
132
- rm = vm20;
133
- index = m * 2 + vm3;
134
- } else {
135
- rm = vm20 * 2 + m;
136
- index = vm3;
137
- }
138
- is_long = true;
139
- data = (index << 2) | is_s; /* is_2 == 0 */
140
- fn_gvec_ptr = gen_helper_gvec_fmlal_idx_a32;
141
- ptr_is_env = true;
142
- } else {
143
- return 1;
144
- }
145
-
146
- VFP_DREG_D(rd, insn);
147
- if (rd & q) {
148
- return 1;
149
- }
150
- if (q || !is_long) {
151
- VFP_DREG_N(rn, insn);
152
- if (rn & q & !is_long) {
153
- return 1;
154
- }
155
- off_rn = vfp_reg_offset(1, rn);
156
- off_rm = vfp_reg_offset(1, rm);
157
- } else {
158
- rn = VFP_SREG_N(insn);
159
- off_rn = vfp_reg_offset(0, rn);
160
- off_rm = vfp_reg_offset(0, rm);
161
- }
162
- if (s->fp_excp_el) {
163
- gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
164
- syn_simd_access_trap(1, 0xe, false), s->fp_excp_el);
165
- return 0;
166
- }
167
- if (!s->vfp_enabled) {
168
- return 1;
169
- }
170
-
171
- opr_sz = (1 + q) * 8;
172
- if (fn_gvec_ptr) {
173
- TCGv_ptr ptr;
174
- if (ptr_is_env) {
175
- ptr = cpu_env;
176
- } else {
177
- ptr = get_fpstatus_ptr(1);
178
- }
179
- tcg_gen_gvec_3_ptr(vfp_reg_offset(1, rd), off_rn, off_rm, ptr,
180
- opr_sz, opr_sz, data, fn_gvec_ptr);
181
- if (!ptr_is_env) {
182
- tcg_temp_free_ptr(ptr);
183
- }
184
- } else {
185
- tcg_gen_gvec_3_ool(vfp_reg_offset(1, rd), off_rn, off_rm,
186
- opr_sz, opr_sz, data, fn_gvec);
187
- }
188
- return 0;
102
-}
189
-}
103
-
190
-
104
-static void gen_vfp_msr(TCGv_i32 tmp)
191
static int disas_coproc_insn(DisasContext *s, uint32_t insn)
105
-{
106
- tcg_gen_mov_i32(cpu_F0s, tmp);
107
- tcg_temp_free_i32(tmp);
108
-}
109
-
110
static void gen_neon_dup_low16(TCGv_i32 var)
111
{
192
{
112
TCGv_i32 tmp = tcg_temp_new_i32();
193
int cpnum, is64, crn, crm, opc1, opc2, isread, rt, rt2;
113
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
194
@@ -XXX,XX +XXX,XX @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
114
{
195
}
115
uint32_t rd, rn, rm, op, delta_d, delta_m, bank_mask;
196
}
116
int dp, veclen;
197
}
117
- TCGv_i32 tmp;
198
- } else if ((insn & 0x0f000a00) == 0x0e000800
118
- TCGv_i32 tmp2;
199
- && arm_dc_feature(s, ARM_FEATURE_V8)) {
119
200
- if (disas_neon_insn_2reg_scalar_ext(s, insn)) {
120
if (!arm_dc_feature(s, ARM_FEATURE_VFP)) {
201
- goto illegal_op;
121
return 1;
202
- }
122
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
203
- return;
123
return 1;
204
}
124
case 15:
205
goto illegal_op;
125
switch (rn) {
206
}
126
- case 0 ... 5:
207
@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
127
- case 8 ... 11:
208
}
128
+ case 0 ... 11:
209
break;
129
/* Already handled by decodetree */
210
}
130
return 1;
211
- if ((insn & 0xff000a00) == 0xfe000800
131
default:
212
- && arm_dc_feature(s, ARM_FEATURE_V8)) {
132
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
213
- /* The Thumb2 and ARM encodings are identical. */
133
if (op == 15) {
214
- if (disas_neon_insn_2reg_scalar_ext(s, insn)) {
134
/* rn is opcode, encoded as per VFP_SREG_N. */
215
- goto illegal_op;
135
switch (rn) {
216
- }
136
- case 0x06: /* vcvtb.f16.f32, vcvtb.f16.f64 */
217
- } else if (((insn >> 24) & 3) == 3) {
137
- case 0x07: /* vcvtt.f16.f32, vcvtt.f16.f64 */
218
+ if (((insn >> 24) & 3) == 3) {
138
- if (dp) {
219
/* Translate into the equivalent ARM encoding. */
139
- if (!dc_isar_feature(aa32_fp16_dpconv, s)) {
220
insn = (insn & 0xe2ffffff) | ((insn & (1 << 28)) >> 4) | (1 << 28);
140
- return 1;
221
if (disas_neon_data_insn(s, insn)) {
141
- }
142
- } else {
143
- if (!dc_isar_feature(aa32_fp16_spconv, s)) {
144
- return 1;
145
- }
146
- }
147
- rd_is_dp = false;
148
- break;
149
-
150
case 0x0c: /* vrintr */
151
case 0x0d: /* vrintz */
152
case 0x0e: /* vrintx */
153
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
154
switch (op) {
155
case 15: /* extension space */
156
switch (rn) {
157
- case 6: /* vcvtb.f16.f32, vcvtb.f16.f64 */
158
- {
159
- TCGv_ptr fpst = get_fpstatus_ptr(false);
160
- TCGv_i32 ahp = get_ahp_flag();
161
- tmp = tcg_temp_new_i32();
162
-
163
- if (dp) {
164
- gen_helper_vfp_fcvt_f64_to_f16(tmp, cpu_F0d,
165
- fpst, ahp);
166
- } else {
167
- gen_helper_vfp_fcvt_f32_to_f16(tmp, cpu_F0s,
168
- fpst, ahp);
169
- }
170
- tcg_temp_free_i32(ahp);
171
- tcg_temp_free_ptr(fpst);
172
- gen_mov_F0_vreg(0, rd);
173
- tmp2 = gen_vfp_mrs();
174
- tcg_gen_andi_i32(tmp2, tmp2, 0xffff0000);
175
- tcg_gen_or_i32(tmp, tmp, tmp2);
176
- tcg_temp_free_i32(tmp2);
177
- gen_vfp_msr(tmp);
178
- break;
179
- }
180
- case 7: /* vcvtt.f16.f32, vcvtt.f16.f64 */
181
- {
182
- TCGv_ptr fpst = get_fpstatus_ptr(false);
183
- TCGv_i32 ahp = get_ahp_flag();
184
- tmp = tcg_temp_new_i32();
185
- if (dp) {
186
- gen_helper_vfp_fcvt_f64_to_f16(tmp, cpu_F0d,
187
- fpst, ahp);
188
- } else {
189
- gen_helper_vfp_fcvt_f32_to_f16(tmp, cpu_F0s,
190
- fpst, ahp);
191
- }
192
- tcg_temp_free_i32(ahp);
193
- tcg_temp_free_ptr(fpst);
194
- tcg_gen_shli_i32(tmp, tmp, 16);
195
- gen_mov_F0_vreg(0, rd);
196
- tmp2 = gen_vfp_mrs();
197
- tcg_gen_ext16u_i32(tmp2, tmp2);
198
- tcg_gen_or_i32(tmp, tmp, tmp2);
199
- tcg_temp_free_i32(tmp2);
200
- gen_vfp_msr(tmp);
201
- break;
202
- }
203
case 12: /* vrintr */
204
{
205
TCGv_ptr fpst = get_fpstatus_ptr(0);
206
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
207
index XXXXXXX..XXXXXXX 100644
208
--- a/target/arm/vfp.decode
209
+++ b/target/arm/vfp.decode
210
@@ -XXX,XX +XXX,XX @@ VCVT_f32_f16 ---- 1110 1.11 0010 .... 1010 t:1 1.0 .... \
211
vd=%vd_sp vm=%vm_sp
212
VCVT_f64_f16 ---- 1110 1.11 0010 .... 1011 t:1 1.0 .... \
213
vd=%vd_dp vm=%vm_sp
214
+
215
+# VCVTB and VCVTT to f16: Vd format is always vd_sp; Vm format depends on size bit
216
+VCVT_f16_f32 ---- 1110 1.11 0011 .... 1010 t:1 1.0 .... \
217
+ vd=%vd_sp vm=%vm_sp
218
+VCVT_f16_f64 ---- 1110 1.11 0011 .... 1011 t:1 1.0 .... \
219
+ vd=%vd_sp vm=%vm_dp
220
--
222
--
221
2.20.1
223
2.20.1
222
224
223
225
diff view generated by jsdifflib
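
A note on the decodetree syntax above: a %field line concatenates its
listed bit ranges with the leftmost subfield most significant, so the
two VFML_scalar fields reproduce exactly the bit arithmetic of the
legacy decoder this patch deletes. The sketch below is illustrative
only (these helper names are invented; the real extractors are
generated by scripts/decodetree.py):

    /* uses extract32() from include/qemu/bitops.h */

    /* %vfml_scalar_q0_rm 0:3 5:1 -- the old "rm = vm20 * 2 + m" */
    static int vfml_scalar_q0_rm(uint32_t insn)
    {
        return (extract32(insn, 0, 3) << 1) | extract32(insn, 5, 1);
    }

    /* %vfml_scalar_q1_index 5:1 3:1 -- the old "index = m * 2 + vm3" */
    static int vfml_scalar_q1_index(uint32_t insn)
    {
        return (extract32(insn, 5, 1) << 1) | extract32(insn, 3, 1);
    }
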
Convert the Neon "load/store multiple structures" insns to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-12-peter.maydell@linaro.org
---
 target/arm/neon-ls.decode       |   7 ++
 target/arm/translate-neon.inc.c | 124 ++++++++++++++++++++++++++++++++
 target/arm/translate.c          |  91 +----------------------
 3 files changed, 133 insertions(+), 89 deletions(-)

diff --git a/target/arm/neon-ls.decode b/target/arm/neon-ls.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-ls.decode
+++ b/target/arm/neon-ls.decode
@@ -XXX,XX +XXX,XX @@
 # 0b1111_1001_xxx0_xxxx_xxxx_xxxx_xxxx_xxxx
 # This file works on the A32 encoding only; calling code for T32 has to
 # transform the insn into the A32 version first.
+
+%vd_dp  22:1 12:4
+
+# Neon load/store multiple structures
+
+VLDST_multiple 1111 0100 0 . l:1 0 rn:4 .... itype:4 size:2 align:2 rm:4 \
+               vd=%vd_dp
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VFML_scalar(DisasContext *s, arg_VFML_scalar *a)
                        gen_helper_gvec_fmlal_idx_a32);
     return true;
 }
+
+static struct {
+    int nregs;
+    int interleave;
+    int spacing;
+} const neon_ls_element_type[11] = {
+    {1, 4, 1},
+    {1, 4, 2},
+    {4, 1, 1},
+    {2, 2, 2},
+    {1, 3, 1},
+    {1, 3, 2},
+    {3, 1, 1},
+    {1, 1, 1},
+    {1, 2, 1},
+    {1, 2, 2},
+    {2, 1, 1}
+};
+
+static void gen_neon_ldst_base_update(DisasContext *s, int rm, int rn,
+                                      int stride)
+{
+    if (rm != 15) {
+        TCGv_i32 base;
+
+        base = load_reg(s, rn);
+        if (rm == 13) {
+            tcg_gen_addi_i32(base, base, stride);
+        } else {
+            TCGv_i32 index;
+            index = load_reg(s, rm);
+            tcg_gen_add_i32(base, base, index);
+            tcg_temp_free_i32(index);
+        }
+        store_reg(s, rn, base);
+    }
+}
+
+static bool trans_VLDST_multiple(DisasContext *s, arg_VLDST_multiple *a)
+{
+    /* Neon load/store multiple structures */
+    int nregs, interleave, spacing, reg, n;
+    MemOp endian = s->be_data;
+    int mmu_idx = get_mem_index(s);
+    int size = a->size;
+    TCGv_i64 tmp64;
+    TCGv_i32 addr, tmp;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist */
+    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vd & 0x10)) {
+        return false;
+    }
+    if (a->itype > 10) {
+        return false;
+    }
+    /* Catch UNDEF cases for bad values of align field */
+    switch (a->itype & 0xc) {
+    case 4:
+        if (a->align >= 2) {
+            return false;
+        }
+        break;
+    case 8:
+        if (a->align == 3) {
+            return false;
+        }
+        break;
+    default:
+        break;
+    }
+    nregs = neon_ls_element_type[a->itype].nregs;
+    interleave = neon_ls_element_type[a->itype].interleave;
+    spacing = neon_ls_element_type[a->itype].spacing;
+    if (size == 3 && (interleave | spacing) != 1) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    /* For our purposes, bytes are always little-endian. */
+    if (size == 0) {
+        endian = MO_LE;
+    }
+    /*
+     * Consecutive little-endian elements from a single register
+     * can be promoted to a larger little-endian operation.
+     */
+    if (interleave == 1 && endian == MO_LE) {
+        size = 3;
+    }
+    tmp64 = tcg_temp_new_i64();
+    addr = tcg_temp_new_i32();
+    tmp = tcg_const_i32(1 << size);
+    load_reg_var(s, addr, a->rn);
+    for (reg = 0; reg < nregs; reg++) {
+        for (n = 0; n < 8 >> size; n++) {
+            int xs;
+            for (xs = 0; xs < interleave; xs++) {
+                int tt = a->vd + reg + spacing * xs;
+
+                if (a->l) {
+                    gen_aa32_ld_i64(s, tmp64, addr, mmu_idx, endian | size);
+                    neon_store_element64(tt, n, size, tmp64);
+                } else {
+                    neon_load_element64(tmp64, tt, n, size);
+                    gen_aa32_st_i64(s, tmp64, addr, mmu_idx, endian | size);
+                }
+                tcg_gen_add_i32(addr, addr, tmp);
+            }
+        }
+    }
+    tcg_temp_free_i32(addr);
+    tcg_temp_free_i32(tmp);
+    tcg_temp_free_i64(tmp64);
+
+    gen_neon_ldst_base_update(s, a->rm, a->rn, nregs * interleave * 8);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void gen_neon_trn_u16(TCGv_i32 t0, TCGv_i32 t1)
 }

-static struct {
-    int nregs;
-    int interleave;
-    int spacing;
-} const neon_ls_element_type[11] = {
-    {1, 4, 1},
-    {1, 4, 2},
-    {4, 1, 1},
-    {2, 2, 2},
-    {1, 3, 1},
-    {1, 3, 2},
-    {3, 1, 1},
-    {1, 1, 1},
-    {1, 2, 1},
-    {1, 2, 2},
-    {2, 1, 1}
-};
-
 /* Translate a NEON load/store element instruction.  Return nonzero if the
    instruction is invalid.  */
 static int disas_neon_ls_insn(DisasContext *s, uint32_t insn)
 {
     int rd, rn, rm;
-    int op;
     int nregs;
-    int interleave;
-    int spacing;
     int stride;
     int size;
     int reg;
     int load;
-    int n;
     int vec_size;
-    int mmu_idx;
-    MemOp endian;
     TCGv_i32 addr;
     TCGv_i32 tmp;
-    TCGv_i32 tmp2;
-    TCGv_i64 tmp64;

     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return 1;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_ls_insn(DisasContext *s, uint32_t insn)
     rn = (insn >> 16) & 0xf;
     rm = insn & 0xf;
     load = (insn & (1 << 21)) != 0;
-    endian = s->be_data;
-    mmu_idx = get_mem_index(s);
     if ((insn & (1 << 23)) == 0) {
-        /* Load store all elements.  */
-        op = (insn >> 8) & 0xf;
-        size = (insn >> 6) & 3;
-        if (op > 10)
-            return 1;
-        /* Catch UNDEF cases for bad values of align field */
-        switch (op & 0xc) {
-        case 4:
-            if (((insn >> 5) & 1) == 1) {
-                return 1;
-            }
-            break;
-        case 8:
-            if (((insn >> 4) & 3) == 3) {
-                return 1;
-            }
-            break;
-        default:
-            break;
-        }
-        nregs = neon_ls_element_type[op].nregs;
-        interleave = neon_ls_element_type[op].interleave;
-        spacing = neon_ls_element_type[op].spacing;
-        if (size == 3 && (interleave | spacing) != 1) {
-            return 1;
-        }
-        /* For our purposes, bytes are always little-endian.  */
-        if (size == 0) {
-            endian = MO_LE;
-        }
-        /* Consecutive little-endian elements from a single register
-         * can be promoted to a larger little-endian operation.
-         */
-        if (interleave == 1 && endian == MO_LE) {
-            size = 3;
-        }
-        tmp64 = tcg_temp_new_i64();
-        addr = tcg_temp_new_i32();
-        tmp2 = tcg_const_i32(1 << size);
-        load_reg_var(s, addr, rn);
-        for (reg = 0; reg < nregs; reg++) {
-            for (n = 0; n < 8 >> size; n++) {
-                int xs;
-                for (xs = 0; xs < interleave; xs++) {
-                    int tt = rd + reg + spacing * xs;
-
-                    if (load) {
-                        gen_aa32_ld_i64(s, tmp64, addr, mmu_idx, endian | size);
-                        neon_store_element64(tt, n, size, tmp64);
-                    } else {
-                        neon_load_element64(tmp64, tt, n, size);
-                        gen_aa32_st_i64(s, tmp64, addr, mmu_idx, endian | size);
-                    }
-                    tcg_gen_add_i32(addr, addr, tmp2);
-                }
-            }
-        }
-        tcg_temp_free_i32(addr);
-        tcg_temp_free_i32(tmp2);
-        tcg_temp_free_i64(tmp64);
-        stride = nregs * interleave * 8;
+        /* Load store all elements -- handled already by decodetree */
+        return 1;
     } else {
         size = (insn >> 10) & 3;
         if (size == 3) {
--
2.20.1

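
Since gen_neon_ldst_base_update() introduced above is reused by all of
the remaining Neon load/store conversions, its post-indexing rule is
worth restating in isolation. This is a standalone plain-C sketch, not
QEMU code (the helper name and flat integer arguments are invented for
illustration):

    /*
     * Post-indexing for Neon load/store: Rm == 15 means no writeback,
     * Rm == 13 means "advance by the transfer size", and any other Rm
     * means "advance by the value of general-purpose register Rm".
     */
    static int neon_base_after(int rm, int base, int stride, int rm_value)
    {
        if (rm == 15) {
            return base;            /* leave Rn unchanged */
        } else if (rm == 13) {
            return base + stride;   /* post-increment by transfer size */
        } else {
            return base + rm_value; /* post-index by Rm */
        }
    }
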
Convert the Neon "load single structure to all lanes" insns to
decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-13-peter.maydell@linaro.org
---
 target/arm/neon-ls.decode       |  5 +++
 target/arm/translate-neon.inc.c | 73 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 55 +------------------------
 3 files changed, 80 insertions(+), 53 deletions(-)

diff --git a/target/arm/neon-ls.decode b/target/arm/neon-ls.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-ls.decode
+++ b/target/arm/neon-ls.decode
@@ -XXX,XX +XXX,XX @@

 VLDST_multiple 1111 0100 0 . l:1 0 rn:4 .... itype:4 size:2 align:2 rm:4 \
                vd=%vd_dp
+
+# Neon load single element to all lanes
+
+VLD_all_lanes 1111 0100 1 . 1 0 rn:4 .... 11 n:2 size:2 t:1 a:1 rm:4 \
+              vd=%vd_dp
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDST_multiple(DisasContext *s, arg_VLDST_multiple *a)
     gen_neon_ldst_base_update(s, a->rm, a->rn, nregs * interleave * 8);
     return true;
 }
+
+static bool trans_VLD_all_lanes(DisasContext *s, arg_VLD_all_lanes *a)
+{
+    /* Neon load single structure to all lanes */
+    int reg, stride, vec_size;
+    int vd = a->vd;
+    int size = a->size;
+    int nregs = a->n + 1;
+    TCGv_i32 addr, tmp;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist */
+    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vd & 0x10)) {
+        return false;
+    }
+
+    if (size == 3) {
+        if (nregs != 4 || a->a == 0) {
+            return false;
+        }
+        /* For VLD4 size == 3 a == 1 means 32 bits at 16 byte alignment */
+        size = 2;
+    }
+    if (nregs == 1 && a->a == 1 && size == 0) {
+        return false;
+    }
+    if (nregs == 3 && a->a == 1) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    /*
+     * VLD1 to all lanes: T bit indicates how many Dregs to write.
+     * VLD2/3/4 to all lanes: T bit indicates register stride.
+     */
+    stride = a->t ? 2 : 1;
+    vec_size = nregs == 1 ? stride * 8 : 8;
+
+    tmp = tcg_temp_new_i32();
+    addr = tcg_temp_new_i32();
+    load_reg_var(s, addr, a->rn);
+    for (reg = 0; reg < nregs; reg++) {
+        gen_aa32_ld_i32(s, tmp, addr, get_mem_index(s),
+                        s->be_data | size);
+        if ((vd & 1) && vec_size == 16) {
+            /*
+             * We cannot write 16 bytes at once because the
+             * destination is unaligned.
+             */
+            tcg_gen_gvec_dup_i32(size, neon_reg_offset(vd, 0),
+                                 8, 8, tmp);
+            tcg_gen_gvec_mov(0, neon_reg_offset(vd + 1, 0),
+                             neon_reg_offset(vd, 0), 8, 8);
+        } else {
+            tcg_gen_gvec_dup_i32(size, neon_reg_offset(vd, 0),
+                                 vec_size, vec_size, tmp);
+        }
+        tcg_gen_addi_i32(addr, addr, 1 << size);
+        vd += stride;
+    }
+    tcg_temp_free_i32(tmp);
+    tcg_temp_free_i32(addr);
+
+    gen_neon_ldst_base_update(s, a->rm, a->rn, (1 << size) * nregs);
+
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_ls_insn(DisasContext *s, uint32_t insn)
     int size;
     int reg;
     int load;
-    int vec_size;
     TCGv_i32 addr;
     TCGv_i32 tmp;

@@ -XXX,XX +XXX,XX @@ static int disas_neon_ls_insn(DisasContext *s, uint32_t insn)
     } else {
         size = (insn >> 10) & 3;
         if (size == 3) {
-            /* Load single element to all lanes.  */
-            int a = (insn >> 4) & 1;
-            if (!load) {
-                return 1;
-            }
-            size = (insn >> 6) & 3;
-            nregs = ((insn >> 8) & 3) + 1;
-
-            if (size == 3) {
-                if (nregs != 4 || a == 0) {
-                    return 1;
-                }
-                /* For VLD4 size==3 a == 1 means 32 bits at 16 byte alignment */
-                size = 2;
-            }
-            if (nregs == 1 && a == 1 && size == 0) {
-                return 1;
-            }
-            if (nregs == 3 && a == 1) {
-                return 1;
-            }
-            addr = tcg_temp_new_i32();
-            load_reg_var(s, addr, rn);
-
-            /* VLD1 to all lanes: bit 5 indicates how many Dregs to write.
-             * VLD2/3/4 to all lanes: bit 5 indicates register stride.
-             */
-            stride = (insn & (1 << 5)) ? 2 : 1;
-            vec_size = nregs == 1 ? stride * 8 : 8;
-
-            tmp = tcg_temp_new_i32();
-            for (reg = 0; reg < nregs; reg++) {
-                gen_aa32_ld_i32(s, tmp, addr, get_mem_index(s),
-                                s->be_data | size);
-                if ((rd & 1) && vec_size == 16) {
-                    /* We cannot write 16 bytes at once because the
-                     * destination is unaligned.
-                     */
-                    tcg_gen_gvec_dup_i32(size, neon_reg_offset(rd, 0),
-                                         8, 8, tmp);
-                    tcg_gen_gvec_mov(0, neon_reg_offset(rd + 1, 0),
-                                     neon_reg_offset(rd, 0), 8, 8);
-                } else {
-                    tcg_gen_gvec_dup_i32(size, neon_reg_offset(rd, 0),
-                                         vec_size, vec_size, tmp);
-                }
-                tcg_gen_addi_i32(addr, addr, 1 << size);
-                rd += stride;
-            }
-            tcg_temp_free_i32(tmp);
-            tcg_temp_free_i32(addr);
-            stride = (1 << size) * nregs;
+            /* Load single element to all lanes -- handled by decodetree */
+            return 1;
         } else {
             /* Single element.  */
             int idx = (insn >> 4) & 0xf;
--
2.20.1

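
One subtlety in trans_VLD_all_lanes() above deserves a gloss: a VLD1
to-all-lanes that writes two D registers wants a single 16-byte vector
dup, but when vd is odd the destination pair does not sit at a
16-byte-aligned offset in the register file, so the code falls back to
an 8-byte dup plus an 8-byte copy. A minimal sketch of the guard
condition (standalone illustration; the predicate name is invented):

    #include <stdbool.h>

    /* vd is the D-register index, vec_size the intended write in bytes */
    static bool can_write_16_bytes_at_once(int vd, int vec_size)
    {
        /* 16-byte gvec writes must start on an even D register */
        return !(vec_size == 16 && (vd & 1));
    }
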
1
Convert the "single-precision" register moves to decodetree:
1
Convert the Neon "load/store single structure to one lane" insns to
2
* VMSR
2
decodetree.
3
* VMRS
3
4
* VMOV between general purpose register and single precision
4
As this is the last set of insns in the neon load/store group,
5
5
we can remove the whole disas_neon_ls_insn() function.
6
Note that the VMSR/VMRS conversions make our handling of
7
the "should this UNDEF?" checks consistent between the two
8
instructions:
9
* VMSR to MVFR0, MVFR1, MVFR2 now UNDEF from EL0
10
(previously was a nop)
11
* VMSR to FPSID now UNDEFs from EL0 or if VFPv3 or better
12
(previously was a nop)
13
* VMSR to FPINST and FPINST2 now UNDEF if VFPv3 or better
14
(previously would write to the register, which had no
15
guest-visible effect because we always UNDEF reads)
16
17
We also tighten up the decode: we were previously underdecoding
18
some SBZ or SBO bits.
19
20
The conversion of VMOV_single includes the expansion out of the
21
gen_mov_F0_vreg()/gen_vfp_mrs() and gen_mov_vreg_F0()/gen_vfp_msr()
22
sequences into the simpler direct load/store of the TCG temp via
23
neon_{load,store}_reg32(): we know in the new function that we're
24
always single-precision, we don't need to use the old-and-deprecated
25
cpu_F0* TCG globals, and we don't happen to have the declaration of
26
gen_vfp_msr() and gen_vfp_mrs() at the point in the file where the
27
new function is.
28
6
29
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
30
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
8
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
9
Message-id: 20200430181003.21682-14-peter.maydell@linaro.org
31
---
10
---
32
target/arm/translate-vfp.inc.c | 161 +++++++++++++++++++++++++++++++++
11
target/arm/neon-ls.decode | 11 +++
33
target/arm/translate.c | 148 +-----------------------------
12
target/arm/translate-neon.inc.c | 89 +++++++++++++++++++
34
target/arm/vfp.decode | 4 +
13
target/arm/translate.c | 147 --------------------------------
35
3 files changed, 168 insertions(+), 145 deletions(-)
14
3 files changed, 100 insertions(+), 147 deletions(-)
36
15
37
diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
16
diff --git a/target/arm/neon-ls.decode b/target/arm/neon-ls.decode
38
index XXXXXXX..XXXXXXX 100644
17
index XXXXXXX..XXXXXXX 100644
39
--- a/target/arm/translate-vfp.inc.c
18
--- a/target/arm/neon-ls.decode
40
+++ b/target/arm/translate-vfp.inc.c
19
+++ b/target/arm/neon-ls.decode
41
@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
20
@@ -XXX,XX +XXX,XX @@ VLDST_multiple 1111 0100 0 . l:1 0 rn:4 .... itype:4 size:2 align:2 rm:4 \
21
22
VLD_all_lanes 1111 0100 1 . 1 0 rn:4 .... 11 n:2 size:2 t:1 a:1 rm:4 \
23
vd=%vd_dp
24
+
25
+# Neon load/store single structure to one lane
26
+%imm1_5_p1 5:1 !function=plus1
27
+%imm1_6_p1 6:1 !function=plus1
28
+
29
+VLDST_single 1111 0100 1 . l:1 0 rn:4 .... 00 n:2 reg_idx:3 align:1 rm:4 \
30
+ vd=%vd_dp size=0 stride=1
31
+VLDST_single 1111 0100 1 . l:1 0 rn:4 .... 01 n:2 reg_idx:2 align:2 rm:4 \
32
+ vd=%vd_dp size=1 stride=%imm1_5_p1
33
+VLDST_single 1111 0100 1 . l:1 0 rn:4 .... 10 n:2 reg_idx:1 align:3 rm:4 \
34
+ vd=%vd_dp size=2 stride=%imm1_6_p1
35
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
36
index XXXXXXX..XXXXXXX 100644
37
--- a/target/arm/translate-neon.inc.c
38
+++ b/target/arm/translate-neon.inc.c
39
@@ -XXX,XX +XXX,XX @@
40
* It might be possible to convert it to a standalone .c file eventually.
41
*/
42
43
+static inline int plus1(DisasContext *s, int x)
44
+{
45
+ return x + 1;
46
+}
47
+
48
/* Include the generated Neon decoder */
49
#include "decode-neon-dp.inc.c"
50
#include "decode-neon-ls.inc.c"
51
@@ -XXX,XX +XXX,XX @@ static bool trans_VLD_all_lanes(DisasContext *s, arg_VLD_all_lanes *a)
42
52
43
return true;
53
return true;
44
}
54
}
45
+
55
+
46
+static bool trans_VMSR_VMRS(DisasContext *s, arg_VMSR_VMRS *a)
56
+static bool trans_VLDST_single(DisasContext *s, arg_VLDST_single *a)
47
+{
57
+{
48
+ TCGv_i32 tmp;
58
+ /* Neon load/store single structure to one lane */
49
+ bool ignore_vfp_enabled = false;
59
+ int reg;
50
+
60
+ int nregs = a->n + 1;
51
+ if (arm_dc_feature(s, ARM_FEATURE_M)) {
61
+ int vd = a->vd;
52
+ /*
62
+ TCGv_i32 addr, tmp;
53
+ * The only M-profile VFP vmrs/vmsr sysreg is FPSCR.
63
+
54
+ * Writes to R15 are UNPREDICTABLE; we choose to undef.
64
+ if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
55
+ */
65
+ return false;
56
+ if (a->rt == 15 || a->reg != ARM_VFP_FPSCR) {
66
+ }
67
+
68
+ /* UNDEF accesses to D16-D31 if they don't exist */
69
+ if (!dc_isar_feature(aa32_simd_r32, s) && (a->vd & 0x10)) {
70
+ return false;
71
+ }
72
+
73
+ /* Catch the UNDEF cases. This is unavoidably a bit messy. */
74
+ switch (nregs) {
75
+ case 1:
76
+ if (((a->align & (1 << a->size)) != 0) ||
77
+ (a->size == 2 && ((a->align & 3) == 1 || (a->align & 3) == 2))) {
57
+ return false;
78
+ return false;
58
+ }
79
+ }
59
+ }
80
+ break;
60
+
81
+ case 3:
61
+ switch (a->reg) {
82
+ if ((a->align & 1) != 0) {
62
+ case ARM_VFP_FPSID:
63
+ /*
64
+ * VFPv2 allows access to FPSID from userspace; VFPv3 restricts
65
+ * all ID registers to privileged access only.
66
+ */
67
+ if (IS_USER(s) && arm_dc_feature(s, ARM_FEATURE_VFP3)) {
68
+ return false;
83
+ return false;
69
+ }
84
+ }
70
+ ignore_vfp_enabled = true;
85
+ /* fall through */
86
+ case 2:
87
+ if (a->size == 2 && (a->align & 2) != 0) {
88
+ return false;
89
+ }
71
+ break;
90
+ break;
72
+ case ARM_VFP_MVFR0:
91
+ case 4:
73
+ case ARM_VFP_MVFR1:
92
+ if ((a->size == 2) && ((a->align & 3) == 3)) {
74
+ if (IS_USER(s) || !arm_dc_feature(s, ARM_FEATURE_MVFR)) {
75
+ return false;
76
+ }
77
+ ignore_vfp_enabled = true;
78
+ break;
79
+ case ARM_VFP_MVFR2:
80
+ if (IS_USER(s) || !arm_dc_feature(s, ARM_FEATURE_V8)) {
81
+ return false;
82
+ }
83
+ ignore_vfp_enabled = true;
84
+ break;
85
+ case ARM_VFP_FPSCR:
86
+ break;
87
+ case ARM_VFP_FPEXC:
88
+ if (IS_USER(s)) {
89
+ return false;
90
+ }
91
+ ignore_vfp_enabled = true;
92
+ break;
93
+ case ARM_VFP_FPINST:
94
+ case ARM_VFP_FPINST2:
95
+ /* Not present in VFPv3 */
96
+ if (IS_USER(s) || arm_dc_feature(s, ARM_FEATURE_VFP3)) {
97
+ return false;
93
+ return false;
98
+ }
94
+ }
99
+ break;
95
+ break;
100
+ default:
96
+ default:
97
+ abort();
98
+ }
99
+ if ((vd + a->stride * (nregs - 1)) > 31) {
100
+ /*
101
+ * Attempts to write off the end of the register file are
102
+ * UNPREDICTABLE; we choose to UNDEF because otherwise we would
103
+ * access off the end of the array that holds the register data.
104
+ */
101
+ return false;
105
+ return false;
102
+ }
106
+ }
103
+
104
+ if (!full_vfp_access_check(s, ignore_vfp_enabled)) {
105
+ return true;
106
+ }
107
+
108
+ if (a->l) {
109
+ /* VMRS, move VFP special register to gp register */
110
+ switch (a->reg) {
111
+ case ARM_VFP_FPSID:
112
+ case ARM_VFP_FPEXC:
113
+ case ARM_VFP_FPINST:
114
+ case ARM_VFP_FPINST2:
115
+ case ARM_VFP_MVFR0:
116
+ case ARM_VFP_MVFR1:
117
+ case ARM_VFP_MVFR2:
118
+ tmp = load_cpu_field(vfp.xregs[a->reg]);
119
+ break;
120
+ case ARM_VFP_FPSCR:
121
+ if (a->rt == 15) {
122
+ tmp = load_cpu_field(vfp.xregs[ARM_VFP_FPSCR]);
123
+ tcg_gen_andi_i32(tmp, tmp, 0xf0000000);
124
+ } else {
125
+ tmp = tcg_temp_new_i32();
126
+ gen_helper_vfp_get_fpscr(tmp, cpu_env);
127
+ }
128
+ break;
129
+ default:
130
+ g_assert_not_reached();
131
+ }
132
+
133
+ if (a->rt == 15) {
134
+ /* Set the 4 flag bits in the CPSR. */
135
+ gen_set_nzcv(tmp);
136
+ tcg_temp_free_i32(tmp);
137
+ } else {
138
+ store_reg(s, a->rt, tmp);
139
+ }
140
+ } else {
141
+ /* VMSR, move gp register to VFP special register */
142
+ switch (a->reg) {
143
+ case ARM_VFP_FPSID:
144
+ case ARM_VFP_MVFR0:
145
+ case ARM_VFP_MVFR1:
146
+ case ARM_VFP_MVFR2:
147
+ /* Writes are ignored. */
148
+ break;
149
+ case ARM_VFP_FPSCR:
150
+ tmp = load_reg(s, a->rt);
151
+ gen_helper_vfp_set_fpscr(cpu_env, tmp);
152
+ tcg_temp_free_i32(tmp);
153
+ gen_lookup_tb(s);
154
+ break;
155
+ case ARM_VFP_FPEXC:
156
+ /*
157
+ * TODO: VFP subarchitecture support.
158
+ * For now, keep the EN bit only
159
+ */
160
+ tmp = load_reg(s, a->rt);
161
+ tcg_gen_andi_i32(tmp, tmp, 1 << 30);
162
+ store_cpu_field(tmp, vfp.xregs[a->reg]);
163
+ gen_lookup_tb(s);
164
+ break;
165
+ case ARM_VFP_FPINST:
166
+ case ARM_VFP_FPINST2:
167
+ tmp = load_reg(s, a->rt);
168
+ store_cpu_field(tmp, vfp.xregs[a->reg]);
169
+ break;
170
+ default:
171
+ g_assert_not_reached();
172
+ }
173
+ }
174
+
175
+ return true;
176
+}
177
+
178
+static bool trans_VMOV_single(DisasContext *s, arg_VMOV_single *a)
179
+{
180
+ TCGv_i32 tmp;
181
+
107
+
182
+ if (!vfp_access_check(s)) {
108
+ if (!vfp_access_check(s)) {
183
+ return true;
109
+ return true;
184
+ }
110
+ }
185
+
111
+
186
+ if (a->l) {
112
+ tmp = tcg_temp_new_i32();
187
+ /* VFP to general purpose register */
113
+ addr = tcg_temp_new_i32();
188
+ tmp = tcg_temp_new_i32();
114
+ load_reg_var(s, addr, a->rn);
189
+ neon_load_reg32(tmp, a->vn);
115
+ /*
190
+ if (a->rt == 15) {
116
+ * TODO: if we implemented alignment exceptions, we should check
191
+ /* Set the 4 flag bits in the CPSR. */
117
+ * addr against the alignment encoded in a->align here.
192
+ gen_set_nzcv(tmp);
118
+ */
193
+ tcg_temp_free_i32(tmp);
119
+ for (reg = 0; reg < nregs; reg++) {
194
+ } else {
120
+ if (a->l) {
195
+ store_reg(s, a->rt, tmp);
121
+ gen_aa32_ld_i32(s, tmp, addr, get_mem_index(s),
196
+ }
122
+ s->be_data | a->size);
197
+ } else {
123
+ neon_store_element(vd, a->reg_idx, a->size, tmp);
198
+ /* general purpose register to VFP */
124
+ } else { /* Store */
199
+ tmp = load_reg(s, a->rt);
125
+ neon_load_element(tmp, vd, a->reg_idx, a->size);
200
+ neon_store_reg32(tmp, a->vn);
126
+ gen_aa32_st_i32(s, tmp, addr, get_mem_index(s),
201
+ tcg_temp_free_i32(tmp);
127
+ s->be_data | a->size);
202
+ }
128
+ }
129
+ vd += a->stride;
130
+ tcg_gen_addi_i32(addr, addr, 1 << a->size);
131
+ }
132
+ tcg_temp_free_i32(addr);
133
+ tcg_temp_free_i32(tmp);
134
+
135
+ gen_neon_ldst_base_update(s, a->rm, a->rn, (1 << a->size) * nregs);
203
+
136
+
204
+ return true;
137
+ return true;
205
+}
138
+}
206
diff --git a/target/arm/translate.c b/target/arm/translate.c
139
diff --git a/target/arm/translate.c b/target/arm/translate.c
207
index XXXXXXX..XXXXXXX 100644
140
index XXXXXXX..XXXXXXX 100644
208
--- a/target/arm/translate.c
141
--- a/target/arm/translate.c
209
+++ b/target/arm/translate.c
142
+++ b/target/arm/translate.c
210
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
143
@@ -XXX,XX +XXX,XX @@ static void gen_neon_trn_u16(TCGv_i32 t0, TCGv_i32 t1)
211
TCGv_i32 addr;
144
tcg_temp_free_i32(rd);
212
TCGv_i32 tmp;
145
}
213
TCGv_i32 tmp2;
146
214
- bool ignore_vfp_enabled = false;
147
-
215
148
-/* Translate a NEON load/store element instruction. Return nonzero if the
216
if (!arm_dc_feature(s, ARM_FEATURE_VFP)) {
149
- instruction is invalid. */
217
return 1;
150
-static int disas_neon_ls_insn(DisasContext *s, uint32_t insn)
218
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
151
-{
219
* for invalid encodings; we will generate incorrect syndrome information
152
- int rd, rn, rm;
220
* for attempts to execute invalid vfp/neon encodings with FP disabled.
153
- int nregs;
221
*/
154
- int stride;
222
- if ((insn & 0x0fe00fff) == 0x0ee00a10) {
155
- int size;
223
- rn = (insn >> 16) & 0xf;
156
- int reg;
224
- if (rn == ARM_VFP_FPSID || rn == ARM_VFP_FPEXC || rn == ARM_VFP_MVFR2
157
- int load;
225
- || rn == ARM_VFP_MVFR1 || rn == ARM_VFP_MVFR0) {
158
- TCGv_i32 addr;
226
- ignore_vfp_enabled = true;
159
- TCGv_i32 tmp;
160
-
161
- if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
162
- return 1;
163
- }
164
-
165
- /* FIXME: this access check should not take precedence over UNDEF
166
- * for invalid encodings; we will generate incorrect syndrome information
167
- * for attempts to execute invalid vfp/neon encodings with FP disabled.
168
- */
169
- if (s->fp_excp_el) {
170
- gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
171
- syn_simd_access_trap(1, 0xe, false), s->fp_excp_el);
172
- return 0;
173
- }
174
-
175
- if (!s->vfp_enabled)
176
- return 1;
177
- VFP_DREG_D(rd, insn);
178
- rn = (insn >> 16) & 0xf;
179
- rm = insn & 0xf;
180
- load = (insn & (1 << 21)) != 0;
181
- if ((insn & (1 << 23)) == 0) {
182
- /* Load store all elements -- handled already by decodetree */
183
- return 1;
184
- } else {
185
- size = (insn >> 10) & 3;
186
- if (size == 3) {
187
- /* Load single element to all lanes -- handled by decodetree */
188
- return 1;
189
- } else {
190
- /* Single element. */
191
- int idx = (insn >> 4) & 0xf;
192
- int reg_idx;
193
- switch (size) {
194
- case 0:
195
- reg_idx = (insn >> 5) & 7;
196
- stride = 1;
197
- break;
198
- case 1:
199
- reg_idx = (insn >> 6) & 3;
200
- stride = (insn & (1 << 5)) ? 2 : 1;
201
- break;
202
- case 2:
203
- reg_idx = (insn >> 7) & 1;
204
- stride = (insn & (1 << 6)) ? 2 : 1;
205
- break;
206
- default:
207
- abort();
208
- }
209
- nregs = ((insn >> 8) & 3) + 1;
210
- /* Catch the UNDEF cases. This is unavoidably a bit messy. */
211
- switch (nregs) {
212
- case 1:
213
- if (((idx & (1 << size)) != 0) ||
214
- (size == 2 && ((idx & 3) == 1 || (idx & 3) == 2))) {
215
- return 1;
216
- }
217
- break;
218
- case 3:
219
- if ((idx & 1) != 0) {
220
- return 1;
221
- }
222
- /* fall through */
223
- case 2:
224
- if (size == 2 && (idx & 2) != 0) {
225
- return 1;
226
- }
227
- break;
228
- case 4:
229
- if ((size == 2) && ((idx & 3) == 3)) {
230
- return 1;
231
- }
232
- break;
233
- default:
234
- abort();
235
- }
236
- if ((rd + stride * (nregs - 1)) > 31) {
237
- /* Attempts to write off the end of the register file
238
- * are UNPREDICTABLE; we choose to UNDEF because otherwise
239
- * the neon_load_reg() would write off the end of the array.
240
- */
241
- return 1;
242
- }
243
- tmp = tcg_temp_new_i32();
244
- addr = tcg_temp_new_i32();
245
- load_reg_var(s, addr, rn);
246
- for (reg = 0; reg < nregs; reg++) {
247
- if (load) {
248
- gen_aa32_ld_i32(s, tmp, addr, get_mem_index(s),
249
- s->be_data | size);
250
- neon_store_element(rd, reg_idx, size, tmp);
251
- } else { /* Store */
252
- neon_load_element(tmp, rd, reg_idx, size);
253
- gen_aa32_st_i32(s, tmp, addr, get_mem_index(s),
254
- s->be_data | size);
255
- }
256
- rd += stride;
257
- tcg_gen_addi_i32(addr, addr, 1 << size);
258
- }
259
- tcg_temp_free_i32(addr);
260
- tcg_temp_free_i32(tmp);
261
- stride = nregs * (1 << size);
227
- }
262
- }
228
- }
263
- }
229
- if (!full_vfp_access_check(s, ignore_vfp_enabled)) {
264
- if (rm != 15) {
230
+    if (!vfp_access_check(s)) {
         return 0;
     }

@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     switch ((insn >> 24) & 0xf) {
     case 0xe:
         if (insn & (1 << 4)) {
-            /* single register transfer */
-            rd = (insn >> 12) & 0xf;
-            if (dp) {
-                /* already handled by decodetree */
-                return 1;
-            } else { /* !dp */
-                bool is_sysreg;
-
-                if ((insn & 0x6f) != 0x00)
-                    return 1;
-                rn = VFP_SREG_N(insn);
-
-                is_sysreg = extract32(insn, 21, 1);
-
-                if (arm_dc_feature(s, ARM_FEATURE_M)) {
-                    /*
-                     * The only M-profile VFP vmrs/vmsr sysreg is FPSCR.
-                     * Writes to R15 are UNPREDICTABLE; we choose to undef.
-                     */
-                    if (is_sysreg && (rd == 15 || (rn >> 1) != ARM_VFP_FPSCR)) {
-                        return 1;
-                    }
-                }
-
-                if (insn & ARM_CP_RW_BIT) {
-                    /* vfp->arm */
-                    if (is_sysreg) {
-                        /* system register */
-                        rn >>= 1;
-
-                        switch (rn) {
-                        case ARM_VFP_FPSID:
-                            /* VFP2 allows access to FSID from userspace.
-                               VFP3 restricts all id registers to privileged
-                               accesses.  */
-                            if (IS_USER(s)
-                                && arm_dc_feature(s, ARM_FEATURE_VFP3)) {
-                                return 1;
-                            }
-                            tmp = load_cpu_field(vfp.xregs[rn]);
-                            break;
-                        case ARM_VFP_FPEXC:
-                            if (IS_USER(s))
-                                return 1;
-                            tmp = load_cpu_field(vfp.xregs[rn]);
-                            break;
-                        case ARM_VFP_FPINST:
-                        case ARM_VFP_FPINST2:
-                            /* Not present in VFP3.  */
-                            if (IS_USER(s)
-                                || arm_dc_feature(s, ARM_FEATURE_VFP3)) {
-                                return 1;
-                            }
-                            tmp = load_cpu_field(vfp.xregs[rn]);
-                            break;
-                        case ARM_VFP_FPSCR:
-                            if (rd == 15) {
-                                tmp = load_cpu_field(vfp.xregs[ARM_VFP_FPSCR]);
-                                tcg_gen_andi_i32(tmp, tmp, 0xf0000000);
-                            } else {
-                                tmp = tcg_temp_new_i32();
-                                gen_helper_vfp_get_fpscr(tmp, cpu_env);
-                            }
-                            break;
-                        case ARM_VFP_MVFR2:
-                            if (!arm_dc_feature(s, ARM_FEATURE_V8)) {
-                                return 1;
-                            }
-                            /* fall through */
-                        case ARM_VFP_MVFR0:
-                        case ARM_VFP_MVFR1:
-                            if (IS_USER(s)
-                                || !arm_dc_feature(s, ARM_FEATURE_MVFR)) {
-                                return 1;
-                            }
-                            tmp = load_cpu_field(vfp.xregs[rn]);
-                            break;
-                        default:
-                            return 1;
-                        }
-                    } else {
-                        gen_mov_F0_vreg(0, rn);
-                        tmp = gen_vfp_mrs();
-                    }
-                    if (rd == 15) {
-                        /* Set the 4 flag bits in the CPSR.  */
-                        gen_set_nzcv(tmp);
-                        tcg_temp_free_i32(tmp);
-                    } else {
-                        store_reg(s, rd, tmp);
-                    }
-                } else {
-                    /* arm->vfp */
-                    if (is_sysreg) {
-                        rn >>= 1;
-                        /* system register */
-                        switch (rn) {
-                        case ARM_VFP_FPSID:
-                        case ARM_VFP_MVFR0:
-                        case ARM_VFP_MVFR1:
-                            /* Writes are ignored.  */
-                            break;
-                        case ARM_VFP_FPSCR:
-                            tmp = load_reg(s, rd);
-                            gen_helper_vfp_set_fpscr(cpu_env, tmp);
-                            tcg_temp_free_i32(tmp);
-                            gen_lookup_tb(s);
-                            break;
-                        case ARM_VFP_FPEXC:
-                            if (IS_USER(s))
-                                return 1;
-                            /* TODO: VFP subarchitecture support.
-                             * For now, keep the EN bit only */
-                            tmp = load_reg(s, rd);
-                            tcg_gen_andi_i32(tmp, tmp, 1 << 30);
-                            store_cpu_field(tmp, vfp.xregs[rn]);
-                            gen_lookup_tb(s);
-                            break;
-                        case ARM_VFP_FPINST:
-                        case ARM_VFP_FPINST2:
-                            if (IS_USER(s)) {
-                                return 1;
-                            }
-                            tmp = load_reg(s, rd);
-                            store_cpu_field(tmp, vfp.xregs[rn]);
-                            break;
-                        default:
-                            return 1;
-                        }
-                    } else {
-                        tmp = load_reg(s, rd);
-                        gen_vfp_msr(tmp);
-                        gen_mov_vreg_F0(0, rn);
-                    }
-                }
-            }
+            /* already handled by decodetree */
+            return 1;
         } else {
             /* data processing */
             bool rd_is_dp = dp;
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VMOV_from_gp ---- 1110 0 0 index:1 0 .... rt:4 1011 .00 1 0000 \

 VDUP         ---- 1110 1 b:1 q:1 0 .... rt:4 1011 . 0 e:1 1 0000 \
              vn=%vn_dp
+
+VMSR_VMRS    ---- 1110 111 l:1 reg:4 rt:4 1010 0001 0000
+VMOV_single  ---- 1110 000 l:1 .... rt:4 1010 . 001 0000 \
+             vn=%vn_sp
--
2.20.1

-        TCGv_i32 base;
-
-        base = load_reg(s, rn);
-        if (rm == 13) {
-            tcg_gen_addi_i32(base, base, stride);
-        } else {
-            TCGv_i32 index;
-            index = load_reg(s, rm);
-            tcg_gen_add_i32(base, base, index);
-            tcg_temp_free_i32(index);
-        }
-        store_reg(s, rn, base);
-    }
-    return 0;
-}
-
 static inline void gen_neon_narrow(int size, TCGv_i32 dest, TCGv_i64 src)
 {
     switch (size) {
@@ -XXX,XX +XXX,XX @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
         }
         return;
     }
-    if ((insn & 0x0f100000) == 0x04000000) {
-        /* NEON load/store.  */
-        if (disas_neon_ls_insn(s, insn)) {
-            goto illegal_op;
-        }
-        return;
-    }
     if ((insn & 0x0e000f00) == 0x0c000100) {
         if (arm_dc_feature(s, ARM_FEATURE_IWMMXT)) {
             /* iWMMXt register transfer.  */
@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
         }
         break;
     case 12:
-        if ((insn & 0x01100000) == 0x01000000) {
-            if (disas_neon_ls_insn(s, insn)) {
-                goto illegal_op;
-            }
-            break;
-        }
         goto illegal_op;
     default:
     illegal_op:
--
2.20.1
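The Rt == 15 case removed above is the "VMRS APSR_nzcv, FPSCR" form: only
FPSCR[31:28] (the N, Z, C and V flags) are transferred into the CPSR, which
is what the 0xf0000000 mask implements. A minimal host-side sketch of that
masking semantics, with an illustrative name rather than anything from the
QEMU API:

    #include <stdint.h>

    /* Model of "VMRS APSR_nzcv, FPSCR": only bits [31:28] reach the flags. */
    uint32_t fpscr_to_apsr_nzcv(uint32_t fpscr)
    {
        return fpscr & 0xf0000000;
    }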
Convert the VFP VMOV (immediate) instruction to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 129 +++++++++++++++++++++++++++++++++
 target/arm/translate.c         |  27 +------
 target/arm/vfp.decode          |   5 ++
 3 files changed, 136 insertions(+), 25 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VFM_dp(DisasContext *s, arg_VFM_sp *a)

     return true;
 }
+
+static bool trans_VMOV_imm_sp(DisasContext *s, arg_VMOV_imm_sp *a)
+{
+    uint32_t delta_d = 0;
+    uint32_t bank_mask = 0;
+    int veclen = s->vec_len;
+    TCGv_i32 fd;
+    uint32_t n, i, vd;
+
+    vd = a->vd;
+
+    if (!dc_isar_feature(aa32_fpshvec, s) &&
+        (veclen != 0 || s->vec_stride != 0)) {
+        return false;
+    }
+
+    if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    if (veclen > 0) {
+        bank_mask = 0x18;
+        /* Figure out what type of vector operation this is. */
+        if ((vd & bank_mask) == 0) {
+            /* scalar */
+            veclen = 0;
+        } else {
+            delta_d = s->vec_stride + 1;
+        }
+    }
+
+    n = (a->imm4h << 28) & 0x80000000;
+    i = ((a->imm4h << 4) & 0x70) | a->imm4l;
+    if (i & 0x40) {
+        i |= 0x780;
+    } else {
+        i |= 0x800;
+    }
+    n |= i << 19;
+
+    fd = tcg_temp_new_i32();
+    tcg_gen_movi_i32(fd, n);
+
+    for (;;) {
+        neon_store_reg32(fd, vd);
+
+        if (veclen == 0) {
+            break;
+        }
+
+        /* Set up the operands for the next iteration */
+        veclen--;
+        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
+    }
+
+    tcg_temp_free_i32(fd);
+    return true;
+}
+
+static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
+{
+    uint32_t delta_d = 0;
+    uint32_t bank_mask = 0;
+    int veclen = s->vec_len;
+    TCGv_i64 fd;
+    uint32_t n, i, vd;
+
+    vd = a->vd;
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_fp_d32, s) && (vd & 0x10)) {
+        return false;
+    }
+
+    if (!dc_isar_feature(aa32_fpshvec, s) &&
+        (veclen != 0 || s->vec_stride != 0)) {
+        return false;
+    }
+
+    if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    if (veclen > 0) {
+        bank_mask = 0xc;
+        /* Figure out what type of vector operation this is. */
+        if ((vd & bank_mask) == 0) {
+            /* scalar */
+            veclen = 0;
+        } else {
+            delta_d = (s->vec_stride >> 1) + 1;
+        }
+    }
+
+    n = (a->imm4h << 28) & 0x80000000;
+    i = ((a->imm4h << 4) & 0x70) | a->imm4l;
+    if (i & 0x40) {
+        i |= 0x3f80;
+    } else {
+        i |= 0x4000;
+    }
+    n |= i << 16;
+
+    fd = tcg_temp_new_i64();
+    tcg_gen_movi_i64(fd, ((uint64_t)n) << 32);
+
+    for (;;) {
+        neon_store_reg64(fd, vd);
+
+        if (veclen == 0) {
+            break;
+        }
+
+        /* Set up the operands for the next iteration */
+        veclen--;
+        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
+    }
+
+    tcg_temp_free_i64(fd);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void gen_neon_dup_high16(TCGv_i32 var)
  */
 static int disas_vfp_insn(DisasContext *s, uint32_t insn)
 {
-    uint32_t rd, rn, rm, op, i, n, delta_d, delta_m, bank_mask;
+    uint32_t rd, rn, rm, op, delta_d, delta_m, bank_mask;
     int dp, veclen;
     TCGv_i32 tmp;
     TCGv_i32 tmp2;
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     rn = VFP_SREG_N(insn);

     switch (op) {
-    case 0 ... 13:
+    case 0 ... 14:
         /* Already handled by decodetree */
         return 1;
     default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     for (;;) {
         /* Perform the calculation. */
         switch (op) {
-        case 14: /* fconst */
-            if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
-                return 1;
-            }
-
-            n = (insn << 12) & 0x80000000;
-            i = ((insn >> 12) & 0x70) | (insn & 0xf);
-            if (dp) {
-                if (i & 0x40)
-                    i |= 0x3f80;
-                else
-                    i |= 0x4000;
-                n |= i << 16;
-                tcg_gen_movi_i64(cpu_F0d, ((uint64_t)n) << 32);
-            } else {
-                if (i & 0x40)
-                    i |= 0x780;
-                else
-                    i |= 0x800;
-                n |= i << 19;
-                tcg_gen_movi_i32(cpu_F0s, n);
-            }
-            break;
         case 15: /* extension space */
             switch (rn) {
             case 0: /* cpy */
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VFM_sp ---- 1110 1.10 .... .... 1010 . o2:1 . 0 .... \
                vm=%vm_sp vn=%vn_sp vd=%vd_sp o1=2
 VFM_dp ---- 1110 1.10 .... .... 1011 . o2:1 . 0 .... \
                vm=%vm_dp vn=%vn_dp vd=%vd_dp o1=2
+
+VMOV_imm_sp ---- 1110 1.11 imm4h:4 .... 1010 0000 imm4l:4 \
+               vd=%vd_sp
+VMOV_imm_dp ---- 1110 1.11 imm4h:4 .... 1011 0000 imm4l:4 \
+               vd=%vd_dp
--
2.20.1

Convert the Neon 3-reg-same VADD and VSUB insns to decodetree.

Note that we don't need the neon_3r_sizes[op] check here because all
size values are OK for VADD and VSUB; we'll add this when we convert
the first insn that has size restrictions.

For this we need one of the GVecGen*Fn typedefs currently in
translate-a64.h; move them all to translate.h as a block so they
are visible to the 32-bit decoder.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-15-peter.maydell@linaro.org
---
 target/arm/translate-a64.h      |  9 --------
 target/arm/translate.h          |  9 ++++++++
 target/arm/neon-dp.decode       | 17 +++++++++++++++
 target/arm/translate-neon.inc.c | 38 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 14 ++++--------
 5 files changed, 68 insertions(+), 19 deletions(-)

diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.h
+++ b/target/arm/translate-a64.h
@@ -XXX,XX +XXX,XX @@ static inline int vec_full_reg_size(DisasContext *s)

 bool disas_sve(DisasContext *, uint32_t);

-/* Note that the gvec expanders operate on offsets + sizes. */
-typedef void GVecGen2Fn(unsigned, uint32_t, uint32_t, uint32_t, uint32_t);
-typedef void GVecGen2iFn(unsigned, uint32_t, uint32_t, int64_t,
-                         uint32_t, uint32_t);
-typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t,
-                        uint32_t, uint32_t, uint32_t);
-typedef void GVecGen4Fn(unsigned, uint32_t, uint32_t, uint32_t,
-                        uint32_t, uint32_t, uint32_t);
-
 #endif /* TARGET_ARM_TRANSLATE_A64_H */
diff --git a/target/arm/translate.h b/target/arm/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ void gen_sshl_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
 #define dc_isar_feature(name, ctx) \
     ({ DisasContext *ctx_ = (ctx); isar_feature_##name(ctx_->isar); })

+/* Note that the gvec expanders operate on offsets + sizes. */
+typedef void GVecGen2Fn(unsigned, uint32_t, uint32_t, uint32_t, uint32_t);
+typedef void GVecGen2iFn(unsigned, uint32_t, uint32_t, int64_t,
+                         uint32_t, uint32_t);
+typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t,
+                        uint32_t, uint32_t, uint32_t);
+typedef void GVecGen4Fn(unsigned, uint32_t, uint32_t, uint32_t,
+                        uint32_t, uint32_t, uint32_t);
+
 #endif /* TARGET_ARM_TRANSLATE_H */
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@
 #
 # This file is processed by scripts/decodetree.py
 #
+# VFP/Neon register fields; same as vfp.decode
+%vm_dp  5:1 0:4
+%vn_dp  7:1 16:4
+%vd_dp  22:1 12:4

 # Encodings for Neon data processing instructions where the T32 encoding
 # is a simple transformation of the A32 encoding.
@@ -XXX,XX +XXX,XX @@
 # 0b111p_1111_qqqq_qqqq_qqqq_qqqq_qqqq_qqqq
 # This file works on the A32 encoding only; calling code for T32 has to
 # transform the insn into the A32 version first.
+
+######################################################################
+# 3-reg-same grouping:
+# 1111 001 U 0 D sz:2 Vn:4 Vd:4 opc:4 N Q M op Vm:4
+######################################################################
+
+&3same vm vn vd q size
+
+@3same  .... ... . . . size:2 .... .... .... . q:1 . . .... \
+        &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+VADD_3s  1111 001 0 0 . .. .... .... 1000 . . . 0 ....  @3same
+VSUB_3s  1111 001 1 0 . .. .... .... 1000 . . . 0 ....  @3same
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDST_single(DisasContext *s, arg_VLDST_single *a)

     return true;
 }
+
+static bool do_3same(DisasContext *s, arg_3same *a, GVecGen3Fn fn)
+{
+    int vec_size = a->q ? 16 : 8;
+    int rd_ofs = neon_reg_offset(a->vd, 0);
+    int rn_ofs = neon_reg_offset(a->vn, 0);
+    int rm_ofs = neon_reg_offset(a->vm, 0);
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if ((a->vn | a->vm | a->vd) & a->q) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    fn(a->size, rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size);
+    return true;
+}
+
+#define DO_3SAME(INSN, FUNC)                                            \
+    static bool trans_##INSN##_3s(DisasContext *s, arg_3same *a)        \
+    {                                                                   \
+        return do_3same(s, a, FUNC);                                    \
+    }
+
+DO_3SAME(VADD, tcg_gen_gvec_add)
+DO_3SAME(VSUB, tcg_gen_gvec_sub)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         }
         return 0;

-    case NEON_3R_VADD_VSUB:
-        if (u) {
-            tcg_gen_gvec_sub(size, rd_ofs, rn_ofs, rm_ofs,
-                             vec_size, vec_size);
-        } else {
-            tcg_gen_gvec_add(size, rd_ofs, rn_ofs, rm_ofs,
-                             vec_size, vec_size);
-        }
-        return 0;
-
     case NEON_3R_VQADD:
         tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),
                        rn_ofs, rm_ofs, vec_size, vec_size,
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         tcg_gen_gvec_3(rd_ofs, rm_ofs, rn_ofs, vec_size, vec_size,
                        u ? &ushl_op[size] : &sshl_op[size]);
         return 0;
+
+    case NEON_3R_VADD_VSUB:
+        /* Already handled by decodetree */
+        return 1;
     }

     if (size == 3) {
--
2.20.1
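For readers unfamiliar with the gvec expander convention the commit message
above refers to: a GVecGen3Fn works on register-file byte offsets plus an
operation size (oprsz) and a maximum vector size (maxsz), clearing the tail
in between. A toy model, assuming a flat byte-array register file; the names
here are illustrative, not QEMU's:

    #include <stdint.h>
    #include <string.h>

    uint8_t regfile[64];   /* stand-in for the CPU vector register file */

    /* Shaped like a GVecGen3Fn: offsets in, element-wise op, tail cleared. */
    void toy_gvec_add8(uint32_t dofs, uint32_t aofs, uint32_t bofs,
                       uint32_t oprsz, uint32_t maxsz)
    {
        for (uint32_t i = 0; i < oprsz; i++) {
            regfile[dofs + i] = regfile[aofs + i] + regfile[bofs + i];
        }
        memset(regfile + dofs + oprsz, 0, maxsz - oprsz);
    }

In the patch above, do_3same() passes vec_size (8 bytes for a D-reg
operation, 16 for a Q-reg operation) as both oprsz and maxsz.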
Convert the VFP fused multiply-add instructions (VFNMA, VFNMS,
VFMA, VFMS) to decodetree.

Note that in the old decode structure we were implementing
these to honour the VFP vector stride/length. These instructions
were introduced in VFPv4, and in the v7A architecture they
are UNPREDICTABLE if the vector stride or length are non-zero.
In v8A they must UNDEF if stride or length are non-zero, like
all VFP instructions; we choose to UNDEF always.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 121 +++++++++++++++++++++++++++++++++
 target/arm/translate.c         |  53 +--------------
 target/arm/vfp.decode          |   9 +++
 3 files changed, 131 insertions(+), 52 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VDIV_dp(DisasContext *s, arg_VDIV_sp *a)
 {
     return do_vfp_3op_dp(s, gen_helper_vfp_divd, a->vd, a->vn, a->vm, false);
 }
+
+static bool trans_VFM_sp(DisasContext *s, arg_VFM_sp *a)
+{
+    /*
+     * VFNMA : fd = muladd(-fd,  fn, fm)
+     * VFNMS : fd = muladd(-fd, -fn, fm)
+     * VFMA  : fd = muladd( fd,  fn, fm)
+     * VFMS  : fd = muladd( fd, -fn, fm)
+     *
+     * These are fused multiply-add, and must be done as one floating
+     * point operation with no rounding between the multiplication and
+     * addition steps.  NB that doing the negations here as separate
+     * steps is correct : an input NaN should come out with its sign
+     * bit flipped if it is a negated-input.
+     */
+    TCGv_ptr fpst;
+    TCGv_i32 vn, vm, vd;
+
+    /*
+     * Present in VFPv4 only.
+     * In v7A, UNPREDICTABLE with non-zero vector length/stride; from
+     * v8A, must UNDEF. We choose to UNDEF for both v7A and v8A.
+     */
+    if (!arm_dc_feature(s, ARM_FEATURE_VFP4) ||
+        (s->vec_len != 0 || s->vec_stride != 0)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    vn = tcg_temp_new_i32();
+    vm = tcg_temp_new_i32();
+    vd = tcg_temp_new_i32();
+
+    neon_load_reg32(vn, a->vn);
+    neon_load_reg32(vm, a->vm);
+    if (a->o2) {
+        /* VFNMS, VFMS */
+        gen_helper_vfp_negs(vn, vn);
+    }
+    neon_load_reg32(vd, a->vd);
+    if (a->o1 & 1) {
+        /* VFNMA, VFNMS */
+        gen_helper_vfp_negs(vd, vd);
+    }
+    fpst = get_fpstatus_ptr(0);
+    gen_helper_vfp_muladds(vd, vn, vm, vd, fpst);
+    neon_store_reg32(vd, a->vd);
+
+    tcg_temp_free_ptr(fpst);
+    tcg_temp_free_i32(vn);
+    tcg_temp_free_i32(vm);
+    tcg_temp_free_i32(vd);
+
+    return true;
+}
+
+static bool trans_VFM_dp(DisasContext *s, arg_VFM_sp *a)
+{
+    /*
+     * VFNMA : fd = muladd(-fd,  fn, fm)
+     * VFNMS : fd = muladd(-fd, -fn, fm)
+     * VFMA  : fd = muladd( fd,  fn, fm)
+     * VFMS  : fd = muladd( fd, -fn, fm)
+     *
+     * These are fused multiply-add, and must be done as one floating
+     * point operation with no rounding between the multiplication and
+     * addition steps.  NB that doing the negations here as separate
+     * steps is correct : an input NaN should come out with its sign
+     * bit flipped if it is a negated-input.
+     */
+    TCGv_ptr fpst;
+    TCGv_i64 vn, vm, vd;
+
+    /*
+     * Present in VFPv4 only.
+     * In v7A, UNPREDICTABLE with non-zero vector length/stride; from
+     * v8A, must UNDEF. We choose to UNDEF for both v7A and v8A.
+     */
+    if (!arm_dc_feature(s, ARM_FEATURE_VFP4) ||
+        (s->vec_len != 0 || s->vec_stride != 0)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_fp_d32, s) && ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    vn = tcg_temp_new_i64();
+    vm = tcg_temp_new_i64();
+    vd = tcg_temp_new_i64();
+
+    neon_load_reg64(vn, a->vn);
+    neon_load_reg64(vm, a->vm);
+    if (a->o2) {
+        /* VFNMS, VFMS */
+        gen_helper_vfp_negd(vn, vn);
+    }
+    neon_load_reg64(vd, a->vd);
+    if (a->o1 & 1) {
+        /* VFNMA, VFNMS */
+        gen_helper_vfp_negd(vd, vd);
+    }
+    fpst = get_fpstatus_ptr(0);
+    gen_helper_vfp_muladdd(vd, vn, vm, vd, fpst);
+    neon_store_reg64(vd, a->vd);
+
+    tcg_temp_free_ptr(fpst);
+    tcg_temp_free_i64(vn);
+    tcg_temp_free_i64(vm);
+    tcg_temp_free_i64(vd);
+
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     rn = VFP_SREG_N(insn);

     switch (op) {
-    case 0 ... 8:
+    case 0 ... 13:
         /* Already handled by decodetree */
         return 1;
     default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     for (;;) {
         /* Perform the calculation. */
         switch (op) {
-        case 10: /* VFNMA : fd = muladd(-fd, fn, fm) */
-        case 11: /* VFNMS : fd = muladd(-fd, -fn, fm) */
-        case 12: /* VFMA  : fd = muladd( fd, fn, fm) */
-        case 13: /* VFMS  : fd = muladd( fd, -fn, fm) */
-            /* These are fused multiply-add, and must be done as one
-             * floating point operation with no rounding between the
-             * multiplication and addition steps.
-             * NB that doing the negations here as separate steps is
-             * correct : an input NaN should come out with its sign bit
-             * flipped if it is a negated-input.
-             */
-            if (!arm_dc_feature(s, ARM_FEATURE_VFP4)) {
-                return 1;
-            }
-            if (dp) {
-                TCGv_ptr fpst;
-                TCGv_i64 frd;
-                if (op & 1) {
-                    /* VFNMS, VFMS */
-                    gen_helper_vfp_negd(cpu_F0d, cpu_F0d);
-                }
-                frd = tcg_temp_new_i64();
-                tcg_gen_ld_f64(frd, cpu_env, vfp_reg_offset(dp, rd));
-                if (op & 2) {
-                    /* VFNMA, VFNMS */
-                    gen_helper_vfp_negd(frd, frd);
-                }
-                fpst = get_fpstatus_ptr(0);
-                gen_helper_vfp_muladdd(cpu_F0d, cpu_F0d,
-                                       cpu_F1d, frd, fpst);
-                tcg_temp_free_ptr(fpst);
-                tcg_temp_free_i64(frd);
-            } else {
-                TCGv_ptr fpst;
-                TCGv_i32 frd;
-                if (op & 1) {
-                    /* VFNMS, VFMS */
-                    gen_helper_vfp_negs(cpu_F0s, cpu_F0s);
-                }
-                frd = tcg_temp_new_i32();
-                tcg_gen_ld_f32(frd, cpu_env, vfp_reg_offset(dp, rd));
-                if (op & 2) {
-                    gen_helper_vfp_negs(frd, frd);
-                }
-                fpst = get_fpstatus_ptr(0);
-                gen_helper_vfp_muladds(cpu_F0s, cpu_F0s,
-                                       cpu_F1s, frd, fpst);
-                tcg_temp_free_ptr(fpst);
-                tcg_temp_free_i32(frd);
-            }
-            break;
         case 14: /* fconst */
             if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
                 return 1;
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VDIV_sp ---- 1110 1.00 .... .... 1010 .0.0 .... \
                vm=%vm_sp vn=%vn_sp vd=%vd_sp
 VDIV_dp ---- 1110 1.00 .... .... 1011 .0.0 .... \
                vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+VFM_sp ---- 1110 1.01 .... .... 1010 . o2:1 . 0 .... \
+               vm=%vm_sp vn=%vn_sp vd=%vd_sp o1=1
+VFM_dp ---- 1110 1.01 .... .... 1011 . o2:1 . 0 .... \
+               vm=%vm_dp vn=%vn_dp vd=%vd_dp o1=1
+VFM_sp ---- 1110 1.10 .... .... 1010 . o2:1 . 0 .... \
+               vm=%vm_sp vn=%vn_sp vd=%vd_sp o1=2
+VFM_dp ---- 1110 1.10 .... .... 1011 . o2:1 . 0 .... \
+               vm=%vm_dp vn=%vn_dp vd=%vd_dp o1=2
--
2.20.1

Convert the Neon logic ops in the 3-reg-same grouping to decodetree.
Note that for the logic ops the 'size' field forms part of their
decode and the actual operations are always bitwise.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-16-peter.maydell@linaro.org
---
 target/arm/neon-dp.decode       | 12 +++++++++++
 target/arm/translate-neon.inc.c | 19 +++++++++++++++++
 target/arm/translate.c          | 38 +--------------------------------
 3 files changed, 32 insertions(+), 37 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@
 @3same  .... ... . . . size:2 .... .... .... . q:1 . . .... \
         &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp

+@3same_logic  .... ... . . . .. .... .... .... . q:1 .. .... \
+              &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp size=0
+
+VAND_3s  1111 001 0 0 . 00 .... .... 0001 ... 1 ....  @3same_logic
+VBIC_3s  1111 001 0 0 . 01 .... .... 0001 ... 1 ....  @3same_logic
+VORR_3s  1111 001 0 0 . 10 .... .... 0001 ... 1 ....  @3same_logic
+VORN_3s  1111 001 0 0 . 11 .... .... 0001 ... 1 ....  @3same_logic
+VEOR_3s  1111 001 1 0 . 00 .... .... 0001 ... 1 ....  @3same_logic
+VBSL_3s  1111 001 1 0 . 01 .... .... 0001 ... 1 ....  @3same_logic
+VBIT_3s  1111 001 1 0 . 10 .... .... 0001 ... 1 ....  @3same_logic
+VBIF_3s  1111 001 1 0 . 11 .... .... 0001 ... 1 ....  @3same_logic
+
 VADD_3s  1111 001 0 0 . .. .... .... 1000 . . . 0 ....  @3same
 VSUB_3s  1111 001 1 0 . .. .... .... 1000 . . . 0 ....  @3same
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool do_3same(DisasContext *s, arg_3same *a, GVecGen3Fn fn)

 DO_3SAME(VADD, tcg_gen_gvec_add)
 DO_3SAME(VSUB, tcg_gen_gvec_sub)
+DO_3SAME(VAND, tcg_gen_gvec_and)
+DO_3SAME(VBIC, tcg_gen_gvec_andc)
+DO_3SAME(VORR, tcg_gen_gvec_or)
+DO_3SAME(VORN, tcg_gen_gvec_orc)
+DO_3SAME(VEOR, tcg_gen_gvec_xor)
+
+/* These insns are all gvec_bitsel but with the inputs in various orders. */
+#define DO_3SAME_BITSEL(INSN, O1, O2, O3)                               \
+    static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
+                                uint32_t rn_ofs, uint32_t rm_ofs,       \
+                                uint32_t oprsz, uint32_t maxsz)         \
+    {                                                                   \
+        tcg_gen_gvec_bitsel(vece, rd_ofs, O1, O2, O3, oprsz, maxsz);    \
+    }                                                                   \
+    DO_3SAME(INSN, gen_##INSN##_3s)
+
+DO_3SAME_BITSEL(VBSL, rd_ofs, rn_ofs, rm_ofs)
+DO_3SAME_BITSEL(VBIT, rm_ofs, rn_ofs, rd_ofs)
+DO_3SAME_BITSEL(VBIF, rm_ofs, rd_ofs, rn_ofs)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         }
         return 1;

-    case NEON_3R_LOGIC: /* Logic ops.  */
-        switch ((u << 2) | size) {
-        case 0: /* VAND */
-            tcg_gen_gvec_and(0, rd_ofs, rn_ofs, rm_ofs,
-                             vec_size, vec_size);
-            break;
-        case 1: /* VBIC */
-            tcg_gen_gvec_andc(0, rd_ofs, rn_ofs, rm_ofs,
-                              vec_size, vec_size);
-            break;
-        case 2: /* VORR */
-            tcg_gen_gvec_or(0, rd_ofs, rn_ofs, rm_ofs,
-                            vec_size, vec_size);
-            break;
-        case 3: /* VORN */
-            tcg_gen_gvec_orc(0, rd_ofs, rn_ofs, rm_ofs,
-                             vec_size, vec_size);
-            break;
-        case 4: /* VEOR */
-            tcg_gen_gvec_xor(0, rd_ofs, rn_ofs, rm_ofs,
-                             vec_size, vec_size);
-            break;
-        case 5: /* VBSL */
-            tcg_gen_gvec_bitsel(MO_8, rd_ofs, rd_ofs, rn_ofs, rm_ofs,
-                                vec_size, vec_size);
-            break;
-        case 6: /* VBIT */
-            tcg_gen_gvec_bitsel(MO_8, rd_ofs, rm_ofs, rn_ofs, rd_ofs,
-                                vec_size, vec_size);
-            break;
-        case 7: /* VBIF */
-            tcg_gen_gvec_bitsel(MO_8, rd_ofs, rm_ofs, rd_ofs, rn_ofs,
-                                vec_size, vec_size);
-            break;
-        }
-        return 0;
-
     case NEON_3R_VQADD:
         tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),
                        rn_ofs, rm_ofs, vec_size, vec_size,
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         return 0;

     case NEON_3R_VADD_VSUB:
+    case NEON_3R_LOGIC:
         /* Already handled by decodetree */
         return 1;
     }
--
2.20.1
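The dataflow described in the VFM commit message above can be sanity-checked
against the C99 fused multiply-add, which (like the vfp_muladd helpers)
rounds only once. A sketch of the four variants, negating the inputs before
the single fused operation; a host-side model only, not QEMU code:

    #include <math.h>

    /* o1_neg_d: VFNMA/VFNMS negate fd; o2_neg_n: VFMS/VFNMS negate fn. */
    float vfp_vfm_model(float fd, float fn, float fm,
                        int o1_neg_d, int o2_neg_n)
    {
        if (o2_neg_n) {
            fn = -fn;
        }
        if (o1_neg_d) {
            fd = -fd;
        }
        return fmaf(fn, fm, fd);  /* one fused op, no intermediate rounding */
    }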
Convert the VFP comparison instructions to decodetree.

Note that comparison instructions should not honour the VFP
short-vector length and stride information: they are scalar-only
operations. This applies to all the 2-operand instructions except
for VMOV, VABS, VNEG and VSQRT. (In the old decoder this is
implemented via the "if (op == 15 && rn > 3) { veclen = 0; }" check.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 75 ++++++++++++++++++++++++++++++++++
 target/arm/translate.c         | 51 +----------------------
 target/arm/vfp.decode          |  5 +++
 3 files changed, 81 insertions(+), 50 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VSQRT_dp(DisasContext *s, arg_VSQRT_dp *a)
 {
     return do_vfp_2op_dp(s, gen_VSQRT_dp, a->vd, a->vm);
 }
+
+static bool trans_VCMP_sp(DisasContext *s, arg_VCMP_sp *a)
+{
+    TCGv_i32 vd, vm;
+
+    /* Vm/M bits must be zero for the Z variant */
+    if (a->z && a->vm != 0) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    vd = tcg_temp_new_i32();
+    vm = tcg_temp_new_i32();
+
+    neon_load_reg32(vd, a->vd);
+    if (a->z) {
+        tcg_gen_movi_i32(vm, 0);
+    } else {
+        neon_load_reg32(vm, a->vm);
+    }
+
+    if (a->e) {
+        gen_helper_vfp_cmpes(vd, vm, cpu_env);
+    } else {
+        gen_helper_vfp_cmps(vd, vm, cpu_env);
+    }
+
+    tcg_temp_free_i32(vd);
+    tcg_temp_free_i32(vm);
+
+    return true;
+}
+
+static bool trans_VCMP_dp(DisasContext *s, arg_VCMP_dp *a)
+{
+    TCGv_i64 vd, vm;
+
+    /* Vm/M bits must be zero for the Z variant */
+    if (a->z && a->vm != 0) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_fp_d32, s) && ((a->vd | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    vd = tcg_temp_new_i64();
+    vm = tcg_temp_new_i64();
+
+    neon_load_reg64(vd, a->vd);
+    if (a->z) {
+        tcg_gen_movi_i64(vm, 0);
+    } else {
+        neon_load_reg64(vm, a->vm);
+    }
+
+    if (a->e) {
+        gen_helper_vfp_cmped(vd, vm, cpu_env);
+    } else {
+        gen_helper_vfp_cmpd(vd, vm, cpu_env);
+    }
+
+    tcg_temp_free_i64(vd);
+    tcg_temp_free_i64(vm);
+
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void gen_vfp_neg(int dp)
     gen_helper_vfp_negs(cpu_F0s, cpu_F0s);
 }

-static inline void gen_vfp_cmp(int dp)
-{
-    if (dp)
-        gen_helper_vfp_cmpd(cpu_F0d, cpu_F1d, cpu_env);
-    else
-        gen_helper_vfp_cmps(cpu_F0s, cpu_F1s, cpu_env);
-}
-
-static inline void gen_vfp_cmpe(int dp)
-{
-    if (dp)
-        gen_helper_vfp_cmped(cpu_F0d, cpu_F1d, cpu_env);
-    else
-        gen_helper_vfp_cmpes(cpu_F0s, cpu_F1s, cpu_env);
-}
-
-static inline void gen_vfp_F1_ld0(int dp)
-{
-    if (dp)
-        tcg_gen_movi_i64(cpu_F1d, 0);
-    else
-        tcg_gen_movi_i32(cpu_F1s, 0);
-}
-
 #define VFP_GEN_ITOF(name) \
 static inline void gen_vfp_##name(int dp, int neon) \
 { \
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     case 15:
         switch (rn) {
         case 0 ... 3:
+        case 8 ... 11:
            /* Already handled by decodetree */
            return 1;
        default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
            rd_is_dp = false;
            break;

-        case 0x08: case 0x0a: /* vcmp, vcmpz */
-        case 0x09: case 0x0b: /* vcmpe, vcmpez */
-            no_output = true;
-            break;
-
         case 0x0c: /* vrintr */
         case 0x0d: /* vrintz */
         case 0x0e: /* vrintx */
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     /* Load the initial operands. */
     if (op == 15) {
         switch (rn) {
-        case 0x08: case 0x09: /* Compare */
-            gen_mov_F0_vreg(dp, rd);
-            gen_mov_F1_vreg(dp, rm);
-            break;
-        case 0x0a: case 0x0b: /* Compare with zero */
-            gen_mov_F0_vreg(dp, rd);
-            gen_vfp_F1_ld0(dp);
-            break;
         case 0x14: /* vcvt fp <-> fixed */
         case 0x15:
         case 0x16:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
             gen_vfp_msr(tmp);
             break;
         }
-        case 8: /* cmp */
-            gen_vfp_cmp(dp);
-            break;
-        case 9: /* cmpe */
-            gen_vfp_cmpe(dp);
-            break;
-        case 10: /* cmpz */
-            gen_vfp_cmp(dp);
-            break;
-        case 11: /* cmpez */
-            gen_vfp_F1_ld0(dp);
-            gen_vfp_cmpe(dp);
-            break;
         case 12: /* vrintr */
         {
             TCGv_ptr fpst = get_fpstatus_ptr(0);
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VSQRT_sp ---- 1110 1.11 0001 .... 1010 11.0 .... \
                vd=%vd_sp vm=%vm_sp
 VSQRT_dp ---- 1110 1.11 0001 .... 1011 11.0 .... \
                vd=%vd_dp vm=%vm_dp
+
+VCMP_sp ---- 1110 1.11 010 z:1 .... 1010 e:1 1.0 .... \
+               vd=%vd_sp vm=%vm_sp
+VCMP_dp ---- 1110 1.11 010 z:1 .... 1011 e:1 1.0 .... \
+               vd=%vd_dp vm=%vm_dp
--
2.20.1

Convert the Neon 3-reg-same VMAX and VMIN insns to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-17-peter.maydell@linaro.org
---
 target/arm/neon-dp.decode       |  5 +++++
 target/arm/translate-neon.inc.c | 14 ++++++++++++++
 target/arm/translate.c          | 21 ++-------------------
 3 files changed, 21 insertions(+), 19 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ VBSL_3s 1111 001 1 0 . 01 .... .... 0001 ... 1 .... @3same_logic
 VBIT_3s  1111 001 1 0 . 10 .... .... 0001 ... 1 ....  @3same_logic
 VBIF_3s  1111 001 1 0 . 11 .... .... 0001 ... 1 ....  @3same_logic

+VMAX_S_3s  1111 001 0 0 . .. .... .... 0110 . . . 0 ....  @3same
+VMAX_U_3s  1111 001 1 0 . .. .... .... 0110 . . . 0 ....  @3same
+VMIN_S_3s  1111 001 0 0 . .. .... .... 0110 . . . 1 ....  @3same
+VMIN_U_3s  1111 001 1 0 . .. .... .... 0110 . . . 1 ....  @3same
+
 VADD_3s  1111 001 0 0 . .. .... .... 1000 . . . 0 ....  @3same
 VSUB_3s  1111 001 1 0 . .. .... .... 1000 . . . 0 ....  @3same
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ DO_3SAME(VEOR, tcg_gen_gvec_xor)
 DO_3SAME_BITSEL(VBSL, rd_ofs, rn_ofs, rm_ofs)
 DO_3SAME_BITSEL(VBIT, rm_ofs, rn_ofs, rd_ofs)
 DO_3SAME_BITSEL(VBIF, rm_ofs, rd_ofs, rn_ofs)
+
+#define DO_3SAME_NO_SZ_3(INSN, FUNC)                                    \
+    static bool trans_##INSN##_3s(DisasContext *s, arg_3same *a)        \
+    {                                                                   \
+        if (a->size == 3) {                                             \
+            return false;                                               \
+        }                                                               \
+        return do_3same(s, a, FUNC);                                    \
+    }
+
+DO_3SAME_NO_SZ_3(VMAX_S, tcg_gen_gvec_smax)
+DO_3SAME_NO_SZ_3(VMAX_U, tcg_gen_gvec_umax)
+DO_3SAME_NO_SZ_3(VMIN_S, tcg_gen_gvec_smin)
+DO_3SAME_NO_SZ_3(VMIN_U, tcg_gen_gvec_umin)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                        rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size);
         return 0;

-    case NEON_3R_VMAX:
-        if (u) {
-            tcg_gen_gvec_umax(size, rd_ofs, rn_ofs, rm_ofs,
-                              vec_size, vec_size);
-        } else {
-            tcg_gen_gvec_smax(size, rd_ofs, rn_ofs, rm_ofs,
-                              vec_size, vec_size);
-        }
-        return 0;
-    case NEON_3R_VMIN:
-        if (u) {
-            tcg_gen_gvec_umin(size, rd_ofs, rn_ofs, rm_ofs,
-                              vec_size, vec_size);
-        } else {
-            tcg_gen_gvec_smin(size, rd_ofs, rn_ofs, rm_ofs,
-                              vec_size, vec_size);
-        }
-        return 0;
-
     case NEON_3R_VSHL:
         /* Note the operation is vshl vd,vm,vn */
         tcg_gen_gvec_3(rd_ofs, rm_ofs, rn_ofs, vec_size, vec_size,
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)

     case NEON_3R_VADD_VSUB:
     case NEON_3R_LOGIC:
+    case NEON_3R_VMAX:
+    case NEON_3R_VMIN:
         /* Already handled by decodetree */
         return 1;
     }
--
2.20.1
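A lane-level model of what tcg_gen_gvec_smax generates for the VMAX_S case
above: 'size' selects the lane width, and the size == 3 encodings are UNDEF,
which is what the DO_3SAME_NO_SZ_3 wrapper rejects. Illustrative only:

    #include <stdint.h>

    /* One lane width (S8); the other sizes differ only in element type. */
    void vmax_s8_model(int8_t *d, const int8_t *n, const int8_t *m, int lanes)
    {
        for (int i = 0; i < lanes; i++) {
            d[i] = n[i] > m[i] ? n[i] : m[i];
        }
    }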
Convert the VDIV instruction to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 10 ++++++++++
 target/arm/translate.c         | 21 +--------------------
 target/arm/vfp.decode          |  5 +++++
 3 files changed, 16 insertions(+), 20 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VSUB_dp(DisasContext *s, arg_VSUB_sp *a)
 {
     return do_vfp_3op_dp(s, gen_helper_vfp_subd, a->vd, a->vn, a->vm, false);
 }
+
+static bool trans_VDIV_sp(DisasContext *s, arg_VDIV_sp *a)
+{
+    return do_vfp_3op_sp(s, gen_helper_vfp_divs, a->vd, a->vn, a->vm, false);
+}
+
+static bool trans_VDIV_dp(DisasContext *s, arg_VDIV_sp *a)
+{
+    return do_vfp_3op_dp(s, gen_helper_vfp_divd, a->vd, a->vn, a->vm, false);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static TCGv_ptr get_fpstatus_ptr(int neon)
     return statusptr;
 }

-#define VFP_OP2(name) \
-static inline void gen_vfp_##name(int dp) \
-{ \
-    TCGv_ptr fpst = get_fpstatus_ptr(0); \
-    if (dp) { \
-        gen_helper_vfp_##name##d(cpu_F0d, cpu_F0d, cpu_F1d, fpst); \
-    } else { \
-        gen_helper_vfp_##name##s(cpu_F0s, cpu_F0s, cpu_F1s, fpst); \
-    } \
-    tcg_temp_free_ptr(fpst); \
-}
-
-VFP_OP2(div)
-
-#undef VFP_OP2
-
 static inline void gen_vfp_abs(int dp)
 {
     if (dp)
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     rn = VFP_SREG_N(insn);

     switch (op) {
-    case 0 ... 7:
+    case 0 ... 8:
         /* Already handled by decodetree */
         return 1;
     default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     for (;;) {
         /* Perform the calculation. */
         switch (op) {
-        case 8: /* div: fn / fm */
-            gen_vfp_div(dp);
-            break;
         case 10: /* VFNMA : fd = muladd(-fd, fn, fm) */
         case 11: /* VFNMS : fd = muladd(-fd, -fn, fm) */
         case 12: /* VFMA  : fd = muladd( fd, fn, fm) */
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VSUB_sp ---- 1110 0.11 .... .... 1010 .1.0 .... \
                vm=%vm_sp vn=%vn_sp vd=%vd_sp
 VSUB_dp ---- 1110 0.11 .... .... 1011 .1.0 .... \
                vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+VDIV_sp ---- 1110 1.00 .... .... 1010 .0.0 .... \
+               vm=%vm_sp vn=%vn_sp vd=%vd_sp
+VDIV_dp ---- 1110 1.00 .... .... 1011 .0.0 .... \
+               vm=%vm_dp vn=%vn_dp vd=%vd_dp
--
2.20.1

Convert the Neon comparison ops in the 3-reg-same grouping
to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-18-peter.maydell@linaro.org
---
 target/arm/neon-dp.decode       |  8 ++++++++
 target/arm/translate-neon.inc.c | 22 ++++++++++++++++++++++
 target/arm/translate.c          | 23 +++--------------------
 3 files changed, 33 insertions(+), 20 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ VBSL_3s 1111 001 1 0 . 01 .... .... 0001 ... 1 .... @3same_logic
 VBIT_3s  1111 001 1 0 . 10 .... .... 0001 ... 1 ....  @3same_logic
 VBIF_3s  1111 001 1 0 . 11 .... .... 0001 ... 1 ....  @3same_logic

+VCGT_S_3s  1111 001 0 0 . .. .... .... 0011 . . . 0 ....  @3same
+VCGT_U_3s  1111 001 1 0 . .. .... .... 0011 . . . 0 ....  @3same
+VCGE_S_3s  1111 001 0 0 . .. .... .... 0011 . . . 1 ....  @3same
+VCGE_U_3s  1111 001 1 0 . .. .... .... 0011 . . . 1 ....  @3same
+
 VMAX_S_3s  1111 001 0 0 . .. .... .... 0110 . . . 0 ....  @3same
 VMAX_U_3s  1111 001 1 0 . .. .... .... 0110 . . . 0 ....  @3same
 VMIN_S_3s  1111 001 0 0 . .. .... .... 0110 . . . 1 ....  @3same
@@ -XXX,XX +XXX,XX @@ VMIN_U_3s 1111 001 1 0 . .. .... .... 0110 . . . 1 .... @3same

 VADD_3s  1111 001 0 0 . .. .... .... 1000 . . . 0 ....  @3same
 VSUB_3s  1111 001 1 0 . .. .... .... 1000 . . . 0 ....  @3same
+
+VTST_3s  1111 001 0 0 . .. .... .... 1000 . . . 1 ....  @3same
+VCEQ_3s  1111 001 1 0 . .. .... .... 1000 . . . 1 ....  @3same
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ DO_3SAME_NO_SZ_3(VMAX_S, tcg_gen_gvec_smax)
 DO_3SAME_NO_SZ_3(VMAX_U, tcg_gen_gvec_umax)
 DO_3SAME_NO_SZ_3(VMIN_S, tcg_gen_gvec_smin)
 DO_3SAME_NO_SZ_3(VMIN_U, tcg_gen_gvec_umin)
+
+#define DO_3SAME_CMP(INSN, COND)                                        \
+    static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
+                                uint32_t rn_ofs, uint32_t rm_ofs,       \
+                                uint32_t oprsz, uint32_t maxsz)         \
+    {                                                                   \
+        tcg_gen_gvec_cmp(COND, vece, rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz); \
+    }                                                                   \
+    DO_3SAME_NO_SZ_3(INSN, gen_##INSN##_3s)
+
+DO_3SAME_CMP(VCGT_S, TCG_COND_GT)
+DO_3SAME_CMP(VCGT_U, TCG_COND_GTU)
+DO_3SAME_CMP(VCGE_S, TCG_COND_GE)
+DO_3SAME_CMP(VCGE_U, TCG_COND_GEU)
+DO_3SAME_CMP(VCEQ, TCG_COND_EQ)
+
+static void gen_VTST_3s(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+                        uint32_t rm_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+    tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz, &cmtst_op[vece]);
+}
+DO_3SAME_NO_SZ_3(VTST, gen_VTST_3s)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                        u ? &mls_op[size] : &mla_op[size]);
         return 0;

-    case NEON_3R_VTST_VCEQ:
-        if (u) { /* VCEQ */
-            tcg_gen_gvec_cmp(TCG_COND_EQ, size, rd_ofs, rn_ofs, rm_ofs,
-                             vec_size, vec_size);
-        } else { /* VTST */
-            tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs,
-                           vec_size, vec_size, &cmtst_op[size]);
-        }
-        return 0;
-
-    case NEON_3R_VCGT:
-        tcg_gen_gvec_cmp(u ? TCG_COND_GTU : TCG_COND_GT, size,
-                         rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size);
-        return 0;
-
-    case NEON_3R_VCGE:
-        tcg_gen_gvec_cmp(u ? TCG_COND_GEU : TCG_COND_GE, size,
-                         rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size);
-        return 0;
-
     case NEON_3R_VSHL:
         /* Note the operation is vshl vd,vm,vn */
         tcg_gen_gvec_3(rd_ofs, rm_ofs, rn_ofs, vec_size, vec_size,
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)

     case NEON_3R_VADD_VSUB:
     case NEON_3R_LOGIC:
     case NEON_3R_VMAX:
     case NEON_3R_VMIN:
+    case NEON_3R_VTST_VCEQ:
+    case NEON_3R_VCGT:
+    case NEON_3R_VCGE:
         /* Already handled by decodetree */
         return 1;
     }
--
2.20.1
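Neon compares produce masks rather than flag bits: all-ones per lane for
true, all-zeros for false, and VTST is "any common bit set". A scalar sketch
of the per-lane semantics behind tcg_gen_gvec_cmp() and the cmtst_op
expander used above (illustrative names, not QEMU API):

    #include <stdint.h>

    uint32_t vcge_s32_model(int32_t n, int32_t m)
    {
        return n >= m ? 0xffffffffu : 0;
    }

    uint32_t vtst32_model(uint32_t n, uint32_t m)
    {
        return (n & m) != 0 ? 0xffffffffu : 0;
    }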
Convert the VCVT (between floating-point and fixed-point) instructions
to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 124 +++++++++++++++++++++++++++++++++
 target/arm/translate.c         |  57 +--------------
 target/arm/vfp.decode          |  10 +++
 3 files changed, 136 insertions(+), 55 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
     tcg_temp_free_i32(vd);
     return true;
 }
+
+static bool trans_VCVT_fix_sp(DisasContext *s, arg_VCVT_fix_sp *a)
+{
+    TCGv_i32 vd, shift;
+    TCGv_ptr fpst;
+    int frac_bits;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
+
+    vd = tcg_temp_new_i32();
+    neon_load_reg32(vd, a->vd);
+
+    fpst = get_fpstatus_ptr(false);
+    shift = tcg_const_i32(frac_bits);
+
+    /* Switch on op:U:sx bits */
+    switch (a->opc) {
+    case 0:
+        gen_helper_vfp_shtos(vd, vd, shift, fpst);
+        break;
+    case 1:
+        gen_helper_vfp_sltos(vd, vd, shift, fpst);
+        break;
+    case 2:
+        gen_helper_vfp_uhtos(vd, vd, shift, fpst);
+        break;
+    case 3:
+        gen_helper_vfp_ultos(vd, vd, shift, fpst);
+        break;
+    case 4:
+        gen_helper_vfp_toshs_round_to_zero(vd, vd, shift, fpst);
+        break;
+    case 5:
+        gen_helper_vfp_tosls_round_to_zero(vd, vd, shift, fpst);
+        break;
+    case 6:
+        gen_helper_vfp_touhs_round_to_zero(vd, vd, shift, fpst);
+        break;
+    case 7:
+        gen_helper_vfp_touls_round_to_zero(vd, vd, shift, fpst);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    neon_store_reg32(vd, a->vd);
+    tcg_temp_free_i32(vd);
+    tcg_temp_free_i32(shift);
+    tcg_temp_free_ptr(fpst);
+    return true;
+}
+
+static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
+{
+    TCGv_i64 vd;
+    TCGv_i32 shift;
+    TCGv_ptr fpst;
+    int frac_bits;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vd & 0x10)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
+
+    vd = tcg_temp_new_i64();
+    neon_load_reg64(vd, a->vd);
+
+    fpst = get_fpstatus_ptr(false);
+    shift = tcg_const_i32(frac_bits);
+
+    /* Switch on op:U:sx bits */
+    switch (a->opc) {
+    case 0:
+        gen_helper_vfp_shtod(vd, vd, shift, fpst);
+        break;
+    case 1:
+        gen_helper_vfp_sltod(vd, vd, shift, fpst);
+        break;
+    case 2:
+        gen_helper_vfp_uhtod(vd, vd, shift, fpst);
+        break;
+    case 3:
+        gen_helper_vfp_ultod(vd, vd, shift, fpst);
+        break;
+    case 4:
+        gen_helper_vfp_toshd_round_to_zero(vd, vd, shift, fpst);
+        break;
+    case 5:
+        gen_helper_vfp_tosld_round_to_zero(vd, vd, shift, fpst);
+        break;
+    case 6:
+        gen_helper_vfp_touhd_round_to_zero(vd, vd, shift, fpst);
+        break;
+    case 7:
+        gen_helper_vfp_tould_round_to_zero(vd, vd, shift, fpst);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    neon_store_reg64(vd, a->vd);
+    tcg_temp_free_i64(vd);
+    tcg_temp_free_i32(shift);
+    tcg_temp_free_ptr(fpst);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void gen_vfp_##name(int dp, int shift, int neon) \
     tcg_temp_free_i32(tmp_shift); \
     tcg_temp_free_ptr(statusptr); \
 }
-VFP_GEN_FIX(tosh, _round_to_zero)
 VFP_GEN_FIX(tosl, _round_to_zero)
-VFP_GEN_FIX(touh, _round_to_zero)
 VFP_GEN_FIX(toul, _round_to_zero)
-VFP_GEN_FIX(shto, )
 VFP_GEN_FIX(slto, )
-VFP_GEN_FIX(uhto, )
 VFP_GEN_FIX(ulto, )
 #undef VFP_GEN_FIX

@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
         return 1;
     case 15:
         switch (rn) {
-        case 0 ... 19:
+        case 0 ... 23:
+        case 28 ... 31:
             /* Already handled by decodetree */
             return 1;
         default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
             rd_is_dp = false;
             break;

-        case 0x14: /* vcvt fp <-> fixed */
-        case 0x15:
-        case 0x16:
-        case 0x17:
-        case 0x1c:
-        case 0x1d:
-        case 0x1e:
-        case 0x1f:
-            if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
-                return 1;
-            }
-            /* Immediate frac_bits has same format as SREG_M. */
-            rm_is_dp = false;
-            break;
-
         default:
             return 1;
         }
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     /* Load the initial operands. */
     if (op == 15) {
         switch (rn) {
-        case 0x14: /* vcvt fp <-> fixed */
-        case 0x15:
-        case 0x16:
-        case 0x17:
-        case 0x1c:
-        case 0x1d:
-        case 0x1e:
-        case 0x1f:
-            /* Source and destination the same.  */
-            gen_mov_F0_vreg(dp, rd);
-            break;
         default:
             /* One source operand.  */
             gen_mov_F0_vreg(rm_is_dp, rm);
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
         switch (op) {
         case 15: /* extension space */
             switch (rn) {
-            case 20: /* fshto */
-                gen_vfp_shto(dp, 16 - rm, 0);
-                break;
-            case 21: /* fslto */
-                gen_vfp_slto(dp, 32 - rm, 0);
-                break;
-            case 22: /* fuhto */
-                gen_vfp_uhto(dp, 16 - rm, 0);
-                break;
-            case 23: /* fulto */
-                gen_vfp_ulto(dp, 32 - rm, 0);
-                break;
             case 24: /* ftoui */
                 gen_vfp_toui(dp, 0);
                 break;
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
             case 27: /* ftosiz */
                 gen_vfp_tosiz(dp, 0);
                 break;
-            case 28: /* ftosh */
-                gen_vfp_tosh(dp, 16 - rm, 0);
-                break;
-            case 29: /* ftosl */
-                gen_vfp_tosl(dp, 32 - rm, 0);
-                break;
-            case 30: /* ftouh */
-                gen_vfp_touh(dp, 16 - rm, 0);
-                break;
-            case 31: /* ftoul */
-                gen_vfp_toul(dp, 32 - rm, 0);
-                break;
             default: /* undefined */
                 g_assert_not_reached();
             }
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VCVT_int_dp ---- 1110 1.11 1000 .... 1011 s:1 1.0 .... \
 # VJCVT is always dp to sp
 VJCVT ---- 1110 1.11 1001 .... 1011 11.0 .... \
                vd=%vd_sp vm=%vm_dp
+
+# VCVT between floating-point and fixed-point. The immediate value
+# is in the same format as a Vm single-precision register number.
+# We assemble bits 18 (op), 16 (u) and 7 (sx) into a single opc field
+# for the convenience of the trans_VCVT_fix functions.
+%vcvt_fix_op 18:1 16:1 7:1
+VCVT_fix_sp ---- 1110 1.11 1.1. .... 1010 .1.0 .... \
+               vd=%vd_sp imm=%vm_sp opc=%vcvt_fix_op
+VCVT_fix_dp ---- 1110 1.11 1.1. .... 1011 .1.0 .... \
+               vd=%vd_dp imm=%vm_sp opc=%vcvt_fix_op
--
2.20.1

Convert the Neon VQADD/VQSUB insns in the 3-reg-same grouping
to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-19-peter.maydell@linaro.org
---
 target/arm/neon-dp.decode       |  6 ++++++
 target/arm/translate-neon.inc.c | 15 +++++++++++++++
 target/arm/translate.c          | 14 ++------------
 3 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@
 @3same  .... ... . . . size:2 .... .... .... . q:1 . . .... \
         &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp

+VQADD_S_3s  1111 001 0 0 . .. .... .... 0000 . . . 1 ....  @3same
+VQADD_U_3s  1111 001 1 0 . .. .... .... 0000 . . . 1 ....  @3same
+
 @3same_logic  .... ... . . . .. .... .... .... . q:1 .. .... \
               &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp size=0

@@ -XXX,XX +XXX,XX @@ VBSL_3s 1111 001 1 0 . 01 .... .... 0001 ... 1 .... @3same_logic
 VBIT_3s  1111 001 1 0 . 10 .... .... 0001 ... 1 ....  @3same_logic
 VBIF_3s  1111 001 1 0 . 11 .... .... 0001 ... 1 ....  @3same_logic

+VQSUB_S_3s  1111 001 0 0 . .. .... .... 0010 . . . 1 ....  @3same
+VQSUB_U_3s  1111 001 1 0 . .. .... .... 0010 . . . 1 ....  @3same
+
 VCGT_S_3s  1111 001 0 0 . .. .... .... 0011 . . . 0 ....  @3same
 VCGT_U_3s  1111 001 1 0 . .. .... .... 0011 . . . 0 ....  @3same
 VCGE_S_3s  1111 001 0 0 . .. .... .... 0011 . . . 1 ....  @3same
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static void gen_VTST_3s(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
     tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz, &cmtst_op[vece]);
 }
 DO_3SAME_NO_SZ_3(VTST, gen_VTST_3s)
+
+#define DO_3SAME_GVEC4(INSN, OPARRAY)                                   \
+    static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
+                                uint32_t rn_ofs, uint32_t rm_ofs,       \
+                                uint32_t oprsz, uint32_t maxsz)         \
+    {                                                                   \
+        tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),           \
+                       rn_ofs, rm_ofs, oprsz, maxsz, &OPARRAY[vece]);   \
+    }                                                                   \
+    DO_3SAME(INSN, gen_##INSN##_3s)
+
+DO_3SAME_GVEC4(VQADD_S, sqadd_op)
+DO_3SAME_GVEC4(VQADD_U, uqadd_op)
+DO_3SAME_GVEC4(VQSUB_S, sqsub_op)
+DO_3SAME_GVEC4(VQSUB_U, uqsub_op)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         }
         return 1;

-    case NEON_3R_VQADD:
-        tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),
-                       rn_ofs, rm_ofs, vec_size, vec_size,
-                       (u ? uqadd_op : sqadd_op) + size);
-        return 0;
-
-    case NEON_3R_VQSUB:
-        tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),
-                       rn_ofs, rm_ofs, vec_size, vec_size,
-                       (u ? uqsub_op : sqsub_op) + size);
-        return 0;
-
     case NEON_3R_VMUL: /* VMUL */
         if (u) {
             /* Polynomial case allows only P8. */
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
     case NEON_3R_VTST_VCEQ:
     case NEON_3R_VCGT:
     case NEON_3R_VCGE:
+    case NEON_3R_VQADD:
+    case NEON_3R_VQSUB:
         /* Already handled by decodetree */
         return 1;
     }
--
2.20.1
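The reason VQADD/VQSUB above go through tcg_gen_gvec_4 with an extra operand
pointing at vfp.qc is the sticky saturation flag: on overflow the lane
clamps and FPSCR.QC is set. A scalar model of one lane, assuming nothing
beyond standard C:

    #include <stdint.h>

    /* One S32 lane of VQADD: clamp on overflow and note it in *qc. */
    int32_t vqadd_s32_model(int32_t n, int32_t m, int *qc)
    {
        int64_t r = (int64_t)n + m;
        if (r > INT32_MAX) {
            *qc = 1;
            return INT32_MAX;
        }
        if (r < INT32_MIN) {
            *qc = 1;
            return INT32_MIN;
        }
        return (int32_t)r;
    }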
Convert the VCVTT, VCVTB instructions that deal with conversion
from half-precision floats to f32 or 64 to decodetree.

Since we're no longer constrained to the old decoder's style
using cpu_F0s and cpu_F0d we can perform a direct 16 bit
load of the right half of the input single-precision register
rather than loading the full 32 bits and then doing a
separate shift or sign-extension.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 82 ++++++++++++++++++++++++++++++++++
 target/arm/translate.c         | 56 +----
 target/arm/vfp.decode          |  6 +++
 3 files changed, 89 insertions(+), 55 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@
 #include "decode-vfp.inc.c"
 #include "decode-vfp-uncond.inc.c"

+/*
+ * Return the offset of a 16-bit half of the specified VFP single-precision
+ * register. If top is true, returns the top 16 bits; otherwise the bottom
+ * 16 bits.
+ */
+static inline long vfp_f16_offset(unsigned reg, bool top)
+{
+    long offs = vfp_reg_offset(false, reg);
+#ifdef HOST_WORDS_BIGENDIAN
+    if (!top) {
+        offs += 2;
+    }
+#else
+    if (top) {
+        offs += 2;
+    }
+#endif
+    return offs;
+}
+
 /*
  * Check that VFP access is enabled. If it is, do the necessary
  * M-profile lazy-FP handling and then return true.
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_dp(DisasContext *s, arg_VCMP_dp *a)

     return true;
 }
+
+static bool trans_VCVT_f32_f16(DisasContext *s, arg_VCVT_f32_f16 *a)
+{
+    TCGv_ptr fpst;
+    TCGv_i32 ahp_mode;
+    TCGv_i32 tmp;
+
+    if (!dc_isar_feature(aa32_fp16_spconv, s)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    fpst = get_fpstatus_ptr(false);
+    ahp_mode = get_ahp_flag();
+    tmp = tcg_temp_new_i32();
+    /* The T bit tells us if we want the low or high 16 bits of Vm */
+    tcg_gen_ld16u_i32(tmp, cpu_env, vfp_f16_offset(a->vm, a->t));
+    gen_helper_vfp_fcvt_f16_to_f32(tmp, tmp, fpst, ahp_mode);
+    neon_store_reg32(tmp, a->vd);
+    tcg_temp_free_i32(ahp_mode);
+    tcg_temp_free_ptr(fpst);
+    tcg_temp_free_i32(tmp);
+    return true;
+}
+
+static bool trans_VCVT_f64_f16(DisasContext *s, arg_VCVT_f64_f16 *a)
+{
+    TCGv_ptr fpst;
+    TCGv_i32 ahp_mode;
+    TCGv_i32 tmp;
+    TCGv_i64 vd;
+
+    if (!dc_isar_feature(aa32_fp16_dpconv, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vd & 0x10)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    fpst = get_fpstatus_ptr(false);
+    ahp_mode = get_ahp_flag();
+    tmp = tcg_temp_new_i32();
+    /* The T bit tells us if we want the low or high 16 bits of Vm */
+    tcg_gen_ld16u_i32(tmp, cpu_env, vfp_f16_offset(a->vm, a->t));
+    vd = tcg_temp_new_i64();
+    gen_helper_vfp_fcvt_f16_to_f64(vd, tmp, fpst, ahp_mode);
+    neon_store_reg64(vd, a->vd);
+    tcg_temp_free_i32(ahp_mode);
+    tcg_temp_free_ptr(fpst);
+    tcg_temp_free_i32(tmp);
+    tcg_temp_free_i64(vd);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c

Convert the Neon VMUL, VMLA, VMLS and VSHL insns in the
3-reg-same grouping to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-20-peter.maydell@linaro.org
---
 target/arm/neon-dp.decode       |  9 +++++++
 target/arm/translate-neon.inc.c | 44 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 28 +++------------------
 3 files changed, 56 insertions(+), 25 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ VCGT_U_3s 1111 001 1 0 . .. .... .... 0011 . . . 0 .... @3same
 VCGE_S_3s  1111 001 0 0 . .. .... .... 0011 . . . 1 ....  @3same
 VCGE_U_3s  1111 001 1 0 . .. .... .... 0011 . . . 1 ....  @3same

+VSHL_S_3s  1111 001 0 0 . .. .... .... 0100 . . . 0 ....  @3same
+VSHL_U_3s  1111 001 1 0 . .. .... .... 0100 . . . 0 ....  @3same
+
 VMAX_S_3s  1111 001 0 0 . .. .... .... 0110 . . . 0 ....  @3same
 VMAX_U_3s  1111 001 1 0 . .. .... .... 0110 . . . 0 ....  @3same
 VMIN_S_3s  1111 001 0 0 . .. .... .... 0110 . . . 1 ....  @3same
@@ -XXX,XX +XXX,XX @@ VSUB_3s 1111 001 1 0 . .. .... .... 1000 . . . 0 .... @3same

 VTST_3s  1111 001 0 0 . .. .... .... 1000 . . . 1 ....  @3same
 VCEQ_3s  1111 001 1 0 . .. .... .... 1000 . . . 1 ....  @3same
+
+VMLA_3s  1111 001 0 0 . .. .... .... 1001 . . . 0 ....  @3same
+VMLS_3s  1111 001 1 0 . .. .... .... 1001 . . . 0 ....  @3same
+
+VMUL_3s  1111 001 0 0 . .. .... .... 1001 . . . 1 ....  @3same
+VMUL_p_3s  1111 001 1 0 . .. .... .... 1001 . . . 1 ....  @3same
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ DO_3SAME_NO_SZ_3(VMAX_S, tcg_gen_gvec_smax)
 DO_3SAME_NO_SZ_3(VMAX_U, tcg_gen_gvec_umax)
 DO_3SAME_NO_SZ_3(VMIN_S, tcg_gen_gvec_smin)
 DO_3SAME_NO_SZ_3(VMIN_U, tcg_gen_gvec_umin)
+DO_3SAME_NO_SZ_3(VMUL, tcg_gen_gvec_mul)

 #define DO_3SAME_CMP(INSN, COND) \
     static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs, \
@@ -XXX,XX +XXX,XX @@ DO_3SAME_GVEC4(VQADD_S, sqadd_op)
 DO_3SAME_GVEC4(VQADD_U, uqadd_op)
 DO_3SAME_GVEC4(VQSUB_S, sqsub_op)
 DO_3SAME_GVEC4(VQSUB_U, uqsub_op)
+
+static void gen_VMUL_p_3s(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+                          uint32_t rm_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+    tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz,
+                       0, gen_helper_gvec_pmul_b);
+}
+
+static bool trans_VMUL_p_3s(DisasContext *s, arg_3same *a)
+{
+    if (a->size != 0) {
+        return false;
+    }
+    return do_3same(s, a, gen_VMUL_p_3s);
+}
+
+#define DO_3SAME_GVEC3_NO_SZ_3(INSN, OPARRAY)                           \
+    static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
+                                uint32_t rn_ofs, uint32_t rm_ofs,       \
+                                uint32_t oprsz, uint32_t maxsz)         \
+    {                                                                   \
+        tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs,                          \
+                       oprsz, maxsz, &OPARRAY[vece]);                   \
+    }                                                                   \
+    DO_3SAME_NO_SZ_3(INSN, gen_##INSN##_3s)
+
+DO_3SAME_GVEC3_NO_SZ_3(VMLA, mla_op)
+DO_3SAME_GVEC3_NO_SZ_3(VMLS, mls_op)
+
+#define DO_3SAME_GVEC3_SHIFT(INSN, OPARRAY)                             \
+    static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
+                                uint32_t rn_ofs, uint32_t rm_ofs,       \
+                                uint32_t oprsz, uint32_t maxsz)         \
+    {                                                                   \
+        /* Note the operation is vshl vd,vm,vn */                       \
+        tcg_gen_gvec_3(rd_ofs, rm_ofs, rn_ofs,                          \
+                       oprsz, maxsz, &OPARRAY[vece]);                   \
+    }                                                                   \
+    DO_3SAME(INSN, gen_##INSN##_3s)
+
+DO_3SAME_GVEC3_SHIFT(VSHL_S, sshl_op)
+DO_3SAME_GVEC3_SHIFT(VSHL_U, ushl_op)
diff --git a/target/arm/translate.c b/target/arm/translate.c
116
index XXXXXXX..XXXXXXX 100644
97
index XXXXXXX..XXXXXXX 100644
117
--- a/target/arm/translate.c
98
--- a/target/arm/translate.c
118
+++ b/target/arm/translate.c
99
+++ b/target/arm/translate.c
119
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
100
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
120
return 1;
101
}
121
case 15:
102
return 1;
122
switch (rn) {
103
123
- case 0 ... 3:
104
- case NEON_3R_VMUL: /* VMUL */
124
+ case 0 ... 5:
105
- if (u) {
125
case 8 ... 11:
106
- /* Polynomial case allows only P8. */
126
/* Already handled by decodetree */
107
- if (size != 0) {
127
return 1;
108
- return 1;
128
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
109
- }
129
if (op == 15) {
110
- tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size,
130
/* rn is opcode, encoded as per VFP_SREG_N. */
111
- 0, gen_helper_gvec_pmul_b);
131
switch (rn) {
112
- } else {
132
- case 0x04: /* vcvtb.f64.f16, vcvtb.f32.f16 */
113
- tcg_gen_gvec_mul(size, rd_ofs, rn_ofs, rm_ofs,
133
- case 0x05: /* vcvtt.f64.f16, vcvtt.f32.f16 */
114
- vec_size, vec_size);
134
- /*
115
- }
135
- * VCVTB, VCVTT: only present with the halfprec extension
116
- return 0;
136
- * UNPREDICTABLE if bit 8 is set prior to ARMv8
117
-
137
- * (we choose to UNDEF)
118
- case NEON_3R_VML: /* VMLA, VMLS */
138
- */
119
- tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size,
139
- if (dp) {
120
- u ? &mls_op[size] : &mla_op[size]);
140
- if (!dc_isar_feature(aa32_fp16_dpconv, s)) {
121
- return 0;
141
- return 1;
122
-
142
- }
123
- case NEON_3R_VSHL:
143
- } else {
124
- /* Note the operation is vshl vd,vm,vn */
144
- if (!dc_isar_feature(aa32_fp16_spconv, s)) {
125
- tcg_gen_gvec_3(rd_ofs, rm_ofs, rn_ofs, vec_size, vec_size,
145
- return 1;
126
- u ? &ushl_op[size] : &sshl_op[size]);
146
- }
127
- return 0;
147
- }
128
-
148
- rm_is_dp = false;
129
case NEON_3R_VADD_VSUB:
149
- break;
130
case NEON_3R_LOGIC:
150
case 0x06: /* vcvtb.f16.f32, vcvtb.f16.f64 */
131
case NEON_3R_VMAX:
151
case 0x07: /* vcvtt.f16.f32, vcvtt.f16.f64 */
132
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
152
if (dp) {
133
case NEON_3R_VCGE:
153
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
134
case NEON_3R_VQADD:
154
switch (op) {
135
case NEON_3R_VQSUB:
155
case 15: /* extension space */
136
+ case NEON_3R_VMUL:
156
switch (rn) {
137
+ case NEON_3R_VML:
157
- case 4: /* vcvtb.f32.f16, vcvtb.f64.f16 */
138
+ case NEON_3R_VSHL:
158
- {
139
/* Already handled by decodetree */
159
- TCGv_ptr fpst = get_fpstatus_ptr(false);
140
return 1;
160
- TCGv_i32 ahp_mode = get_ahp_flag();
141
}
161
- tmp = gen_vfp_mrs();
162
- tcg_gen_ext16u_i32(tmp, tmp);
163
- if (dp) {
164
- gen_helper_vfp_fcvt_f16_to_f64(cpu_F0d, tmp,
165
- fpst, ahp_mode);
166
- } else {
167
- gen_helper_vfp_fcvt_f16_to_f32(cpu_F0s, tmp,
168
- fpst, ahp_mode);
169
- }
170
- tcg_temp_free_i32(ahp_mode);
171
- tcg_temp_free_ptr(fpst);
172
- tcg_temp_free_i32(tmp);
173
- break;
174
- }
175
- case 5: /* vcvtt.f32.f16, vcvtt.f64.f16 */
176
- {
177
- TCGv_ptr fpst = get_fpstatus_ptr(false);
178
- TCGv_i32 ahp = get_ahp_flag();
179
- tmp = gen_vfp_mrs();
180
- tcg_gen_shri_i32(tmp, tmp, 16);
181
- if (dp) {
182
- gen_helper_vfp_fcvt_f16_to_f64(cpu_F0d, tmp,
183
- fpst, ahp);
184
- } else {
185
- gen_helper_vfp_fcvt_f16_to_f32(cpu_F0s, tmp,
186
- fpst, ahp);
187
- }
188
- tcg_temp_free_i32(tmp);
189
- tcg_temp_free_i32(ahp);
190
- tcg_temp_free_ptr(fpst);
191
- break;
192
- }
193
case 6: /* vcvtb.f16.f32, vcvtb.f16.f64 */
194
{
195
TCGv_ptr fpst = get_fpstatus_ptr(false);
196
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
197
index XXXXXXX..XXXXXXX 100644
198
--- a/target/arm/vfp.decode
199
+++ b/target/arm/vfp.decode
200
@@ -XXX,XX +XXX,XX @@ VCMP_sp ---- 1110 1.11 010 z:1 .... 1010 e:1 1.0 .... \
201
vd=%vd_sp vm=%vm_sp
202
VCMP_dp ---- 1110 1.11 010 z:1 .... 1011 e:1 1.0 .... \
203
vd=%vd_dp vm=%vm_dp
204
+
205
+# VCVTT and VCVTB from f16: Vd format depends on size bit; Vm is always vm_sp
206
+VCVT_f32_f16 ---- 1110 1.11 0010 .... 1010 t:1 1.0 .... \
207
+ vd=%vd_sp vm=%vm_sp
208
+VCVT_f64_f16 ---- 1110 1.11 0010 .... 1011 t:1 1.0 .... \
209
+ vd=%vd_dp vm=%vm_sp
210
--
142
--
211
2.20.1
143
2.20.1
212
144
213
145
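As a reading aid for the endianness handling in vfp_f16_offset() above: the
same computation can be modelled in isolation in plain C. This is a sketch,
not QEMU code; the function and parameter names are invented for
illustration.

    #include <stdbool.h>

    /*
     * A 32-bit single-precision register holds two half-precision
     * values; which 16-bit half sits at byte offset 2 depends on host
     * endianness, which is exactly what the #ifdef in the patch encodes.
     */
    static long f16_half_offset(long reg_offs, bool top, bool host_bigendian)
    {
        if (top != host_bigendian) {
            reg_offs += 2;  /* the wanted half is the high-address one */
        }
        return reg_offs;
    }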
From: Richard Henderson <richard.henderson@linaro.org>

This replaces 3 target-specific implementations for BIT, BIF, and BSL.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20190518191934.21887-3-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-a64.h |  2 +
 target/arm/translate.h     |  3 --
 target/arm/translate-a64.c | 15 ++++++--
 target/arm/translate.c     | 78 +++-----------------------------------
 4 files changed, 20 insertions(+), 78 deletions(-)

diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.h
+++ b/target/arm/translate-a64.h
@@ -XXX,XX +XXX,XX @@ typedef void GVecGen2iFn(unsigned, uint32_t, uint32_t, int64_t,
                          uint32_t, uint32_t);
 typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t,
                         uint32_t, uint32_t, uint32_t);
+typedef void GVecGen4Fn(unsigned, uint32_t, uint32_t, uint32_t,
+                        uint32_t, uint32_t, uint32_t);
 
 #endif /* TARGET_ARM_TRANSLATE_A64_H */
diff --git a/target/arm/translate.h b/target/arm/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ static inline void gen_ss_advance(DisasContext *s)
 }
 
 /* Vector operations shared between ARM and AArch64.  */
-extern const GVecGen3 bsl_op;
-extern const GVecGen3 bit_op;
-extern const GVecGen3 bif_op;
 extern const GVecGen3 mla_op[4];
 extern const GVecGen3 mls_op[4];
 extern const GVecGen3 cmtst_op[4];
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void gen_gvec_fn3(DisasContext *s, bool is_q, int rd, int rn, int rm,
             vec_full_reg_offset(s, rm), is_q ? 16 : 8, vec_full_reg_size(s));
 }
 
+/* Expand a 4-operand AdvSIMD vector operation using an expander function. */
+static void gen_gvec_fn4(DisasContext *s, bool is_q, int rd, int rn, int rm,
+                         int rx, GVecGen4Fn *gvec_fn, int vece)
+{
+    gvec_fn(vece, vec_full_reg_offset(s, rd), vec_full_reg_offset(s, rn),
+            vec_full_reg_offset(s, rm), vec_full_reg_offset(s, rx),
+            is_q ? 16 : 8, vec_full_reg_size(s));
+}
+
 /* Expand a 2-operand + immediate AdvSIMD vector operation using
  * an op descriptor.
  */
@@ -XXX,XX +XXX,XX @@ static void disas_simd_3same_logic(DisasContext *s, uint32_t insn)
         return;
 
     case 5: /* BSL bitwise select */
-        gen_gvec_op3(s, is_q, rd, rn, rm, &bsl_op);
+        gen_gvec_fn4(s, is_q, rd, rd, rn, rm, tcg_gen_gvec_bitsel, 0);
         return;
     case 6: /* BIT, bitwise insert if true */
-        gen_gvec_op3(s, is_q, rd, rn, rm, &bit_op);
+        gen_gvec_fn4(s, is_q, rd, rm, rn, rd, tcg_gen_gvec_bitsel, 0);
         return;
     case 7: /* BIF, bitwise insert if false */
-        gen_gvec_op3(s, is_q, rd, rn, rm, &bif_op);
+        gen_gvec_fn4(s, is_q, rd, rm, rd, rn, tcg_gen_gvec_bitsel, 0);
         return;
 
     default:
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int do_v81_helper(DisasContext *s, gen_helper_gvec_3_ptr *fn,
     return 1;
 }
 
-/*
- * Expanders for VBitOps_VBIF, VBIT, VBSL.
- */
-static void gen_bsl_i64(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
-{
-    tcg_gen_xor_i64(rn, rn, rm);
-    tcg_gen_and_i64(rn, rn, rd);
-    tcg_gen_xor_i64(rd, rm, rn);
-}
-
-static void gen_bit_i64(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
-{
-    tcg_gen_xor_i64(rn, rn, rd);
-    tcg_gen_and_i64(rn, rn, rm);
-    tcg_gen_xor_i64(rd, rd, rn);
-}
-
-static void gen_bif_i64(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
-{
-    tcg_gen_xor_i64(rn, rn, rd);
-    tcg_gen_andc_i64(rn, rn, rm);
-    tcg_gen_xor_i64(rd, rd, rn);
-}
-
-static void gen_bsl_vec(unsigned vece, TCGv_vec rd, TCGv_vec rn, TCGv_vec rm)
-{
-    tcg_gen_xor_vec(vece, rn, rn, rm);
-    tcg_gen_and_vec(vece, rn, rn, rd);
-    tcg_gen_xor_vec(vece, rd, rm, rn);
-}
-
-static void gen_bit_vec(unsigned vece, TCGv_vec rd, TCGv_vec rn, TCGv_vec rm)
-{
-    tcg_gen_xor_vec(vece, rn, rn, rd);
-    tcg_gen_and_vec(vece, rn, rn, rm);
-    tcg_gen_xor_vec(vece, rd, rd, rn);
-}
-
-static void gen_bif_vec(unsigned vece, TCGv_vec rd, TCGv_vec rn, TCGv_vec rm)
-{
-    tcg_gen_xor_vec(vece, rn, rn, rd);
-    tcg_gen_andc_vec(vece, rn, rn, rm);
-    tcg_gen_xor_vec(vece, rd, rd, rn);
-}
-
-const GVecGen3 bsl_op = {
-    .fni8 = gen_bsl_i64,
-    .fniv = gen_bsl_vec,
-    .prefer_i64 = TCG_TARGET_REG_BITS == 64,
-    .load_dest = true
-};
-
-const GVecGen3 bit_op = {
-    .fni8 = gen_bit_i64,
-    .fniv = gen_bit_vec,
-    .prefer_i64 = TCG_TARGET_REG_BITS == 64,
-    .load_dest = true
-};
-
-const GVecGen3 bif_op = {
-    .fni8 = gen_bif_i64,
-    .fniv = gen_bif_vec,
-    .prefer_i64 = TCG_TARGET_REG_BITS == 64,
-    .load_dest = true
-};
-
 static void gen_ssra8_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift)
 {
     tcg_gen_vec_sar8i_i64(a, a, shift);
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                               vec_size, vec_size);
             break;
         case 5: /* VBSL */
-            tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs,
-                           vec_size, vec_size, &bsl_op);
+            tcg_gen_gvec_bitsel(MO_8, rd_ofs, rd_ofs, rn_ofs, rm_ofs,
+                                vec_size, vec_size);
             break;
         case 6: /* VBIT */
-            tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs,
-                           vec_size, vec_size, &bit_op);
+            tcg_gen_gvec_bitsel(MO_8, rd_ofs, rm_ofs, rn_ofs, rd_ofs,
+                                vec_size, vec_size);
             break;
         case 7: /* VBIF */
-            tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs,
-                           vec_size, vec_size, &bif_op);
+            tcg_gen_gvec_bitsel(MO_8, rd_ofs, rm_ofs, rd_ofs, rn_ofs,
+                                vec_size, vec_size);
             break;
         }
         return 0;
-- 
2.20.1

We're going to want at least some of the NeonGen* typedefs
for the refactored 32-bit Neon decoder, so move them all
to translate.h since it makes more sense to keep them in
one group.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-23-peter.maydell@linaro.org
---
 target/arm/translate.h     | 17 +++++++++++++++++
 target/arm/translate-a64.c | 17 -----------------
 2 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/target/arm/translate.h b/target/arm/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t,
 typedef void GVecGen4Fn(unsigned, uint32_t, uint32_t, uint32_t,
                         uint32_t, uint32_t, uint32_t);
 
+/* Function prototype for gen_ functions for calling Neon helpers */
+typedef void NeonGenOneOpEnvFn(TCGv_i32, TCGv_ptr, TCGv_i32);
+typedef void NeonGenTwoOpFn(TCGv_i32, TCGv_i32, TCGv_i32);
+typedef void NeonGenTwoOpEnvFn(TCGv_i32, TCGv_ptr, TCGv_i32, TCGv_i32);
+typedef void NeonGenTwo64OpFn(TCGv_i64, TCGv_i64, TCGv_i64);
+typedef void NeonGenTwo64OpEnvFn(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i64);
+typedef void NeonGenNarrowFn(TCGv_i32, TCGv_i64);
+typedef void NeonGenNarrowEnvFn(TCGv_i32, TCGv_ptr, TCGv_i64);
+typedef void NeonGenWidenFn(TCGv_i64, TCGv_i32);
+typedef void NeonGenTwoSingleOPFn(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_ptr);
+typedef void NeonGenTwoDoubleOPFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_ptr);
+typedef void NeonGenOneOpFn(TCGv_i64, TCGv_i64);
+typedef void CryptoTwoOpFn(TCGv_ptr, TCGv_ptr);
+typedef void CryptoThreeOpIntFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
+typedef void CryptoThreeOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
+typedef void AtomicThreeOpFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGArg, MemOp);
+
 #endif /* TARGET_ARM_TRANSLATE_H */
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ typedef struct AArch64DecodeTable {
     AArch64DecodeFn *disas_fn;
 } AArch64DecodeTable;
 
-/* Function prototype for gen_ functions for calling Neon helpers */
-typedef void NeonGenOneOpEnvFn(TCGv_i32, TCGv_ptr, TCGv_i32);
-typedef void NeonGenTwoOpFn(TCGv_i32, TCGv_i32, TCGv_i32);
-typedef void NeonGenTwoOpEnvFn(TCGv_i32, TCGv_ptr, TCGv_i32, TCGv_i32);
-typedef void NeonGenTwo64OpFn(TCGv_i64, TCGv_i64, TCGv_i64);
-typedef void NeonGenTwo64OpEnvFn(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i64);
-typedef void NeonGenNarrowFn(TCGv_i32, TCGv_i64);
-typedef void NeonGenNarrowEnvFn(TCGv_i32, TCGv_ptr, TCGv_i64);
-typedef void NeonGenWidenFn(TCGv_i64, TCGv_i32);
-typedef void NeonGenTwoSingleOPFn(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_ptr);
-typedef void NeonGenTwoDoubleOPFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_ptr);
-typedef void NeonGenOneOpFn(TCGv_i64, TCGv_i64);
-typedef void CryptoTwoOpFn(TCGv_ptr, TCGv_ptr);
-typedef void CryptoThreeOpIntFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
-typedef void CryptoThreeOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
-typedef void AtomicThreeOpFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGArg, MemOp);
-
 /* initialize TCG globals.  */
 void a64_translate_init(void)
 {
-- 
2.20.1
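The three insns collapse onto one operation because BSL, BIT and BIF are the
same bitwise select with the operands in different roles. A scalar model of
what the select computes (a sketch, not QEMU's implementation):

    #include <stdint.h>

    /* Bitwise select: pick x where sel is 1, else y. */
    static uint64_t bitsel(uint64_t sel, uint64_t x, uint64_t y)
    {
        return (x & sel) | (y & ~sel);
    }

    /*
     * Operand roles as wired up in the patch (rd is both an input and
     * the destination register):
     *   BSL: rd = bitsel(rd, rn, rm)  -- rd itself is the select mask
     *   BIT: rd = bitsel(rm, rn, rd)  -- take rn bits where rm is set
     *   BIF: rd = bitsel(rm, rd, rn)  -- take rn bits where rm is clear
     */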
Deleted patch

The Cortex-R5F initfn was not correctly setting up the MVFR
ID register values. Fill these in, since some subsequent patches
will use ID register checks rather than CPU feature bit checks.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void cortex_r5f_initfn(Object *obj)
 
     cortex_r5_initfn(obj);
     set_feature(&cpu->env, ARM_FEATURE_VFP3);
+    cpu->isar.mvfr0 = 0x10110221;
+    cpu->isar.mvfr1 = 0x00000011;
 }
 
 static const ARMCPRegInfo cortexa8_cp_reginfo[] = {
-- 
2.20.1
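The value 0x10110221 packs several 4-bit ID fields; a sketch of how an
ID-register-based check reads. Field positions follow our reading of the
Arm ARM's MVFR0 layout (single-precision support in bits [7:4],
double-precision in [11:8]) -- verify against the spec before relying on them:

    #include <stdint.h>
    #include <stdbool.h>

    /* Extract a 4-bit ID field from an MVFRn value. */
    static unsigned mvfr_field(uint32_t mvfr, unsigned shift)
    {
        return (mvfr >> shift) & 0xf;
    }

    static bool model_has_fpdp(uint32_t mvfr0)
    {
        return mvfr_field(mvfr0, 8) >= 2;   /* 0x10110221 -> field is 2 */
    }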
Deleted patch

Expand out the sequences in the new decoder VLDR/VSTR/VLDM/VSTM trans
functions which perform the memory accesses by going via the TCG
globals cpu_F0s and cpu_F0d, to use local TCG temps instead.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 46 +++++++++++++++++++++-------------
 target/arm/translate.c         | 18 -------------
 2 files changed, 28 insertions(+), 36 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_dp(DisasContext *s, arg_VMOV_64_sp *a)
 static bool trans_VLDR_VSTR_sp(DisasContext *s, arg_VLDR_VSTR_sp *a)
 {
     uint32_t offset;
-    TCGv_i32 addr;
+    TCGv_i32 addr, tmp;
 
     if (!vfp_access_check(s)) {
         return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_sp(DisasContext *s, arg_VLDR_VSTR_sp *a)
         addr = load_reg(s, a->rn);
     }
     tcg_gen_addi_i32(addr, addr, offset);
+    tmp = tcg_temp_new_i32();
     if (a->l) {
-        gen_vfp_ld(s, false, addr);
-        gen_mov_vreg_F0(false, a->vd);
+        gen_aa32_ld32u(s, tmp, addr, get_mem_index(s));
+        neon_store_reg32(tmp, a->vd);
     } else {
-        gen_mov_F0_vreg(false, a->vd);
-        gen_vfp_st(s, false, addr);
+        neon_load_reg32(tmp, a->vd);
+        gen_aa32_st32(s, tmp, addr, get_mem_index(s));
     }
+    tcg_temp_free_i32(tmp);
     tcg_temp_free_i32(addr);
 
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_sp *a)
 {
     uint32_t offset;
     TCGv_i32 addr;
+    TCGv_i64 tmp;
 
     /* UNDEF accesses to D16-D31 if they don't exist */
     if (!dc_isar_feature(aa32_fp_d32, s) && (a->vd & 0x10)) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_sp *a)
         addr = load_reg(s, a->rn);
     }
     tcg_gen_addi_i32(addr, addr, offset);
+    tmp = tcg_temp_new_i64();
     if (a->l) {
-        gen_vfp_ld(s, true, addr);
-        gen_mov_vreg_F0(true, a->vd);
+        gen_aa32_ld64(s, tmp, addr, get_mem_index(s));
+        neon_store_reg64(tmp, a->vd);
     } else {
-        gen_mov_F0_vreg(true, a->vd);
-        gen_vfp_st(s, true, addr);
+        neon_load_reg64(tmp, a->vd);
+        gen_aa32_st64(s, tmp, addr, get_mem_index(s));
     }
+    tcg_temp_free_i64(tmp);
     tcg_temp_free_i32(addr);
 
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_sp *a)
 static bool trans_VLDM_VSTM_sp(DisasContext *s, arg_VLDM_VSTM_sp *a)
 {
     uint32_t offset;
-    TCGv_i32 addr;
+    TCGv_i32 addr, tmp;
     int i, n;
 
     n = a->imm;
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_sp(DisasContext *s, arg_VLDM_VSTM_sp *a)
     }
 
     offset = 4;
+    tmp = tcg_temp_new_i32();
     for (i = 0; i < n; i++) {
         if (a->l) {
             /* load */
-            gen_vfp_ld(s, false, addr);
-            gen_mov_vreg_F0(false, a->vd + i);
+            gen_aa32_ld32u(s, tmp, addr, get_mem_index(s));
+            neon_store_reg32(tmp, a->vd + i);
         } else {
             /* store */
-            gen_mov_F0_vreg(false, a->vd + i);
-            gen_vfp_st(s, false, addr);
+            neon_load_reg32(tmp, a->vd + i);
+            gen_aa32_st32(s, tmp, addr, get_mem_index(s));
         }
         tcg_gen_addi_i32(addr, addr, offset);
     }
+    tcg_temp_free_i32(tmp);
     if (a->w) {
         /* writeback */
         if (a->p) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
 {
     uint32_t offset;
     TCGv_i32 addr;
+    TCGv_i64 tmp;
     int i, n;
 
     n = a->imm >> 1;
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
     }
 
     offset = 8;
+    tmp = tcg_temp_new_i64();
     for (i = 0; i < n; i++) {
         if (a->l) {
             /* load */
-            gen_vfp_ld(s, true, addr);
-            gen_mov_vreg_F0(true, a->vd + i);
+            gen_aa32_ld64(s, tmp, addr, get_mem_index(s));
+            neon_store_reg64(tmp, a->vd + i);
         } else {
             /* store */
-            gen_mov_F0_vreg(true, a->vd + i);
-            gen_vfp_st(s, true, addr);
+            neon_load_reg64(tmp, a->vd + i);
+            gen_aa32_st64(s, tmp, addr, get_mem_index(s));
         }
         tcg_gen_addi_i32(addr, addr, offset);
     }
+    tcg_temp_free_i64(tmp);
     if (a->w) {
         /* writeback */
         if (a->p) {
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ VFP_GEN_FIX(uhto, )
 VFP_GEN_FIX(ulto, )
 #undef VFP_GEN_FIX
 
-static inline void gen_vfp_ld(DisasContext *s, int dp, TCGv_i32 addr)
-{
-    if (dp) {
-        gen_aa32_ld64(s, cpu_F0d, addr, get_mem_index(s));
-    } else {
-        gen_aa32_ld32u(s, cpu_F0s, addr, get_mem_index(s));
-    }
-}
-
-static inline void gen_vfp_st(DisasContext *s, int dp, TCGv_i32 addr)
-{
-    if (dp) {
-        gen_aa32_st64(s, cpu_F0d, addr, get_mem_index(s));
-    } else {
-        gen_aa32_st32(s, cpu_F0s, addr, get_mem_index(s));
-    }
-}
-
 static inline long vfp_reg_offset(bool dp, unsigned reg)
 {
     if (dp) {
-- 
2.20.1
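The conversion keeps the load/store-multiple loop structure intact: one temp
is allocated before the loop, reused for every element, and freed once
afterwards. A minimal plain-C model of the single-precision VLDM walk (names
invented; writeback and the p/u addressing variants are omitted):

    #include <stdint.h>

    /* Model: load n consecutive 32-bit FP registers starting at vd. */
    static void vldm_sp_model(uint32_t *fpregs, const uint32_t *mem,
                              uint32_t addr, int vd, int n)
    {
        for (int i = 0; i < n; i++) {
            uint32_t tmp = mem[addr / 4];   /* per-iteration scratch value */
            fpregs[vd + i] = tmp;
            addr += 4;                      /* offset is 4 for .32 access */
        }
    }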
Deleted patch

Convert the VFP VMLS instruction to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 38 ++++++++++++++++++++++++++++++++++
 target/arm/translate.c         |  8 +------
 target/arm/vfp.decode          |  5 +++++
 3 files changed, 44 insertions(+), 7 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VMLA_dp(DisasContext *s, arg_VMLA_sp *a)
 {
     return do_vfp_3op_dp(s, gen_VMLA_dp, a->vd, a->vn, a->vm, true);
 }
+
+static void gen_VMLS_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
+{
+    /*
+     * VMLS: vd = vd + -(vn * vm)
+     * Note that order of inputs to the add matters for NaNs.
+     */
+    TCGv_i32 tmp = tcg_temp_new_i32();
+
+    gen_helper_vfp_muls(tmp, vn, vm, fpst);
+    gen_helper_vfp_negs(tmp, tmp);
+    gen_helper_vfp_adds(vd, vd, tmp, fpst);
+    tcg_temp_free_i32(tmp);
+}
+
+static bool trans_VMLS_sp(DisasContext *s, arg_VMLS_sp *a)
+{
+    return do_vfp_3op_sp(s, gen_VMLS_sp, a->vd, a->vn, a->vm, true);
+}
+
+static void gen_VMLS_dp(TCGv_i64 vd, TCGv_i64 vn, TCGv_i64 vm, TCGv_ptr fpst)
+{
+    /*
+     * VMLS: vd = vd + -(vn * vm)
+     * Note that order of inputs to the add matters for NaNs.
+     */
+    TCGv_i64 tmp = tcg_temp_new_i64();
+
+    gen_helper_vfp_muld(tmp, vn, vm, fpst);
+    gen_helper_vfp_negd(tmp, tmp);
+    gen_helper_vfp_addd(vd, vd, tmp, fpst);
+    tcg_temp_free_i64(tmp);
+}
+
+static bool trans_VMLS_dp(DisasContext *s, arg_VMLS_sp *a)
+{
+    return do_vfp_3op_dp(s, gen_VMLS_dp, a->vd, a->vn, a->vm, true);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     rn = VFP_SREG_N(insn);
 
     switch (op) {
-    case 0:
+    case 0 ... 1:
         /* Already handled by decodetree */
         return 1;
     default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     for (;;) {
         /* Perform the calculation.  */
         switch (op) {
-        case 1: /* VMLS: fd + -(fn * fm) */
-            gen_vfp_mul(dp);
-            gen_vfp_F1_neg(dp);
-            gen_mov_F0_vreg(dp, rd);
-            gen_vfp_add(dp);
-            break;
         case 2: /* VNMLS: -fd + (fn * fm) */
             /* Note that it isn't valid to replace (-A + B) with (B - A)
              * or similar plausible looking simplifications
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VMLA_sp ---- 1110 0.00 .... .... 1010 .0.0 .... \
         vm=%vm_sp vn=%vn_sp vd=%vd_sp
 VMLA_dp ---- 1110 0.00 .... .... 1011 .0.0 .... \
         vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+VMLS_sp ---- 1110 0.00 .... .... 1010 .1.0 .... \
+        vm=%vm_sp vn=%vn_sp vd=%vd_sp
+VMLS_dp ---- 1110 0.00 .... .... 1011 .1.0 .... \
+        vm=%vm_dp vn=%vn_dp vd=%vd_dp
-- 
2.20.1
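A scalar model of what the generated sequence computes (plain C; it
reproduces the values, though not Arm's NaN-payload rules):

    /*
     * Model of the VMLS op sequence above: vd + -(vn * vm).
     * The multiply is separately rounded (this is not a fused
     * multiply-add), and vd is deliberately the first operand of the
     * add: Arm's NaN propagation broadly prefers the first operand,
     * so vd + -p and -p + vd can return different NaNs.
     */
    static float vmls_model(float vd, float vn, float vm)
    {
        float p = vn * vm;
        return vd + -p;
    }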
Deleted patch

Convert the VFP VNMLS instruction to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 42 ++++++++++++++++++++++++++++++++++
 target/arm/translate.c         | 24 +------------------
 target/arm/vfp.decode          |  5 ++++
 3 files changed, 48 insertions(+), 23 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VMLS_dp(DisasContext *s, arg_VMLS_sp *a)
 {
     return do_vfp_3op_dp(s, gen_VMLS_dp, a->vd, a->vn, a->vm, true);
 }
+
+static void gen_VNMLS_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
+{
+    /*
+     * VNMLS: -fd + (fn * fm)
+     * Note that it isn't valid to replace (-A + B) with (B - A) or similar
+     * plausible looking simplifications because this will give wrong results
+     * for NaNs.
+     */
+    TCGv_i32 tmp = tcg_temp_new_i32();
+
+    gen_helper_vfp_muls(tmp, vn, vm, fpst);
+    gen_helper_vfp_negs(vd, vd);
+    gen_helper_vfp_adds(vd, vd, tmp, fpst);
+    tcg_temp_free_i32(tmp);
+}
+
+static bool trans_VNMLS_sp(DisasContext *s, arg_VNMLS_sp *a)
+{
+    return do_vfp_3op_sp(s, gen_VNMLS_sp, a->vd, a->vn, a->vm, true);
+}
+
+static void gen_VNMLS_dp(TCGv_i64 vd, TCGv_i64 vn, TCGv_i64 vm, TCGv_ptr fpst)
+{
+    /*
+     * VNMLS: -fd + (fn * fm)
+     * Note that it isn't valid to replace (-A + B) with (B - A) or similar
+     * plausible looking simplifications because this will give wrong results
+     * for NaNs.
+     */
+    TCGv_i64 tmp = tcg_temp_new_i64();
+
+    gen_helper_vfp_muld(tmp, vn, vm, fpst);
+    gen_helper_vfp_negd(vd, vd);
+    gen_helper_vfp_addd(vd, vd, tmp, fpst);
+    tcg_temp_free_i64(tmp);
+}
+
+static bool trans_VNMLS_dp(DisasContext *s, arg_VNMLS_sp *a)
+{
+    return do_vfp_3op_dp(s, gen_VNMLS_dp, a->vd, a->vn, a->vm, true);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ VFP_OP2(div)
 
 #undef VFP_OP2
 
-static inline void gen_vfp_F1_mul(int dp)
-{
-    /* Like gen_vfp_mul() but put result in F1 */
-    TCGv_ptr fpst = get_fpstatus_ptr(0);
-    if (dp) {
-        gen_helper_vfp_muld(cpu_F1d, cpu_F0d, cpu_F1d, fpst);
-    } else {
-        gen_helper_vfp_muls(cpu_F1s, cpu_F0s, cpu_F1s, fpst);
-    }
-    tcg_temp_free_ptr(fpst);
-}
-
 static inline void gen_vfp_F1_neg(int dp)
 {
     /* Like gen_vfp_neg() but put result in F1 */
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     rn = VFP_SREG_N(insn);
 
     switch (op) {
-    case 0 ... 1:
+    case 0 ... 2:
         /* Already handled by decodetree */
         return 1;
     default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     for (;;) {
         /* Perform the calculation.  */
         switch (op) {
-        case 2: /* VNMLS: -fd + (fn * fm) */
-            /* Note that it isn't valid to replace (-A + B) with (B - A)
-             * or similar plausible looking simplifications
-             * because this will give wrong results for NaNs.
-             */
-            gen_vfp_F1_mul(dp);
-            gen_mov_F0_vreg(dp, rd);
-            gen_vfp_neg(dp);
-            gen_vfp_add(dp);
-            break;
         case 3: /* VNMLA: -fd + -(fn * fm) */
             gen_vfp_mul(dp);
             gen_vfp_F1_neg(dp);
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VMLS_sp ---- 1110 0.00 .... .... 1010 .1.0 .... \
         vm=%vm_sp vn=%vn_sp vd=%vd_sp
 VMLS_dp ---- 1110 0.00 .... .... 1011 .1.0 .... \
         vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+VNMLS_sp ---- 1110 0.01 .... .... 1010 .0.0 .... \
+        vm=%vm_sp vn=%vn_sp vd=%vd_sp
+VNMLS_dp ---- 1110 0.01 .... .... 1011 .0.0 .... \
+        vm=%vm_dp vn=%vn_dp vd=%vd_dp
-- 
2.20.1
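The NaN caveat is worth spelling out with a concrete model (plain C; the
values match, the NaN-payload behaviour described in the comment is Arm's,
not the host's):

    /*
     * Model of VNMLS: -vd + (vn * vm). Numerically, -vd + p and
     * p - vd round identically, but they differ when vd is a NaN:
     * the first form propagates the *negated* NaN (FNEG flips the
     * sign bit before the add), while the second propagates vd
     * unchanged -- hence the warning above against "plausible
     * looking simplifications".
     */
    static float vnmls_model(float vd, float vn, float vm)
    {
        float p = vn * vm;
        return -vd + p;
    }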
Deleted patch

Convert the VFP VNMLA instruction to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 34 ++++++++++++++++++++++++++++++++++
 target/arm/translate.c         | 19 +------------------
 target/arm/vfp.decode          |  5 +++++
 3 files changed, 40 insertions(+), 18 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VNMLS_dp(DisasContext *s, arg_VNMLS_sp *a)
 {
     return do_vfp_3op_dp(s, gen_VNMLS_dp, a->vd, a->vn, a->vm, true);
 }
+
+static void gen_VNMLA_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
+{
+    /* VNMLA: -fd + -(fn * fm) */
+    TCGv_i32 tmp = tcg_temp_new_i32();
+
+    gen_helper_vfp_muls(tmp, vn, vm, fpst);
+    gen_helper_vfp_negs(tmp, tmp);
+    gen_helper_vfp_negs(vd, vd);
+    gen_helper_vfp_adds(vd, vd, tmp, fpst);
+    tcg_temp_free_i32(tmp);
+}
+
+static bool trans_VNMLA_sp(DisasContext *s, arg_VNMLA_sp *a)
+{
+    return do_vfp_3op_sp(s, gen_VNMLA_sp, a->vd, a->vn, a->vm, true);
+}
+
+static void gen_VNMLA_dp(TCGv_i64 vd, TCGv_i64 vn, TCGv_i64 vm, TCGv_ptr fpst)
+{
+    /* VNMLA: -fd + -(fn * fm) */
+    TCGv_i64 tmp = tcg_temp_new_i64();
+
+    gen_helper_vfp_muld(tmp, vn, vm, fpst);
+    gen_helper_vfp_negd(tmp, tmp);
+    gen_helper_vfp_negd(vd, vd);
+    gen_helper_vfp_addd(vd, vd, tmp, fpst);
+    tcg_temp_free_i64(tmp);
+}
+
+static bool trans_VNMLA_dp(DisasContext *s, arg_VNMLA_sp *a)
+{
+    return do_vfp_3op_dp(s, gen_VNMLA_dp, a->vd, a->vn, a->vm, true);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ VFP_OP2(div)
 
 #undef VFP_OP2
 
-static inline void gen_vfp_F1_neg(int dp)
-{
-    /* Like gen_vfp_neg() but put result in F1 */
-    if (dp) {
-        gen_helper_vfp_negd(cpu_F1d, cpu_F0d);
-    } else {
-        gen_helper_vfp_negs(cpu_F1s, cpu_F0s);
-    }
-}
-
 static inline void gen_vfp_abs(int dp)
 {
     if (dp)
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     rn = VFP_SREG_N(insn);
 
     switch (op) {
-    case 0 ... 2:
+    case 0 ... 3:
         /* Already handled by decodetree */
         return 1;
     default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     for (;;) {
         /* Perform the calculation.  */
         switch (op) {
-        case 3: /* VNMLA: -fd + -(fn * fm) */
-            gen_vfp_mul(dp);
-            gen_vfp_F1_neg(dp);
-            gen_mov_F0_vreg(dp, rd);
-            gen_vfp_neg(dp);
-            gen_vfp_add(dp);
-            break;
         case 4: /* mul: fn * fm */
             gen_vfp_mul(dp);
             break;
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VNMLS_sp ---- 1110 0.01 .... .... 1010 .0.0 .... \
         vm=%vm_sp vn=%vn_sp vd=%vd_sp
 VNMLS_dp ---- 1110 0.01 .... .... 1011 .0.0 .... \
         vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+VNMLA_sp ---- 1110 0.01 .... .... 1010 .1.0 .... \
+        vm=%vm_sp vn=%vn_sp vd=%vd_sp
+VNMLA_dp ---- 1110 0.01 .... .... 1011 .1.0 .... \
+        vm=%vm_dp vn=%vn_dp vd=%vd_dp
-- 
2.20.1
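VNMLA negates the product and the accumulator separately rather than
computing the sum and negating it once; a scalar model of why the tempting
factored form is wrong:

    /*
     * Model of VNMLA: -vd + -(vn * vm). Note this is not the same as
     * -(vd + vn * vm): with vd = +0 and vn * vm = -0 (round-to-nearest)
     * -vd + -p gives +0 while -(vd + p) gives -0, and a NaN input would
     * have its sign handled differently as well.
     */
    static float vnmla_model(float vd, float vn, float vm)
    {
        float p = vn * vm;
        return -vd + -p;
    }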
Deleted patch

Convert the VMUL instruction to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 10 ++++++++++
 target/arm/translate.c         |  5 +----
 target/arm/vfp.decode          |  5 +++++
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VNMLA_dp(DisasContext *s, arg_VNMLA_sp *a)
 {
     return do_vfp_3op_dp(s, gen_VNMLA_dp, a->vd, a->vn, a->vm, true);
 }
+
+static bool trans_VMUL_sp(DisasContext *s, arg_VMUL_sp *a)
+{
+    return do_vfp_3op_sp(s, gen_helper_vfp_muls, a->vd, a->vn, a->vm, false);
+}
+
+static bool trans_VMUL_dp(DisasContext *s, arg_VMUL_sp *a)
+{
+    return do_vfp_3op_dp(s, gen_helper_vfp_muld, a->vd, a->vn, a->vm, false);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     rn = VFP_SREG_N(insn);
 
     switch (op) {
-    case 0 ... 3:
+    case 0 ... 4:
         /* Already handled by decodetree */
         return 1;
     default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     for (;;) {
         /* Perform the calculation.  */
         switch (op) {
-        case 4: /* mul: fn * fm */
-            gen_vfp_mul(dp);
-            break;
         case 5: /* nmul: -(fn * fm) */
             gen_vfp_mul(dp);
             gen_vfp_neg(dp);
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VNMLA_sp ---- 1110 0.01 .... .... 1010 .1.0 .... \
         vm=%vm_sp vn=%vn_sp vd=%vd_sp
 VNMLA_dp ---- 1110 0.01 .... .... 1011 .1.0 .... \
         vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+VMUL_sp ---- 1110 0.10 .... .... 1010 .0.0 .... \
+        vm=%vm_sp vn=%vn_sp vd=%vd_sp
+VMUL_dp ---- 1110 0.10 .... .... 1011 .0.0 .... \
+        vm=%vm_dp vn=%vn_dp vd=%vd_dp
-- 
2.20.1
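For readers new to the vfp.decode patterns accumulating in these patches, an
annotated excerpt may help. It is comments only, so it is inert if pasted
into a .decode file; the reading follows our understanding of the decodetree
syntax -- docs/devel/decodetree.rst is the authoritative reference:

    # Anatomy of one pattern line (a reading aid, not a new pattern):
    #
    #   VMUL_sp ---- 1110 0.10 .... .... 1010 .0.0 .... \
    #           vm=%vm_sp vn=%vn_sp vd=%vd_sp
    #
    #   0/1  fixed opcode bits that must match
    #   -    bits the pattern ignores entirely (here the condition field)
    #   .    bits not constrained here; they are picked up by the named
    #        field macros (%vd_sp etc.), which reassemble the scattered
    #        register-number bits into the vd/vn/vm arguments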
Deleted patch

Convert the VNMUL instruction to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 24 ++++++++++++++++++++++++
 target/arm/translate.c         |  7 +------
 target/arm/vfp.decode          |  5 +++++
 3 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VMUL_dp(DisasContext *s, arg_VMUL_sp *a)
 {
     return do_vfp_3op_dp(s, gen_helper_vfp_muld, a->vd, a->vn, a->vm, false);
 }
+
+static void gen_VNMUL_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
+{
+    /* VNMUL: -(fn * fm) */
+    gen_helper_vfp_muls(vd, vn, vm, fpst);
+    gen_helper_vfp_negs(vd, vd);
+}
+
+static bool trans_VNMUL_sp(DisasContext *s, arg_VNMUL_sp *a)
+{
+    return do_vfp_3op_sp(s, gen_VNMUL_sp, a->vd, a->vn, a->vm, false);
+}
+
+static void gen_VNMUL_dp(TCGv_i64 vd, TCGv_i64 vn, TCGv_i64 vm, TCGv_ptr fpst)
+{
+    /* VNMUL: -(fn * fm) */
+    gen_helper_vfp_muld(vd, vn, vm, fpst);
+    gen_helper_vfp_negd(vd, vd);
+}
+
+static bool trans_VNMUL_dp(DisasContext *s, arg_VNMUL_sp *a)
+{
+    return do_vfp_3op_dp(s, gen_VNMUL_dp, a->vd, a->vn, a->vm, false);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void gen_vfp_##name(int dp) \
 
 VFP_OP2(add)
 VFP_OP2(sub)
-VFP_OP2(mul)
 VFP_OP2(div)
 
 #undef VFP_OP2
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     rn = VFP_SREG_N(insn);
 
     switch (op) {
-    case 0 ... 4:
+    case 0 ... 5:
         /* Already handled by decodetree */
         return 1;
     default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     for (;;) {
         /* Perform the calculation.  */
         switch (op) {
-        case 5: /* nmul: -(fn * fm) */
-            gen_vfp_mul(dp);
-            gen_vfp_neg(dp);
-            break;
         case 6: /* add: fn + fm */
             gen_vfp_add(dp);
             break;
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VMUL_sp ---- 1110 0.10 .... .... 1010 .0.0 .... \
         vm=%vm_sp vn=%vn_sp vd=%vd_sp
 VMUL_dp ---- 1110 0.10 .... .... 1011 .0.0 .... \
         vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+VNMUL_sp ---- 1110 0.10 .... .... 1010 .1.0 .... \
+        vm=%vm_sp vn=%vn_sp vd=%vd_sp
+VNMUL_dp ---- 1110 0.10 .... .... 1011 .1.0 .... \
+        vm=%vm_dp vn=%vn_dp vd=%vd_dp
-- 
2.20.1
Deleted patch

Convert the VADD instruction to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 10 ++++++++++
 target/arm/translate.c         |  6 +-----
 target/arm/vfp.decode          |  5 +++++
 3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VNMUL_dp(DisasContext *s, arg_VNMUL_sp *a)
 {
     return do_vfp_3op_dp(s, gen_VNMUL_dp, a->vd, a->vn, a->vm, false);
 }
+
+static bool trans_VADD_sp(DisasContext *s, arg_VADD_sp *a)
+{
+    return do_vfp_3op_sp(s, gen_helper_vfp_adds, a->vd, a->vn, a->vm, false);
+}
+
+static bool trans_VADD_dp(DisasContext *s, arg_VADD_sp *a)
+{
+    return do_vfp_3op_dp(s, gen_helper_vfp_addd, a->vd, a->vn, a->vm, false);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void gen_vfp_##name(int dp) \
     tcg_temp_free_ptr(fpst); \
 }
 
-VFP_OP2(add)
 VFP_OP2(sub)
 VFP_OP2(div)
 
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     rn = VFP_SREG_N(insn);
 
     switch (op) {
-    case 0 ... 5:
+    case 0 ... 6:
         /* Already handled by decodetree */
         return 1;
     default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     for (;;) {
         /* Perform the calculation.  */
         switch (op) {
-        case 6: /* add: fn + fm */
-            gen_vfp_add(dp);
-            break;
         case 7: /* sub: fn - fm */
             gen_vfp_sub(dp);
             break;
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VNMUL_sp ---- 1110 0.10 .... .... 1010 .1.0 .... \
         vm=%vm_sp vn=%vn_sp vd=%vd_sp
 VNMUL_dp ---- 1110 0.10 .... .... 1011 .1.0 .... \
         vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+VADD_sp ---- 1110 0.11 .... .... 1010 .0.0 .... \
+        vm=%vm_sp vn=%vn_sp vd=%vd_sp
+VADD_dp ---- 1110 0.11 .... .... 1011 .0.0 .... \
+        vm=%vm_dp vn=%vn_dp vd=%vd_dp
-- 
2.20.1
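One detail worth noting across these conversions: the final boolean passed to
do_vfp_3op_sp/_dp differs. Accumulating ops (VMLA, VMLS, VNMLS, VNMLA) pass
true because the old value of vd is an input; pure two-source ops (VMUL,
VNMUL, VADD, VSUB) pass false. A tiny model of that call contract (the
parameter name reads_vd is our label for it; check the helper's actual
definition):

    #include <stdbool.h>

    struct vfp_3op_call {
        int vd, vn, vm;
        bool reads_vd;   /* true: vd is a source as well as the destination */
    };

    static const struct vfp_3op_call vmla_shape = { 0, 1, 2, true };
    static const struct vfp_3op_call vadd_shape = { 0, 1, 2, false };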
Deleted patch

Convert the VSUB instruction to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 10 ++++++++++
 target/arm/translate.c         |  6 +-----
 target/arm/vfp.decode          |  5 +++++
 3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VADD_dp(DisasContext *s, arg_VADD_sp *a)
 {
     return do_vfp_3op_dp(s, gen_helper_vfp_addd, a->vd, a->vn, a->vm, false);
 }
+
+static bool trans_VSUB_sp(DisasContext *s, arg_VSUB_sp *a)
+{
+    return do_vfp_3op_sp(s, gen_helper_vfp_subs, a->vd, a->vn, a->vm, false);
+}
+
+static bool trans_VSUB_dp(DisasContext *s, arg_VSUB_sp *a)
+{
+    return do_vfp_3op_dp(s, gen_helper_vfp_subd, a->vd, a->vn, a->vm, false);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void gen_vfp_##name(int dp) \
     tcg_temp_free_ptr(fpst); \
 }
 
-VFP_OP2(sub)
 VFP_OP2(div)
 
 #undef VFP_OP2
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     rn = VFP_SREG_N(insn);
 
     switch (op) {
-    case 0 ... 6:
+    case 0 ... 7:
         /* Already handled by decodetree */
         return 1;
     default:
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     for (;;) {
         /* Perform the calculation.  */
         switch (op) {
-        case 7: /* sub: fn - fm */
-            gen_vfp_sub(dp);
-            break;
         case 8: /* div: fn / fm */
             gen_vfp_div(dp);
             break;
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VADD_sp ---- 1110 0.11 .... .... 1010 .0.0 .... \
         vm=%vm_sp vn=%vn_sp vd=%vd_sp
 VADD_dp ---- 1110 0.11 .... .... 1011 .0.0 .... \
         vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+VSUB_sp ---- 1110 0.11 .... .... 1010 .1.0 .... \
+        vm=%vm_sp vn=%vn_sp vd=%vd_sp
+VSUB_dp ---- 1110 0.11 .... .... 1011 .1.0 .... \
+        vm=%vm_dp vn=%vn_dp vd=%vd_dp
-- 
2.20.1