Series comparison

-[PULL 00/23] target-arm queue
+[PULL 00/26] target-arm queue
-Mostly my decodetree stuff, but also some patches for various
+Small pile of bug fixes for rc1. I've included my patches to get
-smaller bugs/features from others.
+our docs building with Sphinx 3, just for convenience...
-thanks
 -- PMM
-The following changes since commit 53550e81e2cafe7c03a39526b95cd21b5194d9b1:
+The following changes since commit b149dea55cce97cb226683d06af61984a1c11e96:
-  Merge remote-tracking branch 'remotes/berrange/tags/qcrypto-next-pull-request' into staging (2020-06-15 16:36:34 +0100)
+  Merge remote-tracking branch 'remotes/cschoenebeck/tags/pull-9p-20201102' into staging (2020-11-02 10:57:48 +0000)
 are available in the Git repository at:
-  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20200616
+  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20201102
-for you to fetch changes up to 64b397417a26509bcdff44ab94356a35c7901c79:
+for you to fetch changes up to ffb4fbf90a2f63c9cb33e4bb9f854c79bf04ca4a:
-  hw: arm: Set vendor property for IMX SDHCI emulations (2020-06-16 10:32:29 +0100)
+  tests/qtest/npcm7xx_rng-test: Disable randomness tests (2020-11-02 16:52:18 +0000)
 ----------------------------------------------------------------
- * hw: arm: Set vendor property for IMX SDHCI emulations
+target-arm queue:
- * sd: sdhci: Implement basic vendor specific register support
+ * target/arm: Fix Neon emulation bugs on big-endian hosts
- * hw/net/imx_fec: Convert debug fprintf() to trace events
+ * target/arm: fix handling of HCR.FB
- * target/arm/cpu: adjust virtual time for all KVM arm cpus
+ * target/arm: fix LORID_EL1 access check
- * Implement configurable descriptor size in ftgmac100
+ * disas/capstone: Fix monitor disassembly of >32 bytes
- * hw/misc/imx6ul_ccm: Implement non writable bits in CCM registers
+ * hw/arm/smmuv3: Fix potential integer overflow (CID 1432363)
- * target/arm: More Neon decodetree conversion work
+ * hw/arm/boot: fix SVE for EL3 direct kernel boot
  * hw/display/omap_lcdc: Fix potential NULL pointer dereference
  * hw/display/exynos4210_fimd: Fix potential NULL pointer dereference
  * target/arm: Get correct MMU index for other-security-state
  * configure: Test that gio libs from pkg-config work
  * hw/intc/arm_gicv3_cpuif: Make GIC maintenance interrupts work
  * docs: Fix building with Sphinx 3
  * tests/qtest/npcm7xx_rng-test: Disable randomness tests
 ----------------------------------------------------------------
-Erik Smit (1):
+AlexChen (2):
-      Implement configurable descriptor size in ftgmac100
+      hw/display/omap_lcdc: Fix potential NULL pointer dereference
       hw/display/exynos4210_fimd: Fix potential NULL pointer dereference
-Guenter Roeck (2):
+Peter Maydell (9):
-      sd: sdhci: Implement basic vendor specific register support
+      target/arm: Fix float16 pairwise Neon ops on big-endian hosts
-      hw: arm: Set vendor property for IMX SDHCI emulations
+      target/arm: Fix VUDOT/VSDOT (scalar) on big-endian hosts
       disas/capstone: Fix monitor disassembly of >32 bytes
       target/arm: Get correct MMU index for other-security-state
       configure: Test that gio libs from pkg-config work
       hw/intc/arm_gicv3_cpuif: Make GIC maintenance interrupts work
       scripts/kerneldoc: For Sphinx 3 use c:macro for macros with arguments
       qemu-option-trace.rst.inc: Don't use option:: markup
       tests/qtest/npcm7xx_rng-test: Disable randomness tests
-Jean-Christophe Dubois (2):
+Philippe Mathieu-Daudé (1):
-      hw/misc/imx6ul_ccm: Implement non writable bits in CCM registers
+      hw/arm/smmuv3: Fix potential integer overflow (CID 1432363)
       hw/net/imx_fec: Convert debug fprintf() to trace events
-Peter Maydell (17):
+Richard Henderson (11):
-      target/arm: Fix missing temp frees in do_vshll_2sh
+      target/arm: Introduce neon_full_reg_offset
-      target/arm: Convert Neon 3-reg-diff prewidening ops to decodetree
+      target/arm: Move neon_element_offset to translate.c
-      target/arm: Convert Neon 3-reg-diff narrowing ops to decodetree
+      target/arm: Use neon_element_offset in neon_load/store_reg
-      target/arm: Convert Neon 3-reg-diff VABAL, VABDL to decodetree
+      target/arm: Use neon_element_offset in vfp_reg_offset
-      target/arm: Convert Neon 3-reg-diff long multiplies
+      target/arm: Add read/write_neon_element32
-      target/arm: Convert Neon 3-reg-diff saturating doubling multiplies
+      target/arm: Expand read/write_neon_element32 to all MemOp
-      target/arm: Convert Neon 3-reg-diff polynomial VMULL
+      target/arm: Rename neon_load_reg32 to vfp_load_reg32
-      target/arm: Add 'static' and 'const' annotations to VSHLL function arrays
+      target/arm: Add read/write_neon_element64
-      target/arm: Add missing TCG temp free in do_2shift_env_64()
+      target/arm: Rename neon_load_reg64 to vfp_load_reg64
-      target/arm: Convert Neon 2-reg-scalar integer multiplies to decodetree
+      target/arm: Simplify do_long_3d and do_2scalar_long
-      target/arm: Convert Neon 2-reg-scalar float multiplies to decodetree
+      target/arm: Improve do_prewiden_3d
       target/arm: Convert Neon 2-reg-scalar VQDMULH, VQRDMULH to decodetree
       target/arm: Convert Neon 2-reg-scalar VQRDMLAH, VQRDMLSH to decodetree
       target/arm: Convert Neon 2-reg-scalar long multiplies to decodetree
       target/arm: Convert Neon VEXT to decodetree
       target/arm: Convert Neon VTBL, VTBX to decodetree
       target/arm: Convert Neon VDUP (scalar) to decodetree
-fangying (1):
+Rémi Denis-Courmont (3):
-      target/arm/cpu: adjust virtual time for all KVM arm cpus
+      target/arm: fix handling of HCR.FB
       target/arm: fix LORID_EL1 access check
       hw/arm/boot: fix SVE for EL3 direct kernel boot
- hw/sd/sdhci-internal.h          |    5 +
+ docs/qemu-option-trace.rst.inc     |   6 +-
- include/hw/sd/sdhci.h           |    5 +
+ configure                          |  10 +-
- target/arm/translate.h          |    1 +
+ include/hw/intc/arm_gicv3_common.h |   1 -
- target/arm/neon-dp.decode       |  130 +++++
+ disas/capstone.c                   |   2 +-
- hw/arm/fsl-imx25.c              |    6 +
+ hw/arm/boot.c                      |   3 +
- hw/arm/fsl-imx6.c               |    6 +
+ hw/arm/smmuv3.c                    |   3 +-
- hw/arm/fsl-imx6ul.c             |    2 +
+ hw/display/exynos4210_fimd.c       |   4 +-
- hw/arm/fsl-imx7.c               |    2 +
+ hw/display/omap_lcdc.c             |  10 +-
- hw/misc/imx6ul_ccm.c            |   76 ++-
+ hw/intc/arm_gicv3_cpuif.c          |   5 +-
- hw/net/ftgmac100.c              |   26 +-
+ target/arm/helper.c                |  24 +-
- hw/net/imx_fec.c                |  106 ++--
+ target/arm/m_helper.c              |   3 +-
- hw/sd/sdhci.c                   |   18 +-
+ target/arm/translate.c             | 153 +++++++++---
- target/arm/cpu.c                |    6 +-
+ target/arm/vec_helper.c            |  12 +-
- target/arm/cpu64.c              |    1 -
+ tests/qtest/npcm7xx_rng-test.c     |  14 +-
- target/arm/kvm.c                |   21 +-
+ scripts/kernel-doc                 |  18 +-
- target/arm/translate-neon.inc.c | 1148 ++++++++++++++++++++++++++++++++++++++-
+ target/arm/translate-neon.c.inc    | 472 ++++++++++++++++++++-----------------
- target/arm/translate.c          |  684 +----------------------
+ target/arm/translate-vfp.c.inc     | 341 +++++++++++----------------
- hw/net/trace-events             |   18 +
+files changed, 588 insertions(+), 493 deletions(-)
-files changed, 1495 insertions(+), 766 deletions(-)

-[PULL 07/23] target/arm: Convert Neon 3-reg-diff polynomial VMULL
+[PULL 01/26] target/arm: Introduce neon_full_reg_offset
-Convert the Neon 3-reg-diff insn polynomial VMULL. This is the last
+From: Richard Henderson <richard.henderson@linaro.org>
-insn in this group to be converted.
+This function makes it clear that we're talking about the whole
+register, and not the 32-bit piece at index 0.  This fixes a bug
+when running on a big-endian host.
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20201030022618.785675-2-richard.henderson@linaro.org
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/neon-dp.decode       |  2 ++
+ target/arm/translate.c          |  8 ++++++
- target/arm/translate-neon.inc.c | 43 +++++++++++++++++++++++
+ target/arm/translate-neon.c.inc | 44 ++++++++++++++++-----------------
- target/arm/translate.c          | 60 ++-------------------------------
+ target/arm/translate-vfp.c.inc  |  2 +-
-files changed, 48 insertions(+), 57 deletions(-)
+files changed, 31 insertions(+), 23 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
-+++ b/target/arm/neon-dp.decode
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
-     VMULL_U_3d   1111 001 1 1 . .. .... .... 1100 . 0 . 0 .... @3diff
-     VQDMULL_3d   1111 001 0 1 . .. .... .... 1101 . 0 . 0 .... @3diff
-+
-+    VMULL_P_3d   1111 001 0 1 . .. .... .... 1110 . 0 . 0 .... @3diff
-   ]
- }
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
-+++ b/target/arm/translate-neon.inc.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VQDMLSL_3d(DisasContext *s, arg_3diff *a)
-     return do_long_3d(s, a, opfn[a->size], accfn[a->size]);
- }
-+
-+static bool trans_VMULL_P_3d(DisasContext *s, arg_3diff *a)
-+{
-+    gen_helper_gvec_3 *fn_gvec;
-+
-+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
-+        return false;
-+    }
-+
-+    /* UNDEF accesses to D16-D31 if they don't exist. */
-+    if (!dc_isar_feature(aa32_simd_r32, s) &&
-+        ((a->vd | a->vn | a->vm) & 0x10)) {
-+        return false;
-+    }
-+
-+    if (a->vd & 1) {
-+        return false;
-+    }
-+
-+    switch (a->size) {
-+    case 0:
-+        fn_gvec = gen_helper_neon_pmull_h;
-+        break;
-+    case 2:
-+        if (!dc_isar_feature(aa32_pmull, s)) {
-+            return false;
-+        }
-+        fn_gvec = gen_helper_gvec_pmull_q;
-+        break;
-+    default:
-+        return false;
-+    }
-+
-+    if (!vfp_access_check(s)) {
-+        return true;
-+    }
-+
-+    tcg_gen_gvec_3_ool(neon_reg_offset(a->vd, 0),
-+                       neon_reg_offset(a->vn, 0),
-+                       neon_reg_offset(a->vm, 0),
-+                       16, 16, 0, fn_gvec);
-+    return true;
-+}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ static inline void gen_hlt(DisasContext *s, int imm)
      unallocated_encoding(s);
  }
 +/*
 + * Return the offset of a "full" NEON Dreg.
 + */
 +static long neon_full_reg_offset(unsigned reg)
 +{
 +    return offsetof(CPUARMState, vfp.zregs[reg >> 1].d[reg & 1]);
 +}
 +
  static inline long vfp_reg_offset(bool dp, unsigned reg)
  {
-     int op;
+     if (dp) {
-     int q;
+diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
--    int rd, rn, rm, rd_ofs, rn_ofs, rm_ofs;
+index XXXXXXX..XXXXXXX 100644
-+    int rd, rn, rm, rd_ofs, rm_ofs;
+--- a/target/arm/translate-neon.c.inc
-     int size;
++++ b/target/arm/translate-neon.c.inc
-     int pass;
+@@ -XXX,XX +XXX,XX @@ neon_element_offset(int reg, int element, MemOp size)
-     int u;
+         ofs ^= 8 - element_size;
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+     }
-     size = (insn >> 20) & 3;
+ #endif
-     vec_size = q ? 16 : 8;
+-    return neon_reg_offset(reg, 0) + ofs;
-     rd_ofs = neon_reg_offset(rd, 0);
++    return neon_full_reg_offset(reg) + ofs;
--    rn_ofs = neon_reg_offset(rn, 0);
+ }
-     rm_ofs = neon_reg_offset(rm, 0);
+ static void neon_load_element(TCGv_i32 var, int reg, int ele, MemOp mop)
-     if ((insn & (1 << 23)) == 0) {
+@@ -XXX,XX +XXX,XX @@ static bool trans_VLD_all_lanes(DisasContext *s, arg_VLD_all_lanes *a)
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+              * We cannot write 16 bytes at once because the
-         if (size != 3) {
+              * destination is unaligned.
-             op = (insn >> 8) & 0xf;
+              */
-             if ((insn & (1 << 6)) == 0) {
+-            tcg_gen_gvec_dup_i32(size, neon_reg_offset(vd, 0),
--                /* Three registers of different lengths.  */
++            tcg_gen_gvec_dup_i32(size, neon_full_reg_offset(vd),
--                /* undefreq: bit 0 : UNDEF if size == 0
+, 8, tmp);
--                 *           bit 1 : UNDEF if size == 1
+-            tcg_gen_gvec_mov(0, neon_reg_offset(vd + 1, 0),
--                 *           bit 2 : UNDEF if size == 2
+-                             neon_reg_offset(vd, 0), 8, 8);
--                 *           bit 3 : UNDEF if U == 1
++            tcg_gen_gvec_mov(0, neon_full_reg_offset(vd + 1),
--                 * Note that [2:0] set implies 'always UNDEF'
++                             neon_full_reg_offset(vd), 8, 8);
--                 */
+         } else {
--                int undefreq;
+-            tcg_gen_gvec_dup_i32(size, neon_reg_offset(vd, 0),
--                /* prewiden, src1_wide, src2_wide, undefreq */
++            tcg_gen_gvec_dup_i32(size, neon_full_reg_offset(vd),
--                static const int neon_3reg_wide[16][4] = {
+                                  vec_size, vec_size, tmp);
--                    {0, 0, 0, 7}, /* VADDL: handled by decodetree */
+         }
--                    {0, 0, 0, 7}, /* VADDW: handled by decodetree */
+         tcg_gen_addi_i32(addr, addr, 1 << size);
--                    {0, 0, 0, 7}, /* VSUBL: handled by decodetree */
+@@ -XXX,XX +XXX,XX @@ static bool trans_VLDST_single(DisasContext *s, arg_VLDST_single *a)
--                    {0, 0, 0, 7}, /* VSUBW: handled by decodetree */
+ static bool do_3same(DisasContext *s, arg_3same *a, GVecGen3Fn fn)
--                    {0, 0, 0, 7}, /* VADDHN: handled by decodetree */
+ {
--                    {0, 0, 0, 7}, /* VABAL */
+     int vec_size = a->q ? 16 : 8;
--                    {0, 0, 0, 7}, /* VSUBHN: handled by decodetree */
+-    int rd_ofs = neon_reg_offset(a->vd, 0);
--                    {0, 0, 0, 7}, /* VABDL */
+-    int rn_ofs = neon_reg_offset(a->vn, 0);
--                    {0, 0, 0, 7}, /* VMLAL */
+-    int rm_ofs = neon_reg_offset(a->vm, 0);
--                    {0, 0, 0, 7}, /* VQDMLAL */
++    int rd_ofs = neon_full_reg_offset(a->vd);
--                    {0, 0, 0, 7}, /* VMLSL */
++    int rn_ofs = neon_full_reg_offset(a->vn);
--                    {0, 0, 0, 7}, /* VQDMLSL */
++    int rm_ofs = neon_full_reg_offset(a->vm);
--                    {0, 0, 0, 7}, /* Integer VMULL */
--                    {0, 0, 0, 7}, /* VQDMULL */
+     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
--                    {0, 0, 0, 0xa}, /* Polynomial VMULL */
+         return false;
--                    {0, 0, 0, 7}, /* Reserved: always UNDEF */
+@@ -XXX,XX +XXX,XX @@ static bool do_vector_2sh(DisasContext *s, arg_2reg_shift *a, GVecGen2iFn *fn)
--                };
+ {
--
+     /* Handle a 2-reg-shift insn which can be vectorized. */
--                undefreq = neon_3reg_wide[op][3];
+     int vec_size = a->q ? 16 : 8;
--
+-    int rd_ofs = neon_reg_offset(a->vd, 0);
--                if ((undefreq & (1 << size)) ||
+-    int rm_ofs = neon_reg_offset(a->vm, 0);
--                    ((undefreq & 8) && u)) {
++    int rd_ofs = neon_full_reg_offset(a->vd);
--                    return 1;
++    int rm_ofs = neon_full_reg_offset(a->vm);
--                }
--                if (rd & 1) {
+     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
--                    return 1;
+         return false;
--                }
+@@ -XXX,XX +XXX,XX @@ static bool do_fp_2sh(DisasContext *s, arg_2reg_shift *a,
--
+ {
--                /* Handle polynomial VMULL in a single pass.  */
+     /* FP operations in 2-reg-and-shift group */
--                if (op == 14) {
+     int vec_size = a->q ? 16 : 8;
--                    if (size == 0) {
+-    int rd_ofs = neon_reg_offset(a->vd, 0);
--                        /* VMULL.P8 */
+-    int rm_ofs = neon_reg_offset(a->vm, 0);
--                        tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, 16, 16,
++    int rd_ofs = neon_full_reg_offset(a->vd);
--                                           0, gen_helper_neon_pmull_h);
++    int rm_ofs = neon_full_reg_offset(a->vm);
--                    } else {
+     TCGv_ptr fpst;
--                        /* VMULL.P64 */
--                        if (!dc_isar_feature(aa32_pmull, s)) {
+     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
--                            return 1;
+@@ -XXX,XX +XXX,XX @@ static bool do_1reg_imm(DisasContext *s, arg_1reg_imm *a,
--                        }
+         return true;
--                        tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, 16, 16,
+     }
--                                           0, gen_helper_gvec_pmull_q);
--                    }
+-    reg_ofs = neon_reg_offset(a->vd, 0);
--                    return 0;
++    reg_ofs = neon_full_reg_offset(a->vd);
--                }
+     vec_size = a->q ? 16 : 8;
--                abort(); /* all others handled by decodetree */
+     imm = asimd_imm_const(a->imm, a->cmode, a->op);
-+                /* Three registers of different lengths: handled by decodetree */
-+                return 1;
+@@ -XXX,XX +XXX,XX @@ static bool trans_VMULL_P_3d(DisasContext *s, arg_3diff *a)
-             } else {
+         return true;
-                 /* Two registers and a scalar. NB that for ops of this form
+     }
-                  * the ARM ARM labels bit 24 as Q, but it is in our variable
 -    tcg_gen_gvec_3_ool(neon_reg_offset(a->vd, 0),
 -                       neon_reg_offset(a->vn, 0),
 -                       neon_reg_offset(a->vm, 0),
 +    tcg_gen_gvec_3_ool(neon_full_reg_offset(a->vd),
 +                       neon_full_reg_offset(a->vn),
 +                       neon_full_reg_offset(a->vm),
 , 16, 0, fn_gvec);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_fp_vec(DisasContext *s, arg_2scalar *a,
  {
      /* Two registers and a scalar, using gvec */
      int vec_size = a->q ? 16 : 8;
 -    int rd_ofs = neon_reg_offset(a->vd, 0);
 -    int rn_ofs = neon_reg_offset(a->vn, 0);
 +    int rd_ofs = neon_full_reg_offset(a->vd);
 +    int rn_ofs = neon_full_reg_offset(a->vn);
      int rm_ofs;
      int idx;
      TCGv_ptr fpstatus;
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_fp_vec(DisasContext *s, arg_2scalar *a,
      /* a->vm is M:Vm, which encodes both register and index */
      idx = extract32(a->vm, a->size + 2, 2);
      a->vm = extract32(a->vm, 0, a->size + 2);
 -    rm_ofs = neon_reg_offset(a->vm, 0);
 +    rm_ofs = neon_full_reg_offset(a->vm);
      fpstatus = fpstatus_ptr(a->size == 1 ? FPST_STD_F16 : FPST_STD);
      tcg_gen_gvec_3_ptr(rd_ofs, rn_ofs, rm_ofs, fpstatus,
@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP_scalar(DisasContext *s, arg_VDUP_scalar *a)
          return true;
      }
 -    tcg_gen_gvec_dup_mem(a->size, neon_reg_offset(a->vd, 0),
 +    tcg_gen_gvec_dup_mem(a->size, neon_full_reg_offset(a->vd),
                           neon_element_offset(a->vm, a->index, a->size),
                           a->q ? 16 : 8, a->q ? 16 : 8);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_F32_F16(DisasContext *s, arg_2misc *a)
  static bool do_2misc_vec(DisasContext *s, arg_2misc *a, GVecGen2Fn *fn)
  {
      int vec_size = a->q ? 16 : 8;
 -    int rd_ofs = neon_reg_offset(a->vd, 0);
 -    int rm_ofs = neon_reg_offset(a->vm, 0);
 +    int rd_ofs = neon_full_reg_offset(a->vd);
 +    int rm_ofs = neon_full_reg_offset(a->vm);
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
          return false;
 diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-vfp.c.inc
 +++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
      }
      tmp = load_reg(s, a->rt);
 -    tcg_gen_gvec_dup_i32(size, neon_reg_offset(a->vn, 0),
 +    tcg_gen_gvec_dup_i32(size, neon_full_reg_offset(a->vn),
                           vec_size, vec_size, tmp);
      tcg_temp_free_i32(tmp);
 --
 .20.1
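
For reference, here is a minimal sketch of the register mapping behind neon_full_reg_offset() in patch 01. It uses a simplified, hypothetical stand-in for CPUARMState (SketchCPUState, with 32 vector registers of 32 uint64_t chunks each), not QEMU's real definition. It also computes the offset by hand instead of calling offsetof() with a variable array index. The hypothetical sketch_low_half_offset() models the "32-bit piece at index 0" that the old neon_reg_offset(reg, 0) calls returned. As I read the old code, that piece starts 4 bytes into the 64-bit register on a big-endian host, so it is not a valid whole-register offset there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified model of the register file: AArch32 Dn
 * lives in zregs[n >> 1].d[n & 1], matching the expression in
 * neon_full_reg_offset().
 */
typedef struct {
    uint64_t d[32];              /* assumed size: 32 x 64-bit chunks */
} SketchVecReg;

typedef struct {
    struct {
        SketchVecReg zregs[32];
    } vfp;
} SketchCPUState;

/* Offset of the whole 64-bit Dn; the same on any host endianness. */
static long sketch_full_reg_offset(unsigned reg)
{
    assert(reg < 32);            /* D0..D31 */
    return (long)(offsetof(SketchCPUState, vfp.zregs)
                  + (reg >> 1) * sizeof(SketchVecReg)
                  + (reg & 1) * sizeof(uint64_t));
}

/*
 * Offset of the least significant 32 bits of Dn. On a big-endian host
 * those bytes are at the high end of the uint64_t, 4 bytes in.
 */
static long sketch_low_half_offset(unsigned reg, bool host_big_endian)
{
    return sketch_full_reg_offset(reg) + (host_big_endian ? 4 : 0);
}
```

With these definitions D2 and D3 share zregs[1], so sketch_full_reg_offset(3) is 264 (256 bytes for zregs[0] plus 8 for d[1]). On a little-endian host the two functions agree, which would hide the mix-up.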

-[PULL 06/23] target/arm: Convert Neon 3-reg-diff saturating doubling multiplies
+[PULL 02/26] target/arm: Move neon_element_offset to translate.c
-Convert the Neon 3-reg-diff insns VQDMULL, VQDMLAL and VQDMLSL:
+From: Richard Henderson <richard.henderson@linaro.org>
-these are all saturating doubling long multiplies with a possible
-accumulate step.
-These are the last insns in the group which use the pass-over-each
+This will shortly have users outside of translate-neon.c.inc.
-elements loop, so we can delete that code.
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20201030022618.785675-3-richard.henderson@linaro.org
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/neon-dp.decode       |  6 +++
+ target/arm/translate.c          | 20 ++++++++++++++++++++
- target/arm/translate-neon.inc.c | 82 +++++++++++++++++++++++++++++++++
+ target/arm/translate-neon.c.inc | 19 -------------------
- target/arm/translate.c          | 59 ++----------------------
+files changed, 20 insertions(+), 19 deletions(-)
-files changed, 92 insertions(+), 55 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
-+++ b/target/arm/neon-dp.decode
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
-     VMLAL_S_3d   1111 001 0 1 . .. .... .... 1000 . 0 . 0 .... @3diff
-     VMLAL_U_3d   1111 001 1 1 . .. .... .... 1000 . 0 . 0 .... @3diff
-+    VQDMLAL_3d   1111 001 0 1 . .. .... .... 1001 . 0 . 0 .... @3diff
-+
-     VMLSL_S_3d   1111 001 0 1 . .. .... .... 1010 . 0 . 0 .... @3diff
-     VMLSL_U_3d   1111 001 1 1 . .. .... .... 1010 . 0 . 0 .... @3diff
-+    VQDMLSL_3d   1111 001 0 1 . .. .... .... 1011 . 0 . 0 .... @3diff
-+
-     VMULL_S_3d   1111 001 0 1 . .. .... .... 1100 . 0 . 0 .... @3diff
-     VMULL_U_3d   1111 001 1 1 . .. .... .... 1100 . 0 . 0 .... @3diff
-+
-+    VQDMULL_3d   1111 001 0 1 . .. .... .... 1101 . 0 . 0 .... @3diff
-   ]
- }
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
-+++ b/target/arm/translate-neon.inc.c
-@@ -XXX,XX +XXX,XX @@ DO_VMLAL(VMLAL_S,mull_s,add)
- DO_VMLAL(VMLAL_U,mull_u,add)
- DO_VMLAL(VMLSL_S,mull_s,sub)
- DO_VMLAL(VMLSL_U,mull_u,sub)
-+
-+static void gen_VQDMULL_16(TCGv_i64 rd, TCGv_i32 rn, TCGv_i32 rm)
-+{
-+    gen_helper_neon_mull_s16(rd, rn, rm);
-+    gen_helper_neon_addl_saturate_s32(rd, cpu_env, rd, rd);
-+}
-+
-+static void gen_VQDMULL_32(TCGv_i64 rd, TCGv_i32 rn, TCGv_i32 rm)
-+{
-+    gen_mull_s32(rd, rn, rm);
-+    gen_helper_neon_addl_saturate_s64(rd, cpu_env, rd, rd);
-+}
-+
-+static bool trans_VQDMULL_3d(DisasContext *s, arg_3diff *a)
-+{
-+    static NeonGenTwoOpWidenFn * const opfn[] = {
-+        NULL,
-+        gen_VQDMULL_16,
-+        gen_VQDMULL_32,
-+        NULL,
-+    };
-+
-+    return do_long_3d(s, a, opfn[a->size], NULL);
-+}
-+
-+static void gen_VQDMLAL_acc_16(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
-+{
-+    gen_helper_neon_addl_saturate_s32(rd, cpu_env, rn, rm);
-+}
-+
-+static void gen_VQDMLAL_acc_32(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
-+{
-+    gen_helper_neon_addl_saturate_s64(rd, cpu_env, rn, rm);
-+}
-+
-+static bool trans_VQDMLAL_3d(DisasContext *s, arg_3diff *a)
-+{
-+    static NeonGenTwoOpWidenFn * const opfn[] = {
-+        NULL,
-+        gen_VQDMULL_16,
-+        gen_VQDMULL_32,
-+        NULL,
-+    };
-+    static NeonGenTwo64OpFn * const accfn[] = {
-+        NULL,
-+        gen_VQDMLAL_acc_16,
-+        gen_VQDMLAL_acc_32,
-+        NULL,
-+    };
-+
-+    return do_long_3d(s, a, opfn[a->size], accfn[a->size]);
-+}
-+
-+static void gen_VQDMLSL_acc_16(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
-+{
-+    gen_helper_neon_negl_u32(rm, rm);
-+    gen_helper_neon_addl_saturate_s32(rd, cpu_env, rn, rm);
-+}
-+
-+static void gen_VQDMLSL_acc_32(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
-+{
-+    tcg_gen_neg_i64(rm, rm);
-+    gen_helper_neon_addl_saturate_s64(rd, cpu_env, rn, rm);
-+}
-+
-+static bool trans_VQDMLSL_3d(DisasContext *s, arg_3diff *a)
-+{
-+    static NeonGenTwoOpWidenFn * const opfn[] = {
-+        NULL,
-+        gen_VQDMULL_16,
-+        gen_VQDMULL_32,
-+        NULL,
-+    };
-+    static NeonGenTwo64OpFn * const accfn[] = {
-+        NULL,
-+        gen_VQDMLSL_acc_16,
-+        gen_VQDMLSL_acc_32,
-+        NULL,
-+    };
-+
-+    return do_long_3d(s, a, opfn[a->size], accfn[a->size]);
-+}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ static long neon_full_reg_offset(unsigned reg)
-                     {0, 0, 0, 7}, /* VSUBHN: handled by decodetree */
+     return offsetof(CPUARMState, vfp.zregs[reg >> 1].d[reg & 1]);
-                     {0, 0, 0, 7}, /* VABDL */
+ }
-                     {0, 0, 0, 7}, /* VMLAL */
--                    {0, 0, 0, 9}, /* VQDMLAL */
++/*
-+                    {0, 0, 0, 7}, /* VQDMLAL */
++ * Return the offset of a 2**SIZE piece of a NEON register, at index ELE,
-                     {0, 0, 0, 7}, /* VMLSL */
++ * where 0 is the least significant end of the register.
--                    {0, 0, 0, 9}, /* VQDMLSL */
++ */
-+                    {0, 0, 0, 7}, /* VQDMLSL */
++static long neon_element_offset(int reg, int element, MemOp size)
-                     {0, 0, 0, 7}, /* Integer VMULL */
++{
--                    {0, 0, 0, 9}, /* VQDMULL */
++    int element_size = 1 << size;
-+                    {0, 0, 0, 7}, /* VQDMULL */
++    int ofs = element * element_size;
-                     {0, 0, 0, 0xa}, /* Polynomial VMULL */
++#ifdef HOST_WORDS_BIGENDIAN
-                     {0, 0, 0, 7}, /* Reserved: always UNDEF */
++    /*
-                 };
++     * Calculate the offset assuming fully little-endian,
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
++     * then XOR to account for the order of the 8-byte units.
-                     }
++     */
-                     return 0;
++    if (element_size < 8) {
-                 }
++        ofs ^= 8 - element_size;
 +    }
 +#endif
 +    return neon_full_reg_offset(reg) + ofs;
 +}
 +
  static inline long vfp_reg_offset(bool dp, unsigned reg)
  {
      if (dp) {
 diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.c.inc
 +++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static inline int neon_3same_fp_size(DisasContext *s, int x)
  #include "decode-neon-ls.c.inc"
  #include "decode-neon-shared.c.inc"
 -/* Return the offset of a 2**SIZE piece of a NEON register, at index ELE,
 - * where 0 is the least significant end of the register.
 - */
 -static inline long
 -neon_element_offset(int reg, int element, MemOp size)
 -{
 -    int element_size = 1 << size;
 -    int ofs = element * element_size;
 -#ifdef HOST_WORDS_BIGENDIAN
 -    /* Calculate the offset assuming fully little-endian,
 -     * then XOR to account for the order of the 8-byte units.
 -     */
 -    if (element_size < 8) {
 -        ofs ^= 8 - element_size;
 -    }
 -#endif
 -    return neon_full_reg_offset(reg) + ofs;
 -}
 -
--                /* Avoid overlapping operands.  Wide source operands are
+ static void neon_load_element(TCGv_i32 var, int reg, int ele, MemOp mop)
--                   always aligned so will never overlap with wide
+ {
--                   destinations in problematic ways.  */
+     long offset = neon_element_offset(reg, ele, mop & MO_SIZE);
 -                if (rd == rm) {
 -                    tmp = neon_load_reg(rm, 1);
 -                    neon_store_scratch(2, tmp);
 -                } else if (rd == rn) {
 -                    tmp = neon_load_reg(rn, 1);
 -                    neon_store_scratch(2, tmp);
 -                }
 -                tmp3 = NULL;
 -                for (pass = 0; pass < 2; pass++) {
 -                    if (pass == 1 && rd == rn) {
 -                        tmp = neon_load_scratch(2);
 -                    } else {
 -                        tmp = neon_load_reg(rn, pass);
 -                    }
 -                    if (pass == 1 && rd == rm) {
 -                        tmp2 = neon_load_scratch(2);
 -                    } else {
 -                        tmp2 = neon_load_reg(rm, pass);
 -                    }
 -                    switch (op) {
 -                    case 9: case 11: case 13:
 -                        /* VQDMLAL, VQDMLSL, VQDMULL */
 -                        gen_neon_mull(cpu_V0, tmp, tmp2, size, u);
 -                        break;
 -                    default: /* 15 is RESERVED: caught earlier  */
 -                        abort();
 -                    }
 -                    if (op == 13) {
 -                        /* VQDMULL */
 -                        gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
 -                        neon_store_reg64(cpu_V0, rd + pass);
 -                    } else {
 -                        /* Accumulate.  */
 -                        neon_load_reg64(cpu_V1, rd + pass);
 -                        switch (op) {
 -                        case 9: case 11: /* VQDMLAL, VQDMLSL */
 -                            gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
 -                            if (op == 11) {
 -                                gen_neon_negl(cpu_V0, size);
 -                            }
 -                            gen_neon_addl_saturate(cpu_V0, cpu_V1, size);
 -                            break;
 -                        default:
 -                            abort();
 -                        }
 -                        neon_store_reg64(cpu_V0, rd + pass);
 -                    }
 -                }
 +                abort(); /* all others handled by decodetree */
              } else {
                  /* Two registers and a scalar. NB that for ops of this form
                   * the ARM ARM labels bit 24 as Q, but it is in our variable
 --
 .20.1
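
The big-endian adjustment in neon_element_offset(), which patch 02 moves into translate.c, can be checked in isolation. The sketch below is a hypothetical stand-alone version with two assumptions. It returns only the byte offset within the register, whereas the real function adds neon_full_reg_offset(reg). It also takes host endianness as a parameter instead of testing HOST_WORDS_BIGENDIAN, so both layouts can be exercised on one machine.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical sketch of the arithmetic in neon_element_offset():
 * byte offset of element ELEMENT of size 2**SIZE_LOG2 bytes, counted
 * from the least significant end, relative to the register start.
 */
static int sketch_element_offset(int element, int size_log2,
                                 bool host_big_endian)
{
    assert(size_log2 >= 0 && size_log2 <= 3);
    int element_size = 1 << size_log2;   /* 1, 2, 4 or 8 bytes */
    int ofs = element * element_size;    /* fully little-endian layout */

    assert(element >= 0 && ofs + element_size <= 16);  /* within a Q reg */
    if (host_big_endian && element_size < 8) {
        /*
         * Each 8-byte unit is a host uint64_t, so on a big-endian host
         * the least significant element is at the unit's high address:
         * mirror the position within the unit.
         */
        ofs ^= 8 - element_size;
    }
    return ofs;
}
```

For example, 32-bit element 0 is at byte 0 on a little-endian host but at byte 4 on a big-endian one, and byte element 9 maps to offset 14 (byte 6 of the second unit). The XOR is enough because ofs is always a multiple of element_size. XORing with 8 - element_size therefore flips only the bits that select the element within its 8-byte unit, and leaves the unit index alone.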

-[PULL 10/23] target/arm: Convert Neon 2-reg-scalar integer multiplies to decodetree
+[PULL 03/26] target/arm: Use neon_element_offset in neon_load/store_reg
-Convert the VMLA, VMLS and VMUL insns in the Neon "2 registers and a
+From: Richard Henderson <richard.henderson@linaro.org>
-scalar" group to decodetree.  These are 32x32->32 operations where
-one of the inputs is the scalar, followed by a possible accumulate
-operation of the 32-bit result.
-The refactoring removes some of the oddities of the old decoder:
+These are the only users of neon_reg_offset, so remove that.
- * operands to the operation and accumulation were often
-   reversed (taking advantage of the fact that most of these ops
-   are commutative); the new code follows the pseudocode order
- * the Q bit in the insn was in a local variable 'u'; in the
-   new code it is decoded into a->q
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20201030022618.785675-4-richard.henderson@linaro.org
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/neon-dp.decode       |  15 ++++
+ target/arm/translate.c | 14 ++------------
- target/arm/translate-neon.inc.c | 133 ++++++++++++++++++++++++++++++++
+file changed, 2 insertions(+), 12 deletions(-)
  target/arm/translate.c          |  77 ++----------------
 files changed, 154 insertions(+), 71 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
-+++ b/target/arm/neon-dp.decode
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
-     VQDMULL_3d   1111 001 0 1 . .. .... .... 1101 . 0 . 0 .... @3diff
-     VMULL_P_3d   1111 001 0 1 . .. .... .... 1110 . 0 . 0 .... @3diff
-+
-+    ##################################################################
-+    # 2-regs-plus-scalar grouping:
-+    # 1111 001 Q 1 D sz!=11 Vn:4 Vd:4 opc:4 N 1 M 0 Vm:4
-+    ##################################################################
-+    &2scalar vm vn vd size q
-+
-+    @2scalar     .... ... q:1 . . size:2 .... .... .... . . . . .... \
-+                 &2scalar vm=%vm_dp vn=%vn_dp vd=%vd_dp
-+
-+    VMLA_2sc     1111 001 . 1 . .. .... .... 0000 . 1 . 0 .... @2scalar
-+
-+    VMLS_2sc     1111 001 . 1 . .. .... .... 0100 . 1 . 0 .... @2scalar
-+
-+    VMUL_2sc     1111 001 . 1 . .. .... .... 1000 . 1 . 0 .... @2scalar
-   ]
- }
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
-+++ b/target/arm/translate-neon.inc.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VMULL_P_3d(DisasContext *s, arg_3diff *a)
-, 16, 0, fn_gvec);
-     return true;
- }
-+
-+static void gen_neon_dup_low16(TCGv_i32 var)
-+{
-+    TCGv_i32 tmp = tcg_temp_new_i32();
-+    tcg_gen_ext16u_i32(var, var);
-+    tcg_gen_shli_i32(tmp, var, 16);
-+    tcg_gen_or_i32(var, var, tmp);
-+    tcg_temp_free_i32(tmp);
-+}
-+
-+static void gen_neon_dup_high16(TCGv_i32 var)
-+{
-+    TCGv_i32 tmp = tcg_temp_new_i32();
-+    tcg_gen_andi_i32(var, var, 0xffff0000);
-+    tcg_gen_shri_i32(tmp, var, 16);
-+    tcg_gen_or_i32(var, var, tmp);
-+    tcg_temp_free_i32(tmp);
-+}
-+
-+static inline TCGv_i32 neon_get_scalar(int size, int reg)
-+{
-+    TCGv_i32 tmp;
-+    if (size == 1) {
-+        tmp = neon_load_reg(reg & 7, reg >> 4);
-+        if (reg & 8) {
-+            gen_neon_dup_high16(tmp);
-+        } else {
-+            gen_neon_dup_low16(tmp);
-+        }
-+    } else {
-+        tmp = neon_load_reg(reg & 15, reg >> 4);
-+    }
-+    return tmp;
-+}
-+
-+static bool do_2scalar(DisasContext *s, arg_2scalar *a,
-+                       NeonGenTwoOpFn *opfn, NeonGenTwoOpFn *accfn)
-+{
-+    /*
-+     * Two registers and a scalar: perform an operation between
-+     * the input elements and the scalar, and then possibly
-+     * perform an accumulation operation of that result into the
-+     * destination.
-+     */
-+    TCGv_i32 scalar;
-+    int pass;
-+
-+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
-+        return false;
-+    }
-+
-+    /* UNDEF accesses to D16-D31 if they don't exist. */
-+    if (!dc_isar_feature(aa32_simd_r32, s) &&
-+        ((a->vd | a->vn | a->vm) & 0x10)) {
-+        return false;
-+    }
-+
-+    if (!opfn) {
-+        /* Bad size (including size == 3, which is a different insn group) */
-+        return false;
-+    }
-+
-+    if (a->q && ((a->vd | a->vn) & 1)) {
-+        return false;
-+    }
-+
-+    if (!vfp_access_check(s)) {
-+        return true;
-+    }
-+
-+    scalar = neon_get_scalar(a->size, a->vm);
-+
-+    for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
-+        TCGv_i32 tmp = neon_load_reg(a->vn, pass);
-+        opfn(tmp, tmp, scalar);
-+        if (accfn) {
-+            TCGv_i32 rd = neon_load_reg(a->vd, pass);
-+            accfn(tmp, rd, tmp);
-+            tcg_temp_free_i32(rd);
-+        }
-+        neon_store_reg(a->vd, pass, tmp);
-+    }
-+    tcg_temp_free_i32(scalar);
-+    return true;
-+}
-+
-+static bool trans_VMUL_2sc(DisasContext *s, arg_2scalar *a)
-+{
-+    static NeonGenTwoOpFn * const opfn[] = {
-+        NULL,
-+        gen_helper_neon_mul_u16,
-+        tcg_gen_mul_i32,
-+        NULL,
-+    };
-+
-+    return do_2scalar(s, a, opfn[a->size], NULL);
-+}
-+
-+static bool trans_VMLA_2sc(DisasContext *s, arg_2scalar *a)
-+{
-+    static NeonGenTwoOpFn * const opfn[] = {
-+        NULL,
-+        gen_helper_neon_mul_u16,
-+        tcg_gen_mul_i32,
-+        NULL,
-+    };
-+    static NeonGenTwoOpFn * const accfn[] = {
-+        NULL,
-+        gen_helper_neon_add_u16,
-+        tcg_gen_add_i32,
-+        NULL,
-+    };
-+
-+    return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
-+}
-+
-+static bool trans_VMLS_2sc(DisasContext *s, arg_2scalar *a)
-+{
-+    static NeonGenTwoOpFn * const opfn[] = {
-+        NULL,
-+        gen_helper_neon_mul_u16,
-+        tcg_gen_mul_i32,
-+        NULL,
-+    };
-+    static NeonGenTwoOpFn * const accfn[] = {
-+        NULL,
-+        gen_helper_neon_sub_u16,
-+        tcg_gen_sub_i32,
-+        NULL,
-+    };
-+
-+    return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
-+}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static int disas_dsp_insn(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ static inline long vfp_reg_offset(bool dp, unsigned reg)
- #define VFP_DREG_N(reg, insn) VFP_DREG(reg, insn, 16,  7)
+     }
- #define VFP_DREG_M(reg, insn) VFP_DREG(reg, insn,  0,  5)
+ }
--static void gen_neon_dup_low16(TCGv_i32 var)
+-/* Return the offset of a 32-bit piece of a NEON register.
 -   zero is the least significant end of the register.  */
 -static inline long
 -neon_reg_offset (int reg, int n)
 -{
--    TCGv_i32 tmp = tcg_temp_new_i32();
+-    int sreg;
--    tcg_gen_ext16u_i32(var, var);
+-    sreg = reg * 2 + n;
--    tcg_gen_shli_i32(tmp, var, 16);
+-    return vfp_reg_offset(0, sreg);
 -    tcg_gen_or_i32(var, var, tmp);
 -    tcg_temp_free_i32(tmp);
 -}
 -
--static void gen_neon_dup_high16(TCGv_i32 var)
+ static TCGv_i32 neon_load_reg(int reg, int pass)
 -{
 -    TCGv_i32 tmp = tcg_temp_new_i32();
 -    tcg_gen_andi_i32(var, var, 0xffff0000);
 -    tcg_gen_shri_i32(tmp, var, 16);
 -    tcg_gen_or_i32(var, var, tmp);
 -    tcg_temp_free_i32(tmp);
 -}
 -
  static inline bool use_goto_tb(DisasContext *s, target_ulong dest)
  {
  #ifndef CONFIG_USER_ONLY
@@ -XXX,XX +XXX,XX @@ static void gen_exception_return(DisasContext *s, TCGv_i32 pc)
  #define CPU_V001 cpu_V0, cpu_V0, cpu_V1
 -static inline void gen_neon_add(int size, TCGv_i32 t0, TCGv_i32 t1)
 -{
 -    switch (size) {
 -    case 0: gen_helper_neon_add_u8(t0, t0, t1); break;
 -    case 1: gen_helper_neon_add_u16(t0, t0, t1); break;
 -    case 2: tcg_gen_add_i32(t0, t0, t1); break;
 -    default: abort();
 -    }
 -}
 -
 -static inline void gen_neon_rsb(int size, TCGv_i32 t0, TCGv_i32 t1)
 -{
 -    switch (size) {
 -    case 0: gen_helper_neon_sub_u8(t0, t1, t0); break;
 -    case 1: gen_helper_neon_sub_u16(t0, t1, t0); break;
 -    case 2: tcg_gen_sub_i32(t0, t1, t0); break;
 -    default: return;
 -    }
 -}
 -
  static TCGv_i32 neon_load_scratch(int scratch)
  {
      TCGv_i32 tmp = tcg_temp_new_i32();
-@@ -XXX,XX +XXX,XX @@ static void neon_store_scratch(int scratch, TCGv_i32 var)
+-    tcg_gen_ld_i32(tmp, cpu_env, neon_reg_offset(reg, pass));
 +    tcg_gen_ld_i32(tmp, cpu_env, neon_element_offset(reg, pass, MO_32));
      return tmp;
  }
  static void neon_store_reg(int reg, int pass, TCGv_i32 var)
  {
 -    tcg_gen_st_i32(var, cpu_env, neon_reg_offset(reg, pass));
 +    tcg_gen_st_i32(var, cpu_env, neon_element_offset(reg, pass, MO_32));
      tcg_temp_free_i32(var);
  }
--static inline TCGv_i32 neon_get_scalar(int size, int reg)
--{
--    TCGv_i32 tmp;
--    if (size == 1) {
--        tmp = neon_load_reg(reg & 7, reg >> 4);
--        if (reg & 8) {
--            gen_neon_dup_high16(tmp);
--        } else {
--            gen_neon_dup_low16(tmp);
--        }
--    } else {
--        tmp = neon_load_reg(reg & 15, reg >> 4);
--    }
--    return tmp;
--}
--
- static int gen_neon_unzip(int rd, int rm, int size, int q)
- {
-     TCGv_ptr pd, pm;
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
-                     return 1;
-                 }
-                 switch (op) {
-+                case 0: /* Integer VMLA scalar */
-+                case 4: /* Integer VMLS scalar */
-+                case 8: /* Integer VMUL scalar */
-+                    return 1; /* handled by decodetree */
-+
-                 case 1: /* Float VMLA scalar */
-                 case 5: /* Floating point VMLS scalar */
-                 case 9: /* Floating point VMUL scalar */
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
-                         return 1;
-                     }
-                     /* fall through */
--                case 0: /* Integer VMLA scalar */
--                case 4: /* Integer VMLS scalar */
--                case 8: /* Integer VMUL scalar */
-                 case 12: /* VQDMULH scalar */
-                 case 13: /* VQRDMULH scalar */
-                     if (u && ((rd | rn) & 1)) {
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
-                             } else {
-                                 gen_helper_neon_qrdmulh_s32(tmp, cpu_env, tmp, tmp2);
-                             }
--                        } else if (op & 1) {
-+                        } else {
-                             TCGv_ptr fpstatus = get_fpstatus_ptr(1);
-                             gen_helper_vfp_muls(tmp, tmp, tmp2, fpstatus);
-                             tcg_temp_free_ptr(fpstatus);
--                        } else {
--                            switch (size) {
--                            case 0: gen_helper_neon_mul_u8(tmp, tmp, tmp2); break;
--                            case 1: gen_helper_neon_mul_u16(tmp, tmp, tmp2); break;
--                            case 2: tcg_gen_mul_i32(tmp, tmp, tmp2); break;
--                            default: abort();
--                            }
-                         }
-                         tcg_temp_free_i32(tmp2);
-                         if (op < 8) {
-                             /* Accumulate.  */
-                             tmp2 = neon_load_reg(rd, pass);
-                             switch (op) {
--                            case 0:
--                                gen_neon_add(size, tmp, tmp2);
--                                break;
-                             case 1:
-                             {
-                                 TCGv_ptr fpstatus = get_fpstatus_ptr(1);
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
-                                 tcg_temp_free_ptr(fpstatus);
-                                 break;
-                             }
--                            case 4:
--                                gen_neon_rsb(size, tmp, tmp2);
--                                break;
-                             case 5:
-                             {
-                                 TCGv_ptr fpstatus = get_fpstatus_ptr(1);
 --
 .20.1
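In neon_get_scalar(), which the old patch 10 moves into translate-neon.inc.c, the 5-bit vm field does double duty.

For 16-bit scalars:
- bits [2:0] select D0-D7;
- bit 4 selects the 32-bit word;
- bit 3 selects the halfword.

gen_neon_dup_low16()/gen_neon_dup_high16() then copy that halfword into both halves, so a lanewise 16-bit multiply uses the same scalar in each lane.

For 32-bit scalars, bits [3:0] select D0-D15 and bit 4 selects the word.

The plain-C sketch below models that decoding. It assumes a hypothetical array d[] that holds D0-D31 as 64-bit values, so it does not depend on host byte order.

```c
#include <stdint.h>

/* Like gen_neon_dup_low16(): replicate bits [15:0] into both halves. */
static uint32_t dup_low16(uint32_t x)
{
    x &= 0xffff;
    return x | (x << 16);
}

/* Like gen_neon_dup_high16(): replicate bits [31:16] into both halves. */
static uint32_t dup_high16(uint32_t x)
{
    x &= 0xffff0000u;
    return x | (x >> 16);
}

/* Hypothetical model: 32-bit word 'pass' of D register 'reg' (0 = low word). */
static uint32_t load_word(const uint64_t *d, int reg, int pass)
{
    return (uint32_t)(d[reg] >> (32 * pass));
}

/* Mirrors the decode in neon_get_scalar(): size 1 = 16-bit, size 2 = 32-bit. */
static uint32_t get_scalar(const uint64_t *d, int size, int vm)
{
    if (size == 1) {
        uint32_t w = load_word(d, vm & 7, vm >> 4);
        return (vm & 8) ? dup_high16(w) : dup_low16(w);
    }
    return load_word(d, vm & 15, vm >> 4);
}
```

For example, with D3 = 0x1111222233334444, a 16-bit scalar with vm = 11 (D3, word 0, high half) becomes 0x33333333.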

-[PULL 14/23] target/arm: Convert Neon 2-reg-scalar long multiplies to decodetree
+[PULL 04/26] target/arm: Use neon_element_offset in vfp_reg_offset
-Convert the Neon 2-reg-scalar long multiplies to decodetree.
+From: Richard Henderson <richard.henderson@linaro.org>
 These are the last instructions in the group.
+This seems a bit more readable than using offsetof CPU_DoubleU.
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20201030022618.785675-5-richard.henderson@linaro.org
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/neon-dp.decode       |  18 ++++
+ target/arm/translate.c | 13 ++++---------
- target/arm/translate-neon.inc.c | 163 ++++++++++++++++++++++++++++
+file changed, 4 insertions(+), 9 deletions(-)
  target/arm/translate.c          | 182 ++------------------------------
 files changed, 187 insertions(+), 176 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
-+++ b/target/arm/neon-dp.decode
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
-     @2scalar     .... ... q:1 . . size:2 .... .... .... . . . . .... \
-                  &2scalar vm=%vm_dp vn=%vn_dp vd=%vd_dp
-+    # For the 'long' ops the Q bit is part of insn decode
-+    @2scalar_q0  .... ... . . . size:2 .... .... .... . . . . .... \
-+                 &2scalar vm=%vm_dp vn=%vn_dp vd=%vd_dp q=0
-     VMLA_2sc     1111 001 . 1 . .. .... .... 0000 . 1 . 0 .... @2scalar
-     VMLA_F_2sc   1111 001 . 1 . .. .... .... 0001 . 1 . 0 .... @2scalar
-+    VMLAL_S_2sc  1111 001 0 1 . .. .... .... 0010 . 1 . 0 .... @2scalar_q0
-+    VMLAL_U_2sc  1111 001 1 1 . .. .... .... 0010 . 1 . 0 .... @2scalar_q0
-+
-+    VQDMLAL_2sc  1111 001 0 1 . .. .... .... 0011 . 1 . 0 .... @2scalar_q0
-+
-     VMLS_2sc     1111 001 . 1 . .. .... .... 0100 . 1 . 0 .... @2scalar
-     VMLS_F_2sc   1111 001 . 1 . .. .... .... 0101 . 1 . 0 .... @2scalar
-+    VMLSL_S_2sc  1111 001 0 1 . .. .... .... 0110 . 1 . 0 .... @2scalar_q0
-+    VMLSL_U_2sc  1111 001 1 1 . .. .... .... 0110 . 1 . 0 .... @2scalar_q0
-+
-+    VQDMLSL_2sc  1111 001 0 1 . .. .... .... 0111 . 1 . 0 .... @2scalar_q0
-+
-     VMUL_2sc     1111 001 . 1 . .. .... .... 1000 . 1 . 0 .... @2scalar
-     VMUL_F_2sc   1111 001 . 1 . .. .... .... 1001 . 1 . 0 .... @2scalar
-+    VMULL_S_2sc  1111 001 0 1 . .. .... .... 1010 . 1 . 0 .... @2scalar_q0
-+    VMULL_U_2sc  1111 001 1 1 . .. .... .... 1010 . 1 . 0 .... @2scalar_q0
-+
-+    VQDMULL_2sc  1111 001 0 1 . .. .... .... 1011 . 1 . 0 .... @2scalar_q0
-+
-     VQDMULH_2sc  1111 001 . 1 . .. .... .... 1100 . 1 . 0 .... @2scalar
-     VQRDMULH_2sc 1111 001 . 1 . .. .... .... 1101 . 1 . 0 .... @2scalar
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
-+++ b/target/arm/translate-neon.inc.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VQRDMLSH_2sc(DisasContext *s, arg_2scalar *a)
-     };
-     return do_vqrdmlah_2sc(s, a, opfn[a->size]);
- }
-+
-+static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
-+                            NeonGenTwoOpWidenFn *opfn,
-+                            NeonGenTwo64OpFn *accfn)
-+{
-+    /*
-+     * Two registers and a scalar, long operations: perform an
-+     * operation on the input elements and the scalar which produces
-+     * a double-width result, and then possibly perform an accumulation
-+     * operation of that result into the destination.
-+     */
-+    TCGv_i32 scalar, rn;
-+    TCGv_i64 rn0_64, rn1_64;
-+
-+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
-+        return false;
-+    }
-+
-+    /* UNDEF accesses to D16-D31 if they don't exist. */
-+    if (!dc_isar_feature(aa32_simd_r32, s) &&
-+        ((a->vd | a->vn | a->vm) & 0x10)) {
-+        return false;
-+    }
-+
-+    if (!opfn) {
-+        /* Bad size (including size == 3, which is a different insn group) */
-+        return false;
-+    }
-+
-+    if (a->vd & 1) {
-+        return false;
-+    }
-+
-+    if (!vfp_access_check(s)) {
-+        return true;
-+    }
-+
-+    scalar = neon_get_scalar(a->size, a->vm);
-+
-+    /* Load all inputs before writing any outputs, in case of overlap */
-+    rn = neon_load_reg(a->vn, 0);
-+    rn0_64 = tcg_temp_new_i64();
-+    opfn(rn0_64, rn, scalar);
-+    tcg_temp_free_i32(rn);
-+
-+    rn = neon_load_reg(a->vn, 1);
-+    rn1_64 = tcg_temp_new_i64();
-+    opfn(rn1_64, rn, scalar);
-+    tcg_temp_free_i32(rn);
-+    tcg_temp_free_i32(scalar);
-+
-+    if (accfn) {
-+        TCGv_i64 t64 = tcg_temp_new_i64();
-+        neon_load_reg64(t64, a->vd);
-+        accfn(t64, t64, rn0_64);
-+        neon_store_reg64(t64, a->vd);
-+        neon_load_reg64(t64, a->vd + 1);
-+        accfn(t64, t64, rn1_64);
-+        neon_store_reg64(t64, a->vd + 1);
-+        tcg_temp_free_i64(t64);
-+    } else {
-+        neon_store_reg64(rn0_64, a->vd);
-+        neon_store_reg64(rn1_64, a->vd + 1);
-+    }
-+    tcg_temp_free_i64(rn0_64);
-+    tcg_temp_free_i64(rn1_64);
-+    return true;
-+}
-+
-+static bool trans_VMULL_S_2sc(DisasContext *s, arg_2scalar *a)
-+{
-+    static NeonGenTwoOpWidenFn * const opfn[] = {
-+        NULL,
-+        gen_helper_neon_mull_s16,
-+        gen_mull_s32,
-+        NULL,
-+    };
-+
-+    return do_2scalar_long(s, a, opfn[a->size], NULL);
-+}
-+
-+static bool trans_VMULL_U_2sc(DisasContext *s, arg_2scalar *a)
-+{
-+    static NeonGenTwoOpWidenFn * const opfn[] = {
-+        NULL,
-+        gen_helper_neon_mull_u16,
-+        gen_mull_u32,
-+        NULL,
-+    };
-+
-+    return do_2scalar_long(s, a, opfn[a->size], NULL);
-+}
-+
-+#define DO_VMLAL_2SC(INSN, MULL, ACC)                                   \
-+    static bool trans_##INSN##_2sc(DisasContext *s, arg_2scalar *a)     \
-+    {                                                                   \
-+        static NeonGenTwoOpWidenFn * const opfn[] = {                   \
-+            NULL,                                                       \
-+            gen_helper_neon_##MULL##16,                                 \
-+            gen_##MULL##32,                                             \
-+            NULL,                                                       \
-+        };                                                              \
-+        static NeonGenTwo64OpFn * const accfn[] = {                     \
-+            NULL,                                                       \
-+            gen_helper_neon_##ACC##l_u32,                               \
-+            tcg_gen_##ACC##_i64,                                        \
-+            NULL,                                                       \
-+        };                                                              \
-+        return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]);    \
-+    }
-+
-+DO_VMLAL_2SC(VMLAL_S, mull_s, add)
-+DO_VMLAL_2SC(VMLAL_U, mull_u, add)
-+DO_VMLAL_2SC(VMLSL_S, mull_s, sub)
-+DO_VMLAL_2SC(VMLSL_U, mull_u, sub)
-+
-+static bool trans_VQDMULL_2sc(DisasContext *s, arg_2scalar *a)
-+{
-+    static NeonGenTwoOpWidenFn * const opfn[] = {
-+        NULL,
-+        gen_VQDMULL_16,
-+        gen_VQDMULL_32,
-+        NULL,
-+    };
-+
-+    return do_2scalar_long(s, a, opfn[a->size], NULL);
-+}
-+
-+static bool trans_VQDMLAL_2sc(DisasContext *s, arg_2scalar *a)
-+{
-+    static NeonGenTwoOpWidenFn * const opfn[] = {
-+        NULL,
-+        gen_VQDMULL_16,
-+        gen_VQDMULL_32,
-+        NULL,
-+    };
-+    static NeonGenTwo64OpFn * const accfn[] = {
-+        NULL,
-+        gen_VQDMLAL_acc_16,
-+        gen_VQDMLAL_acc_32,
-+        NULL,
-+    };
-+
-+    return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]);
-+}
-+
-+static bool trans_VQDMLSL_2sc(DisasContext *s, arg_2scalar *a)
-+{
-+    static NeonGenTwoOpWidenFn * const opfn[] = {
-+        NULL,
-+        gen_VQDMULL_16,
-+        gen_VQDMULL_32,
-+        NULL,
-+    };
-+    static NeonGenTwo64OpFn * const accfn[] = {
-+        NULL,
-+        gen_VQDMLSL_acc_16,
-+        gen_VQDMLSL_acc_32,
-+        NULL,
-+    };
-+
-+    return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]);
-+}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static void gen_revsh(TCGv_i32 dest, TCGv_i32 var)
+@@ -XXX,XX +XXX,XX @@ static long neon_element_offset(int reg, int element, MemOp size)
-     tcg_gen_ext16s_i32(dest, var);
+     return neon_full_reg_offset(reg) + ofs;
  }
--/* 32x32->64 multiply.  Marks inputs as dead.  */
+-static inline long vfp_reg_offset(bool dp, unsigned reg)
--static TCGv_i64 gen_mulu_i64_i32(TCGv_i32 a, TCGv_i32 b)
++/* Return the offset of a VFP Dreg (dp = true) or VFP Sreg (dp = false). */
--{
++static long vfp_reg_offset(bool dp, unsigned reg)
 -    TCGv_i32 lo = tcg_temp_new_i32();
 -    TCGv_i32 hi = tcg_temp_new_i32();
 -    TCGv_i64 ret;
 -
 -    tcg_gen_mulu2_i32(lo, hi, a, b);
 -    tcg_temp_free_i32(a);
 -    tcg_temp_free_i32(b);
 -
 -    ret = tcg_temp_new_i64();
 -    tcg_gen_concat_i32_i64(ret, lo, hi);
 -    tcg_temp_free_i32(lo);
 -    tcg_temp_free_i32(hi);
 -
 -    return ret;
 -}
 -
 -static TCGv_i64 gen_muls_i64_i32(TCGv_i32 a, TCGv_i32 b)
 -{
 -    TCGv_i32 lo = tcg_temp_new_i32();
 -    TCGv_i32 hi = tcg_temp_new_i32();
 -    TCGv_i64 ret;
 -
 -    tcg_gen_muls2_i32(lo, hi, a, b);
 -    tcg_temp_free_i32(a);
 -    tcg_temp_free_i32(b);
 -
 -    ret = tcg_temp_new_i64();
 -    tcg_gen_concat_i32_i64(ret, lo, hi);
 -    tcg_temp_free_i32(lo);
 -    tcg_temp_free_i32(hi);
 -
 -    return ret;
 -}
 -
  /* Swap low and high halfwords.  */
  static void gen_swap_half(TCGv_i32 var)
  {
-@@ -XXX,XX +XXX,XX @@ static inline void gen_neon_addl(int size)
+     if (dp) {
 -        return offsetof(CPUARMState, vfp.zregs[reg >> 1].d[reg & 1]);
 +        return neon_element_offset(reg, 0, MO_64);
      } else {
 -        long ofs = offsetof(CPUARMState, vfp.zregs[reg >> 2].d[(reg >> 1) & 1]);
 -        if (reg & 1) {
 -            ofs += offsetof(CPU_DoubleU, l.upper);
 -        } else {
 -            ofs += offsetof(CPU_DoubleU, l.lower);
 -        }
 -        return ofs;
 +        return neon_element_offset(reg >> 1, reg & 1, MO_32);
      }
  }
--static inline void gen_neon_negl(TCGv_i64 var, int size)
--{
--    switch (size) {
--    case 0: gen_helper_neon_negl_u16(var, var); break;
--    case 1: gen_helper_neon_negl_u32(var, var); break;
--    case 2:
--        tcg_gen_neg_i64(var, var);
--        break;
--    default: abort();
--    }
--}
--
--static inline void gen_neon_addl_saturate(TCGv_i64 op0, TCGv_i64 op1, int size)
--{
--    switch (size) {
--    case 1: gen_helper_neon_addl_saturate_s32(op0, cpu_env, op0, op1); break;
--    case 2: gen_helper_neon_addl_saturate_s64(op0, cpu_env, op0, op1); break;
--    default: abort();
--    }
--}
--
--static inline void gen_neon_mull(TCGv_i64 dest, TCGv_i32 a, TCGv_i32 b,
--                                 int size, int u)
--{
--    TCGv_i64 tmp;
--
--    switch ((size << 1) | u) {
--    case 0: gen_helper_neon_mull_s8(dest, a, b); break;
--    case 1: gen_helper_neon_mull_u8(dest, a, b); break;
--    case 2: gen_helper_neon_mull_s16(dest, a, b); break;
--    case 3: gen_helper_neon_mull_u16(dest, a, b); break;
--    case 4:
--        tmp = gen_muls_i64_i32(a, b);
--        tcg_gen_mov_i64(dest, tmp);
--        tcg_temp_free_i64(tmp);
--        break;
--    case 5:
--        tmp = gen_mulu_i64_i32(a, b);
--        tcg_gen_mov_i64(dest, tmp);
--        tcg_temp_free_i64(tmp);
--        break;
--    default: abort();
--    }
--
--    /* gen_helper_neon_mull_[su]{8|16} do not free their parameters.
--       Don't forget to clean them now.  */
--    if (size < 2) {
--        tcg_temp_free_i32(a);
--        tcg_temp_free_i32(b);
--    }
--}
--
- static void gen_neon_narrow_op(int op, int u, int size,
-                                TCGv_i32 dest, TCGv_i64 src)
- {
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
-     int u;
-     int vec_size;
-     uint32_t imm;
--    TCGv_i32 tmp, tmp2, tmp3, tmp4, tmp5;
-+    TCGv_i32 tmp, tmp2, tmp3, tmp5;
-     TCGv_ptr ptr1;
-     TCGv_i64 tmp64;
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
-         return 1;
-     } else { /* (insn & 0x00800010 == 0x00800000) */
-         if (size != 3) {
--            op = (insn >> 8) & 0xf;
--            if ((insn & (1 << 6)) == 0) {
--                /* Three registers of different lengths: handled by decodetree */
--                return 1;
--            } else {
--                /* Two registers and a scalar. NB that for ops of this form
--                 * the ARM ARM labels bit 24 as Q, but it is in our variable
--                 * 'u', not 'q'.
--                 */
--                if (size == 0) {
--                    return 1;
--                }
--                switch (op) {
--                case 0: /* Integer VMLA scalar */
--                case 4: /* Integer VMLS scalar */
--                case 8: /* Integer VMUL scalar */
--                case 1: /* Float VMLA scalar */
--                case 5: /* Floating point VMLS scalar */
--                case 9: /* Floating point VMUL scalar */
--                case 12: /* VQDMULH scalar */
--                case 13: /* VQRDMULH scalar */
--                case 14: /* VQRDMLAH scalar */
--                case 15: /* VQRDMLSH scalar */
--                    return 1; /* handled by decodetree */
--
--                case 3: /* VQDMLAL scalar */
--                case 7: /* VQDMLSL scalar */
--                case 11: /* VQDMULL scalar */
--                    if (u == 1) {
--                        return 1;
--                    }
--                    /* fall through */
--                case 2: /* VMLAL sclar */
--                case 6: /* VMLSL scalar */
--                case 10: /* VMULL scalar */
--                    if (rd & 1) {
--                        return 1;
--                    }
--                    tmp2 = neon_get_scalar(size, rm);
--                    /* We need a copy of tmp2 because gen_neon_mull
--                     * deletes it during pass 0.  */
--                    tmp4 = tcg_temp_new_i32();
--                    tcg_gen_mov_i32(tmp4, tmp2);
--                    tmp3 = neon_load_reg(rn, 1);
--
--                    for (pass = 0; pass < 2; pass++) {
--                        if (pass == 0) {
--                            tmp = neon_load_reg(rn, 0);
--                        } else {
--                            tmp = tmp3;
--                            tmp2 = tmp4;
--                        }
--                        gen_neon_mull(cpu_V0, tmp, tmp2, size, u);
--                        if (op != 11) {
--                            neon_load_reg64(cpu_V1, rd + pass);
--                        }
--                        switch (op) {
--                        case 6:
--                            gen_neon_negl(cpu_V0, size);
--                            /* Fall through */
--                        case 2:
--                            gen_neon_addl(size);
--                            break;
--                        case 3: case 7:
--                            gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
--                            if (op == 7) {
--                                gen_neon_negl(cpu_V0, size);
--                            }
--                            gen_neon_addl_saturate(cpu_V0, cpu_V1, size);
--                            break;
--                        case 10:
--                            /* no-op */
--                            break;
--                        case 11:
--                            gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
--                            break;
--                        default:
--                            abort();
--                        }
--                        neon_store_reg64(cpu_V0, rd + pass);
--                    }
--                    break;
--                default:
--                    g_assert_not_reached();
--                }
--            }
-+            /*
-+             * Three registers of different lengths, or two registers and
-+             * a scalar: handled by decodetree
-+             */
-+            return 1;
-         } else { /* size == 3 */
-             if (!u) {
-                 /* Extract.  */
 --
 .20.1
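Patch 04 above expresses vfp_reg_offset() through neon_element_offset(). A D register is element 0 at MO_64, and S register n is 32-bit element (n & 1) of D register (n >> 1).

The element_offset() below is modelled on neon_element_offset(), whose body is mostly outside this excerpt. On big-endian hosts it reverses the order of sub-8-byte elements within each 64-bit unit, which is what the CPU_DoubleU l.upper/l.lower offsets used to do for S registers.

This is a sketch under two assumptions:
- D register n occupies bytes [8n, 8n + 8) in a hypothetical flat layout. The real offsets come from CPUARMState's vfp.zregs.
- A runtime flag replaces HOST_WORDS_BIGENDIAN.

```c
#include <stdbool.h>

enum { SZ_8 = 0, SZ_16 = 1, SZ_32 = 2, SZ_64 = 3 };  /* log2(bytes), like MO_8..MO_64 */

/* Hypothetical flat layout standing in for neon_full_reg_offset(). */
static long full_reg_offset(int reg)
{
    return reg * 8L;
}

/* Modelled on neon_element_offset(); big_endian_host replaces the #ifdef. */
static long element_offset(int reg, int element, int size, bool big_endian_host)
{
    int element_size = 1 << size;
    int ofs = element * element_size;

    if (big_endian_host && element_size < 8) {
        /* Reverse element order within each 8-byte unit. */
        ofs ^= 8 - element_size;
    }
    return full_reg_offset(reg) + ofs;
}

/* As in patch 04: Dreg = one 64-bit element; Sreg n = word (n & 1) of D(n >> 1). */
static long vfp_offset(bool dp, unsigned reg, bool big_endian_host)
{
    if (dp) {
        return element_offset(reg, 0, SZ_64, big_endian_host);
    }
    return element_offset(reg >> 1, reg & 1, SZ_32, big_endian_host);
}
```

vfp_offset(false, 2 * r + n, ...) decodes back to element n of D register r. The old neon_reg_offset(r, n) and the new neon_element_offset(r, n, MO_32) therefore agree by construction, which is why patch 03 can drop neon_reg_offset.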

-[PULL 15/23] target/arm: Convert Neon VEXT to decodetree
+[PULL 05/26] target/arm: Add read/write_neon_element32
-Convert the Neon VEXT insn to decodetree. Rather than keeping the
+From: Richard Henderson <richard.henderson@linaro.org>
 old implementation which used fixed temporaries cpu_V0 and cpu_V1
 and did the extraction with by-hand shift and logic ops, we use
 the TCG extract2 insn.
-We don't need to special case 0 or 8 immediates any more as the
+Model these off the aa64 read/write_vec_element functions.
-optimizer is smart enough to throw away the dead code.
+Use it within translate-neon.c.inc.  The new functions do
 not allocate or free temps, so this rearranges the calling
 code a bit.
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20201030022618.785675-6-richard.henderson@linaro.org
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/neon-dp.decode       |  8 +++-
+ target/arm/translate.c          |  26 ++++
- target/arm/translate-neon.inc.c | 76 +++++++++++++++++++++++++++++++++
+ target/arm/translate-neon.c.inc | 256 ++++++++++++++++++++------------
- target/arm/translate.c          | 58 +------------------------
+files changed, 183 insertions(+), 99 deletions(-)
 files changed, 85 insertions(+), 57 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
-+++ b/target/arm/neon-dp.decode
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
- # return false for size==3.
- ######################################################################
- {
--  # 0b11 subgroup will go here
-+  [
-+    ##################################################################
-+    # Miscellaneous size=0b11 insns
-+    ##################################################################
-+    VEXT         1111 001 0 1 . 11 .... .... imm:4 . q:1 . 0 .... \
-+                 vm=%vm_dp vn=%vn_dp vd=%vd_dp
-+  ]
-   # Subgroup for size != 0b11
-   [
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
-+++ b/target/arm/translate-neon.inc.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VQDMLSL_2sc(DisasContext *s, arg_2scalar *a)
-     return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]);
- }
-+
-+static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
-+{
-+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
-+        return false;
-+    }
-+
-+    /* UNDEF accesses to D16-D31 if they don't exist. */
-+    if (!dc_isar_feature(aa32_simd_r32, s) &&
-+        ((a->vd | a->vn | a->vm) & 0x10)) {
-+        return false;
-+    }
-+
-+    if ((a->vn | a->vm | a->vd) & a->q) {
-+        return false;
-+    }
-+
-+    if (a->imm > 7 && !a->q) {
-+        return false;
-+    }
-+
-+    if (!vfp_access_check(s)) {
-+        return true;
-+    }
-+
-+    if (!a->q) {
-+        /* Extract 64 bits from <Vm:Vn> */
-+        TCGv_i64 left, right, dest;
-+
-+        left = tcg_temp_new_i64();
-+        right = tcg_temp_new_i64();
-+        dest = tcg_temp_new_i64();
-+
-+        neon_load_reg64(right, a->vn);
-+        neon_load_reg64(left, a->vm);
-+        tcg_gen_extract2_i64(dest, right, left, a->imm * 8);
-+        neon_store_reg64(dest, a->vd);
-+
-+        tcg_temp_free_i64(left);
-+        tcg_temp_free_i64(right);
-+        tcg_temp_free_i64(dest);
-+    } else {
-+        /* Extract 128 bits from <Vm+1:Vm:Vn+1:Vn> */
-+        TCGv_i64 left, middle, right, destleft, destright;
-+
-+        left = tcg_temp_new_i64();
-+        middle = tcg_temp_new_i64();
-+        right = tcg_temp_new_i64();
-+        destleft = tcg_temp_new_i64();
-+        destright = tcg_temp_new_i64();
-+
-+        if (a->imm < 8) {
-+            neon_load_reg64(right, a->vn);
-+            neon_load_reg64(middle, a->vn + 1);
-+            tcg_gen_extract2_i64(destright, right, middle, a->imm * 8);
-+            neon_load_reg64(left, a->vm);
-+            tcg_gen_extract2_i64(destleft, middle, left, a->imm * 8);
-+        } else {
-+            neon_load_reg64(right, a->vn + 1);
-+            neon_load_reg64(middle, a->vm);
-+            tcg_gen_extract2_i64(destright, right, middle, (a->imm - 8) * 8);
-+            neon_load_reg64(left, a->vm + 1);
-+            tcg_gen_extract2_i64(destleft, middle, left, (a->imm - 8) * 8);
-+        }
-+
-+        neon_store_reg64(destright, a->vd);
-+        neon_store_reg64(destleft, a->vd + 1);
-+
-+        tcg_temp_free_i64(destright);
-+        tcg_temp_free_i64(destleft);
-+        tcg_temp_free_i64(right);
-+        tcg_temp_free_i64(middle);
-+        tcg_temp_free_i64(left);
-+    }
-+    return true;
-+}
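The following plain-C model mirrors what the trans_VEXT code above generates, which can help when reviewing it. It is a sketch, not QEMU code. The names extract2_u64(), vext_d() and vext_q() are hypothetical. Registers are modelled as uint64_t doublewords with element 0 least significant. The model assumes that tcg_gen_extract2_i64(dest, al, ah, ofs) yields the 64 bits of ah:al starting at bit ofs. The UNDEF checks (for example, imm > 7 with Q clear) are expressed as assertions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of extract2: the 64 bits of the 128-bit value hi:lo
 * that start at bit OFS (0 <= OFS <= 64). */
static uint64_t extract2_u64(uint64_t lo, uint64_t hi, unsigned ofs)
{
    assert(ofs <= 64);
    if (ofs == 0) {
        return lo;
    }
    if (ofs == 64) {
        return hi;
    }
    return (lo >> ofs) | (hi << (64 - ofs));
}

/* D-register VEXT: bytes IMM..IMM+7 of the concatenation Vm:Vn. */
static uint64_t vext_d(uint64_t vn, uint64_t vm, unsigned imm)
{
    assert(imm <= 7);            /* imm > 7 with Q == 0 is UNDEF */
    return extract2_u64(vn, vm, imm * 8);
}

/* Q-register VEXT: bytes IMM..IMM+15 of Vm+1:Vm:Vn+1:Vn.
 * vn[0]/vm[0] are the low doublewords (Dn, Dm), vn[1]/vm[1] the high ones. */
static void vext_q(const uint64_t vn[2], const uint64_t vm[2],
                   unsigned imm, uint64_t vd[2])
{
    uint64_t right, middle, left;

    assert(imm <= 15);
    /* Load every source before storing, since vd may alias vn or vm. */
    if (imm < 8) {
        right = vn[0];
        middle = vn[1];
        left = vm[0];
    } else {
        right = vn[1];
        middle = vm[0];
        left = vm[1];
        imm -= 8;
    }
    vd[0] = extract2_u64(right, middle, imm * 8);
    vd[1] = extract2_u64(middle, left, imm * 8);
}
```

The TCG version loads left, middle and right into temporaries before it stores either half for the same reason the model uses locals: Vd may overlap Vn or Vm.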
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ static inline void neon_store_reg32(TCGv_i32 var, int reg)
      tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
  }
 +static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp size)
 +{
 +    long off = neon_element_offset(reg, ele, size);
 +
 +    switch (size) {
 +    case MO_32:
 +        tcg_gen_ld_i32(dest, cpu_env, off);
 +        break;
 +    default:
 +        g_assert_not_reached();
 +    }
 +}
 +
 +static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp size)
 +{
 +    long off = neon_element_offset(reg, ele, size);
 +
 +    switch (size) {
 +    case MO_32:
 +        tcg_gen_st_i32(src, cpu_env, off);
 +        break;
 +    default:
 +        g_assert_not_reached();
 +    }
 +}
 +
  static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
  {
      TCGv_ptr ret = tcg_temp_new_ptr();
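The helpers above address elements through neon_element_offset(). The sketch below models that offset arithmetic. It assumes that each 8-byte unit of a register is held in host byte order, as the HOST_WORDS_BIGENDIAN branch suggests; the body of that branch is not shown here. Under that assumption, the byte offset of an element within a 16-byte register could be computed as follows. element_offset() is a hypothetical name, not the QEMU function.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model: byte offset of element ELE, of size (1 << SIZE_LOG2)
 * bytes, within a 16-byte register, where element 0 is the least
 * significant end.  On a big-endian host each 8-byte unit is assumed to be
 * stored in host order, so offsets of elements smaller than a doubleword
 * are mirrored within their unit.
 */
static unsigned element_offset(unsigned ele, unsigned size_log2,
                               bool host_big_endian)
{
    unsigned element_size = 1u << size_log2;
    unsigned ofs = ele * element_size;

    assert(size_log2 <= 3);
    assert(ofs + element_size <= 16);
    if (host_big_endian && element_size < 8) {
        ofs ^= 8 - element_size;
    }
    return ofs;
}
```

For example, 32-bit element 1 sits at byte 4 on a little-endian host but at byte 0 on a big-endian one, and 64-bit elements need no adjustment.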
 diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.c.inc
 +++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static bool do_3same_pair(DisasContext *s, arg_3same *a, NeonGenTwoOpFn *fn)
       * early. Since Q is 0 there are always just two passes, so instead
       * of a complicated loop over each pass we just unroll.
       */
 -    tmp = neon_load_reg(a->vn, 0);
 -    tmp2 = neon_load_reg(a->vn, 1);
 +    tmp = tcg_temp_new_i32();
 +    tmp2 = tcg_temp_new_i32();
 +    tmp3 = tcg_temp_new_i32();
 +
 +    read_neon_element32(tmp, a->vn, 0, MO_32);
 +    read_neon_element32(tmp2, a->vn, 1, MO_32);
      fn(tmp, tmp, tmp2);
 -    tcg_temp_free_i32(tmp2);
 -    tmp3 = neon_load_reg(a->vm, 0);
 -    tmp2 = neon_load_reg(a->vm, 1);
 +    read_neon_element32(tmp3, a->vm, 0, MO_32);
 +    read_neon_element32(tmp2, a->vm, 1, MO_32);
      fn(tmp3, tmp3, tmp2);
 -    tcg_temp_free_i32(tmp2);
 -    neon_store_reg(a->vd, 0, tmp);
 -    neon_store_reg(a->vd, 1, tmp3);
 +    write_neon_element32(tmp, a->vd, 0, MO_32);
 +    write_neon_element32(tmp3, a->vd, 1, MO_32);
 +
 +    tcg_temp_free_i32(tmp);
 +    tcg_temp_free_i32(tmp2);
 +    tcg_temp_free_i32(tmp3);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_32(DisasContext *s, arg_2reg_shift *a,
       * 2-reg-and-shift operations, size < 3 case, where the
       * helper needs to be passed cpu_env.
       */
 -    TCGv_i32 constimm;
 +    TCGv_i32 constimm, tmp;
      int pass;
-     int u;
-     int vec_size;
--    uint32_t imm;
-     TCGv_i32 tmp, tmp2, tmp3, tmp5;
-     TCGv_ptr ptr1;
--    TCGv_i64 tmp64;
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
-         return 1;
+@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_32(DisasContext *s, arg_2reg_shift *a,
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+      * by immediate using the variable shift operations.
-             return 1;
+      */
-         } else { /* size == 3 */
+     constimm = tcg_const_i32(dup_const(a->size, a->shift));
-             if (!u) {
++    tmp = tcg_temp_new_i32();
--                /* Extract.  */
--                imm = (insn >> 8) & 0xf;
+     for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
--
+-        TCGv_i32 tmp = neon_load_reg(a->vm, pass);
--                if (imm > 7 && !q)
++        read_neon_element32(tmp, a->vm, pass, MO_32);
--                    return 1;
+         fn(tmp, cpu_env, tmp, constimm);
--
+-        neon_store_reg(a->vd, pass, tmp);
--                if (q && ((rd | rn | rm) & 1)) {
++        write_neon_element32(tmp, a->vd, pass, MO_32);
--                    return 1;
+     }
--                }
++    tcg_temp_free_i32(tmp);
--
+     tcg_temp_free_i32(constimm);
--                if (imm == 0) {
+     return true;
--                    neon_load_reg64(cpu_V0, rn);
+ }
--                    if (q) {
+@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_64(DisasContext *s, arg_2reg_shift *a,
--                        neon_load_reg64(cpu_V1, rn + 1);
+     constimm = tcg_const_i64(-a->shift);
--                    }
+     rm1 = tcg_temp_new_i64();
--                } else if (imm == 8) {
+     rm2 = tcg_temp_new_i64();
--                    neon_load_reg64(cpu_V0, rn + 1);
++    rd = tcg_temp_new_i32();
--                    if (q) {
--                        neon_load_reg64(cpu_V1, rm);
+     /* Load both inputs first to avoid potential overwrite if rm == rd */
--                    }
+     neon_load_reg64(rm1, a->vm);
--                } else if (q) {
+     neon_load_reg64(rm2, a->vm + 1);
--                    tmp64 = tcg_temp_new_i64();
--                    if (imm < 8) {
+     shiftfn(rm1, rm1, constimm);
--                        neon_load_reg64(cpu_V0, rn);
+-    rd = tcg_temp_new_i32();
--                        neon_load_reg64(tmp64, rn + 1);
+     narrowfn(rd, cpu_env, rm1);
--                    } else {
+-    neon_store_reg(a->vd, 0, rd);
--                        neon_load_reg64(cpu_V0, rn + 1);
++    write_neon_element32(rd, a->vd, 0, MO_32);
--                        neon_load_reg64(tmp64, rm);
--                    }
+     shiftfn(rm2, rm2, constimm);
--                    tcg_gen_shri_i64(cpu_V0, cpu_V0, (imm & 7) * 8);
+-    rd = tcg_temp_new_i32();
--                    tcg_gen_shli_i64(cpu_V1, tmp64, 64 - ((imm & 7) * 8));
+     narrowfn(rd, cpu_env, rm2);
--                    tcg_gen_or_i64(cpu_V0, cpu_V0, cpu_V1);
+-    neon_store_reg(a->vd, 1, rd);
--                    if (imm < 8) {
++    write_neon_element32(rd, a->vd, 1, MO_32);
--                        neon_load_reg64(cpu_V1, rm);
--                    } else {
++    tcg_temp_free_i32(rd);
--                        neon_load_reg64(cpu_V1, rm + 1);
+     tcg_temp_free_i64(rm1);
--                        imm -= 8;
+     tcg_temp_free_i64(rm2);
--                    }
+     tcg_temp_free_i64(constimm);
--                    tcg_gen_shli_i64(cpu_V1, cpu_V1, 64 - (imm * 8));
+@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_32(DisasContext *s, arg_2reg_shift *a,
--                    tcg_gen_shri_i64(tmp64, tmp64, imm * 8);
+     constimm = tcg_const_i32(imm);
--                    tcg_gen_or_i64(cpu_V1, cpu_V1, tmp64);
--                    tcg_temp_free_i64(tmp64);
+     /* Load all inputs first to avoid potential overwrite */
--                } else {
+-    rm1 = neon_load_reg(a->vm, 0);
--                    /* BUGFIX */
+-    rm2 = neon_load_reg(a->vm, 1);
--                    neon_load_reg64(cpu_V0, rn);
+-    rm3 = neon_load_reg(a->vm + 1, 0);
--                    tcg_gen_shri_i64(cpu_V0, cpu_V0, imm * 8);
+-    rm4 = neon_load_reg(a->vm + 1, 1);
--                    neon_load_reg64(cpu_V1, rm);
++    rm1 = tcg_temp_new_i32();
--                    tcg_gen_shli_i64(cpu_V1, cpu_V1, 64 - (imm * 8));
++    rm2 = tcg_temp_new_i32();
--                    tcg_gen_or_i64(cpu_V0, cpu_V0, cpu_V1);
++    rm3 = tcg_temp_new_i32();
--                }
++    rm4 = tcg_temp_new_i32();
--                neon_store_reg64(cpu_V0, rd);
++    read_neon_element32(rm1, a->vm, 0, MO_32);
--                if (q) {
++    read_neon_element32(rm2, a->vm, 1, MO_32);
--                    neon_store_reg64(cpu_V1, rd + 1);
++    read_neon_element32(rm3, a->vm, 2, MO_32);
--                }
++    read_neon_element32(rm4, a->vm, 3, MO_32);
-+                /* Extract: handled by decodetree */
+     rtmp = tcg_temp_new_i64();
-+                return 1;
-             } else if ((insn & (1 << 11)) == 0) {
+     shiftfn(rm1, rm1, constimm);
-                 /* Two register misc.  */
+@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_32(DisasContext *s, arg_2reg_shift *a,
-                 op = ((insn >> 12) & 0x30) | ((insn >> 7) & 0xf);
+     tcg_temp_free_i32(rm2);
      narrowfn(rm1, cpu_env, rtmp);
 -    neon_store_reg(a->vd, 0, rm1);
 +    write_neon_element32(rm1, a->vd, 0, MO_32);
 +    tcg_temp_free_i32(rm1);
      shiftfn(rm3, rm3, constimm);
      shiftfn(rm4, rm4, constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_32(DisasContext *s, arg_2reg_shift *a,
      narrowfn(rm3, cpu_env, rtmp);
      tcg_temp_free_i64(rtmp);
 -    neon_store_reg(a->vd, 1, rm3);
 +    write_neon_element32(rm3, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(rm3);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
          widen_mask = dup_const(a->size + 1, widen_mask);
      }
 -    rm0 = neon_load_reg(a->vm, 0);
 -    rm1 = neon_load_reg(a->vm, 1);
 +    rm0 = tcg_temp_new_i32();
 +    rm1 = tcg_temp_new_i32();
 +    read_neon_element32(rm0, a->vm, 0, MO_32);
 +    read_neon_element32(rm1, a->vm, 1, MO_32);
      tmp = tcg_temp_new_i64();
      widenfn(tmp, rm0);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
      if (src1_wide) {
          neon_load_reg64(rn0_64, a->vn);
      } else {
 -        TCGv_i32 tmp = neon_load_reg(a->vn, 0);
 +        TCGv_i32 tmp = tcg_temp_new_i32();
 +        read_neon_element32(tmp, a->vn, 0, MO_32);
          widenfn(rn0_64, tmp);
          tcg_temp_free_i32(tmp);
      }
 -    rm = neon_load_reg(a->vm, 0);
 +    rm = tcg_temp_new_i32();
 +    read_neon_element32(rm, a->vm, 0, MO_32);
      widenfn(rm_64, rm);
      tcg_temp_free_i32(rm);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
      if (src1_wide) {
          neon_load_reg64(rn1_64, a->vn + 1);
      } else {
 -        TCGv_i32 tmp = neon_load_reg(a->vn, 1);
 +        TCGv_i32 tmp = tcg_temp_new_i32();
 +        read_neon_element32(tmp, a->vn, 1, MO_32);
          widenfn(rn1_64, tmp);
          tcg_temp_free_i32(tmp);
      }
 -    rm = neon_load_reg(a->vm, 1);
 +    rm = tcg_temp_new_i32();
 +    read_neon_element32(rm, a->vm, 1, MO_32);
      neon_store_reg64(rn0_64, a->vd);
@@ -XXX,XX +XXX,XX @@ static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
      narrowfn(rd1, rn_64);
 -    neon_store_reg(a->vd, 0, rd0);
 -    neon_store_reg(a->vd, 1, rd1);
 +    write_neon_element32(rd0, a->vd, 0, MO_32);
 +    write_neon_element32(rd1, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(rd0);
 +    tcg_temp_free_i32(rd1);
      tcg_temp_free_i64(rn_64);
      tcg_temp_free_i64(rm_64);
@@ -XXX,XX +XXX,XX @@ static bool do_long_3d(DisasContext *s, arg_3diff *a,
      rd0 = tcg_temp_new_i64();
      rd1 = tcg_temp_new_i64();
 -    rn = neon_load_reg(a->vn, 0);
 -    rm = neon_load_reg(a->vm, 0);
 +    rn = tcg_temp_new_i32();
 +    rm = tcg_temp_new_i32();
 +    read_neon_element32(rn, a->vn, 0, MO_32);
 +    read_neon_element32(rm, a->vm, 0, MO_32);
      opfn(rd0, rn, rm);
 -    tcg_temp_free_i32(rn);
 -    tcg_temp_free_i32(rm);
 -    rn = neon_load_reg(a->vn, 1);
 -    rm = neon_load_reg(a->vm, 1);
 +    read_neon_element32(rn, a->vn, 1, MO_32);
 +    read_neon_element32(rm, a->vm, 1, MO_32);
      opfn(rd1, rn, rm);
      tcg_temp_free_i32(rn);
      tcg_temp_free_i32(rm);
@@ -XXX,XX +XXX,XX @@ static void gen_neon_dup_high16(TCGv_i32 var)
  static inline TCGv_i32 neon_get_scalar(int size, int reg)
  {
 -    TCGv_i32 tmp;
 -    if (size == 1) {
 -        tmp = neon_load_reg(reg & 7, reg >> 4);
 +    TCGv_i32 tmp = tcg_temp_new_i32();
 +    if (size == MO_16) {
 +        read_neon_element32(tmp, reg & 7, reg >> 4, MO_32);
          if (reg & 8) {
              gen_neon_dup_high16(tmp);
          } else {
              gen_neon_dup_low16(tmp);
          }
      } else {
 -        tmp = neon_load_reg(reg & 15, reg >> 4);
 +        read_neon_element32(tmp, reg & 15, reg >> 4, MO_32);
      }
      return tmp;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar(DisasContext *s, arg_2scalar *a,
       * perform an accumulation operation of that result into the
       * destination.
       */
 -    TCGv_i32 scalar;
 +    TCGv_i32 scalar, tmp;
      int pass;
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar(DisasContext *s, arg_2scalar *a,
      }
      scalar = neon_get_scalar(a->size, a->vm);
 +    tmp = tcg_temp_new_i32();
      for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 -        TCGv_i32 tmp = neon_load_reg(a->vn, pass);
 +        read_neon_element32(tmp, a->vn, pass, MO_32);
          opfn(tmp, tmp, scalar);
          if (accfn) {
 -            TCGv_i32 rd = neon_load_reg(a->vd, pass);
 +            TCGv_i32 rd = tcg_temp_new_i32();
 +            read_neon_element32(rd, a->vd, pass, MO_32);
              accfn(tmp, rd, tmp);
              tcg_temp_free_i32(rd);
          }
 -        neon_store_reg(a->vd, pass, tmp);
 +        write_neon_element32(tmp, a->vd, pass, MO_32);
      }
 +    tcg_temp_free_i32(tmp);
      tcg_temp_free_i32(scalar);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_vqrdmlah_2sc(DisasContext *s, arg_2scalar *a,
       * performs a kind of fused op-then-accumulate using a helper
       * function that takes all of rd, rn and the scalar at once.
       */
 -    TCGv_i32 scalar;
 +    TCGv_i32 scalar, rn, rd;
      int pass;
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_vqrdmlah_2sc(DisasContext *s, arg_2scalar *a,
      }
      scalar = neon_get_scalar(a->size, a->vm);
 +    rn = tcg_temp_new_i32();
 +    rd = tcg_temp_new_i32();
      for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 -        TCGv_i32 rn = neon_load_reg(a->vn, pass);
 -        TCGv_i32 rd = neon_load_reg(a->vd, pass);
 +        read_neon_element32(rn, a->vn, pass, MO_32);
 +        read_neon_element32(rd, a->vd, pass, MO_32);
          opfn(rd, cpu_env, rn, scalar, rd);
 -        tcg_temp_free_i32(rn);
 -        neon_store_reg(a->vd, pass, rd);
 +        write_neon_element32(rd, a->vd, pass, MO_32);
      }
 +    tcg_temp_free_i32(rn);
 +    tcg_temp_free_i32(rd);
      tcg_temp_free_i32(scalar);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
      scalar = neon_get_scalar(a->size, a->vm);
      /* Load all inputs before writing any outputs, in case of overlap */
 -    rn = neon_load_reg(a->vn, 0);
 +    rn = tcg_temp_new_i32();
 +    read_neon_element32(rn, a->vn, 0, MO_32);
      rn0_64 = tcg_temp_new_i64();
      opfn(rn0_64, rn, scalar);
 -    tcg_temp_free_i32(rn);
 -    rn = neon_load_reg(a->vn, 1);
 +    read_neon_element32(rn, a->vn, 1, MO_32);
      rn1_64 = tcg_temp_new_i64();
      opfn(rn1_64, rn, scalar);
      tcg_temp_free_i32(rn);
@@ -XXX,XX +XXX,XX @@ static bool trans_VTBL(DisasContext *s, arg_VTBL *a)
          return false;
      }
      n <<= 3;
 +    tmp = tcg_temp_new_i32();
      if (a->op) {
 -        tmp = neon_load_reg(a->vd, 0);
 +        read_neon_element32(tmp, a->vd, 0, MO_32);
      } else {
 -        tmp = tcg_temp_new_i32();
          tcg_gen_movi_i32(tmp, 0);
      }
 -    tmp2 = neon_load_reg(a->vm, 0);
 +    tmp2 = tcg_temp_new_i32();
 +    read_neon_element32(tmp2, a->vm, 0, MO_32);
      ptr1 = vfp_reg_ptr(true, a->vn);
      tmp4 = tcg_const_i32(n);
      gen_helper_neon_tbl(tmp2, tmp2, tmp, ptr1, tmp4);
 -    tcg_temp_free_i32(tmp);
 +
      if (a->op) {
 -        tmp = neon_load_reg(a->vd, 1);
 +        read_neon_element32(tmp, a->vd, 1, MO_32);
      } else {
 -        tmp = tcg_temp_new_i32();
          tcg_gen_movi_i32(tmp, 0);
      }
 -    tmp3 = neon_load_reg(a->vm, 1);
 +    tmp3 = tcg_temp_new_i32();
 +    read_neon_element32(tmp3, a->vm, 1, MO_32);
      gen_helper_neon_tbl(tmp3, tmp3, tmp, ptr1, tmp4);
 +    tcg_temp_free_i32(tmp);
      tcg_temp_free_i32(tmp4);
      tcg_temp_free_ptr(ptr1);
 -    neon_store_reg(a->vd, 0, tmp2);
 -    neon_store_reg(a->vd, 1, tmp3);
 -    tcg_temp_free_i32(tmp);
 +
 +    write_neon_element32(tmp2, a->vd, 0, MO_32);
 +    write_neon_element32(tmp3, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(tmp2);
 +    tcg_temp_free_i32(tmp3);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP_scalar(DisasContext *s, arg_VDUP_scalar *a)
  static bool trans_VREV64(DisasContext *s, arg_VREV64 *a)
  {
      int pass, half;
 +    TCGv_i32 tmp[2];
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
          return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VREV64(DisasContext *s, arg_VREV64 *a)
          return true;
      }
 -    for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
 -        TCGv_i32 tmp[2];
 +    tmp[0] = tcg_temp_new_i32();
 +    tmp[1] = tcg_temp_new_i32();
 +    for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
          for (half = 0; half < 2; half++) {
 -            tmp[half] = neon_load_reg(a->vm, pass * 2 + half);
 +            read_neon_element32(tmp[half], a->vm, pass * 2 + half, MO_32);
              switch (a->size) {
              case 0:
                  tcg_gen_bswap32_i32(tmp[half], tmp[half]);
@@ -XXX,XX +XXX,XX @@ static bool trans_VREV64(DisasContext *s, arg_VREV64 *a)
                  g_assert_not_reached();
              }
          }
 -        neon_store_reg(a->vd, pass * 2, tmp[1]);
 -        neon_store_reg(a->vd, pass * 2 + 1, tmp[0]);
 +        write_neon_element32(tmp[1], a->vd, pass * 2, MO_32);
 +        write_neon_element32(tmp[0], a->vd, pass * 2 + 1, MO_32);
      }
 +
 +    tcg_temp_free_i32(tmp[0]);
 +    tcg_temp_free_i32(tmp[1]);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_2misc_pairwise(DisasContext *s, arg_2misc *a,
          rm0_64 = tcg_temp_new_i64();
          rm1_64 = tcg_temp_new_i64();
          rd_64 = tcg_temp_new_i64();
 -        tmp = neon_load_reg(a->vm, pass * 2);
 +
 +        tmp = tcg_temp_new_i32();
 +        read_neon_element32(tmp, a->vm, pass * 2, MO_32);
          widenfn(rm0_64, tmp);
 -        tcg_temp_free_i32(tmp);
 -        tmp = neon_load_reg(a->vm, pass * 2 + 1);
 +        read_neon_element32(tmp, a->vm, pass * 2 + 1, MO_32);
          widenfn(rm1_64, tmp);
          tcg_temp_free_i32(tmp);
 +
          opfn(rd_64, rm0_64, rm1_64);
          tcg_temp_free_i64(rm0_64);
          tcg_temp_free_i64(rm1_64);
@@ -XXX,XX +XXX,XX @@ static bool do_vmovn(DisasContext *s, arg_2misc *a,
      narrowfn(rd0, cpu_env, rm);
      neon_load_reg64(rm, a->vm + 1);
      narrowfn(rd1, cpu_env, rm);
 -    neon_store_reg(a->vd, 0, rd0);
 -    neon_store_reg(a->vd, 1, rd1);
 +    write_neon_element32(rd0, a->vd, 0, MO_32);
 +    write_neon_element32(rd1, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(rd0);
 +    tcg_temp_free_i32(rd1);
      tcg_temp_free_i64(rm);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL(DisasContext *s, arg_2misc *a)
      }
      rd = tcg_temp_new_i64();
 +    rm0 = tcg_temp_new_i32();
 +    rm1 = tcg_temp_new_i32();
 -    rm0 = neon_load_reg(a->vm, 0);
 -    rm1 = neon_load_reg(a->vm, 1);
 +    read_neon_element32(rm0, a->vm, 0, MO_32);
 +    read_neon_element32(rm1, a->vm, 1, MO_32);
      widenfn(rd, rm0);
      tcg_gen_shli_i64(rd, rd, 8 << a->size);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_F16_F32(DisasContext *s, arg_2misc *a)
      fpst = fpstatus_ptr(FPST_STD);
      ahp = get_ahp_flag();
 -    tmp = neon_load_reg(a->vm, 0);
 +    tmp = tcg_temp_new_i32();
 +    read_neon_element32(tmp, a->vm, 0, MO_32);
      gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp);
 -    tmp2 = neon_load_reg(a->vm, 1);
 +    tmp2 = tcg_temp_new_i32();
 +    read_neon_element32(tmp2, a->vm, 1, MO_32);
      gen_helper_vfp_fcvt_f32_to_f16(tmp2, tmp2, fpst, ahp);
      tcg_gen_shli_i32(tmp2, tmp2, 16);
      tcg_gen_or_i32(tmp2, tmp2, tmp);
 -    tcg_temp_free_i32(tmp);
 -    tmp = neon_load_reg(a->vm, 2);
 +    read_neon_element32(tmp, a->vm, 2, MO_32);
      gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp);
 -    tmp3 = neon_load_reg(a->vm, 3);
 -    neon_store_reg(a->vd, 0, tmp2);
 +    tmp3 = tcg_temp_new_i32();
 +    read_neon_element32(tmp3, a->vm, 3, MO_32);
 +    write_neon_element32(tmp2, a->vd, 0, MO_32);
 +    tcg_temp_free_i32(tmp2);
      gen_helper_vfp_fcvt_f32_to_f16(tmp3, tmp3, fpst, ahp);
      tcg_gen_shli_i32(tmp3, tmp3, 16);
      tcg_gen_or_i32(tmp3, tmp3, tmp);
 -    neon_store_reg(a->vd, 1, tmp3);
 +    write_neon_element32(tmp3, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(tmp3);
      tcg_temp_free_i32(tmp);
      tcg_temp_free_i32(ahp);
      tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_F32_F16(DisasContext *s, arg_2misc *a)
      fpst = fpstatus_ptr(FPST_STD);
      ahp = get_ahp_flag();
      tmp3 = tcg_temp_new_i32();
 -    tmp = neon_load_reg(a->vm, 0);
 -    tmp2 = neon_load_reg(a->vm, 1);
 +    tmp2 = tcg_temp_new_i32();
 +    tmp = tcg_temp_new_i32();
 +    read_neon_element32(tmp, a->vm, 0, MO_32);
 +    read_neon_element32(tmp2, a->vm, 1, MO_32);
      tcg_gen_ext16u_i32(tmp3, tmp);
      gen_helper_vfp_fcvt_f16_to_f32(tmp3, tmp3, fpst, ahp);
 -    neon_store_reg(a->vd, 0, tmp3);
 +    write_neon_element32(tmp3, a->vd, 0, MO_32);
      tcg_gen_shri_i32(tmp, tmp, 16);
      gen_helper_vfp_fcvt_f16_to_f32(tmp, tmp, fpst, ahp);
 -    neon_store_reg(a->vd, 1, tmp);
 -    tmp3 = tcg_temp_new_i32();
 +    write_neon_element32(tmp, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(tmp);
      tcg_gen_ext16u_i32(tmp3, tmp2);
      gen_helper_vfp_fcvt_f16_to_f32(tmp3, tmp3, fpst, ahp);
 -    neon_store_reg(a->vd, 2, tmp3);
 +    write_neon_element32(tmp3, a->vd, 2, MO_32);
 +    tcg_temp_free_i32(tmp3);
      tcg_gen_shri_i32(tmp2, tmp2, 16);
      gen_helper_vfp_fcvt_f16_to_f32(tmp2, tmp2, fpst, ahp);
 -    neon_store_reg(a->vd, 3, tmp2);
 +    write_neon_element32(tmp2, a->vd, 3, MO_32);
 +    tcg_temp_free_i32(tmp2);
      tcg_temp_free_i32(ahp);
      tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ DO_2M_CRYPTO(SHA256SU0, aa32_sha2, 2)
  static bool do_2misc(DisasContext *s, arg_2misc *a, NeonGenOneOpFn *fn)
  {
 +    TCGv_i32 tmp;
      int pass;
      /* Handle a 2-reg-misc operation by iterating 32 bits at a time */
@@ -XXX,XX +XXX,XX @@ static bool do_2misc(DisasContext *s, arg_2misc *a, NeonGenOneOpFn *fn)
          return true;
      }
 +    tmp = tcg_temp_new_i32();
      for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 -        TCGv_i32 tmp = neon_load_reg(a->vm, pass);
 +        read_neon_element32(tmp, a->vm, pass, MO_32);
          fn(tmp, tmp);
 -        neon_store_reg(a->vd, pass, tmp);
 +        write_neon_element32(tmp, a->vd, pass, MO_32);
      }
 +    tcg_temp_free_i32(tmp);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool trans_VTRN(DisasContext *s, arg_2misc *a)
          return true;
      }
 -    if (a->size == 2) {
 +    tmp = tcg_temp_new_i32();
 +    tmp2 = tcg_temp_new_i32();
 +    if (a->size == MO_32) {
          for (pass = 0; pass < (a->q ? 4 : 2); pass += 2) {
 -            tmp = neon_load_reg(a->vm, pass);
 -            tmp2 = neon_load_reg(a->vd, pass + 1);
 -            neon_store_reg(a->vm, pass, tmp2);
 -            neon_store_reg(a->vd, pass + 1, tmp);
 +            read_neon_element32(tmp, a->vm, pass, MO_32);
 +            read_neon_element32(tmp2, a->vd, pass + 1, MO_32);
 +            write_neon_element32(tmp2, a->vm, pass, MO_32);
 +            write_neon_element32(tmp, a->vd, pass + 1, MO_32);
          }
      } else {
          for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 -            tmp = neon_load_reg(a->vm, pass);
 -            tmp2 = neon_load_reg(a->vd, pass);
 -            if (a->size == 0) {
 +            read_neon_element32(tmp, a->vm, pass, MO_32);
 +            read_neon_element32(tmp2, a->vd, pass, MO_32);
 +            if (a->size == MO_8) {
                  gen_neon_trn_u8(tmp, tmp2);
              } else {
                  gen_neon_trn_u16(tmp, tmp2);
              }
 -            neon_store_reg(a->vm, pass, tmp2);
 -            neon_store_reg(a->vd, pass, tmp);
 +            write_neon_element32(tmp2, a->vm, pass, MO_32);
 +            write_neon_element32(tmp, a->vd, pass, MO_32);
          }
      }
 +    tcg_temp_free_i32(tmp);
 +    tcg_temp_free_i32(tmp2);
      return true;
  }
 --
 .20.1
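The per-pass loop in trans_VREV64 reads both 32-bit halves of a doubleword before writing either, so Vd may safely equal Vm. The plain-C model of that loop below is for illustration only. The functions vrev64() and bswap32_u32() are hypothetical, and words are held with element 0 least significant. The hunk above elides the size 1 and size 2 cases. The model fills them in from the architectural definition of VREV64, which reverses the 16-bit or 32-bit elements within each doubleword.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Portable 32-bit byte swap. */
static uint32_t bswap32_u32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Hypothetical VREV64 model over 32-bit words: 2 words (D) or 4 words (Q).
 * SIZE is 0, 1 or 2 for 8-, 16- or 32-bit elements.  d may alias m. */
static void vrev64(uint32_t *d, const uint32_t *m, unsigned size, bool q)
{
    assert(size <= 2);
    for (int pass = 0; pass < (q ? 2 : 1); pass++) {
        uint32_t tmp[2];

        /* Read and transform both halves before storing anything. */
        for (int half = 0; half < 2; half++) {
            uint32_t v = m[pass * 2 + half];
            switch (size) {
            case 0:
                v = bswap32_u32(v);            /* reverse bytes */
                break;
            case 1:
                v = (v << 16) | (v >> 16);     /* swap halfwords */
                break;
            default:
                break;                         /* 32-bit: only halves swap */
            }
            tmp[half] = v;
        }
        d[pass * 2] = tmp[1];
        d[pass * 2 + 1] = tmp[0];
    }
}
```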

-[PULL 12/23] target/arm: Convert Neon 2-reg-scalar VQDMULH, VQRDMULH to decodetree
+[PULL 06/26] target/arm: Expand read/write_neon_element32 to all MemOp
-Convert the VQDMULH and VQRDMULH insns in the 2-reg-scalar group
+From: Richard Henderson <richard.henderson@linaro.org>
-to decodetree.
+We can then use this to improve VMOV (scalar to gp) and
 VMOV (gp to scalar) so that we simply perform the memory
 operation that we wanted, rather than inserting or
 extracting from a 32-bit quantity.
 These were the last uses of neon_load/store_reg, so remove them.
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20201030022618.785675-7-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/neon-dp.decode       |  3 +++
+ target/arm/translate.c         | 50 +++++++++++++-----------
- target/arm/translate-neon.inc.c | 29 +++++++++++++++++++++++
+ target/arm/translate-vfp.c.inc | 71 +++++-----------------------------
- target/arm/translate.c          | 42 ++-------------------------------
+files changed, 37 insertions(+), 84 deletions(-)
-files changed, 34 insertions(+), 40 deletions(-)
 diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/neon-dp.decode
 +++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
      VMUL_2sc     1111 001 . 1 . .. .... .... 1000 . 1 . 0 .... @2scalar
      VMUL_F_2sc   1111 001 . 1 . .. .... .... 1001 . 1 . 0 .... @2scalar
 +
 +    VQDMULH_2sc  1111 001 . 1 . .. .... .... 1100 . 1 . 0 .... @2scalar
 +    VQRDMULH_2sc 1111 001 . 1 . .. .... .... 1101 . 1 . 0 .... @2scalar
    ]
  }
 diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.inc.c
 +++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VMLS_F_2sc(DisasContext *s, arg_2scalar *a)
      return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
  }
 +
 +WRAP_ENV_FN(gen_VQDMULH_16, gen_helper_neon_qdmulh_s16)
 +WRAP_ENV_FN(gen_VQDMULH_32, gen_helper_neon_qdmulh_s32)
 +WRAP_ENV_FN(gen_VQRDMULH_16, gen_helper_neon_qrdmulh_s16)
 +WRAP_ENV_FN(gen_VQRDMULH_32, gen_helper_neon_qrdmulh_s32)
 +
 +static bool trans_VQDMULH_2sc(DisasContext *s, arg_2scalar *a)
 +{
 +    static NeonGenTwoOpFn * const opfn[] = {
 +        NULL,
 +        gen_VQDMULH_16,
 +        gen_VQDMULH_32,
 +        NULL,
 +    };
 +
 +    return do_2scalar(s, a, opfn[a->size], NULL);
 +}
 +
 +static bool trans_VQRDMULH_2sc(DisasContext *s, arg_2scalar *a)
 +{
 +    static NeonGenTwoOpFn * const opfn[] = {
 +        NULL,
 +        gen_VQRDMULH_16,
 +        gen_VQRDMULH_32,
 +        NULL,
 +    };
 +
 +    return do_2scalar(s, a, opfn[a->size], NULL);
 +}
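VQDMULH returns the high half of the doubled product, saturated. VQRDMULH first adds 1 << 15 (for 16-bit elements) for rounding, then takes the high half. The helpers are presumably passed cpu_env (hence WRAP_ENV_FN) so that they can set the cumulative saturation flag. Sizes 0 and 3 have no helper, so the tables hold NULL there. The model below covers only the 16-bit case and is hypothetical: qdmulh_s16(), qrdmulh_s16() and sat16() are not the QEMU helpers, and the saturation flag is represented by a bool pointer. It assumes that right-shifting a negative int64_t is an arithmetic shift, as it is with GCC and Clang.

```c
#include <stdbool.h>
#include <stdint.h>

/* Saturate to int16_t, recording saturation in *qc. */
static int16_t sat16(int64_t v, bool *qc)
{
    if (v > INT16_MAX) {
        *qc = true;
        return INT16_MAX;
    }
    if (v < INT16_MIN) {
        *qc = true;
        return INT16_MIN;
    }
    return (int16_t)v;
}

/* Signed saturating doubling multiply, returning the high half. */
static int16_t qdmulh_s16(int16_t a, int16_t b, bool *qc)
{
    int64_t prod = 2 * (int64_t)a * b;
    return sat16(prod >> 16, qc);       /* assumes arithmetic shift */
}

/* As above, but rounding before taking the high half. */
static int16_t qrdmulh_s16(int16_t a, int16_t b, bool *qc)
{
    int64_t prod = 2 * (int64_t)a * b + (1 << 15);
    return sat16(prod >> 16, qc);       /* assumes arithmetic shift */
}
```

The only input pair that saturates is INT16_MIN * INT16_MIN, whose doubled product 2^31 has a high half (32768) that does not fit in int16_t.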
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static void gen_exception_return(DisasContext *s, TCGv_i32 pc)
+@@ -XXX,XX +XXX,XX @@ static long neon_full_reg_offset(unsigned reg)
+  * Return the offset of a 2**SIZE piece of a NEON register, at index ELE,
- #define CPU_V001 cpu_V0, cpu_V0, cpu_V1
+  * where 0 is the least significant end of the register.
+  */
--static TCGv_i32 neon_load_scratch(int scratch)
+-static long neon_element_offset(int reg, int element, MemOp size)
 +static long neon_element_offset(int reg, int element, MemOp memop)
  {
 -    int element_size = 1 << size;
 +    int element_size = 1 << (memop & MO_SIZE);
      int ofs = element * element_size;
  #ifdef HOST_WORDS_BIGENDIAN
      /*
@@ -XXX,XX +XXX,XX @@ static long vfp_reg_offset(bool dp, unsigned reg)
      }
  }
 -static TCGv_i32 neon_load_reg(int reg, int pass)
 -{
 -    TCGv_i32 tmp = tcg_temp_new_i32();
--    tcg_gen_ld_i32(tmp, cpu_env, offsetof(CPUARMState, vfp.scratch[scratch]));
+-    tcg_gen_ld_i32(tmp, cpu_env, neon_element_offset(reg, pass, MO_32));
 -    return tmp;
 -}
 -
--static void neon_store_scratch(int scratch, TCGv_i32 var)
+-static void neon_store_reg(int reg, int pass, TCGv_i32 var)
 -{
--    tcg_gen_st_i32(var, cpu_env, offsetof(CPUARMState, vfp.scratch[scratch]));
+-    tcg_gen_st_i32(var, cpu_env, neon_element_offset(reg, pass, MO_32));
 -    tcg_temp_free_i32(var);
 -}
 -
- static int gen_neon_unzip(int rd, int rm, int size, int q)
+ static inline void neon_load_reg64(TCGv_i64 var, int reg)
  {
-     TCGv_ptr pd, pm;
+     tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(1, reg));
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ static inline void neon_store_reg32(TCGv_i32 var, int reg)
-                 case 1: /* Float VMLA scalar */
+     tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
-                 case 5: /* Floating point VMLS scalar */
+ }
-                 case 9: /* Floating point VMUL scalar */
--                    return 1; /* handled by decodetree */
+-static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp size)
--
++static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp memop)
-                 case 12: /* VQDMULH scalar */
+ {
-                 case 13: /* VQRDMULH scalar */
+-    long off = neon_element_offset(reg, ele, size);
--                    if (u && ((rd | rn) & 1)) {
++    long off = neon_element_offset(reg, ele, memop);
--                        return 1;
--                    }
+-    switch (size) {
--                    tmp = neon_get_scalar(size, rm);
+-    case MO_32:
--                    neon_store_scratch(0, tmp);
++    switch (memop) {
--                    for (pass = 0; pass < (u ? 4 : 2); pass++) {
++    case MO_SB:
--                        tmp = neon_load_scratch(0);
++        tcg_gen_ld8s_i32(dest, cpu_env, off);
--                        tmp2 = neon_load_reg(rn, pass);
++        break;
--                        if (op == 12) {
++    case MO_UB:
--                            if (size == 1) {
++        tcg_gen_ld8u_i32(dest, cpu_env, off);
--                                gen_helper_neon_qdmulh_s16(tmp, cpu_env, tmp, tmp2);
++        break;
--                            } else {
++    case MO_SW:
--                                gen_helper_neon_qdmulh_s32(tmp, cpu_env, tmp, tmp2);
++        tcg_gen_ld16s_i32(dest, cpu_env, off);
--                            }
++        break;
--                        } else {
++    case MO_UW:
--                            if (size == 1) {
++        tcg_gen_ld16u_i32(dest, cpu_env, off);
--                                gen_helper_neon_qrdmulh_s16(tmp, cpu_env, tmp, tmp2);
++        break;
--                            } else {
++    case MO_UL:
--                                gen_helper_neon_qrdmulh_s32(tmp, cpu_env, tmp, tmp2);
++    case MO_SL:
--                            }
+         tcg_gen_ld_i32(dest, cpu_env, off);
--                        }
+         break;
--                        tcg_temp_free_i32(tmp2);
+     default:
--                        neon_store_reg(rd, pass, tmp);
+@@ -XXX,XX +XXX,XX @@ static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp size)
--                    }
+     }
--                    break;
+ }
-+                    return 1; /* handled by decodetree */
-+
+-static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp size)
-                 case 3: /* VQDMLAL scalar */
++static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
-                 case 7: /* VQDMLSL scalar */
+ {
-                 case 11: /* VQDMULL scalar */
+-    long off = neon_element_offset(reg, ele, size);
 +    long off = neon_element_offset(reg, ele, memop);
 -    switch (size) {
 +    switch (memop) {
 +    case MO_8:
 +        tcg_gen_st8_i32(src, cpu_env, off);
 +        break;
 +    case MO_16:
 +        tcg_gen_st16_i32(src, cpu_env, off);
 +        break;
      case MO_32:
          tcg_gen_st_i32(src, cpu_env, off);
          break;
 diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-vfp.c.inc
 +++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
  {
      /* VMOV scalar to general purpose register */
      TCGv_i32 tmp;
 -    int pass;
 -    uint32_t offset;
 -    /* SIZE == 2 is a VFP instruction; otherwise NEON.  */
 -    if (a->size == 2
 +    /* SIZE == MO_32 is a VFP instruction; otherwise NEON.  */
 +    if (a->size == MO_32
          ? !dc_isar_feature(aa32_fpsp_v2, s)
          : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
          return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
          return false;
      }
 -    offset = a->index << a->size;
 -    pass = extract32(offset, 2, 1);
 -    offset = extract32(offset, 0, 2) * 8;
 -
      if (!vfp_access_check(s)) {
          return true;
      }
 -    tmp = neon_load_reg(a->vn, pass);
 -    switch (a->size) {
 -    case 0:
 -        if (offset) {
 -            tcg_gen_shri_i32(tmp, tmp, offset);
 -        }
 -        if (a->u) {
 -            gen_uxtb(tmp);
 -        } else {
 -            gen_sxtb(tmp);
 -        }
 -        break;
 -    case 1:
 -        if (a->u) {
 -            if (offset) {
 -                tcg_gen_shri_i32(tmp, tmp, 16);
 -            } else {
 -                gen_uxth(tmp);
 -            }
 -        } else {
 -            if (offset) {
 -                tcg_gen_sari_i32(tmp, tmp, 16);
 -            } else {
 -                gen_sxth(tmp);
 -            }
 -        }
 -        break;
 -    case 2:
 -        break;
 -    }
 +    tmp = tcg_temp_new_i32();
 +    read_neon_element32(tmp, a->vn, a->index, a->size | (a->u ? 0 : MO_SIGN));
      store_reg(s, a->rt, tmp);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
  static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
  {
      /* VMOV general purpose register to scalar */
 -    TCGv_i32 tmp, tmp2;
 -    int pass;
 -    uint32_t offset;
 +    TCGv_i32 tmp;
 -    /* SIZE == 2 is a VFP instruction; otherwise NEON.  */
 -    if (a->size == 2
 +    /* SIZE == MO_32 is a VFP instruction; otherwise NEON.  */
 +    if (a->size == MO_32
          ? !dc_isar_feature(aa32_fpsp_v2, s)
          : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
          return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
          return false;
      }
 -    offset = a->index << a->size;
 -    pass = extract32(offset, 2, 1);
 -    offset = extract32(offset, 0, 2) * 8;
 -
      if (!vfp_access_check(s)) {
          return true;
      }
      tmp = load_reg(s, a->rt);
 -    switch (a->size) {
 -    case 0:
 -        tmp2 = neon_load_reg(a->vn, pass);
 -        tcg_gen_deposit_i32(tmp, tmp2, tmp, offset, 8);
 -        tcg_temp_free_i32(tmp2);
 -        break;
 -    case 1:
 -        tmp2 = neon_load_reg(a->vn, pass);
 -        tcg_gen_deposit_i32(tmp, tmp2, tmp, offset, 16);
 -        tcg_temp_free_i32(tmp2);
 -        break;
 -    case 2:
 -        break;
 -    }
 -    neon_store_reg(a->vn, pass, tmp);
 +    write_neon_element32(tmp, a->vn, a->index, a->size);
 +    tcg_temp_free_i32(tmp);
      return true;
  }
 --
 .20.1
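In trans_VMOV_to_gp and trans_VMOV_from_gp above, the per-size shift/extend and deposit sequences collapse into read_neon_element32/write_neon_element32 calls. Signedness is folded into the MemOp as `a->size | (a->u ? 0 : MO_SIGN)`. The block below is a plain-C sketch of that arithmetic, not QEMU code. Instead of emitting TCG ops, it operates on a uint64_t that stands in for one D register, with element 0 in the low bits. The enum values only mirror the MemOp size/sign encoding and are assumptions, and the model_* function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins mirroring the MemOp size/sign encoding (assumed values). */
enum {
    MO_8 = 0, MO_16 = 1, MO_32 = 2,
    MO_SIZE = 3,   /* mask for the log2 element size */
    MO_SIGN = 4,   /* sign-extend on read */
};

/* Models read_neon_element32: fetch element 'ele' of a 64-bit D register
 * (element 0 in the least significant bits) and zero- or sign-extend it
 * to 32 bits as selected by 'memop'. */
static uint32_t model_read_element(uint64_t dreg, int ele, int memop)
{
    int size = memop & MO_SIZE;
    assert(size <= MO_32);
    int bits = 8 << size;               /* element width in bits */
    int shift = (ele << size) * 8;      /* old code: offset = index << size, in bytes */
    assert(ele >= 0 && shift + bits <= 64);
    uint64_t raw = dreg >> shift;

    if (bits == 32) {
        return (uint32_t)raw;           /* 32-bit element: no extension needed */
    }
    uint32_t mask = (1u << bits) - 1;
    uint32_t val = (uint32_t)raw & mask;
    if ((memop & MO_SIGN) && (val & (1u << (bits - 1)))) {
        val |= ~mask;                   /* what the old sxtb/sxth/sari path did */
    }
    return val;
}

/* Models write_neon_element32: replace element 'ele' of the D register
 * with the low bits of 'val' and leave the other elements untouched
 * (what the old tcg_gen_deposit_i32 sequence did). */
static uint64_t model_write_element(uint64_t dreg, int ele, int size, uint32_t val)
{
    assert(size >= MO_8 && size <= MO_32);
    int bits = 8 << size;
    int shift = (ele << size) * 8;
    assert(ele >= 0 && shift + bits <= 64);
    uint64_t mask = ((1ull << bits) - 1) << shift;
    return (dreg & ~mask) | (((uint64_t)val << shift) & mask);
}
```

Take a register value of 0x1122334455667788 as an example. Reading element 0 as MO_8 | MO_SIGN gives 0xffffff88, while plain MO_8 gives 0x00000088. That is the VMOV.S8 / VMOV.U8 distinction, which the old code expressed as separate gen_sxtb/gen_uxtb calls.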

-[PULL 17/23] target/arm: Convert Neon VDUP (scalar) to decodetree
+[PULL 07/26] target/arm: Rename neon_load_reg32 to vfp_load_reg32
-Convert the Neon VDUP (scalar) insn to decodetree.  (Note that we
+From: Richard Henderson <richard.henderson@linaro.org>
 can't call this just "VDUP" as we used that already in vfp.decode for
 the "VDUP (general purpose register)" insn.)
+The only uses of this function are for loading VFP
+single-precision values, and nothing to do with NEON.
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20201030022618.785675-8-richard.henderson@linaro.org
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/neon-dp.decode       |  7 +++++++
+ target/arm/translate.c         |   4 +-
- target/arm/translate-neon.inc.c | 26 ++++++++++++++++++++++++++
+ target/arm/translate-vfp.c.inc | 184 ++++++++++++++++-----------------
- target/arm/translate.c          | 25 +------------------------
+files changed, 94 insertions(+), 94 deletions(-)
 files changed, 34 insertions(+), 24 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
-+++ b/target/arm/neon-dp.decode
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
-     VTBL         1111 001 1 1 . 11 .... .... 10 len:2 . op:1 . 0 .... \
-                  vm=%vm_dp vn=%vn_dp vd=%vd_dp
-+
-+    VDUP_scalar  1111 001 1 1 . 11 index:3 1 .... 11 000 q:1 . 0 .... \
-+                 vm=%vm_dp vd=%vd_dp size=0
-+    VDUP_scalar  1111 001 1 1 . 11 index:2 10 .... 11 000 q:1 . 0 .... \
-+                 vm=%vm_dp vd=%vd_dp size=1
-+    VDUP_scalar  1111 001 1 1 . 11 index:1 100 .... 11 000 q:1 . 0 .... \
-+                 vm=%vm_dp vd=%vd_dp size=2
-   ]
-   # Subgroup for size != 0b11
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
-+++ b/target/arm/translate-neon.inc.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VTBL(DisasContext *s, arg_VTBL *a)
-     tcg_temp_free_i32(tmp);
-     return true;
- }
-+
-+static bool trans_VDUP_scalar(DisasContext *s, arg_VDUP_scalar *a)
-+{
-+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
-+        return false;
-+    }
-+
-+    /* UNDEF accesses to D16-D31 if they don't exist. */
-+    if (!dc_isar_feature(aa32_simd_r32, s) &&
-+        ((a->vd | a->vm) & 0x10)) {
-+        return false;
-+    }
-+
-+    if (a->vd & a->q) {
-+        return false;
-+    }
-+
-+    if (!vfp_access_check(s)) {
-+        return true;
-+    }
-+
-+    tcg_gen_gvec_dup_mem(a->size, neon_reg_offset(a->vd, 0),
-+                         neon_element_offset(a->vm, a->index, a->size),
-+                         a->q ? 16 : 8, a->q ? 16 : 8);
-+    return true;
-+}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ static inline void neon_store_reg64(TCGv_i64 var, int reg)
-                     }
+     tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(1, reg));
-                     break;
+ }
-                 }
--            } else if ((insn & (1 << 10)) == 0) {
+-static inline void neon_load_reg32(TCGv_i32 var, int reg)
--                /* VTBL, VTBX: handled by decodetree */
++static inline void vfp_load_reg32(TCGv_i32 var, int reg)
--                return 1;
+ {
--            } else if ((insn & 0x380) == 0) {
+     tcg_gen_ld_i32(var, cpu_env, vfp_reg_offset(false, reg));
--                /* VDUP */
+ }
--                int element;
--                MemOp size;
+-static inline void neon_store_reg32(TCGv_i32 var, int reg)
--
++static inline void vfp_store_reg32(TCGv_i32 var, int reg)
--                if ((insn & (7 << 16)) == 0 || (q && (rd & 1))) {
+ {
--                    return 1;
+     tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
--                }
+ }
--                if (insn & (1 << 16)) {
+diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
--                    size = MO_8;
+index XXXXXXX..XXXXXXX 100644
--                    element = (insn >> 17) & 7;
+--- a/target/arm/translate-vfp.c.inc
--                } else if (insn & (1 << 17)) {
++++ b/target/arm/translate-vfp.c.inc
--                    size = MO_16;
+@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
--                    element = (insn >> 18) & 3;
+         frn = tcg_temp_new_i32();
--                } else {
+         frm = tcg_temp_new_i32();
--                    size = MO_32;
+         dest = tcg_temp_new_i32();
--                    element = (insn >> 19) & 1;
+-        neon_load_reg32(frn, rn);
--                }
+-        neon_load_reg32(frm, rm);
--                tcg_gen_gvec_dup_mem(size, neon_reg_offset(rd, 0),
++        vfp_load_reg32(frn, rn);
--                                     neon_element_offset(rm, element, size),
++        vfp_load_reg32(frm, rm);
--                                     q ? 16 : 8, q ? 16 : 8);
+         switch (a->cc) {
-             } else {
+         case 0: /* eq: Z */
-+                /* VTBL, VTBX, VDUP: handled by decodetree */
+             tcg_gen_movcond_i32(TCG_COND_EQ, dest, cpu_ZF, zero,
-                 return 1;
+@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
          if (sz == 1) {
              tcg_gen_andi_i32(dest, dest, 0xffff);
          }
 -        neon_store_reg32(dest, rd);
 +        vfp_store_reg32(dest, rd);
          tcg_temp_free_i32(frn);
          tcg_temp_free_i32(frm);
          tcg_temp_free_i32(dest);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
          TCGv_i32 tcg_res;
          tcg_op = tcg_temp_new_i32();
          tcg_res = tcg_temp_new_i32();
 -        neon_load_reg32(tcg_op, rm);
 +        vfp_load_reg32(tcg_op, rm);
          if (sz == 1) {
              gen_helper_rinth(tcg_res, tcg_op, fpst);
          } else {
              gen_helper_rints(tcg_res, tcg_op, fpst);
          }
 -        neon_store_reg32(tcg_res, rd);
 +        vfp_store_reg32(tcg_res, rd);
          tcg_temp_free_i32(tcg_op);
          tcg_temp_free_i32(tcg_res);
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
              gen_helper_vfp_tould(tcg_res, tcg_double, tcg_shift, fpst);
          }
          tcg_gen_extrl_i64_i32(tcg_tmp, tcg_res);
 -        neon_store_reg32(tcg_tmp, rd);
 +        vfp_store_reg32(tcg_tmp, rd);
          tcg_temp_free_i32(tcg_tmp);
          tcg_temp_free_i64(tcg_res);
          tcg_temp_free_i64(tcg_double);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
          TCGv_i32 tcg_single, tcg_res;
          tcg_single = tcg_temp_new_i32();
          tcg_res = tcg_temp_new_i32();
 -        neon_load_reg32(tcg_single, rm);
 +        vfp_load_reg32(tcg_single, rm);
          if (sz == 1) {
              if (is_signed) {
                  gen_helper_vfp_toslh(tcg_res, tcg_single, tcg_shift, fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
                  gen_helper_vfp_touls(tcg_res, tcg_single, tcg_shift, fpst);
              }
          }
+-        neon_store_reg32(tcg_res, rd);
++        vfp_store_reg32(tcg_res, rd);
+         tcg_temp_free_i32(tcg_res);
+         tcg_temp_free_i32(tcg_single);
+     }
+@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_half(DisasContext *s, arg_VMOV_single *a)
+     if (a->l) {
+         /* VFP to general purpose register */
+         tmp = tcg_temp_new_i32();
+-        neon_load_reg32(tmp, a->vn);
++        vfp_load_reg32(tmp, a->vn);
+         tcg_gen_andi_i32(tmp, tmp, 0xffff);
+         store_reg(s, a->rt, tmp);
+     } else {
+         /* general purpose register to VFP */
+         tmp = load_reg(s, a->rt);
+         tcg_gen_andi_i32(tmp, tmp, 0xffff);
+-        neon_store_reg32(tmp, a->vn);
++        vfp_store_reg32(tmp, a->vn);
+         tcg_temp_free_i32(tmp);
+     }
+@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_single(DisasContext *s, arg_VMOV_single *a)
+     if (a->l) {
+         /* VFP to general purpose register */
+         tmp = tcg_temp_new_i32();
+-        neon_load_reg32(tmp, a->vn);
++        vfp_load_reg32(tmp, a->vn);
+         if (a->rt == 15) {
+             /* Set the 4 flag bits in the CPSR.  */
+             gen_set_nzcv(tmp);
+@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_single(DisasContext *s, arg_VMOV_single *a)
+     } else {
+         /* general purpose register to VFP */
+         tmp = load_reg(s, a->rt);
+-        neon_store_reg32(tmp, a->vn);
++        vfp_store_reg32(tmp, a->vn);
+         tcg_temp_free_i32(tmp);
+     }
+@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_sp(DisasContext *s, arg_VMOV_64_sp *a)
+     if (a->op) {
+         /* fpreg to gpreg */
+         tmp = tcg_temp_new_i32();
+-        neon_load_reg32(tmp, a->vm);
++        vfp_load_reg32(tmp, a->vm);
+         store_reg(s, a->rt, tmp);
+         tmp = tcg_temp_new_i32();
+-        neon_load_reg32(tmp, a->vm + 1);
++        vfp_load_reg32(tmp, a->vm + 1);
+         store_reg(s, a->rt2, tmp);
+     } else {
+         /* gpreg to fpreg */
+         tmp = load_reg(s, a->rt);
+-        neon_store_reg32(tmp, a->vm);
++        vfp_store_reg32(tmp, a->vm);
+         tcg_temp_free_i32(tmp);
+         tmp = load_reg(s, a->rt2);
+-        neon_store_reg32(tmp, a->vm + 1);
++        vfp_store_reg32(tmp, a->vm + 1);
+         tcg_temp_free_i32(tmp);
+     }
+@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_dp(DisasContext *s, arg_VMOV_64_dp *a)
+     if (a->op) {
+         /* fpreg to gpreg */
+         tmp = tcg_temp_new_i32();
+-        neon_load_reg32(tmp, a->vm * 2);
++        vfp_load_reg32(tmp, a->vm * 2);
+         store_reg(s, a->rt, tmp);
+         tmp = tcg_temp_new_i32();
+-        neon_load_reg32(tmp, a->vm * 2 + 1);
++        vfp_load_reg32(tmp, a->vm * 2 + 1);
+         store_reg(s, a->rt2, tmp);
+     } else {
+         /* gpreg to fpreg */
+         tmp = load_reg(s, a->rt);
+-        neon_store_reg32(tmp, a->vm * 2);
++        vfp_store_reg32(tmp, a->vm * 2);
+         tcg_temp_free_i32(tmp);
+         tmp = load_reg(s, a->rt2);
+-        neon_store_reg32(tmp, a->vm * 2 + 1);
++        vfp_store_reg32(tmp, a->vm * 2 + 1);
+         tcg_temp_free_i32(tmp);
+     }
+@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_hp(DisasContext *s, arg_VLDR_VSTR_sp *a)
+     tmp = tcg_temp_new_i32();
+     if (a->l) {
+         gen_aa32_ld16u(s, tmp, addr, get_mem_index(s));
+-        neon_store_reg32(tmp, a->vd);
++        vfp_store_reg32(tmp, a->vd);
+     } else {
+-        neon_load_reg32(tmp, a->vd);
++        vfp_load_reg32(tmp, a->vd);
+         gen_aa32_st16(s, tmp, addr, get_mem_index(s));
+     }
+     tcg_temp_free_i32(tmp);
+@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_sp(DisasContext *s, arg_VLDR_VSTR_sp *a)
+     tmp = tcg_temp_new_i32();
+     if (a->l) {
+         gen_aa32_ld32u(s, tmp, addr, get_mem_index(s));
+-        neon_store_reg32(tmp, a->vd);
++        vfp_store_reg32(tmp, a->vd);
+     } else {
+-        neon_load_reg32(tmp, a->vd);
++        vfp_load_reg32(tmp, a->vd);
+         gen_aa32_st32(s, tmp, addr, get_mem_index(s));
+     }
+     tcg_temp_free_i32(tmp);
+@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_sp(DisasContext *s, arg_VLDM_VSTM_sp *a)
+         if (a->l) {
+             /* load */
+             gen_aa32_ld32u(s, tmp, addr, get_mem_index(s));
+-            neon_store_reg32(tmp, a->vd + i);
++            vfp_store_reg32(tmp, a->vd + i);
+         } else {
+             /* store */
+-            neon_load_reg32(tmp, a->vd + i);
++            vfp_load_reg32(tmp, a->vd + i);
+             gen_aa32_st32(s, tmp, addr, get_mem_index(s));
+         }
+         tcg_gen_addi_i32(addr, addr, offset);
+@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
+     fd = tcg_temp_new_i32();
+     fpst = fpstatus_ptr(FPST_FPCR);
+-    neon_load_reg32(f0, vn);
+-    neon_load_reg32(f1, vm);
++    vfp_load_reg32(f0, vn);
++    vfp_load_reg32(f1, vm);
+     for (;;) {
+         if (reads_vd) {
+-            neon_load_reg32(fd, vd);
++            vfp_load_reg32(fd, vd);
+         }
+         fn(fd, f0, f1, fpst);
+-        neon_store_reg32(fd, vd);
++        vfp_store_reg32(fd, vd);
+         if (veclen == 0) {
+             break;
+@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
+         veclen--;
+         vd = vfp_advance_sreg(vd, delta_d);
+         vn = vfp_advance_sreg(vn, delta_d);
+-        neon_load_reg32(f0, vn);
++        vfp_load_reg32(f0, vn);
+         if (delta_m) {
+             vm = vfp_advance_sreg(vm, delta_m);
+-            neon_load_reg32(f1, vm);
++            vfp_load_reg32(f1, vm);
+         }
+     }
+@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_hp(DisasContext *s, VFPGen3OpSPFn *fn,
+     fd = tcg_temp_new_i32();
+     fpst = fpstatus_ptr(FPST_FPCR_F16);
+-    neon_load_reg32(f0, vn);
+-    neon_load_reg32(f1, vm);
++    vfp_load_reg32(f0, vn);
++    vfp_load_reg32(f1, vm);
+     if (reads_vd) {
+-        neon_load_reg32(fd, vd);
++        vfp_load_reg32(fd, vd);
+     }
+     fn(fd, f0, f1, fpst);
+-    neon_store_reg32(fd, vd);
++    vfp_store_reg32(fd, vd);
+     tcg_temp_free_i32(f0);
+     tcg_temp_free_i32(f1);
+@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
+     f0 = tcg_temp_new_i32();
+     fd = tcg_temp_new_i32();
+-    neon_load_reg32(f0, vm);
++    vfp_load_reg32(f0, vm);
+     for (;;) {
+         fn(fd, f0);
+-        neon_store_reg32(fd, vd);
++        vfp_store_reg32(fd, vd);
+         if (veclen == 0) {
+             break;
+@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
+             /* single source one-many */
+             while (veclen--) {
+                 vd = vfp_advance_sreg(vd, delta_d);
+-                neon_store_reg32(fd, vd);
++                vfp_store_reg32(fd, vd);
+             }
+             break;
+         }
+@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
+         veclen--;
+         vd = vfp_advance_sreg(vd, delta_d);
+         vm = vfp_advance_sreg(vm, delta_m);
+-        neon_load_reg32(f0, vm);
++        vfp_load_reg32(f0, vm);
+     }
+     tcg_temp_free_i32(f0);
+@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_hp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
+     }
+     f0 = tcg_temp_new_i32();
+-    neon_load_reg32(f0, vm);
++    vfp_load_reg32(f0, vm);
+     fn(f0, f0);
+-    neon_store_reg32(f0, vd);
++    vfp_store_reg32(f0, vd);
+     tcg_temp_free_i32(f0);
+     return true;
+@@ -XXX,XX +XXX,XX @@ static bool do_vfm_hp(DisasContext *s, arg_VFMA_sp *a, bool neg_n, bool neg_d)
+     vm = tcg_temp_new_i32();
+     vd = tcg_temp_new_i32();
+-    neon_load_reg32(vn, a->vn);
+-    neon_load_reg32(vm, a->vm);
++    vfp_load_reg32(vn, a->vn);
++    vfp_load_reg32(vm, a->vm);
+     if (neg_n) {
+         /* VFNMS, VFMS */
+         gen_helper_vfp_negh(vn, vn);
+     }
+-    neon_load_reg32(vd, a->vd);
++    vfp_load_reg32(vd, a->vd);
+     if (neg_d) {
+         /* VFNMA, VFNMS */
+         gen_helper_vfp_negh(vd, vd);
+     }
+     fpst = fpstatus_ptr(FPST_FPCR_F16);
+     gen_helper_vfp_muladdh(vd, vn, vm, vd, fpst);
+-    neon_store_reg32(vd, a->vd);
++    vfp_store_reg32(vd, a->vd);
+     tcg_temp_free_ptr(fpst);
+     tcg_temp_free_i32(vn);
+@@ -XXX,XX +XXX,XX @@ static bool do_vfm_sp(DisasContext *s, arg_VFMA_sp *a, bool neg_n, bool neg_d)
+     vm = tcg_temp_new_i32();
+     vd = tcg_temp_new_i32();
+-    neon_load_reg32(vn, a->vn);
+-    neon_load_reg32(vm, a->vm);
++    vfp_load_reg32(vn, a->vn);
++    vfp_load_reg32(vm, a->vm);
+     if (neg_n) {
+         /* VFNMS, VFMS */
+         gen_helper_vfp_negs(vn, vn);
+     }
+-    neon_load_reg32(vd, a->vd);
++    vfp_load_reg32(vd, a->vd);
+     if (neg_d) {
+         /* VFNMA, VFNMS */
+         gen_helper_vfp_negs(vd, vd);
+     }
+     fpst = fpstatus_ptr(FPST_FPCR);
+     gen_helper_vfp_muladds(vd, vn, vm, vd, fpst);
+-    neon_store_reg32(vd, a->vd);
++    vfp_store_reg32(vd, a->vd);
+     tcg_temp_free_ptr(fpst);
+     tcg_temp_free_i32(vn);
+@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_hp(DisasContext *s, arg_VMOV_imm_sp *a)
+     }
+     fd = tcg_const_i32(vfp_expand_imm(MO_16, a->imm));
+-    neon_store_reg32(fd, a->vd);
++    vfp_store_reg32(fd, a->vd);
+     tcg_temp_free_i32(fd);
+     return true;
+ }
+@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_sp(DisasContext *s, arg_VMOV_imm_sp *a)
+     fd = tcg_const_i32(vfp_expand_imm(MO_32, a->imm));
+     for (;;) {
+-        neon_store_reg32(fd, vd);
++        vfp_store_reg32(fd, vd);
+         if (veclen == 0) {
+             break;
+@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_hp(DisasContext *s, arg_VCMP_sp *a)
+     vd = tcg_temp_new_i32();
+     vm = tcg_temp_new_i32();
+-    neon_load_reg32(vd, a->vd);
++    vfp_load_reg32(vd, a->vd);
+     if (a->z) {
+         tcg_gen_movi_i32(vm, 0);
+     } else {
+-        neon_load_reg32(vm, a->vm);
++        vfp_load_reg32(vm, a->vm);
+     }
+     if (a->e) {
+@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_sp(DisasContext *s, arg_VCMP_sp *a)
+     vd = tcg_temp_new_i32();
+     vm = tcg_temp_new_i32();
+-    neon_load_reg32(vd, a->vd);
++    vfp_load_reg32(vd, a->vd);
+     if (a->z) {
+         tcg_gen_movi_i32(vm, 0);
+     } else {
+-        neon_load_reg32(vm, a->vm);
++        vfp_load_reg32(vm, a->vm);
+     }
+     if (a->e) {
+@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f32_f16(DisasContext *s, arg_VCVT_f32_f16 *a)
+     /* The T bit tells us if we want the low or high 16 bits of Vm */
+     tcg_gen_ld16u_i32(tmp, cpu_env, vfp_f16_offset(a->vm, a->t));
+     gen_helper_vfp_fcvt_f16_to_f32(tmp, tmp, fpst, ahp_mode);
+-    neon_store_reg32(tmp, a->vd);
++    vfp_store_reg32(tmp, a->vd);
+     tcg_temp_free_i32(ahp_mode);
+     tcg_temp_free_ptr(fpst);
+     tcg_temp_free_i32(tmp);
+@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f16_f32(DisasContext *s, arg_VCVT_f16_f32 *a)
+     ahp_mode = get_ahp_flag();
+     tmp = tcg_temp_new_i32();
+-    neon_load_reg32(tmp, a->vm);
++    vfp_load_reg32(tmp, a->vm);
+     gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp_mode);
+     tcg_gen_st16_i32(tmp, cpu_env, vfp_f16_offset(a->vd, a->t));
+     tcg_temp_free_i32(ahp_mode);
+@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_hp(DisasContext *s, arg_VRINTR_sp *a)
+     }
+     tmp = tcg_temp_new_i32();
+-    neon_load_reg32(tmp, a->vm);
++    vfp_load_reg32(tmp, a->vm);
+     fpst = fpstatus_ptr(FPST_FPCR_F16);
+     gen_helper_rinth(tmp, tmp, fpst);
+-    neon_store_reg32(tmp, a->vd);
++    vfp_store_reg32(tmp, a->vd);
+     tcg_temp_free_ptr(fpst);
+     tcg_temp_free_i32(tmp);
+     return true;
+@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_sp(DisasContext *s, arg_VRINTR_sp *a)
+     }
+     tmp = tcg_temp_new_i32();
+-    neon_load_reg32(tmp, a->vm);
++    vfp_load_reg32(tmp, a->vm);
+     fpst = fpstatus_ptr(FPST_FPCR);
+     gen_helper_rints(tmp, tmp, fpst);
+-    neon_store_reg32(tmp, a->vd);
++    vfp_store_reg32(tmp, a->vd);
+     tcg_temp_free_ptr(fpst);
+     tcg_temp_free_i32(tmp);
+     return true;
+@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_hp(DisasContext *s, arg_VRINTZ_sp *a)
+     }
+     tmp = tcg_temp_new_i32();
+-    neon_load_reg32(tmp, a->vm);
++    vfp_load_reg32(tmp, a->vm);
+     fpst = fpstatus_ptr(FPST_FPCR_F16);
+     tcg_rmode = tcg_const_i32(float_round_to_zero);
+     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
+     gen_helper_rinth(tmp, tmp, fpst);
+     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
+-    neon_store_reg32(tmp, a->vd);
++    vfp_store_reg32(tmp, a->vd);
+     tcg_temp_free_ptr(fpst);
+     tcg_temp_free_i32(tcg_rmode);
+     tcg_temp_free_i32(tmp);
+@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_sp(DisasContext *s, arg_VRINTZ_sp *a)
+     }
+     tmp = tcg_temp_new_i32();
+-    neon_load_reg32(tmp, a->vm);
++    vfp_load_reg32(tmp, a->vm);
+     fpst = fpstatus_ptr(FPST_FPCR);
+     tcg_rmode = tcg_const_i32(float_round_to_zero);
+     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
+     gen_helper_rints(tmp, tmp, fpst);
+     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
+-    neon_store_reg32(tmp, a->vd);
++    vfp_store_reg32(tmp, a->vd);
+     tcg_temp_free_ptr(fpst);
+     tcg_temp_free_i32(tcg_rmode);
+     tcg_temp_free_i32(tmp);
+@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_hp(DisasContext *s, arg_VRINTX_sp *a)
+     }
+     tmp = tcg_temp_new_i32();
+-    neon_load_reg32(tmp, a->vm);
++    vfp_load_reg32(tmp, a->vm);
+     fpst = fpstatus_ptr(FPST_FPCR_F16);
+     gen_helper_rinth_exact(tmp, tmp, fpst);
+-    neon_store_reg32(tmp, a->vd);
++    vfp_store_reg32(tmp, a->vd);
+     tcg_temp_free_ptr(fpst);
+     tcg_temp_free_i32(tmp);
+     return true;
+@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_sp(DisasContext *s, arg_VRINTX_sp *a)
+     }
+     tmp = tcg_temp_new_i32();
+-    neon_load_reg32(tmp, a->vm);
++    vfp_load_reg32(tmp, a->vm);
+     fpst = fpstatus_ptr(FPST_FPCR);
+     gen_helper_rints_exact(tmp, tmp, fpst);
+-    neon_store_reg32(tmp, a->vd);
++    vfp_store_reg32(tmp, a->vd);
+     tcg_temp_free_ptr(fpst);
+     tcg_temp_free_i32(tmp);
+     return true;
+@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp(DisasContext *s, arg_VCVT_sp *a)
+     vm = tcg_temp_new_i32();
+     vd = tcg_temp_new_i64();
+-    neon_load_reg32(vm, a->vm);
++    vfp_load_reg32(vm, a->vm);
+     gen_helper_vfp_fcvtds(vd, vm, cpu_env);
+     neon_store_reg64(vd, a->vd);
+     tcg_temp_free_i32(vm);
+@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp(DisasContext *s, arg_VCVT_dp *a)
+     vm = tcg_temp_new_i64();
+     neon_load_reg64(vm, a->vm);
+     gen_helper_vfp_fcvtsd(vd, vm, cpu_env);
+-    neon_store_reg32(vd, a->vd);
++    vfp_store_reg32(vd, a->vd);
+     tcg_temp_free_i32(vd);
+     tcg_temp_free_i64(vm);
+     return true;
+@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_hp(DisasContext *s, arg_VCVT_int_sp *a)
+     }
+     vm = tcg_temp_new_i32();
+-    neon_load_reg32(vm, a->vm);
++    vfp_load_reg32(vm, a->vm);
+     fpst = fpstatus_ptr(FPST_FPCR_F16);
+     if (a->s) {
+         /* i32 -> f16 */
+@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_hp(DisasContext *s, arg_VCVT_int_sp *a)
+         /* u32 -> f16 */
+         gen_helper_vfp_uitoh(vm, vm, fpst);
+     }
+-    neon_store_reg32(vm, a->vd);
++    vfp_store_reg32(vm, a->vd);
+     tcg_temp_free_i32(vm);
+     tcg_temp_free_ptr(fpst);
+     return true;
+@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_sp(DisasContext *s, arg_VCVT_int_sp *a)
+     }
+     vm = tcg_temp_new_i32();
+-    neon_load_reg32(vm, a->vm);
++    vfp_load_reg32(vm, a->vm);
+     fpst = fpstatus_ptr(FPST_FPCR);
+     if (a->s) {
+         /* i32 -> f32 */
+@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_sp(DisasContext *s, arg_VCVT_int_sp *a)
+         /* u32 -> f32 */
+         gen_helper_vfp_uitos(vm, vm, fpst);
+     }
+-    neon_store_reg32(vm, a->vd);
++    vfp_store_reg32(vm, a->vd);
+     tcg_temp_free_i32(vm);
+     tcg_temp_free_ptr(fpst);
+     return true;
+@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_dp(DisasContext *s, arg_VCVT_int_dp *a)
+     vm = tcg_temp_new_i32();
+     vd = tcg_temp_new_i64();
+-    neon_load_reg32(vm, a->vm);
++    vfp_load_reg32(vm, a->vm);
+     fpst = fpstatus_ptr(FPST_FPCR);
+     if (a->s) {
+         /* i32 -> f64 */
+@@ -XXX,XX +XXX,XX @@ static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
+     vd = tcg_temp_new_i32();
+     neon_load_reg64(vm, a->vm);
+     gen_helper_vjcvt(vd, vm, cpu_env);
+-    neon_store_reg32(vd, a->vd);
++    vfp_store_reg32(vd, a->vd);
+     tcg_temp_free_i64(vm);
+     tcg_temp_free_i32(vd);
+     return true;
+@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_hp(DisasContext *s, arg_VCVT_fix_sp *a)
+     frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
+     vd = tcg_temp_new_i32();
+-    neon_load_reg32(vd, a->vd);
++    vfp_load_reg32(vd, a->vd);
+     fpst = fpstatus_ptr(FPST_FPCR_F16);
+     shift = tcg_const_i32(frac_bits);
+@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_hp(DisasContext *s, arg_VCVT_fix_sp *a)
+         g_assert_not_reached();
+     }
+-    neon_store_reg32(vd, a->vd);
++    vfp_store_reg32(vd, a->vd);
+     tcg_temp_free_i32(vd);
+     tcg_temp_free_i32(shift);
+     tcg_temp_free_ptr(fpst);
+@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_sp(DisasContext *s, arg_VCVT_fix_sp *a)
+     frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
+     vd = tcg_temp_new_i32();
+-    neon_load_reg32(vd, a->vd);
++    vfp_load_reg32(vd, a->vd);
+     fpst = fpstatus_ptr(FPST_FPCR);
+     shift = tcg_const_i32(frac_bits);
+@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_sp(DisasContext *s, arg_VCVT_fix_sp *a)
+         g_assert_not_reached();
+     }
+-    neon_store_reg32(vd, a->vd);
++    vfp_store_reg32(vd, a->vd);
+     tcg_temp_free_i32(vd);
+     tcg_temp_free_i32(shift);
+     tcg_temp_free_ptr(fpst);
+@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_hp_int(DisasContext *s, arg_VCVT_sp_int *a)
+     fpst = fpstatus_ptr(FPST_FPCR_F16);
+     vm = tcg_temp_new_i32();
+-    neon_load_reg32(vm, a->vm);
++    vfp_load_reg32(vm, a->vm);
+     if (a->s) {
+         if (a->rz) {
+@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_hp_int(DisasContext *s, arg_VCVT_sp_int *a)
+             gen_helper_vfp_touih(vm, vm, fpst);
+         }
+     }
+-    neon_store_reg32(vm, a->vd);
++    vfp_store_reg32(vm, a->vd);
+     tcg_temp_free_i32(vm);
+     tcg_temp_free_ptr(fpst);
+     return true;
+@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp_int(DisasContext *s, arg_VCVT_sp_int *a)
+     fpst = fpstatus_ptr(FPST_FPCR);
+     vm = tcg_temp_new_i32();
+-    neon_load_reg32(vm, a->vm);
++    vfp_load_reg32(vm, a->vm);
+     if (a->s) {
+         if (a->rz) {
+@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp_int(DisasContext *s, arg_VCVT_sp_int *a)
+             gen_helper_vfp_touis(vm, vm, fpst);
+         }
+     }
+-    neon_store_reg32(vm, a->vd);
++    vfp_store_reg32(vm, a->vd);
+     tcg_temp_free_i32(vm);
+     tcg_temp_free_ptr(fpst);
+     return true;
+@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp_int(DisasContext *s, arg_VCVT_dp_int *a)
+             gen_helper_vfp_touid(vd, vm, fpst);
+         }
+     }
+-    neon_store_reg32(vd, a->vd);
++    vfp_store_reg32(vd, a->vd);
+     tcg_temp_free_i32(vd);
+     tcg_temp_free_i64(vm);
+     tcg_temp_free_ptr(fpst);
+@@ -XXX,XX +XXX,XX @@ static bool trans_VINS(DisasContext *s, arg_VINS *a)
+     /* Insert low half of Vm into high half of Vd */
+     rm = tcg_temp_new_i32();
+     rd = tcg_temp_new_i32();
+-    neon_load_reg32(rm, a->vm);
+-    neon_load_reg32(rd, a->vd);
++    vfp_load_reg32(rm, a->vm);
++    vfp_load_reg32(rd, a->vd);
+     tcg_gen_deposit_i32(rd, rd, rm, 16, 16);
+-    neon_store_reg32(rd, a->vd);
++    vfp_store_reg32(rd, a->vd);
+     tcg_temp_free_i32(rm);
+     tcg_temp_free_i32(rd);
+     return true;
+@@ -XXX,XX +XXX,XX @@ static bool trans_VMOVX(DisasContext *s, arg_VINS *a)
+     /* Set Vd to high half of Vm */
+     rm = tcg_temp_new_i32();
+-    neon_load_reg32(rm, a->vm);
++    vfp_load_reg32(rm, a->vm);
+     tcg_gen_shri_i32(rm, rm, 16);
+-    neon_store_reg32(rm, a->vd);
++    vfp_store_reg32(rm, a->vd);
+     tcg_temp_free_i32(rm);
+     return true;
+ }
 --
 .20.1

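In the VINS and VMOVX translations above, each 32-bit S register holds two half-precision values. VINS uses tcg_gen_deposit_i32(rd, rd, rm, 16, 16) to place the low half of Vm in the high half of Vd. VMOVX shifts Vm right by 16, so the high half lands in the low half of Vd. The plain-C sketch below models only that bit manipulation. deposit32, model_vins and model_vmovx are hypothetical names, not QEMU functions, and registers are assumed to be plain uint32_t values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 32-bit deposit: replace bits [ofs, ofs + len)
 * of base with the low len bits of val. Assumes 0 < len < 32. */
static uint32_t deposit32(uint32_t base, uint32_t val, unsigned ofs, unsigned len)
{
    assert(len > 0 && len < 32 && ofs + len <= 32);
    uint32_t mask = ((UINT32_C(1) << len) - 1) << ofs;
    return (base & ~mask) | ((val << ofs) & mask);
}

/* VINS: insert the low half of Vm into the high half of Vd;
 * the low half of Vd is preserved. */
static uint32_t model_vins(uint32_t vd, uint32_t vm)
{
    return deposit32(vd, vm, 16, 16);
}

/* VMOVX: set Vd to the high half of Vm; the upper 16 bits become zero. */
static uint32_t model_vmovx(uint32_t vm)
{
    return vm >> 16;
}
```

For example, model_vins(0x1111aaaa, 0x2222bbbb) yields 0xbbbbaaaa, and model_vmovx(0x2222bbbb) yields 0x00002222.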
-[PULL 11/23] target/arm: Convert Neon 2-reg-scalar float multiplies to decodetree
+[PULL 08/26] target/arm: Add read/write_neon_element64
-Convert the float versions of VMLA, VMLS and VMUL in the Neon
+From: Richard Henderson <richard.henderson@linaro.org>
--reg-scalar group to decodetree.
+Replace all uses of neon_load/store_reg64 within translate-neon.c.inc.
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20201030022618.785675-9-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
-As noted in the comment on the WRAP_FP_FN macro, we could have
+ target/arm/translate.c          | 26 +++++++++
-had a do_2scalar_fp() function, but for 3 insns it seemed
+ target/arm/translate-neon.c.inc | 94 ++++++++++++++++-----------------
-simpler to just do the wrapping to get hold of the fpstatus ptr.
+files changed, 73 insertions(+), 47 deletions(-)
-(These are the only fp insns in the group.)
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
  target/arm/neon-dp.decode       |  3 ++
  target/arm/translate-neon.inc.c | 65 +++++++++++++++++++++++++++++++++
  target/arm/translate.c          | 37 ++-----------------
 files changed, 71 insertions(+), 34 deletions(-)
 diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/neon-dp.decode
 +++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
                   &2scalar vm=%vm_dp vn=%vn_dp vd=%vd_dp
      VMLA_2sc     1111 001 . 1 . .. .... .... 0000 . 1 . 0 .... @2scalar
 +    VMLA_F_2sc   1111 001 . 1 . .. .... .... 0001 . 1 . 0 .... @2scalar
      VMLS_2sc     1111 001 . 1 . .. .... .... 0100 . 1 . 0 .... @2scalar
 +    VMLS_F_2sc   1111 001 . 1 . .. .... .... 0101 . 1 . 0 .... @2scalar
      VMUL_2sc     1111 001 . 1 . .. .... .... 1000 . 1 . 0 .... @2scalar
 +    VMUL_F_2sc   1111 001 . 1 . .. .... .... 1001 . 1 . 0 .... @2scalar
    ]
  }
 diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.inc.c
 +++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VMLS_2sc(DisasContext *s, arg_2scalar *a)
      return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
  }
 +
 +/*
 + * Rather than have a float-specific version of do_2scalar just for
 + * three insns, we wrap a NeonGenTwoSingleOpFn to turn it into
 + * a NeonGenTwoOpFn.
 + */
 +#define WRAP_FP_FN(WRAPNAME, FUNC)                              \
 +    static void WRAPNAME(TCGv_i32 rd, TCGv_i32 rn, TCGv_i32 rm) \
 +    {                                                           \
 +        TCGv_ptr fpstatus = get_fpstatus_ptr(1);                \
 +        FUNC(rd, rn, rm, fpstatus);                             \
 +        tcg_temp_free_ptr(fpstatus);                            \
 +    }
 +
 +WRAP_FP_FN(gen_VMUL_F_mul, gen_helper_vfp_muls)
 +WRAP_FP_FN(gen_VMUL_F_add, gen_helper_vfp_adds)
 +WRAP_FP_FN(gen_VMUL_F_sub, gen_helper_vfp_subs)
 +
 +static bool trans_VMUL_F_2sc(DisasContext *s, arg_2scalar *a)
 +{
 +    static NeonGenTwoOpFn * const opfn[] = {
 +        NULL,
 +        NULL, /* TODO: fp16 support */
 +        gen_VMUL_F_mul,
 +        NULL,
 +    };
 +
 +    return do_2scalar(s, a, opfn[a->size], NULL);
 +}
 +
 +static bool trans_VMLA_F_2sc(DisasContext *s, arg_2scalar *a)
 +{
 +    static NeonGenTwoOpFn * const opfn[] = {
 +        NULL,
 +        NULL, /* TODO: fp16 support */
 +        gen_VMUL_F_mul,
 +        NULL,
 +    };
 +    static NeonGenTwoOpFn * const accfn[] = {
 +        NULL,
 +        NULL, /* TODO: fp16 support */
 +        gen_VMUL_F_add,
 +        NULL,
 +    };
 +
 +    return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
 +}
 +
 +static bool trans_VMLS_F_2sc(DisasContext *s, arg_2scalar *a)
 +{
 +    static NeonGenTwoOpFn * const opfn[] = {
 +        NULL,
 +        NULL, /* TODO: fp16 support */
 +        gen_VMUL_F_mul,
 +        NULL,
 +    };
 +    static NeonGenTwoOpFn * const accfn[] = {
 +        NULL,
 +        NULL, /* TODO: fp16 support */
 +        gen_VMUL_F_sub,
 +        NULL,
 +    };
 +
 +    return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
 +}
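The WRAP_FP_FN macro in the old-series patch adapts a helper that needs an extra fpstatus pointer into the three-operand NeonGenTwoOpFn shape that do_2scalar accepts. This lets the float VMLA, VMLS and VMUL scalar forms reuse the integer path. The sketch below shows the same adapter pattern in plain C under stated assumptions. Scalar float values stand in for TCG temporaries, and a static FakeFPStatus stands in for the pointer returned by get_fpstatus_ptr(1). Every name in the block is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the fp status QEMU passes as a TCGv_ptr. */
typedef struct { int rounding_mode; } FakeFPStatus;

/* Shape expected by the generic caller (plays the role of NeonGenTwoOpFn). */
typedef float TwoOpFn(float n, float m);

/* Helpers that need the extra status argument (like gen_helper_vfp_muls). */
static float fp_mul(float n, float m, FakeFPStatus *st) { (void)st; return n * m; }
static float fp_add(float n, float m, FakeFPStatus *st) { (void)st; return n + m; }

static FakeFPStatus standard_fpstatus;

/* Generate a two-operand wrapper that supplies the status pointer itself. */
#define WRAP_FP_FN(WRAPNAME, FUNC)                      \
    static float WRAPNAME(float n, float m)             \
    {                                                   \
        return FUNC(n, m, &standard_fpstatus);          \
    }

WRAP_FP_FN(wrap_mul, fp_mul)
WRAP_FP_FN(wrap_add, fp_add)

/* Generic routine that only knows about TwoOpFn: compute opfn(rn, scalar),
 * then, if accfn is non-NULL, accumulate as accfn(rd, product). */
static float do_scalar_op(float rd, float rn, float scalar,
                          TwoOpFn *opfn, TwoOpFn *accfn)
{
    float tmp = opfn(rn, scalar);
    return accfn ? accfn(rd, tmp) : tmp;
}
```

With these definitions, do_scalar_op(1.0f, 2.0f, 3.0f, wrap_mul, NULL) returns 6.0f (VMUL-like), and passing wrap_add as the accumulator returns 7.0f (VMLA-like). This mirrors the trade-off stated in the patch comment: for three instructions, a small wrapper is simpler than a float-specific copy of the generic routine.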
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp memop)
-                 case 0: /* Integer VMLA scalar */
+     }
-                 case 4: /* Integer VMLS scalar */
+ }
-                 case 8: /* Integer VMUL scalar */
--                    return 1; /* handled by decodetree */
++static void read_neon_element64(TCGv_i64 dest, int reg, int ele, MemOp memop)
--
++{
-                 case 1: /* Float VMLA scalar */
++    long off = neon_element_offset(reg, ele, memop);
                  case 5: /* Floating point VMLS scalar */
                  case 9: /* Floating point VMUL scalar */
 -                    if (size == 1) {
 -                        return 1;
 -                    }
 -                    /* fall through */
 +                    return 1; /* handled by decodetree */
 +
-                 case 12: /* VQDMULH scalar */
++    switch (memop) {
-                 case 13: /* VQRDMULH scalar */
++    case MO_Q:
-                     if (u && ((rd | rn) & 1)) {
++        tcg_gen_ld_i64(dest, cpu_env, off);
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
++        break;
-                             } else {
++    default:
-                                 gen_helper_neon_qdmulh_s32(tmp, cpu_env, tmp, tmp2);
++        g_assert_not_reached();
-                             }
++    }
--                        } else if (op == 13) {
++}
-+                        } else {
++
-                             if (size == 1) {
+ static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
-                                 gen_helper_neon_qrdmulh_s16(tmp, cpu_env, tmp, tmp2);
+ {
-                             } else {
+     long off = neon_element_offset(reg, ele, memop);
-                                 gen_helper_neon_qrdmulh_s32(tmp, cpu_env, tmp, tmp2);
+@@ -XXX,XX +XXX,XX @@ static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
-                             }
+     }
--                        } else {
+ }
--                            TCGv_ptr fpstatus = get_fpstatus_ptr(1);
--                            gen_helper_vfp_muls(tmp, tmp, tmp2, fpstatus);
++static void write_neon_element64(TCGv_i64 src, int reg, int ele, MemOp memop)
--                            tcg_temp_free_ptr(fpstatus);
++{
-                         }
++    long off = neon_element_offset(reg, ele, memop);
-                         tcg_temp_free_i32(tmp2);
++
--                        if (op < 8) {
++    switch (memop) {
--                            /* Accumulate.  */
++    case MO_64:
--                            tmp2 = neon_load_reg(rd, pass);
++        tcg_gen_st_i64(src, cpu_env, off);
--                            switch (op) {
++        break;
--                            case 1:
++    default:
--                            {
++        g_assert_not_reached();
--                                TCGv_ptr fpstatus = get_fpstatus_ptr(1);
++    }
--                                gen_helper_vfp_adds(tmp, tmp, tmp2, fpstatus);
++}
--                                tcg_temp_free_ptr(fpstatus);
++
--                                break;
+ static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
--                            }
+ {
--                            case 5:
+     TCGv_ptr ret = tcg_temp_new_ptr();
--                            {
+diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
--                                TCGv_ptr fpstatus = get_fpstatus_ptr(1);
+index XXXXXXX..XXXXXXX 100644
--                                gen_helper_vfp_subs(tmp, tmp2, tmp, fpstatus);
+--- a/target/arm/translate-neon.c.inc
--                                tcg_temp_free_ptr(fpstatus);
++++ b/target/arm/translate-neon.c.inc
--                                break;
+@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_64(DisasContext *s, arg_2reg_shift *a,
--                            }
+     for (pass = 0; pass < a->q + 1; pass++) {
--                            default:
+         TCGv_i64 tmp = tcg_temp_new_i64();
--                                abort();
--                            }
+-        neon_load_reg64(tmp, a->vm + pass);
--                            tcg_temp_free_i32(tmp2);
++        read_neon_element64(tmp, a->vm, pass, MO_64);
--                        }
+         fn(tmp, cpu_env, tmp, constimm);
-                         neon_store_reg(rd, pass, tmp);
+-        neon_store_reg64(tmp, a->vd + pass);
-                     }
++        write_neon_element64(tmp, a->vd, pass, MO_64);
-                     break;
+         tcg_temp_free_i64(tmp);
      }
      tcg_temp_free_i64(constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_64(DisasContext *s, arg_2reg_shift *a,
      rd = tcg_temp_new_i32();
      /* Load both inputs first to avoid potential overwrite if rm == rd */
 -    neon_load_reg64(rm1, a->vm);
 -    neon_load_reg64(rm2, a->vm + 1);
 +    read_neon_element64(rm1, a->vm, 0, MO_64);
 +    read_neon_element64(rm2, a->vm, 1, MO_64);
      shiftfn(rm1, rm1, constimm);
      narrowfn(rd, cpu_env, rm1);
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
          tcg_gen_shli_i64(tmp, tmp, a->shift);
          tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
      }
 -    neon_store_reg64(tmp, a->vd);
 +    write_neon_element64(tmp, a->vd, 0, MO_64);
      widenfn(tmp, rm1);
      tcg_temp_free_i32(rm1);
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
          tcg_gen_shli_i64(tmp, tmp, a->shift);
          tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
      }
 -    neon_store_reg64(tmp, a->vd + 1);
 +    write_neon_element64(tmp, a->vd, 1, MO_64);
      tcg_temp_free_i64(tmp);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
      rm_64 = tcg_temp_new_i64();
      if (src1_wide) {
 -        neon_load_reg64(rn0_64, a->vn);
 +        read_neon_element64(rn0_64, a->vn, 0, MO_64);
      } else {
          TCGv_i32 tmp = tcg_temp_new_i32();
          read_neon_element32(tmp, a->vn, 0, MO_32);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
       * avoid incorrect results if a narrow input overlaps with the result.
       */
      if (src1_wide) {
 -        neon_load_reg64(rn1_64, a->vn + 1);
 +        read_neon_element64(rn1_64, a->vn, 1, MO_64);
      } else {
          TCGv_i32 tmp = tcg_temp_new_i32();
          read_neon_element32(tmp, a->vn, 1, MO_32);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
      rm = tcg_temp_new_i32();
      read_neon_element32(rm, a->vm, 1, MO_32);
 -    neon_store_reg64(rn0_64, a->vd);
 +    write_neon_element64(rn0_64, a->vd, 0, MO_64);
      widenfn(rm_64, rm);
      tcg_temp_free_i32(rm);
      opfn(rn1_64, rn1_64, rm_64);
 -    neon_store_reg64(rn1_64, a->vd + 1);
 +    write_neon_element64(rn1_64, a->vd, 1, MO_64);
      tcg_temp_free_i64(rn0_64);
      tcg_temp_free_i64(rn1_64);
@@ -XXX,XX +XXX,XX @@ static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
      rd0 = tcg_temp_new_i32();
      rd1 = tcg_temp_new_i32();
 -    neon_load_reg64(rn_64, a->vn);
 -    neon_load_reg64(rm_64, a->vm);
 +    read_neon_element64(rn_64, a->vn, 0, MO_64);
 +    read_neon_element64(rm_64, a->vm, 0, MO_64);
      opfn(rn_64, rn_64, rm_64);
      narrowfn(rd0, rn_64);
 -    neon_load_reg64(rn_64, a->vn + 1);
 -    neon_load_reg64(rm_64, a->vm + 1);
 +    read_neon_element64(rn_64, a->vn, 1, MO_64);
 +    read_neon_element64(rm_64, a->vm, 1, MO_64);
      opfn(rn_64, rn_64, rm_64);
@@ -XXX,XX +XXX,XX @@ static bool do_long_3d(DisasContext *s, arg_3diff *a,
      /* Don't store results until after all loads: they might overlap */
      if (accfn) {
          tmp = tcg_temp_new_i64();
 -        neon_load_reg64(tmp, a->vd);
 +        read_neon_element64(tmp, a->vd, 0, MO_64);
          accfn(tmp, tmp, rd0);
 -        neon_store_reg64(tmp, a->vd);
 -        neon_load_reg64(tmp, a->vd + 1);
 +        write_neon_element64(tmp, a->vd, 0, MO_64);
 +        read_neon_element64(tmp, a->vd, 1, MO_64);
          accfn(tmp, tmp, rd1);
 -        neon_store_reg64(tmp, a->vd + 1);
 +        write_neon_element64(tmp, a->vd, 1, MO_64);
          tcg_temp_free_i64(tmp);
      } else {
 -        neon_store_reg64(rd0, a->vd);
 -        neon_store_reg64(rd1, a->vd + 1);
 +        write_neon_element64(rd0, a->vd, 0, MO_64);
 +        write_neon_element64(rd1, a->vd, 1, MO_64);
      }
      tcg_temp_free_i64(rd0);
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
      if (accfn) {
          TCGv_i64 t64 = tcg_temp_new_i64();
 -        neon_load_reg64(t64, a->vd);
 +        read_neon_element64(t64, a->vd, 0, MO_64);
          accfn(t64, t64, rn0_64);
 -        neon_store_reg64(t64, a->vd);
 -        neon_load_reg64(t64, a->vd + 1);
 +        write_neon_element64(t64, a->vd, 0, MO_64);
 +        read_neon_element64(t64, a->vd, 1, MO_64);
          accfn(t64, t64, rn1_64);
 -        neon_store_reg64(t64, a->vd + 1);
 +        write_neon_element64(t64, a->vd, 1, MO_64);
          tcg_temp_free_i64(t64);
      } else {
 -        neon_store_reg64(rn0_64, a->vd);
 -        neon_store_reg64(rn1_64, a->vd + 1);
 +        write_neon_element64(rn0_64, a->vd, 0, MO_64);
 +        write_neon_element64(rn1_64, a->vd, 1, MO_64);
      }
      tcg_temp_free_i64(rn0_64);
      tcg_temp_free_i64(rn1_64);
@@ -XXX,XX +XXX,XX @@ static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
          right = tcg_temp_new_i64();
          dest = tcg_temp_new_i64();
 -        neon_load_reg64(right, a->vn);
 -        neon_load_reg64(left, a->vm);
 +        read_neon_element64(right, a->vn, 0, MO_64);
 +        read_neon_element64(left, a->vm, 0, MO_64);
          tcg_gen_extract2_i64(dest, right, left, a->imm * 8);
 -        neon_store_reg64(dest, a->vd);
 +        write_neon_element64(dest, a->vd, 0, MO_64);
          tcg_temp_free_i64(left);
          tcg_temp_free_i64(right);
@@ -XXX,XX +XXX,XX @@ static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
          destright = tcg_temp_new_i64();
          if (a->imm < 8) {
 -            neon_load_reg64(right, a->vn);
 -            neon_load_reg64(middle, a->vn + 1);
 +            read_neon_element64(right, a->vn, 0, MO_64);
 +            read_neon_element64(middle, a->vn, 1, MO_64);
              tcg_gen_extract2_i64(destright, right, middle, a->imm * 8);
 -            neon_load_reg64(left, a->vm);
 +            read_neon_element64(left, a->vm, 0, MO_64);
              tcg_gen_extract2_i64(destleft, middle, left, a->imm * 8);
          } else {
 -            neon_load_reg64(right, a->vn + 1);
 -            neon_load_reg64(middle, a->vm);
 +            read_neon_element64(right, a->vn, 1, MO_64);
 +            read_neon_element64(middle, a->vm, 0, MO_64);
              tcg_gen_extract2_i64(destright, right, middle, (a->imm - 8) * 8);
 -            neon_load_reg64(left, a->vm + 1);
 +            read_neon_element64(left, a->vm, 1, MO_64);
              tcg_gen_extract2_i64(destleft, middle, left, (a->imm - 8) * 8);
          }
 -        neon_store_reg64(destright, a->vd);
 -        neon_store_reg64(destleft, a->vd + 1);
 +        write_neon_element64(destright, a->vd, 0, MO_64);
 +        write_neon_element64(destleft, a->vd, 1, MO_64);
          tcg_temp_free_i64(destright);
          tcg_temp_free_i64(destleft);
@@ -XXX,XX +XXX,XX @@ static bool do_2misc_pairwise(DisasContext *s, arg_2misc *a,
          if (accfn) {
              TCGv_i64 tmp64 = tcg_temp_new_i64();
 -            neon_load_reg64(tmp64, a->vd + pass);
 +            read_neon_element64(tmp64, a->vd, pass, MO_64);
              accfn(rd_64, tmp64, rd_64);
              tcg_temp_free_i64(tmp64);
          }
 -        neon_store_reg64(rd_64, a->vd + pass);
 +        write_neon_element64(rd_64, a->vd, pass, MO_64);
          tcg_temp_free_i64(rd_64);
      }
      return true;
@@ -XXX,XX +XXX,XX @@ static bool do_vmovn(DisasContext *s, arg_2misc *a,
      rd0 = tcg_temp_new_i32();
      rd1 = tcg_temp_new_i32();
 -    neon_load_reg64(rm, a->vm);
 +    read_neon_element64(rm, a->vm, 0, MO_64);
      narrowfn(rd0, cpu_env, rm);
 -    neon_load_reg64(rm, a->vm + 1);
 +    read_neon_element64(rm, a->vm, 1, MO_64);
      narrowfn(rd1, cpu_env, rm);
      write_neon_element32(rd0, a->vd, 0, MO_32);
      write_neon_element32(rd1, a->vd, 1, MO_32);
@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL(DisasContext *s, arg_2misc *a)
      widenfn(rd, rm0);
      tcg_gen_shli_i64(rd, rd, 8 << a->size);
 -    neon_store_reg64(rd, a->vd);
 +    write_neon_element64(rd, a->vd, 0, MO_64);
      widenfn(rd, rm1);
      tcg_gen_shli_i64(rd, rd, 8 << a->size);
 -    neon_store_reg64(rd, a->vd + 1);
 +    write_neon_element64(rd, a->vd, 1, MO_64);
      tcg_temp_free_i64(rd);
      tcg_temp_free_i32(rm0);
@@ -XXX,XX +XXX,XX @@ static bool trans_VSWP(DisasContext *s, arg_2misc *a)
      rm = tcg_temp_new_i64();
      rd = tcg_temp_new_i64();
      for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
 -        neon_load_reg64(rm, a->vm + pass);
 -        neon_load_reg64(rd, a->vd + pass);
 -        neon_store_reg64(rm, a->vd + pass);
 -        neon_store_reg64(rd, a->vm + pass);
 +        read_neon_element64(rm, a->vm, pass, MO_64);
 +        read_neon_element64(rd, a->vd, pass, MO_64);
 +        write_neon_element64(rm, a->vd, pass, MO_64);
 +        write_neon_element64(rd, a->vm, pass, MO_64);
      }
      tcg_temp_free_i64(rm);
      tcg_temp_free_i64(rd);
 --
 .20.1

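Patch 08/26 rewrites each neon_load_reg64(tmp, a->vm + pass) as read_neon_element64(tmp, a->vm, pass, MO_64). Instead of naming D register vm + pass directly, the caller names base register vm and 64-bit element pass. The toy model below illustrates when the two forms agree. It assumes a layout like QEMU's, where D register d is lane (d & 1) of vector register (d >> 1) and a vector register may be wider than 128 bits. The array sizes and function names are hypothetical, and the real neon_element_offset() also handles smaller element sizes and host endianness.

```c
#include <assert.h>
#include <stdint.h>

/* Toy register file: each vector register holds more than two 64-bit
 * lanes (as SVE-sized registers would), so vector registers are not
 * packed back to back at D-register granularity. Sizes are hypothetical. */
#define NUM_VREGS 16
#define LANES_PER_VREG 4

typedef struct {
    uint64_t lanes[NUM_VREGS][LANES_PER_VREG];
} ToyRegFile;

/* Old style: address a whole D register by its number. */
static uint64_t *dreg_ptr(ToyRegFile *rf, int dreg)
{
    assert(dreg >= 0 && dreg < 2 * NUM_VREGS);
    return &rf->lanes[dreg >> 1][dreg & 1];
}

/* New style: base D register plus a 64-bit element index, computed as
 * "base offset + ele * 8 bytes" within the same vector register. */
static uint64_t *element64_ptr(ToyRegFile *rf, int reg, int ele)
{
    assert(ele >= 0 && (reg & 1) + ele < LANES_PER_VREG);
    return dreg_ptr(rf, reg) + ele;
}
```

Here element64_ptr(rf, 4, 1) and dreg_ptr(rf, 5) name the same lane, but element64_ptr(rf, 5, 1) and dreg_ptr(rf, 6) do not. The rewrite therefore relies on element 1 only being used with Q-register forms, whose register numbers must be even.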
-[PULL 03/23] target/arm: Convert Neon 3-reg-diff narrowing ops to decodetree
+[PULL 09/26] target/arm: Rename neon_load_reg64 to vfp_load_reg64
-Convert the narrow-to-high-half insns VADDHN, VSUBHN, VRADDHN,
+From: Richard Henderson <richard.henderson@linaro.org>
-VRSUBHN in the Neon 3-registers-different-lengths group to
-decodetree.
+The only uses of this function are for loading VFP
+double-precision values, and nothing to do with NEON.
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20201030022618.785675-10-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/neon-dp.decode       |  6 +++
+ target/arm/translate.c         |  8 ++--
- target/arm/translate-neon.inc.c | 87 +++++++++++++++++++++++++++++++
+ target/arm/translate-vfp.c.inc | 84 +++++++++++++++++-----------------
- target/arm/translate.c          | 91 ++++-----------------------------
+files changed, 46 insertions(+), 46 deletions(-)
-files changed, 104 insertions(+), 80 deletions(-)
 diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/neon-dp.decode
 +++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
      VSUBW_S_3d   1111 001 0 1 . .. .... .... 0011 . 0 . 0 .... @3diff
      VSUBW_U_3d   1111 001 1 1 . .. .... .... 0011 . 0 . 0 .... @3diff
 +
 +    VADDHN_3d    1111 001 0 1 . .. .... .... 0100 . 0 . 0 .... @3diff
 +    VRADDHN_3d   1111 001 1 1 . .. .... .... 0100 . 0 . 0 .... @3diff
 +
 +    VSUBHN_3d    1111 001 0 1 . .. .... .... 0110 . 0 . 0 .... @3diff
 +    VRSUBHN_3d   1111 001 1 1 . .. .... .... 0110 . 0 . 0 .... @3diff
    ]
  }
 diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.inc.c
 +++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ DO_PREWIDEN(VADDW_S, s, ext, add, true)
  DO_PREWIDEN(VADDW_U, u, extu, add, true)
  DO_PREWIDEN(VSUBW_S, s, ext, sub, true)
  DO_PREWIDEN(VSUBW_U, u, extu, sub, true)
 +
 +static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
 +                         NeonGenTwo64OpFn *opfn, NeonGenNarrowFn *narrowfn)
 +{
 +    /* 3-regs different lengths, narrowing (VADDHN/VSUBHN/VRADDHN/VRSUBHN) */
 +    TCGv_i64 rn_64, rm_64;
 +    TCGv_i32 rd0, rd1;
 +
 +    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
 +        return false;
 +    }
 +
 +    /* UNDEF accesses to D16-D31 if they don't exist. */
 +    if (!dc_isar_feature(aa32_simd_r32, s) &&
 +        ((a->vd | a->vn | a->vm) & 0x10)) {
 +        return false;
 +    }
 +
 +    if (!opfn || !narrowfn) {
 +        /* size == 3 case, which is an entirely different insn group */
 +        return false;
 +    }
 +
 +    if ((a->vn | a->vm) & 1) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    rn_64 = tcg_temp_new_i64();
 +    rm_64 = tcg_temp_new_i64();
 +    rd0 = tcg_temp_new_i32();
 +    rd1 = tcg_temp_new_i32();
 +
 +    neon_load_reg64(rn_64, a->vn);
 +    neon_load_reg64(rm_64, a->vm);
 +
 +    opfn(rn_64, rn_64, rm_64);
 +
 +    narrowfn(rd0, rn_64);
 +
 +    neon_load_reg64(rn_64, a->vn + 1);
 +    neon_load_reg64(rm_64, a->vm + 1);
 +
 +    opfn(rn_64, rn_64, rm_64);
 +
 +    narrowfn(rd1, rn_64);
 +
 +    neon_store_reg(a->vd, 0, rd0);
 +    neon_store_reg(a->vd, 1, rd1);
 +
 +    tcg_temp_free_i64(rn_64);
 +    tcg_temp_free_i64(rm_64);
 +
 +    return true;
 +}
 +
 +#define DO_NARROW_3D(INSN, OP, NARROWTYPE, EXTOP)                       \
 +    static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a)        \
 +    {                                                                   \
 +        static NeonGenTwo64OpFn * const addfn[] = {                     \
 +            gen_helper_neon_##OP##l_u16,                                \
 +            gen_helper_neon_##OP##l_u32,                                \
 +            tcg_gen_##OP##_i64,                                         \
 +            NULL,                                                       \
 +        };                                                              \
 +        static NeonGenNarrowFn * const narrowfn[] = {                   \
 +            gen_helper_neon_##NARROWTYPE##_high_u8,                     \
 +            gen_helper_neon_##NARROWTYPE##_high_u16,                    \
 +            EXTOP,                                                      \
 +            NULL,                                                       \
 +        };                                                              \
 +        return do_narrow_3d(s, a, addfn[a->size], narrowfn[a->size]);   \
 +    }
 +
 +static void gen_narrow_round_high_u32(TCGv_i32 rd, TCGv_i64 rn)
 +{
 +    tcg_gen_addi_i64(rn, rn, 1u << 31);
 +    tcg_gen_extrh_i64_i32(rd, rn);
 +}
 +
 +DO_NARROW_3D(VADDHN, add, narrow, tcg_gen_extrh_i64_i32)
 +DO_NARROW_3D(VSUBHN, sub, narrow, tcg_gen_extrh_i64_i32)
 +DO_NARROW_3D(VRADDHN, add, narrow_round, gen_narrow_round_high_u32)
 +DO_NARROW_3D(VRSUBHN, sub, narrow_round, gen_narrow_round_high_u32)
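For 64-bit source lanes (size == 2), the DO_NARROW_3D expansions above compute VADDHN as the high 32 bits of a 64-bit sum using tcg_gen_extrh_i64_i32. VRADDHN first adds 1u << 31, half the weight of the discarded low word, so the result is rounded rather than truncated (gen_narrow_round_high_u32). The 8- and 16-bit sizes use the gen_helper_neon_narrow_round_high_u8/u16 helpers instead. The sketch below models one 64-bit lane in plain C. The function names are hypothetical, and wraparound modulo 2^64 is assumed to match tcg_gen_add_i64.

```c
#include <assert.h>
#include <stdint.h>

/* VADDHN, 64-bit lane: add, then keep the high half (truncating). */
static uint32_t vaddhn_lane64(uint64_t n, uint64_t m)
{
    uint64_t sum = n + m;                 /* wraps modulo 2^64 */
    return (uint32_t)(sum >> 32);
}

/* VRADDHN, 64-bit lane: add a rounding constant of 2^31 before taking
 * the high half, mirroring gen_narrow_round_high_u32. */
static uint32_t vraddhn_lane64(uint64_t n, uint64_t m)
{
    uint64_t sum = n + m + (UINT64_C(1) << 31);
    return (uint32_t)(sum >> 32);
}
```

For example, a sum of 0x180000000 narrows to 1 with VADDHN but rounds up to 2 with VRADDHN, while 0x17fffffff narrows to 1 with both.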
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static inline void gen_neon_addl(int size)
+@@ -XXX,XX +XXX,XX @@ static long vfp_reg_offset(bool dp, unsigned reg)
      }
  }
--static inline void gen_neon_subl(int size)
+-static inline void neon_load_reg64(TCGv_i64 var, int reg)
--{
++static inline void vfp_load_reg64(TCGv_i64 var, int reg)
 -    switch (size) {
 -    case 0: gen_helper_neon_subl_u16(CPU_V001); break;
 -    case 1: gen_helper_neon_subl_u32(CPU_V001); break;
 -    case 2: tcg_gen_sub_i64(CPU_V001); break;
 -    default: abort();
 -    }
 -}
 -
  static inline void gen_neon_negl(TCGv_i64 var, int size)
  {
-     switch (size) {
+-    tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(1, reg));
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
++    tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(true, reg));
-             op = (insn >> 8) & 0xf;
+ }
-             if ((insn & (1 << 6)) == 0) {
-                 /* Three registers of different lengths.  */
+-static inline void neon_store_reg64(TCGv_i64 var, int reg)
--                int src1_wide;
++static inline void vfp_store_reg64(TCGv_i64 var, int reg)
--                int src2_wide;
+ {
-                 /* undefreq: bit 0 : UNDEF if size == 0
+-    tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(1, reg));
-                  *           bit 1 : UNDEF if size == 1
++    tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(true, reg));
-                  *           bit 2 : UNDEF if size == 2
+ }
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
-                     {0, 0, 0, 7}, /* VADDW: handled by decodetree */
+ static inline void vfp_load_reg32(TCGv_i32 var, int reg)
-                     {0, 0, 0, 7}, /* VSUBL: handled by decodetree */
+diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
-                     {0, 0, 0, 7}, /* VSUBW: handled by decodetree */
+index XXXXXXX..XXXXXXX 100644
--                    {0, 1, 1, 0}, /* VADDHN */
+--- a/target/arm/translate-vfp.c.inc
-+                    {0, 0, 0, 7}, /* VADDHN: handled by decodetree */
++++ b/target/arm/translate-vfp.c.inc
-                     {0, 0, 0, 0}, /* VABAL */
+@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
--                    {0, 1, 1, 0}, /* VSUBHN */
+         tcg_gen_ext_i32_i64(nf, cpu_NF);
-+                    {0, 0, 0, 7}, /* VSUBHN: handled by decodetree */
+         tcg_gen_ext_i32_i64(vf, cpu_VF);
-                     {0, 0, 0, 0}, /* VABDL */
-                     {0, 0, 0, 0}, /* VMLAL */
+-        neon_load_reg64(frn, rn);
-                     {0, 0, 0, 9}, /* VQDMLAL */
+-        neon_load_reg64(frm, rm);
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
++        vfp_load_reg64(frn, rn);
-                     {0, 0, 0, 7}, /* Reserved: always UNDEF */
++        vfp_load_reg64(frm, rm);
-                 };
+         switch (a->cc) {
+         case 0: /* eq: Z */
--                src1_wide = neon_3reg_wide[op][1];
+             tcg_gen_movcond_i64(TCG_COND_EQ, dest, zf, zero,
--                src2_wide = neon_3reg_wide[op][2];
+@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
-                 undefreq = neon_3reg_wide[op][3];
+             tcg_temp_free_i64(tmp);
+             break;
-                 if ((undefreq & (1 << size)) ||
+         }
-                     ((undefreq & 8) && u)) {
+-        neon_store_reg64(dest, rd);
-                     return 1;
++        vfp_store_reg64(dest, rd);
-                 }
+         tcg_temp_free_i64(frn);
--                if ((src1_wide && (rn & 1)) ||
+         tcg_temp_free_i64(frm);
--                    (src2_wide && (rm & 1)) ||
+         tcg_temp_free_i64(dest);
--                    (!src2_wide && (rd & 1))) {
+@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
-+                if (rd & 1) {
+         TCGv_i64 tcg_res;
-                     return 1;
+         tcg_op = tcg_temp_new_i64();
-                 }
+         tcg_res = tcg_temp_new_i64();
+-        neon_load_reg64(tcg_op, rm);
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
++        vfp_load_reg64(tcg_op, rm);
-                 /* Avoid overlapping operands.  Wide source operands are
+         gen_helper_rintd(tcg_res, tcg_op, fpst);
-                    always aligned so will never overlap with wide
+-        neon_store_reg64(tcg_res, rd);
-                    destinations in problematic ways.  */
++        vfp_store_reg64(tcg_res, rd);
--                if (rd == rm && !src2_wide) {
+         tcg_temp_free_i64(tcg_op);
-+                if (rd == rm) {
+         tcg_temp_free_i64(tcg_res);
-                     tmp = neon_load_reg(rm, 1);
+     } else {
-                     neon_store_scratch(2, tmp);
+@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
--                } else if (rd == rn && !src1_wide) {
+         tcg_double = tcg_temp_new_i64();
-+                } else if (rd == rn) {
+         tcg_res = tcg_temp_new_i64();
-                     tmp = neon_load_reg(rn, 1);
+         tcg_tmp = tcg_temp_new_i32();
-                     neon_store_scratch(2, tmp);
+-        neon_load_reg64(tcg_double, rm);
-                 }
++        vfp_load_reg64(tcg_double, rm);
-                 tmp3 = NULL;
+         if (is_signed) {
-                 for (pass = 0; pass < 2; pass++) {
+             gen_helper_vfp_tosld(tcg_res, tcg_double, tcg_shift, fpst);
--                    if (src1_wide) {
+         } else {
--                        neon_load_reg64(cpu_V0, rn + pass);
+@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_dp *a)
--                        tmp = NULL;
+     tmp = tcg_temp_new_i64();
-+                    if (pass == 1 && rd == rn) {
+     if (a->l) {
-+                        tmp = neon_load_scratch(2);
+         gen_aa32_ld64(s, tmp, addr, get_mem_index(s));
-                     } else {
+-        neon_store_reg64(tmp, a->vd);
--                        if (pass == 1 && rd == rn) {
++        vfp_store_reg64(tmp, a->vd);
--                            tmp = neon_load_scratch(2);
+     } else {
--                        } else {
+-        neon_load_reg64(tmp, a->vd);
--                            tmp = neon_load_reg(rn, pass);
++        vfp_load_reg64(tmp, a->vd);
--                        }
+         gen_aa32_st64(s, tmp, addr, get_mem_index(s));
-+                        tmp = neon_load_reg(rn, pass);
+     }
-                     }
+     tcg_temp_free_i64(tmp);
--                    if (src2_wide) {
+@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
--                        neon_load_reg64(cpu_V1, rm + pass);
+         if (a->l) {
--                        tmp2 = NULL;
+             /* load */
-+                    if (pass == 1 && rd == rm) {
+             gen_aa32_ld64(s, tmp, addr, get_mem_index(s));
-+                        tmp2 = neon_load_scratch(2);
+-            neon_store_reg64(tmp, a->vd + i);
-                     } else {
++            vfp_store_reg64(tmp, a->vd + i);
--                        if (pass == 1 && rd == rm) {
+         } else {
--                            tmp2 = neon_load_scratch(2);
+             /* store */
--                        } else {
+-            neon_load_reg64(tmp, a->vd + i);
--                            tmp2 = neon_load_reg(rm, pass);
++            vfp_load_reg64(tmp, a->vd + i);
--                        }
+             gen_aa32_st64(s, tmp, addr, get_mem_index(s));
-+                        tmp2 = neon_load_reg(rm, pass);
+         }
-                     }
+         tcg_gen_addi_i32(addr, addr, offset);
-                     switch (op) {
+@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
--                    case 0: case 1: case 4: /* VADDL, VADDW, VADDHN, VRADDHN */
+     fd = tcg_temp_new_i64();
--                        gen_neon_addl(size);
+     fpst = fpstatus_ptr(FPST_FPCR);
--                        break;
--                    case 2: case 3: case 6: /* VSUBL, VSUBW, VSUBHN, VRSUBHN */
+-    neon_load_reg64(f0, vn);
--                        gen_neon_subl(size);
+-    neon_load_reg64(f1, vm);
--                        break;
++    vfp_load_reg64(f0, vn);
-                     case 5: case 7: /* VABAL, VABDL */
++    vfp_load_reg64(f1, vm);
-                         switch ((size << 1) | u) {
-                         case 0:
+     for (;;) {
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+         if (reads_vd) {
-                             abort();
+-            neon_load_reg64(fd, vd);
-                         }
++            vfp_load_reg64(fd, vd);
-                         neon_store_reg64(cpu_V0, rd + pass);
+         }
--                    } else if (op == 4 || op == 6) {
+         fn(fd, f0, f1, fpst);
--                        /* Narrowing operation.  */
+-        neon_store_reg64(fd, vd);
--                        tmp = tcg_temp_new_i32();
++        vfp_store_reg64(fd, vd);
--                        if (!u) {
--                            switch (size) {
+         if (veclen == 0) {
--                            case 0:
+             break;
--                                gen_helper_neon_narrow_high_u8(tmp, cpu_V0);
+@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
--                                break;
+         veclen--;
--                            case 1:
+         vd = vfp_advance_dreg(vd, delta_d);
--                                gen_helper_neon_narrow_high_u16(tmp, cpu_V0);
+         vn = vfp_advance_dreg(vn, delta_d);
--                                break;
+-        neon_load_reg64(f0, vn);
--                            case 2:
++        vfp_load_reg64(f0, vn);
--                                tcg_gen_extrh_i64_i32(tmp, cpu_V0);
+         if (delta_m) {
--                                break;
+             vm = vfp_advance_dreg(vm, delta_m);
--                            default: abort();
+-            neon_load_reg64(f1, vm);
--                            }
++            vfp_load_reg64(f1, vm);
--                        } else {
+         }
--                            switch (size) {
+     }
--                            case 0:
--                                gen_helper_neon_narrow_round_high_u8(tmp, cpu_V0);
+@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
--                                break;
+     f0 = tcg_temp_new_i64();
--                            case 1:
+     fd = tcg_temp_new_i64();
--                                gen_helper_neon_narrow_round_high_u16(tmp, cpu_V0);
--                                break;
+-    neon_load_reg64(f0, vm);
--                            case 2:
++    vfp_load_reg64(f0, vm);
--                                tcg_gen_addi_i64(cpu_V0, cpu_V0, 1u << 31);
--                                tcg_gen_extrh_i64_i32(tmp, cpu_V0);
+     for (;;) {
--                                break;
+         fn(fd, f0);
--                            default: abort();
+-        neon_store_reg64(fd, vd);
--                            }
++        vfp_store_reg64(fd, vd);
--                        }
--                        if (pass == 0) {
+         if (veclen == 0) {
--                            tmp3 = tmp;
+             break;
--                        } else {
+@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
--                            neon_store_reg(rd, 0, tmp3);
+             /* single source one-many */
--                            neon_store_reg(rd, 1, tmp);
+             while (veclen--) {
--                        }
+                 vd = vfp_advance_dreg(vd, delta_d);
-                     } else {
+-                neon_store_reg64(fd, vd);
-                         /* Write back the result.  */
++                vfp_store_reg64(fd, vd);
-                         neon_store_reg64(cpu_V0, rd + pass);
+             }
              break;
          }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
          veclen--;
          vd = vfp_advance_dreg(vd, delta_d);
          vd = vfp_advance_dreg(vm, delta_m);
 -        neon_load_reg64(f0, vm);
 +        vfp_load_reg64(f0, vm);
      }
      tcg_temp_free_i64(f0);
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_dp(DisasContext *s, arg_VFMA_dp *a, bool neg_n, bool neg_d)
      vm = tcg_temp_new_i64();
      vd = tcg_temp_new_i64();
 -    neon_load_reg64(vn, a->vn);
 -    neon_load_reg64(vm, a->vm);
 +    vfp_load_reg64(vn, a->vn);
 +    vfp_load_reg64(vm, a->vm);
      if (neg_n) {
          /* VFNMS, VFMS */
          gen_helper_vfp_negd(vn, vn);
      }
 -    neon_load_reg64(vd, a->vd);
 +    vfp_load_reg64(vd, a->vd);
      if (neg_d) {
          /* VFNMA, VFNMS */
          gen_helper_vfp_negd(vd, vd);
      }
      fpst = fpstatus_ptr(FPST_FPCR);
      gen_helper_vfp_muladdd(vd, vn, vm, vd, fpst);
 -    neon_store_reg64(vd, a->vd);
 +    vfp_store_reg64(vd, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i64(vn);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
      fd = tcg_const_i64(vfp_expand_imm(MO_64, a->imm));
      for (;;) {
 -        neon_store_reg64(fd, vd);
 +        vfp_store_reg64(fd, vd);
          if (veclen == 0) {
              break;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_dp(DisasContext *s, arg_VCMP_dp *a)
      vd = tcg_temp_new_i64();
      vm = tcg_temp_new_i64();
 -    neon_load_reg64(vd, a->vd);
 +    vfp_load_reg64(vd, a->vd);
      if (a->z) {
          tcg_gen_movi_i64(vm, 0);
      } else {
 -        neon_load_reg64(vm, a->vm);
 +        vfp_load_reg64(vm, a->vm);
      }
      if (a->e) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f64_f16(DisasContext *s, arg_VCVT_f64_f16 *a)
      tcg_gen_ld16u_i32(tmp, cpu_env, vfp_f16_offset(a->vm, a->t));
      vd = tcg_temp_new_i64();
      gen_helper_vfp_fcvt_f16_to_f64(vd, tmp, fpst, ahp_mode);
 -    neon_store_reg64(vd, a->vd);
 +    vfp_store_reg64(vd, a->vd);
      tcg_temp_free_i32(ahp_mode);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f16_f64(DisasContext *s, arg_VCVT_f16_f64 *a)
      tmp = tcg_temp_new_i32();
      vm = tcg_temp_new_i64();
 -    neon_load_reg64(vm, a->vm);
 +    vfp_load_reg64(vm, a->vm);
      gen_helper_vfp_fcvt_f64_to_f16(tmp, vm, fpst, ahp_mode);
      tcg_temp_free_i64(vm);
      tcg_gen_st16_i32(tmp, cpu_env, vfp_f16_offset(a->vd, a->t));
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_dp(DisasContext *s, arg_VRINTR_dp *a)
      }
      tmp = tcg_temp_new_i64();
 -    neon_load_reg64(tmp, a->vm);
 +    vfp_load_reg64(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      gen_helper_rintd(tmp, tmp, fpst);
 -    neon_store_reg64(tmp, a->vd);
 +    vfp_store_reg64(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i64(tmp);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_dp(DisasContext *s, arg_VRINTZ_dp *a)
      }
      tmp = tcg_temp_new_i64();
 -    neon_load_reg64(tmp, a->vm);
 +    vfp_load_reg64(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      tcg_rmode = tcg_const_i32(float_round_to_zero);
      gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
      gen_helper_rintd(tmp, tmp, fpst);
      gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
 -    neon_store_reg64(tmp, a->vd);
 +    vfp_store_reg64(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i64(tmp);
      tcg_temp_free_i32(tcg_rmode);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_dp(DisasContext *s, arg_VRINTX_dp *a)
      }
      tmp = tcg_temp_new_i64();
 -    neon_load_reg64(tmp, a->vm);
 +    vfp_load_reg64(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      gen_helper_rintd_exact(tmp, tmp, fpst);
 -    neon_store_reg64(tmp, a->vd);
 +    vfp_store_reg64(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i64(tmp);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp(DisasContext *s, arg_VCVT_sp *a)
      vd = tcg_temp_new_i64();
      vfp_load_reg32(vm, a->vm);
      gen_helper_vfp_fcvtds(vd, vm, cpu_env);
 -    neon_store_reg64(vd, a->vd);
 +    vfp_store_reg64(vd, a->vd);
      tcg_temp_free_i32(vm);
      tcg_temp_free_i64(vd);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp(DisasContext *s, arg_VCVT_dp *a)
      vd = tcg_temp_new_i32();
      vm = tcg_temp_new_i64();
 -    neon_load_reg64(vm, a->vm);
 +    vfp_load_reg64(vm, a->vm);
      gen_helper_vfp_fcvtsd(vd, vm, cpu_env);
      vfp_store_reg32(vd, a->vd);
      tcg_temp_free_i32(vd);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_dp(DisasContext *s, arg_VCVT_int_dp *a)
          /* u32 -> f64 */
          gen_helper_vfp_uitod(vd, vm, fpst);
      }
 -    neon_store_reg64(vd, a->vd);
 +    vfp_store_reg64(vd, a->vd);
      tcg_temp_free_i32(vm);
      tcg_temp_free_i64(vd);
      tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
      vm = tcg_temp_new_i64();
      vd = tcg_temp_new_i32();
 -    neon_load_reg64(vm, a->vm);
 +    vfp_load_reg64(vm, a->vm);
      gen_helper_vjcvt(vd, vm, cpu_env);
      vfp_store_reg32(vd, a->vd);
      tcg_temp_free_i64(vm);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
      frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
      vd = tcg_temp_new_i64();
 -    neon_load_reg64(vd, a->vd);
 +    vfp_load_reg64(vd, a->vd);
      fpst = fpstatus_ptr(FPST_FPCR);
      shift = tcg_const_i32(frac_bits);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
          g_assert_not_reached();
      }
 -    neon_store_reg64(vd, a->vd);
 +    vfp_store_reg64(vd, a->vd);
      tcg_temp_free_i64(vd);
      tcg_temp_free_i32(shift);
      tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp_int(DisasContext *s, arg_VCVT_dp_int *a)
      fpst = fpstatus_ptr(FPST_FPCR);
      vm = tcg_temp_new_i64();
      vd = tcg_temp_new_i32();
 -    neon_load_reg64(vm, a->vm);
 +    vfp_load_reg64(vm, a->vm);
      if (a->s) {
          if (a->rz) {
 --
 .20.1

-New patch
+[PULL 10/26] target/arm: Simplify do_long_3d and do_2scalar_long
+From: Richard Henderson <richard.henderson@linaro.org>
+In both cases, we can sink the write-back and perform
+the accumulate into the normal destination temps.
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20201030022618.785675-11-richard.henderson@linaro.org
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ target/arm/translate-neon.c.inc | 23 +++++++++--------------
+file changed, 9 insertions(+), 14 deletions(-)
+diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-neon.c.inc
++++ b/target/arm/translate-neon.c.inc
+@@ -XXX,XX +XXX,XX @@ static bool do_long_3d(DisasContext *s, arg_3diff *a,
+     if (accfn) {
+         tmp = tcg_temp_new_i64();
+         read_neon_element64(tmp, a->vd, 0, MO_64);
+-        accfn(tmp, tmp, rd0);
+-        write_neon_element64(tmp, a->vd, 0, MO_64);
++        accfn(rd0, tmp, rd0);
+         read_neon_element64(tmp, a->vd, 1, MO_64);
+-        accfn(tmp, tmp, rd1);
+-        write_neon_element64(tmp, a->vd, 1, MO_64);
++        accfn(rd1, tmp, rd1);
+         tcg_temp_free_i64(tmp);
+-    } else {
+-        write_neon_element64(rd0, a->vd, 0, MO_64);
+-        write_neon_element64(rd1, a->vd, 1, MO_64);
+     }
++    write_neon_element64(rd0, a->vd, 0, MO_64);
++    write_neon_element64(rd1, a->vd, 1, MO_64);
+     tcg_temp_free_i64(rd0);
+     tcg_temp_free_i64(rd1);
+@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
+     if (accfn) {
+         TCGv_i64 t64 = tcg_temp_new_i64();
+         read_neon_element64(t64, a->vd, 0, MO_64);
+-        accfn(t64, t64, rn0_64);
+-        write_neon_element64(t64, a->vd, 0, MO_64);
++        accfn(rn0_64, t64, rn0_64);
+         read_neon_element64(t64, a->vd, 1, MO_64);
+-        accfn(t64, t64, rn1_64);
+-        write_neon_element64(t64, a->vd, 1, MO_64);
++        accfn(rn1_64, t64, rn1_64);
+         tcg_temp_free_i64(t64);
+-    } else {
+-        write_neon_element64(rn0_64, a->vd, 0, MO_64);
+-        write_neon_element64(rn1_64, a->vd, 1, MO_64);
+     }
++
++    write_neon_element64(rn0_64, a->vd, 0, MO_64);
++    write_neon_element64(rn1_64, a->vd, 1, MO_64);
+     tcg_temp_free_i64(rn0_64);
+     tcg_temp_free_i64(rn1_64);
+     return true;
+--
+.20.1
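The effect of this simplification can be sketched in plain C by modelling the Q destination as two 64-bit halves. This is an illustrative model only, not QEMU code. `long_3d_writeback()`, `acc_fn` and `acc_add64()` are hypothetical names, and the real function emits TCG ops rather than computing values. As in the patch, the accumulate writes back into the result temps, so one unconditional write-back serves both paths.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accumulate callback, mirroring the accfn(dest, a, b) shape. */
typedef void acc_fn(uint64_t *dst, uint64_t a, uint64_t b);

/* Example accumulator: 64-bit add, *dst = a + b. */
static void acc_add64(uint64_t *dst, uint64_t a, uint64_t b)
{
    *dst = a + b;
}

/*
 * rd0/rd1 are the two 64-bit halves of the long result. When accumulating
 * (accfn != NULL), fold the old destination halves into them in place,
 * the equivalent of accfn(rd0, tmp, rd0). A single write-back then
 * serves both the accumulating and the plain path.
 */
static void long_3d_writeback(uint64_t vd[2], uint64_t rd0, uint64_t rd1,
                              acc_fn *accfn)
{
    if (accfn) {
        accfn(&rd0, vd[0], rd0);
        accfn(&rd1, vd[1], rd1);
    }
    vd[0] = rd0;
    vd[1] = rd1;
}
```

With `vd = {10, 20}`, calling `long_3d_writeback(vd, 1, 2, acc_add64)` leaves `{11, 22}`. Passing `NULL` for `accfn` stores `{1, 2}` unchanged.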

-[PULL 16/23] target/arm: Convert Neon VTBL, VTBX to decodetree
+[PULL 11/26] target/arm: Improve do_prewiden_3d
-Convert the Neon VTBL, VTBX instructions to decodetree.  The actual
+From: Richard Henderson <richard.henderson@linaro.org>
 implementation of the insn is copied across to the new trans function
 unchanged except for renaming 'tmp5' to 'tmp4'.
+We can use proper widening loads to extend 32-bit inputs,
+and skip the "widenfn" step.
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20201030022618.785675-12-richard.henderson@linaro.org
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/neon-dp.decode       |  3 ++
+ target/arm/translate.c          |  6 +++
- target/arm/translate-neon.inc.c | 56 +++++++++++++++++++++++++++++++++
+ target/arm/translate-neon.c.inc | 66 ++++++++++++++++++---------------
- target/arm/translate.c          | 41 +++---------------------
+files changed, 43 insertions(+), 29 deletions(-)
 files changed, 63 insertions(+), 37 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
-+++ b/target/arm/neon-dp.decode
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
-     ##################################################################
-     VEXT         1111 001 0 1 . 11 .... .... imm:4 . q:1 . 0 .... \
-                  vm=%vm_dp vn=%vn_dp vd=%vd_dp
-+
-+    VTBL         1111 001 1 1 . 11 .... .... 10 len:2 . op:1 . 0 .... \
-+                 vm=%vm_dp vn=%vn_dp vd=%vd_dp
-   ]
-   # Subgroup for size != 0b11
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
-+++ b/target/arm/translate-neon.inc.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
-     }
-     return true;
- }
-+
-+static bool trans_VTBL(DisasContext *s, arg_VTBL *a)
-+{
-+    int n;
-+    TCGv_i32 tmp, tmp2, tmp3, tmp4;
-+    TCGv_ptr ptr1;
-+
-+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
-+        return false;
-+    }
-+
-+    /* UNDEF accesses to D16-D31 if they don't exist. */
-+    if (!dc_isar_feature(aa32_simd_r32, s) &&
-+        ((a->vd | a->vn | a->vm) & 0x10)) {
-+        return false;
-+    }
-+
-+    if (!vfp_access_check(s)) {
-+        return true;
-+    }
-+
-+    n = a->len + 1;
-+    if ((a->vn + n) > 32) {
-+        /*
-+         * This is UNPREDICTABLE; we choose to UNDEF to avoid the
-+         * helper function running off the end of the register file.
-+         */
-+        return false;
-+    }
-+    n <<= 3;
-+    if (a->op) {
-+        tmp = neon_load_reg(a->vd, 0);
-+    } else {
-+        tmp = tcg_temp_new_i32();
-+        tcg_gen_movi_i32(tmp, 0);
-+    }
-+    tmp2 = neon_load_reg(a->vm, 0);
-+    ptr1 = vfp_reg_ptr(true, a->vn);
-+    tmp4 = tcg_const_i32(n);
-+    gen_helper_neon_tbl(tmp2, tmp2, tmp, ptr1, tmp4);
-+    tcg_temp_free_i32(tmp);
-+    if (a->op) {
-+        tmp = neon_load_reg(a->vd, 1);
-+    } else {
-+        tmp = tcg_temp_new_i32();
-+        tcg_gen_movi_i32(tmp, 0);
-+    }
-+    tmp3 = neon_load_reg(a->vm, 1);
-+    gen_helper_neon_tbl(tmp3, tmp3, tmp, ptr1, tmp4);
-+    tcg_temp_free_i32(tmp4);
-+    tcg_temp_free_ptr(ptr1);
-+    neon_store_reg(a->vd, 0, tmp2);
-+    neon_store_reg(a->vd, 1, tmp3);
-+    tcg_temp_free_i32(tmp);
-+    return true;
-+}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ static void read_neon_element64(TCGv_i64 dest, int reg, int ele, MemOp memop)
      long off = neon_element_offset(reg, ele, memop);
      switch (memop) {
 +    case MO_SL:
 +        tcg_gen_ld32s_i64(dest, cpu_env, off);
 +        break;
 +    case MO_UL:
 +        tcg_gen_ld32u_i64(dest, cpu_env, off);
 +        break;
      case MO_Q:
          tcg_gen_ld_i64(dest, cpu_env, off);
          break;
 diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.c.inc
 +++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_Vimm_1r(DisasContext *s, arg_1reg_imm *a)
  static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
                             NeonGenWidenFn *widenfn,
                             NeonGenTwo64OpFn *opfn,
 -                           bool src1_wide)
 +                           int src1_mop, int src2_mop)
  {
-     int op;
+     /* 3-regs different lengths, prewidening case (VADDL/VSUBL/VAADW/VSUBW) */
-     int q;
+     TCGv_i64 rn0_64, rn1_64, rm_64;
--    int rd, rn, rm, rd_ofs, rm_ofs;
+-    TCGv_i32 rm;
 +    int rd, rm, rd_ofs, rm_ofs;
      int size;
      int pass;
      int u;
      int vec_size;
 -    TCGv_i32 tmp, tmp2, tmp3, tmp5;
 -    TCGv_ptr ptr1;
 +    TCGv_i32 tmp, tmp2, tmp3;
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
-         return 1;
+         return false;
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
-     q = (insn & (1 << 6)) != 0;
+         return false;
-     u = (insn >> 24) & 1;
+     }
-     VFP_DREG_D(rd, insn);
--    VFP_DREG_N(rn, insn);
+-    if (!widenfn || !opfn) {
-     VFP_DREG_M(rm, insn);
++    if (!opfn) {
-     size = (insn >> 20) & 3;
+         /* size == 3 case, which is an entirely different insn group */
-     vec_size = q ? 16 : 8;
+         return false;
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+     }
-                     break;
-                 }
+-    if ((a->vd & 1) || (src1_wide && (a->vn & 1))) {
-             } else if ((insn & (1 << 10)) == 0) {
++    if ((a->vd & 1) || (src1_mop == MO_Q && (a->vn & 1))) {
--                /* VTBL, VTBX.  */
+         return false;
--                int n = ((insn >> 8) & 3) + 1;
+     }
--                if ((rn + n) > 32) {
--                    /* This is UNPREDICTABLE; we choose to UNDEF to avoid the
+@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
--                     * helper function running off the end of the register file.
+     rn1_64 = tcg_temp_new_i64();
--                     */
+     rm_64 = tcg_temp_new_i64();
--                    return 1;
--                }
+-    if (src1_wide) {
--                n <<= 3;
+-        read_neon_element64(rn0_64, a->vn, 0, MO_64);
--                if (insn & (1 << 6)) {
++    if (src1_mop >= 0) {
--                    tmp = neon_load_reg(rd, 0);
++        read_neon_element64(rn0_64, a->vn, 0, src1_mop);
--                } else {
+     } else {
--                    tmp = tcg_temp_new_i32();
+         TCGv_i32 tmp = tcg_temp_new_i32();
--                    tcg_gen_movi_i32(tmp, 0);
+         read_neon_element32(tmp, a->vn, 0, MO_32);
--                }
+         widenfn(rn0_64, tmp);
--                tmp2 = neon_load_reg(rm, 0);
+         tcg_temp_free_i32(tmp);
--                ptr1 = vfp_reg_ptr(true, rn);
+     }
--                tmp5 = tcg_const_i32(n);
+-    rm = tcg_temp_new_i32();
--                gen_helper_neon_tbl(tmp2, tmp2, tmp, ptr1, tmp5);
+-    read_neon_element32(rm, a->vm, 0, MO_32);
--                tcg_temp_free_i32(tmp);
++    if (src2_mop >= 0) {
--                if (insn & (1 << 6)) {
++        read_neon_element64(rm_64, a->vm, 0, src2_mop);
--                    tmp = neon_load_reg(rd, 1);
++    } else {
--                } else {
++        TCGv_i32 tmp = tcg_temp_new_i32();
--                    tmp = tcg_temp_new_i32();
++        read_neon_element32(tmp, a->vm, 0, MO_32);
--                    tcg_gen_movi_i32(tmp, 0);
++        widenfn(rm_64, tmp);
--                }
++        tcg_temp_free_i32(tmp);
--                tmp3 = neon_load_reg(rm, 1);
++    }
--                gen_helper_neon_tbl(tmp3, tmp3, tmp, ptr1, tmp5);
--                tcg_temp_free_i32(tmp5);
+-    widenfn(rm_64, rm);
--                tcg_temp_free_ptr(ptr1);
+-    tcg_temp_free_i32(rm);
--                neon_store_reg(rd, 0, tmp2);
+     opfn(rn0_64, rn0_64, rm_64);
--                neon_store_reg(rd, 1, tmp3);
--                tcg_temp_free_i32(tmp);
+     /*
-+                /* VTBL, VTBX: handled by decodetree */
+      * Load second pass inputs before storing the first pass result, to
-+                return 1;
+      * avoid incorrect results if a narrow input overlaps with the result.
-             } else if ((insn & 0x380) == 0) {
+      */
-                 /* VDUP */
+-    if (src1_wide) {
-                 int element;
+-        read_neon_element64(rn1_64, a->vn, 1, MO_64);
 +    if (src1_mop >= 0) {
 +        read_neon_element64(rn1_64, a->vn, 1, src1_mop);
      } else {
          TCGv_i32 tmp = tcg_temp_new_i32();
          read_neon_element32(tmp, a->vn, 1, MO_32);
          widenfn(rn1_64, tmp);
          tcg_temp_free_i32(tmp);
      }
 -    rm = tcg_temp_new_i32();
 -    read_neon_element32(rm, a->vm, 1, MO_32);
 +    if (src2_mop >= 0) {
 +        read_neon_element64(rm_64, a->vm, 1, src2_mop);
 +    } else {
 +        TCGv_i32 tmp = tcg_temp_new_i32();
 +        read_neon_element32(tmp, a->vm, 1, MO_32);
 +        widenfn(rm_64, tmp);
 +        tcg_temp_free_i32(tmp);
 +    }
      write_neon_element64(rn0_64, a->vd, 0, MO_64);
 -    widenfn(rm_64, rm);
 -    tcg_temp_free_i32(rm);
      opfn(rn1_64, rn1_64, rm_64);
      write_neon_element64(rn1_64, a->vd, 1, MO_64);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
      return true;
  }
 -#define DO_PREWIDEN(INSN, S, EXT, OP, SRC1WIDE)                         \
 +#define DO_PREWIDEN(INSN, S, OP, SRC1WIDE, SIGN)                        \
      static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a)        \
      {                                                                   \
          static NeonGenWidenFn * const widenfn[] = {                     \
              gen_helper_neon_widen_##S##8,                               \
              gen_helper_neon_widen_##S##16,                              \
 -            tcg_gen_##EXT##_i32_i64,                                    \
 -            NULL,                                                       \
 +            NULL, NULL,                                                 \
          };                                                              \
          static NeonGenTwo64OpFn * const addfn[] = {                     \
              gen_helper_neon_##OP##l_u16,                                \
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
              tcg_gen_##OP##_i64,                                         \
              NULL,                                                       \
          };                                                              \
 -        return do_prewiden_3d(s, a, widenfn[a->size],                   \
 -                              addfn[a->size], SRC1WIDE);                \
 +        int narrow_mop = a->size == MO_32 ? MO_32 | SIGN : -1;          \
 +        return do_prewiden_3d(s, a, widenfn[a->size], addfn[a->size],   \
 +                              SRC1WIDE ? MO_Q : narrow_mop,             \
 +                              narrow_mop);                              \
      }
 -DO_PREWIDEN(VADDL_S, s, ext, add, false)
 -DO_PREWIDEN(VADDL_U, u, extu, add, false)
 -DO_PREWIDEN(VSUBL_S, s, ext, sub, false)
 -DO_PREWIDEN(VSUBL_U, u, extu, sub, false)
 -DO_PREWIDEN(VADDW_S, s, ext, add, true)
 -DO_PREWIDEN(VADDW_U, u, extu, add, true)
 -DO_PREWIDEN(VSUBW_S, s, ext, sub, true)
 -DO_PREWIDEN(VSUBW_U, u, extu, sub, true)
 +DO_PREWIDEN(VADDL_S, s, add, false, MO_SIGN)
 +DO_PREWIDEN(VADDL_U, u, add, false, 0)
 +DO_PREWIDEN(VSUBL_S, s, sub, false, MO_SIGN)
 +DO_PREWIDEN(VSUBL_U, u, sub, false, 0)
 +DO_PREWIDEN(VADDW_S, s, add, true, MO_SIGN)
 +DO_PREWIDEN(VADDW_U, u, add, true, 0)
 +DO_PREWIDEN(VSUBW_S, s, sub, true, MO_SIGN)
 +DO_PREWIDEN(VSUBW_U, u, sub, true, 0)
  static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
                           NeonGenTwo64OpFn *opfn, NeonGenNarrowFn *narrowfn)
 --
 .20.1
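Below is a plain-C model of the new loading scheme for the 32-bit case, where `narrow_mop` is `MO_32 | SIGN`. It is a hypothetical sketch, not QEMU code. `regfile`, `ld32_widen()`, `st64()` and `vaddl_32()` are invented names, and only VADDL with 32-bit inputs is modelled.

Two points from the patch are reproduced. First, the widening happens in the load itself, the counterpart of `tcg_gen_ld32s_i64`/`tcg_gen_ld32u_i64`. Second, the second-pass inputs are read before the first-pass result is stored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register file: 32 D registers of two 32-bit elements each.
 * A Q-sized result occupies D[vd] (low 64 bits) and D[vd + 1] (high). */
typedef struct {
    uint32_t d[32][2];
} regfile;

/* Widening load of one 32-bit element: sign-extend (like MO_SL) or
 * zero-extend (like MO_UL) straight to 64 bits, no separate widen step. */
static uint64_t ld32_widen(const regfile *r, int reg, int ele, bool sign)
{
    uint32_t x = r->d[reg][ele];
    return sign ? (uint64_t)(int64_t)(int32_t)x : (uint64_t)x;
}

/* Store a 64-bit value into one D register. */
static void st64(regfile *r, int reg, uint64_t v)
{
    r->d[reg][0] = (uint32_t)v;
    r->d[reg][1] = (uint32_t)(v >> 32);
}

/* VADDL Qd, Dn, Dm with 32-bit inputs; 'sign' selects .S32 or .U32. */
static void vaddl_32(regfile *r, int vd, int vn, int vm, bool sign)
{
    assert((vd & 1) == 0 && vd + 1 < 32);
    uint64_t res0 = ld32_widen(r, vn, 0, sign) + ld32_widen(r, vm, 0, sign);
    /* Read the second-pass inputs before storing the first-pass result,
     * in case Dn or Dm is the same register as D[vd]. */
    uint64_t n1 = ld32_widen(r, vn, 1, sign);
    uint64_t m1 = ld32_widen(r, vm, 1, sign);
    st64(r, vd, res0);
    st64(r, vd + 1, n1 + m1);
}
```

Consider `vaddl_32(&r, 2, 2, 4, true)`, where Dn is also the low half of the destination. It still uses the original element 1 of D2. Storing `res0` first would have clobbered that element before it was read.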

-[PULL 01/23] target/arm: Fix missing temp frees in do_vshll_2sh
+[PULL 12/26] target/arm: Fix float16 pairwise Neon ops on big-endian hosts
-The widenfn() in do_vshll_2sh() does not free the input 32-bit
+In the neon_padd/pmax/pmin helpers for float16, a cut-and-paste error
-TCGv, so we need to do this in the calling code.
+meant we were using the H4() address swizzler macro rather than the
 H2() which is required for 2-byte data.  This had no effect on
 little-endian hosts but meant we put the result data into the
 destination Dreg in the wrong order on big-endian hosts.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Message-id: 20201028191712.4910-2-peter.maydell@linaro.org
 ---
- target/arm/translate-neon.inc.c | 2 ++
+ target/arm/vec_helper.c | 8 ++++----
-file changed, 2 insertions(+)
+file changed, 4 insertions(+), 4 deletions(-)
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
+diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
+--- a/target/arm/vec_helper.c
-+++ b/target/arm/translate-neon.inc.c
++++ b/target/arm/vec_helper.c
-@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
+@@ -XXX,XX +XXX,XX @@ DO_ABA(gvec_uaba_d, uint64_t)
-     tmp = tcg_temp_new_i64();
+         r2 = float16_##OP(m[H2(0)], m[H2(1)], fpst);                    \
+         r3 = float16_##OP(m[H2(2)], m[H2(3)], fpst);                    \
-     widenfn(tmp, rm0);
+                                                                         \
-+    tcg_temp_free_i32(rm0);
+-        d[H4(0)] = r0;                                                  \
-     if (a->shift != 0) {
+-        d[H4(1)] = r1;                                                  \
-         tcg_gen_shli_i64(tmp, tmp, a->shift);
+-        d[H4(2)] = r2;                                                  \
-         tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
+-        d[H4(3)] = r3;                                                  \
-@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
++        d[H2(0)] = r0;                                                  \
-     neon_store_reg64(tmp, a->vd);
++        d[H2(1)] = r1;                                                  \
++        d[H2(2)] = r2;                                                  \
-     widenfn(tmp, rm1);
++        d[H2(3)] = r3;                                                  \
-+    tcg_temp_free_i32(rm1);
+     }
-     if (a->shift != 0) {
-         tcg_gen_shli_i64(tmp, tmp, a->shift);
+ DO_NEON_PAIRWISE(neon_padd, add)
          tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
 --
 .20.1
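To see why H4() misorders the float16 results on a big-endian host, the sketch below simulates such a host on any machine. It lays out a 64-bit D register as eight bytes, most significant first. This is a model built on stated assumptions, not QEMU code:

- `be_dreg`, `h2_be()`, `h4_be()` and `write_lanes()` are hypothetical names.
- The two swizzlers only mirror what `H2(x)` and `H4(x)` expand to on big-endian hosts (`x ^ 3` and `x ^ 1`). Both macros are the identity on little-endian hosts.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated big-endian host view of one 64-bit D register:
 * bytes[0] holds the most significant byte of the architectural value. */
typedef struct {
    uint8_t bytes[8];
} be_dreg;

/* What H2(x) and H4(x) expand to on a big-endian host. */
static unsigned h2_be(unsigned x) { return x ^ 3; }
static unsigned h4_be(unsigned x) { return x ^ 1; }

/* Store v as host uint16_t element k (big-endian byte order). */
static void host_store16(be_dreg *r, unsigned k, uint16_t v)
{
    assert(k < 4);
    r->bytes[2 * k] = (uint8_t)(v >> 8);
    r->bytes[2 * k + 1] = (uint8_t)v;
}

/* Recover the architectural 64-bit value of the register. */
static uint64_t arch_value(const be_dreg *r)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | r->bytes[i];
    }
    return v;
}

/* Mimic d[SWZ(i)] = r_i for four float16 results, given as raw bits. */
static uint64_t write_lanes(const uint16_t res[4], unsigned (*swz)(unsigned))
{
    be_dreg d = { { 0 } };
    for (unsigned i = 0; i < 4; i++) {
        host_store16(&d, swz(i), res[i]);
    }
    return arch_value(&d);
}
```

Take results `{0x1111, 0x2222, 0x3333, 0x4444}`:

- `write_lanes(res, h2_be)` returns `0x4444333322221111`, the architecturally expected order.
- `write_lanes(res, h4_be)` returns `0x2222111144443333`. The two 32-bit halves of the destination D register are swapped, which is the wrong-order symptom described above.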

-[PULL 08/23] target/arm: Add 'static' and 'const' annotations to VSHLL function arrays
+[PULL 13/26] target/arm: Fix VUDOT/VSDOT (scalar) on big-endian hosts
-Mark the arrays of function pointers in trans_VSHLL_S_2sh() and
+The helper functions for performing the udot/sdot operations against
-trans_VSHLL_U_2sh() as both 'static' and 'const'.
+a scalar were not using an address-swizzling macro when converting
 the index of the scalar element into a pointer into the vm array.
 This had no effect on little-endian hosts but meant we generated
 incorrect results on big-endian hosts.
 For these insns, the index is indexing over group of 4 8-bit values,
 so 32 bits per indexed entity, and H4() is therefore what we want.
 (For Neon the only possible input indexes are 0 and 1.)
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Message-id: 20201028191712.4910-3-peter.maydell@linaro.org
 ---
- target/arm/translate-neon.inc.c | 4 ++--
+ target/arm/vec_helper.c | 4 ++--
 file changed, 2 insertions(+), 2 deletions(-)
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
+diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
+--- a/target/arm/vec_helper.c
-+++ b/target/arm/translate-neon.inc.c
++++ b/target/arm/vec_helper.c
-@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_sdot_idx_b)(void *vd, void *vn, void *vm, uint32_t desc)
+     intptr_t index = simd_data(desc);
- static bool trans_VSHLL_S_2sh(DisasContext *s, arg_2reg_shift *a)
+     uint32_t *d = vd;
- {
+     int8_t *n = vn;
--    NeonGenWidenFn *widenfn[] = {
+-    int8_t *m_indexed = (int8_t *)vm + index * 4;
-+    static NeonGenWidenFn * const widenfn[] = {
++    int8_t *m_indexed = (int8_t *)vm + H4(index) * 4;
-         gen_helper_neon_widen_s8,
-         gen_helper_neon_widen_s16,
+     /* Notice the special case of opr_sz == 8, from aa64/aa32 advsimd.
-         tcg_gen_ext_i32_i64,
+      * Otherwise opr_sz is a multiple of 16.
-@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL_S_2sh(DisasContext *s, arg_2reg_shift *a)
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_udot_idx_b)(void *vd, void *vn, void *vm, uint32_t desc)
+     intptr_t index = simd_data(desc);
- static bool trans_VSHLL_U_2sh(DisasContext *s, arg_2reg_shift *a)
+     uint32_t *d = vd;
- {
+     uint8_t *n = vn;
--    NeonGenWidenFn *widenfn[] = {
+-    uint8_t *m_indexed = (uint8_t *)vm + index * 4;
-+    static NeonGenWidenFn * const widenfn[] = {
++    uint8_t *m_indexed = (uint8_t *)vm + H4(index) * 4;
-         gen_helper_neon_widen_u8,
-         gen_helper_neon_widen_u16,
+     /* Notice the special case of opr_sz == 8, from aa64/aa32 advsimd.
-         tcg_gen_extu_i32_i64,
+      * Otherwise opr_sz is a multiple of 16.
 --
 .20.1
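The reasoning above can be checked with a small model. It lays out 8-byte vectors as a big-endian host would, with architectural byte b at host offset b ^ 7, and runs a helper-style loop over that layout. Everything here is hypothetical rather than QEMU code: `to_be_host()`, `sdot_idx_ref()`, `sdot_idx_host()`, `group_none()` and `group_h4_be()`. Only the D-register (8-byte) case is modelled.

In this model the unswizzled `n` bytes and 32-bit `d` lanes stay consistent with each other, so only the scalar group index needs the `H4()` adjustment.

```c
#include <assert.h>
#include <stdint.h>

/* Big-endian host layout of an 8-byte vector: architectural byte b is
 * stored at host byte offset b ^ 7. */
static void to_be_host(int8_t host[8], const int8_t arch[8])
{
    for (int b = 0; b < 8; b++) {
        host[b ^ 7] = arch[b];
    }
}

/* Architectural reference for the 8-byte SDOT-by-element case: 32-bit
 * lane i accumulates the dot product of n[4i..4i+3] and m[4*index..+3]. */
static void sdot_idx_ref(int32_t d[2], const int8_t n[8], const int8_t m[8],
                         unsigned index)
{
    assert(index < 2);  /* for Neon only indexes 0 and 1 are possible */
    for (int i = 0; i < 2; i++) {
        for (int j = 0; j < 4; j++) {
            d[i] += n[4 * i + j] * m[4 * index + j];
        }
    }
}

/* Group mappings: unswizzled (old code) and H4() on a big-endian host. */
static unsigned group_none(unsigned i) { return i; }
static unsigned group_h4_be(unsigned i) { return i ^ 1; }

/* Helper-style loop over host memory. With the H4() mapping, host lane k
 * of d receives architectural lane k ^ 1, which is where a big-endian
 * host keeps that 32-bit lane. */
static void sdot_idx_host(int32_t d[2], const int8_t n[8], const int8_t m[8],
                          unsigned index, unsigned (*group_of)(unsigned))
{
    const int8_t *m_indexed = m + group_of(index) * 4;
    for (int i = 0; i < 2; i++) {
        for (int j = 0; j < 4; j++) {
            d[i] += n[4 * i + j] * m_indexed[j];
        }
    }
}
```

Take `n = {1..8}`, `m = {1,1,1,1,2,2,2,2}` and index 1. The reference gives `{20, 52}`. The H4() loop over the big-endian layout produces the same lanes in host order, `{52, 20}`. The unswizzled loop picks the wrong group of `m` and yields `{26, 10}`.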

-[PULL 21/23] hw/net/imx_fec: Convert debug fprintf() to trace events
+[PULL 14/26] target/arm: fix handling of HCR.FB
-From: Jean-Christophe Dubois <jcd@tribudubois.net>
+From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
-Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
+HCR should be applied when NS is set, not when it is cleared.
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
-[PMD: Fixed 32-bit format string using PRIx32/PRIx64]
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/net/imx_fec.c    | 106 +++++++++++++++++++-------------------------
+ target/arm/helper.c | 5 ++---
- hw/net/trace-events |  18 ++++++++
+ 1 file changed, 2 insertions(+), 3 deletions(-)
 2 files changed, 63 insertions(+), 61 deletions(-)
-diff --git a/hw/net/imx_fec.c b/hw/net/imx_fec.c
+diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/net/imx_fec.c
+--- a/target/arm/helper.c
-+++ b/hw/net/imx_fec.c
++++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static void tlbimvaa_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
- #include "qemu/module.h"
- #include "net/checksum.h"
+ /*
- #include "net/eth.h"
+  * Non-IS variants of TLB operations are upgraded to
-+#include "trace.h"
+- * IS versions if we are at NS EL1 and HCR_EL2.FB is set to
++ * IS versions if we are at EL1 and HCR_EL2.FB is effectively set to
- /* For crc32 */
+  * force broadcast of these operations.
  #include <zlib.h>
 -#ifndef DEBUG_IMX_FEC
 -#define DEBUG_IMX_FEC 0
 -#endif
 -
 -#define FEC_PRINTF(fmt, args...) \
 -    do { \
 -        if (DEBUG_IMX_FEC) { \
 -            fprintf(stderr, "[%s]%s: " fmt , TYPE_IMX_FEC, \
 -                                             __func__, ##args); \
 -        } \
 -    } while (0)
 -
 -#ifndef DEBUG_IMX_PHY
 -#define DEBUG_IMX_PHY 0
 -#endif
 -
 -#define PHY_PRINTF(fmt, args...) \
 -    do { \
 -        if (DEBUG_IMX_PHY) { \
 -            fprintf(stderr, "[%s.phy]%s: " fmt , TYPE_IMX_FEC, \
 -                                                 __func__, ##args); \
 -        } \
 -    } while (0)
 -
  #define IMX_MAX_DESC    1024
  static const char *imx_default_reg_name(IMXFECState *s, uint32_t index)
@@ -XXX,XX +XXX,XX @@ static void imx_eth_update(IMXFECState *s);
   * For now we don't handle any GPIO/interrupt line, so the OS will
   * have to poll for the PHY status.
   */
--static void phy_update_irq(IMXFECState *s)
+ static bool tlb_force_broadcast(CPUARMState *env)
 +static void imx_phy_update_irq(IMXFECState *s)
  {
-     imx_eth_update(s);
+-    return (env->cp15.hcr_el2 & HCR_FB) &&
 -        arm_current_el(env) == 1 && arm_is_secure_below_el3(env);
 +    return arm_current_el(env) == 1 && (arm_hcr_el2_eff(env) & HCR_FB);
  }
--static void phy_update_link(IMXFECState *s)
+ static void tlbiall_write(CPUARMState *env, const ARMCPRegInfo *ri,
 +static void imx_phy_update_link(IMXFECState *s)
  {
      /* Autonegotiation status mirrors link status.  */
      if (qemu_get_queue(s->nic)->link_down) {
 -        PHY_PRINTF("link is down\n");
 +        trace_imx_phy_update_link("down");
          s->phy_status &= ~0x0024;
          s->phy_int |= PHY_INT_DOWN;
      } else {
 -        PHY_PRINTF("link is up\n");
 +        trace_imx_phy_update_link("up");
          s->phy_status |= 0x0024;
          s->phy_int |= PHY_INT_ENERGYON;
          s->phy_int |= PHY_INT_AUTONEG_COMPLETE;
      }
 -    phy_update_irq(s);
 +    imx_phy_update_irq(s);
  }
  static void imx_eth_set_link(NetClientState *nc)
  {
 -    phy_update_link(IMX_FEC(qemu_get_nic_opaque(nc)));
 +    imx_phy_update_link(IMX_FEC(qemu_get_nic_opaque(nc)));
  }
 -static void phy_reset(IMXFECState *s)
 +static void imx_phy_reset(IMXFECState *s)
  {
 +    trace_imx_phy_reset();
 +
      s->phy_status = 0x7809;
      s->phy_control = 0x3000;
      s->phy_advertise = 0x01e1;
      s->phy_int_mask = 0;
      s->phy_int = 0;
 -    phy_update_link(s);
 +    imx_phy_update_link(s);
  }
 -static uint32_t do_phy_read(IMXFECState *s, int reg)
 +static uint32_t imx_phy_read(IMXFECState *s, int reg)
  {
      uint32_t val;
@@ -XXX,XX +XXX,XX @@ static uint32_t do_phy_read(IMXFECState *s, int reg)
      case 29:    /* Interrupt source.  */
          val = s->phy_int;
          s->phy_int = 0;
 -        phy_update_irq(s);
 +        imx_phy_update_irq(s);
          break;
      case 30:    /* Interrupt mask */
          val = s->phy_int_mask;
@@ -XXX,XX +XXX,XX @@ static uint32_t do_phy_read(IMXFECState *s, int reg)
          break;
      }
 -    PHY_PRINTF("read 0x%04x @ %d\n", val, reg);
 +    trace_imx_phy_read(val, reg);
      return val;
  }
 -static void do_phy_write(IMXFECState *s, int reg, uint32_t val)
 +static void imx_phy_write(IMXFECState *s, int reg, uint32_t val)
  {
 -    PHY_PRINTF("write 0x%04x @ %d\n", val, reg);
 +    trace_imx_phy_write(val, reg);
      if (reg > 31) {
          /* we only advertise one phy */
@@ -XXX,XX +XXX,XX @@ static void do_phy_write(IMXFECState *s, int reg, uint32_t val)
      switch (reg) {
      case 0:     /* Basic Control */
          if (val & 0x8000) {
 -            phy_reset(s);
 +            imx_phy_reset(s);
          } else {
              s->phy_control = val & 0x7980;
              /* Complete autonegotiation immediately.  */
@@ -XXX,XX +XXX,XX @@ static void do_phy_write(IMXFECState *s, int reg, uint32_t val)
          break;
      case 30:    /* Interrupt mask */
          s->phy_int_mask = val & 0xff;
 -        phy_update_irq(s);
 +        imx_phy_update_irq(s);
          break;
      case 17:
      case 18:
@@ -XXX,XX +XXX,XX @@ static void do_phy_write(IMXFECState *s, int reg, uint32_t val)
  static void imx_fec_read_bd(IMXFECBufDesc *bd, dma_addr_t addr)
  {
      dma_memory_read(&address_space_memory, addr, bd, sizeof(*bd));
 +
 +    trace_imx_fec_read_bd(addr, bd->flags, bd->length, bd->data);
  }
  static void imx_fec_write_bd(IMXFECBufDesc *bd, dma_addr_t addr)
@@ -XXX,XX +XXX,XX @@ static void imx_fec_write_bd(IMXFECBufDesc *bd, dma_addr_t addr)
  static void imx_enet_read_bd(IMXENETBufDesc *bd, dma_addr_t addr)
  {
      dma_memory_read(&address_space_memory, addr, bd, sizeof(*bd));
 +
 +    trace_imx_enet_read_bd(addr, bd->flags, bd->length, bd->data,
 +                   bd->option, bd->status);
  }
  static void imx_enet_write_bd(IMXENETBufDesc *bd, dma_addr_t addr)
@@ -XXX,XX +XXX,XX @@ static void imx_fec_do_tx(IMXFECState *s)
          int len;
          imx_fec_read_bd(&bd, addr);
 -        FEC_PRINTF("tx_bd %x flags %04x len %d data %08x\n",
 -                   addr, bd.flags, bd.length, bd.data);
          if ((bd.flags & ENET_BD_R) == 0) {
 +
              /* Run out of descriptors to transmit.  */
 -            FEC_PRINTF("tx_bd ran out of descriptors to transmit\n");
 +            trace_imx_eth_tx_bd_busy();
 +
              break;
          }
          len = bd.length;
@@ -XXX,XX +XXX,XX @@ static void imx_enet_do_tx(IMXFECState *s, uint32_t index)
          int len;
          imx_enet_read_bd(&bd, addr);
 -        FEC_PRINTF("tx_bd %x flags %04x len %d data %08x option %04x "
 -                   "status %04x\n", addr, bd.flags, bd.length, bd.data,
 -                   bd.option, bd.status);
          if ((bd.flags & ENET_BD_R) == 0) {
              /* Run out of descriptors to transmit.  */
 +
 +            trace_imx_eth_tx_bd_busy();
 +
              break;
          }
          len = bd.length;
@@ -XXX,XX +XXX,XX @@ static void imx_eth_enable_rx(IMXFECState *s, bool flush)
      s->regs[ENET_RDAR] = (bd.flags & ENET_BD_E) ? ENET_RDAR_RDAR : 0;
      if (!s->regs[ENET_RDAR]) {
 -        FEC_PRINTF("RX buffer full\n");
 +        trace_imx_eth_rx_bd_full();
      } else if (flush) {
          qemu_flush_queued_packets(qemu_get_queue(s->nic));
      }
@@ -XXX,XX +XXX,XX @@ static void imx_eth_reset(DeviceState *d)
      memset(s->tx_descriptor, 0, sizeof(s->tx_descriptor));
      /* We also reset the PHY */
 -    phy_reset(s);
 +    imx_phy_reset(s);
  }
  static uint32_t imx_default_read(IMXFECState *s, uint32_t index)
@@ -XXX,XX +XXX,XX @@ static uint64_t imx_eth_read(void *opaque, hwaddr offset, unsigned size)
          break;
      }
 -    FEC_PRINTF("reg[%s] => 0x%" PRIx32 "\n", imx_eth_reg_name(s, index),
 -                                              value);
 +    trace_imx_eth_read(index, imx_eth_reg_name(s, index), value);
      return value;
  }
@@ -XXX,XX +XXX,XX @@ static void imx_eth_write(void *opaque, hwaddr offset, uint64_t value,
      const bool single_tx_ring = !imx_eth_is_multi_tx_ring(s);
      uint32_t index = offset >> 2;
 -    FEC_PRINTF("reg[%s] <= 0x%" PRIx32 "\n", imx_eth_reg_name(s, index),
 -                (uint32_t)value);
 +    trace_imx_eth_write(index, imx_eth_reg_name(s, index), value);
      switch (index) {
      case ENET_EIR:
@@ -XXX,XX +XXX,XX @@ static void imx_eth_write(void *opaque, hwaddr offset, uint64_t value,
          if (extract32(value, 29, 1)) {
              /* This is a read operation */
              s->regs[ENET_MMFR] = deposit32(s->regs[ENET_MMFR], 0, 16,
 -                                           do_phy_read(s,
 +                                           imx_phy_read(s,
                                                         extract32(value,
                                                                     18, 10)));
          } else {
              /* This a write operation */
 -            do_phy_write(s, extract32(value, 18, 10), extract32(value, 0, 16));
 +            imx_phy_write(s, extract32(value, 18, 10), extract32(value, 0, 16));
          }
          /* raise the interrupt as the PHY operation is done */
          s->regs[ENET_EIR] |= ENET_INT_MII;
@@ -XXX,XX +XXX,XX @@ static bool imx_eth_can_receive(NetClientState *nc)
  {
      IMXFECState *s = IMX_FEC(qemu_get_nic_opaque(nc));
 -    FEC_PRINTF("\n");
 -
      return !!s->regs[ENET_RDAR];
  }
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_fec_receive(NetClientState *nc, const uint8_t *buf,
      unsigned int buf_len;
      size_t size = len;
 -    FEC_PRINTF("len %d\n", (int)size);
 +    trace_imx_fec_receive(size);
      if (!s->regs[ENET_RDAR]) {
          qemu_log_mask(LOG_GUEST_ERROR, "[%s]%s: Unexpected packet\n",
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_fec_receive(NetClientState *nc, const uint8_t *buf,
          bd.length = buf_len;
          size -= buf_len;
 -        FEC_PRINTF("rx_bd 0x%x length %d\n", addr, bd.length);
 +        trace_imx_fec_receive_len(addr, bd.length);
          /* The last 4 bytes are the CRC.  */
          if (size < 4) {
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_fec_receive(NetClientState *nc, const uint8_t *buf,
          if (size == 0) {
              /* Last buffer in frame.  */
              bd.flags |= flags | ENET_BD_L;
 -            FEC_PRINTF("rx frame flags %04x\n", bd.flags);
 +
 +            trace_imx_fec_receive_last(bd.flags);
 +
              s->regs[ENET_EIR] |= ENET_INT_RXF;
          } else {
              s->regs[ENET_EIR] |= ENET_INT_RXB;
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_enet_receive(NetClientState *nc, const uint8_t *buf,
      size_t size = len;
      bool shift16 = s->regs[ENET_RACC] & ENET_RACC_SHIFT16;
 -    FEC_PRINTF("len %d\n", (int)size);
 +    trace_imx_enet_receive(size);
      if (!s->regs[ENET_RDAR]) {
          qemu_log_mask(LOG_GUEST_ERROR, "[%s]%s: Unexpected packet\n",
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_enet_receive(NetClientState *nc, const uint8_t *buf,
          bd.length = buf_len;
          size -= buf_len;
 -        FEC_PRINTF("rx_bd 0x%x length %d\n", addr, bd.length);
 +        trace_imx_enet_receive_len(addr, bd.length);
          /* The last 4 bytes are the CRC.  */
          if (size < 4) {
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_enet_receive(NetClientState *nc, const uint8_t *buf,
          if (size == 0) {
              /* Last buffer in frame.  */
              bd.flags |= flags | ENET_BD_L;
 -            FEC_PRINTF("rx frame flags %04x\n", bd.flags);
 +
 +            trace_imx_enet_receive_last(bd.flags);
 +
              /* Indicate that we've updated the last buffer descriptor. */
              bd.last_buffer = ENET_BD_BDU;
              if (bd.option & ENET_BD_RX_INT) {
 diff --git a/hw/net/trace-events b/hw/net/trace-events
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/net/trace-events
 +++ b/hw/net/trace-events
@@ -XXX,XX +XXX,XX @@ i82596_receive_packet(size_t sz) "len=%zu"
  i82596_new_mac(const char *id_with_mac) "New MAC for: %s"
  i82596_set_multicast(uint16_t count) "Added %d multicast entries"
  i82596_channel_attention(void *s) "%p: Received CHANNEL ATTENTION"
 +
 +# imx_fec.c
 +imx_phy_read(uint32_t val, int reg) "0x%04"PRIx32" <= reg[%d]"
 +imx_phy_write(uint32_t val, int reg) "0x%04"PRIx32" => reg[%d]"
 +imx_phy_update_link(const char *s) "%s"
 +imx_phy_reset(void) ""
 +imx_fec_read_bd(uint64_t addr, int flags, int len, int data) "tx_bd 0x%"PRIx64" flags 0x%04x len %d data 0x%08x"
 +imx_enet_read_bd(uint64_t addr, int flags, int len, int data, int options, int status) "tx_bd 0x%"PRIx64" flags 0x%04x len %d data 0x%08x option 0x%04x status 0x%04x"
 +imx_eth_tx_bd_busy(void) "tx_bd ran out of descriptors to transmit"
 +imx_eth_rx_bd_full(void) "RX buffer is full"
 +imx_eth_read(int reg, const char *reg_name, uint32_t value) "reg[%d:%s] => 0x%08"PRIx32
 +imx_eth_write(int reg, const char *reg_name, uint64_t value) "reg[%d:%s] <= 0x%08"PRIx64
 +imx_fec_receive(size_t size) "len %zu"
 +imx_fec_receive_len(uint64_t addr, int len) "rx_bd 0x%"PRIx64" length %d"
 +imx_fec_receive_last(int last) "rx frame flags 0x%04x"
 +imx_enet_receive(size_t size) "len %zu"
 +imx_enet_receive_len(uint64_t addr, int len) "rx_bd 0x%"PRIx64" length %d"
 +imx_enet_receive_last(int last) "rx frame flags 0x%04x"
 --
 .20.1
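
The tlb_force_broadcast() change in PULL 14/26 fixes an inverted condition: the old code only honoured HCR_EL2.FB in Secure state, which is exactly where HCR_EL2 should not apply. The toy model below contrasts the old and new predicates. It assumes HCR_EL2.FB is bit 9 and simplifies arm_hcr_el2_eff() to "HCR_EL2 is ignored in Secure state" (no Secure EL2, and the HCR_EL2.TGE adjustments are left out). ToyCPU and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: HCR_EL2.FB is bit 9. */
#define TOY_HCR_FB (UINT64_C(1) << 9)

typedef struct {
    int el;           /* current exception level, 0..3 */
    bool secure;      /* executing in Secure state below EL3 */
    uint64_t hcr_el2; /* raw HCR_EL2 value */
} ToyCPU;

/*
 * Simplified stand-in for arm_hcr_el2_eff(): without Secure EL2, HCR_EL2
 * has no effect in Secure state. Other adjustments (e.g. TGE) are omitted.
 */
static uint64_t toy_hcr_el2_eff(const ToyCPU *c)
{
    return c->secure ? 0 : c->hcr_el2;
}

/* Old predicate: raw FB, EL1, and Secure state (the wrong way round). */
static bool force_broadcast_old(const ToyCPU *c)
{
    return (c->hcr_el2 & TOY_HCR_FB) && c->el == 1 && c->secure;
}

/* New predicate: EL1 and the effective HCR_EL2.FB is set. */
static bool force_broadcast_new(const ToyCPU *c)
{
    assert(c->el >= 0 && c->el <= 3);
    return c->el == 1 && (toy_hcr_el2_eff(c) & TOY_HCR_FB) != 0;
}
```

A Non-secure EL1 guest with FB set now gets its TLB maintenance upgraded to the Inner Shareable variants, while Secure EL1 no longer does.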

-[PULL 18/23] hw/misc/imx6ul_ccm: Implement non writable bits in CCM registers
+[PULL 15/26] target/arm: fix LORID_EL1 access check
-From: Jean-Christophe Dubois <jcd@tribudubois.net>
+From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
-Some bits of the CCM registers are non writable.
+Secure mode is not exempted from checking SCR_EL3.TLOR, and in the
 future HCR_EL2.TLOR when S-EL2 is enabled.
-This was left undone in the initial commit (all bits of registers were
+Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
 writable).
 This patch adds the required code to protect the non writable bits.
 Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
 Message-id: 20200608133508.550046-1-jcd@tribudubois.net
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/misc/imx6ul_ccm.c | 76 ++++++++++++++++++++++++++++++++++++--------
+ target/arm/helper.c | 19 +++++--------------
- 1 file changed, 63 insertions(+), 13 deletions(-)
+ 1 file changed, 5 insertions(+), 14 deletions(-)
-diff --git a/hw/misc/imx6ul_ccm.c b/hw/misc/imx6ul_ccm.c
+diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/misc/imx6ul_ccm.c
+--- a/target/arm/helper.c
-+++ b/hw/misc/imx6ul_ccm.c
++++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static uint64_t id_aa64pfr0_read(CPUARMState *env, const ARMCPRegInfo *ri)
+ #endif
- #include "trace.h"
+ /* Shared logic between LORID and the rest of the LOR* registers.
-+static const uint32_t ccm_mask[CCM_MAX] = {
+- * Secure state has already been delt with.
-+    [CCM_CCR] = 0xf01fef80,
++ * Secure state exclusion has already been dealt with.
-+    [CCM_CCDR] = 0xfffeffff,
+  */
-+    [CCM_CSR] = 0xffffffff,
+-static CPAccessResult access_lor_ns(CPUARMState *env)
-+    [CCM_CCSR] = 0xfffffef2,
++static CPAccessResult access_lor_ns(CPUARMState *env,
-+    [CCM_CACRR] = 0xfffffff8,
++                                    const ARMCPRegInfo *ri, bool isread)
 +    [CCM_CBCDR] = 0xc1f8e000,
 +    [CCM_CBCMR] = 0xfc03cfff,
 +    [CCM_CSCMR1] = 0x80700000,
 +    [CCM_CSCMR2] = 0xe01ff003,
 +    [CCM_CSCDR1] = 0xfe00c780,
 +    [CCM_CS1CDR] = 0xfe00fe00,
 +    [CCM_CS2CDR] = 0xf8007000,
 +    [CCM_CDCDR] = 0xf00fffff,
 +    [CCM_CHSCCDR] = 0xfffc01ff,
 +    [CCM_CSCDR2] = 0xfe0001ff,
 +    [CCM_CSCDR3] = 0xffffc1ff,
 +    [CCM_CDHIPR] = 0xffffffff,
 +    [CCM_CTOR] = 0x00000000,
 +    [CCM_CLPCR] = 0xf39ff01c,
 +    [CCM_CISR] = 0xfb85ffbe,
 +    [CCM_CIMR] = 0xfb85ffbf,
 +    [CCM_CCOSR] = 0xfe00fe00,
 +    [CCM_CGPR] = 0xfffc3fea,
 +    [CCM_CCGR0] = 0x00000000,
 +    [CCM_CCGR1] = 0x00000000,
 +    [CCM_CCGR2] = 0x00000000,
 +    [CCM_CCGR3] = 0x00000000,
 +    [CCM_CCGR4] = 0x00000000,
 +    [CCM_CCGR5] = 0x00000000,
 +    [CCM_CCGR6] = 0x00000000,
 +    [CCM_CMEOR] = 0xafffff1f,
 +};
 +
 +static const uint32_t analog_mask[CCM_ANALOG_MAX] = {
 +    [CCM_ANALOG_PLL_ARM] = 0xfff60f80,
 +    [CCM_ANALOG_PLL_USB1] = 0xfffe0fbc,
 +    [CCM_ANALOG_PLL_USB2] = 0xfffe0fbc,
 +    [CCM_ANALOG_PLL_SYS] = 0xfffa0ffe,
 +    [CCM_ANALOG_PLL_SYS_SS] = 0x00000000,
 +    [CCM_ANALOG_PLL_SYS_NUM] = 0xc0000000,
 +    [CCM_ANALOG_PLL_SYS_DENOM] = 0xc0000000,
 +    [CCM_ANALOG_PLL_AUDIO] = 0xffe20f80,
 +    [CCM_ANALOG_PLL_AUDIO_NUM] = 0xc0000000,
 +    [CCM_ANALOG_PLL_AUDIO_DENOM] = 0xc0000000,
 +    [CCM_ANALOG_PLL_VIDEO] = 0xffe20f80,
 +    [CCM_ANALOG_PLL_VIDEO_NUM] = 0xc0000000,
 +    [CCM_ANALOG_PLL_VIDEO_DENOM] = 0xc0000000,
 +    [CCM_ANALOG_PLL_ENET] = 0xffc20ff0,
 +    [CCM_ANALOG_PFD_480] = 0x40404040,
 +    [CCM_ANALOG_PFD_528] = 0x40404040,
 +    [PMU_MISC0] = 0x01fe8306,
 +    [PMU_MISC1] = 0x07fcede0,
 +    [PMU_MISC2] = 0x005f5f5f,
 +};
 +
  static const char *imx6ul_ccm_reg_name(uint32_t reg)
  {
-     static char unknown[20];
+     int el = arm_current_el(env);
-@@ -XXX,XX +XXX,XX @@ static void imx6ul_ccm_write(void *opaque, hwaddr offset, uint64_t value,
+@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_lor_ns(CPUARMState *env)
-     trace_ccm_write_reg(imx6ul_ccm_reg_name(index), (uint32_t)value);
+     return CP_ACCESS_OK;
 -    /*
 -     * We will do a better implementation later. In particular some bits
 -     * cannot be written to.
 -     */
 -    s->ccm[index] = (uint32_t)value;
 +    s->ccm[index] = (s->ccm[index] & ccm_mask[index]) |
 +                           ((uint32_t)value & ~ccm_mask[index]);
  }
- static uint64_t imx6ul_analog_read(void *opaque, hwaddr offset, unsigned size)
+-static CPAccessResult access_lorid(CPUARMState *env, const ARMCPRegInfo *ri,
-@@ -XXX,XX +XXX,XX @@ static void imx6ul_analog_write(void *opaque, hwaddr offset, uint64_t value,
+-                                   bool isread)
-          * the REG_NAME register. So we change the value of the
+-{
-          * REG_NAME register, setting bits passed in the value.
+-    if (arm_is_secure_below_el3(env)) {
-          */
+-        /* Access ok in secure mode.  */
--        s->analog[index - 1] |= value;
+-        return CP_ACCESS_OK;
-+        s->analog[index - 1] |= (value & ~analog_mask[index - 1]);
+-    }
-         break;
+-    return access_lor_ns(env);
-     case CCM_ANALOG_PLL_ARM_CLR:
+-}
-     case CCM_ANALOG_PLL_USB1_CLR:
+-
-@@ -XXX,XX +XXX,XX @@ static void imx6ul_analog_write(void *opaque, hwaddr offset, uint64_t value,
+ static CPAccessResult access_lor_other(CPUARMState *env,
-          * the REG_NAME register. So we change the value of the
+                                        const ARMCPRegInfo *ri, bool isread)
-          * REG_NAME register, unsetting bits passed in the value.
+ {
-          */
+@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_lor_other(CPUARMState *env,
--        s->analog[index - 2] &= ~value;
+         /* Access denied in secure mode.  */
-+        s->analog[index - 2] &= ~(value & ~analog_mask[index - 2]);
+         return CP_ACCESS_TRAP;
          break;
      case CCM_ANALOG_PLL_ARM_TOG:
      case CCM_ANALOG_PLL_USB1_TOG:
@@ -XXX,XX +XXX,XX @@ static void imx6ul_analog_write(void *opaque, hwaddr offset, uint64_t value,
           * the REG_NAME register. So we change the value of the
           * REG_NAME register, toggling bits passed in the value.
           */
 -        s->analog[index - 3] ^= value;
 +        s->analog[index - 3] ^= (value & ~analog_mask[index - 3]);
          break;
      default:
 -        /*
 -         * We will do a better implementation later. In particular some bits
 -         * cannot be written to.
 -         */
 -        s->analog[index] = value;
 +        s->analog[index] = (s->analog[index] & analog_mask[index]) |
 +                           (value & ~analog_mask[index]);
          break;
      }
+-    return access_lor_ns(env);
++    return access_lor_ns(env, ri, isread);
  }
+ /*
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo lor_reginfo[] = {
+       .type = ARM_CP_CONST, .resetvalue = 0 },
+     { .name = "LORID_EL1", .state = ARM_CP_STATE_AA64,
+       .opc0 = 3, .opc1 = 0, .crn = 10, .crm = 4, .opc2 = 7,
+-      .access = PL1_R, .accessfn = access_lorid,
++      .access = PL1_R, .accessfn = access_lor_ns,
+       .type = ARM_CP_CONST, .resetvalue = 0 },
+     REGINFO_SENTINEL
+ };
 --
 .20.1
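
PULL 15/26 drops the Secure-state exemption for LORID_EL1 and routes it through access_lor_ns() like the other LOR* registers. The body of access_lor_ns() is only partly visible in the hunk, so the model below assumes its check order: the effective HCR_EL2.TLOR traps EL0/EL1 accesses to EL2, then SCR_EL3.TLOR traps EL0 to EL2 accesses to EL3. The trap bits are passed in as booleans, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { TOY_ACCESS_OK, TOY_TRAP_EL2, TOY_TRAP_EL3 } ToyAccess;

/* Assumed shape of access_lor_ns(): EL2 trap first, then EL3 trap. */
static ToyAccess toy_access_lor_ns(int el, bool hcr_eff_tlor, bool scr_tlor)
{
    assert(el >= 0 && el <= 3);
    if (el < 2 && hcr_eff_tlor) {
        return TOY_TRAP_EL2;
    }
    if (el < 3 && scr_tlor) {
        return TOY_TRAP_EL3;
    }
    return TOY_ACCESS_OK;
}

/* Old access_lorid(): Secure state below EL3 skipped the checks entirely. */
static ToyAccess toy_access_lorid_old(int el, bool secure,
                                      bool hcr_eff_tlor, bool scr_tlor)
{
    if (secure && el < 3) {
        return TOY_ACCESS_OK;
    }
    return toy_access_lor_ns(el, hcr_eff_tlor, scr_tlor);
}
```

The visible difference is Secure EL1 with SCR_EL3.TLOR set: the old accessor allowed the read, while the new one traps to EL3.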

-New patch
+[PULL 16/26] disas/capstone: Fix monitor disassembly of >32 bytes
+If we're using the capstone disassembler, disassembly of a run of
+instructions more than 32 bytes long disassembles the wrong data for
+instructions beyond the 32 byte mark:
+(qemu) xp /16x 0x100
+0000000000000100: 0x00000005 0x54410001 0x00000001 0x00001000
+0000000000000110: 0x00000000 0x00000004 0x54410002 0x3c000000
+0000000000000120: 0x00000000 0x00000004 0x54410009 0x74736574
+0000000000000130: 0x00000000 0x00000000 0x00000000 0x00000000
+(qemu) xp /16i 0x100
+0x00000100: 00000005 andeq r0, r0, r5
+0x00000104: 54410001 strbpl r0, [r1], #-1
+0x00000108: 00000001 andeq r0, r0, r1
+0x0000010c: 00001000 andeq r1, r0, r0
+0x00000110: 00000000 andeq r0, r0, r0
+0x00000114: 00000004 andeq r0, r0, r4
+0x00000118: 54410002 strbpl r0, [r1], #-2
+0x0000011c: 3c000000 .byte 0x00, 0x00, 0x00, 0x3c
+0x00000120: 54410001 strbpl r0, [r1], #-1
+0x00000124: 00000001 andeq r0, r0, r1
+0x00000128: 00001000 andeq r1, r0, r0
+0x0000012c: 00000000 andeq r0, r0, r0
+0x00000130: 00000004 andeq r0, r0, r4
+0x00000134: 54410002 strbpl r0, [r1], #-2
+0x00000138: 3c000000 .byte 0x00, 0x00, 0x00, 0x3c
+0x0000013c: 00000000 andeq r0, r0, r0
+Here the disassembly of 0x120..0x13f is using the data that is in
+0x104..0x123.
+This is caused by passing the wrong value to the read_memory_func().
+The intention is that at this point in the loop the 'cap_buf' buffer
+already contains 'csize' bytes of data for the instruction at guest
+addr 'pc', and we want to read in an extra 'tsize' bytes.  Those
+extra bytes are therefore at 'pc + csize', not 'pc'.  On the first
+time through the loop 'csize' happens to be zero, so the initial read
+of 32 bytes into cap_buf is correct and as long as the disassembly
+never needs to read more data we return the correct information.
+Use the correct guest address in the call to read_memory_func().
+Cc: qemu-stable@nongnu.org
+Fixes: https://bugs.launchpad.net/qemu/+bug/1900779
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Message-id: 20201022132445.25039-1-peter.maydell@linaro.org
+---
+ disas/capstone.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+diff --git a/disas/capstone.c b/disas/capstone.c
+index XXXXXXX..XXXXXXX 100644
+--- a/disas/capstone.c
++++ b/disas/capstone.c
+@@ -XXX,XX +XXX,XX @@ bool cap_disas_monitor(disassemble_info *info, uint64_t pc, int count)
+         /* Make certain that we can make progress.  */
+         assert(tsize != 0);
+-        info->read_memory_func(pc, cap_buf + csize, tsize, info);
++        info->read_memory_func(pc + csize, cap_buf + csize, tsize, info);
+         csize += tsize;
+         if (cs_disasm_iter(handle, &cbuf, &csize, &pc, insn)) {
+--
+.20.1
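
The refill bug fixed in PULL 16/26 can be reproduced with a toy model of the loop: a 32-byte window, fixed 4-byte instructions, and a "decoder" that simply reads the little-endian word at the front of the window in place of cs_disasm_iter(). toy_disas() and its parameters are hypothetical, and the real loop's details may differ. With fixed set to 0, it tops up from pc instead of pc + csize, which reproduces the symptom in the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_CAP_BUF_SIZE 32  /* the 32-byte window from the commit message */
#define TOY_INSN_SIZE     4  /* assumption: fixed-width 4-byte instructions */

/*
 * Toy version of the monitor disassembly refill loop. 'guest' is guest
 * memory, 'pc' the start address, 'count' the number of instructions.
 * Each "decoded" instruction is the little-endian word at pc.
 * fixed == 0 reproduces the bug (top-up read from pc, not pc + csize).
 */
static void toy_disas(const uint8_t *guest, size_t guest_len, uint64_t pc,
                      int count, uint32_t *out, int fixed)
{
    uint8_t cap_buf[TOY_CAP_BUF_SIZE];
    size_t csize = 0;  /* bytes already buffered for the insn at pc */

    for (int n = 0; n < count; n++) {
        /* Top up the window, but never past the last byte we need. */
        size_t remaining = (size_t)(count - n) * TOY_INSN_SIZE - csize;
        size_t tsize = TOY_CAP_BUF_SIZE - csize;
        if (tsize > remaining) {
            tsize = remaining;
        }
        uint64_t src = fixed ? pc + csize : pc;  /* the one-line fix */
        assert(src + tsize <= guest_len);
        memcpy(cap_buf + csize, guest + src, tsize);
        csize += tsize;

        /* "Disassemble" one instruction and advance, like cs_disasm_iter(). */
        out[n] = (uint32_t)cap_buf[0] | ((uint32_t)cap_buf[1] << 8) |
                 ((uint32_t)cap_buf[2] << 16) | ((uint32_t)cap_buf[3] << 24);
        pc += TOY_INSN_SIZE;
        csize -= TOY_INSN_SIZE;
        memmove(cap_buf, cap_buf + TOY_INSN_SIZE, csize);
    }
}
```

With sixteen words 0x1000..0x100f in guest memory starting at address 0, the fixed loop returns them in order. The buggy loop returns 0x1001 for the ninth instruction, just as the monitor showed the data from 0x104 at address 0x120.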

-[PULL 19/23] Implement configurable descriptor size in ftgmac100
+[PULL 17/26] hw/arm/smmuv3: Fix potential integer overflow (CID 1432363)
-From: Erik Smit <erik.lucas.smit@gmail.com>
+From: Philippe Mathieu-Daudé <philmd@redhat.com>
-The hardware supports configurable descriptor sizes, configured in the DBLAC
+Use the BIT_ULL() macro to ensure we use 64-bit arithmetic.
-register.
+This fixes the following Coverity issue (OVERFLOW_BEFORE_WIDEN):
-Most drivers use the default 4 word descriptor, which is currently hardcoded,
+  CID 1432363 (#1 of 1): Unintentional integer overflow:
 but Aspeed SDK configures 8 words to store extra data.
-Signed-off-by: Erik Smit <erik.lucas.smit@gmail.com>
+  overflow_before_widen:
-Reviewed-by: Cédric Le Goater <clg@kaod.org>
+    Potentially overflowing expression 1 << scale with type int
-[PMM: removed unnecessary parens]
+    (32 bits, signed) is evaluated using 32-bit arithmetic, and
     then used in a context that expects an expression of type
     hwaddr (64 bits, unsigned).
 Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
 Acked-by: Eric Auger <eric.auger@redhat.com>
 Message-id: 20201030144617.1535064-1-philmd@redhat.com
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/net/ftgmac100.c | 26 ++++++++++++++++++++++++--
+ hw/arm/smmuv3.c | 3 ++-
- 1 file changed, 24 insertions(+), 2 deletions(-)
+ 1 file changed, 2 insertions(+), 1 deletion(-)
-diff --git a/hw/net/ftgmac100.c b/hw/net/ftgmac100.c
+diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/net/ftgmac100.c
+--- a/hw/arm/smmuv3.c
-+++ b/hw/net/ftgmac100.c
++++ b/hw/arm/smmuv3.c
 @@ -XXX,XX +XXX,XX @@
- #define FTGMAC100_APTC_TXPOLL_CNT(x)        (((x) >> 8) & 0xf)
- #define FTGMAC100_APTC_TXPOLL_TIME_SEL      (1 << 12)
-+/*
-+ * DMA burst length and arbitration control register
-+ */
-+#define FTGMAC100_DBLAC_RXBURST_SIZE(x)     (((x) >> 8) & 0x3)
-+#define FTGMAC100_DBLAC_TXBURST_SIZE(x)     (((x) >> 10) & 0x3)
-+#define FTGMAC100_DBLAC_RXDES_SIZE(x)       ((((x) >> 12) & 0xf) * 8)
-+#define FTGMAC100_DBLAC_TXDES_SIZE(x)       ((((x) >> 16) & 0xf) * 8)
-+#define FTGMAC100_DBLAC_IFG_CNT(x)          (((x) >> 20) & 0x7)
-+#define FTGMAC100_DBLAC_IFG_INC             (1 << 23)
-+
- /*
-  * PHY control register
   */
-@@ -XXX,XX +XXX,XX @@ static void ftgmac100_do_tx(FTGMAC100State *s, uint32_t tx_ring,
-         if (bd.des0 & s->txdes0_edotr) {
+ #include "qemu/osdep.h"
-             addr = tx_ring;
++#include "qemu/bitops.h"
-         } else {
+ #include "hw/irq.h"
--            addr += sizeof(FTGMAC100Desc);
+ #include "hw/sysbus.h"
-+            addr += FTGMAC100_DBLAC_TXDES_SIZE(s->dblac);
+ #include "migration/vmstate.h"
-         }
+@@ -XXX,XX +XXX,XX @@ static void smmuv3_s1_range_inval(SMMUState *s, Cmd *cmd)
          scale = CMD_SCALE(cmd);
          num = CMD_NUM(cmd);
          ttl = CMD_TTL(cmd);
 -        num_pages = (num + 1) * (1 << (scale));
 +        num_pages = (num + 1) * BIT_ULL(scale);
      }
-@@ -XXX,XX +XXX,XX @@ static void ftgmac100_write(void *opaque, hwaddr addr,
+     if (type == SMMU_CMD_TLBI_NH_VA) {
          s->phydata = value & 0xffff;
          break;
      case FTGMAC100_DBLAC: /* DMA Burst Length and Arbitration Control */
 +        if (FTGMAC100_DBLAC_TXDES_SIZE(s->dblac) < sizeof(FTGMAC100Desc)) {
 +            qemu_log_mask(LOG_GUEST_ERROR,
 +                          "%s: transmit descriptor too small : %d bytes\n",
 +                          __func__, FTGMAC100_DBLAC_TXDES_SIZE(s->dblac));
 +            break;
 +        }
 +        if (FTGMAC100_DBLAC_RXDES_SIZE(s->dblac) < sizeof(FTGMAC100Desc)) {
 +            qemu_log_mask(LOG_GUEST_ERROR,
 +                          "%s: receive descriptor too small : %d bytes\n",
 +                          __func__, FTGMAC100_DBLAC_RXDES_SIZE(s->dblac));
 +            break;
 +        }
          s->dblac = value;
          break;
      case FTGMAC100_REVR:  /* Feature Register */
@@ -XXX,XX +XXX,XX @@ static ssize_t ftgmac100_receive(NetClientState *nc, const uint8_t *buf,
          if (bd.des0 & s->rxdes0_edorr) {
              addr = s->rx_ring;
          } else {
 -            addr += sizeof(FTGMAC100Desc);
 +            addr += FTGMAC100_DBLAC_RXDES_SIZE(s->dblac);
          }
      }
      s->rx_descriptor = addr;
 --
 .20.1
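
Coverity's complaint in PULL 17/26 is that 1 << scale is evaluated as a 32-bit int before being widened to hwaddr, so a large scale overflows (with a 32-bit int, 1 << 31 is already undefined behaviour in C). A minimal sketch of the widened computation follows. TOY_BIT_ULL and toy_range_num_pages() are hypothetical names, TOY_BIT_ULL is assumed to match BIT_ULL() from qemu/bitops.h, and uint64_t stands in for hwaddr.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for BIT_ULL(), assumed to expand to (1ULL << (nr)). */
#define TOY_BIT_ULL(nr) (1ULL << (nr))

/*
 * Pages covered by a range invalidation: (NUM + 1) * 2^SCALE, computed
 * entirely in 64-bit unsigned arithmetic as in the fixed code.
 */
static uint64_t toy_range_num_pages(uint64_t num, unsigned int scale)
{
    assert(scale < 64);  /* shifting a 64-bit value by >= 64 is undefined */
    return (num + 1) * TOY_BIT_ULL(scale);
}
```

For example, NUM = 1 and SCALE = 31 gives 2 * 2^31 = 4294967296 pages, a value that does not fit in 32 bits.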

-[PULL 22/23] sd: sdhci: Implement basic vendor specific register support
+[PULL 18/26] hw/arm/boot: fix SVE for EL3 direct kernel boot
-From: Guenter Roeck <linux@roeck-us.net>
+From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
-The Linux kernel's IMX code now uses vendor specific commands.
+When booting a CPU with EL3 using the -kernel flag, set up CPTR_EL3 so
-This results in endless warnings when booting the Linux kernel.
+that SVE will not trap to EL3.
-sdhci-esdhc-imx 2194000.usdhc: esdhc_wait_for_card_clock_gate_off:
+Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
-    card clock still not gate off in 100us!.
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20201030151541.11976-1-remi@remlab.net
 Implement support for the vendor specific command implemented in IMX hardware
 to be able to avoid this warning.
 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Signed-off-by: Guenter Roeck <linux@roeck-us.net>
 Message-id: 20200603145258.195920-2-linux@roeck-us.net
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/sd/sdhci-internal.h |  5 +++++
+ hw/arm/boot.c | 3 +++
- include/hw/sd/sdhci.h  |  5 +++++
+ 1 file changed, 3 insertions(+)
  hw/sd/sdhci.c          | 18 +++++++++++++++++-
 3 files changed, 27 insertions(+), 1 deletion(-)
-diff --git a/hw/sd/sdhci-internal.h b/hw/sd/sdhci-internal.h
+diff --git a/hw/arm/boot.c b/hw/arm/boot.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/sd/sdhci-internal.h
+--- a/hw/arm/boot.c
-+++ b/hw/sd/sdhci-internal.h
++++ b/hw/arm/boot.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static void do_cpu_reset(void *opaque)
- #define SDHC_CMD_INHIBIT               0x00000001
+                     if (cpu_isar_feature(aa64_mte, cpu)) {
- #define SDHC_DATA_INHIBIT              0x00000002
+                         env->cp15.scr_el3 |= SCR_ATA;
- #define SDHC_DAT_LINE_ACTIVE           0x00000004
+                     }
-+#define SDHC_IMX_CLOCK_GATE_OFF        0x00000080
++                    if (cpu_isar_feature(aa64_sve, cpu)) {
- #define SDHC_DOING_WRITE               0x00000100
++                        env->cp15.cptr_el[3] |= CPTR_EZ;
- #define SDHC_DOING_READ                0x00000200
++                    }
- #define SDHC_SPACE_AVAILABLE           0x00000400
+                     /* AArch64 kernels never boot in secure mode */
-@@ -XXX,XX +XXX,XX @@ extern const VMStateDescription sdhci_vmstate;
+                     assert(!info->secure_boot);
+                     /* This hook is only supported for AArch32 currently:
  #define ESDHC_MIX_CTRL                  0x48
 +
  #define ESDHC_VENDOR_SPEC               0xc0
 +#define ESDHC_IMX_FRC_SDCLK_ON          (1 << 8)
 +
  #define ESDHC_DLL_CTRL                  0x60
  #define ESDHC_TUNING_CTRL               0xcc
@@ -XXX,XX +XXX,XX @@ extern const VMStateDescription sdhci_vmstate;
  #define DEFINE_SDHCI_COMMON_PROPERTIES(_state) \
      DEFINE_PROP_UINT8("sd-spec-version", _state, sd_spec_version, 2), \
      DEFINE_PROP_UINT8("uhs", _state, uhs_mode, UHS_NOT_SUPPORTED), \
 +    DEFINE_PROP_UINT8("vendor", _state, vendor, SDHCI_VENDOR_NONE), \
      \
      /* Capabilities registers provide information on supported
       * features of this specific host controller implementation */ \
 diff --git a/include/hw/sd/sdhci.h b/include/hw/sd/sdhci.h
 index XXXXXXX..XXXXXXX 100644
 --- a/include/hw/sd/sdhci.h
 +++ b/include/hw/sd/sdhci.h
@@ -XXX,XX +XXX,XX @@ typedef struct SDHCIState {
      uint16_t acmd12errsts; /* Auto CMD12 error status register */
      uint16_t hostctl2;     /* Host Control 2 */
      uint64_t admasysaddr;  /* ADMA System Address Register */
 +    uint16_t vendor_spec;  /* Vendor specific register */
      /* Read-only registers */
      uint64_t capareg;      /* Capabilities Register */
@@ -XXX,XX +XXX,XX @@ typedef struct SDHCIState {
      uint32_t quirks;
      uint8_t sd_spec_version;
      uint8_t uhs_mode;
 +    uint8_t vendor;        /* For vendor specific functionality */
  } SDHCIState;
 +#define SDHCI_VENDOR_NONE       0
 +#define SDHCI_VENDOR_IMX        1
 +
  /*
   * Controller does not provide transfer-complete interrupt when not
   * busy.
 diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/sd/sdhci.c
 +++ b/hw/sd/sdhci.c
@@ -XXX,XX +XXX,XX @@ static uint64_t usdhc_read(void *opaque, hwaddr offset, unsigned size)
          }
          break;
 +    case ESDHC_VENDOR_SPEC:
 +        ret = s->vendor_spec;
 +        break;
      case ESDHC_DLL_CTRL:
      case ESDHC_TUNE_CTRL_STATUS:
      case ESDHC_UNDOCUMENTED_REG27:
      case ESDHC_TUNING_CTRL:
 -    case ESDHC_VENDOR_SPEC:
      case ESDHC_MIX_CTRL:
      case ESDHC_WTMK_LVL:
          ret = 0;
@@ -XXX,XX +XXX,XX @@ usdhc_write(void *opaque, hwaddr offset, uint64_t val, unsigned size)
      case ESDHC_UNDOCUMENTED_REG27:
      case ESDHC_TUNING_CTRL:
      case ESDHC_WTMK_LVL:
 +        break;
 +
      case ESDHC_VENDOR_SPEC:
 +        s->vendor_spec = value;
 +        switch (s->vendor) {
 +        case SDHCI_VENDOR_IMX:
 +            if (value & ESDHC_IMX_FRC_SDCLK_ON) {
 +                s->prnsts &= ~SDHC_IMX_CLOCK_GATE_OFF;
 +            } else {
 +                s->prnsts |= SDHC_IMX_CLOCK_GATE_OFF;
 +            }
 +            break;
 +        default:
 +            break;
 +        }
          break;
      case SDHC_HOSTCTL:
 --
 2.20.1
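The ESDHC_VENDOR_SPEC handling above can be tried on its own with the minimal sketch below. Some of it is assumed rather than taken from the patch. The cut-down `struct sdhci_sketch` and `vendor_spec_write()` are hypothetical stand-ins for `SDHCIState` and the `usdhc_write()` case. `SDHC_IMX_CLOCK_GATE_OFF` is also assumed to be bit 7 of the present-state register. Only `ESDHC_IMX_FRC_SDCLK_ON` (bit 8) and the vendor IDs come from the patch.

```c
#include <stdint.h>

/* From the patch: ESDHC_VENDOR_SPEC bit 8 forces the SD clock on. */
#define ESDHC_IMX_FRC_SDCLK_ON   (1u << 8)
/* Assumption: i.MX "clock gate off" bit in PRNSTS; verify against
 * hw/sd/sdhci-internal.h before relying on this value. */
#define SDHC_IMX_CLOCK_GATE_OFF  (1u << 7)

#define SDHCI_VENDOR_NONE 0
#define SDHCI_VENDOR_IMX  1

/* Hypothetical cut-down controller state with only the fields used here. */
struct sdhci_sketch {
    uint8_t  vendor;       /* fixed at realize time via the "vendor" property */
    uint16_t vendor_spec;  /* last value written to ESDHC_VENDOR_SPEC */
    uint32_t prnsts;       /* present-state register */
};

/* Model a guest write to ESDHC_VENDOR_SPEC. */
static void vendor_spec_write(struct sdhci_sketch *s, uint32_t value)
{
    s->vendor_spec = (uint16_t)value;  /* later reads return this value */
    switch (s->vendor) {
    case SDHCI_VENDOR_IMX:
        if (value & ESDHC_IMX_FRC_SDCLK_ON) {
            s->prnsts &= ~SDHC_IMX_CLOCK_GATE_OFF;  /* clock forced on */
        } else {
            s->prnsts |= SDHC_IMX_CLOCK_GATE_OFF;   /* clock may be gated */
        }
        break;
    default:
        break;  /* generic SDHCI: the register is plain storage */
    }
}
```

The register state is read-back storage for every controller. Only instances created with the vendor property set to IMX get the clock-gate side effect, so the generic SDHCI path behaves exactly as before.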

-[PULL 23/23] hw: arm: Set vendor property for IMX SDHCI emulations
+[PULL 19/26] hw/display/omap_lcdc: Fix potential NULL pointer dereference
-From: Guenter Roeck <linux@roeck-us.net>
+From: AlexChen <alex.chen@huawei.com>
-Set vendor property to IMX to enable IMX specific functionality
+In omap_update_display(), the pointer omap_lcd is dereferenced before
-in sdhci code.
+being checked for validity, which may lead to a NULL pointer dereference.
 So move the assignment to surface after checking that omap_lcd is valid,
 and move surface_bits_per_pixel(surface) to after the surface assignment.
-Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Reported-by: Euler Robot <euler.robot@huawei.com>
-Signed-off-by: Guenter Roeck <linux@roeck-us.net>
+Signed-off-by: AlexChen <alex.chen@huawei.com>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Message-id: 5F9CDB8A.9000001@huawei.com
-Message-id: 20200603145258.195920-3-linux@roeck-us.net
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/arm/fsl-imx25.c  | 6 ++++++
+ hw/display/omap_lcdc.c | 10 +++++++---
- hw/arm/fsl-imx6.c   | 6 ++++++
+ 1 file changed, 7 insertions(+), 3 deletions(-)
  hw/arm/fsl-imx6ul.c | 2 ++
  hw/arm/fsl-imx7.c   | 2 ++
  4 files changed, 16 insertions(+)
-diff --git a/hw/arm/fsl-imx25.c b/hw/arm/fsl-imx25.c
+diff --git a/hw/display/omap_lcdc.c b/hw/display/omap_lcdc.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/fsl-imx25.c
+--- a/hw/display/omap_lcdc.c
-+++ b/hw/arm/fsl-imx25.c
++++ b/hw/display/omap_lcdc.c
-@@ -XXX,XX +XXX,XX @@ static void fsl_imx25_realize(DeviceState *dev, Error **errp)
+@@ -XXX,XX +XXX,XX @@ static void omap_lcd_interrupts(struct omap_lcd_panel_s *s)
-                                  &err);
+ static void omap_update_display(void *opaque)
-         object_property_set_uint(OBJECT(&s->esdhc[i]), IMX25_ESDHC_CAPABILITIES,
+ {
-                                  "capareg", &err);
+     struct omap_lcd_panel_s *omap_lcd = (struct omap_lcd_panel_s *) opaque;
-+        object_property_set_uint(OBJECT(&s->esdhc[i]), SDHCI_VENDOR_IMX,
+-    DisplaySurface *surface = qemu_console_surface(omap_lcd->con);
-+                                 "vendor", &err);
++    DisplaySurface *surface;
-+        if (err) {
+     draw_line_func draw_line;
-+            error_propagate(errp, err);
+     int size, height, first, last;
-+            return;
+     int width, linesize, step, bpp, frame_offset;
-+        }
+     hwaddr frame_base;
-         object_property_set_bool(OBJECT(&s->esdhc[i]), true, "realized", &err);
-         if (err) {
+-    if (!omap_lcd || omap_lcd->plm == 1 || !omap_lcd->enable ||
-             error_propagate(errp, err);
+-        !surface_bits_per_pixel(surface)) {
-diff --git a/hw/arm/fsl-imx6.c b/hw/arm/fsl-imx6.c
++    if (!omap_lcd || omap_lcd->plm == 1 || !omap_lcd->enable) {
-index XXXXXXX..XXXXXXX 100644
++        return;
---- a/hw/arm/fsl-imx6.c
++    }
-+++ b/hw/arm/fsl-imx6.c
++
-@@ -XXX,XX +XXX,XX @@ static void fsl_imx6_realize(DeviceState *dev, Error **errp)
++    surface = qemu_console_surface(omap_lcd->con);
-                                  &err);
++    if (!surface_bits_per_pixel(surface)) {
-         object_property_set_uint(OBJECT(&s->esdhc[i]), IMX6_ESDHC_CAPABILITIES,
+         return;
-                                  "capareg", &err);
+     }
 +        object_property_set_uint(OBJECT(&s->esdhc[i]), SDHCI_VENDOR_IMX,
 +                                 "vendor", &err);
 +        if (err) {
 +            error_propagate(errp, err);
 +            return;
 +        }
          object_property_set_bool(OBJECT(&s->esdhc[i]), true, "realized", &err);
          if (err) {
              error_propagate(errp, err);
 diff --git a/hw/arm/fsl-imx6ul.c b/hw/arm/fsl-imx6ul.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/fsl-imx6ul.c
 +++ b/hw/arm/fsl-imx6ul.c
@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_realize(DeviceState *dev, Error **errp)
              FSL_IMX6UL_USDHC2_IRQ,
          };
 +        object_property_set_uint(OBJECT(&s->usdhc[i]), SDHCI_VENDOR_IMX,
 +                                        "vendor", &error_abort);
          object_property_set_bool(OBJECT(&s->usdhc[i]), true, "realized",
                                   &error_abort);
 diff --git a/hw/arm/fsl-imx7.c b/hw/arm/fsl-imx7.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/fsl-imx7.c
 +++ b/hw/arm/fsl-imx7.c
@@ -XXX,XX +XXX,XX @@ static void fsl_imx7_realize(DeviceState *dev, Error **errp)
              FSL_IMX7_USDHC3_IRQ,
          };
 +        object_property_set_uint(OBJECT(&s->usdhc[i]), SDHCI_VENDOR_IMX,
 +                                 "vendor", &error_abort);
          object_property_set_bool(OBJECT(&s->usdhc[i]), true, "realized",
                                   &error_abort);
 --
 2.20.1
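This fix reorders the code so that nothing is derived from a pointer until that pointer has been checked. The exynos4210_fimd change applies the same ordering to global_width. A minimal sketch of the corrected control flow follows. The types `struct panel_sketch` and `struct surface_sketch` are hypothetical stand-ins for `struct omap_lcd_panel_s` and `DisplaySurface`, and `update_display_sketch()` returns whether a redraw would go ahead.

```c
#include <stddef.h>

/* Hypothetical stand-in for DisplaySurface. */
struct surface_sketch {
    int bits_per_pixel;
};

/* Hypothetical stand-in for struct omap_lcd_panel_s. */
struct panel_sketch {
    int plm;
    int enable;
    struct surface_sketch *surface;  /* plays the role of qemu_console_surface(con) */
};

/* Returns 1 if a redraw would proceed, 0 if the update bails out early. */
static int update_display_sketch(void *opaque)
{
    struct panel_sketch *p = opaque;
    struct surface_sketch *surface;  /* declared only; not initialised from p */

    /* Check the pointer (and cheap state) before any dereference... */
    if (!p || p->plm == 1 || !p->enable) {
        return 0;
    }

    /* ...and only then follow it to the surface. */
    surface = p->surface;
    if (!surface->bits_per_pixel) {
        return 0;
    }
    return 1;
}
```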

-[PULL 20/23] target/arm/cpu: adjust virtual time for all KVM arm cpus
+[PULL 20/26] hw/display/exynos4210_fimd: Fix potential NULL pointer dereference
-From: fangying <fangying1@huawei.com>
+From: AlexChen <alex.chen@huawei.com>
-Virtual time adjustment was implemented for virt-5.0 machine type,
+In exynos4210_fimd_update(), the pointer s is dereferenced before
-but the cpu property was enabled only for host-passthrough and max
+being checked for validity, which may lead to a NULL pointer dereference.
-cpu model.  Let's add it for any KVM arm cpu which has the generic
+So move the assignment to global_width after checking that s is valid.
 timer feature enabled.
-Signed-off-by: Ying Fang <fangying1@huawei.com>
+Reported-by: Euler Robot <euler.robot@huawei.com>
-Reviewed-by: Andrew Jones <drjones@redhat.com>
+Signed-off-by: Alex Chen <alex.chen@huawei.com>
-Message-id: 20200608121243.2076-1-fangying1@huawei.com
+Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
-[PMM: minor commit message tweak, removed inaccurate
+Message-id: 5F9F8D88.9030102@huawei.com
  suggested-by tag]
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/cpu.c   |  6 ++++--
+ hw/display/exynos4210_fimd.c | 4 +++-
- target/arm/cpu64.c |  1 -
+ 1 file changed, 3 insertions(+), 1 deletion(-)
  target/arm/kvm.c   | 21 +++++++++++----------
  3 files changed, 15 insertions(+), 13 deletions(-)
-diff --git a/target/arm/cpu.c b/target/arm/cpu.c
+diff --git a/hw/display/exynos4210_fimd.c b/hw/display/exynos4210_fimd.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.c
+--- a/hw/display/exynos4210_fimd.c
-+++ b/target/arm/cpu.c
++++ b/hw/display/exynos4210_fimd.c
-@@ -XXX,XX +XXX,XX @@ void arm_cpu_post_init(Object *obj)
+@@ -XXX,XX +XXX,XX @@ static void exynos4210_fimd_update(void *opaque)
-     if (arm_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER)) {
+     bool blend = false;
-         qdev_property_add_static(DEVICE(cpu), &arm_cpu_gt_cntfrq_property);
+     uint8_t *host_fb_addr;
      bool is_dirty = false;
 -    const int global_width = (s->vidtcon[2] & FIMD_VIDTCON2_SIZE_MASK) + 1;
 +    int global_width;
      if (!s || !s->console || !s->enabled ||
          surface_bits_per_pixel(qemu_console_surface(s->console)) == 0) {
          return;
      }
 +
-+    if (kvm_enabled()) {
++    global_width = (s->vidtcon[2] & FIMD_VIDTCON2_SIZE_MASK) + 1;
-+        kvm_arm_add_vcpu_properties(obj);
+     exynos4210_update_resolution(s);
-+    }
+     surface = qemu_console_surface(s->console);
- }
  static void arm_cpu_finalizefn(Object *obj)
@@ -XXX,XX +XXX,XX @@ static void arm_max_initfn(Object *obj)
      if (kvm_enabled()) {
          kvm_arm_set_cpu_features_from_host(cpu);
 -        kvm_arm_add_vcpu_properties(obj);
      } else {
          cortex_a15_initfn(obj);
@@ -XXX,XX +XXX,XX @@ static void arm_host_initfn(Object *obj)
      if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
          aarch64_add_sve_properties(obj);
      }
 -    kvm_arm_add_vcpu_properties(obj);
      arm_cpu_post_init(obj);
  }
 diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu64.c
 +++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
      if (kvm_enabled()) {
          kvm_arm_set_cpu_features_from_host(cpu);
 -        kvm_arm_add_vcpu_properties(obj);
      } else {
          uint64_t t;
          uint32_t u;
 diff --git a/target/arm/kvm.c b/target/arm/kvm.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/kvm.c
 +++ b/target/arm/kvm.c
@@ -XXX,XX +XXX,XX @@ static void kvm_no_adjvtime_set(Object *obj, bool value, Error **errp)
  /* KVM VCPU properties should be prefixed with "kvm-". */
  void kvm_arm_add_vcpu_properties(Object *obj)
  {
 -    if (!kvm_enabled()) {
 -        return;
 -    }
 +    ARMCPU *cpu = ARM_CPU(obj);
 +    CPUARMState *env = &cpu->env;
 -    ARM_CPU(obj)->kvm_adjvtime = true;
 -    object_property_add_bool(obj, "kvm-no-adjvtime", kvm_no_adjvtime_get,
 -                             kvm_no_adjvtime_set);
 -    object_property_set_description(obj, "kvm-no-adjvtime",
 -                                    "Set on to disable the adjustment of "
 -                                    "the virtual counter. VM stopped time "
 -                                    "will be counted.");
 +    if (arm_feature(env, ARM_FEATURE_GENERIC_TIMER)) {
 +        cpu->kvm_adjvtime = true;
 +        object_property_add_bool(obj, "kvm-no-adjvtime", kvm_no_adjvtime_get,
 +                                 kvm_no_adjvtime_set);
 +        object_property_set_description(obj, "kvm-no-adjvtime",
 +                                        "Set on to disable the adjustment of "
 +                                        "the virtual counter. VM stopped time "
 +                                        "will be counted.");
 +    }
  }
  bool kvm_arm_pmu_supported(CPUState *cpu)
 --
 2.20.1

-[PULL 13/23] target/arm: Convert Neon 2-reg-scalar VQRDMLAH, VQRDMLSH to decodetree
+[PULL 21/26] target/arm: Get correct MMU index for other-security-state
-Convert the VQRDMLAH and VQRDMLSH insns in the 2-reg-scalar
+In arm_v7m_mmu_idx_for_secstate() we get the 'priv' level to pass to
-group to decodetree.
+arm_v7m_mmu_idx_for_secstate_and_priv() by calling arm_current_el().
 This is incorrect when the security state being queried is not the
 current one, because arm_current_el() uses the current security state
 to determine which of the banked CONTROL.nPRIV bits to look at.
 The effect was that if (for instance) Secure state was in privileged
 mode but Non-Secure was not then we would return the wrong MMU index.
 The only places where we are using this function in a way that could
 trigger this bug are for the stack loads during a v8M function-return
 and for the instruction fetch of a v8M SG insn.
 Fix the bug by expanding out the M-profile version of the
 arm_current_el() logic inline so it can use the passed in secstate
 rather than env->v7m.secure.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20201022164408.13214-1-peter.maydell@linaro.org
 ---
- target/arm/neon-dp.decode       |  3 ++
+ target/arm/m_helper.c | 3 ++-
- target/arm/translate-neon.inc.c | 74 +++++++++++++++++++++++++++++++++
+ 1 file changed, 2 insertions(+), 1 deletion(-)
  target/arm/translate.c          | 38 +----------------
  3 files changed, 79 insertions(+), 36 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+diff --git a/target/arm/m_helper.c b/target/arm/m_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
+--- a/target/arm/m_helper.c
-+++ b/target/arm/neon-dp.decode
++++ b/target/arm/m_helper.c
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+@@ -XXX,XX +XXX,XX @@ ARMMMUIdx arm_v7m_mmu_idx_for_secstate_and_priv(CPUARMState *env,
+ /* Return the MMU index for a v7M CPU in the specified security state */
-     VQDMULH_2sc  1111 001 . 1 . .. .... .... 1100 . 1 . 0 .... @2scalar
+ ARMMMUIdx arm_v7m_mmu_idx_for_secstate(CPUARMState *env, bool secstate)
-     VQRDMULH_2sc 1111 001 . 1 . .. .... .... 1101 . 1 . 0 .... @2scalar
+ {
-+
+-    bool priv = arm_current_el(env) != 0;
-+    VQRDMLAH_2sc 1111 001 . 1 . .. .... .... 1110 . 1 . 0 .... @2scalar
++    bool priv = arm_v7m_is_handler_mode(env) ||
-+    VQRDMLSH_2sc 1111 001 . 1 . .. .... .... 1111 . 1 . 0 .... @2scalar
++        !(env->v7m.control[secstate] & 1);
-   ]
      return arm_v7m_mmu_idx_for_secstate_and_priv(env, secstate, priv);
  }
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
-+++ b/target/arm/translate-neon.inc.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VQRDMULH_2sc(DisasContext *s, arg_2scalar *a)
-     return do_2scalar(s, a, opfn[a->size], NULL);
- }
-+
-+static bool do_vqrdmlah_2sc(DisasContext *s, arg_2scalar *a,
-+                            NeonGenThreeOpEnvFn *opfn)
-+{
-+    /*
-+     * VQRDMLAH/VQRDMLSH: this is like do_2scalar, but the opfn
-+     * performs a kind of fused op-then-accumulate using a helper
-+     * function that takes all of rd, rn and the scalar at once.
-+     */
-+    TCGv_i32 scalar;
-+    int pass;
-+
-+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
-+        return false;
-+    }
-+
-+    if (!dc_isar_feature(aa32_rdm, s)) {
-+        return false;
-+    }
-+
-+    /* UNDEF accesses to D16-D31 if they don't exist. */
-+    if (!dc_isar_feature(aa32_simd_r32, s) &&
-+        ((a->vd | a->vn | a->vm) & 0x10)) {
-+        return false;
-+    }
-+
-+    if (!opfn) {
-+        /* Bad size (including size == 3, which is a different insn group) */
-+        return false;
-+    }
-+
-+    if (a->q && ((a->vd | a->vn) & 1)) {
-+        return false;
-+    }
-+
-+    if (!vfp_access_check(s)) {
-+        return true;
-+    }
-+
-+    scalar = neon_get_scalar(a->size, a->vm);
-+
-+    for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
-+        TCGv_i32 rn = neon_load_reg(a->vn, pass);
-+        TCGv_i32 rd = neon_load_reg(a->vd, pass);
-+        opfn(rd, cpu_env, rn, scalar, rd);
-+        tcg_temp_free_i32(rn);
-+        neon_store_reg(a->vd, pass, rd);
-+    }
-+    tcg_temp_free_i32(scalar);
-+
-+    return true;
-+}
-+
-+static bool trans_VQRDMLAH_2sc(DisasContext *s, arg_2scalar *a)
-+{
-+    static NeonGenThreeOpEnvFn *opfn[] = {
-+        NULL,
-+        gen_helper_neon_qrdmlah_s16,
-+        gen_helper_neon_qrdmlah_s32,
-+        NULL,
-+    };
-+    return do_vqrdmlah_2sc(s, a, opfn[a->size]);
-+}
-+
-+static bool trans_VQRDMLSH_2sc(DisasContext *s, arg_2scalar *a)
-+{
-+    static NeonGenThreeOpEnvFn *opfn[] = {
-+        NULL,
-+        gen_helper_neon_qrdmlsh_s16,
-+        gen_helper_neon_qrdmlsh_s32,
-+        NULL,
-+    };
-+    return do_vqrdmlah_2sc(s, a, opfn[a->size]);
-+}
-diff --git a/target/arm/translate.c b/target/arm/translate.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
-+++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
-                 case 9: /* Floating point VMUL scalar */
-                 case 12: /* VQDMULH scalar */
-                 case 13: /* VQRDMULH scalar */
-+                case 14: /* VQRDMLAH scalar */
-+                case 15: /* VQRDMLSH scalar */
-                     return 1; /* handled by decodetree */
-                 case 3: /* VQDMLAL scalar */
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
-                         neon_store_reg64(cpu_V0, rd + pass);
-                     }
-                     break;
--                case 14: /* VQRDMLAH scalar */
--                case 15: /* VQRDMLSH scalar */
--                    {
--                        NeonGenThreeOpEnvFn *fn;
--
--                        if (!dc_isar_feature(aa32_rdm, s)) {
--                            return 1;
--                        }
--                        if (u && ((rd | rn) & 1)) {
--                            return 1;
--                        }
--                        if (op == 14) {
--                            if (size == 1) {
--                                fn = gen_helper_neon_qrdmlah_s16;
--                            } else {
--                                fn = gen_helper_neon_qrdmlah_s32;
--                            }
--                        } else {
--                            if (size == 1) {
--                                fn = gen_helper_neon_qrdmlsh_s16;
--                            } else {
--                                fn = gen_helper_neon_qrdmlsh_s32;
--                            }
--                        }
--
--                        tmp2 = neon_get_scalar(size, rm);
--                        for (pass = 0; pass < (u ? 4 : 2); pass++) {
--                            tmp = neon_load_reg(rn, pass);
--                            tmp3 = neon_load_reg(rd, pass);
--                            fn(tmp, cpu_env, tmp, tmp2, tmp3);
--                            tcg_temp_free_i32(tmp3);
--                            neon_store_reg(rd, pass, tmp);
--                        }
--                        tcg_temp_free_i32(tmp2);
--                    }
--                    break;
-                 default:
-                     g_assert_not_reached();
-                 }
 --
 2.20.1
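The banking issue is easier to see with the old and new privilege calculations side by side. In the minimal sketch below, the `struct v7m_sketch` type and both function names are hypothetical. As in QEMU, it assumes that index 0 of the banked CONTROL array is Non-secure, index 1 is Secure, and CONTROL.nPRIV is bit 0. Handler mode is always privileged, mirroring the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define CONTROL_NPRIV 1u  /* CONTROL.nPRIV is bit 0 */

/* Hypothetical subset of the M-profile state used by the calculation. */
struct v7m_sketch {
    bool handler_mode;    /* currently executing an exception handler */
    bool secure;          /* current security state */
    uint32_t control[2];  /* banked CONTROL: [0] Non-secure, [1] Secure */
};

/* Old behaviour: consults the CONTROL bank of the *current* state. */
static bool priv_using_current_state(const struct v7m_sketch *env, bool secstate)
{
    (void)secstate;  /* ignored, which is the bug */
    return env->handler_mode || !(env->control[env->secure] & CONTROL_NPRIV);
}

/* Fixed behaviour: consults the CONTROL bank of the state being queried. */
static bool priv_for_secstate(const struct v7m_sketch *env, bool secstate)
{
    return env->handler_mode || !(env->control[secstate] & CONTROL_NPRIV);
}
```

Consider a CPU that is in Thread mode and currently Secure, with Secure privileged and Non-secure unprivileged. The old calculation reports Non-secure as privileged, which yields the wrong MMU index described above.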

-[PULL 09/23] target/arm: Add missing TCG temp free in do_2shift_env_64()
+[PULL 22/26] configure: Test that gio libs from pkg-config work
-In commit 37bfce81b10450071 we accidentally introduced a leak of a TCG
+On some hosts (eg Ubuntu Bionic) pkg-config returns a set of
-temporary in do_2shift_env_64(); free it.
+libraries for gio-2.0 which don't actually work when compiling
 statically. (Specifically, the returned library string includes
 -lmount, but not -lblkid which -lmount depends upon, so linking
 fails due to missing symbols.)
 Check that the libraries work, and don't enable gio if they don't,
 in the same way we do for gnutls.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
 Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
 Message-id: 20200928160402.7961-1-peter.maydell@linaro.org
 ---
- target/arm/translate-neon.inc.c | 1 +
+ configure | 10 +++++++++-
- 1 file changed, 1 insertion(+)
+ 1 file changed, 9 insertions(+), 1 deletion(-)
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
+diff --git a/configure b/configure
-index XXXXXXX..XXXXXXX 100644
+index XXXXXXX..XXXXXXX 100755
---- a/target/arm/translate-neon.inc.c
+--- a/configure
-+++ b/target/arm/translate-neon.inc.c
++++ b/configure
-@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_64(DisasContext *s, arg_2reg_shift *a,
+@@ -XXX,XX +XXX,XX @@ if test "$static" = yes && test "$mingw32" = yes; then
-         neon_load_reg64(tmp, a->vm + pass);
+ fi
-         fn(tmp, cpu_env, tmp, constimm);
-         neon_store_reg64(tmp, a->vd + pass);
+ if $pkg_config --atleast-version=$glib_req_ver gio-2.0; then
-+        tcg_temp_free_i64(tmp);
+-    gio=yes
-     }
+     gio_cflags=$($pkg_config --cflags gio-2.0)
-     tcg_temp_free_i64(constimm);
+     gio_libs=$($pkg_config --libs gio-2.0)
-     return true;
+     gdbus_codegen=$($pkg_config --variable=gdbus_codegen gio-2.0)
      if [ ! -x "$gdbus_codegen" ]; then
          gdbus_codegen=
      fi
 +    # Check that the libraries actually work -- Ubuntu 18.04 ships
 +    # with pkg-config --static --libs data for gio-2.0 that is missing
 +    # -lblkid and will give a link error.
 +    write_c_skeleton
 +    if compile_prog "" "$gio_libs" ; then
 +        gio=yes
 +    else
 +        gio=no
 +    fi
  else
      gio=no
  fi
 --
 2.20.1

-New patch
+[PULL 23/26] hw/intc/arm_gicv3_cpuif: Make GIC maintenance interrupts work
+In gicv3_init_cpuif() we copy the ARMCPU gicv3_maintenance_interrupt
+into the GICv3CPUState struct's maintenance_irq field.  This will
+only work if the board happens to have already wired up the CPU
+maintenance IRQ before the GIC was realized.  Unfortunately this is
+not the case for the 'virt' board, and so the value that gets copied
+is NULL (since a qemu_irq is really a pointer to an IRQState struct
+under the hood).  The effect is that the CPU interface code never
+actually raises the maintenance interrupt line.
+Instead, since the GICv3CPUState has a pointer to the CPUState, make
+the dereference at the point where we want to raise the interrupt, to
+avoid an implicit requirement on board code to wire things up in a
+particular order.
+Reported-by: Jose Martins <josemartins90@gmail.com>
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Message-id: 20201009153904.28529-1-peter.maydell@linaro.org
+Reviewed-by: Luc Michel <luc@lmichel.fr>
+---
+ include/hw/intc/arm_gicv3_common.h | 1 -
+ hw/intc/arm_gicv3_cpuif.c          | 5 ++---
+ 2 files changed, 2 insertions(+), 4 deletions(-)
+diff --git a/include/hw/intc/arm_gicv3_common.h b/include/hw/intc/arm_gicv3_common.h
+index XXXXXXX..XXXXXXX 100644
+--- a/include/hw/intc/arm_gicv3_common.h
++++ b/include/hw/intc/arm_gicv3_common.h
+@@ -XXX,XX +XXX,XX @@ struct GICv3CPUState {
+     qemu_irq parent_fiq;
+     qemu_irq parent_virq;
+     qemu_irq parent_vfiq;
+-    qemu_irq maintenance_irq;
+     /* Redistributor */
+     uint32_t level;                  /* Current IRQ level */
+diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
+index XXXXXXX..XXXXXXX 100644
+--- a/hw/intc/arm_gicv3_cpuif.c
++++ b/hw/intc/arm_gicv3_cpuif.c
+@@ -XXX,XX +XXX,XX @@ static void gicv3_cpuif_virt_update(GICv3CPUState *cs)
+     int irqlevel = 0;
+     int fiqlevel = 0;
+     int maintlevel = 0;
++    ARMCPU *cpu = ARM_CPU(cs->cpu);
+     idx = hppvi_index(cs);
+     trace_gicv3_cpuif_virt_update(gicv3_redist_affid(cs), idx);
+@@ -XXX,XX +XXX,XX @@ static void gicv3_cpuif_virt_update(GICv3CPUState *cs)
+     qemu_set_irq(cs->parent_vfiq, fiqlevel);
+     qemu_set_irq(cs->parent_virq, irqlevel);
+-    qemu_set_irq(cs->maintenance_irq, maintlevel);
++    qemu_set_irq(cpu->gicv3_maintenance_interrupt, maintlevel);
+ }
+ static uint64_t icv_ap_read(CPUARMState *env, const ARMCPRegInfo *ri)
+@@ -XXX,XX +XXX,XX @@ void gicv3_init_cpuif(GICv3State *s)
+             && cpu->gic_num_lrs) {
+             int j;
+-            cs->maintenance_irq = cpu->gicv3_maintenance_interrupt;
+-
+             cs->num_list_regs = cpu->gic_num_lrs;
+             cs->vpribits = cpu->gic_vpribits;
+             cs->vprebits = cpu->gic_vprebits;
+--
+2.20.1
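The GICv3 maintenance-interrupt bug came from copying a pointer at init time, when it could still be NULL, instead of looking it up at the point of use. The minimal sketch below contrasts the two approaches. It uses hypothetical types in place of `qemu_irq`, `ARMCPU` and `GICv3CPUState`. `set_irq()` mimics the fact that `qemu_set_irq()` ignores a NULL line, which is why the original bug failed silently.

```c
#include <stddef.h>

/* Hypothetical IRQ line; a NULL pointer means "not wired yet". */
struct irq_line_sketch {
    int level;
};

/* Hypothetical stand-in for ARMCPU's gicv3_maintenance_interrupt field. */
struct cpu_sketch {
    struct irq_line_sketch *maint_irq;
};

/* Hypothetical stand-in for GICv3CPUState. */
struct cpuif_sketch {
    struct cpu_sketch *cpu;              /* fixed approach: keep the CPU pointer */
    struct irq_line_sketch *cached_irq;  /* old approach: copy taken at init */
};

/* Like qemu_set_irq(), silently ignore an unwired (NULL) line. */
static void set_irq(struct irq_line_sketch *irq, int level)
{
    if (irq) {
        irq->level = level;
    }
}

static void cpuif_init(struct cpuif_sketch *cs, struct cpu_sketch *cpu)
{
    cs->cpu = cpu;
    cs->cached_irq = cpu->maint_irq;  /* NULL if the board wires it later */
}

/* Old behaviour: raise whatever line was captured at init time. */
static void cpuif_update_cached(struct cpuif_sketch *cs, int level)
{
    set_irq(cs->cached_irq, level);
}

/* Fixed behaviour: look the line up through the CPU when raising it. */
static void cpuif_update_late(struct cpuif_sketch *cs, int level)
{
    set_irq(cs->cpu->maint_irq, level);
}
```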

-[PULL 05/23] target/arm: Convert Neon 3-reg-diff long multiplies
+[PULL 24/26] scripts/kerneldoc: For Sphinx 3 use c:macro for macros with arguments
-Convert the Neon 3-reg-diff insns VMULL, VMLAL and VMLSL; these perform
+The kerneldoc script currently emits Sphinx markup for a macro with
-a 32x32->64 multiply with possible accumulate.
+arguments that uses the c:function directive. This is correct for
 Sphinx versions earlier than Sphinx 3, where c:macro doesn't allow
 documentation of macros with arguments and c:function is not picky
 about the syntax of what it is passed. However, in Sphinx 3 the
 c:macro directive was enhanced to support macros with arguments,
 and c:function was made more picky about what syntax it accepted.
-Note that for VMLSL we do the accumulate directly with a subtraction
+When kerneldoc is told that it needs to produce output for Sphinx
-rather than doing a negate-then-add as the old code did.
+3 or later, make it emit c:function only for functions and c:macro
 for macros with arguments. We assume that anything with a return
 type is a function and anything without is a macro.
 This fixes the Sphinx error:
 /home/petmay01/linaro/qemu-from-laptop/qemu/docs/../include/qom/object.h:155:Error in declarator
 If declarator-id with parameters (e.g., 'void f(int arg)'):
   Invalid C declaration: Expected identifier in nested name. [error at 25]
     DECLARE_INSTANCE_CHECKER ( InstanceType,  OBJ_NAME,  TYPENAME)
     -------------------------^
 If parenthesis in noptr-declarator (e.g., 'void (*f(int arg))(double)'):
   Error in declarator or parameters
   Invalid C declaration: Expecting "(" in parameters. [error at 39]
     DECLARE_INSTANCE_CHECKER ( InstanceType,  OBJ_NAME,  TYPENAME)
     ---------------------------------------^
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
 Tested-by: Stefan Hajnoczi <stefanha@redhat.com>
 Message-id: 20201030174700.7204-2-peter.maydell@linaro.org
 ---
- target/arm/neon-dp.decode       |  9 +++++
+ scripts/kernel-doc | 18 +++++++++++++++++-
- target/arm/translate-neon.inc.c | 71 +++++++++++++++++++++++++++++++++
+ 1 file changed, 17 insertions(+), 1 deletion(-)
  target/arm/translate.c          | 21 +++-------
  3 files changed, 86 insertions(+), 15 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+diff --git a/scripts/kernel-doc b/scripts/kernel-doc
-index XXXXXXX..XXXXXXX 100644
+index XXXXXXX..XXXXXXX 100755
---- a/target/arm/neon-dp.decode
+--- a/scripts/kernel-doc
-+++ b/target/arm/neon-dp.decode
++++ b/scripts/kernel-doc
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+@@ -XXX,XX +XXX,XX @@ sub output_function_rst(%) {
+     output_highlight_rst($args{'purpose'});
-     VABDL_S_3d   1111 001 0 1 . .. .... .... 0111 . 0 . 0 .... @3diff
+     $start = "\n\n**Syntax**\n\n  ``";
-     VABDL_U_3d   1111 001 1 1 . .. .... .... 0111 . 0 . 0 .... @3diff
+     } else {
-+
+-    print ".. c:function:: ";
-+    VMLAL_S_3d   1111 001 0 1 . .. .... .... 1000 . 0 . 0 .... @3diff
++        if ((split(/\./, $sphinx_version))[0] >= 3) {
-+    VMLAL_U_3d   1111 001 1 1 . .. .... .... 1000 . 0 . 0 .... @3diff
++            # Sphinx 3 and later distinguish macros and functions and
-+
++            # complain if you use c:function with something that's not
-+    VMLSL_S_3d   1111 001 0 1 . .. .... .... 1010 . 0 . 0 .... @3diff
++            # syntactically valid as a function declaration.
-+    VMLSL_U_3d   1111 001 1 1 . .. .... .... 1010 . 0 . 0 .... @3diff
++            # We assume that anything with a return type is a function
-+
++            # and anything without is a macro.
-+    VMULL_S_3d   1111 001 0 1 . .. .... .... 1100 . 0 . 0 .... @3diff
++            if ($args{'functiontype'} ne "") {
-+    VMULL_U_3d   1111 001 1 1 . .. .... .... 1100 . 0 . 0 .... @3diff
++                print ".. c:function:: ";
-   ]
++            } else {
- }
++                print ".. c:macro:: ";
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
++            }
-index XXXXXXX..XXXXXXX 100644
++        } else {
---- a/target/arm/translate-neon.inc.c
++            # Older Sphinx don't support documenting macros that take
-+++ b/target/arm/translate-neon.inc.c
++            # arguments with c:macro, and don't complain about the use
-@@ -XXX,XX +XXX,XX @@ static bool trans_VABAL_U_3d(DisasContext *s, arg_3diff *a)
++            # of c:function for this.
++            print ".. c:function:: ";
-     return do_long_3d(s, a, opfn[a->size], addfn[a->size]);
++        }
- }
+     }
-+
+     if ($args{'functiontype'} ne "") {
-+static void gen_mull_s32(TCGv_i64 rd, TCGv_i32 rn, TCGv_i32 rm)
+     $start .= $args{'functiontype'} . " " . $args{'function'} . " (";
 +{
 +    TCGv_i32 lo = tcg_temp_new_i32();
 +    TCGv_i32 hi = tcg_temp_new_i32();
 +
 +    tcg_gen_muls2_i32(lo, hi, rn, rm);
 +    tcg_gen_concat_i32_i64(rd, lo, hi);
 +
 +    tcg_temp_free_i32(lo);
 +    tcg_temp_free_i32(hi);
 +}
 +
 +static void gen_mull_u32(TCGv_i64 rd, TCGv_i32 rn, TCGv_i32 rm)
 +{
 +    TCGv_i32 lo = tcg_temp_new_i32();
 +    TCGv_i32 hi = tcg_temp_new_i32();
 +
 +    tcg_gen_mulu2_i32(lo, hi, rn, rm);
 +    tcg_gen_concat_i32_i64(rd, lo, hi);
 +
 +    tcg_temp_free_i32(lo);
 +    tcg_temp_free_i32(hi);
 +}
 +
 +static bool trans_VMULL_S_3d(DisasContext *s, arg_3diff *a)
 +{
 +    static NeonGenTwoOpWidenFn * const opfn[] = {
 +        gen_helper_neon_mull_s8,
 +        gen_helper_neon_mull_s16,
 +        gen_mull_s32,
 +        NULL,
 +    };
 +
 +    return do_long_3d(s, a, opfn[a->size], NULL);
 +}
 +
 +static bool trans_VMULL_U_3d(DisasContext *s, arg_3diff *a)
 +{
 +    static NeonGenTwoOpWidenFn * const opfn[] = {
 +        gen_helper_neon_mull_u8,
 +        gen_helper_neon_mull_u16,
 +        gen_mull_u32,
 +        NULL,
 +    };
 +
 +    return do_long_3d(s, a, opfn[a->size], NULL);
 +}
 +
 +#define DO_VMLAL(INSN,MULL,ACC)                                         \
 +    static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a)        \
 +    {                                                                   \
 +        static NeonGenTwoOpWidenFn * const opfn[] = {                   \
 +            gen_helper_neon_##MULL##8,                                  \
 +            gen_helper_neon_##MULL##16,                                 \
 +            gen_##MULL##32,                                             \
 +            NULL,                                                       \
 +        };                                                              \
 +        static NeonGenTwo64OpFn * const accfn[] = {                     \
 +            gen_helper_neon_##ACC##l_u16,                               \
 +            gen_helper_neon_##ACC##l_u32,                               \
 +            tcg_gen_##ACC##_i64,                                        \
 +            NULL,                                                       \
 +        };                                                              \
 +        return do_long_3d(s, a, opfn[a->size], accfn[a->size]);         \
 +    }
 +
 +DO_VMLAL(VMLAL_S,mull_s,add)
 +DO_VMLAL(VMLAL_U,mull_u,add)
 +DO_VMLAL(VMLSL_S,mull_s,sub)
 +DO_VMLAL(VMLSL_U,mull_u,sub)
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                      {0, 0, 0, 7}, /* VABAL */
                      {0, 0, 0, 7}, /* VSUBHN: handled by decodetree */
                      {0, 0, 0, 7}, /* VABDL */
 -                    {0, 0, 0, 0}, /* VMLAL */
 +                    {0, 0, 0, 7}, /* VMLAL */
                      {0, 0, 0, 9}, /* VQDMLAL */
 -                    {0, 0, 0, 0}, /* VMLSL */
 +                    {0, 0, 0, 7}, /* VMLSL */
                      {0, 0, 0, 9}, /* VQDMLSL */
 -                    {0, 0, 0, 0}, /* Integer VMULL */
 +                    {0, 0, 0, 7}, /* Integer VMULL */
                      {0, 0, 0, 9}, /* VQDMULL */
                      {0, 0, 0, 0xa}, /* Polynomial VMULL */
                      {0, 0, 0, 7}, /* Reserved: always UNDEF */
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                          tmp2 = neon_load_reg(rm, pass);
                      }
                      switch (op) {
 -                    case 8: case 9: case 10: case 11: case 12: case 13:
 -                        /* VMLAL, VQDMLAL, VMLSL, VQDMLSL, VMULL, VQDMULL */
 +                    case 9: case 11: case 13:
 +                        /* VQDMLAL, VQDMLSL, VQDMULL */
                          gen_neon_mull(cpu_V0, tmp, tmp2, size, u);
                          break;
                      default: /* 15 is RESERVED: caught earlier  */
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                          /* VQDMULL */
                          gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
                          neon_store_reg64(cpu_V0, rd + pass);
 -                    } else if (op == 5 || (op >= 8 && op <= 11)) {
 +                    } else {
                          /* Accumulate.  */
                          neon_load_reg64(cpu_V1, rd + pass);
                          switch (op) {
 -                        case 10: /* VMLSL */
 -                            gen_neon_negl(cpu_V0, size);
 -                            /* Fall through */
 -                        case 8: /* VABAL, VMLAL */
 -                            gen_neon_addl(size);
 -                            break;
                          case 9: case 11: /* VQDMLAL, VQDMLSL */
                              gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
                              if (op == 11) {
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                              abort();
                          }
                          neon_store_reg64(cpu_V0, rd + pass);
 -                    } else {
 -                        /* Write back the result.  */
 -                        neon_store_reg64(cpu_V0, rd + pass);
                      }
                  }
              } else {
 --
 2.20.1
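The DO_VMLAL macro pairs a widening multiply (`mull_s`/`mull_u`) with an add or subtract of the double-width product into the destination, so VMLAL and VMLSL differ only in the accumulator function. The plain C sketch below models the per-lane arithmetic for the signed 16-bit case. The function name is hypothetical, and lanes are held in arrays rather than in packed TCG values. The accumulator lanes are unsigned so that overflow wraps modulo 2^32 without undefined behaviour.

```c
#include <stdint.h>
#include <stdbool.h>

/*
 * Hypothetical model of VMLAL.S16 / VMLSL.S16 on four lanes:
 * each 16x16 product is formed at 32 bits, then added to (or
 * subtracted from) the matching 32-bit destination lane.
 */
static void vmlal_vmlsl_s16(uint32_t d[4], const int16_t n[4],
                            const int16_t m[4], bool subtract)
{
    for (int i = 0; i < 4; i++) {
        /* Exact: |n * m| <= 2^30, so the product fits in int32_t. */
        int32_t prod = (int32_t)n[i] * m[i];
        /* Unsigned arithmetic wraps modulo 2^32, like the guest lane. */
        if (subtract) {
            d[i] -= (uint32_t)prod;
        } else {
            d[i] += (uint32_t)prod;
        }
    }
}
```

For the 32-bit element size, the macro uses `gen_##MULL##32` with `tcg_gen_add_i64`/`tcg_gen_sub_i64`. The idea is the same, except that each half of the Q destination then holds a single 64-bit lane.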

-[PULL 04/23] target/arm: Convert Neon 3-reg-diff VABAL, VABDL to decodetree
+[PULL 25/26] qemu-option-trace.rst.inc: Don't use option:: markup
-Convert the Neon 3-reg-diff insns VABAL and VABDL to decodetree.
+Sphinx 3.2 is pickier than earlier versions about the option:: markup,
-Like almost all the remaining insns in this group, these are
+and complains about our usage in qemu-option-trace.rst:
-a combination of a two-input operation which returns a double width
-result and then a possible accumulation of that double width
+../../docs/qemu-option-trace.rst.inc:4:Malformed option description
-result into the destination.
+  '[enable=]PATTERN', should look like "opt", "-opt args", "--opt args",
   "/opt args" or "+opt args"
 In this file, we're really trying to document the different parts of
 the top-level --trace option, which qemu-nbd.rst and qemu-img.rst
 have already introduced with an option:: markup.  So it's not right
 to use option:: here anyway.  Switch to a different markup
 (definition lists) which gives about the same formatted output.
 (Unlike option::, this markup doesn't produce index entries; but
 at the moment we don't do anything much with indexes anyway, and
 in any case I think it doesn't make much sense to have individual
 index entries for the sub-parts of the --trace option.)
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
 Tested-by: Stefan Hajnoczi <stefanha@redhat.com>
 Message-id: 20201030174700.7204-3-peter.maydell@linaro.org
 ---
- target/arm/translate.h          |   1 +
+ docs/qemu-option-trace.rst.inc | 6 +++---
- target/arm/neon-dp.decode       |   6 ++
+ 1 file changed, 3 insertions(+), 3 deletions(-)
  target/arm/translate-neon.inc.c | 132 ++++++++++++++++++++++++++++++++
  target/arm/translate.c          |  31 +-------
  4 files changed, 142 insertions(+), 28 deletions(-)
-diff --git a/target/arm/translate.h b/target/arm/translate.h
+diff --git a/docs/qemu-option-trace.rst.inc b/docs/qemu-option-trace.rst.inc
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.h
+--- a/docs/qemu-option-trace.rst.inc
-+++ b/target/arm/translate.h
++++ b/docs/qemu-option-trace.rst.inc
-@@ -XXX,XX +XXX,XX @@ typedef void NeonGenTwo64OpEnvFn(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i64);
+@@ -XXX,XX +XXX,XX @@
- typedef void NeonGenNarrowFn(TCGv_i32, TCGv_i64);
- typedef void NeonGenNarrowEnvFn(TCGv_i32, TCGv_ptr, TCGv_i64);
+ Specify tracing options.
- typedef void NeonGenWidenFn(TCGv_i64, TCGv_i32);
-+typedef void NeonGenTwoOpWidenFn(TCGv_i64, TCGv_i32, TCGv_i32);
+-.. option:: [enable=]PATTERN
- typedef void NeonGenTwoSingleOPFn(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_ptr);
++``[enable=]PATTERN``
- typedef void NeonGenTwoDoubleOPFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_ptr);
- typedef void NeonGenOneOpFn(TCGv_i64, TCGv_i64);
+   Immediately enable events matching *PATTERN*
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+   (either event name or a globbing pattern).  This option is only
-index XXXXXXX..XXXXXXX 100644
+@@ -XXX,XX +XXX,XX @@ Specify tracing options.
---- a/target/arm/neon-dp.decode
-+++ b/target/arm/neon-dp.decode
+   Use :option:`-trace help` to print a list of names of trace points.
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
-     VADDHN_3d    1111 001 0 1 . .. .... .... 0100 . 0 . 0 .... @3diff
+-.. option:: events=FILE
-     VRADDHN_3d   1111 001 1 1 . .. .... .... 0100 . 0 . 0 .... @3diff
++``events=FILE``
-+    VABAL_S_3d   1111 001 0 1 . .. .... .... 0101 . 0 . 0 .... @3diff
+   Immediately enable events listed in *FILE*.
-+    VABAL_U_3d   1111 001 1 1 . .. .... .... 0101 . 0 . 0 .... @3diff
+   The file must contain one event name (as listed in the ``trace-events-all``
-+
+@@ -XXX,XX +XXX,XX @@ Specify tracing options.
-     VSUBHN_3d    1111 001 0 1 . .. .... .... 0110 . 0 . 0 .... @3diff
+   available if QEMU has been compiled with the ``simple``, ``log`` or
-     VRSUBHN_3d   1111 001 1 1 . .. .... .... 0110 . 0 . 0 .... @3diff
+   ``ftrace`` tracing backend.
-+
-+    VABDL_S_3d   1111 001 0 1 . .. .... .... 0111 . 0 . 0 .... @3diff
+-.. option:: file=FILE
-+    VABDL_U_3d   1111 001 1 1 . .. .... .... 0111 . 0 . 0 .... @3diff
++``file=FILE``
-   ]
- }
+   Log output traces to *FILE*.
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
+   This option is only available if QEMU has been compiled with
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.inc.c
 +++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ DO_NARROW_3D(VADDHN, add, narrow, tcg_gen_extrh_i64_i32)
  DO_NARROW_3D(VSUBHN, sub, narrow, tcg_gen_extrh_i64_i32)
  DO_NARROW_3D(VRADDHN, add, narrow_round, gen_narrow_round_high_u32)
  DO_NARROW_3D(VRSUBHN, sub, narrow_round, gen_narrow_round_high_u32)
 +
 +static bool do_long_3d(DisasContext *s, arg_3diff *a,
 +                       NeonGenTwoOpWidenFn *opfn,
 +                       NeonGenTwo64OpFn *accfn)
 +{
 +    /*
 +     * 3-regs different lengths, long operations.
 +     * These perform an operation on two inputs that returns a double-width
 +     * result, and then possibly perform an accumulation operation of
 +     * that result into the double-width destination.
 +     */
 +    TCGv_i64 rd0, rd1, tmp;
 +    TCGv_i32 rn, rm;
 +
 +    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
 +        return false;
 +    }
 +
 +    /* UNDEF accesses to D16-D31 if they don't exist. */
 +    if (!dc_isar_feature(aa32_simd_r32, s) &&
 +        ((a->vd | a->vn | a->vm) & 0x10)) {
 +        return false;
 +    }
 +
 +    if (!opfn) {
 +        /* size == 3 case, which is an entirely different insn group */
 +        return false;
 +    }
 +
 +    if (a->vd & 1) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    rd0 = tcg_temp_new_i64();
 +    rd1 = tcg_temp_new_i64();
 +
 +    rn = neon_load_reg(a->vn, 0);
 +    rm = neon_load_reg(a->vm, 0);
 +    opfn(rd0, rn, rm);
 +    tcg_temp_free_i32(rn);
 +    tcg_temp_free_i32(rm);
 +
 +    rn = neon_load_reg(a->vn, 1);
 +    rm = neon_load_reg(a->vm, 1);
 +    opfn(rd1, rn, rm);
 +    tcg_temp_free_i32(rn);
 +    tcg_temp_free_i32(rm);
 +
 +    /* Don't store results until after all loads: they might overlap */
 +    if (accfn) {
 +        tmp = tcg_temp_new_i64();
 +        neon_load_reg64(tmp, a->vd);
 +        accfn(tmp, tmp, rd0);
 +        neon_store_reg64(tmp, a->vd);
 +        neon_load_reg64(tmp, a->vd + 1);
 +        accfn(tmp, tmp, rd1);
 +        neon_store_reg64(tmp, a->vd + 1);
 +        tcg_temp_free_i64(tmp);
 +    } else {
 +        neon_store_reg64(rd0, a->vd);
 +        neon_store_reg64(rd1, a->vd + 1);
 +    }
 +
 +    tcg_temp_free_i64(rd0);
 +    tcg_temp_free_i64(rd1);
 +
 +    return true;
 +}
 +
 +static bool trans_VABDL_S_3d(DisasContext *s, arg_3diff *a)
 +{
 +    static NeonGenTwoOpWidenFn * const opfn[] = {
 +        gen_helper_neon_abdl_s16,
 +        gen_helper_neon_abdl_s32,
 +        gen_helper_neon_abdl_s64,
 +        NULL,
 +    };
 +
 +    return do_long_3d(s, a, opfn[a->size], NULL);
 +}
 +
 +static bool trans_VABDL_U_3d(DisasContext *s, arg_3diff *a)
 +{
 +    static NeonGenTwoOpWidenFn * const opfn[] = {
 +        gen_helper_neon_abdl_u16,
 +        gen_helper_neon_abdl_u32,
 +        gen_helper_neon_abdl_u64,
 +        NULL,
 +    };
 +
 +    return do_long_3d(s, a, opfn[a->size], NULL);
 +}
 +
 +static bool trans_VABAL_S_3d(DisasContext *s, arg_3diff *a)
 +{
 +    static NeonGenTwoOpWidenFn * const opfn[] = {
 +        gen_helper_neon_abdl_s16,
 +        gen_helper_neon_abdl_s32,
 +        gen_helper_neon_abdl_s64,
 +        NULL,
 +    };
 +    static NeonGenTwo64OpFn * const addfn[] = {
 +        gen_helper_neon_addl_u16,
 +        gen_helper_neon_addl_u32,
 +        tcg_gen_add_i64,
 +        NULL,
 +    };
 +
 +    return do_long_3d(s, a, opfn[a->size], addfn[a->size]);
 +}
 +
 +static bool trans_VABAL_U_3d(DisasContext *s, arg_3diff *a)
 +{
 +    static NeonGenTwoOpWidenFn * const opfn[] = {
 +        gen_helper_neon_abdl_u16,
 +        gen_helper_neon_abdl_u32,
 +        gen_helper_neon_abdl_u64,
 +        NULL,
 +    };
 +    static NeonGenTwo64OpFn * const addfn[] = {
 +        gen_helper_neon_addl_u16,
 +        gen_helper_neon_addl_u32,
 +        tcg_gen_add_i64,
 +        NULL,
 +    };
 +
 +    return do_long_3d(s, a, opfn[a->size], addfn[a->size]);
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                      {0, 0, 0, 7}, /* VSUBL: handled by decodetree */
                      {0, 0, 0, 7}, /* VSUBW: handled by decodetree */
                      {0, 0, 0, 7}, /* VADDHN: handled by decodetree */
 -                    {0, 0, 0, 0}, /* VABAL */
 +                    {0, 0, 0, 7}, /* VABAL */
                      {0, 0, 0, 7}, /* VSUBHN: handled by decodetree */
 -                    {0, 0, 0, 0}, /* VABDL */
 +                    {0, 0, 0, 7}, /* VABDL */
                      {0, 0, 0, 0}, /* VMLAL */
                      {0, 0, 0, 9}, /* VQDMLAL */
                      {0, 0, 0, 0}, /* VMLSL */
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                          tmp2 = neon_load_reg(rm, pass);
                      }
                      switch (op) {
 -                    case 5: case 7: /* VABAL, VABDL */
 -                        switch ((size << 1) | u) {
 -                        case 0:
 -                            gen_helper_neon_abdl_s16(cpu_V0, tmp, tmp2);
 -                            break;
 -                        case 1:
 -                            gen_helper_neon_abdl_u16(cpu_V0, tmp, tmp2);
 -                            break;
 -                        case 2:
 -                            gen_helper_neon_abdl_s32(cpu_V0, tmp, tmp2);
 -                            break;
 -                        case 3:
 -                            gen_helper_neon_abdl_u32(cpu_V0, tmp, tmp2);
 -                            break;
 -                        case 4:
 -                            gen_helper_neon_abdl_s64(cpu_V0, tmp, tmp2);
 -                            break;
 -                        case 5:
 -                            gen_helper_neon_abdl_u64(cpu_V0, tmp, tmp2);
 -                            break;
 -                        default: abort();
 -                        }
 -                        tcg_temp_free_i32(tmp2);
 -                        tcg_temp_free_i32(tmp);
 -                        break;
                      case 8: case 9: case 10: case 11: case 12: case 13:
                          /* VMLAL, VQDMLAL, VMLSL, VQDMLSL, VMULL, VQDMULL */
                          gen_neon_mull(cpu_V0, tmp, tmp2, size, u);
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                          case 10: /* VMLSL */
                              gen_neon_negl(cpu_V0, size);
                              /* Fall through */
 -                        case 5: case 8: /* VABAL, VMLAL */
 +                        case 8: /* VABAL, VMLAL */
                              gen_neon_addl(size);
                              break;
                          case 9: case 11: /* VQDMLAL, VQDMLSL */
 --
 2.20.1
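In this conversion, VABDL and VABAL share the same widening absolute-difference helpers. VABAL only adds an accumulator function, and do_long_3d() loads every input before storing either half of the result, because the destination may overlap a source. The plain C sketch below models the unsigned 8-bit case per lane. All names are hypothetical, and it uses an array-of-lanes model rather than QEMU's packed helpers. A signed variant of the difference is included to show why the result needs the wider lane.

```c
#include <stdint.h>
#include <stdbool.h>

/* |a - b| of two unsigned 8-bit lanes, returned at double width. */
static uint16_t abd_u8(uint8_t a, uint8_t b)
{
    return a > b ? (uint16_t)(a - b) : (uint16_t)(b - a);
}

/* |a - b| of two signed 8-bit lanes: up to 255, so it needs the wider lane. */
static uint16_t abd_s8(int8_t a, int8_t b)
{
    return a > b ? (uint16_t)(a - b) : (uint16_t)(b - a);
}

/*
 * Hypothetical model of VABDL.U8 (accumulate == false) and
 * VABAL.U8 (accumulate == true) on eight lanes.
 */
static void vabdl_vabal_u8(uint16_t d[8], const uint8_t n[8],
                           const uint8_t m[8], bool accumulate)
{
    for (int i = 0; i < 8; i++) {
        uint16_t diff = abd_u8(n[i], m[i]);
        /* The accumulating form wraps modulo 2^16 per lane. */
        d[i] = accumulate ? (uint16_t)(d[i] + diff) : diff;
    }
}
```

trans_VABAL_S_3d() also uses the unsigned `addl` helpers for accumulation. This is correct because lane-wise addition modulo 2^16 produces the same bits whether the lanes are read as signed or unsigned.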

-[PULL 02/23] target/arm: Convert Neon 3-reg-diff prewidening ops to decodetree
+[PULL 26/26] tests/qtest/npcm7xx_rng-test: Disable randomness tests
-Convert the "pre-widening" insns VADDL, VSUBL, VADDW and VSUBW
+The randomness tests in the NPCM7xx RNG test fail intermittently
-in the Neon 3-registers-different-lengths group to decodetree.
+but fairly frequently. On my machine running the test in a loop:
-These insns work by widening one or both inputs to double their
+ while QTEST_QEMU_BINARY=./qemu-system-aarch64 ./tests/qtest/npcm7xx_rng-test; do true; done
 size, performing an add or subtract at the doubled size and
 then storing the double-size result.
-As usual, rather than copying the loop of the original decoder
+will fail in less than a minute with an error like:
-(which needs awkward code to avoid problems when source and
+ERROR:../../tests/qtest/npcm7xx_rng-test.c:256:test_first_byte_runs:
-destination registers overlap) we just unroll the two passes.
+assertion failed (calc_runs_p(buf.l, sizeof(buf) * BITS_PER_BYTE) > 0.01): (0.00286205989 > 0.01)
 (Failures have been observed on all 4 of the randomness tests,
 not just first_byte_runs.)
 It's not clear why these tests are failing like this, but intermittent
 failures make CI and merge testing awkward, so disable running them
 unless a developer specifically sets QEMU_TEST_FLAKY_RNG_TESTS when
 running the test suite, until we work out the cause.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
 Message-id: 20201102152454.8287-1-peter.maydell@linaro.org
 Reviewed-by: Havard Skinnemoen <hskinnemoen@google.com>
 ---
- target/arm/neon-dp.decode       |  43 +++++++++++++
+ tests/qtest/npcm7xx_rng-test.c | 14 ++++++++++----
- target/arm/translate-neon.inc.c | 104 ++++++++++++++++++++++++++++++++
+ 1 file changed, 10 insertions(+), 4 deletions(-)
  target/arm/translate.c          |  16 ++---
  3 files changed, 151 insertions(+), 12 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+diff --git a/tests/qtest/npcm7xx_rng-test.c b/tests/qtest/npcm7xx_rng-test.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
+--- a/tests/qtest/npcm7xx_rng-test.c
-+++ b/target/arm/neon-dp.decode
++++ b/tests/qtest/npcm7xx_rng-test.c
-@@ -XXX,XX +XXX,XX @@ VCVT_FU_2sh      1111 001 1 1 . ...... .... 1111 0 . . 1 .... @2reg_vcvt
+@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv)
- # So we have a single decode line and check the cmode/op in the
- # trans function.
+     qtest_add_func("npcm7xx_rng/enable_disable", test_enable_disable);
- Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+     qtest_add_func("npcm7xx_rng/rosel", test_rosel);
-+
+-    qtest_add_func("npcm7xx_rng/continuous/monobit", test_continuous_monobit);
-+######################################################################
+-    qtest_add_func("npcm7xx_rng/continuous/runs", test_continuous_runs);
-+# Within the "two registers, or three registers of different lengths"
+-    qtest_add_func("npcm7xx_rng/first_byte/monobit", test_first_byte_monobit);
-+# grouping ([23,4]=0b10), bits [21:20] are either part of the opcode
+-    qtest_add_func("npcm7xx_rng/first_byte/runs", test_first_byte_runs);
-+# decode: 0b11 for VEXT, two-reg-misc, VTBL, and duplicate-scalar;
++    /*
-+# or they are a size field for the three-reg-different-lengths and
++     * These tests fail intermittently; only run them on explicit
-+# two-reg-and-scalar insn groups (where size cannot be 0b11). This
++     * request until we figure out why.
-+# is slightly awkward for decodetree: we handle it with this
++     */
-+# non-exclusive group which contains within it two exclusive groups:
++    if (getenv("QEMU_TEST_FLAKY_RNG_TESTS")) {
-+# one for the size=0b11 patterns, and one for the size-not-0b11
++        qtest_add_func("npcm7xx_rng/continuous/monobit", test_continuous_monobit);
-+# patterns. This allows us to check that none of the insns within
++        qtest_add_func("npcm7xx_rng/continuous/runs", test_continuous_runs);
-+# each subgroup accidentally overlap each other. Note that all the
++        qtest_add_func("npcm7xx_rng/first_byte/monobit", test_first_byte_monobit);
-+# trans functions for the size-not-0b11 patterns must check and
++        qtest_add_func("npcm7xx_rng/first_byte/runs", test_first_byte_runs);
 +# return false for size==3.
 +######################################################################
 +{
 +  # 0b11 subgroup will go here
 +
 +  # Subgroup for size != 0b11
 +  [
 +    ##################################################################
 +    # 3-reg-different-length grouping:
 +    # 1111 001 U 1 D sz!=11 Vn:4 Vd:4 opc:4 N 0 M 0 Vm:4
 +    ##################################################################
 +
 +    &3diff vm vn vd size
 +
 +    @3diff       .... ... . . . size:2 .... .... .... . . . . .... \
 +                 &3diff vm=%vm_dp vn=%vn_dp vd=%vd_dp
 +
 +    VADDL_S_3d   1111 001 0 1 . .. .... .... 0000 . 0 . 0 .... @3diff
 +    VADDL_U_3d   1111 001 1 1 . .. .... .... 0000 . 0 . 0 .... @3diff
 +
 +    VADDW_S_3d   1111 001 0 1 . .. .... .... 0001 . 0 . 0 .... @3diff
 +    VADDW_U_3d   1111 001 1 1 . .. .... .... 0001 . 0 . 0 .... @3diff
 +
 +    VSUBL_S_3d   1111 001 0 1 . .. .... .... 0010 . 0 . 0 .... @3diff
 +    VSUBL_U_3d   1111 001 1 1 . .. .... .... 0010 . 0 . 0 .... @3diff
 +
 +    VSUBW_S_3d   1111 001 0 1 . .. .... .... 0011 . 0 . 0 .... @3diff
 +    VSUBW_U_3d   1111 001 1 1 . .. .... .... 0011 . 0 . 0 .... @3diff
 +  ]
 +}
 diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.inc.c
 +++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_Vimm_1r(DisasContext *s, arg_1reg_imm *a)
      }
      return do_1reg_imm(s, a, fn);
  }
 +
 +static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
 +                           NeonGenWidenFn *widenfn,
 +                           NeonGenTwo64OpFn *opfn,
 +                           bool src1_wide)
 +{
 +    /* 3-regs different lengths, prewidening case (VADDL/VSUBL/VADDW/VSUBW) */
 +    TCGv_i64 rn0_64, rn1_64, rm_64;
 +    TCGv_i32 rm;
 +
 +    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
 +        return false;
 +    }
-+
-+    /* UNDEF accesses to D16-D31 if they don't exist. */
+     qtest_start("-machine npcm750-evb");
-+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+     ret = g_test_run();
 +        ((a->vd | a->vn | a->vm) & 0x10)) {
 +        return false;
 +    }
 +
 +    if (!widenfn || !opfn) {
 +        /* size == 3 case, which is an entirely different insn group */
 +        return false;
 +    }
 +
 +    if ((a->vd & 1) || (src1_wide && (a->vn & 1))) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    rn0_64 = tcg_temp_new_i64();
 +    rn1_64 = tcg_temp_new_i64();
 +    rm_64 = tcg_temp_new_i64();
 +
 +    if (src1_wide) {
 +        neon_load_reg64(rn0_64, a->vn);
 +    } else {
 +        TCGv_i32 tmp = neon_load_reg(a->vn, 0);
 +        widenfn(rn0_64, tmp);
 +        tcg_temp_free_i32(tmp);
 +    }
 +    rm = neon_load_reg(a->vm, 0);
 +
 +    widenfn(rm_64, rm);
 +    tcg_temp_free_i32(rm);
 +    opfn(rn0_64, rn0_64, rm_64);
 +
 +    /*
 +     * Load second pass inputs before storing the first pass result, to
 +     * avoid incorrect results if a narrow input overlaps with the result.
 +     */
 +    if (src1_wide) {
 +        neon_load_reg64(rn1_64, a->vn + 1);
 +    } else {
 +        TCGv_i32 tmp = neon_load_reg(a->vn, 1);
 +        widenfn(rn1_64, tmp);
 +        tcg_temp_free_i32(tmp);
 +    }
 +    rm = neon_load_reg(a->vm, 1);
 +
 +    neon_store_reg64(rn0_64, a->vd);
 +
 +    widenfn(rm_64, rm);
 +    tcg_temp_free_i32(rm);
 +    opfn(rn1_64, rn1_64, rm_64);
 +    neon_store_reg64(rn1_64, a->vd + 1);
 +
 +    tcg_temp_free_i64(rn0_64);
 +    tcg_temp_free_i64(rn1_64);
 +    tcg_temp_free_i64(rm_64);
 +
 +    return true;
 +}
 +
 +#define DO_PREWIDEN(INSN, S, EXT, OP, SRC1WIDE)                         \
 +    static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a)        \
 +    {                                                                   \
 +        static NeonGenWidenFn * const widenfn[] = {                     \
 +            gen_helper_neon_widen_##S##8,                               \
 +            gen_helper_neon_widen_##S##16,                              \
 +            tcg_gen_##EXT##_i32_i64,                                    \
 +            NULL,                                                       \
 +        };                                                              \
 +        static NeonGenTwo64OpFn * const addfn[] = {                     \
 +            gen_helper_neon_##OP##l_u16,                                \
 +            gen_helper_neon_##OP##l_u32,                                \
 +            tcg_gen_##OP##_i64,                                         \
 +            NULL,                                                       \
 +        };                                                              \
 +        return do_prewiden_3d(s, a, widenfn[a->size],                   \
 +                              addfn[a->size], SRC1WIDE);                \
 +    }
 +
 +DO_PREWIDEN(VADDL_S, s, ext, add, false)
 +DO_PREWIDEN(VADDL_U, u, extu, add, false)
 +DO_PREWIDEN(VSUBL_S, s, ext, sub, false)
 +DO_PREWIDEN(VSUBL_U, u, extu, sub, false)
 +DO_PREWIDEN(VADDW_S, s, ext, add, true)
 +DO_PREWIDEN(VADDW_U, u, extu, add, true)
 +DO_PREWIDEN(VSUBW_S, s, ext, sub, true)
 +DO_PREWIDEN(VSUBW_U, u, extu, sub, true)
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                  /* Three registers of different lengths.  */
                  int src1_wide;
                  int src2_wide;
 -                int prewiden;
                  /* undefreq: bit 0 : UNDEF if size == 0
                   *           bit 1 : UNDEF if size == 1
                   *           bit 2 : UNDEF if size == 2
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                  int undefreq;
                  /* prewiden, src1_wide, src2_wide, undefreq */
                  static const int neon_3reg_wide[16][4] = {
 -                    {1, 0, 0, 0}, /* VADDL */
 -                    {1, 1, 0, 0}, /* VADDW */
 -                    {1, 0, 0, 0}, /* VSUBL */
 -                    {1, 1, 0, 0}, /* VSUBW */
 +                    {0, 0, 0, 7}, /* VADDL: handled by decodetree */
 +                    {0, 0, 0, 7}, /* VADDW: handled by decodetree */
 +                    {0, 0, 0, 7}, /* VSUBL: handled by decodetree */
 +                    {0, 0, 0, 7}, /* VSUBW: handled by decodetree */
                      {0, 1, 1, 0}, /* VADDHN */
                      {0, 0, 0, 0}, /* VABAL */
                      {0, 1, 1, 0}, /* VSUBHN */
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                      {0, 0, 0, 7}, /* Reserved: always UNDEF */
                  };
 -                prewiden = neon_3reg_wide[op][0];
                  src1_wide = neon_3reg_wide[op][1];
                  src2_wide = neon_3reg_wide[op][2];
                  undefreq = neon_3reg_wide[op][3];
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                          } else {
                              tmp = neon_load_reg(rn, pass);
                          }
 -                        if (prewiden) {
 -                            gen_neon_widen(cpu_V0, tmp, size, u);
 -                        }
                      }
                      if (src2_wide) {
                          neon_load_reg64(cpu_V1, rm + pass);
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                          } else {
                              tmp2 = neon_load_reg(rm, pass);
                          }
 -                        if (prewiden) {
 -                            gen_neon_widen(cpu_V1, tmp2, size, u);
 -                        }
                      }
                      switch (op) {
                      case 0: case 1: case 4: /* VADDL, VADDW, VADDHN, VRADDHN */
 --
 2.20.1
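Each row of neon_3reg_wide[] ends with undefreq, and the comment above it defines bits 0 to 2 as "UNDEF if size == N". The rows now marked "handled by decodetree" use 7, which sets all three bits. The per-row test in the legacy decoder can be sketched as a small C predicate. The helper name is hypothetical. The meaning of bit 3, which values such as 9 and 0xa use, is cut off in this excerpt, so the sketch deliberately does not model it.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: bits 0..2 of undefreq request UNDEF for
 * size == 0, 1 and 2 respectively. Higher bits have meanings not
 * shown in this excerpt, so they are ignored here.
 */
static bool undef_for_size(int undefreq, int size)
{
    if (size < 0 || size > 2) {
        return false;   /* size 3 belongs to a different insn group */
    }
    return (undefreq >> size) & 1;
}
```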

Mostly my decodetree stuff, but also some patches for various
smaller bugs/features from others.

thanks
-- PMM

The following changes since commit 53550e81e2cafe7c03a39526b95cd21b5194d9b1:

Merge remote-tracking branch 'remotes/berrange/tags/qcrypto-next-pull-request' into staging (2020-06-15 16:36:34 +0100)

are available in the Git repository at:

https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20200616

for you to fetch changes up to 64b397417a26509bcdff44ab94356a35c7901c79:

hw: arm: Set vendor property for IMX SDHCI emulations (2020-06-16 10:32:29 +0100)

----------------------------------------------------------------
 * hw: arm: Set vendor property for IMX SDHCI emulations
 * sd: sdhci: Implement basic vendor specific register support
 * hw/net/imx_fec: Convert debug fprintf() to trace events
 * target/arm/cpu: adjust virtual time for all KVM arm cpus
 * Implement configurable descriptor size in ftgmac100
 * hw/misc/imx6ul_ccm: Implement non writable bits in CCM registers
 * target/arm: More Neon decodetree conversion work

----------------------------------------------------------------
Erik Smit (1):
      Implement configurable descriptor size in ftgmac100

Guenter Roeck (2):
      sd: sdhci: Implement basic vendor specific register support
      hw: arm: Set vendor property for IMX SDHCI emulations

Jean-Christophe Dubois (2):
      hw/misc/imx6ul_ccm: Implement non writable bits in CCM registers
      hw/net/imx_fec: Convert debug fprintf() to trace events

Peter Maydell (17):
      target/arm: Fix missing temp frees in do_vshll_2sh
      target/arm: Convert Neon 3-reg-diff prewidening ops to decodetree
      target/arm: Convert Neon 3-reg-diff narrowing ops to decodetree
      target/arm: Convert Neon 3-reg-diff VABAL, VABDL to decodetree
      target/arm: Convert Neon 3-reg-diff long multiplies
      target/arm: Convert Neon 3-reg-diff saturating doubling multiplies
      target/arm: Convert Neon 3-reg-diff polynomial VMULL
      target/arm: Add 'static' and 'const' annotations to VSHLL function arrays
      target/arm: Add missing TCG temp free in do_2shift_env_64()
      target/arm: Convert Neon 2-reg-scalar integer multiplies to decodetree
      target/arm: Convert Neon 2-reg-scalar float multiplies to decodetree
      target/arm: Convert Neon 2-reg-scalar VQDMULH, VQRDMULH to decodetree
      target/arm: Convert Neon 2-reg-scalar VQRDMLAH, VQRDMLSH to decodetree
      target/arm: Convert Neon 2-reg-scalar long multiplies to decodetree
      target/arm: Convert Neon VEXT to decodetree
      target/arm: Convert Neon VTBL, VTBX to decodetree
      target/arm: Convert Neon VDUP (scalar) to decodetree

fangying (1):
      target/arm/cpu: adjust virtual time for all KVM arm cpus

The widenfn() in do_vshll_2sh() does not free the input 32-bit
TCGv, so we need to do this in the calling code.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
---
 target/arm/translate-neon.inc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
     tmp = tcg_temp_new_i64();
 
     widenfn(tmp, rm0);
+    tcg_temp_free_i32(rm0);
     if (a->shift != 0) {
         tcg_gen_shli_i64(tmp, tmp, a->shift);
         tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
     neon_store_reg64(tmp, a->vd);
 
     widenfn(tmp, rm1);
+    tcg_temp_free_i32(rm1);
     if (a->shift != 0) {
         tcg_gen_shli_i64(tmp, tmp, a->shift);
         tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
-- 
2.20.1

Convert the "pre-widening" insns VADDL, VSUBL, VADDW and VSUBW
in the Neon 3-registers-different-lengths group to decodetree.
These insns work by widening one or both inputs to double their
size, performing an add or subtract at the doubled size and
then storing the double-size result.

As usual, rather than copying the loop of the original decoder
(which needs awkward code to avoid problems when source and
destination registers overlap) we just unroll the two passes.
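A small model shows the overlap problem described above. The plain C sketch below implements VADDL.U8 with the two passes unrolled in the same order as do_prewiden_3d(). The register-file array and all function names are hypothetical, not QEMU code. If the first-pass result were stored into Dd before the high half of Dn was read, `VADDL.U8 q0, d0, d2` would read back its own partial result. VADDW would differ only in loading Dn and Dn+1 as already-wide 64-bit values.

```c
#include <assert.h>
#include <stdint.h>

/* Toy register file of 32 D registers (hypothetical; not QEMU's CPUARMState). */
static uint64_t dreg[32];

/* Return the 32-bit half 'pass' (0 = low, 1 = high) of D register r. */
static uint32_t load_half(int r, int pass)
{
    return (uint32_t)(dreg[r] >> (32 * pass));
}

/* Zero-extend four packed u8 lanes into four packed u16 lanes. */
static uint64_t widen_u8(uint32_t x)
{
    uint64_t r = 0;
    for (int i = 0; i < 4; i++) {
        r |= (uint64_t)((x >> (8 * i)) & 0xff) << (16 * i);
    }
    return r;
}

/* Add four packed u16 lanes with no carry between lanes. */
static uint64_t add_u16x4(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint16_t s = (uint16_t)((a >> (16 * i)) + (b >> (16 * i)));
        r |= (uint64_t)s << (16 * i);
    }
    return r;
}

/* VADDL.U8 Qd, Dn, Dm with the two passes unrolled; Qd is Dvd:Dvd+1. */
static void vaddl_u8(int vd, int vn, int vm)
{
    assert((vd & 1) == 0);   /* an odd vd would UNDEF in the real decoder */
    uint64_t lo = add_u16x4(widen_u8(load_half(vn, 0)),
                            widen_u8(load_half(vm, 0)));
    /*
     * Load the second-pass inputs before storing the first result:
     * Dvd may be the same register as Dn or Dm.
     */
    uint32_t n1 = load_half(vn, 1);
    uint32_t m1 = load_half(vm, 1);
    dreg[vd] = lo;
    dreg[vd + 1] = add_u16x4(widen_u8(n1), widen_u8(m1));
}
```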

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  43 +++++++++++++
 target/arm/translate-neon.inc.c | 104 ++++++++++++++++++++++++++++++++
 target/arm/translate.c          |  16 ++---
 3 files changed, 151 insertions(+), 12 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ VCVT_FU_2sh      1111 001 1 1 . ...... .... 1111 0 . . 1 .... @2reg_vcvt
 # So we have a single decode line and check the cmode/op in the
 # trans function.
 Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+
+######################################################################
+# Within the "two registers, or three registers of different lengths"
+# grouping ([23,4]=0b10), bits [21:20] are either part of the opcode
+# decode: 0b11 for VEXT, two-reg-misc, VTBL, and duplicate-scalar;
+# or they are a size field for the three-reg-different-lengths and
+# two-reg-and-scalar insn groups (where size cannot be 0b11). This
+# is slightly awkward for decodetree: we handle it with this
+# non-exclusive group which contains within it two exclusive groups:
+# one for the size=0b11 patterns, and one for the size-not-0b11
+# patterns. This allows us to check that none of the insns within
+# each subgroup accidentally overlap each other. Note that all the
+# trans functions for the size-not-0b11 patterns must check and
+# return false for size==3.
+######################################################################
+{
+  # 0b11 subgroup will go here
+
+  # Subgroup for size != 0b11
+  [
+    ##################################################################
+    # 3-reg-different-length grouping:
+    # 1111 001 U 1 D sz!=11 Vn:4 Vd:4 opc:4 N 0 M 0 Vm:4
+    ##################################################################
+
+    &3diff vm vn vd size
+
+    @3diff       .... ... . . . size:2 .... .... .... . . . . .... \
+                 &3diff vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+    VADDL_S_3d   1111 001 0 1 . .. .... .... 0000 . 0 . 0 .... @3diff
+    VADDL_U_3d   1111 001 1 1 . .. .... .... 0000 . 0 . 0 .... @3diff
+
+    VADDW_S_3d   1111 001 0 1 . .. .... .... 0001 . 0 . 0 .... @3diff
+    VADDW_U_3d   1111 001 1 1 . .. .... .... 0001 . 0 . 0 .... @3diff
+
+    VSUBL_S_3d   1111 001 0 1 . .. .... .... 0010 . 0 . 0 .... @3diff
+    VSUBL_U_3d   1111 001 1 1 . .. .... .... 0010 . 0 . 0 .... @3diff
+
+    VSUBW_S_3d   1111 001 0 1 . .. .... .... 0011 . 0 . 0 .... @3diff
+    VSUBW_U_3d   1111 001 1 1 . .. .... .... 0011 . 0 . 0 .... @3diff
+  ]
+}
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_Vimm_1r(DisasContext *s, arg_1reg_imm *a)
     }
     return do_1reg_imm(s, a, fn);
 }
+
+static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
+                           NeonGenWidenFn *widenfn,
+                           NeonGenTwo64OpFn *opfn,
+                           bool src1_wide)
+{
+    /* 3-regs different lengths, prewidening case (VADDL/VSUBL/VADDW/VSUBW) */
+    TCGv_i64 rn0_64, rn1_64, rm_64;
+    TCGv_i32 rm;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!widenfn || !opfn) {
+        /* size == 3 case, which is an entirely different insn group */
+        return false;
+    }
+
+    if ((a->vd & 1) || (src1_wide && (a->vn & 1))) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    rn0_64 = tcg_temp_new_i64();
+    rn1_64 = tcg_temp_new_i64();
+    rm_64 = tcg_temp_new_i64();
+
+    if (src1_wide) {
+        neon_load_reg64(rn0_64, a->vn);
+    } else {
+        TCGv_i32 tmp = neon_load_reg(a->vn, 0);
+        widenfn(rn0_64, tmp);
+        tcg_temp_free_i32(tmp);
+    }
+    rm = neon_load_reg(a->vm, 0);
+
+    widenfn(rm_64, rm);
+    tcg_temp_free_i32(rm);
+    opfn(rn0_64, rn0_64, rm_64);
+
+    /*
+     * Load second pass inputs before storing the first pass result, to
+     * avoid incorrect results if a narrow input overlaps with the result.
+     */
+    if (src1_wide) {
+        neon_load_reg64(rn1_64, a->vn + 1);
+    } else {
+        TCGv_i32 tmp = neon_load_reg(a->vn, 1);
+        widenfn(rn1_64, tmp);
+        tcg_temp_free_i32(tmp);
+    }
+    rm = neon_load_reg(a->vm, 1);
+
+    neon_store_reg64(rn0_64, a->vd);
+
+    widenfn(rm_64, rm);
+    tcg_temp_free_i32(rm);
+    opfn(rn1_64, rn1_64, rm_64);
+    neon_store_reg64(rn1_64, a->vd + 1);
+
+    tcg_temp_free_i64(rn0_64);
+    tcg_temp_free_i64(rn1_64);
+    tcg_temp_free_i64(rm_64);
+
+    return true;
+}
+
+#define DO_PREWIDEN(INSN, S, EXT, OP, SRC1WIDE)                         \
+    static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a)        \
+    {                                                                   \
+        static NeonGenWidenFn * const widenfn[] = {                     \
+            gen_helper_neon_widen_##S##8,                               \
+            gen_helper_neon_widen_##S##16,                              \
+            tcg_gen_##EXT##_i32_i64,                                    \
+            NULL,                                                       \
+        };                                                              \
+        static NeonGenTwo64OpFn * const addfn[] = {                     \
+            gen_helper_neon_##OP##l_u16,                                \
+            gen_helper_neon_##OP##l_u32,                                \
+            tcg_gen_##OP##_i64,                                         \
+            NULL,                                                       \
+        };                                                              \
+        return do_prewiden_3d(s, a, widenfn[a->size],                   \
+                              addfn[a->size], SRC1WIDE);                \
+    }
+
+DO_PREWIDEN(VADDL_S, s, ext, add, false)
+DO_PREWIDEN(VADDL_U, u, extu, add, false)
+DO_PREWIDEN(VSUBL_S, s, ext, sub, false)
+DO_PREWIDEN(VSUBL_U, u, extu, sub, false)
+DO_PREWIDEN(VADDW_S, s, ext, add, true)
+DO_PREWIDEN(VADDW_U, u, extu, add, true)
+DO_PREWIDEN(VSUBW_S, s, ext, sub, true)
+DO_PREWIDEN(VSUBW_U, u, extu, sub, true)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 /* Three registers of different lengths.  */
                 int src1_wide;
                 int src2_wide;
-                int prewiden;
                 /* undefreq: bit 0 : UNDEF if size == 0
                  *           bit 1 : UNDEF if size == 1
                  *           bit 2 : UNDEF if size == 2
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 int undefreq;
                 /* prewiden, src1_wide, src2_wide, undefreq */
                 static const int neon_3reg_wide[16][4] = {
-                    {1, 0, 0, 0}, /* VADDL */
-                    {1, 1, 0, 0}, /* VADDW */
-                    {1, 0, 0, 0}, /* VSUBL */
-                    {1, 1, 0, 0}, /* VSUBW */
+                    {0, 0, 0, 7}, /* VADDL: handled by decodetree */
+                    {0, 0, 0, 7}, /* VADDW: handled by decodetree */
+                    {0, 0, 0, 7}, /* VSUBL: handled by decodetree */
+                    {0, 0, 0, 7}, /* VSUBW: handled by decodetree */
                     {0, 1, 1, 0}, /* VADDHN */
                     {0, 0, 0, 0}, /* VABAL */
                     {0, 1, 1, 0}, /* VSUBHN */
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     {0, 0, 0, 7}, /* Reserved: always UNDEF */
                 };
 
-                prewiden = neon_3reg_wide[op][0];
                 src1_wide = neon_3reg_wide[op][1];
                 src2_wide = neon_3reg_wide[op][2];
                 undefreq = neon_3reg_wide[op][3];
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                         } else {
                             tmp = neon_load_reg(rn, pass);
                         }
-                        if (prewiden) {
-                            gen_neon_widen(cpu_V0, tmp, size, u);
-                        }
                     }
                     if (src2_wide) {
                         neon_load_reg64(cpu_V1, rm + pass);
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                         } else {
                             tmp2 = neon_load_reg(rm, pass);
                         }
-                        if (prewiden) {
-                            gen_neon_widen(cpu_V1, tmp2, size, u);
-                        }
                     }
                     switch (op) {
                     case 0: case 1: case 4: /* VADDL, VADDW, VADDHN, VRADDHN */
-- 
2.20.1

Convert the narrow-to-high-half insns VADDHN, VSUBHN, VRADDHN,
VRSUBHN in the Neon 3-registers-different-lengths group to
decodetree.
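
For the size == 2 case, each of these insns adds or subtracts a pair of 64-bit lanes and keeps the high 32 bits; the rounding forms first add 1 << 31, which is what gen_narrow_round_high_u32() does. A minimal C sketch of that arithmetic follows; the function names are hypothetical and the 8- and 16-bit helper variants are not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* One 64-bit lane of VADDHN/VSUBHN/VRADDHN/VRSUBHN with size == 2. */
static uint32_t narrow_high_lane64(uint64_t n, uint64_t m,
                                   bool subtract, bool round)
{
    uint64_t r = subtract ? n - m : n + m;  /* wraps modulo 2^64 */
    if (round) {
        r += UINT64_C(1) << 31;             /* round half up before truncating */
    }
    return (uint32_t)(r >> 32);             /* keep the high half */
}

/* Dd = narrow(Qn op Qm): element 0 of Dd comes from qn[0] and qm[0]. */
static uint64_t narrow_high_q(const uint64_t qn[2], const uint64_t qm[2],
                              bool subtract, bool round)
{
    uint64_t lo = narrow_high_lane64(qn[0], qm[0], subtract, round);
    uint64_t hi = narrow_high_lane64(qn[1], qm[1], subtract, round);
    return lo | (hi << 32);
}
```

The sketch mirrors do_narrow_3d() in computing both lanes before producing the result; in the real code that ordering matters because Dd may be one half of Qn or Qm.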

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  6 +++
 target/arm/translate-neon.inc.c | 87 +++++++++++++++++++++++++++++++
 target/arm/translate.c          | 91 ++++-----------------------------
 3 files changed, 104 insertions(+), 80 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
 
     VSUBW_S_3d   1111 001 0 1 . .. .... .... 0011 . 0 . 0 .... @3diff
     VSUBW_U_3d   1111 001 1 1 . .. .... .... 0011 . 0 . 0 .... @3diff
+
+    VADDHN_3d    1111 001 0 1 . .. .... .... 0100 . 0 . 0 .... @3diff
+    VRADDHN_3d   1111 001 1 1 . .. .... .... 0100 . 0 . 0 .... @3diff
+
+    VSUBHN_3d    1111 001 0 1 . .. .... .... 0110 . 0 . 0 .... @3diff
+    VRSUBHN_3d   1111 001 1 1 . .. .... .... 0110 . 0 . 0 .... @3diff
   ]
 }
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ DO_PREWIDEN(VADDW_S, s, ext, add, true)
 DO_PREWIDEN(VADDW_U, u, extu, add, true)
 DO_PREWIDEN(VSUBW_S, s, ext, sub, true)
 DO_PREWIDEN(VSUBW_U, u, extu, sub, true)
+
+static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
+                         NeonGenTwo64OpFn *opfn, NeonGenNarrowFn *narrowfn)
+{
+    /* 3-regs different lengths, narrowing (VADDHN/VSUBHN/VRADDHN/VRSUBHN) */
+    TCGv_i64 rn_64, rm_64;
+    TCGv_i32 rd0, rd1;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!opfn || !narrowfn) {
+        /* size == 3 case, which is an entirely different insn group */
+        return false;
+    }
+
+    if ((a->vn | a->vm) & 1) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    rn_64 = tcg_temp_new_i64();
+    rm_64 = tcg_temp_new_i64();
+    rd0 = tcg_temp_new_i32();
+    rd1 = tcg_temp_new_i32();
+
+    neon_load_reg64(rn_64, a->vn);
+    neon_load_reg64(rm_64, a->vm);
+
+    opfn(rn_64, rn_64, rm_64);
+
+    narrowfn(rd0, rn_64);
+
+    neon_load_reg64(rn_64, a->vn + 1);
+    neon_load_reg64(rm_64, a->vm + 1);
+
+    opfn(rn_64, rn_64, rm_64);
+
+    narrowfn(rd1, rn_64);
+
+    neon_store_reg(a->vd, 0, rd0);
+    neon_store_reg(a->vd, 1, rd1);
+
+    tcg_temp_free_i64(rn_64);
+    tcg_temp_free_i64(rm_64);
+
+    return true;
+}
+
+#define DO_NARROW_3D(INSN, OP, NARROWTYPE, EXTOP)                       \
+    static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a)        \
+    {                                                                   \
+        static NeonGenTwo64OpFn * const addfn[] = {                     \
+            gen_helper_neon_##OP##l_u16,                                \
+            gen_helper_neon_##OP##l_u32,                                \
+            tcg_gen_##OP##_i64,                                         \
+            NULL,                                                       \
+        };                                                              \
+        static NeonGenNarrowFn * const narrowfn[] = {                   \
+            gen_helper_neon_##NARROWTYPE##_high_u8,                     \
+            gen_helper_neon_##NARROWTYPE##_high_u16,                    \
+            EXTOP,                                                      \
+            NULL,                                                       \
+        };                                                              \
+        return do_narrow_3d(s, a, addfn[a->size], narrowfn[a->size]);   \
+    }
+
+static void gen_narrow_round_high_u32(TCGv_i32 rd, TCGv_i64 rn)
+{
+    tcg_gen_addi_i64(rn, rn, 1u << 31);
+    tcg_gen_extrh_i64_i32(rd, rn);
+}
+
+DO_NARROW_3D(VADDHN, add, narrow, tcg_gen_extrh_i64_i32)
+DO_NARROW_3D(VSUBHN, sub, narrow, tcg_gen_extrh_i64_i32)
+DO_NARROW_3D(VRADDHN, add, narrow_round, gen_narrow_round_high_u32)
+DO_NARROW_3D(VRSUBHN, sub, narrow_round, gen_narrow_round_high_u32)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void gen_neon_addl(int size)
     }
 }
 
-static inline void gen_neon_subl(int size)
-{
-    switch (size) {
-    case 0: gen_helper_neon_subl_u16(CPU_V001); break;
-    case 1: gen_helper_neon_subl_u32(CPU_V001); break;
-    case 2: tcg_gen_sub_i64(CPU_V001); break;
-    default: abort();
-    }
-}
-
 static inline void gen_neon_negl(TCGv_i64 var, int size)
 {
     switch (size) {
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             op = (insn >> 8) & 0xf;
             if ((insn & (1 << 6)) == 0) {
                 /* Three registers of different lengths.  */
-                int src1_wide;
-                int src2_wide;
                 /* undefreq: bit 0 : UNDEF if size == 0
                  *           bit 1 : UNDEF if size == 1
                  *           bit 2 : UNDEF if size == 2
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     {0, 0, 0, 7}, /* VADDW: handled by decodetree */
                     {0, 0, 0, 7}, /* VSUBL: handled by decodetree */
                     {0, 0, 0, 7}, /* VSUBW: handled by decodetree */
-                    {0, 1, 1, 0}, /* VADDHN */
+                    {0, 0, 0, 7}, /* VADDHN: handled by decodetree */
                     {0, 0, 0, 0}, /* VABAL */
-                    {0, 1, 1, 0}, /* VSUBHN */
+                    {0, 0, 0, 7}, /* VSUBHN: handled by decodetree */
                     {0, 0, 0, 0}, /* VABDL */
                     {0, 0, 0, 0}, /* VMLAL */
                     {0, 0, 0, 9}, /* VQDMLAL */
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     {0, 0, 0, 7}, /* Reserved: always UNDEF */
                 };
 
-                src1_wide = neon_3reg_wide[op][1];
-                src2_wide = neon_3reg_wide[op][2];
                 undefreq = neon_3reg_wide[op][3];
 
                 if ((undefreq & (1 << size)) ||
                     ((undefreq & 8) && u)) {
                     return 1;
                 }
-                if ((src1_wide && (rn & 1)) ||
-                    (src2_wide && (rm & 1)) ||
-                    (!src2_wide && (rd & 1))) {
+                if (rd & 1) {
                     return 1;
                 }
 
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 /* Avoid overlapping operands.  Wide source operands are
                    always aligned so will never overlap with wide
                    destinations in problematic ways.  */
-                if (rd == rm && !src2_wide) {
+                if (rd == rm) {
                     tmp = neon_load_reg(rm, 1);
                     neon_store_scratch(2, tmp);
-                } else if (rd == rn && !src1_wide) {
+                } else if (rd == rn) {
                     tmp = neon_load_reg(rn, 1);
                     neon_store_scratch(2, tmp);
                 }
                 tmp3 = NULL;
                 for (pass = 0; pass < 2; pass++) {
-                    if (src1_wide) {
-                        neon_load_reg64(cpu_V0, rn + pass);
-                        tmp = NULL;
+                    if (pass == 1 && rd == rn) {
+                        tmp = neon_load_scratch(2);
                     } else {
-                        if (pass == 1 && rd == rn) {
-                            tmp = neon_load_scratch(2);
-                        } else {
-                            tmp = neon_load_reg(rn, pass);
-                        }
+                        tmp = neon_load_reg(rn, pass);
                     }
-                    if (src2_wide) {
-                        neon_load_reg64(cpu_V1, rm + pass);
-                        tmp2 = NULL;
+                    if (pass == 1 && rd == rm) {
+                        tmp2 = neon_load_scratch(2);
                     } else {
-                        if (pass == 1 && rd == rm) {
-                            tmp2 = neon_load_scratch(2);
-                        } else {
-                            tmp2 = neon_load_reg(rm, pass);
-                        }
+                        tmp2 = neon_load_reg(rm, pass);
                     }
                     switch (op) {
-                    case 0: case 1: case 4: /* VADDL, VADDW, VADDHN, VRADDHN */
-                        gen_neon_addl(size);
-                        break;
-                    case 2: case 3: case 6: /* VSUBL, VSUBW, VSUBHN, VRSUBHN */
-                        gen_neon_subl(size);
-                        break;
                     case 5: case 7: /* VABAL, VABDL */
                         switch ((size << 1) | u) {
                         case 0:
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                             abort();
                         }
                         neon_store_reg64(cpu_V0, rd + pass);
-                    } else if (op == 4 || op == 6) {
-                        /* Narrowing operation.  */
-                        tmp = tcg_temp_new_i32();
-                        if (!u) {
-                            switch (size) {
-                            case 0:
-                                gen_helper_neon_narrow_high_u8(tmp, cpu_V0);
-                                break;
-                            case 1:
-                                gen_helper_neon_narrow_high_u16(tmp, cpu_V0);
-                                break;
-                            case 2:
-                                tcg_gen_extrh_i64_i32(tmp, cpu_V0);
-                                break;
-                            default: abort();
-                            }
-                        } else {
-                            switch (size) {
-                            case 0:
-                                gen_helper_neon_narrow_round_high_u8(tmp, cpu_V0);
-                                break;
-                            case 1:
-                                gen_helper_neon_narrow_round_high_u16(tmp, cpu_V0);
-                                break;
-                            case 2:
-                                tcg_gen_addi_i64(cpu_V0, cpu_V0, 1u << 31);
-                                tcg_gen_extrh_i64_i32(tmp, cpu_V0);
-                                break;
-                            default: abort();
-                            }
-                        }
-                        if (pass == 0) {
-                            tmp3 = tmp;
-                        } else {
-                            neon_store_reg(rd, 0, tmp3);
-                            neon_store_reg(rd, 1, tmp);
-                        }
                     } else {
                         /* Write back the result.  */
                         neon_store_reg64(cpu_V0, rd + pass);
-- 
2.20.1

Convert the Neon 3-reg-diff insns VABAL and VABDL to decodetree.
Like almost all the remaining insns in this group, these are
a combination of a two-input operation which returns a double-width
result and then a possible accumulation of that double-width
result into the destination.
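
As a rough illustration of the "long operation, then optional accumulate" shape that do_long_3d() implements, here is a C sketch of VABDL.S16 and VABAL.S16 (16-bit inputs, 32-bit results). The dreg[] array and the helper names are hypothetical; as in the real code, both results are computed before anything is stored, because Dn or Dm may overlap Qd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static uint64_t dreg[32];   /* hypothetical D-register file */

/* Portable sign extension of the low 16 bits of x. */
static int32_t sext16(uint32_t x)
{
    x &= 0xffff;
    return (int32_t)x - ((x & 0x8000) ? 0x10000 : 0);
}

/* Absolute difference of two signed 16-bit lanes, each widened to 32 bits. */
static uint64_t abdl_s32(uint32_t n, uint32_t m)
{
    uint64_t r = 0;
    for (int i = 0; i < 2; i++) {
        int32_t a = sext16(n >> (16 * i));
        int32_t b = sext16(m >> (16 * i));
        uint32_t d = (uint32_t)(a > b ? a - b : b - a);
        r |= (uint64_t)d << (32 * i);
    }
    return r;
}

/* Add two 32-bit lanes independently. */
static uint64_t addl_u32(uint64_t a, uint64_t b)
{
    uint32_t lo = (uint32_t)a + (uint32_t)b;
    uint32_t hi = (uint32_t)(a >> 32) + (uint32_t)(b >> 32);
    return (uint64_t)lo | ((uint64_t)hi << 32);
}

/* VABDL.S16 Qd, Dn, Dm (accumulate false) or VABAL.S16 (accumulate true). */
static void vabdl_vabal_s16(int vd, int vn, int vm, bool accumulate)
{
    uint64_t rd0, rd1;

    assert((vd & 1) == 0);   /* an odd Vd UNDEFs */

    rd0 = abdl_s32((uint32_t)dreg[vn], (uint32_t)dreg[vm]);
    rd1 = abdl_s32((uint32_t)(dreg[vn] >> 32), (uint32_t)(dreg[vm] >> 32));

    /* Only store once every input has been read. */
    if (accumulate) {
        rd0 = addl_u32(dreg[vd], rd0);
        rd1 = addl_u32(dreg[vd + 1], rd1);
    }
    dreg[vd] = rd0;
    dreg[vd + 1] = rd1;
}
```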

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate.h          |   1 +
 target/arm/neon-dp.decode       |   6 ++
 target/arm/translate-neon.inc.c | 132 ++++++++++++++++++++++++++++++++
 target/arm/translate.c          |  31 +-------
 4 files changed, 142 insertions(+), 28 deletions(-)

diff --git a/target/arm/translate.h b/target/arm/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ typedef void NeonGenTwo64OpEnvFn(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i64);
 typedef void NeonGenNarrowFn(TCGv_i32, TCGv_i64);
 typedef void NeonGenNarrowEnvFn(TCGv_i32, TCGv_ptr, TCGv_i64);
 typedef void NeonGenWidenFn(TCGv_i64, TCGv_i32);
+typedef void NeonGenTwoOpWidenFn(TCGv_i64, TCGv_i32, TCGv_i32);
 typedef void NeonGenTwoSingleOPFn(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_ptr);
 typedef void NeonGenTwoDoubleOPFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_ptr);
 typedef void NeonGenOneOpFn(TCGv_i64, TCGv_i64);
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
     VADDHN_3d    1111 001 0 1 . .. .... .... 0100 . 0 . 0 .... @3diff
     VRADDHN_3d   1111 001 1 1 . .. .... .... 0100 . 0 . 0 .... @3diff
 
+    VABAL_S_3d   1111 001 0 1 . .. .... .... 0101 . 0 . 0 .... @3diff
+    VABAL_U_3d   1111 001 1 1 . .. .... .... 0101 . 0 . 0 .... @3diff
+
     VSUBHN_3d    1111 001 0 1 . .. .... .... 0110 . 0 . 0 .... @3diff
     VRSUBHN_3d   1111 001 1 1 . .. .... .... 0110 . 0 . 0 .... @3diff
+
+    VABDL_S_3d   1111 001 0 1 . .. .... .... 0111 . 0 . 0 .... @3diff
+    VABDL_U_3d   1111 001 1 1 . .. .... .... 0111 . 0 . 0 .... @3diff
   ]
 }
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ DO_NARROW_3D(VADDHN, add, narrow, tcg_gen_extrh_i64_i32)
 DO_NARROW_3D(VSUBHN, sub, narrow, tcg_gen_extrh_i64_i32)
 DO_NARROW_3D(VRADDHN, add, narrow_round, gen_narrow_round_high_u32)
 DO_NARROW_3D(VRSUBHN, sub, narrow_round, gen_narrow_round_high_u32)
+
+static bool do_long_3d(DisasContext *s, arg_3diff *a,
+                       NeonGenTwoOpWidenFn *opfn,
+                       NeonGenTwo64OpFn *accfn)
+{
+    /*
+     * 3-regs different lengths, long operations.
+     * These perform an operation on two inputs that returns a double-width
+     * result, and then possibly perform an accumulation operation of
+     * that result into the double-width destination.
+     */
+    TCGv_i64 rd0, rd1, tmp;
+    TCGv_i32 rn, rm;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!opfn) {
+        /* size == 3 case, which is an entirely different insn group */
+        return false;
+    }
+
+    if (a->vd & 1) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    rd0 = tcg_temp_new_i64();
+    rd1 = tcg_temp_new_i64();
+
+    rn = neon_load_reg(a->vn, 0);
+    rm = neon_load_reg(a->vm, 0);
+    opfn(rd0, rn, rm);
+    tcg_temp_free_i32(rn);
+    tcg_temp_free_i32(rm);
+
+    rn = neon_load_reg(a->vn, 1);
+    rm = neon_load_reg(a->vm, 1);
+    opfn(rd1, rn, rm);
+    tcg_temp_free_i32(rn);
+    tcg_temp_free_i32(rm);
+
+    /* Don't store results until after all loads: they might overlap */
+    if (accfn) {
+        tmp = tcg_temp_new_i64();
+        neon_load_reg64(tmp, a->vd);
+        accfn(tmp, tmp, rd0);
+        neon_store_reg64(tmp, a->vd);
+        neon_load_reg64(tmp, a->vd + 1);
+        accfn(tmp, tmp, rd1);
+        neon_store_reg64(tmp, a->vd + 1);
+        tcg_temp_free_i64(tmp);
+    } else {
+        neon_store_reg64(rd0, a->vd);
+        neon_store_reg64(rd1, a->vd + 1);
+    }
+
+    tcg_temp_free_i64(rd0);
+    tcg_temp_free_i64(rd1);
+
+    return true;
+}
+
+static bool trans_VABDL_S_3d(DisasContext *s, arg_3diff *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        gen_helper_neon_abdl_s16,
+        gen_helper_neon_abdl_s32,
+        gen_helper_neon_abdl_s64,
+        NULL,
+    };
+
+    return do_long_3d(s, a, opfn[a->size], NULL);
+}
+
+static bool trans_VABDL_U_3d(DisasContext *s, arg_3diff *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        gen_helper_neon_abdl_u16,
+        gen_helper_neon_abdl_u32,
+        gen_helper_neon_abdl_u64,
+        NULL,
+    };
+
+    return do_long_3d(s, a, opfn[a->size], NULL);
+}
+
+static bool trans_VABAL_S_3d(DisasContext *s, arg_3diff *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        gen_helper_neon_abdl_s16,
+        gen_helper_neon_abdl_s32,
+        gen_helper_neon_abdl_s64,
+        NULL,
+    };
+    static NeonGenTwo64OpFn * const addfn[] = {
+        gen_helper_neon_addl_u16,
+        gen_helper_neon_addl_u32,
+        tcg_gen_add_i64,
+        NULL,
+    };
+
+    return do_long_3d(s, a, opfn[a->size], addfn[a->size]);
+}
+
+static bool trans_VABAL_U_3d(DisasContext *s, arg_3diff *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        gen_helper_neon_abdl_u16,
+        gen_helper_neon_abdl_u32,
+        gen_helper_neon_abdl_u64,
+        NULL,
+    };
+    static NeonGenTwo64OpFn * const addfn[] = {
+        gen_helper_neon_addl_u16,
+        gen_helper_neon_addl_u32,
+        tcg_gen_add_i64,
+        NULL,
+    };
+
+    return do_long_3d(s, a, opfn[a->size], addfn[a->size]);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     {0, 0, 0, 7}, /* VSUBL: handled by decodetree */
                     {0, 0, 0, 7}, /* VSUBW: handled by decodetree */
                     {0, 0, 0, 7}, /* VADDHN: handled by decodetree */
-                    {0, 0, 0, 0}, /* VABAL */
+                    {0, 0, 0, 7}, /* VABAL */
                     {0, 0, 0, 7}, /* VSUBHN: handled by decodetree */
-                    {0, 0, 0, 0}, /* VABDL */
+                    {0, 0, 0, 7}, /* VABDL */
                     {0, 0, 0, 0}, /* VMLAL */
                     {0, 0, 0, 9}, /* VQDMLAL */
                     {0, 0, 0, 0}, /* VMLSL */
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                         tmp2 = neon_load_reg(rm, pass);
                     }
                     switch (op) {
-                    case 5: case 7: /* VABAL, VABDL */
-                        switch ((size << 1) | u) {
-                        case 0:
-                            gen_helper_neon_abdl_s16(cpu_V0, tmp, tmp2);
-                            break;
-                        case 1:
-                            gen_helper_neon_abdl_u16(cpu_V0, tmp, tmp2);
-                            break;
-                        case 2:
-                            gen_helper_neon_abdl_s32(cpu_V0, tmp, tmp2);
-                            break;
-                        case 3:
-                            gen_helper_neon_abdl_u32(cpu_V0, tmp, tmp2);
-                            break;
-                        case 4:
-                            gen_helper_neon_abdl_s64(cpu_V0, tmp, tmp2);
-                            break;
-                        case 5:
-                            gen_helper_neon_abdl_u64(cpu_V0, tmp, tmp2);
-                            break;
-                        default: abort();
-                        }
-                        tcg_temp_free_i32(tmp2);
-                        tcg_temp_free_i32(tmp);
-                        break;
                     case 8: case 9: case 10: case 11: case 12: case 13:
                         /* VMLAL, VQDMLAL, VMLSL, VQDMLSL, VMULL, VQDMULL */
                         gen_neon_mull(cpu_V0, tmp, tmp2, size, u);
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                         case 10: /* VMLSL */
                             gen_neon_negl(cpu_V0, size);
                             /* Fall through */
-                        case 5: case 8: /* VABAL, VMLAL */
+                        case 8: /* VMLAL */
                             gen_neon_addl(size);
                             break;
                         case 9: case 11: /* VQDMLAL, VQDMLSL */
-- 
2.20.1

Convert the Neon 3-reg-diff insns VMULL, VMLAL and VMLSL; these perform
a 32x32->64 multiply with possible accumulate.

Note that for VMLSL we do the accumulate directly with a subtraction
rather than doing a negate-then-add as the old code did.
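
One lane of the signed 32x32->64 forms can be sketched in C as below, with VMLSL accumulating by a direct subtraction. The names (AccOp, sext32(), mull_s64(), vmull_lane_s32()) are hypothetical. Because the arithmetic is modulo 2^64, acc - p always equals acc + (-p), so dropping the separate negate should not change any result.

```c
#include <stdint.h>

typedef enum { ACC_NONE, ACC_ADD, ACC_SUB } AccOp;   /* hypothetical */

/* Portable sign extension of a 32-bit value to 64 bits. */
static int64_t sext32(uint32_t x)
{
    return (int64_t)x - ((x & 0x80000000u) ? INT64_C(0x100000000) : 0);
}

/* Signed 32x32->64 multiply; |product| <= 2^62, so it cannot overflow. */
static uint64_t mull_s64(uint32_t n, uint32_t m)
{
    return (uint64_t)(sext32(n) * sext32(m));
}

/* One lane of VMULL.S32 (ACC_NONE), VMLAL.S32 (ACC_ADD) or VMLSL.S32 (ACC_SUB). */
static uint64_t vmull_lane_s32(uint64_t acc, uint32_t n, uint32_t m, AccOp op)
{
    uint64_t p = mull_s64(n, m);

    switch (op) {
    case ACC_ADD:
        return acc + p;
    case ACC_SUB:
        return acc - p;     /* subtract directly, no separate negate */
    default:
        return p;
    }
}
```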

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  9 +++++
 target/arm/translate-neon.inc.c | 71 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 21 +++-------
 3 files changed, 86 insertions(+), 15 deletions(-)

Convert the Neon 3-reg-diff insns VQDMULL, VQDMLAL and VQDMLSL:
these are all saturating doubling long multiplies with a possible
accumulate step.

These are the last insns in the group which use the loop that passes
over each element, so we can delete that code.
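
A C sketch of one 16-bit lane (size == 1) of these saturating doubling long multiplies follows. The function names are hypothetical and the sticky FPSCR.QC saturation flag is modelled as a plain bool. The doubled product can only saturate for -32768 * -32768; the accumulate is a second, separate saturating step.

```c
#include <stdbool.h>
#include <stdint.h>

/* Clamp to the signed 32-bit range, setting *qc if clamping happened. */
static int32_t sat_s32(int64_t x, bool *qc)
{
    if (x > INT32_MAX) {
        *qc = true;
        return INT32_MAX;
    }
    if (x < INT32_MIN) {
        *qc = true;
        return INT32_MIN;
    }
    return (int32_t)x;
}

/* VQDMULL.S16 lane: saturate(2 * a * b). */
static int32_t qdmull_s16(int16_t a, int16_t b, bool *qc)
{
    return sat_s32(2 * (int64_t)a * b, qc);
}

/* VQDMLAL.S16 (subtract false) or VQDMLSL.S16 (subtract true) lane. */
static int32_t qdmlal_s16(int32_t acc, int16_t a, int16_t b,
                          bool subtract, bool *qc)
{
    int64_t p = qdmull_s16(a, b, qc);
    return sat_s32(subtract ? (int64_t)acc - p : (int64_t)acc + p, qc);
}
```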

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  6 +++
 target/arm/translate-neon.inc.c | 82 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 59 ++----------------------
 3 files changed, 92 insertions(+), 55 deletions(-)

Convert the Neon 3-reg-diff insn polynomial VMULL. This is the last
insn in this group to be converted.
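
Polynomial multiplication treats each byte as a polynomial over GF(2), so partial products are combined with XOR instead of addition (a carry-less multiply). A minimal C sketch of the 8-bit polynomial VMULL for one pair of D-register inputs; the names are hypothetical and only the 8-bit element size is modelled.

```c
#include <stdint.h>

/* Carry-less (GF(2)[x]) multiply of two 8-bit polynomials. */
static uint16_t pmull_8x8(uint8_t a, uint8_t b)
{
    uint16_t r = 0;
    for (int i = 0; i < 8; i++) {
        if (b & (1u << i)) {
            r ^= (uint16_t)(a << i);   /* add a * x^i with no carries */
        }
    }
    return r;
}

/* VMULL.P8 Qd, Dn, Dm: eight 8-bit lanes in, eight 16-bit lanes out. */
static void vmull_p8(uint64_t dn, uint64_t dm, uint64_t qd[2])
{
    qd[0] = qd[1] = 0;
    for (int i = 0; i < 8; i++) {
        uint64_t lane = pmull_8x8((uint8_t)(dn >> (8 * i)),
                                  (uint8_t)(dm >> (8 * i)));
        qd[i / 4] |= lane << (16 * (i % 4));
    }
}
```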

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  2 ++
 target/arm/translate-neon.inc.c | 43 +++++++++++++++++++++++
 target/arm/translate.c          | 60 ++-------------------------------
 3 files changed, 48 insertions(+), 57 deletions(-)

Mark the arrays of function pointers in trans_VSHLL_S_2sh() and
trans_VSHLL_U_2sh() as both 'static' and 'const'.
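
The same static-const table pattern, reduced to a self-contained C sketch: the table is indexed by the size field, with NULL marking the size that belongs to a different insn group. Making the array 'static' means it need not be rebuilt on the stack on every call, and 'const' lets it live in read-only data. WidenFn and the widen_*() functions are hypothetical stand-ins for NeonGenWidenFn and the QEMU helpers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t WidenFn(uint32_t);   /* hypothetical stand-in for NeonGenWidenFn */

static uint64_t widen_u8(uint32_t x)
{
    uint64_t r = 0;
    for (int i = 0; i < 4; i++) {
        r |= (uint64_t)((x >> (8 * i)) & 0xff) << (16 * i);
    }
    return r;
}

static uint64_t widen_u16(uint32_t x)
{
    return (uint64_t)(x & 0xffff) | ((uint64_t)(x >> 16) << 32);
}

static uint64_t widen_u32(uint32_t x)
{
    return x;
}

/* Returns false (UNDEF) for size 3, as the trans functions do. */
static bool do_widen(unsigned size, uint32_t in, uint64_t *out)
{
    static WidenFn * const widenfn[] = {
        widen_u8,
        widen_u16,
        widen_u32,
        NULL,
    };

    if (size > 3 || !widenfn[size]) {
        return false;
    }
    *out = widenfn[size](in);
    return true;
}
```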

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-neon.inc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
 
 static bool trans_VSHLL_S_2sh(DisasContext *s, arg_2reg_shift *a)
 {
-    NeonGenWidenFn *widenfn[] = {
+    static NeonGenWidenFn * const widenfn[] = {
         gen_helper_neon_widen_s8,
         gen_helper_neon_widen_s16,
         tcg_gen_ext_i32_i64,
@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL_S_2sh(DisasContext *s, arg_2reg_shift *a)
 
 static bool trans_VSHLL_U_2sh(DisasContext *s, arg_2reg_shift *a)
 {
-    NeonGenWidenFn *widenfn[] = {
+    static NeonGenWidenFn * const widenfn[] = {
         gen_helper_neon_widen_u8,
         gen_helper_neon_widen_u16,
         tcg_gen_extu_i32_i64,
-- 
2.20.1

Convert the VMLA, VMLS and VMUL insns in the Neon "2 registers and a
scalar" group to decodetree.  These are 32x32->32 operations where
one of the inputs is the scalar, followed by a possible accumulate
operation of the 32-bit result.

The refactoring removes some of the oddities of the old decoder:
 * operands to the operation and accumulation were often
   reversed (taking advantage of the fact that most of these ops
   are commutative); the new code follows the pseudocode order
 * the Q bit in the insn was in a local variable 'u'; in the
   new code it is decoded into a->q
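
As a rough model of the integer by-scalar forms, written in the
pseudocode operand order (the Vn element times the scalar, then
accumulated into Vd), here is a sketch that assumes 32-bit elements
only. The function name and the acc_op selector are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum acc_op { OP_MUL, OP_MLA, OP_MLS };  /* hypothetical selector */

/*
 * Model of VMUL/VMLA/VMLS (by scalar), I32: every element of Vn is
 * multiplied by the same scalar taken from Vm, and the 32-bit product
 * optionally accumulates into Vd. Arithmetic wraps modulo 2^32, so
 * unsigned types are used.
 */
static void vmla_scalar_i32(uint32_t *d, const uint32_t *n, uint32_t scalar,
                            int elements, enum acc_op op)
{
    assert(elements == 2 || elements == 4);  /* D (q=0) or Q (q=1) */

    for (int i = 0; i < elements; i++) {
        uint32_t product = n[i] * scalar;

        switch (op) {
        case OP_MUL:
            d[i] = product;
            break;
        case OP_MLA:
            d[i] = d[i] + product;
            break;
        case OP_MLS:
            d[i] = d[i] - product;
            break;
        }
    }
}
```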

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  15 ++++
 target/arm/translate-neon.inc.c | 133 ++++++++++++++++++++++++++++++++
 target/arm/translate.c          |  77 ++----------------
 3 files changed, 154 insertions(+), 71 deletions(-)

Convert the float versions of VMLA, VMLS and VMUL in the Neon
2-reg-scalar group to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
As noted in the comment on the WRAP_FP_FN macro, we could have
had a do_2scalar_fp() function, but for 3 insns it seemed
simpler to just do the wrapping to get hold of the fpstatus ptr.
(These are the only fp insns in the group.)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  3 ++
 target/arm/translate-neon.inc.c | 65 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 37 ++-----------------
 3 files changed, 71 insertions(+), 34 deletions(-)

Convert the VQDMULH and VQRDMULH insns in the 2-reg-scalar group
to decodetree.
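
For reference, a sketch of one 16-bit lane of these two insns as we
read the Arm ARM pseudocode: double the product, optionally add a
rounding constant, keep the high half, and saturate. The function name
qdmulh16() is hypothetical, and the code assumes that >> on a negative
int64_t is an arithmetic shift (as it is with gcc and clang).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of one 16-bit lane of VQDMULH (round == false)
 * and VQRDMULH (round == true). 'qc' is set on saturation, like the
 * sticky FPSCR.QC flag.
 */
static int16_t qdmulh16(int16_t a, int16_t b, bool round, bool *qc)
{
    int64_t prod = 2 * (int64_t)a * b;   /* doubled product, exact */

    if (round) {
        prod += 1 << 15;                 /* round to nearest on the high half */
    }
    prod >>= 16;                         /* keep the high half */

    /* Only INT16_MIN * INT16_MIN can overflow, giving 0x8000 */
    if (prod > INT16_MAX) {
        *qc = true;
        return INT16_MAX;
    }
    assert(prod >= INT16_MIN);
    return (int16_t)prod;
}
```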

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  3 +++
 target/arm/translate-neon.inc.c | 29 +++++++++++++++++++++++
 target/arm/translate.c          | 42 ++-------------------------------
 3 files changed, 34 insertions(+), 40 deletions(-)

Convert the VQRDMLAH and VQRDMLSH insns in the 2-reg-scalar
group to decodetree.
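
As a reference point, one 16-bit lane of VQRDMLAH/VQRDMLSH (the
Armv8.1 rounding doubling multiply accumulate ops) can be modelled as
below, following our reading of the Arm ARM pseudocode: the
accumulator is placed in the high half before the doubled product and
rounding constant are added or subtracted. The name qrdmlah16() is
hypothetical, and an arithmetic >> on negative values is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of one 16-bit lane of VQRDMLAH (sub == false)
 * and VQRDMLSH (sub == true). 'qc' is set on saturation.
 */
static int16_t qrdmlah16(int16_t acc, int16_t a, int16_t b, bool sub,
                         bool *qc)
{
    int64_t prod = 2 * (int64_t)a * b;
    /* acc * 65536 rather than acc << 16: left-shifting a negative is UB */
    int64_t accum = (int64_t)acc * 65536 + (sub ? -prod : prod) + (1 << 15);
    int64_t res = accum >> 16;           /* assumes arithmetic shift */

    assert(qc);
    if (res > INT16_MAX) {
        *qc = true;
        return INT16_MAX;
    }
    if (res < INT16_MIN) {
        *qc = true;
        return INT16_MIN;
    }
    return (int16_t)res;
}
```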

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  3 ++
 target/arm/translate-neon.inc.c | 74 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 38 +----------------
 3 files changed, 79 insertions(+), 36 deletions(-)

Convert the Neon 2-reg-scalar long multiplies to decodetree.
These are the last instructions in the group.
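
The "long" by-scalar ops produce results twice the element width. A
plain-C sketch of the signed 16-bit VMULL/VMLAL/VMLSL variants, assuming
lanes are modelled as arrays (the function name and the 'acc' encoding
are hypothetical; the saturating VQDMULL/VQDMLAL/VQDMLSL forms are not
modelled). Like do_2scalar_long(), it computes every product before
writing any result lane.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the signed 16-bit by-scalar long ops:
 *   acc ==  0: VMULL.S16 Qd, Dn, Dm[x]
 *   acc == +1: VMLAL.S16 Qd, Dn, Dm[x]
 *   acc == -1: VMLSL.S16 Qd, Dn, Dm[x]
 * Result lanes are uint32_t because the accumulation wraps modulo 2^32.
 */
static void vmlal_scalar_s16(uint32_t qd[4], const int16_t dn[4],
                             int16_t scalar, int acc)
{
    uint32_t prod[4];

    assert(acc >= -1 && acc <= 1);

    /* Compute every product before writing any result lane */
    for (int i = 0; i < 4; i++) {
        /* 16x16 -> 32 signed multiply: cannot overflow int32_t */
        prod[i] = (uint32_t)((int32_t)dn[i] * scalar);
    }
    for (int i = 0; i < 4; i++) {
        if (acc > 0) {
            qd[i] += prod[i];
        } else if (acc < 0) {
            qd[i] -= prod[i];
        } else {
            qd[i] = prod[i];
        }
    }
}
```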

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  18 ++++
 target/arm/translate-neon.inc.c | 163 ++++++++++++++++++++++++++++
 target/arm/translate.c          | 182 ++------------------------------
 3 files changed, 187 insertions(+), 176 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
 
     @2scalar     .... ... q:1 . . size:2 .... .... .... . . . . .... \
                  &2scalar vm=%vm_dp vn=%vn_dp vd=%vd_dp
+    # For the 'long' ops the Q bit is part of insn decode
+    @2scalar_q0  .... ... . . . size:2 .... .... .... . . . . .... \
+                 &2scalar vm=%vm_dp vn=%vn_dp vd=%vd_dp q=0
 
     VMLA_2sc     1111 001 . 1 . .. .... .... 0000 . 1 . 0 .... @2scalar
     VMLA_F_2sc   1111 001 . 1 . .. .... .... 0001 . 1 . 0 .... @2scalar
 
+    VMLAL_S_2sc  1111 001 0 1 . .. .... .... 0010 . 1 . 0 .... @2scalar_q0
+    VMLAL_U_2sc  1111 001 1 1 . .. .... .... 0010 . 1 . 0 .... @2scalar_q0
+
+    VQDMLAL_2sc  1111 001 0 1 . .. .... .... 0011 . 1 . 0 .... @2scalar_q0
+
     VMLS_2sc     1111 001 . 1 . .. .... .... 0100 . 1 . 0 .... @2scalar
     VMLS_F_2sc   1111 001 . 1 . .. .... .... 0101 . 1 . 0 .... @2scalar
 
+    VMLSL_S_2sc  1111 001 0 1 . .. .... .... 0110 . 1 . 0 .... @2scalar_q0
+    VMLSL_U_2sc  1111 001 1 1 . .. .... .... 0110 . 1 . 0 .... @2scalar_q0
+
+    VQDMLSL_2sc  1111 001 0 1 . .. .... .... 0111 . 1 . 0 .... @2scalar_q0
+
     VMUL_2sc     1111 001 . 1 . .. .... .... 1000 . 1 . 0 .... @2scalar
     VMUL_F_2sc   1111 001 . 1 . .. .... .... 1001 . 1 . 0 .... @2scalar
 
+    VMULL_S_2sc  1111 001 0 1 . .. .... .... 1010 . 1 . 0 .... @2scalar_q0
+    VMULL_U_2sc  1111 001 1 1 . .. .... .... 1010 . 1 . 0 .... @2scalar_q0
+
+    VQDMULL_2sc  1111 001 0 1 . .. .... .... 1011 . 1 . 0 .... @2scalar_q0
+
     VQDMULH_2sc  1111 001 . 1 . .. .... .... 1100 . 1 . 0 .... @2scalar
     VQRDMULH_2sc 1111 001 . 1 . .. .... .... 1101 . 1 . 0 .... @2scalar
 
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VQRDMLSH_2sc(DisasContext *s, arg_2scalar *a)
     };
     return do_vqrdmlah_2sc(s, a, opfn[a->size]);
 }
+
+static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
+                            NeonGenTwoOpWidenFn *opfn,
+                            NeonGenTwo64OpFn *accfn)
+{
+    /*
+     * Two registers and a scalar, long operations: perform an
+     * operation on the input elements and the scalar which produces
+     * a double-width result, and then possibly perform an accumulation
+     * operation of that result into the destination.
+     */
+    TCGv_i32 scalar, rn;
+    TCGv_i64 rn0_64, rn1_64;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!opfn) {
+        /* Bad size (including size == 3, which is a different insn group) */
+        return false;
+    }
+
+    if (a->vd & 1) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    scalar = neon_get_scalar(a->size, a->vm);
+
+    /* Load all inputs before writing any outputs, in case of overlap */
+    rn = neon_load_reg(a->vn, 0);
+    rn0_64 = tcg_temp_new_i64();
+    opfn(rn0_64, rn, scalar);
+    tcg_temp_free_i32(rn);
+
+    rn = neon_load_reg(a->vn, 1);
+    rn1_64 = tcg_temp_new_i64();
+    opfn(rn1_64, rn, scalar);
+    tcg_temp_free_i32(rn);
+    tcg_temp_free_i32(scalar);
+
+    if (accfn) {
+        TCGv_i64 t64 = tcg_temp_new_i64();
+        neon_load_reg64(t64, a->vd);
+        accfn(t64, t64, rn0_64);
+        neon_store_reg64(t64, a->vd);
+        neon_load_reg64(t64, a->vd + 1);
+        accfn(t64, t64, rn1_64);
+        neon_store_reg64(t64, a->vd + 1);
+        tcg_temp_free_i64(t64);
+    } else {
+        neon_store_reg64(rn0_64, a->vd);
+        neon_store_reg64(rn1_64, a->vd + 1);
+    }
+    tcg_temp_free_i64(rn0_64);
+    tcg_temp_free_i64(rn1_64);
+    return true;
+}
+
+static bool trans_VMULL_S_2sc(DisasContext *s, arg_2scalar *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        NULL,
+        gen_helper_neon_mull_s16,
+        gen_mull_s32,
+        NULL,
+    };
+
+    return do_2scalar_long(s, a, opfn[a->size], NULL);
+}
+
+static bool trans_VMULL_U_2sc(DisasContext *s, arg_2scalar *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        NULL,
+        gen_helper_neon_mull_u16,
+        gen_mull_u32,
+        NULL,
+    };
+
+    return do_2scalar_long(s, a, opfn[a->size], NULL);
+}
+
+#define DO_VMLAL_2SC(INSN, MULL, ACC)                                   \
+    static bool trans_##INSN##_2sc(DisasContext *s, arg_2scalar *a)     \
+    {                                                                   \
+        static NeonGenTwoOpWidenFn * const opfn[] = {                   \
+            NULL,                                                       \
+            gen_helper_neon_##MULL##16,                                 \
+            gen_##MULL##32,                                             \
+            NULL,                                                       \
+        };                                                              \
+        static NeonGenTwo64OpFn * const accfn[] = {                     \
+            NULL,                                                       \
+            gen_helper_neon_##ACC##l_u32,                               \
+            tcg_gen_##ACC##_i64,                                        \
+            NULL,                                                       \
+        };                                                              \
+        return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]);    \
+    }
+
+DO_VMLAL_2SC(VMLAL_S, mull_s, add)
+DO_VMLAL_2SC(VMLAL_U, mull_u, add)
+DO_VMLAL_2SC(VMLSL_S, mull_s, sub)
+DO_VMLAL_2SC(VMLSL_U, mull_u, sub)
+
+static bool trans_VQDMULL_2sc(DisasContext *s, arg_2scalar *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        NULL,
+        gen_VQDMULL_16,
+        gen_VQDMULL_32,
+        NULL,
+    };
+
+    return do_2scalar_long(s, a, opfn[a->size], NULL);
+}
+
+static bool trans_VQDMLAL_2sc(DisasContext *s, arg_2scalar *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        NULL,
+        gen_VQDMULL_16,
+        gen_VQDMULL_32,
+        NULL,
+    };
+    static NeonGenTwo64OpFn * const accfn[] = {
+        NULL,
+        gen_VQDMLAL_acc_16,
+        gen_VQDMLAL_acc_32,
+        NULL,
+    };
+
+    return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]);
+}
+
+static bool trans_VQDMLSL_2sc(DisasContext *s, arg_2scalar *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        NULL,
+        gen_VQDMULL_16,
+        gen_VQDMULL_32,
+        NULL,
+    };
+    static NeonGenTwo64OpFn * const accfn[] = {
+        NULL,
+        gen_VQDMLSL_acc_16,
+        gen_VQDMLSL_acc_32,
+        NULL,
+    };
+
+    return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void gen_revsh(TCGv_i32 dest, TCGv_i32 var)
     tcg_gen_ext16s_i32(dest, var);
 }
 
-/* 32x32->64 multiply.  Marks inputs as dead.  */
-static TCGv_i64 gen_mulu_i64_i32(TCGv_i32 a, TCGv_i32 b)
-{
-    TCGv_i32 lo = tcg_temp_new_i32();
-    TCGv_i32 hi = tcg_temp_new_i32();
-    TCGv_i64 ret;
-
-    tcg_gen_mulu2_i32(lo, hi, a, b);
-    tcg_temp_free_i32(a);
-    tcg_temp_free_i32(b);
-
-    ret = tcg_temp_new_i64();
-    tcg_gen_concat_i32_i64(ret, lo, hi);
-    tcg_temp_free_i32(lo);
-    tcg_temp_free_i32(hi);
-
-    return ret;
-}
-
-static TCGv_i64 gen_muls_i64_i32(TCGv_i32 a, TCGv_i32 b)
-{
-    TCGv_i32 lo = tcg_temp_new_i32();
-    TCGv_i32 hi = tcg_temp_new_i32();
-    TCGv_i64 ret;
-
-    tcg_gen_muls2_i32(lo, hi, a, b);
-    tcg_temp_free_i32(a);
-    tcg_temp_free_i32(b);
-
-    ret = tcg_temp_new_i64();
-    tcg_gen_concat_i32_i64(ret, lo, hi);
-    tcg_temp_free_i32(lo);
-    tcg_temp_free_i32(hi);
-
-    return ret;
-}
-
 /* Swap low and high halfwords.  */
 static void gen_swap_half(TCGv_i32 var)
 {
@@ -XXX,XX +XXX,XX @@ static inline void gen_neon_addl(int size)
     }
 }
 
-static inline void gen_neon_negl(TCGv_i64 var, int size)
-{
-    switch (size) {
-    case 0: gen_helper_neon_negl_u16(var, var); break;
-    case 1: gen_helper_neon_negl_u32(var, var); break;
-    case 2:
-        tcg_gen_neg_i64(var, var);
-        break;
-    default: abort();
-    }
-}
-
-static inline void gen_neon_addl_saturate(TCGv_i64 op0, TCGv_i64 op1, int size)
-{
-    switch (size) {
-    case 1: gen_helper_neon_addl_saturate_s32(op0, cpu_env, op0, op1); break;
-    case 2: gen_helper_neon_addl_saturate_s64(op0, cpu_env, op0, op1); break;
-    default: abort();
-    }
-}
-
-static inline void gen_neon_mull(TCGv_i64 dest, TCGv_i32 a, TCGv_i32 b,
-                                 int size, int u)
-{
-    TCGv_i64 tmp;
-
-    switch ((size << 1) | u) {
-    case 0: gen_helper_neon_mull_s8(dest, a, b); break;
-    case 1: gen_helper_neon_mull_u8(dest, a, b); break;
-    case 2: gen_helper_neon_mull_s16(dest, a, b); break;
-    case 3: gen_helper_neon_mull_u16(dest, a, b); break;
-    case 4:
-        tmp = gen_muls_i64_i32(a, b);
-        tcg_gen_mov_i64(dest, tmp);
-        tcg_temp_free_i64(tmp);
-        break;
-    case 5:
-        tmp = gen_mulu_i64_i32(a, b);
-        tcg_gen_mov_i64(dest, tmp);
-        tcg_temp_free_i64(tmp);
-        break;
-    default: abort();
-    }
-
-    /* gen_helper_neon_mull_[su]{8|16} do not free their parameters.
-       Don't forget to clean them now.  */
-    if (size < 2) {
-        tcg_temp_free_i32(a);
-        tcg_temp_free_i32(b);
-    }
-}
-
 static void gen_neon_narrow_op(int op, int u, int size,
                                TCGv_i32 dest, TCGv_i64 src)
 {
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
     int u;
     int vec_size;
     uint32_t imm;
-    TCGv_i32 tmp, tmp2, tmp3, tmp4, tmp5;
+    TCGv_i32 tmp, tmp2, tmp3, tmp5;
     TCGv_ptr ptr1;
     TCGv_i64 tmp64;
 
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         return 1;
     } else { /* (insn & 0x00800010 == 0x00800000) */
         if (size != 3) {
-            op = (insn >> 8) & 0xf;
-            if ((insn & (1 << 6)) == 0) {
-                /* Three registers of different lengths: handled by decodetree */
-                return 1;
-            } else {
-                /* Two registers and a scalar. NB that for ops of this form
-                 * the ARM ARM labels bit 24 as Q, but it is in our variable
-                 * 'u', not 'q'.
-                 */
-                if (size == 0) {
-                    return 1;
-                }
-                switch (op) {
-                case 0: /* Integer VMLA scalar */
-                case 4: /* Integer VMLS scalar */
-                case 8: /* Integer VMUL scalar */
-                case 1: /* Float VMLA scalar */
-                case 5: /* Floating point VMLS scalar */
-                case 9: /* Floating point VMUL scalar */
-                case 12: /* VQDMULH scalar */
-                case 13: /* VQRDMULH scalar */
-                case 14: /* VQRDMLAH scalar */
-                case 15: /* VQRDMLSH scalar */
-                    return 1; /* handled by decodetree */
-
-                case 3: /* VQDMLAL scalar */
-                case 7: /* VQDMLSL scalar */
-                case 11: /* VQDMULL scalar */
-                    if (u == 1) {
-                        return 1;
-                    }
-                    /* fall through */
-                case 2: /* VMLAL sclar */
-                case 6: /* VMLSL scalar */
-                case 10: /* VMULL scalar */
-                    if (rd & 1) {
-                        return 1;
-                    }
-                    tmp2 = neon_get_scalar(size, rm);
-                    /* We need a copy of tmp2 because gen_neon_mull
-                     * deletes it during pass 0.  */
-                    tmp4 = tcg_temp_new_i32();
-                    tcg_gen_mov_i32(tmp4, tmp2);
-                    tmp3 = neon_load_reg(rn, 1);
-
-                    for (pass = 0; pass < 2; pass++) {
-                        if (pass == 0) {
-                            tmp = neon_load_reg(rn, 0);
-                        } else {
-                            tmp = tmp3;
-                            tmp2 = tmp4;
-                        }
-                        gen_neon_mull(cpu_V0, tmp, tmp2, size, u);
-                        if (op != 11) {
-                            neon_load_reg64(cpu_V1, rd + pass);
-                        }
-                        switch (op) {
-                        case 6:
-                            gen_neon_negl(cpu_V0, size);
-                            /* Fall through */
-                        case 2:
-                            gen_neon_addl(size);
-                            break;
-                        case 3: case 7:
-                            gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
-                            if (op == 7) {
-                                gen_neon_negl(cpu_V0, size);
-                            }
-                            gen_neon_addl_saturate(cpu_V0, cpu_V1, size);
-                            break;
-                        case 10:
-                            /* no-op */
-                            break;
-                        case 11:
-                            gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
-                            break;
-                        default:
-                            abort();
-                        }
-                        neon_store_reg64(cpu_V0, rd + pass);
-                    }
-                    break;
-                default:
-                    g_assert_not_reached();
-                }
-            }
+            /*
+             * Three registers of different lengths, or two registers and
+             * a scalar: handled by decodetree
+             */
+            return 1;
         } else { /* size == 3 */
             if (!u) {
                 /* Extract.  */
-- 
2.20.1

Convert the Neon VEXT insn to decodetree. Rather than keeping the
old implementation, which used the fixed temporaries cpu_V0 and cpu_V1
and did the extraction with hand-written shift and logic ops, we use
the TCG extract2 insn.

We no longer need to special-case immediates of 0 or 8, as the
optimizer is smart enough to throw away the dead code.
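
To show the idea, here is a plain-C model of extract2 and of how VEXT
maps onto it. extract2_64(), vext_d() and vext_q() are hypothetical
names; tcg_gen_extract2_i64() itself operates on TCG temps, not C
values. In C a shift by 64 is undefined, so the model keeps an explicit
ofs == 0 case that the translator no longer needs.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of extract2 on 64-bit values: take the 128-bit concatenation
 * ah:al, shift it right by ofs bits, keep the low 64 bits.
 */
static uint64_t extract2_64(uint64_t al, uint64_t ah, unsigned ofs)
{
    assert(ofs < 64);
    if (ofs == 0) {
        return al;          /* avoid the undefined shift by 64 below */
    }
    return (al >> ofs) | (ah << (64 - ofs));
}

/* VEXT.8 Dd, Dn, Dm, #imm (q == 0): bytes imm..imm+7 of <Dm:Dn> */
static uint64_t vext_d(uint64_t dn, uint64_t dm, unsigned imm)
{
    assert(imm <= 7);
    return extract2_64(dn, dm, imm * 8);
}

/* VEXT.8 Qd, Qn, Qm, #imm (q == 1): bytes imm..imm+15 of <Qm:Qn> */
static void vext_q(uint64_t dest[2], const uint64_t n[2],
                   const uint64_t m[2], unsigned imm)
{
    uint64_t right, middle, left;

    assert(imm <= 15);
    if (imm < 8) {
        right = n[0];
        middle = n[1];
        left = m[0];
    } else {
        right = n[1];
        middle = m[0];
        left = m[1];
        imm -= 8;
    }
    /* All inputs were read above, so dest may alias n or m */
    dest[0] = extract2_64(right, middle, imm * 8);
    dest[1] = extract2_64(middle, left, imm * 8);
}
```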

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  8 +++-
 target/arm/translate-neon.inc.c | 76 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 58 +------------------------
 3 files changed, 85 insertions(+), 57 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
 # return false for size==3.
 ######################################################################
 {
-  # 0b11 subgroup will go here
+  [
+    ##################################################################
+    # Miscellaneous size=0b11 insns
+    ##################################################################
+    VEXT         1111 001 0 1 . 11 .... .... imm:4 . q:1 . 0 .... \
+                 vm=%vm_dp vn=%vn_dp vd=%vd_dp
+  ]
 
   # Subgroup for size != 0b11
   [
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VQDMLSL_2sc(DisasContext *s, arg_2scalar *a)
 
     return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]);
 }
+
+static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
+{
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if ((a->vn | a->vm | a->vd) & a->q) {
+        return false;
+    }
+
+    if (a->imm > 7 && !a->q) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    if (!a->q) {
+        /* Extract 64 bits from <Vm:Vn> */
+        TCGv_i64 left, right, dest;
+
+        left = tcg_temp_new_i64();
+        right = tcg_temp_new_i64();
+        dest = tcg_temp_new_i64();
+
+        neon_load_reg64(right, a->vn);
+        neon_load_reg64(left, a->vm);
+        tcg_gen_extract2_i64(dest, right, left, a->imm * 8);
+        neon_store_reg64(dest, a->vd);
+
+        tcg_temp_free_i64(left);
+        tcg_temp_free_i64(right);
+        tcg_temp_free_i64(dest);
+    } else {
+        /* Extract 128 bits from <Vm+1:Vm:Vn+1:Vn> */
+        TCGv_i64 left, middle, right, destleft, destright;
+
+        left = tcg_temp_new_i64();
+        middle = tcg_temp_new_i64();
+        right = tcg_temp_new_i64();
+        destleft = tcg_temp_new_i64();
+        destright = tcg_temp_new_i64();
+
+        if (a->imm < 8) {
+            neon_load_reg64(right, a->vn);
+            neon_load_reg64(middle, a->vn + 1);
+            tcg_gen_extract2_i64(destright, right, middle, a->imm * 8);
+            neon_load_reg64(left, a->vm);
+            tcg_gen_extract2_i64(destleft, middle, left, a->imm * 8);
+        } else {
+            neon_load_reg64(right, a->vn + 1);
+            neon_load_reg64(middle, a->vm);
+            tcg_gen_extract2_i64(destright, right, middle, (a->imm - 8) * 8);
+            neon_load_reg64(left, a->vm + 1);
+            tcg_gen_extract2_i64(destleft, middle, left, (a->imm - 8) * 8);
+        }
+
+        neon_store_reg64(destright, a->vd);
+        neon_store_reg64(destleft, a->vd + 1);
+
+        tcg_temp_free_i64(destright);
+        tcg_temp_free_i64(destleft);
+        tcg_temp_free_i64(right);
+        tcg_temp_free_i64(middle);
+        tcg_temp_free_i64(left);
+    }
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
     int pass;
     int u;
     int vec_size;
-    uint32_t imm;
     TCGv_i32 tmp, tmp2, tmp3, tmp5;
     TCGv_ptr ptr1;
-    TCGv_i64 tmp64;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return 1;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             return 1;
         } else { /* size == 3 */
             if (!u) {
-                /* Extract.  */
-                imm = (insn >> 8) & 0xf;
-
-                if (imm > 7 && !q)
-                    return 1;
-
-                if (q && ((rd | rn | rm) & 1)) {
-                    return 1;
-                }
-
-                if (imm == 0) {
-                    neon_load_reg64(cpu_V0, rn);
-                    if (q) {
-                        neon_load_reg64(cpu_V1, rn + 1);
-                    }
-                } else if (imm == 8) {
-                    neon_load_reg64(cpu_V0, rn + 1);
-                    if (q) {
-                        neon_load_reg64(cpu_V1, rm);
-                    }
-                } else if (q) {
-                    tmp64 = tcg_temp_new_i64();
-                    if (imm < 8) {
-                        neon_load_reg64(cpu_V0, rn);
-                        neon_load_reg64(tmp64, rn + 1);
-                    } else {
-                        neon_load_reg64(cpu_V0, rn + 1);
-                        neon_load_reg64(tmp64, rm);
-                    }
-                    tcg_gen_shri_i64(cpu_V0, cpu_V0, (imm & 7) * 8);
-                    tcg_gen_shli_i64(cpu_V1, tmp64, 64 - ((imm & 7) * 8));
-                    tcg_gen_or_i64(cpu_V0, cpu_V0, cpu_V1);
-                    if (imm < 8) {
-                        neon_load_reg64(cpu_V1, rm);
-                    } else {
-                        neon_load_reg64(cpu_V1, rm + 1);
-                        imm -= 8;
-                    }
-                    tcg_gen_shli_i64(cpu_V1, cpu_V1, 64 - (imm * 8));
-                    tcg_gen_shri_i64(tmp64, tmp64, imm * 8);
-                    tcg_gen_or_i64(cpu_V1, cpu_V1, tmp64);
-                    tcg_temp_free_i64(tmp64);
-                } else {
-                    /* BUGFIX */
-                    neon_load_reg64(cpu_V0, rn);
-                    tcg_gen_shri_i64(cpu_V0, cpu_V0, imm * 8);
-                    neon_load_reg64(cpu_V1, rm);
-                    tcg_gen_shli_i64(cpu_V1, cpu_V1, 64 - (imm * 8));
-                    tcg_gen_or_i64(cpu_V0, cpu_V0, cpu_V1);
-                }
-                neon_store_reg64(cpu_V0, rd);
-                if (q) {
-                    neon_store_reg64(cpu_V1, rd + 1);
-                }
+                /* Extract: handled by decodetree */
+                return 1;
             } else if ((insn & (1 << 11)) == 0) {
                 /* Two register misc.  */
                 op = ((insn >> 12) & 0x30) | ((insn >> 7) & 0xf);
-- 
2.20.1

Convert the Neon VTBL, VTBX instructions to decodetree.  The actual
implementation of the insn is copied across to the new trans function
unchanged except for renaming 'tmp5' to 'tmp4'.
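
For reference, the per-byte behaviour that gen_helper_neon_tbl()
implements can be sketched in plain C as below. This is a model of the
architectural semantics, not the helper's code; the name vtbl() and its
parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Model of VTBL/VTBX for one D register of indices. 'table' holds the
 * bytes of len+1 consecutive D registers (8 * (len + 1) bytes).
 * An in-range index selects a table byte; an out-of-range index gives
 * 0 for VTBL and leaves the destination byte unchanged for VTBX.
 */
static void vtbl(uint8_t dest[8], const uint8_t idx[8],
                 const uint8_t *table, unsigned len, int is_vtbx)
{
    const unsigned nbytes = (len + 1) * 8;   /* 1 to 4 D registers */
    uint8_t result[8];

    assert(len <= 3);
    for (int i = 0; i < 8; i++) {
        if ((unsigned)idx[i] < nbytes) {
            result[i] = table[idx[i]];
        } else {
            result[i] = is_vtbx ? dest[i] : 0;
        }
    }
    /* Write back only after all lookups, in case dest overlaps idx */
    memcpy(dest, result, sizeof(result));
}
```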

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  3 ++
 target/arm/translate-neon.inc.c | 56 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 41 +++---------------------
 3 files changed, 63 insertions(+), 37 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
     ##################################################################
     VEXT         1111 001 0 1 . 11 .... .... imm:4 . q:1 . 0 .... \
                  vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+    VTBL         1111 001 1 1 . 11 .... .... 10 len:2 . op:1 . 0 .... \
+                 vm=%vm_dp vn=%vn_dp vd=%vd_dp
   ]
 
   # Subgroup for size != 0b11
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
     }
     return true;
 }
+
+static bool trans_VTBL(DisasContext *s, arg_VTBL *a)
+{
+    int n;
+    TCGv_i32 tmp, tmp2, tmp3, tmp4;
+    TCGv_ptr ptr1;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    n = a->len + 1;
+    if ((a->vn + n) > 32) {
+        /*
+         * This is UNPREDICTABLE; we choose to UNDEF to avoid the
+         * helper function running off the end of the register file.
+         */
+        return false;
+    }
+    n <<= 3;
+    if (a->op) {
+        tmp = neon_load_reg(a->vd, 0);
+    } else {
+        tmp = tcg_temp_new_i32();
+        tcg_gen_movi_i32(tmp, 0);
+    }
+    tmp2 = neon_load_reg(a->vm, 0);
+    ptr1 = vfp_reg_ptr(true, a->vn);
+    tmp4 = tcg_const_i32(n);
+    gen_helper_neon_tbl(tmp2, tmp2, tmp, ptr1, tmp4);
+    tcg_temp_free_i32(tmp);
+    if (a->op) {
+        tmp = neon_load_reg(a->vd, 1);
+    } else {
+        tmp = tcg_temp_new_i32();
+        tcg_gen_movi_i32(tmp, 0);
+    }
+    tmp3 = neon_load_reg(a->vm, 1);
+    gen_helper_neon_tbl(tmp3, tmp3, tmp, ptr1, tmp4);
+    tcg_temp_free_i32(tmp4);
+    tcg_temp_free_ptr(ptr1);
+    neon_store_reg(a->vd, 0, tmp2);
+    neon_store_reg(a->vd, 1, tmp3);
+    tcg_temp_free_i32(tmp);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
 {
     int op;
     int q;
-    int rd, rn, rm, rd_ofs, rm_ofs;
+    int rd, rm, rd_ofs, rm_ofs;
     int size;
     int pass;
     int u;
     int vec_size;
-    TCGv_i32 tmp, tmp2, tmp3, tmp5;
-    TCGv_ptr ptr1;
+    TCGv_i32 tmp, tmp2, tmp3;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return 1;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
     q = (insn & (1 << 6)) != 0;
     u = (insn >> 24) & 1;
     VFP_DREG_D(rd, insn);
-    VFP_DREG_N(rn, insn);
     VFP_DREG_M(rm, insn);
     size = (insn >> 20) & 3;
     vec_size = q ? 16 : 8;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     break;
                 }
             } else if ((insn & (1 << 10)) == 0) {
-                /* VTBL, VTBX.  */
-                int n = ((insn >> 8) & 3) + 1;
-                if ((rn + n) > 32) {
-                    /* This is UNPREDICTABLE; we choose to UNDEF to avoid the
-                     * helper function running off the end of the register file.
-                     */
-                    return 1;
-                }
-                n <<= 3;
-                if (insn & (1 << 6)) {
-                    tmp = neon_load_reg(rd, 0);
-                } else {
-                    tmp = tcg_temp_new_i32();
-                    tcg_gen_movi_i32(tmp, 0);
-                }
-                tmp2 = neon_load_reg(rm, 0);
-                ptr1 = vfp_reg_ptr(true, rn);
-                tmp5 = tcg_const_i32(n);
-                gen_helper_neon_tbl(tmp2, tmp2, tmp, ptr1, tmp5);
-                tcg_temp_free_i32(tmp);
-                if (insn & (1 << 6)) {
-                    tmp = neon_load_reg(rd, 1);
-                } else {
-                    tmp = tcg_temp_new_i32();
-                    tcg_gen_movi_i32(tmp, 0);
-                }
-                tmp3 = neon_load_reg(rm, 1);
-                gen_helper_neon_tbl(tmp3, tmp3, tmp, ptr1, tmp5);
-                tcg_temp_free_i32(tmp5);
-                tcg_temp_free_ptr(ptr1);
-                neon_store_reg(rd, 0, tmp2);
-                neon_store_reg(rd, 1, tmp3);
-                tcg_temp_free_i32(tmp);
+                /* VTBL, VTBX: handled by decodetree */
+                return 1;
             } else if ((insn & 0x380) == 0) {
                 /* VDUP */
                 int element;
-- 
2.20.1

Convert the Neon VDUP (scalar) insn to decodetree.  (Note that we
can't call this just "VDUP" as we used that already in vfp.decode for
the "VDUP (general purpose register)" insn.)
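
As a reminder of what the scalar form does, a plain-C sketch for a
D-register result: one element of Dm is copied into every lane (the Q
form writes the same 64-bit value to both halves of Qd). The function
name vdup_scalar() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of VDUP (scalar), D-register result: element
 * 'index' of Dm, 'esize' bits wide, is replicated into every lane.
 */
static uint64_t vdup_scalar(uint64_t dm, unsigned esize, unsigned index)
{
    assert(esize == 8 || esize == 16 || esize == 32);
    assert(index < 64 / esize);

    uint64_t elt = (dm >> (index * esize)) & ((1ULL << esize) - 1);
    uint64_t result = 0;

    for (unsigned i = 0; i < 64 / esize; i++) {
        result |= elt << (i * esize);
    }
    return result;
}
```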

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  7 +++++++
 target/arm/translate-neon.inc.c | 26 ++++++++++++++++++++++++++
 target/arm/translate.c          | 25 +------------------------
 3 files changed, 34 insertions(+), 24 deletions(-)

From: Jean-Christophe Dubois <jcd@tribudubois.net>

Some bits of the CCM registers are not writable.

This was not handled in the initial commit, in which all bits of the
registers were writable.

This patch adds the required code to protect the non-writable bits.
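
The write path below keeps the bits set in the per-register mask and
takes the remaining bits from the written value. A standalone sketch of
that expression (masked_write() is a hypothetical name):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Masked register write, mirroring the expression used in
 * imx6ul_ccm_write(): bits set in 'ro_mask' are not writable and keep
 * their current value; all other bits take the newly written value.
 */
static uint32_t masked_write(uint32_t old, uint32_t value, uint32_t ro_mask)
{
    uint32_t result = (old & ro_mask) | (value & ~ro_mask);

    /* The read-only bits must be unchanged by the write */
    assert((result & ro_mask) == (old & ro_mask));
    return result;
}
```

With the ccm_mask values in the diff, CCM_CSR (mask 0xffffffff) ignores
writes entirely, while the CCM_CCGRn registers (mask 0x00000000) remain
fully writable.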

Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Message-id: 20200608133508.550046-1-jcd@tribudubois.net
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/misc/imx6ul_ccm.c | 76 ++++++++++++++++++++++++++++++++++++--------
 1 file changed, 63 insertions(+), 13 deletions(-)

diff --git a/hw/misc/imx6ul_ccm.c b/hw/misc/imx6ul_ccm.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/imx6ul_ccm.c
+++ b/hw/misc/imx6ul_ccm.c
@@ -XXX,XX +XXX,XX @@
 
 #include "trace.h"
 
+static const uint32_t ccm_mask[CCM_MAX] = {
+    [CCM_CCR] = 0xf01fef80,
+    [CCM_CCDR] = 0xfffeffff,
+    [CCM_CSR] = 0xffffffff,
+    [CCM_CCSR] = 0xfffffef2,
+    [CCM_CACRR] = 0xfffffff8,
+    [CCM_CBCDR] = 0xc1f8e000,
+    [CCM_CBCMR] = 0xfc03cfff,
+    [CCM_CSCMR1] = 0x80700000,
+    [CCM_CSCMR2] = 0xe01ff003,
+    [CCM_CSCDR1] = 0xfe00c780,
+    [CCM_CS1CDR] = 0xfe00fe00,
+    [CCM_CS2CDR] = 0xf8007000,
+    [CCM_CDCDR] = 0xf00fffff,
+    [CCM_CHSCCDR] = 0xfffc01ff,
+    [CCM_CSCDR2] = 0xfe0001ff,
+    [CCM_CSCDR3] = 0xffffc1ff,
+    [CCM_CDHIPR] = 0xffffffff,
+    [CCM_CTOR] = 0x00000000,
+    [CCM_CLPCR] = 0xf39ff01c,
+    [CCM_CISR] = 0xfb85ffbe,
+    [CCM_CIMR] = 0xfb85ffbf,
+    [CCM_CCOSR] = 0xfe00fe00,
+    [CCM_CGPR] = 0xfffc3fea,
+    [CCM_CCGR0] = 0x00000000,
+    [CCM_CCGR1] = 0x00000000,
+    [CCM_CCGR2] = 0x00000000,
+    [CCM_CCGR3] = 0x00000000,
+    [CCM_CCGR4] = 0x00000000,
+    [CCM_CCGR5] = 0x00000000,
+    [CCM_CCGR6] = 0x00000000,
+    [CCM_CMEOR] = 0xafffff1f,
+};
+
+static const uint32_t analog_mask[CCM_ANALOG_MAX] = {
+    [CCM_ANALOG_PLL_ARM] = 0xfff60f80,
+    [CCM_ANALOG_PLL_USB1] = 0xfffe0fbc,
+    [CCM_ANALOG_PLL_USB2] = 0xfffe0fbc,
+    [CCM_ANALOG_PLL_SYS] = 0xfffa0ffe,
+    [CCM_ANALOG_PLL_SYS_SS] = 0x00000000,
+    [CCM_ANALOG_PLL_SYS_NUM] = 0xc0000000,
+    [CCM_ANALOG_PLL_SYS_DENOM] = 0xc0000000,
+    [CCM_ANALOG_PLL_AUDIO] = 0xffe20f80,
+    [CCM_ANALOG_PLL_AUDIO_NUM] = 0xc0000000,
+    [CCM_ANALOG_PLL_AUDIO_DENOM] = 0xc0000000,
+    [CCM_ANALOG_PLL_VIDEO] = 0xffe20f80,
+    [CCM_ANALOG_PLL_VIDEO_NUM] = 0xc0000000,
+    [CCM_ANALOG_PLL_VIDEO_DENOM] = 0xc0000000,
+    [CCM_ANALOG_PLL_ENET] = 0xffc20ff0,
+    [CCM_ANALOG_PFD_480] = 0x40404040,
+    [CCM_ANALOG_PFD_528] = 0x40404040,
+    [PMU_MISC0] = 0x01fe8306,
+    [PMU_MISC1] = 0x07fcede0,
+    [PMU_MISC2] = 0x005f5f5f,
+};
+
 static const char *imx6ul_ccm_reg_name(uint32_t reg)
 {
     static char unknown[20];
@@ -XXX,XX +XXX,XX @@ static void imx6ul_ccm_write(void *opaque, hwaddr offset, uint64_t value,
 
     trace_ccm_write_reg(imx6ul_ccm_reg_name(index), (uint32_t)value);
 
-    /*
-     * We will do a better implementation later. In particular some bits
-     * cannot be written to.
-     */
-    s->ccm[index] = (uint32_t)value;
+    s->ccm[index] = (s->ccm[index] & ccm_mask[index]) |
+                           ((uint32_t)value & ~ccm_mask[index]);
 }
 
 static uint64_t imx6ul_analog_read(void *opaque, hwaddr offset, unsigned size)
@@ -XXX,XX +XXX,XX @@ static void imx6ul_analog_write(void *opaque, hwaddr offset, uint64_t value,
          * the REG_NAME register. So we change the value of the
          * REG_NAME register, setting bits passed in the value.
          */
-        s->analog[index - 1] |= value;
+        s->analog[index - 1] |= (value & ~analog_mask[index - 1]);
         break;
     case CCM_ANALOG_PLL_ARM_CLR:
     case CCM_ANALOG_PLL_USB1_CLR:
@@ -XXX,XX +XXX,XX @@ static void imx6ul_analog_write(void *opaque, hwaddr offset, uint64_t value,
          * the REG_NAME register. So we change the value of the
          * REG_NAME register, unsetting bits passed in the value.
          */
-        s->analog[index - 2] &= ~value;
+        s->analog[index - 2] &= ~(value & ~analog_mask[index - 2]);
         break;
     case CCM_ANALOG_PLL_ARM_TOG:
     case CCM_ANALOG_PLL_USB1_TOG:
@@ -XXX,XX +XXX,XX @@ static void imx6ul_analog_write(void *opaque, hwaddr offset, uint64_t value,
          * the REG_NAME register. So we change the value of the
          * REG_NAME register, toggling bits passed in the value.
          */
-        s->analog[index - 3] ^= value;
+        s->analog[index - 3] ^= (value & ~analog_mask[index - 3]);
         break;
     default:
-        /*
-         * We will do a better implementation later. In particular some bits
-         * cannot be written to.
-         */
-        s->analog[index] = value;
+        s->analog[index] = (s->analog[index] & analog_mask[index]) |
+                           (value & ~analog_mask[index]);
         break;
     }
 }
-- 
2.20.1

From: Erik Smit <erik.lucas.smit@gmail.com>

The hardware supports configurable descriptor sizes, set in the DBLAC
register.

Most drivers use the default 4-word descriptor, whose size the model
currently hardcodes, but the Aspeed SDK configures 8-word descriptors
to store extra data.

Signed-off-by: Erik Smit <erik.lucas.smit@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
[PMM: removed unnecessary parens]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/net/ftgmac100.c | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/hw/net/ftgmac100.c b/hw/net/ftgmac100.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/ftgmac100.c
+++ b/hw/net/ftgmac100.c
@@ -XXX,XX +XXX,XX @@
 #define FTGMAC100_APTC_TXPOLL_CNT(x)        (((x) >> 8) & 0xf)
 #define FTGMAC100_APTC_TXPOLL_TIME_SEL      (1 << 12)
 
+/*
+ * DMA burst length and arbitration control register
+ */
+#define FTGMAC100_DBLAC_RXBURST_SIZE(x)     (((x) >> 8) & 0x3)
+#define FTGMAC100_DBLAC_TXBURST_SIZE(x)     (((x) >> 10) & 0x3)
+#define FTGMAC100_DBLAC_RXDES_SIZE(x)       ((((x) >> 12) & 0xf) * 8)
+#define FTGMAC100_DBLAC_TXDES_SIZE(x)       ((((x) >> 16) & 0xf) * 8)
+#define FTGMAC100_DBLAC_IFG_CNT(x)          (((x) >> 20) & 0x7)
+#define FTGMAC100_DBLAC_IFG_INC             (1 << 23)
+
 /*
  * PHY control register
  */
@@ -XXX,XX +XXX,XX @@ static void ftgmac100_do_tx(FTGMAC100State *s, uint32_t tx_ring,
         if (bd.des0 & s->txdes0_edotr) {
             addr = tx_ring;
         } else {
-            addr += sizeof(FTGMAC100Desc);
+            addr += FTGMAC100_DBLAC_TXDES_SIZE(s->dblac);
         }
     }
 
@@ -XXX,XX +XXX,XX @@ static void ftgmac100_write(void *opaque, hwaddr addr,
         s->phydata = value & 0xffff;
         break;
     case FTGMAC100_DBLAC: /* DMA Burst Length and Arbitration Control */
+        if (FTGMAC100_DBLAC_TXDES_SIZE(s->dblac) < sizeof(FTGMAC100Desc)) {
+            qemu_log_mask(LOG_GUEST_ERROR,
+                          "%s: transmit descriptor too small : %d bytes\n",
+                          __func__, FTGMAC100_DBLAC_TXDES_SIZE(s->dblac));
+            break;
+        }
+        if (FTGMAC100_DBLAC_RXDES_SIZE(s->dblac) < sizeof(FTGMAC100Desc)) {
+            qemu_log_mask(LOG_GUEST_ERROR,
+                          "%s: receive descriptor too small : %d bytes\n",
+                          __func__, FTGMAC100_DBLAC_RXDES_SIZE(s->dblac));
+            break;
+        }
         s->dblac = value;
         break;
     case FTGMAC100_REVR:  /* Feature Register */
@@ -XXX,XX +XXX,XX @@ static ssize_t ftgmac100_receive(NetClientState *nc, const uint8_t *buf,
         if (bd.des0 & s->rxdes0_edorr) {
             addr = s->rx_ring;
         } else {
-            addr += sizeof(FTGMAC100Desc);
+            addr += FTGMAC100_DBLAC_RXDES_SIZE(s->dblac);
         }
     }
     s->rx_descriptor = addr;
-- 
2.20.1

From: fangying <fangying1@huawei.com>

Virtual time adjustment was implemented for the virt-5.0 machine type,
but the CPU property was enabled only for the host-passthrough and max
CPU models.  Let's add it for any KVM Arm CPU that has the generic
timer feature enabled.

Signed-off-by: Ying Fang <fangying1@huawei.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-id: 20200608121243.2076-1-fangying1@huawei.com
[PMM: minor commit message tweak, removed inaccurate
 suggested-by tag]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.c   |  6 ++++--
 target/arm/cpu64.c |  1 -
 target/arm/kvm.c   | 21 +++++++++++----------
 3 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ void arm_cpu_post_init(Object *obj)
     if (arm_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER)) {
         qdev_property_add_static(DEVICE(cpu), &arm_cpu_gt_cntfrq_property);
     }
+
+    if (kvm_enabled()) {
+        kvm_arm_add_vcpu_properties(obj);
+    }
 }
 
 static void arm_cpu_finalizefn(Object *obj)
@@ -XXX,XX +XXX,XX @@ static void arm_max_initfn(Object *obj)
 
     if (kvm_enabled()) {
         kvm_arm_set_cpu_features_from_host(cpu);
-        kvm_arm_add_vcpu_properties(obj);
     } else {
         cortex_a15_initfn(obj);
 
@@ -XXX,XX +XXX,XX @@ static void arm_host_initfn(Object *obj)
     if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
         aarch64_add_sve_properties(obj);
     }
-    kvm_arm_add_vcpu_properties(obj);
     arm_cpu_post_init(obj);
 }
 
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
 
     if (kvm_enabled()) {
         kvm_arm_set_cpu_features_from_host(cpu);
-        kvm_arm_add_vcpu_properties(obj);
     } else {
         uint64_t t;
         uint32_t u;
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -XXX,XX +XXX,XX @@ static void kvm_no_adjvtime_set(Object *obj, bool value, Error **errp)
 /* KVM VCPU properties should be prefixed with "kvm-". */
 void kvm_arm_add_vcpu_properties(Object *obj)
 {
-    if (!kvm_enabled()) {
-        return;
-    }
+    ARMCPU *cpu = ARM_CPU(obj);
+    CPUARMState *env = &cpu->env;
 
-    ARM_CPU(obj)->kvm_adjvtime = true;
-    object_property_add_bool(obj, "kvm-no-adjvtime", kvm_no_adjvtime_get,
-                             kvm_no_adjvtime_set);
-    object_property_set_description(obj, "kvm-no-adjvtime",
-                                    "Set on to disable the adjustment of "
-                                    "the virtual counter. VM stopped time "
-                                    "will be counted.");
+    if (arm_feature(env, ARM_FEATURE_GENERIC_TIMER)) {
+        cpu->kvm_adjvtime = true;
+        object_property_add_bool(obj, "kvm-no-adjvtime", kvm_no_adjvtime_get,
+                                 kvm_no_adjvtime_set);
+        object_property_set_description(obj, "kvm-no-adjvtime",
+                                        "Set on to disable the adjustment of "
+                                        "the virtual counter. VM stopped time "
+                                        "will be counted.");
+    }
 }
 
 bool kvm_arm_pmu_supported(CPUState *cpu)
-- 
2.20.1

From: Jean-Christophe Dubois <jcd@tribudubois.net>

Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
[PMD: Fixed 32-bit format string using PRIx32/PRIx64]
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/net/imx_fec.c    | 106 +++++++++++++++++++-------------------------
 hw/net/trace-events |  18 ++++++++
 2 files changed, 63 insertions(+), 61 deletions(-)

diff --git a/hw/net/imx_fec.c b/hw/net/imx_fec.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/imx_fec.c
+++ b/hw/net/imx_fec.c
@@ -XXX,XX +XXX,XX @@
 #include "qemu/module.h"
 #include "net/checksum.h"
 #include "net/eth.h"
+#include "trace.h"
 
 /* For crc32 */
 #include <zlib.h>
 
-#ifndef DEBUG_IMX_FEC
-#define DEBUG_IMX_FEC 0
-#endif
-
-#define FEC_PRINTF(fmt, args...) \
-    do { \
-        if (DEBUG_IMX_FEC) { \
-            fprintf(stderr, "[%s]%s: " fmt , TYPE_IMX_FEC, \
-                                             __func__, ##args); \
-        } \
-    } while (0)
-
-#ifndef DEBUG_IMX_PHY
-#define DEBUG_IMX_PHY 0
-#endif
-
-#define PHY_PRINTF(fmt, args...) \
-    do { \
-        if (DEBUG_IMX_PHY) { \
-            fprintf(stderr, "[%s.phy]%s: " fmt , TYPE_IMX_FEC, \
-                                                 __func__, ##args); \
-        } \
-    } while (0)
-
 #define IMX_MAX_DESC    1024
 
 static const char *imx_default_reg_name(IMXFECState *s, uint32_t index)
@@ -XXX,XX +XXX,XX @@ static void imx_eth_update(IMXFECState *s);
  * For now we don't handle any GPIO/interrupt line, so the OS will
  * have to poll for the PHY status.
  */
-static void phy_update_irq(IMXFECState *s)
+static void imx_phy_update_irq(IMXFECState *s)
 {
     imx_eth_update(s);
 }
 
-static void phy_update_link(IMXFECState *s)
+static void imx_phy_update_link(IMXFECState *s)
 {
     /* Autonegotiation status mirrors link status.  */
     if (qemu_get_queue(s->nic)->link_down) {
-        PHY_PRINTF("link is down\n");
+        trace_imx_phy_update_link("down");
         s->phy_status &= ~0x0024;
         s->phy_int |= PHY_INT_DOWN;
     } else {
-        PHY_PRINTF("link is up\n");
+        trace_imx_phy_update_link("up");
         s->phy_status |= 0x0024;
         s->phy_int |= PHY_INT_ENERGYON;
         s->phy_int |= PHY_INT_AUTONEG_COMPLETE;
     }
-    phy_update_irq(s);
+    imx_phy_update_irq(s);
 }
 
 static void imx_eth_set_link(NetClientState *nc)
 {
-    phy_update_link(IMX_FEC(qemu_get_nic_opaque(nc)));
+    imx_phy_update_link(IMX_FEC(qemu_get_nic_opaque(nc)));
 }
 
-static void phy_reset(IMXFECState *s)
+static void imx_phy_reset(IMXFECState *s)
 {
+    trace_imx_phy_reset();
+
     s->phy_status = 0x7809;
     s->phy_control = 0x3000;
     s->phy_advertise = 0x01e1;
     s->phy_int_mask = 0;
     s->phy_int = 0;
-    phy_update_link(s);
+    imx_phy_update_link(s);
 }
 
-static uint32_t do_phy_read(IMXFECState *s, int reg)
+static uint32_t imx_phy_read(IMXFECState *s, int reg)
 {
     uint32_t val;
 
@@ -XXX,XX +XXX,XX @@ static uint32_t do_phy_read(IMXFECState *s, int reg)
     case 29:    /* Interrupt source.  */
         val = s->phy_int;
         s->phy_int = 0;
-        phy_update_irq(s);
+        imx_phy_update_irq(s);
         break;
     case 30:    /* Interrupt mask */
         val = s->phy_int_mask;
@@ -XXX,XX +XXX,XX @@ static uint32_t do_phy_read(IMXFECState *s, int reg)
         break;
     }
 
-    PHY_PRINTF("read 0x%04x @ %d\n", val, reg);
+    trace_imx_phy_read(val, reg);
 
     return val;
 }
 
-static void do_phy_write(IMXFECState *s, int reg, uint32_t val)
+static void imx_phy_write(IMXFECState *s, int reg, uint32_t val)
 {
-    PHY_PRINTF("write 0x%04x @ %d\n", val, reg);
+    trace_imx_phy_write(val, reg);
 
     if (reg > 31) {
         /* we only advertise one phy */
@@ -XXX,XX +XXX,XX @@ static void do_phy_write(IMXFECState *s, int reg, uint32_t val)
     switch (reg) {
     case 0:     /* Basic Control */
         if (val & 0x8000) {
-            phy_reset(s);
+            imx_phy_reset(s);
         } else {
             s->phy_control = val & 0x7980;
             /* Complete autonegotiation immediately.  */
@@ -XXX,XX +XXX,XX @@ static void do_phy_write(IMXFECState *s, int reg, uint32_t val)
         break;
     case 30:    /* Interrupt mask */
         s->phy_int_mask = val & 0xff;
-        phy_update_irq(s);
+        imx_phy_update_irq(s);
         break;
     case 17:
     case 18:
@@ -XXX,XX +XXX,XX @@ static void do_phy_write(IMXFECState *s, int reg, uint32_t val)
 static void imx_fec_read_bd(IMXFECBufDesc *bd, dma_addr_t addr)
 {
     dma_memory_read(&address_space_memory, addr, bd, sizeof(*bd));
+
+    trace_imx_fec_read_bd(addr, bd->flags, bd->length, bd->data);
 }
 
 static void imx_fec_write_bd(IMXFECBufDesc *bd, dma_addr_t addr)
@@ -XXX,XX +XXX,XX @@ static void imx_fec_write_bd(IMXFECBufDesc *bd, dma_addr_t addr)
 static void imx_enet_read_bd(IMXENETBufDesc *bd, dma_addr_t addr)
 {
     dma_memory_read(&address_space_memory, addr, bd, sizeof(*bd));
+
+    trace_imx_enet_read_bd(addr, bd->flags, bd->length, bd->data,
+                   bd->option, bd->status);
 }
 
 static void imx_enet_write_bd(IMXENETBufDesc *bd, dma_addr_t addr)
@@ -XXX,XX +XXX,XX @@ static void imx_fec_do_tx(IMXFECState *s)
         int len;
 
         imx_fec_read_bd(&bd, addr);
-        FEC_PRINTF("tx_bd %x flags %04x len %d data %08x\n",
-                   addr, bd.flags, bd.length, bd.data);
         if ((bd.flags & ENET_BD_R) == 0) {
+
             /* Run out of descriptors to transmit.  */
-            FEC_PRINTF("tx_bd ran out of descriptors to transmit\n");
+            trace_imx_eth_tx_bd_busy();
+
             break;
         }
         len = bd.length;
@@ -XXX,XX +XXX,XX @@ static void imx_enet_do_tx(IMXFECState *s, uint32_t index)
         int len;
 
         imx_enet_read_bd(&bd, addr);
-        FEC_PRINTF("tx_bd %x flags %04x len %d data %08x option %04x "
-                   "status %04x\n", addr, bd.flags, bd.length, bd.data,
-                   bd.option, bd.status);
         if ((bd.flags & ENET_BD_R) == 0) {
             /* Run out of descriptors to transmit.  */
+
+            trace_imx_eth_tx_bd_busy();
+
             break;
         }
         len = bd.length;
@@ -XXX,XX +XXX,XX @@ static void imx_eth_enable_rx(IMXFECState *s, bool flush)
     s->regs[ENET_RDAR] = (bd.flags & ENET_BD_E) ? ENET_RDAR_RDAR : 0;
 
     if (!s->regs[ENET_RDAR]) {
-        FEC_PRINTF("RX buffer full\n");
+        trace_imx_eth_rx_bd_full();
     } else if (flush) {
         qemu_flush_queued_packets(qemu_get_queue(s->nic));
     }
@@ -XXX,XX +XXX,XX @@ static void imx_eth_reset(DeviceState *d)
     memset(s->tx_descriptor, 0, sizeof(s->tx_descriptor));
 
     /* We also reset the PHY */
-    phy_reset(s);
+    imx_phy_reset(s);
 }
 
 static uint32_t imx_default_read(IMXFECState *s, uint32_t index)
@@ -XXX,XX +XXX,XX @@ static uint64_t imx_eth_read(void *opaque, hwaddr offset, unsigned size)
         break;
     }
 
-    FEC_PRINTF("reg[%s] => 0x%" PRIx32 "\n", imx_eth_reg_name(s, index),
-                                              value);
+    trace_imx_eth_read(index, imx_eth_reg_name(s, index), value);
 
     return value;
 }
@@ -XXX,XX +XXX,XX @@ static void imx_eth_write(void *opaque, hwaddr offset, uint64_t value,
     const bool single_tx_ring = !imx_eth_is_multi_tx_ring(s);
     uint32_t index = offset >> 2;
 
-    FEC_PRINTF("reg[%s] <= 0x%" PRIx32 "\n", imx_eth_reg_name(s, index),
-                (uint32_t)value);
+    trace_imx_eth_write(index, imx_eth_reg_name(s, index), value);
 
     switch (index) {
     case ENET_EIR:
@@ -XXX,XX +XXX,XX @@ static void imx_eth_write(void *opaque, hwaddr offset, uint64_t value,
         if (extract32(value, 29, 1)) {
             /* This is a read operation */
             s->regs[ENET_MMFR] = deposit32(s->regs[ENET_MMFR], 0, 16,
-                                           do_phy_read(s,
+                                           imx_phy_read(s,
                                                        extract32(value,
                                                                  18, 10)));
         } else {
             /* This a write operation */
-            do_phy_write(s, extract32(value, 18, 10), extract32(value, 0, 16));
+            imx_phy_write(s, extract32(value, 18, 10), extract32(value, 0, 16));
         }
         /* raise the interrupt as the PHY operation is done */
         s->regs[ENET_EIR] |= ENET_INT_MII;
@@ -XXX,XX +XXX,XX @@ static bool imx_eth_can_receive(NetClientState *nc)
 {
     IMXFECState *s = IMX_FEC(qemu_get_nic_opaque(nc));
 
-    FEC_PRINTF("\n");
-
     return !!s->regs[ENET_RDAR];
 }
 
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_fec_receive(NetClientState *nc, const uint8_t *buf,
     unsigned int buf_len;
     size_t size = len;
 
-    FEC_PRINTF("len %d\n", (int)size);
+    trace_imx_fec_receive(size);
 
     if (!s->regs[ENET_RDAR]) {
         qemu_log_mask(LOG_GUEST_ERROR, "[%s]%s: Unexpected packet\n",
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_fec_receive(NetClientState *nc, const uint8_t *buf,
         bd.length = buf_len;
         size -= buf_len;
 
-        FEC_PRINTF("rx_bd 0x%x length %d\n", addr, bd.length);
+        trace_imx_fec_receive_len(addr, bd.length);
 
         /* The last 4 bytes are the CRC.  */
         if (size < 4) {
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_fec_receive(NetClientState *nc, const uint8_t *buf,
         if (size == 0) {
             /* Last buffer in frame.  */
             bd.flags |= flags | ENET_BD_L;
-            FEC_PRINTF("rx frame flags %04x\n", bd.flags);
+
+            trace_imx_fec_receive_last(bd.flags);
+
             s->regs[ENET_EIR] |= ENET_INT_RXF;
         } else {
             s->regs[ENET_EIR] |= ENET_INT_RXB;
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_enet_receive(NetClientState *nc, const uint8_t *buf,
     size_t size = len;
     bool shift16 = s->regs[ENET_RACC] & ENET_RACC_SHIFT16;
 
-    FEC_PRINTF("len %d\n", (int)size);
+    trace_imx_enet_receive(size);
 
     if (!s->regs[ENET_RDAR]) {
         qemu_log_mask(LOG_GUEST_ERROR, "[%s]%s: Unexpected packet\n",
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_enet_receive(NetClientState *nc, const uint8_t *buf,
         bd.length = buf_len;
         size -= buf_len;
 
-        FEC_PRINTF("rx_bd 0x%x length %d\n", addr, bd.length);
+        trace_imx_enet_receive_len(addr, bd.length);
 
         /* The last 4 bytes are the CRC.  */
         if (size < 4) {
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_enet_receive(NetClientState *nc, const uint8_t *buf,
         if (size == 0) {
             /* Last buffer in frame.  */
             bd.flags |= flags | ENET_BD_L;
-            FEC_PRINTF("rx frame flags %04x\n", bd.flags);
+
+            trace_imx_enet_receive_last(bd.flags);
+
             /* Indicate that we've updated the last buffer descriptor. */
             bd.last_buffer = ENET_BD_BDU;
             if (bd.option & ENET_BD_RX_INT) {
diff --git a/hw/net/trace-events b/hw/net/trace-events
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/trace-events
+++ b/hw/net/trace-events
@@ -XXX,XX +XXX,XX @@ i82596_receive_packet(size_t sz) "len=%zu"
 i82596_new_mac(const char *id_with_mac) "New MAC for: %s"
 i82596_set_multicast(uint16_t count) "Added %d multicast entries"
 i82596_channel_attention(void *s) "%p: Received CHANNEL ATTENTION"
+
+# imx_fec.c
+imx_phy_read(uint32_t val, int reg) "0x%04"PRIx32" <= reg[%d]"
+imx_phy_write(uint32_t val, int reg) "0x%04"PRIx32" => reg[%d]"
+imx_phy_update_link(const char *s) "%s"
+imx_phy_reset(void) ""
+imx_fec_read_bd(uint64_t addr, int flags, int len, int data) "tx_bd 0x%"PRIx64" flags 0x%04x len %d data 0x%08x"
+imx_enet_read_bd(uint64_t addr, int flags, int len, int data, int options, int status) "tx_bd 0x%"PRIx64" flags 0x%04x len %d data 0x%08x option 0x%04x status 0x%04x"
+imx_eth_tx_bd_busy(void) "tx_bd ran out of descriptors to transmit"
+imx_eth_rx_bd_full(void) "RX buffer is full"
+imx_eth_read(int reg, const char *reg_name, uint32_t value) "reg[%d:%s] => 0x%08"PRIx32
+imx_eth_write(int reg, const char *reg_name, uint64_t value) "reg[%d:%s] <= 0x%08"PRIx64
+imx_fec_receive(size_t size) "len %zu"
+imx_fec_receive_len(uint64_t addr, int len) "rx_bd 0x%"PRIx64" length %d"
+imx_fec_receive_last(int last) "rx frame flags 0x%04x"
+imx_enet_receive(size_t size) "len %zu"
+imx_enet_receive_len(uint64_t addr, int len) "rx_bd 0x%"PRIx64" length %d"
+imx_enet_receive_last(int last) "rx frame flags 0x%04x"
-- 
2.20.1

From: Guenter Roeck <linux@roeck-us.net>

The Linux kernel's IMX code now uses vendor-specific commands.
This results in endless warnings when booting the Linux kernel:

sdhci-esdhc-imx 2194000.usdhc: esdhc_wait_for_card_clock_gate_off:
	card clock still not gate off in 100us!.

Implement support for the vendor-specific register found in IMX hardware
so that this warning is avoided.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Message-id: 20200603145258.195920-2-linux@roeck-us.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/sd/sdhci-internal.h |  5 +++++
 include/hw/sd/sdhci.h  |  5 +++++
 hw/sd/sdhci.c          | 18 +++++++++++++++++-
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/hw/sd/sdhci-internal.h b/hw/sd/sdhci-internal.h
index XXXXXXX..XXXXXXX 100644
--- a/hw/sd/sdhci-internal.h
+++ b/hw/sd/sdhci-internal.h
@@ -XXX,XX +XXX,XX @@
 #define SDHC_CMD_INHIBIT               0x00000001
 #define SDHC_DATA_INHIBIT              0x00000002
 #define SDHC_DAT_LINE_ACTIVE           0x00000004
+#define SDHC_IMX_CLOCK_GATE_OFF        0x00000080
 #define SDHC_DOING_WRITE               0x00000100
 #define SDHC_DOING_READ                0x00000200
 #define SDHC_SPACE_AVAILABLE           0x00000400
@@ -XXX,XX +XXX,XX @@ extern const VMStateDescription sdhci_vmstate;
 
 
 #define ESDHC_MIX_CTRL                  0x48
+
 #define ESDHC_VENDOR_SPEC               0xc0
+#define ESDHC_IMX_FRC_SDCLK_ON          (1 << 8)
+
 #define ESDHC_DLL_CTRL                  0x60
 
 #define ESDHC_TUNING_CTRL               0xcc
@@ -XXX,XX +XXX,XX @@ extern const VMStateDescription sdhci_vmstate;
 #define DEFINE_SDHCI_COMMON_PROPERTIES(_state) \
     DEFINE_PROP_UINT8("sd-spec-version", _state, sd_spec_version, 2), \
     DEFINE_PROP_UINT8("uhs", _state, uhs_mode, UHS_NOT_SUPPORTED), \
+    DEFINE_PROP_UINT8("vendor", _state, vendor, SDHCI_VENDOR_NONE), \
     \
     /* Capabilities registers provide information on supported
      * features of this specific host controller implementation */ \
diff --git a/include/hw/sd/sdhci.h b/include/hw/sd/sdhci.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/sd/sdhci.h
+++ b/include/hw/sd/sdhci.h
@@ -XXX,XX +XXX,XX @@ typedef struct SDHCIState {
     uint16_t acmd12errsts; /* Auto CMD12 error status register */
     uint16_t hostctl2;     /* Host Control 2 */
     uint64_t admasysaddr;  /* ADMA System Address Register */
+    uint16_t vendor_spec;  /* Vendor specific register */
 
     /* Read-only registers */
     uint64_t capareg;      /* Capabilities Register */
@@ -XXX,XX +XXX,XX @@ typedef struct SDHCIState {
     uint32_t quirks;
     uint8_t sd_spec_version;
     uint8_t uhs_mode;
+    uint8_t vendor;        /* For vendor specific functionality */
 } SDHCIState;
 
+#define SDHCI_VENDOR_NONE       0
+#define SDHCI_VENDOR_IMX        1
+
 /*
  * Controller does not provide transfer-complete interrupt when not
  * busy.
diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/sd/sdhci.c
+++ b/hw/sd/sdhci.c
@@ -XXX,XX +XXX,XX @@ static uint64_t usdhc_read(void *opaque, hwaddr offset, unsigned size)
         }
         break;
 
+    case ESDHC_VENDOR_SPEC:
+        ret = s->vendor_spec;
+        break;
     case ESDHC_DLL_CTRL:
     case ESDHC_TUNE_CTRL_STATUS:
     case ESDHC_UNDOCUMENTED_REG27:
     case ESDHC_TUNING_CTRL:
-    case ESDHC_VENDOR_SPEC:
     case ESDHC_MIX_CTRL:
     case ESDHC_WTMK_LVL:
         ret = 0;
@@ -XXX,XX +XXX,XX @@ usdhc_write(void *opaque, hwaddr offset, uint64_t val, unsigned size)
     case ESDHC_UNDOCUMENTED_REG27:
     case ESDHC_TUNING_CTRL:
     case ESDHC_WTMK_LVL:
+        break;
+
     case ESDHC_VENDOR_SPEC:
+        s->vendor_spec = value;
+        switch (s->vendor) {
+        case SDHCI_VENDOR_IMX:
+            if (value & ESDHC_IMX_FRC_SDCLK_ON) {
+                s->prnsts &= ~SDHC_IMX_CLOCK_GATE_OFF;
+            } else {
+                s->prnsts |= SDHC_IMX_CLOCK_GATE_OFF;
+            }
+            break;
+        default:
+            break;
+        }
         break;
 
     case SDHC_HOSTCTL:
-- 
2.20.1

From: Guenter Roeck <linux@roeck-us.net>

Set the vendor property to IMX to enable IMX-specific functionality
in the SDHCI code.

Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20200603145258.195920-3-linux@roeck-us.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/fsl-imx25.c  | 6 ++++++
 hw/arm/fsl-imx6.c   | 6 ++++++
 hw/arm/fsl-imx6ul.c | 2 ++
 hw/arm/fsl-imx7.c   | 2 ++
 4 files changed, 16 insertions(+)

diff --git a/hw/arm/fsl-imx25.c b/hw/arm/fsl-imx25.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/fsl-imx25.c
+++ b/hw/arm/fsl-imx25.c
@@ -XXX,XX +XXX,XX @@ static void fsl_imx25_realize(DeviceState *dev, Error **errp)
                                  &err);
         object_property_set_uint(OBJECT(&s->esdhc[i]), IMX25_ESDHC_CAPABILITIES,
                                  "capareg", &err);
+        object_property_set_uint(OBJECT(&s->esdhc[i]), SDHCI_VENDOR_IMX,
+                                 "vendor", &err);
+        if (err) {
+            error_propagate(errp, err);
+            return;
+        }
         object_property_set_bool(OBJECT(&s->esdhc[i]), true, "realized", &err);
         if (err) {
             error_propagate(errp, err);
diff --git a/hw/arm/fsl-imx6.c b/hw/arm/fsl-imx6.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/fsl-imx6.c
+++ b/hw/arm/fsl-imx6.c
@@ -XXX,XX +XXX,XX @@ static void fsl_imx6_realize(DeviceState *dev, Error **errp)
                                  &err);
         object_property_set_uint(OBJECT(&s->esdhc[i]), IMX6_ESDHC_CAPABILITIES,
                                  "capareg", &err);
+        object_property_set_uint(OBJECT(&s->esdhc[i]), SDHCI_VENDOR_IMX,
+                                 "vendor", &err);
+        if (err) {
+            error_propagate(errp, err);
+            return;
+        }
         object_property_set_bool(OBJECT(&s->esdhc[i]), true, "realized", &err);
         if (err) {
             error_propagate(errp, err);
diff --git a/hw/arm/fsl-imx6ul.c b/hw/arm/fsl-imx6ul.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/fsl-imx6ul.c
+++ b/hw/arm/fsl-imx6ul.c
@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_realize(DeviceState *dev, Error **errp)
             FSL_IMX6UL_USDHC2_IRQ,
         };
 
+        object_property_set_uint(OBJECT(&s->usdhc[i]), SDHCI_VENDOR_IMX,
+                                        "vendor", &error_abort);
         object_property_set_bool(OBJECT(&s->usdhc[i]), true, "realized",
                                  &error_abort);
 
diff --git a/hw/arm/fsl-imx7.c b/hw/arm/fsl-imx7.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/fsl-imx7.c
+++ b/hw/arm/fsl-imx7.c
@@ -XXX,XX +XXX,XX @@ static void fsl_imx7_realize(DeviceState *dev, Error **errp)
             FSL_IMX7_USDHC3_IRQ,
         };
 
+        object_property_set_uint(OBJECT(&s->usdhc[i]), SDHCI_VENDOR_IMX,
+                                 "vendor", &error_abort);
         object_property_set_bool(OBJECT(&s->usdhc[i]), true, "realized",
                                  &error_abort);
 
-- 
2.20.1

Small pile of bug fixes for rc1. I've included my patches to get
our docs building with Sphinx 3, just for convenience...

-- PMM

The following changes since commit b149dea55cce97cb226683d06af61984a1c11e96:

  Merge remote-tracking branch 'remotes/cschoenebeck/tags/pull-9p-20201102' into staging (2020-11-02 10:57:48 +0000)

are available in the Git repository at:

  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20201102

for you to fetch changes up to ffb4fbf90a2f63c9cb33e4bb9f854c79bf04ca4a:

  tests/qtest/npcm7xx_rng-test: Disable randomness tests (2020-11-02 16:52:18 +0000)

----------------------------------------------------------------
target-arm queue:
 * target/arm: Fix Neon emulation bugs on big-endian hosts
 * target/arm: fix handling of HCR.FB
 * target/arm: fix LORID_EL1 access check
 * disas/capstone: Fix monitor disassembly of >32 bytes
 * hw/arm/smmuv3: Fix potential integer overflow (CID 1432363)
 * hw/arm/boot: fix SVE for EL3 direct kernel boot
 * hw/display/omap_lcdc: Fix potential NULL pointer dereference
 * hw/display/exynos4210_fimd: Fix potential NULL pointer dereference
 * target/arm: Get correct MMU index for other-security-state
 * configure: Test that gio libs from pkg-config work
 * hw/intc/arm_gicv3_cpuif: Make GIC maintenance interrupts work
 * docs: Fix building with Sphinx 3
 * tests/qtest/npcm7xx_rng-test: Disable randomness tests

----------------------------------------------------------------
AlexChen (2):
      hw/display/omap_lcdc: Fix potential NULL pointer dereference
      hw/display/exynos4210_fimd: Fix potential NULL pointer dereference

Peter Maydell (9):
      target/arm: Fix float16 pairwise Neon ops on big-endian hosts
      target/arm: Fix VUDOT/VSDOT (scalar) on big-endian hosts
      disas/capstone: Fix monitor disassembly of >32 bytes
      target/arm: Get correct MMU index for other-security-state
      configure: Test that gio libs from pkg-config work
      hw/intc/arm_gicv3_cpuif: Make GIC maintenance interrupts work
      scripts/kerneldoc: For Sphinx 3 use c:macro for macros with arguments
      qemu-option-trace.rst.inc: Don't use option:: markup
      tests/qtest/npcm7xx_rng-test: Disable randomness tests

Philippe Mathieu-Daudé (1):
      hw/arm/smmuv3: Fix potential integer overflow (CID 1432363)

Richard Henderson (11):
      target/arm: Introduce neon_full_reg_offset
      target/arm: Move neon_element_offset to translate.c
      target/arm: Use neon_element_offset in neon_load/store_reg
      target/arm: Use neon_element_offset in vfp_reg_offset
      target/arm: Add read/write_neon_element32
      target/arm: Expand read/write_neon_element32 to all MemOp
      target/arm: Rename neon_load_reg32 to vfp_load_reg32
      target/arm: Add read/write_neon_element64
      target/arm: Rename neon_load_reg64 to vfp_load_reg64
      target/arm: Simplify do_long_3d and do_2scalar_long
      target/arm: Improve do_prewiden_3d

Rémi Denis-Courmont (3):
      target/arm: fix handling of HCR.FB
      target/arm: fix LORID_EL1 access check
      hw/arm/boot: fix SVE for EL3 direct kernel boot