Series comparison

-[PULL 00/23] target-arm queue
+[PULL 00/24] target-arm queue
-Mostly my decodetree stuff, but also some patches for various
+Hi; here's the latest round of arm patches. I have also included
-smaller bugs/features from others.
+my patchset for the RTC devices to avoid keeping time_t and
 time_t diffs in 32-bit variables.
 thanks
 -- PMM
-The following changes since commit 53550e81e2cafe7c03a39526b95cd21b5194d9b1:
+The following changes since commit 156618d9ea67f2f2e31d9dedd97f2dcccbe6808c:
-  Merge remote-tracking branch 'remotes/berrange/tags/qcrypto-next-pull-request' into staging (2020-06-15 16:36:34 +0100)
+  Merge tag 'block-pull-request' of https://gitlab.com/stefanha/qemu into staging (2023-08-30 09:20:27 -0400)
 are available in the Git repository at:
-  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20200616
+  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20230831
-for you to fetch changes up to 64b397417a26509bcdff44ab94356a35c7901c79:
+for you to fetch changes up to e73b8bb8a3e9a162f70e9ffbf922d4fafc96bbfb:
-  hw: arm: Set vendor property for IMX SDHCI emulations (2020-06-16 10:32:29 +0100)
+  hw/arm: Set number of MPU regions correctly for an505, an521, an524 (2023-08-31 11:07:02 +0100)
 ----------------------------------------------------------------
- * hw: arm: Set vendor property for IMX SDHCI emulations
+target-arm queue:
- * sd: sdhci: Implement basic vendor specific register support
+ * Some of the preliminary patches for Cortex-A710 support
- * hw/net/imx_fec: Convert debug fprintf() to trace events
+ * i.MX7 and i.MX6UL refactoring
- * target/arm/cpu: adjust virtual time for all KVM arm cpus
+ * Implement SRC device for i.MX7
- * Implement configurable descriptor size in ftgmac100
+ * Catch illegal-exception-return from EL3 with bad NSE/NS
- * hw/misc/imx6ul_ccm: Implement non writable bits in CCM registers
+ * Use 64-bit offsets for holding time_t differences in RTC devices
- * target/arm: More Neon decodetree conversion work
+ * Model correct number of MPU regions for an505, an521, an524 boards
 ----------------------------------------------------------------
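As a reader's note on the RTC item above: the sketch below illustrates why a difference between two time_t values should not be held in a 32-bit variable. It is not taken from the QEMU sources; the helper names are hypothetical and the values are chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; these are not QEMU functions. */

/* Offset in seconds between a guest RTC value and a host clock value,
 * computed and returned in 64 bits so that it cannot be truncated. */
static int64_t rtc_offset(int64_t guest_seconds, int64_t host_seconds)
{
    assert(guest_seconds >= 0 && host_seconds >= 0);
    return guest_seconds - host_seconds;
}

/* Returns 1 if the offset would survive being stored in an int32_t,
 * 0 if a 32-bit variable would silently lose information. */
static int offset_fits_in_32_bits(int64_t offset_seconds)
{
    return offset_seconds >= INT32_MIN && offset_seconds <= INT32_MAX;
}
```

A guest clock set to 2100-01-01 00:00:00 UTC (4102444800 seconds after the epoch) against a host value of 0 gives an offset that no longer fits in 32 bits, which is the kind of case the 64-bit conversions are meant to cover.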
-Erik Smit (1):
+Alex Bennée (1):
-      Implement configurable descriptor size in ftgmac100
+      target/arm: properly document FEAT_CRC32
-Guenter Roeck (2):
+Jean-Christophe Dubois (6):
-      sd: sdhci: Implement basic vendor specific register support
+      Remove i.MX7 IOMUX GPR device from i.MX6UL
-      hw: arm: Set vendor property for IMX SDHCI emulations
+      Refactor i.MX6UL processor code
       Add i.MX6UL missing devices.
       Refactor i.MX7 processor code
       Add i.MX7 missing TZ devices and memory regions
       Add i.MX7 SRC device implementation
-Jean-Christophe Dubois (2):
+Peter Maydell (8):
-      hw/misc/imx6ul_ccm: Implement non writable bits in CCM registers
+      target/arm: Catch illegal-exception-return from EL3 with bad NSE/NS
-      hw/net/imx_fec: Convert debug fprintf() to trace events
+      hw/rtc/m48t59: Use 64-bit arithmetic in set_alarm()
       hw/rtc/twl92230: Use int64_t for sec_offset and alm_sec
       hw/rtc/aspeed_rtc: Use 64-bit offset for holding time_t difference
       rtc: Use time_t for passing and returning time offsets
       target/arm: Do all "ARM_FEATURE_X implies Y" checks in post_init
       hw/arm/armv7m: Add mpu-ns-regions and mpu-s-regions properties
       hw/arm: Set number of MPU regions correctly for an505, an521, an524
-Peter Maydell (17):
+Richard Henderson (9):
-      target/arm: Fix missing temp frees in do_vshll_2sh
+      target/arm: Reduce dcz_blocksize to uint8_t
-      target/arm: Convert Neon 3-reg-diff prewidening ops to decodetree
+      target/arm: Allow cpu to configure GM blocksize
-      target/arm: Convert Neon 3-reg-diff narrowing ops to decodetree
+      target/arm: Support more GM blocksizes
-      target/arm: Convert Neon 3-reg-diff VABAL, VABDL to decodetree
+      target/arm: When tag memory is not present, set MTE=1
-      target/arm: Convert Neon 3-reg-diff long multiplies
+      target/arm: Introduce make_ccsidr64
-      target/arm: Convert Neon 3-reg-diff saturating doubling multiplies
+      target/arm: Apply access checks to neoverse-n1 special registers
-      target/arm: Convert Neon 3-reg-diff polynomial VMULL
+      target/arm: Apply access checks to neoverse-v1 special registers
-      target/arm: Add 'static' and 'const' annotations to VSHLL function arrays
+      target/arm: Suppress FEAT_TRBE (Trace Buffer Extension)
-      target/arm: Add missing TCG temp free in do_2shift_env_64()
+      target/arm: Implement FEAT_HPDS2 as a no-op
       target/arm: Convert Neon 2-reg-scalar integer multiplies to decodetree
       target/arm: Convert Neon 2-reg-scalar float multiplies to decodetree
       target/arm: Convert Neon 2-reg-scalar VQDMULH, VQRDMULH to decodetree
       target/arm: Convert Neon 2-reg-scalar VQRDMLAH, VQRDMLSH to decodetree
       target/arm: Convert Neon 2-reg-scalar long multiplies to decodetree
       target/arm: Convert Neon VEXT to decodetree
       target/arm: Convert Neon VTBL, VTBX to decodetree
       target/arm: Convert Neon VDUP (scalar) to decodetree
-fangying (1):
+ docs/system/arm/emulation.rst  |   2 +
-      target/arm/cpu: adjust virtual time for all KVM arm cpus
+ include/hw/arm/armsse.h        |   5 +
  include/hw/arm/armv7m.h        |   8 +
  include/hw/arm/fsl-imx6ul.h    | 158 ++++++++++++++++---
  include/hw/arm/fsl-imx7.h      | 338 ++++++++++++++++++++++++++++++-----------
  include/hw/misc/imx7_src.h     |  66 ++++++++
  include/hw/rtc/aspeed_rtc.h    |   2 +-
  include/sysemu/rtc.h           |   4 +-
  target/arm/cpregs.h            |   2 +
  target/arm/cpu.h               |   5 +-
  target/arm/internals.h         |   6 -
  target/arm/tcg/translate.h     |   2 +
  hw/arm/armsse.c                |  16 ++
  hw/arm/armv7m.c                |  21 +++
  hw/arm/fsl-imx6ul.c            | 174 +++++++++++++--------
  hw/arm/fsl-imx7.c              | 201 +++++++++++++++++++-----
  hw/arm/mps2-tz.c               |  29 ++++
  hw/misc/imx7_src.c             | 276 +++++++++++++++++++++++++++++++++
  hw/rtc/aspeed_rtc.c            |   5 +-
  hw/rtc/m48t59.c                |   2 +-
  hw/rtc/twl92230.c              |   4 +-
  softmmu/rtc.c                  |   4 +-
  target/arm/cpu.c               | 207 ++++++++++++++-----------
  target/arm/helper.c            |  15 +-
  target/arm/tcg/cpu32.c         |   2 +-
  target/arm/tcg/cpu64.c         | 102 +++++++++----
  target/arm/tcg/helper-a64.c    |   9 ++
  target/arm/tcg/mte_helper.c    |  90 ++++++++---
  target/arm/tcg/translate-a64.c |   5 +-
  hw/misc/meson.build            |   1 +
  hw/misc/trace-events           |   4 +
 files changed, 1393 insertions(+), 372 deletions(-)
  create mode 100644 include/hw/misc/imx7_src.h
  create mode 100644 hw/misc/imx7_src.c
- hw/sd/sdhci-internal.h          |    5 +
- include/hw/sd/sdhci.h           |    5 +
- target/arm/translate.h          |    1 +
- target/arm/neon-dp.decode       |  130 +++++
- hw/arm/fsl-imx25.c              |    6 +
- hw/arm/fsl-imx6.c               |    6 +
- hw/arm/fsl-imx6ul.c             |    2 +
- hw/arm/fsl-imx7.c               |    2 +
- hw/misc/imx6ul_ccm.c            |   76 ++-
- hw/net/ftgmac100.c              |   26 +-
- hw/net/imx_fec.c                |  106 ++--
- hw/sd/sdhci.c                   |   18 +-
- target/arm/cpu.c                |    6 +-
- target/arm/cpu64.c              |    1 -
- target/arm/kvm.c                |   21 +-
- target/arm/translate-neon.inc.c | 1148 ++++++++++++++++++++++++++++++++++++++-
- target/arm/translate.c          |  684 +----------------------
- hw/net/trace-events             |   18 +
-files changed, 1495 insertions(+), 766 deletions(-)

-[PULL 16/23] target/arm: Convert Neon VTBL, VTBX to decodetree
+[PULL 01/24] target/arm: Reduce dcz_blocksize to uint8_t
-Convert the Neon VTBL, VTBX instructions to decodetree.  The actual
+From: Richard Henderson <richard.henderson@linaro.org>
 implementation of the insn is copied across to the new trans function
 unchanged except for renaming 'tmp5' to 'tmp4'.
+This value is only 4 bits wide.
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
+Message-id: 20230811214031.171020-2-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
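Reader's note, not part of the patch: a minimal sketch of why uint8_t is wide enough for this field. It assumes, following the comment in cpu.h, that the field holds log2 of the block size in 4-byte words and occupies only 4 bits. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of QEMU: convert a log2(words) block
 * size field into a size in bytes, taking one word as 4 bytes. */
static uint32_t blocksize_bytes(uint8_t log2_words)
{
    /* The field is 4 bits wide, so 15 is the largest encodable value. */
    assert(log2_words <= 0xf);
    return UINT32_C(4) << log2_words;
}
```

With the value 7 that -cpu max uses for dcz_blocksize, this gives 512 bytes. The largest encodable value, 15, still fits comfortably in a uint8_t even though the resulting byte count does not.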
- target/arm/neon-dp.decode       |  3 ++
+ target/arm/cpu.h | 3 ++-
- target/arm/translate-neon.inc.c | 56 +++++++++++++++++++++++++++++++++
+file changed, 2 insertions(+), 1 deletion(-)
  target/arm/translate.c          | 41 +++---------------------
 files changed, 63 insertions(+), 37 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
+--- a/target/arm/cpu.h
-+++ b/target/arm/neon-dp.decode
++++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+@@ -XXX,XX +XXX,XX @@ struct ArchCPU {
-     ##################################################################
+     bool prop_lpa2;
-     VEXT         1111 001 0 1 . 11 .... .... imm:4 . q:1 . 0 .... \
-                  vm=%vm_dp vn=%vn_dp vd=%vd_dp
+     /* DCZ blocksize, in log_2(words), ie low 4 bits of DCZID_EL0 */
 -    uint32_t dcz_blocksize;
 +    uint8_t dcz_blocksize;
 +
-+    VTBL         1111 001 1 1 . 11 .... .... 10 len:2 . op:1 . 0 .... \
+     uint64_t rvbar_prop; /* Property/input signals.  */
-+                 vm=%vm_dp vn=%vn_dp vd=%vd_dp
-   ]
+     /* Configurable aspects of GIC cpu interface (which is part of the CPU) */
    # Subgroup for size != 0b11
 diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.inc.c
 +++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
      }
      return true;
  }
 +
 +static bool trans_VTBL(DisasContext *s, arg_VTBL *a)
 +{
 +    int n;
 +    TCGv_i32 tmp, tmp2, tmp3, tmp4;
 +    TCGv_ptr ptr1;
 +
 +    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
 +        return false;
 +    }
 +
 +    /* UNDEF accesses to D16-D31 if they don't exist. */
 +    if (!dc_isar_feature(aa32_simd_r32, s) &&
 +        ((a->vd | a->vn | a->vm) & 0x10)) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    n = a->len + 1;
 +    if ((a->vn + n) > 32) {
 +        /*
 +         * This is UNPREDICTABLE; we choose to UNDEF to avoid the
 +         * helper function running off the end of the register file.
 +         */
 +        return false;
 +    }
 +    n <<= 3;
 +    if (a->op) {
 +        tmp = neon_load_reg(a->vd, 0);
 +    } else {
 +        tmp = tcg_temp_new_i32();
 +        tcg_gen_movi_i32(tmp, 0);
 +    }
 +    tmp2 = neon_load_reg(a->vm, 0);
 +    ptr1 = vfp_reg_ptr(true, a->vn);
 +    tmp4 = tcg_const_i32(n);
 +    gen_helper_neon_tbl(tmp2, tmp2, tmp, ptr1, tmp4);
 +    tcg_temp_free_i32(tmp);
 +    if (a->op) {
 +        tmp = neon_load_reg(a->vd, 1);
 +    } else {
 +        tmp = tcg_temp_new_i32();
 +        tcg_gen_movi_i32(tmp, 0);
 +    }
 +    tmp3 = neon_load_reg(a->vm, 1);
 +    gen_helper_neon_tbl(tmp3, tmp3, tmp, ptr1, tmp4);
 +    tcg_temp_free_i32(tmp4);
 +    tcg_temp_free_ptr(ptr1);
 +    neon_store_reg(a->vd, 0, tmp2);
 +    neon_store_reg(a->vd, 1, tmp3);
 +    tcg_temp_free_i32(tmp);
 +    return true;
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
  {
      int op;
      int q;
 -    int rd, rn, rm, rd_ofs, rm_ofs;
 +    int rd, rm, rd_ofs, rm_ofs;
      int size;
      int pass;
      int u;
      int vec_size;
 -    TCGv_i32 tmp, tmp2, tmp3, tmp5;
 -    TCGv_ptr ptr1;
 +    TCGv_i32 tmp, tmp2, tmp3;
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
          return 1;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
      q = (insn & (1 << 6)) != 0;
      u = (insn >> 24) & 1;
      VFP_DREG_D(rd, insn);
 -    VFP_DREG_N(rn, insn);
      VFP_DREG_M(rm, insn);
      size = (insn >> 20) & 3;
      vec_size = q ? 16 : 8;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                      break;
                  }
              } else if ((insn & (1 << 10)) == 0) {
 -                /* VTBL, VTBX.  */
 -                int n = ((insn >> 8) & 3) + 1;
 -                if ((rn + n) > 32) {
 -                    /* This is UNPREDICTABLE; we choose to UNDEF to avoid the
 -                     * helper function running off the end of the register file.
 -                     */
 -                    return 1;
 -                }
 -                n <<= 3;
 -                if (insn & (1 << 6)) {
 -                    tmp = neon_load_reg(rd, 0);
 -                } else {
 -                    tmp = tcg_temp_new_i32();
 -                    tcg_gen_movi_i32(tmp, 0);
 -                }
 -                tmp2 = neon_load_reg(rm, 0);
 -                ptr1 = vfp_reg_ptr(true, rn);
 -                tmp5 = tcg_const_i32(n);
 -                gen_helper_neon_tbl(tmp2, tmp2, tmp, ptr1, tmp5);
 -                tcg_temp_free_i32(tmp);
 -                if (insn & (1 << 6)) {
 -                    tmp = neon_load_reg(rd, 1);
 -                } else {
 -                    tmp = tcg_temp_new_i32();
 -                    tcg_gen_movi_i32(tmp, 0);
 -                }
 -                tmp3 = neon_load_reg(rm, 1);
 -                gen_helper_neon_tbl(tmp3, tmp3, tmp, ptr1, tmp5);
 -                tcg_temp_free_i32(tmp5);
 -                tcg_temp_free_ptr(ptr1);
 -                neon_store_reg(rd, 0, tmp2);
 -                neon_store_reg(rd, 1, tmp3);
 -                tcg_temp_free_i32(tmp);
 +                /* VTBL, VTBX: handled by decodetree */
 +                return 1;
              } else if ((insn & 0x380) == 0) {
                  /* VDUP */
                  int element;
 --
-.20.1
+.34.1

-[PULL 14/23] target/arm: Convert Neon 2-reg-scalar long multiplies to decodetree
+[PULL 02/24] target/arm: Allow cpu to configure GM blocksize
-Convert the Neon 2-reg-scalar long multiplies to decodetree.
+From: Richard Henderson <richard.henderson@linaro.org>
-These are the last instructions in the group.
+Previously we hard-coded the blocksize with GMID_EL1_BS.
 But the value we choose for -cpu max does not match the
 value that cortex-a710 uses.
 Mirror the way we handle dcz_blocksize.
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20230811214031.171020-3-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
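Reader's note, not part of the patch: the arithmetic that the new ldgm/stgm code relies on can be sketched as below. The 16-byte tag granule and 4-bit tag width are assumptions stated here for illustration (they match Arm MTE, but the constants are not visible in this excerpt), and the function names are hypothetical.

```c
#include <assert.h>

/* Assumed constants: 16-byte tag granule and 4-bit allocation tags. */
enum { TAG_GRANULE_BYTES = 16, TAG_BITS = 4 };

/* Bytes of data memory covered by one LDGM/STGM, where gm_bs is the
 * GM blocksize expressed as log2 of the number of 4-byte words. */
static int gm_block_bytes(int gm_bs)
{
    assert(gm_bs >= 0 && gm_bs <= 15);   /* 4-bit field */
    return 4 << gm_bs;
}

/* Number of tags in the block: one tag per granule. */
static int gm_block_tags(int gm_bs)
{
    return gm_block_bytes(gm_bs) / TAG_GRANULE_BYTES;
}

/* Number of result bits needed to return all tags in the block. */
static int gm_result_bits(int gm_bs)
{
    return gm_block_tags(gm_bs) * TAG_BITS;
}

/* Bytes of tag memory touched: two 4-bit tags are packed per byte. */
static int gm_tag_mem_bytes(int gm_bs)
{
    return gm_block_bytes(gm_bs) / (2 * TAG_GRANULE_BYTES);
}
```

For gm_bs = 6, the value set for -cpu max, this reproduces the "256 bytes -> 16 tags -> 64 result bits" case handled by the switch in ldgm, with 8 bytes of tag memory read by a single 64-bit little-endian load.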
- target/arm/neon-dp.decode       |  18 ++++
+ target/arm/cpu.h               |  2 ++
- target/arm/translate-neon.inc.c | 163 ++++++++++++++++++++++++++++
+ target/arm/internals.h         |  6 -----
- target/arm/translate.c          | 182 ++------------------------------
+ target/arm/tcg/translate.h     |  2 ++
-files changed, 187 insertions(+), 176 deletions(-)
+ target/arm/helper.c            | 11 +++++---
+ target/arm/tcg/cpu64.c         |  1 +
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+ target/arm/tcg/mte_helper.c    | 46 ++++++++++++++++++++++------------
-index XXXXXXX..XXXXXXX 100644
+ target/arm/tcg/translate-a64.c |  5 ++--
---- a/target/arm/neon-dp.decode
+files changed, 45 insertions(+), 28 deletions(-)
-+++ b/target/arm/neon-dp.decode
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+index XXXXXXX..XXXXXXX 100644
-     @2scalar     .... ... q:1 . . size:2 .... .... .... . . . . .... \
+--- a/target/arm/cpu.h
-                  &2scalar vm=%vm_dp vn=%vn_dp vd=%vd_dp
++++ b/target/arm/cpu.h
-+    # For the 'long' ops the Q bit is part of insn decode
+@@ -XXX,XX +XXX,XX @@ struct ArchCPU {
-+    @2scalar_q0  .... ... . . . size:2 .... .... .... . . . . .... \
-+                 &2scalar vm=%vm_dp vn=%vn_dp vd=%vd_dp q=0
+     /* DCZ blocksize, in log_2(words), ie low 4 bits of DCZID_EL0 */
+     uint8_t dcz_blocksize;
-     VMLA_2sc     1111 001 . 1 . .. .... .... 0000 . 1 . 0 .... @2scalar
++    /* GM blocksize, in log_2(words), ie low 4 bits of GMID_EL1 */
-     VMLA_F_2sc   1111 001 . 1 . .. .... .... 0001 . 1 . 0 .... @2scalar
++    uint8_t gm_blocksize;
-+    VMLAL_S_2sc  1111 001 0 1 . .. .... .... 0010 . 1 . 0 .... @2scalar_q0
+     uint64_t rvbar_prop; /* Property/input signals.  */
-+    VMLAL_U_2sc  1111 001 1 1 . .. .... .... 0010 . 1 . 0 .... @2scalar_q0
-+
+diff --git a/target/arm/internals.h b/target/arm/internals.h
-+    VQDMLAL_2sc  1111 001 0 1 . .. .... .... 0011 . 1 . 0 .... @2scalar_q0
+index XXXXXXX..XXXXXXX 100644
-+
+--- a/target/arm/internals.h
-     VMLS_2sc     1111 001 . 1 . .. .... .... 0100 . 1 . 0 .... @2scalar
++++ b/target/arm/internals.h
-     VMLS_F_2sc   1111 001 . 1 . .. .... .... 0101 . 1 . 0 .... @2scalar
+@@ -XXX,XX +XXX,XX @@ void arm_log_exception(CPUState *cs);
-+    VMLSL_S_2sc  1111 001 0 1 . .. .... .... 0110 . 1 . 0 .... @2scalar_q0
+ #endif /* !CONFIG_USER_ONLY */
-+    VMLSL_U_2sc  1111 001 1 1 . .. .... .... 0110 . 1 . 0 .... @2scalar_q0
-+
+-/*
-+    VQDMLSL_2sc  1111 001 0 1 . .. .... .... 0111 . 1 . 0 .... @2scalar_q0
+- * The log2 of the words in the tag block, for GMID_EL1.BS.
-+
+- * The is the maximum, 256 bytes, which manipulates 64-bits of tags.
-     VMUL_2sc     1111 001 . 1 . .. .... .... 1000 . 1 . 0 .... @2scalar
+- */
-     VMUL_F_2sc   1111 001 . 1 . .. .... .... 1001 . 1 . 0 .... @2scalar
+-#define GMID_EL1_BS  6
 +    VMULL_S_2sc  1111 001 0 1 . .. .... .... 1010 . 1 . 0 .... @2scalar_q0
 +    VMULL_U_2sc  1111 001 1 1 . .. .... .... 1010 . 1 . 0 .... @2scalar_q0
 +
 +    VQDMULL_2sc  1111 001 0 1 . .. .... .... 1011 . 1 . 0 .... @2scalar_q0
 +
      VQDMULH_2sc  1111 001 . 1 . .. .... .... 1100 . 1 . 0 .... @2scalar
      VQRDMULH_2sc 1111 001 . 1 . .. .... .... 1101 . 1 . 0 .... @2scalar
 diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.inc.c
 +++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VQRDMLSH_2sc(DisasContext *s, arg_2scalar *a)
      };
      return do_vqrdmlah_2sc(s, a, opfn[a->size]);
  }
 +
 +static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
 +                            NeonGenTwoOpWidenFn *opfn,
 +                            NeonGenTwo64OpFn *accfn)
 +{
 +    /*
 +     * Two registers and a scalar, long operations: perform an
 +     * operation on the input elements and the scalar which produces
 +     * a double-width result, and then possibly perform an accumulation
 +     * operation of that result into the destination.
 +     */
 +    TCGv_i32 scalar, rn;
 +    TCGv_i64 rn0_64, rn1_64;
 +
 +    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
 +        return false;
 +    }
 +
 +    /* UNDEF accesses to D16-D31 if they don't exist. */
 +    if (!dc_isar_feature(aa32_simd_r32, s) &&
 +        ((a->vd | a->vn | a->vm) & 0x10)) {
 +        return false;
 +    }
 +
 +    if (!opfn) {
 +        /* Bad size (including size == 3, which is a different insn group) */
 +        return false;
 +    }
 +
 +    if (a->vd & 1) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    scalar = neon_get_scalar(a->size, a->vm);
 +
 +    /* Load all inputs before writing any outputs, in case of overlap */
 +    rn = neon_load_reg(a->vn, 0);
 +    rn0_64 = tcg_temp_new_i64();
 +    opfn(rn0_64, rn, scalar);
 +    tcg_temp_free_i32(rn);
 +
 +    rn = neon_load_reg(a->vn, 1);
 +    rn1_64 = tcg_temp_new_i64();
 +    opfn(rn1_64, rn, scalar);
 +    tcg_temp_free_i32(rn);
 +    tcg_temp_free_i32(scalar);
 +
 +    if (accfn) {
 +        TCGv_i64 t64 = tcg_temp_new_i64();
 +        neon_load_reg64(t64, a->vd);
 +        accfn(t64, t64, rn0_64);
 +        neon_store_reg64(t64, a->vd);
 +        neon_load_reg64(t64, a->vd + 1);
 +        accfn(t64, t64, rn1_64);
 +        neon_store_reg64(t64, a->vd + 1);
 +        tcg_temp_free_i64(t64);
 +    } else {
 +        neon_store_reg64(rn0_64, a->vd);
 +        neon_store_reg64(rn1_64, a->vd + 1);
 +    }
 +    tcg_temp_free_i64(rn0_64);
 +    tcg_temp_free_i64(rn1_64);
 +    return true;
 +}
 +
 +static bool trans_VMULL_S_2sc(DisasContext *s, arg_2scalar *a)
 +{
 +    static NeonGenTwoOpWidenFn * const opfn[] = {
 +        NULL,
 +        gen_helper_neon_mull_s16,
 +        gen_mull_s32,
 +        NULL,
 +    };
 +
 +    return do_2scalar_long(s, a, opfn[a->size], NULL);
 +}
 +
 +static bool trans_VMULL_U_2sc(DisasContext *s, arg_2scalar *a)
 +{
 +    static NeonGenTwoOpWidenFn * const opfn[] = {
 +        NULL,
 +        gen_helper_neon_mull_u16,
 +        gen_mull_u32,
 +        NULL,
 +    };
 +
 +    return do_2scalar_long(s, a, opfn[a->size], NULL);
 +}
 +
 +#define DO_VMLAL_2SC(INSN, MULL, ACC)                                   \
 +    static bool trans_##INSN##_2sc(DisasContext *s, arg_2scalar *a)     \
 +    {                                                                   \
 +        static NeonGenTwoOpWidenFn * const opfn[] = {                   \
 +            NULL,                                                       \
 +            gen_helper_neon_##MULL##16,                                 \
 +            gen_##MULL##32,                                             \
 +            NULL,                                                       \
 +        };                                                              \
 +        static NeonGenTwo64OpFn * const accfn[] = {                     \
 +            NULL,                                                       \
 +            gen_helper_neon_##ACC##l_u32,                               \
 +            tcg_gen_##ACC##_i64,                                        \
 +            NULL,                                                       \
 +        };                                                              \
 +        return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]);    \
 +    }
 +
 +DO_VMLAL_2SC(VMLAL_S, mull_s, add)
 +DO_VMLAL_2SC(VMLAL_U, mull_u, add)
 +DO_VMLAL_2SC(VMLSL_S, mull_s, sub)
 +DO_VMLAL_2SC(VMLSL_U, mull_u, sub)
 +
 +static bool trans_VQDMULL_2sc(DisasContext *s, arg_2scalar *a)
 +{
 +    static NeonGenTwoOpWidenFn * const opfn[] = {
 +        NULL,
 +        gen_VQDMULL_16,
 +        gen_VQDMULL_32,
 +        NULL,
 +    };
 +
 +    return do_2scalar_long(s, a, opfn[a->size], NULL);
 +}
 +
 +static bool trans_VQDMLAL_2sc(DisasContext *s, arg_2scalar *a)
 +{
 +    static NeonGenTwoOpWidenFn * const opfn[] = {
 +        NULL,
 +        gen_VQDMULL_16,
 +        gen_VQDMULL_32,
 +        NULL,
 +    };
 +    static NeonGenTwo64OpFn * const accfn[] = {
 +        NULL,
 +        gen_VQDMLAL_acc_16,
 +        gen_VQDMLAL_acc_32,
 +        NULL,
 +    };
 +
 +    return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]);
 +}
 +
 +static bool trans_VQDMLSL_2sc(DisasContext *s, arg_2scalar *a)
 +{
 +    static NeonGenTwoOpWidenFn * const opfn[] = {
 +        NULL,
 +        gen_VQDMULL_16,
 +        gen_VQDMULL_32,
 +        NULL,
 +    };
 +    static NeonGenTwo64OpFn * const accfn[] = {
 +        NULL,
 +        gen_VQDMLSL_acc_16,
 +        gen_VQDMLSL_acc_32,
 +        NULL,
 +    };
 +
 +    return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]);
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void gen_revsh(TCGv_i32 dest, TCGv_i32 var)
      tcg_gen_ext16s_i32(dest, var);
  }
 -/* 32x32->64 multiply.  Marks inputs as dead.  */
 -static TCGv_i64 gen_mulu_i64_i32(TCGv_i32 a, TCGv_i32 b)
 -{
 -    TCGv_i32 lo = tcg_temp_new_i32();
 -    TCGv_i32 hi = tcg_temp_new_i32();
 -    TCGv_i64 ret;
 -
--    tcg_gen_mulu2_i32(lo, hi, a, b);
+ /*
--    tcg_temp_free_i32(a);
+  * SVE predicates are 1/8 the size of SVE vectors, and cannot use
--    tcg_temp_free_i32(b);
+  * the same simd_desc() encoding due to restrictions on size.
--
+diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
--    ret = tcg_temp_new_i64();
+index XXXXXXX..XXXXXXX 100644
--    tcg_gen_concat_i32_i64(ret, lo, hi);
+--- a/target/arm/tcg/translate.h
--    tcg_temp_free_i32(lo);
++++ b/target/arm/tcg/translate.h
--    tcg_temp_free_i32(hi);
+@@ -XXX,XX +XXX,XX @@ typedef struct DisasContext {
--
+     int8_t btype;
--    return ret;
+     /* A copy of cpu->dcz_blocksize. */
--}
+     uint8_t dcz_blocksize;
--
++    /* A copy of cpu->gm_blocksize. */
--static TCGv_i64 gen_muls_i64_i32(TCGv_i32 a, TCGv_i32 b)
++    uint8_t gm_blocksize;
--{
+     /* True if this page is guarded.  */
--    TCGv_i32 lo = tcg_temp_new_i32();
+     bool guarded_page;
--    TCGv_i32 hi = tcg_temp_new_i32();
+     /* Bottom two bits of XScale c15_cpar coprocessor access control reg */
--    TCGv_i64 ret;
+diff --git a/target/arm/helper.c b/target/arm/helper.c
--
+index XXXXXXX..XXXXXXX 100644
--    tcg_gen_muls2_i32(lo, hi, a, b);
+--- a/target/arm/helper.c
--    tcg_temp_free_i32(a);
++++ b/target/arm/helper.c
--    tcg_temp_free_i32(b);
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo mte_reginfo[] = {
--
+       .opc0 = 3, .opc1 = 0, .crn = 1, .crm = 0, .opc2 = 6,
--    ret = tcg_temp_new_i64();
+       .access = PL1_RW, .accessfn = access_mte,
--    tcg_gen_concat_i32_i64(ret, lo, hi);
+       .fieldoffset = offsetof(CPUARMState, cp15.gcr_el1) },
--    tcg_temp_free_i32(lo);
+-    { .name = "GMID_EL1", .state = ARM_CP_STATE_AA64,
--    tcg_temp_free_i32(hi);
+-      .opc0 = 3, .opc1 = 1, .crn = 0, .crm = 0, .opc2 = 4,
--
+-      .access = PL1_R, .accessfn = access_aa64_tid5,
--    return ret;
+-      .type = ARM_CP_CONST, .resetvalue = GMID_EL1_BS },
--}
+     { .name = "TCO", .state = ARM_CP_STATE_AA64,
--
+       .opc0 = 3, .opc1 = 3, .crn = 4, .crm = 2, .opc2 = 7,
- /* Swap low and high halfwords.  */
+       .type = ARM_CP_NO_RAW,
- static void gen_swap_half(TCGv_i32 var)
+@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
- {
+      * then define only a RAZ/WI version of PSTATE.TCO.
-@@ -XXX,XX +XXX,XX @@ static inline void gen_neon_addl(int size)
+      */
      if (cpu_isar_feature(aa64_mte, cpu)) {
 +        ARMCPRegInfo gmid_reginfo = {
 +            .name = "GMID_EL1", .state = ARM_CP_STATE_AA64,
 +            .opc0 = 3, .opc1 = 1, .crn = 0, .crm = 0, .opc2 = 4,
 +            .access = PL1_R, .accessfn = access_aa64_tid5,
 +            .type = ARM_CP_CONST, .resetvalue = cpu->gm_blocksize,
 +        };
 +        define_one_arm_cp_reg(cpu, &gmid_reginfo);
          define_arm_cp_regs(cpu, mte_reginfo);
          define_arm_cp_regs(cpu, mte_el0_cacheop_reginfo);
      } else if (cpu_isar_feature(aa64_mte_insn_reg, cpu)) {
 diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/cpu64.c
 +++ b/target/arm/tcg/cpu64.c
@@ -XXX,XX +XXX,XX @@ void aarch64_max_tcg_initfn(Object *obj)
      cpu->ctr = 0x80038003; /* 32 byte I and D cacheline size, VIPT icache */
      cpu->dcz_blocksize = 7; /*  512 bytes */
  #endif
 +    cpu->gm_blocksize = 6;  /*  256 bytes */
      cpu->sve_vq.supported = MAKE_64BIT_MASK(0, ARM_MAX_VQ);
      cpu->sme_vq.supported = SVE_VQ_POW2_MAP;
 diff --git a/target/arm/tcg/mte_helper.c b/target/arm/tcg/mte_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/mte_helper.c
 +++ b/target/arm/tcg/mte_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(st2g_stub)(CPUARMState *env, uint64_t ptr)
      }
  }
--static inline void gen_neon_negl(TCGv_i64 var, int size)
+-#define LDGM_STGM_SIZE  (4 << GMID_EL1_BS)
 -{
 -    switch (size) {
 -    case 0: gen_helper_neon_negl_u16(var, var); break;
 -    case 1: gen_helper_neon_negl_u32(var, var); break;
 -    case 2:
 -        tcg_gen_neg_i64(var, var);
 -        break;
 -    default: abort();
 -    }
 -}
 -
--static inline void gen_neon_addl_saturate(TCGv_i64 op0, TCGv_i64 op1, int size)
+ uint64_t HELPER(ldgm)(CPUARMState *env, uint64_t ptr)
 -{
 -    switch (size) {
 -    case 1: gen_helper_neon_addl_saturate_s32(op0, cpu_env, op0, op1); break;
 -    case 2: gen_helper_neon_addl_saturate_s64(op0, cpu_env, op0, op1); break;
 -    default: abort();
 -    }
 -}
 -
 -static inline void gen_neon_mull(TCGv_i64 dest, TCGv_i32 a, TCGv_i32 b,
 -                                 int size, int u)
 -{
 -    TCGv_i64 tmp;
 -
 -    switch ((size << 1) | u) {
 -    case 0: gen_helper_neon_mull_s8(dest, a, b); break;
 -    case 1: gen_helper_neon_mull_u8(dest, a, b); break;
 -    case 2: gen_helper_neon_mull_s16(dest, a, b); break;
 -    case 3: gen_helper_neon_mull_u16(dest, a, b); break;
 -    case 4:
 -        tmp = gen_muls_i64_i32(a, b);
 -        tcg_gen_mov_i64(dest, tmp);
 -        tcg_temp_free_i64(tmp);
 -        break;
 -    case 5:
 -        tmp = gen_mulu_i64_i32(a, b);
 -        tcg_gen_mov_i64(dest, tmp);
 -        tcg_temp_free_i64(tmp);
 -        break;
 -    default: abort();
 -    }
 -
 -    /* gen_helper_neon_mull_[su]{8|16} do not free their parameters.
 -       Don't forget to clean them now.  */
 -    if (size < 2) {
 -        tcg_temp_free_i32(a);
 -        tcg_temp_free_i32(b);
 -    }
 -}
 -
  static void gen_neon_narrow_op(int op, int u, int size,
                                 TCGv_i32 dest, TCGv_i64 src)
  {
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+     int mmu_idx = cpu_mmu_index(env, false);
-     int u;
+     uintptr_t ra = GETPC();
-     int vec_size;
++    int gm_bs = env_archcpu(env)->gm_blocksize;
-     uint32_t imm;
++    int gm_bs_bytes = 4 << gm_bs;
--    TCGv_i32 tmp, tmp2, tmp3, tmp4, tmp5;
+     void *tag_mem;
-+    TCGv_i32 tmp, tmp2, tmp3, tmp5;
-     TCGv_ptr ptr1;
+-    ptr = QEMU_ALIGN_DOWN(ptr, LDGM_STGM_SIZE);
-     TCGv_i64 tmp64;
++    ptr = QEMU_ALIGN_DOWN(ptr, gm_bs_bytes);
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+     /* Trap if accessing an invalid page.  */
-         return 1;
+     tag_mem = allocation_tag_mem(env, mmu_idx, ptr, MMU_DATA_LOAD,
-     } else { /* (insn & 0x00800010 == 0x00800000) */
+-                                 LDGM_STGM_SIZE, MMU_DATA_LOAD,
-         if (size != 3) {
+-                                 LDGM_STGM_SIZE / (2 * TAG_GRANULE), ra);
--            op = (insn >> 8) & 0xf;
++                                 gm_bs_bytes, MMU_DATA_LOAD,
--            if ((insn & (1 << 6)) == 0) {
++                                 gm_bs_bytes / (2 * TAG_GRANULE), ra);
--                /* Three registers of different lengths: handled by decodetree */
--                return 1;
+     /* The tag is squashed to zero if the page does not support tags.  */
--            } else {
+     if (!tag_mem) {
--                /* Two registers and a scalar. NB that for ops of this form
+         return 0;
--                 * the ARM ARM labels bit 24 as Q, but it is in our variable
+     }
--                 * 'u', not 'q'.
--                 */
+-    QEMU_BUILD_BUG_ON(GMID_EL1_BS != 6);
--                if (size == 0) {
+     /*
--                    return 1;
+-     * We are loading 64-bits worth of tags.  The ordering of elements
--                }
+-     * within the word corresponds to a 64-bit little-endian operation.
--                switch (op) {
++     * The ordering of elements within the word corresponds to
--                case 0: /* Integer VMLA scalar */
++     * a little-endian operation.
--                case 4: /* Integer VMLS scalar */
+      */
--                case 8: /* Integer VMUL scalar */
+-    return ldq_le_p(tag_mem);
--                case 1: /* Float VMLA scalar */
++    switch (gm_bs) {
--                case 5: /* Floating point VMLS scalar */
++    case 6:
--                case 9: /* Floating point VMUL scalar */
++        /* 256 bytes -> 16 tags -> 64 result bits */
--                case 12: /* VQDMULH scalar */
++        return ldq_le_p(tag_mem);
--                case 13: /* VQRDMULH scalar */
++    default:
--                case 14: /* VQRDMLAH scalar */
++        /* cpu configured with unsupported gm blocksize. */
--                case 15: /* VQRDMLSH scalar */
++        g_assert_not_reached();
--                    return 1; /* handled by decodetree */
++    }
--
+ }
--                case 3: /* VQDMLAL scalar */
--                case 7: /* VQDMLSL scalar */
+ void HELPER(stgm)(CPUARMState *env, uint64_t ptr, uint64_t val)
--                case 11: /* VQDMULL scalar */
+ {
--                    if (u == 1) {
+     int mmu_idx = cpu_mmu_index(env, false);
--                        return 1;
+     uintptr_t ra = GETPC();
--                    }
++    int gm_bs = env_archcpu(env)->gm_blocksize;
--                    /* fall through */
++    int gm_bs_bytes = 4 << gm_bs;
--                case 2: /* VMLAL sclar */
+     void *tag_mem;
--                case 6: /* VMLSL scalar */
--                case 10: /* VMULL scalar */
+-    ptr = QEMU_ALIGN_DOWN(ptr, LDGM_STGM_SIZE);
--                    if (rd & 1) {
++    ptr = QEMU_ALIGN_DOWN(ptr, gm_bs_bytes);
--                        return 1;
--                    }
+     /* Trap if accessing an invalid page.  */
--                    tmp2 = neon_get_scalar(size, rm);
+     tag_mem = allocation_tag_mem(env, mmu_idx, ptr, MMU_DATA_STORE,
--                    /* We need a copy of tmp2 because gen_neon_mull
+-                                 LDGM_STGM_SIZE, MMU_DATA_LOAD,
--                     * deletes it during pass 0.  */
+-                                 LDGM_STGM_SIZE / (2 * TAG_GRANULE), ra);
--                    tmp4 = tcg_temp_new_i32();
++                                 gm_bs_bytes, MMU_DATA_LOAD,
--                    tcg_gen_mov_i32(tmp4, tmp2);
++                                 gm_bs_bytes / (2 * TAG_GRANULE), ra);
--                    tmp3 = neon_load_reg(rn, 1);
--
+     /*
--                    for (pass = 0; pass < 2; pass++) {
+      * Tag store only happens if the page support tags,
--                        if (pass == 0) {
+@@ -XXX,XX +XXX,XX @@ void HELPER(stgm)(CPUARMState *env, uint64_t ptr, uint64_t val)
--                            tmp = neon_load_reg(rn, 0);
+         return;
--                        } else {
+     }
--                            tmp = tmp3;
--                            tmp2 = tmp4;
+-    QEMU_BUILD_BUG_ON(GMID_EL1_BS != 6);
--                        }
+     /*
--                        gen_neon_mull(cpu_V0, tmp, tmp2, size, u);
+-     * We are storing 64-bits worth of tags.  The ordering of elements
--                        if (op != 11) {
+-     * within the word corresponds to a 64-bit little-endian operation.
--                            neon_load_reg64(cpu_V1, rd + pass);
++     * The ordering of elements within the word corresponds to
--                        }
++     * a little-endian operation.
--                        switch (op) {
+      */
--                        case 6:
+-    stq_le_p(tag_mem, val);
--                            gen_neon_negl(cpu_V0, size);
++    switch (gm_bs) {
--                            /* Fall through */
++    case 6:
--                        case 2:
++        stq_le_p(tag_mem, val);
--                            gen_neon_addl(size);
++        break;
--                            break;
++    default:
--                        case 3: case 7:
++        /* cpu configured with unsupported gm blocksize. */
--                            gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
++        g_assert_not_reached();
--                            if (op == 7) {
++    }
--                                gen_neon_negl(cpu_V0, size);
+ }
--                            }
--                            gen_neon_addl_saturate(cpu_V0, cpu_V1, size);
+ void HELPER(stzgm_tags)(CPUARMState *env, uint64_t ptr, uint64_t val)
--                            break;
+diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
--                        case 10:
+index XXXXXXX..XXXXXXX 100644
--                            /* no-op */
+--- a/target/arm/tcg/translate-a64.c
--                            break;
++++ b/target/arm/tcg/translate-a64.c
--                        case 11:
+@@ -XXX,XX +XXX,XX @@ static bool trans_STGM(DisasContext *s, arg_ldst_tag *a)
--                            gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
+         gen_helper_stgm(cpu_env, addr, tcg_rt);
--                            break;
+     } else {
--                        default:
+         MMUAccessType acc = MMU_DATA_STORE;
--                            abort();
+-        int size = 4 << GMID_EL1_BS;
--                        }
++        int size = 4 << s->gm_blocksize;
--                        neon_store_reg64(cpu_V0, rd + pass);
--                    }
+         clean_addr = clean_data_tbi(s, addr);
--                    break;
+         tcg_gen_andi_i64(clean_addr, clean_addr, -size);
--                default:
+@@ -XXX,XX +XXX,XX @@ static bool trans_LDGM(DisasContext *s, arg_ldst_tag *a)
--                    g_assert_not_reached();
+         gen_helper_ldgm(tcg_rt, cpu_env, addr);
--                }
+     } else {
--            }
+         MMUAccessType acc = MMU_DATA_LOAD;
-+            /*
+-        int size = 4 << GMID_EL1_BS;
-+             * Three registers of different lengths, or two registers and
++        int size = 4 << s->gm_blocksize;
-+             * a scalar: handled by decodetree
-+             */
+         clean_addr = clean_data_tbi(s, addr);
-+            return 1;
+         tcg_gen_andi_i64(clean_addr, clean_addr, -size);
-         } else { /* size == 3 */
+@@ -XXX,XX +XXX,XX @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase,
-             if (!u) {
+     dc->cp_regs = arm_cpu->cp_regs;
-                 /* Extract.  */
+     dc->features = env->features;
      dc->dcz_blocksize = arm_cpu->dcz_blocksize;
 +    dc->gm_blocksize = arm_cpu->gm_blocksize;
  #ifdef CONFIG_USER_ONLY
      /* In sve_probe_page, we assume TBI is enabled. */
 --
-.20.1
+.34.1

-[PULL 15/23] target/arm: Convert Neon VEXT to decodetree
+[PULL 03/24] target/arm: Support more GM blocksizes
-Convert the Neon VEXT insn to decodetree. Rather than keeping the
+From: Richard Henderson <richard.henderson@linaro.org>
 old implementation which used fixed temporaries cpu_V0 and cpu_V1
 and did the extraction with by-hand shift and logic ops, we use
 the TCG extract2 insn.
-We don't need to special case 0 or 8 immediates any more as the
+Support all of the easy GM block sizes.
-optimizer is smart enough to throw away the dead code.
+Use direct memory operations, since the pointers are aligned.
+While BS=2 (16 bytes, 1 tag) is a legal setting, that requires
+an atomic store of one nibble.  This is not difficult, but there
+is also no point in supporting it until required.
+Note that cortex-a710 sets GM blocksize to match its cacheline
+size of 64 bytes.  I expect many implementations will also
+match the cacheline, which makes 16 bytes very unlikely.
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Message-id: 20230811214031.171020-4-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
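Note: the block-size arithmetic behind the "N bytes -> M tags -> K result
bits" comments in the GM blocksize patch can be sketched as below. This is a
standalone illustration, not code from the patch. The helper names
gm_block_bytes, gm_block_tags, gm_result_bits and gm_shift are hypothetical,
and LOG2_TAG_GRANULE is assumed to be 4 (one tag per 16 bytes, consistent
with "BS=2 (16 bytes, 1 tag)" in the commit message).

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 16-byte tag granule, so log2 is 4. */
#define LOG2_TAG_GRANULE 4

/* Bytes covered by one LDGM/STGM for a given GM blocksize field (BS). */
static unsigned gm_block_bytes(unsigned bs)
{
    assert(bs >= 2 && bs <= 6);   /* architectural range quoted in the patch */
    return 4u << bs;
}

/* Number of 4-bit allocation tags in that block. */
static unsigned gm_block_tags(unsigned bs)
{
    return gm_block_bytes(bs) >> LOG2_TAG_GRANULE;
}

/* Number of bits of the 64-bit register that carry tags. */
static unsigned gm_result_bits(unsigned bs)
{
    return gm_block_tags(bs) * 4;
}

/*
 * Bit position of the block's first tag within the 64-bit value:
 * align the address down to the block, then take address bits
 * [LOG2_TAG_GRANULE+3:LOG2_TAG_GRANULE] as a nibble index.
 */
static unsigned gm_shift(uint64_t ptr, unsigned bs)
{
    uint64_t aligned = ptr & ~(uint64_t)(gm_block_bytes(bs) - 1);
    return (unsigned)((aligned >> LOG2_TAG_GRANULE) & 0xf) * 4;
}
```

With BS=6 the aligned address always has those four bits clear, which is why
the shift is zero in that case; smaller block sizes land their tags at a
nonzero nibble offset.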
- target/arm/neon-dp.decode       |  8 +++-
+ target/arm/cpu.c            | 18 +++++++++---
- target/arm/translate-neon.inc.c | 76 +++++++++++++++++++++++++++++++++
+ target/arm/tcg/mte_helper.c | 56 +++++++++++++++++++++++++++++++------
- target/arm/translate.c          | 58 +------------------------
+files changed, 62 insertions(+), 12 deletions(-)
 files changed, 85 insertions(+), 57 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
+--- a/target/arm/cpu.c
-+++ b/target/arm/neon-dp.decode
++++ b/target/arm/cpu.c
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
- # return false for size==3.
+                                        ID_PFR1, VIRTUALIZATION, 0);
- ######################################################################
+     }
- {
--  # 0b11 subgroup will go here
++    if (cpu_isar_feature(aa64_mte, cpu)) {
-+  [
++        /*
-+    ##################################################################
++         * The architectural range of GM blocksize is 2-6, however qemu
-+    # Miscellaneous size=0b11 insns
++         * doesn't support blocksize of 2 (see HELPER(ldgm)).
-+    ##################################################################
++         */
-+    VEXT         1111 001 0 1 . 11 .... .... imm:4 . q:1 . 0 .... \
++        if (tcg_enabled()) {
-+                 vm=%vm_dp vn=%vn_dp vd=%vd_dp
++            assert(cpu->gm_blocksize >= 3 && cpu->gm_blocksize <= 6);
 +  ]
    # Subgroup for size != 0b11
    [
 diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.inc.c
 +++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VQDMLSL_2sc(DisasContext *s, arg_2scalar *a)
      return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]);
  }
 +
 +static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
 +{
 +    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
 +        return false;
 +    }
 +
 +    /* UNDEF accesses to D16-D31 if they don't exist. */
 +    if (!dc_isar_feature(aa32_simd_r32, s) &&
 +        ((a->vd | a->vn | a->vm) & 0x10)) {
 +        return false;
 +    }
 +
 +    if ((a->vn | a->vm | a->vd) & a->q) {
 +        return false;
 +    }
 +
 +    if (a->imm > 7 && !a->q) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    if (!a->q) {
 +        /* Extract 64 bits from <Vm:Vn> */
 +        TCGv_i64 left, right, dest;
 +
 +        left = tcg_temp_new_i64();
 +        right = tcg_temp_new_i64();
 +        dest = tcg_temp_new_i64();
 +
 +        neon_load_reg64(right, a->vn);
 +        neon_load_reg64(left, a->vm);
 +        tcg_gen_extract2_i64(dest, right, left, a->imm * 8);
 +        neon_store_reg64(dest, a->vd);
 +
 +        tcg_temp_free_i64(left);
 +        tcg_temp_free_i64(right);
 +        tcg_temp_free_i64(dest);
 +    } else {
 +        /* Extract 128 bits from <Vm+1:Vm:Vn+1:Vn> */
 +        TCGv_i64 left, middle, right, destleft, destright;
 +
 +        left = tcg_temp_new_i64();
 +        middle = tcg_temp_new_i64();
 +        right = tcg_temp_new_i64();
 +        destleft = tcg_temp_new_i64();
 +        destright = tcg_temp_new_i64();
 +
 +        if (a->imm < 8) {
 +            neon_load_reg64(right, a->vn);
 +            neon_load_reg64(middle, a->vn + 1);
 +            tcg_gen_extract2_i64(destright, right, middle, a->imm * 8);
 +            neon_load_reg64(left, a->vm);
 +            tcg_gen_extract2_i64(destleft, middle, left, a->imm * 8);
 +        } else {
 +            neon_load_reg64(right, a->vn + 1);
 +            neon_load_reg64(middle, a->vm);
 +            tcg_gen_extract2_i64(destright, right, middle, (a->imm - 8) * 8);
 +            neon_load_reg64(left, a->vm + 1);
 +            tcg_gen_extract2_i64(destleft, middle, left, (a->imm - 8) * 8);
 +        }
 +
-+        neon_store_reg64(destright, a->vd);
+ #ifndef CONFIG_USER_ONLY
-+        neon_store_reg64(destleft, a->vd + 1);
+-    if (cpu->tag_memory == NULL && cpu_isar_feature(aa64_mte, cpu)) {
-+
+         /*
-+        tcg_temp_free_i64(destright);
+          * Disable the MTE feature bits if we do not have tag-memory
-+        tcg_temp_free_i64(destleft);
+          * provided by the machine.
-+        tcg_temp_free_i64(right);
+          */
-+        tcg_temp_free_i64(middle);
+-        cpu->isar.id_aa64pfr1 =
-+        tcg_temp_free_i64(left);
+-            FIELD_DP64(cpu->isar.id_aa64pfr1, ID_AA64PFR1, MTE, 0);
 -    }
 +        if (cpu->tag_memory == NULL) {
 +            cpu->isar.id_aa64pfr1 =
 +                FIELD_DP64(cpu->isar.id_aa64pfr1, ID_AA64PFR1, MTE, 0);
 +        }
  #endif
 +    }
-+    return true;
-+}
+     if (tcg_enabled()) {
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+         /*
 diff --git a/target/arm/tcg/mte_helper.c b/target/arm/tcg/mte_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/tcg/mte_helper.c
-+++ b/target/arm/translate.c
++++ b/target/arm/tcg/mte_helper.c
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ uint64_t HELPER(ldgm)(CPUARMState *env, uint64_t ptr)
-     int pass;
+     int gm_bs = env_archcpu(env)->gm_blocksize;
-     int u;
+     int gm_bs_bytes = 4 << gm_bs;
-     int vec_size;
+     void *tag_mem;
--    uint32_t imm;
++    uint64_t ret;
-     TCGv_i32 tmp, tmp2, tmp3, tmp5;
++    int shift;
-     TCGv_ptr ptr1;
--    TCGv_i64 tmp64;
+     ptr = QEMU_ALIGN_DOWN(ptr, gm_bs_bytes);
-     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+@@ -XXX,XX +XXX,XX @@ uint64_t HELPER(ldgm)(CPUARMState *env, uint64_t ptr)
-         return 1;
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+     /*
-             return 1;
+      * The ordering of elements within the word corresponds to
-         } else { /* size == 3 */
+-     * a little-endian operation.
-             if (!u) {
++     * a little-endian operation.  Computation of shift comes from
--                /* Extract.  */
++     *
--                imm = (insn >> 8) & 0xf;
++     *     index = address<LOG2_TAG_GRANULE+3:LOG2_TAG_GRANULE>
--
++     *     data<index*4+3:index*4> = tag
--                if (imm > 7 && !q)
++     *
--                    return 1;
++     * Because of the alignment of ptr above, BS=6 has shift=0.
--
++     * All memory operations are aligned.  Defer support for BS=2,
--                if (q && ((rd | rn | rm) & 1)) {
++     * requiring insertion or extraction of a nibble, until we
--                    return 1;
++     * support a cpu that requires it.
--                }
+      */
--
+     switch (gm_bs) {
--                if (imm == 0) {
++    case 3:
--                    neon_load_reg64(cpu_V0, rn);
++        /* 32 bytes -> 2 tags -> 8 result bits */
--                    if (q) {
++        ret = *(uint8_t *)tag_mem;
--                        neon_load_reg64(cpu_V1, rn + 1);
++        break;
--                    }
++    case 4:
--                } else if (imm == 8) {
++        /* 64 bytes -> 4 tags -> 16 result bits */
--                    neon_load_reg64(cpu_V0, rn + 1);
++        ret = cpu_to_le16(*(uint16_t *)tag_mem);
--                    if (q) {
++        break;
--                        neon_load_reg64(cpu_V1, rm);
++    case 5:
--                    }
++        /* 128 bytes -> 8 tags -> 32 result bits */
--                } else if (q) {
++        ret = cpu_to_le32(*(uint32_t *)tag_mem);
--                    tmp64 = tcg_temp_new_i64();
++        break;
--                    if (imm < 8) {
+     case 6:
--                        neon_load_reg64(cpu_V0, rn);
+         /* 256 bytes -> 16 tags -> 64 result bits */
--                        neon_load_reg64(tmp64, rn + 1);
+-        return ldq_le_p(tag_mem);
--                    } else {
++        return cpu_to_le64(*(uint64_t *)tag_mem);
--                        neon_load_reg64(cpu_V0, rn + 1);
+     default:
--                        neon_load_reg64(tmp64, rm);
+-        /* cpu configured with unsupported gm blocksize. */
--                    }
++        /*
--                    tcg_gen_shri_i64(cpu_V0, cpu_V0, (imm & 7) * 8);
++         * CPU configured with unsupported/invalid gm blocksize.
--                    tcg_gen_shli_i64(cpu_V1, tmp64, 64 - ((imm & 7) * 8));
++         * This is detected early in arm_cpu_realizefn.
--                    tcg_gen_or_i64(cpu_V0, cpu_V0, cpu_V1);
++         */
--                    if (imm < 8) {
+         g_assert_not_reached();
--                        neon_load_reg64(cpu_V1, rm);
+     }
--                    } else {
++    shift = extract64(ptr, LOG2_TAG_GRANULE, 4) * 4;
--                        neon_load_reg64(cpu_V1, rm + 1);
++    return ret << shift;
--                        imm -= 8;
+ }
--                    }
--                    tcg_gen_shli_i64(cpu_V1, cpu_V1, 64 - (imm * 8));
+ void HELPER(stgm)(CPUARMState *env, uint64_t ptr, uint64_t val)
--                    tcg_gen_shri_i64(tmp64, tmp64, imm * 8);
+@@ -XXX,XX +XXX,XX @@ void HELPER(stgm)(CPUARMState *env, uint64_t ptr, uint64_t val)
--                    tcg_gen_or_i64(cpu_V1, cpu_V1, tmp64);
+     int gm_bs = env_archcpu(env)->gm_blocksize;
--                    tcg_temp_free_i64(tmp64);
+     int gm_bs_bytes = 4 << gm_bs;
--                } else {
+     void *tag_mem;
--                    /* BUGFIX */
++    int shift;
--                    neon_load_reg64(cpu_V0, rn);
--                    tcg_gen_shri_i64(cpu_V0, cpu_V0, imm * 8);
+     ptr = QEMU_ALIGN_DOWN(ptr, gm_bs_bytes);
--                    neon_load_reg64(cpu_V1, rm);
--                    tcg_gen_shli_i64(cpu_V1, cpu_V1, 64 - (imm * 8));
+@@ -XXX,XX +XXX,XX @@ void HELPER(stgm)(CPUARMState *env, uint64_t ptr, uint64_t val)
--                    tcg_gen_or_i64(cpu_V0, cpu_V0, cpu_V1);
+         return;
--                }
+     }
--                neon_store_reg64(cpu_V0, rd);
--                if (q) {
+-    /*
--                    neon_store_reg64(cpu_V1, rd + 1);
+-     * The ordering of elements within the word corresponds to
--                }
+-     * a little-endian operation.
-+                /* Extract: handled by decodetree */
+-     */
-+                return 1;
++    /* See LDGM for comments on BS and on shift.  */
-             } else if ((insn & (1 << 11)) == 0) {
++    shift = extract64(ptr, LOG2_TAG_GRANULE, 4) * 4;
-                 /* Two register misc.  */
++    val >>= shift;
-                 op = ((insn >> 12) & 0x30) | ((insn >> 7) & 0xf);
+     switch (gm_bs) {
 +    case 3:
 +        /* 32 bytes -> 2 tags -> 8 result bits */
 +        *(uint8_t *)tag_mem = val;
 +        break;
 +    case 4:
 +        /* 64 bytes -> 4 tags -> 16 result bits */
 +        *(uint16_t *)tag_mem = cpu_to_le16(val);
 +        break;
 +    case 5:
 +        /* 128 bytes -> 8 tags -> 32 result bits */
 +        *(uint32_t *)tag_mem = cpu_to_le32(val);
 +        break;
      case 6:
 -        stq_le_p(tag_mem, val);
 +        /* 256 bytes -> 16 tags -> 64 result bits */
 +        *(uint64_t *)tag_mem = cpu_to_le64(val);
          break;
      default:
          /* cpu configured with unsupported gm blocksize. */
 --
-.20.1
+.34.1

-[PULL 22/23] sd: sdhci: Implement basic vendor specific register support
+[PULL 04/24] target/arm: When tag memory is not present, set MTE=1
-From: Guenter Roeck <linux@roeck-us.net>
+From: Richard Henderson <richard.henderson@linaro.org>
-The Linux kernel's IMX code now uses vendor specific commands.
+When the cpu supports MTE, but the system does not, reduce cpu
-This results in endless warnings when booting the Linux kernel.
+support to user instructions at EL0 instead of completely
 disabling MTE.  If we encounter a cpu implementation which does
 something else, we can revisit this setting.
-sdhci-esdhc-imx 2194000.usdhc: esdhc_wait_for_card_clock_gate_off:
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-    card clock still not gate off in 100us!.
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Message-id: 20230811214031.171020-5-richard.henderson@linaro.org
 Implement support for the vendor specific command implemented in IMX hardware
 to be able to avoid this warning.
 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Signed-off-by: Guenter Roeck <linux@roeck-us.net>
 Message-id: 20200603145258.195920-2-linux@roeck-us.net
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
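Note: the MTE=1 change amounts to overwriting one 4-bit ID register field.
A minimal standalone sketch of that kind of field deposit follows. It is not
QEMU's FIELD_DP64 macro; deposit_field64, limit_mte_to_el0 and mte_level are
hypothetical names, and the field position (bits [11:8] of ID_AA64PFR1) is
stated here as an assumption taken from the Arm architecture documentation
rather than from this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed position of the MTE field within ID_AA64PFR1. */
#define MTE_SHIFT 8
#define MTE_LEN   4

/* Replace the len-bit field at 'shift' in 'reg' with 'val'. */
static uint64_t deposit_field64(uint64_t reg, unsigned shift,
                                unsigned len, uint64_t val)
{
    assert(len > 0 && len < 64 && shift + len <= 64);
    uint64_t mask = ((UINT64_C(1) << len) - 1) << shift;
    return (reg & ~mask) | ((val << shift) & mask);
}

/* Read back the MTE field value. */
static unsigned mte_level(uint64_t id_aa64pfr1)
{
    return (unsigned)((id_aa64pfr1 >> MTE_SHIFT) & ((1u << MTE_LEN) - 1));
}

/* No tag memory: keep only the EL0-visible instruction subset (MTE=1). */
static uint64_t limit_mte_to_el0(uint64_t id_aa64pfr1)
{
    return deposit_field64(id_aa64pfr1, MTE_SHIFT, MTE_LEN, 1);
}
```

Only the targeted field changes; the neighbouring fields in the register
value are preserved.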
- hw/sd/sdhci-internal.h |  5 +++++
+ target/arm/cpu.c | 7 ++++---
- include/hw/sd/sdhci.h  |  5 +++++
+file changed, 4 insertions(+), 3 deletions(-)
  hw/sd/sdhci.c          | 18 +++++++++++++++++-
 files changed, 27 insertions(+), 1 deletion(-)
-diff --git a/hw/sd/sdhci-internal.h b/hw/sd/sdhci-internal.h
+diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/sd/sdhci-internal.h
+--- a/target/arm/cpu.c
-+++ b/hw/sd/sdhci-internal.h
++++ b/target/arm/cpu.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
- #define SDHC_CMD_INHIBIT               0x00000001
- #define SDHC_DATA_INHIBIT              0x00000002
+ #ifndef CONFIG_USER_ONLY
- #define SDHC_DAT_LINE_ACTIVE           0x00000004
+         /*
-+#define SDHC_IMX_CLOCK_GATE_OFF        0x00000080
+-         * Disable the MTE feature bits if we do not have tag-memory
- #define SDHC_DOING_WRITE               0x00000100
+-         * provided by the machine.
- #define SDHC_DOING_READ                0x00000200
++         * If we do not have tag-memory provided by the machine,
- #define SDHC_SPACE_AVAILABLE           0x00000400
++         * reduce MTE support to instructions enabled at EL0.
-@@ -XXX,XX +XXX,XX @@ extern const VMStateDescription sdhci_vmstate;
++         * This matches Cortex-A710 BROADCASTMTE input being LOW.
+          */
+         if (cpu->tag_memory == NULL) {
- #define ESDHC_MIX_CTRL                  0x48
+             cpu->isar.id_aa64pfr1 =
-+
+-                FIELD_DP64(cpu->isar.id_aa64pfr1, ID_AA64PFR1, MTE, 0);
- #define ESDHC_VENDOR_SPEC               0xc0
++                FIELD_DP64(cpu->isar.id_aa64pfr1, ID_AA64PFR1, MTE, 1);
 +#define ESDHC_IMX_FRC_SDCLK_ON          (1 << 8)
 +
  #define ESDHC_DLL_CTRL                  0x60
  #define ESDHC_TUNING_CTRL               0xcc
@@ -XXX,XX +XXX,XX @@ extern const VMStateDescription sdhci_vmstate;
  #define DEFINE_SDHCI_COMMON_PROPERTIES(_state) \
      DEFINE_PROP_UINT8("sd-spec-version", _state, sd_spec_version, 2), \
      DEFINE_PROP_UINT8("uhs", _state, uhs_mode, UHS_NOT_SUPPORTED), \
 +    DEFINE_PROP_UINT8("vendor", _state, vendor, SDHCI_VENDOR_NONE), \
      \
      /* Capabilities registers provide information on supported
       * features of this specific host controller implementation */ \
 diff --git a/include/hw/sd/sdhci.h b/include/hw/sd/sdhci.h
 index XXXXXXX..XXXXXXX 100644
 --- a/include/hw/sd/sdhci.h
 +++ b/include/hw/sd/sdhci.h
@@ -XXX,XX +XXX,XX @@ typedef struct SDHCIState {
      uint16_t acmd12errsts; /* Auto CMD12 error status register */
      uint16_t hostctl2;     /* Host Control 2 */
      uint64_t admasysaddr;  /* ADMA System Address Register */
 +    uint16_t vendor_spec;  /* Vendor specific register */
      /* Read-only registers */
      uint64_t capareg;      /* Capabilities Register */
@@ -XXX,XX +XXX,XX @@ typedef struct SDHCIState {
      uint32_t quirks;
      uint8_t sd_spec_version;
      uint8_t uhs_mode;
 +    uint8_t vendor;        /* For vendor specific functionality */
  } SDHCIState;
 +#define SDHCI_VENDOR_NONE       0
 +#define SDHCI_VENDOR_IMX        1
 +
  /*
   * Controller does not provide transfer-complete interrupt when not
   * busy.
 diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/sd/sdhci.c
 +++ b/hw/sd/sdhci.c
@@ -XXX,XX +XXX,XX @@ static uint64_t usdhc_read(void *opaque, hwaddr offset, unsigned size)
          }
-         break;
+ #endif
+     }
 +    case ESDHC_VENDOR_SPEC:
 +        ret = s->vendor_spec;
 +        break;
      case ESDHC_DLL_CTRL:
      case ESDHC_TUNE_CTRL_STATUS:
      case ESDHC_UNDOCUMENTED_REG27:
      case ESDHC_TUNING_CTRL:
 -    case ESDHC_VENDOR_SPEC:
      case ESDHC_MIX_CTRL:
      case ESDHC_WTMK_LVL:
          ret = 0;
@@ -XXX,XX +XXX,XX @@ usdhc_write(void *opaque, hwaddr offset, uint64_t val, unsigned size)
      case ESDHC_UNDOCUMENTED_REG27:
      case ESDHC_TUNING_CTRL:
      case ESDHC_WTMK_LVL:
 +        break;
 +
      case ESDHC_VENDOR_SPEC:
 +        s->vendor_spec = value;
 +        switch (s->vendor) {
 +        case SDHCI_VENDOR_IMX:
 +            if (value & ESDHC_IMX_FRC_SDCLK_ON) {
 +                s->prnsts &= ~SDHC_IMX_CLOCK_GATE_OFF;
 +            } else {
 +                s->prnsts |= SDHC_IMX_CLOCK_GATE_OFF;
 +            }
 +            break;
 +        default:
 +            break;
 +        }
          break;
      case SDHC_HOSTCTL:
 --
-.20.1
+.34.1

-[PULL 02/23] target/arm: Convert Neon 3-reg-diff prewidening ops to decodetree
+[PULL 05/24] target/arm: Introduce make_ccsidr64
-Convert the "pre-widening" insns VADDL, VSUBL, VADDW and VSUBW
+From: Richard Henderson <richard.henderson@linaro.org>
 in the Neon 3-registers-different-lengths group to decodetree.
 These insns work by widening one or both inputs to double their
 size, performing an add or subtract at the doubled size and
 then storing the double-size result.
-As usual, rather than copying the loop of the original decoder
+Do not hard-code the constants for Neoverse V1.
 (which needs awkward code to avoid problems when source and
 destination registers overlap) we just unroll the two passes.
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Message-id: 20230811214031.171020-6-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
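Note: a worked instance of the 64-bit CCSIDR packing that make_ccsidr64
performs may help when checking values by hand. The sketch below is a
standalone re-derivation from the field layout given in the patch comment
([55:32] sets - 1, [23:3] associativity - 1, [2:0] log2(linesize) - 4). The
names ccsidr64_sketch and log2_pow2 are hypothetical, and the cache
geometries used in the usage note are illustrative inputs only.

```c
#include <assert.h>
#include <stdint.h>

/* log2 of a power of two, by repeated shifting. */
static unsigned log2_pow2(unsigned x)
{
    unsigned n = 0;
    while (x > 1) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Pack cache geometry into the 64-bit CCSIDR layout described above. */
static uint64_t ccsidr64_sketch(unsigned assoc, unsigned linesize,
                                unsigned cachesize)
{
    assert(assoc != 0);
    assert(linesize != 0 && (linesize & (linesize - 1)) == 0);

    unsigned lg_linesize = log2_pow2(linesize);
    assert(lg_linesize >= 4 && lg_linesize <= 7 + 4);

    /* sets * associativity * linesize == cachesize */
    assert(cachesize % (assoc * linesize) == 0);
    unsigned sets = cachesize / (assoc * linesize);
    assert(sets != 0);

    return ((uint64_t)(sets - 1) << 32)
         | ((uint64_t)(assoc - 1) << 3)
         | (lg_linesize - 4);
}
```

For a 4-way, 64-byte-line, 64 KiB cache this gives 256 sets and the value
0x000000ff0000001a; for an 8-way, 64-byte-line, 1 MiB cache it gives 2048
sets and 0x000007ff0000003a.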
- target/arm/neon-dp.decode       |  43 +++++++++++++
+ target/arm/tcg/cpu64.c | 48 ++++++++++++++++++++++++++++--------------
- target/arm/translate-neon.inc.c | 104 ++++++++++++++++++++++++++++++++
+file changed, 32 insertions(+), 16 deletions(-)
  target/arm/translate.c          |  16 ++---
 files changed, 151 insertions(+), 12 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
+--- a/target/arm/tcg/cpu64.c
-+++ b/target/arm/neon-dp.decode
++++ b/target/arm/tcg/cpu64.c
-@@ -XXX,XX +XXX,XX @@ VCVT_FU_2sh      1111 001 1 1 . ...... .... 1111 0 . . 1 .... @2reg_vcvt
+@@ -XXX,XX +XXX,XX @@
- # So we have a single decode line and check the cmode/op in the
+ #include "qemu/module.h"
- # trans function.
+ #include "qapi/visitor.h"
- Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+ #include "hw/qdev-properties.h"
-+
++#include "qemu/units.h"
-+######################################################################
+ #include "internals.h"
-+# Within the "two registers, or three registers of different lengths"
+ #include "cpregs.h"
-+# grouping ([23,4]=0b10), bits [21:20] are either part of the opcode
-+# decode: 0b11 for VEXT, two-reg-misc, VTBL, and duplicate-scalar;
++static uint64_t make_ccsidr64(unsigned assoc, unsigned linesize,
-+# or they are a size field for the three-reg-different-lengths and
++                              unsigned cachesize)
 +# two-reg-and-scalar insn groups (where size cannot be 0b11). This
 +# is slightly awkward for decodetree: we handle it with this
 +# non-exclusive group which contains within it two exclusive groups:
 +# one for the size=0b11 patterns, and one for the size-not-0b11
 +# patterns. This allows us to check that none of the insns within
 +# each subgroup accidentally overlap each other. Note that all the
 +# trans functions for the size-not-0b11 patterns must check and
 +# return false for size==3.
 +######################################################################
 +{
-+  # 0b11 subgroup will go here
++    unsigned lg_linesize = ctz32(linesize);
-+
++    unsigned sets;
 +  # Subgroup for size != 0b11
 +  [
 +    ##################################################################
 +    # 3-reg-different-length grouping:
 +    # 1111 001 U 1 D sz!=11 Vn:4 Vd:4 opc:4 N 0 M 0 Vm:4
 +    ##################################################################
 +
 +    &3diff vm vn vd size
 +
 +    @3diff       .... ... . . . size:2 .... .... .... . . . . .... \
 +                 &3diff vm=%vm_dp vn=%vn_dp vd=%vd_dp
 +
 +    VADDL_S_3d   1111 001 0 1 . .. .... .... 0000 . 0 . 0 .... @3diff
 +    VADDL_U_3d   1111 001 1 1 . .. .... .... 0000 . 0 . 0 .... @3diff
 +
 +    VADDW_S_3d   1111 001 0 1 . .. .... .... 0001 . 0 . 0 .... @3diff
 +    VADDW_U_3d   1111 001 1 1 . .. .... .... 0001 . 0 . 0 .... @3diff
 +
 +    VSUBL_S_3d   1111 001 0 1 . .. .... .... 0010 . 0 . 0 .... @3diff
 +    VSUBL_U_3d   1111 001 1 1 . .. .... .... 0010 . 0 . 0 .... @3diff
 +
 +    VSUBW_S_3d   1111 001 0 1 . .. .... .... 0011 . 0 . 0 .... @3diff
 +    VSUBW_U_3d   1111 001 1 1 . .. .... .... 0011 . 0 . 0 .... @3diff
 +  ]
 +}
 diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.inc.c
 +++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_Vimm_1r(DisasContext *s, arg_1reg_imm *a)
      }
      return do_1reg_imm(s, a, fn);
  }
 +
 +static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
 +                           NeonGenWidenFn *widenfn,
 +                           NeonGenTwo64OpFn *opfn,
 +                           bool src1_wide)
 +{
 +    /* 3-regs different lengths, prewidening case (VADDL/VSUBL/VADDW/VSUBW) */
 +    TCGv_i64 rn0_64, rn1_64, rm_64;
 +    TCGv_i32 rm;
 +
 +    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
 +        return false;
 +    }
 +
 +    /* UNDEF accesses to D16-D31 if they don't exist. */
 +    if (!dc_isar_feature(aa32_simd_r32, s) &&
 +        ((a->vd | a->vn | a->vm) & 0x10)) {
 +        return false;
 +    }
 +
 +    if (!widenfn || !opfn) {
 +        /* size == 3 case, which is an entirely different insn group */
 +        return false;
 +    }
 +
 +    if ((a->vd & 1) || (src1_wide && (a->vn & 1))) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    rn0_64 = tcg_temp_new_i64();
 +    rn1_64 = tcg_temp_new_i64();
 +    rm_64 = tcg_temp_new_i64();
 +
 +    if (src1_wide) {
 +        neon_load_reg64(rn0_64, a->vn);
 +    } else {
 +        TCGv_i32 tmp = neon_load_reg(a->vn, 0);
 +        widenfn(rn0_64, tmp);
 +        tcg_temp_free_i32(tmp);
 +    }
 +    rm = neon_load_reg(a->vm, 0);
 +
 +    widenfn(rm_64, rm);
 +    tcg_temp_free_i32(rm);
 +    opfn(rn0_64, rn0_64, rm_64);
 +
 +    /*
-+     * Load second pass inputs before storing the first pass result, to
++     * The 64-bit CCSIDR_EL1 format is:
-+     * avoid incorrect results if a narrow input overlaps with the result.
++     *   [55:32] number of sets - 1
 +     *   [23:3]  associativity - 1
 +     *   [2:0]   log2(linesize) - 4
 +     *           so 0 == 16 bytes, 1 == 32 bytes, 2 == 64 bytes, etc
 +     */
-+    if (src1_wide) {
++    assert(assoc != 0);
-+        neon_load_reg64(rn1_64, a->vn + 1);
++    assert(is_power_of_2(linesize));
-+    } else {
++    assert(lg_linesize >= 4 && lg_linesize <= 7 + 4);
 +        TCGv_i32 tmp = neon_load_reg(a->vn, 1);
 +        widenfn(rn1_64, tmp);
 +        tcg_temp_free_i32(tmp);
 +    }
 +    rm = neon_load_reg(a->vm, 1);
 +
-+    neon_store_reg64(rn0_64, a->vd);
++    /* sets * associativity * linesize == cachesize. */
 +    sets = cachesize / (assoc * linesize);
 +    assert(cachesize % (assoc * linesize) == 0);
 +
-+    widenfn(rm_64, rm);
++    return ((uint64_t)(sets - 1) << 32)
-+    tcg_temp_free_i32(rm);
++         | ((assoc - 1) << 3)
-+    opfn(rn1_64, rn1_64, rm_64);
++         | (lg_linesize - 4);
 +    neon_store_reg64(rn1_64, a->vd + 1);
 +
 +    tcg_temp_free_i64(rn0_64);
 +    tcg_temp_free_i64(rn1_64);
 +    tcg_temp_free_i64(rm_64);
 +
 +    return true;
 +}
 +
-+#define DO_PREWIDEN(INSN, S, EXT, OP, SRC1WIDE)                         \
+ static void aarch64_a35_initfn(Object *obj)
-+    static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a)        \
+ {
-+    {                                                                   \
+     ARMCPU *cpu = ARM_CPU(obj);
-+        static NeonGenWidenFn * const widenfn[] = {                     \
+@@ -XXX,XX +XXX,XX @@ static void aarch64_neoverse_v1_initfn(Object *obj)
-+            gen_helper_neon_widen_##S##8,                               \
+      * The Neoverse-V1 r1p2 TRM lists 32-bit format CCSIDR_EL1 values,
-+            gen_helper_neon_widen_##S##16,                              \
+      * but also says it implements CCIDX, which means they should be
-+            tcg_gen_##EXT##_i32_i64,                                    \
+      * 64-bit format. So we here use values which are based on the textual
-+            NULL,                                                       \
+-     * information in chapter 2 of the TRM (and on the fact that
-+        };                                                              \
+-     * sets * associativity * linesize == cachesize).
-+        static NeonGenTwo64OpFn * const addfn[] = {                     \
+-     *
-+            gen_helper_neon_##OP##l_u16,                                \
+-     * The 64-bit CCSIDR_EL1 format is:
-+            gen_helper_neon_##OP##l_u32,                                \
+-     *   [55:32] number of sets - 1
-+            tcg_gen_##OP##_i64,                                         \
+-     *   [23:3]  associativity - 1
-+            NULL,                                                       \
+-     *   [2:0]   log2(linesize) - 4
-+        };                                                              \
+-     *           so 0 == 16 bytes, 1 == 32 bytes, 2 == 64 bytes, etc
-+        return do_prewiden_3d(s, a, widenfn[a->size],                   \
+-     *
-+                              addfn[a->size], SRC1WIDE);                \
+-     * L1: 4-way set associative 64-byte line size, total size 64K,
-+    }
+-     * so sets is 256.
-+
++     * information in chapter 2 of the TRM:
-+DO_PREWIDEN(VADDL_S, s, ext, add, false)
+      *
-+DO_PREWIDEN(VADDL_U, u, extu, add, false)
++     * L1: 4-way set associative 64-byte line size, total size 64K.
-+DO_PREWIDEN(VSUBL_S, s, ext, sub, false)
+      * L2: 8-way set associative, 64 byte line size, either 512K or 1MB.
-+DO_PREWIDEN(VSUBL_U, u, extu, sub, false)
+-     * We pick 1MB, so this has 2048 sets.
-+DO_PREWIDEN(VADDW_S, s, ext, add, true)
+-     *
-+DO_PREWIDEN(VADDW_U, u, extu, add, true)
+      * L3: No L3 (this matches the CLIDR_EL1 value).
-+DO_PREWIDEN(VSUBW_S, s, ext, sub, true)
+      */
-+DO_PREWIDEN(VSUBW_U, u, extu, sub, true)
+-    cpu->ccsidr[0] = 0x000000ff0000001aull; /* 64KB L1 dcache */
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+-    cpu->ccsidr[1] = 0x000000ff0000001aull; /* 64KB L1 icache */
-index XXXXXXX..XXXXXXX 100644
+-    cpu->ccsidr[2] = 0x000007ff0000003aull; /* 1MB L2 cache */
---- a/target/arm/translate.c
++    cpu->ccsidr[0] = make_ccsidr64(4, 64, 64 * KiB); /* L1 dcache */
-+++ b/target/arm/translate.c
++    cpu->ccsidr[1] = cpu->ccsidr[0];                 /* L1 icache */
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
++    cpu->ccsidr[2] = make_ccsidr64(8, 64, 1 * MiB);  /* L2 cache */
-                 /* Three registers of different lengths.  */
-                 int src1_wide;
+     /* From 3.2.115 SCTLR_EL3 */
-                 int src2_wide;
+     cpu->reset_sctlr = 0x30c50838;
 -                int prewiden;
                  /* undefreq: bit 0 : UNDEF if size == 0
                   *           bit 1 : UNDEF if size == 1
                   *           bit 2 : UNDEF if size == 2
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                  int undefreq;
                  /* prewiden, src1_wide, src2_wide, undefreq */
                  static const int neon_3reg_wide[16][4] = {
 -                    {1, 0, 0, 0}, /* VADDL */
 -                    {1, 1, 0, 0}, /* VADDW */
 -                    {1, 0, 0, 0}, /* VSUBL */
 -                    {1, 1, 0, 0}, /* VSUBW */
 +                    {0, 0, 0, 7}, /* VADDL: handled by decodetree */
 +                    {0, 0, 0, 7}, /* VADDW: handled by decodetree */
 +                    {0, 0, 0, 7}, /* VSUBL: handled by decodetree */
 +                    {0, 0, 0, 7}, /* VSUBW: handled by decodetree */
                      {0, 1, 1, 0}, /* VADDHN */
                      {0, 0, 0, 0}, /* VABAL */
                      {0, 1, 1, 0}, /* VSUBHN */
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                      {0, 0, 0, 7}, /* Reserved: always UNDEF */
                  };
 -                prewiden = neon_3reg_wide[op][0];
                  src1_wide = neon_3reg_wide[op][1];
                  src2_wide = neon_3reg_wide[op][2];
                  undefreq = neon_3reg_wide[op][3];
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                          } else {
                              tmp = neon_load_reg(rn, pass);
                          }
 -                        if (prewiden) {
 -                            gen_neon_widen(cpu_V0, tmp, size, u);
 -                        }
                      }
                      if (src2_wide) {
                          neon_load_reg64(cpu_V1, rm + pass);
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                          } else {
                              tmp2 = neon_load_reg(rm, pass);
                          }
 -                        if (prewiden) {
 -                            gen_neon_widen(cpu_V1, tmp2, size, u);
 -                        }
                      }
                      switch (op) {
                      case 0: case 1: case 4: /* VADDL, VADDW, VADDHN, VRADDHN */
 --
-.20.1
+.34.1
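
The CCSIDR_EL1 packing described in the comment above can be checked in isolation. The following is a minimal stand-alone sketch, not the code from the patch: the name ccsidr64_sketch is hypothetical, and log2(linesize) is computed with a plain loop rather than a QEMU helper. It assumes the 64-bit (CCIDX) layout given in the comment.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-alone packer for the 64-bit CCSIDR_EL1 format:
 *   [55:32] number of sets - 1
 *   [23:3]  associativity - 1
 *   [2:0]   log2(linesize) - 4
 */
static uint64_t ccsidr64_sketch(unsigned assoc, unsigned linesize,
                                unsigned cachesize)
{
    unsigned lg_linesize = 0;
    unsigned sets;

    assert(assoc != 0);
    /* linesize must be a non-zero power of two */
    assert(linesize != 0 && (linesize & (linesize - 1)) == 0);
    while ((1u << lg_linesize) < linesize) {
        lg_linesize++;
    }
    /* The 3-bit field holds log2(linesize) - 4, so 16..2048 bytes */
    assert(lg_linesize >= 4 && lg_linesize <= 7 + 4);

    /* sets * associativity * linesize == cachesize */
    assert(cachesize % (assoc * linesize) == 0);
    sets = cachesize / (assoc * linesize);

    return ((uint64_t)(sets - 1) << 32)
         | ((uint64_t)(assoc - 1) << 3)
         | (lg_linesize - 4);
}
```

For a 4-way, 64-byte-line, 64KB cache this gives 256 sets and the value 0x000000ff0000001a; for an 8-way, 64-byte-line, 1MB cache it gives 2048 sets and 0x000007ff0000003a. Both agree with the hard-coded constants that the patch removes.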

-[PULL 13/23] target/arm: Convert Neon 2-reg-scalar VQRDMLAH, VQRDMLSH to decodetree
+[PULL 06/24] target/arm: Apply access checks to neoverse-n1 special registers
-Convert the VQRDMLAH and VQRDMLSH insns in the 2-reg-scalar
+From: Richard Henderson <richard.henderson@linaro.org>
 group to decodetree.
+Access to many of the special registers is enabled or disabled
+by ACTLR_EL[23], which we implement as constant 0, which means
+that all writes outside EL3 should trap.
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Message-id: 20230811214031.171020-7-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/neon-dp.decode       |  3 ++
+ target/arm/cpregs.h    |  2 ++
- target/arm/translate-neon.inc.c | 74 +++++++++++++++++++++++++++++++++
+ target/arm/helper.c    |  4 ++--
- target/arm/translate.c          | 38 +----------------
+ target/arm/tcg/cpu64.c | 46 +++++++++++++++++++++++++++++++++---------
-files changed, 79 insertions(+), 36 deletions(-)
+files changed, 41 insertions(+), 11 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+diff --git a/target/arm/cpregs.h b/target/arm/cpregs.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
+--- a/target/arm/cpregs.h
-+++ b/target/arm/neon-dp.decode
++++ b/target/arm/cpregs.h
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+@@ -XXX,XX +XXX,XX @@ static inline void define_cortex_a72_a57_a53_cp_reginfo(ARMCPU *cpu) { }
+ void define_cortex_a72_a57_a53_cp_reginfo(ARMCPU *cpu);
-     VQDMULH_2sc  1111 001 . 1 . .. .... .... 1100 . 1 . 0 .... @2scalar
+ #endif
-     VQRDMULH_2sc 1111 001 . 1 . .. .... .... 1101 . 1 . 0 .... @2scalar
 +CPAccessResult access_tvm_trvm(CPUARMState *, const ARMCPRegInfo *, bool);
 +
-+    VQRDMLAH_2sc 1111 001 . 1 . .. .... .... 1110 . 1 . 0 .... @2scalar
+ #endif /* TARGET_ARM_CPREGS_H */
-+    VQRDMLSH_2sc 1111 001 . 1 . .. .... .... 1111 . 1 . 0 .... @2scalar
+diff --git a/target/arm/helper.c b/target/arm/helper.c
-   ]
+index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_tpm(CPUARMState *env, const ARMCPRegInfo *ri,
  }
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
  /* Check for traps from EL1 due to HCR_EL2.TVM and HCR_EL2.TRVM.  */
 -static CPAccessResult access_tvm_trvm(CPUARMState *env, const ARMCPRegInfo *ri,
 -                                      bool isread)
 +CPAccessResult access_tvm_trvm(CPUARMState *env, const ARMCPRegInfo *ri,
 +                               bool isread)
  {
      if (arm_current_el(env) == 1) {
          uint64_t trap = isread ? HCR_TRVM : HCR_TVM;
 diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
+--- a/target/arm/tcg/cpu64.c
-+++ b/target/arm/translate-neon.inc.c
++++ b/target/arm/tcg/cpu64.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VQRDMULH_2sc(DisasContext *s, arg_2scalar *a)
+@@ -XXX,XX +XXX,XX @@ static void aarch64_a64fx_initfn(Object *obj)
+     /* TODO:  Add A64FX specific HPC extension registers */
      return do_2scalar(s, a, opfn[a->size], NULL);
  }
++static CPAccessResult access_actlr_w(CPUARMState *env, const ARMCPRegInfo *r,
++                                     bool read)
++{
++    if (!read) {
++        int el = arm_current_el(env);
 +
-+static bool do_vqrdmlah_2sc(DisasContext *s, arg_2scalar *a,
++        /* Because ACTLR_EL2 is constant 0, writes below EL2 trap to EL2. */
-+                            NeonGenThreeOpEnvFn *opfn)
++        if (el < 2 && arm_is_el2_enabled(env)) {
-+{
++            return CP_ACCESS_TRAP_EL2;
-+    /*
++        }
-+     * VQRDMLAH/VQRDMLSH: this is like do_2scalar, but the opfn
++        /* Because ACTLR_EL3 is constant 0, writes below EL3 trap to EL3. */
-+     * performs a kind of fused op-then-accumulate using a helper
++        if (el < 3 && arm_feature(env, ARM_FEATURE_EL3)) {
-+     * function that takes all of rd, rn and the scalar at once.
++            return CP_ACCESS_TRAP_EL3;
-+     */
++        }
 +    TCGv_i32 scalar;
 +    int pass;
 +
 +    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
 +        return false;
 +    }
-+
++    return CP_ACCESS_OK;
 +    if (!dc_isar_feature(aa32_rdm, s)) {
 +        return false;
 +    }
 +
 +    /* UNDEF accesses to D16-D31 if they don't exist. */
 +    if (!dc_isar_feature(aa32_simd_r32, s) &&
 +        ((a->vd | a->vn | a->vm) & 0x10)) {
 +        return false;
 +    }
 +
 +    if (!opfn) {
 +        /* Bad size (including size == 3, which is a different insn group) */
 +        return false;
 +    }
 +
 +    if (a->q && ((a->vd | a->vn) & 1)) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    scalar = neon_get_scalar(a->size, a->vm);
 +
 +    for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 +        TCGv_i32 rn = neon_load_reg(a->vn, pass);
 +        TCGv_i32 rd = neon_load_reg(a->vd, pass);
 +        opfn(rd, cpu_env, rn, scalar, rd);
 +        tcg_temp_free_i32(rn);
 +        neon_store_reg(a->vd, pass, rd);
 +    }
 +    tcg_temp_free_i32(scalar);
 +
 +    return true;
 +}
 +
-+static bool trans_VQRDMLAH_2sc(DisasContext *s, arg_2scalar *a)
+ static const ARMCPRegInfo neoverse_n1_cp_reginfo[] = {
-+{
+     { .name = "ATCR_EL1", .state = ARM_CP_STATE_AA64,
-+    static NeonGenThreeOpEnvFn *opfn[] = {
+       .opc0 = 3, .opc1 = 0, .crn = 15, .crm = 7, .opc2 = 0,
-+        NULL,
+-      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
-+        gen_helper_neon_qrdmlah_s16,
++      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0,
-+        gen_helper_neon_qrdmlah_s32,
++      /* Traps and enables are the same as for TCR_EL1. */
-+        NULL,
++      .accessfn = access_tvm_trvm, .fgt = FGT_TCR_EL1, },
-+    };
+     { .name = "ATCR_EL2", .state = ARM_CP_STATE_AA64,
-+    return do_vqrdmlah_2sc(s, a, opfn[a->size]);
+       .opc0 = 3, .opc1 = 4, .crn = 15, .crm = 7, .opc2 = 0,
-+}
+       .access = PL2_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
-+
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo neoverse_n1_cp_reginfo[] = {
-+static bool trans_VQRDMLSH_2sc(DisasContext *s, arg_2scalar *a)
+       .access = PL2_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
-+{
+     { .name = "CPUACTLR_EL1", .state = ARM_CP_STATE_AA64,
-+    static NeonGenThreeOpEnvFn *opfn[] = {
+       .opc0 = 3, .opc1 = 0, .crn = 15, .crm = 1, .opc2 = 0,
-+        NULL,
+-      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
-+        gen_helper_neon_qrdmlsh_s16,
++      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0,
-+        gen_helper_neon_qrdmlsh_s32,
++      .accessfn = access_actlr_w },
-+        NULL,
+     { .name = "CPUACTLR2_EL1", .state = ARM_CP_STATE_AA64,
-+    };
+       .opc0 = 3, .opc1 = 0, .crn = 15, .crm = 1, .opc2 = 1,
-+    return do_vqrdmlah_2sc(s, a, opfn[a->size]);
+-      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
-+}
++      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0,
-diff --git a/target/arm/translate.c b/target/arm/translate.c
++      .accessfn = access_actlr_w },
-index XXXXXXX..XXXXXXX 100644
+     { .name = "CPUACTLR3_EL1", .state = ARM_CP_STATE_AA64,
---- a/target/arm/translate.c
+       .opc0 = 3, .opc1 = 0, .crn = 15, .crm = 1, .opc2 = 2,
-+++ b/target/arm/translate.c
+-      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
++      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0,
-                 case 9: /* Floating point VMUL scalar */
++      .accessfn = access_actlr_w },
-                 case 12: /* VQDMULH scalar */
+     /*
-                 case 13: /* VQRDMULH scalar */
+      * Report CPUCFR_EL1.SCU as 1, as we do not implement the DSU
-+                case 14: /* VQRDMLAH scalar */
+      * (and in particular its system registers).
-+                case 15: /* VQRDMLSH scalar */
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo neoverse_n1_cp_reginfo[] = {
-                     return 1; /* handled by decodetree */
+       .access = PL1_R, .type = ARM_CP_CONST, .resetvalue = 4 },
+     { .name = "CPUECTLR_EL1", .state = ARM_CP_STATE_AA64,
-                 case 3: /* VQDMLAL scalar */
+       .opc0 = 3, .opc1 = 0, .crn = 15, .crm = 1, .opc2 = 4,
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+-      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0x961563010 },
-                         neon_store_reg64(cpu_V0, rd + pass);
++      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0x961563010,
-                     }
++      .accessfn = access_actlr_w },
-                     break;
+     { .name = "CPUPCR_EL3", .state = ARM_CP_STATE_AA64,
--                case 14: /* VQRDMLAH scalar */
+       .opc0 = 3, .opc1 = 6, .crn = 15, .crm = 8, .opc2 = 1,
--                case 15: /* VQRDMLSH scalar */
+       .access = PL3_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
--                    {
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo neoverse_n1_cp_reginfo[] = {
--                        NeonGenThreeOpEnvFn *fn;
+       .access = PL3_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
--
+     { .name = "CPUPWRCTLR_EL1", .state = ARM_CP_STATE_AA64,
--                        if (!dc_isar_feature(aa32_rdm, s)) {
+       .opc0 = 3, .opc1 = 0, .crn = 15, .crm = 2, .opc2 = 7,
--                            return 1;
+-      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
--                        }
++      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0,
--                        if (u && ((rd | rn) & 1)) {
++      .accessfn = access_actlr_w },
--                            return 1;
+     { .name = "ERXPFGCDN_EL1", .state = ARM_CP_STATE_AA64,
--                        }
+       .opc0 = 3, .opc1 = 0, .crn = 15, .crm = 2, .opc2 = 2,
--                        if (op == 14) {
+-      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
--                            if (size == 1) {
++      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0,
--                                fn = gen_helper_neon_qrdmlah_s16;
++      .accessfn = access_actlr_w },
--                            } else {
+     { .name = "ERXPFGCTL_EL1", .state = ARM_CP_STATE_AA64,
--                                fn = gen_helper_neon_qrdmlah_s32;
+       .opc0 = 3, .opc1 = 0, .crn = 15, .crm = 2, .opc2 = 1,
--                            }
+-      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
--                        } else {
++      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0,
--                            if (size == 1) {
++      .accessfn = access_actlr_w },
--                                fn = gen_helper_neon_qrdmlsh_s16;
+     { .name = "ERXPFGF_EL1", .state = ARM_CP_STATE_AA64,
--                            } else {
+       .opc0 = 3, .opc1 = 0, .crn = 15, .crm = 2, .opc2 = 0,
--                                fn = gen_helper_neon_qrdmlsh_s32;
+-      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
--                            }
++      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0,
--                        }
++      .accessfn = access_actlr_w },
--
+ };
--                        tmp2 = neon_get_scalar(size, rm);
--                        for (pass = 0; pass < (u ? 4 : 2); pass++) {
+ static void define_neoverse_n1_cp_reginfo(ARMCPU *cpu)
 -                            tmp = neon_load_reg(rn, pass);
 -                            tmp3 = neon_load_reg(rd, pass);
 -                            fn(tmp, cpu_env, tmp, tmp2, tmp3);
 -                            tcg_temp_free_i32(tmp3);
 -                            neon_store_reg(rd, pass, tmp);
 -                        }
 -                        tcg_temp_free_i32(tmp2);
 -                    }
 -                    break;
                  default:
                      g_assert_not_reached();
                  }
 --
-.20.1
+.34.1
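
The decision made by access_actlr_w can be modelled without any QEMU types. This is only a sketch under stated assumptions: AccessSketch and actlr_write_check are hypothetical names, and the two boolean parameters stand in for arm_is_el2_enabled() and the ARM_FEATURE_EL3 check in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the CP_ACCESS_* results used in the patch */
typedef enum {
    ACCESS_OK,
    ACCESS_TRAP_EL2,
    ACCESS_TRAP_EL3,
} AccessSketch;

/*
 * Model of the write check: reads are always allowed; a write traps to
 * the lowest higher exception level that exists, because ACTLR_EL2 and
 * ACTLR_EL3 are treated as constant 0 and so never enable the access.
 */
static AccessSketch actlr_write_check(int el, bool is_read,
                                      bool el2_enabled, bool has_el3)
{
    assert(el >= 0 && el <= 3);

    if (!is_read) {
        if (el < 2 && el2_enabled) {
            return ACCESS_TRAP_EL2;
        }
        if (el < 3 && has_el3) {
            return ACCESS_TRAP_EL3;
        }
    }
    return ACCESS_OK;
}
```

With both EL2 and EL3 present, a write from EL1 traps to EL2, a write from EL2 traps to EL3, and a write from EL3 succeeds. On a configuration with neither EL2 nor EL3, writes from EL1 are allowed.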

-New patch
+[PULL 07/24] target/arm: Apply access checks to neoverse-v1 special registers
+From: Richard Henderson <richard.henderson@linaro.org>
+There is only one additional EL1 register modeled, which
+also needs to use access_actlr_w.
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Message-id: 20230811214031.171020-8-richard.henderson@linaro.org
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ target/arm/tcg/cpu64.c | 3 ++-
+file changed, 2 insertions(+), 1 deletion(-)
+diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/cpu64.c
++++ b/target/arm/tcg/cpu64.c
+@@ -XXX,XX +XXX,XX @@ static void define_neoverse_n1_cp_reginfo(ARMCPU *cpu)
+ static const ARMCPRegInfo neoverse_v1_cp_reginfo[] = {
+     { .name = "CPUECTLR2_EL1", .state = ARM_CP_STATE_AA64,
+       .opc0 = 3, .opc1 = 0, .crn = 15, .crm = 1, .opc2 = 5,
+-      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
++      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0,
++      .accessfn = access_actlr_w },
+     { .name = "CPUPPMCR_EL3", .state = ARM_CP_STATE_AA64,
+       .opc0 = 3, .opc1 = 6, .crn = 15, .crm = 2, .opc2 = 0,
+       .access = PL3_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
+--
+.34.1

-[PULL 20/23] target/arm/cpu: adjust virtual time for all KVM arm cpus
+[PULL 08/24] target/arm: Suppress FEAT_TRBE (Trace Buffer Extension)
-From: fangying <fangying1@huawei.com>
+From: Richard Henderson <richard.henderson@linaro.org>
-Virtual time adjustment was implemented for virt-5.0 machine type,
+Like FEAT_TRF (Self-hosted Trace Extension), suppress tracing
-but the cpu property was enabled only for host-passthrough and max
+external to the cpu, which is out of scope for QEMU.
 cpu model.  Let's add it for any KVM arm cpu which has the generic
 timer feature enabled.
-Signed-off-by: Ying Fang <fangying1@huawei.com>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Andrew Jones <drjones@redhat.com>
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Message-id: 20200608121243.2076-1-fangying1@huawei.com
+Message-id: 20230811214031.171020-10-richard.henderson@linaro.org
 [PMM: minor commit message tweak, removed inaccurate
  suggested-by tag]
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/cpu.c   |  6 ++++--
+ target/arm/cpu.c | 3 +++
- target/arm/cpu64.c |  1 -
+file changed, 3 insertions(+)
  target/arm/kvm.c   | 21 +++++++++++----------
 files changed, 15 insertions(+), 13 deletions(-)
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.c
 +++ b/target/arm/cpu.c
-@@ -XXX,XX +XXX,XX @@ void arm_cpu_post_init(Object *obj)
+@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
-     if (arm_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER)) {
+         /* FEAT_SPE (Statistical Profiling Extension) */
-         qdev_property_add_static(DEVICE(cpu), &arm_cpu_gt_cntfrq_property);
+         cpu->isar.id_aa64dfr0 =
-     }
+             FIELD_DP64(cpu->isar.id_aa64dfr0, ID_AA64DFR0, PMSVER, 0);
-+
++        /* FEAT_TRBE (Trace Buffer Extension) */
-+    if (kvm_enabled()) {
++        cpu->isar.id_aa64dfr0 =
-+        kvm_arm_add_vcpu_properties(obj);
++            FIELD_DP64(cpu->isar.id_aa64dfr0, ID_AA64DFR0, TRACEBUFFER, 0);
-+    }
+         /* FEAT_TRF (Self-hosted Trace Extension) */
- }
+         cpu->isar.id_aa64dfr0 =
+             FIELD_DP64(cpu->isar.id_aa64dfr0, ID_AA64DFR0, TRACEFILT, 0);
  static void arm_cpu_finalizefn(Object *obj)
@@ -XXX,XX +XXX,XX @@ static void arm_max_initfn(Object *obj)
      if (kvm_enabled()) {
          kvm_arm_set_cpu_features_from_host(cpu);
 -        kvm_arm_add_vcpu_properties(obj);
      } else {
          cortex_a15_initfn(obj);
@@ -XXX,XX +XXX,XX @@ static void arm_host_initfn(Object *obj)
      if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
          aarch64_add_sve_properties(obj);
      }
 -    kvm_arm_add_vcpu_properties(obj);
      arm_cpu_post_init(obj);
  }
 diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu64.c
 +++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
      if (kvm_enabled()) {
          kvm_arm_set_cpu_features_from_host(cpu);
 -        kvm_arm_add_vcpu_properties(obj);
      } else {
          uint64_t t;
          uint32_t u;
 diff --git a/target/arm/kvm.c b/target/arm/kvm.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/kvm.c
 +++ b/target/arm/kvm.c
@@ -XXX,XX +XXX,XX @@ static void kvm_no_adjvtime_set(Object *obj, bool value, Error **errp)
  /* KVM VCPU properties should be prefixed with "kvm-". */
  void kvm_arm_add_vcpu_properties(Object *obj)
  {
 -    if (!kvm_enabled()) {
 -        return;
 -    }
 +    ARMCPU *cpu = ARM_CPU(obj);
 +    CPUARMState *env = &cpu->env;
 -    ARM_CPU(obj)->kvm_adjvtime = true;
 -    object_property_add_bool(obj, "kvm-no-adjvtime", kvm_no_adjvtime_get,
 -                             kvm_no_adjvtime_set);
 -    object_property_set_description(obj, "kvm-no-adjvtime",
 -                                    "Set on to disable the adjustment of "
 -                                    "the virtual counter. VM stopped time "
 -                                    "will be counted.");
 +    if (arm_feature(env, ARM_FEATURE_GENERIC_TIMER)) {
 +        cpu->kvm_adjvtime = true;
 +        object_property_add_bool(obj, "kvm-no-adjvtime", kvm_no_adjvtime_get,
 +                                 kvm_no_adjvtime_set);
 +        object_property_set_description(obj, "kvm-no-adjvtime",
 +                                        "Set on to disable the adjustment of "
 +                                        "the virtual counter. VM stopped time "
 +                                        "will be counted.");
 +    }
  }
  bool kvm_arm_pmu_supported(CPUState *cpu)
 --
-.20.1
+.34.1

-[PULL 12/23] target/arm: Convert Neon 2-reg-scalar VQDMULH, VQRDMULH to decodetree
+[PULL 09/24] target/arm: Implement FEAT_HPDS2 as a no-op
-Convert the VQDMULH and VQRDMULH insns in the 2-reg-scalar group
+From: Richard Henderson <richard.henderson@linaro.org>
 to decodetree.
+This feature allows the operating system to set TCR_ELx.HWU*
+to allow the implementation to use the PBHA bits from the
+block and page descriptors for IMPLEMENTATION DEFINED
+purposes.  Since QEMU has no need to use these bits, we may
+simply ignore them.
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Message-id: 20230811214031.171020-11-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/neon-dp.decode       |  3 +++
+ docs/system/arm/emulation.rst | 1 +
- target/arm/translate-neon.inc.c | 29 +++++++++++++++++++++++
+ target/arm/tcg/cpu32.c        | 2 +-
- target/arm/translate.c          | 42 ++-------------------------------
+ target/arm/tcg/cpu64.c        | 2 +-
-files changed, 34 insertions(+), 40 deletions(-)
+files changed, 3 insertions(+), 2 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
+--- a/docs/system/arm/emulation.rst
-+++ b/target/arm/neon-dp.decode
++++ b/docs/system/arm/emulation.rst
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+@@ -XXX,XX +XXX,XX @@ the following architecture extensions:
+ - FEAT_HAFDBS (Hardware management of the access flag and dirty bit state)
-     VMUL_2sc     1111 001 . 1 . .. .... .... 1000 . 1 . 0 .... @2scalar
+ - FEAT_HCX (Support for the HCRX_EL2 register)
-     VMUL_F_2sc   1111 001 . 1 . .. .... .... 1001 . 1 . 0 .... @2scalar
+ - FEAT_HPDS (Hierarchical permission disables)
-+
++- FEAT_HPDS2 (Translation table page-based hardware attributes)
-+    VQDMULH_2sc  1111 001 . 1 . .. .... .... 1100 . 1 . 0 .... @2scalar
+ - FEAT_I8MM (AArch64 Int8 matrix multiplication instructions)
-+    VQRDMULH_2sc 1111 001 . 1 . .. .... .... 1101 . 1 . 0 .... @2scalar
+ - FEAT_IDST (ID space trap handling)
-   ]
+ - FEAT_IESB (Implicit error synchronization event)
- }
+diff --git a/target/arm/tcg/cpu32.c b/target/arm/tcg/cpu32.c
 diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
+--- a/target/arm/tcg/cpu32.c
-+++ b/target/arm/translate-neon.inc.c
++++ b/target/arm/tcg/cpu32.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VMLS_F_2sc(DisasContext *s, arg_2scalar *a)
+@@ -XXX,XX +XXX,XX @@ void aa32_max_features(ARMCPU *cpu)
+     cpu->isar.id_mmfr3 = t;
-     return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
- }
+     t = cpu->isar.id_mmfr4;
-+
+-    t = FIELD_DP32(t, ID_MMFR4, HPDS, 1);         /* FEAT_AA32HPD */
-+WRAP_ENV_FN(gen_VQDMULH_16, gen_helper_neon_qdmulh_s16)
++    t = FIELD_DP32(t, ID_MMFR4, HPDS, 2);         /* FEAT_HPDS2 */
-+WRAP_ENV_FN(gen_VQDMULH_32, gen_helper_neon_qdmulh_s32)
+     t = FIELD_DP32(t, ID_MMFR4, AC2, 1);          /* ACTLR2, HACTLR2 */
-+WRAP_ENV_FN(gen_VQRDMULH_16, gen_helper_neon_qrdmulh_s16)
+     t = FIELD_DP32(t, ID_MMFR4, CNP, 1);          /* FEAT_TTCNP */
-+WRAP_ENV_FN(gen_VQRDMULH_32, gen_helper_neon_qrdmulh_s32)
+     t = FIELD_DP32(t, ID_MMFR4, XNX, 1);          /* FEAT_XNX */
-+
+diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
 +static bool trans_VQDMULH_2sc(DisasContext *s, arg_2scalar *a)
 +{
 +    static NeonGenTwoOpFn * const opfn[] = {
 +        NULL,
 +        gen_VQDMULH_16,
 +        gen_VQDMULH_32,
 +        NULL,
 +    };
 +
 +    return do_2scalar(s, a, opfn[a->size], NULL);
 +}
 +
 +static bool trans_VQRDMULH_2sc(DisasContext *s, arg_2scalar *a)
 +{
 +    static NeonGenTwoOpFn * const opfn[] = {
 +        NULL,
 +        gen_VQRDMULH_16,
 +        gen_VQRDMULH_32,
 +        NULL,
 +    };
 +
 +    return do_2scalar(s, a, opfn[a->size], NULL);
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/tcg/cpu64.c
-+++ b/target/arm/translate.c
++++ b/target/arm/tcg/cpu64.c
-@@ -XXX,XX +XXX,XX @@ static void gen_exception_return(DisasContext *s, TCGv_i32 pc)
+@@ -XXX,XX +XXX,XX @@ void aarch64_max_tcg_initfn(Object *obj)
+     t = FIELD_DP64(t, ID_AA64MMFR1, HAFDBS, 2);   /* FEAT_HAFDBS */
- #define CPU_V001 cpu_V0, cpu_V0, cpu_V1
+     t = FIELD_DP64(t, ID_AA64MMFR1, VMIDBITS, 2); /* FEAT_VMID16 */
+     t = FIELD_DP64(t, ID_AA64MMFR1, VH, 1);       /* FEAT_VHE */
--static TCGv_i32 neon_load_scratch(int scratch)
+-    t = FIELD_DP64(t, ID_AA64MMFR1, HPDS, 1);     /* FEAT_HPDS */
--{
++    t = FIELD_DP64(t, ID_AA64MMFR1, HPDS, 2);     /* FEAT_HPDS2 */
--    TCGv_i32 tmp = tcg_temp_new_i32();
+     t = FIELD_DP64(t, ID_AA64MMFR1, LO, 1);       /* FEAT_LOR */
--    tcg_gen_ld_i32(tmp, cpu_env, offsetof(CPUARMState, vfp.scratch[scratch]));
+     t = FIELD_DP64(t, ID_AA64MMFR1, PAN, 3);      /* FEAT_PAN3 */
--    return tmp;
+     t = FIELD_DP64(t, ID_AA64MMFR1, XNX, 1);      /* FEAT_XNX */
 -}
 -
 -static void neon_store_scratch(int scratch, TCGv_i32 var)
 -{
 -    tcg_gen_st_i32(var, cpu_env, offsetof(CPUARMState, vfp.scratch[scratch]));
 -    tcg_temp_free_i32(var);
 -}
 -
  static int gen_neon_unzip(int rd, int rm, int size, int q)
  {
      TCGv_ptr pd, pm;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                  case 1: /* Float VMLA scalar */
                  case 5: /* Floating point VMLS scalar */
                  case 9: /* Floating point VMUL scalar */
 -                    return 1; /* handled by decodetree */
 -
                  case 12: /* VQDMULH scalar */
                  case 13: /* VQRDMULH scalar */
 -                    if (u && ((rd | rn) & 1)) {
 -                        return 1;
 -                    }
 -                    tmp = neon_get_scalar(size, rm);
 -                    neon_store_scratch(0, tmp);
 -                    for (pass = 0; pass < (u ? 4 : 2); pass++) {
 -                        tmp = neon_load_scratch(0);
 -                        tmp2 = neon_load_reg(rn, pass);
 -                        if (op == 12) {
 -                            if (size == 1) {
 -                                gen_helper_neon_qdmulh_s16(tmp, cpu_env, tmp, tmp2);
 -                            } else {
 -                                gen_helper_neon_qdmulh_s32(tmp, cpu_env, tmp, tmp2);
 -                            }
 -                        } else {
 -                            if (size == 1) {
 -                                gen_helper_neon_qrdmulh_s16(tmp, cpu_env, tmp, tmp2);
 -                            } else {
 -                                gen_helper_neon_qrdmulh_s32(tmp, cpu_env, tmp, tmp2);
 -                            }
 -                        }
 -                        tcg_temp_free_i32(tmp2);
 -                        neon_store_reg(rd, pass, tmp);
 -                    }
 -                    break;
 +                    return 1; /* handled by decodetree */
 +
                  case 3: /* VQDMLAL scalar */
                  case 7: /* VQDMLSL scalar */
                  case 11: /* VQDMULL scalar */
 --
-.20.1
+.34.1

-[PULL 11/23] target/arm: Convert Neon 2-reg-scalar float multiplies to decodetree
+[PULL 10/24] target/arm: properly document FEAT_CRC32
-Convert the float versions of VMLA, VMLS and VMUL in the Neon
+From: Alex Bennée <alex.bennee@linaro.org>
 -reg-scalar group to decodetree.
+This is a mandatory feature for Armv8.1 architectures but we don't
+state the feature clearly in our emulation list. Also include
+FEAT_CRC32 comment in aarch64_max_tcg_initfn for ease of grepping.
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
+Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
+Message-id: 20230824075406.1515566-1-alex.bennee@linaro.org
+Cc: qemu-stable@nongnu.org
+Message-Id: <20230222110104.3996971-1-alex.bennee@linaro.org>
+[PMM: pluralize 'instructions' in docs]
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
-As noted in the comment on the WRAP_FP_FN macro, we could have
+ docs/system/arm/emulation.rst | 1 +
-had a do_2scalar_fp() function, but for 3 insns it seemed
+ target/arm/tcg/cpu64.c        | 2 +-
-simpler to just do the wrapping to get hold of the fpstatus ptr.
+files changed, 2 insertions(+), 1 deletion(-)
 (These are the only fp insns in the group.)
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
  target/arm/neon-dp.decode       |  3 ++
  target/arm/translate-neon.inc.c | 65 +++++++++++++++++++++++++++++++++
  target/arm/translate.c          | 37 ++-----------------
 files changed, 71 insertions(+), 34 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
+--- a/docs/system/arm/emulation.rst
-+++ b/target/arm/neon-dp.decode
++++ b/docs/system/arm/emulation.rst
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+@@ -XXX,XX +XXX,XX @@ the following architecture extensions:
-                  &2scalar vm=%vm_dp vn=%vn_dp vd=%vd_dp
+ - FEAT_BBM at level 2 (Translation table break-before-make levels)
+ - FEAT_BF16 (AArch64 BFloat16 instructions)
-     VMLA_2sc     1111 001 . 1 . .. .... .... 0000 . 1 . 0 .... @2scalar
+ - FEAT_BTI (Branch Target Identification)
-+    VMLA_F_2sc   1111 001 . 1 . .. .... .... 0001 . 1 . 0 .... @2scalar
++- FEAT_CRC32 (CRC32 instructions)
+ - FEAT_CSV2 (Cache speculation variant 2)
-     VMLS_2sc     1111 001 . 1 . .. .... .... 0100 . 1 . 0 .... @2scalar
+ - FEAT_CSV2_1p1 (Cache speculation variant 2, version 1.1)
-+    VMLS_F_2sc   1111 001 . 1 . .. .... .... 0101 . 1 . 0 .... @2scalar
+ - FEAT_CSV2_1p2 (Cache speculation variant 2, version 1.2)
+diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
      VMUL_2sc     1111 001 . 1 . .. .... .... 1000 . 1 . 0 .... @2scalar
 +    VMUL_F_2sc   1111 001 . 1 . .. .... .... 1001 . 1 . 0 .... @2scalar
    ]
  }
 diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
+--- a/target/arm/tcg/cpu64.c
-+++ b/target/arm/translate-neon.inc.c
++++ b/target/arm/tcg/cpu64.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VMLS_2sc(DisasContext *s, arg_2scalar *a)
+@@ -XXX,XX +XXX,XX @@ void aarch64_max_tcg_initfn(Object *obj)
+     t = FIELD_DP64(t, ID_AA64ISAR0, AES, 2);      /* FEAT_PMULL */
-     return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
+     t = FIELD_DP64(t, ID_AA64ISAR0, SHA1, 1);     /* FEAT_SHA1 */
- }
+     t = FIELD_DP64(t, ID_AA64ISAR0, SHA2, 2);     /* FEAT_SHA512 */
-+
+-    t = FIELD_DP64(t, ID_AA64ISAR0, CRC32, 1);
-+/*
++    t = FIELD_DP64(t, ID_AA64ISAR0, CRC32, 1);    /* FEAT_CRC32 */
-+ * Rather than have a float-specific version of do_2scalar just for
+     t = FIELD_DP64(t, ID_AA64ISAR0, ATOMIC, 2);   /* FEAT_LSE */
-+ * three insns, we wrap a NeonGenTwoSingleOpFn to turn it into
+     t = FIELD_DP64(t, ID_AA64ISAR0, RDM, 1);      /* FEAT_RDM */
-+ * a NeonGenTwoOpFn.
+     t = FIELD_DP64(t, ID_AA64ISAR0, SHA3, 1);     /* FEAT_SHA3 */
 + */
 +#define WRAP_FP_FN(WRAPNAME, FUNC)                              \
 +    static void WRAPNAME(TCGv_i32 rd, TCGv_i32 rn, TCGv_i32 rm) \
 +    {                                                           \
 +        TCGv_ptr fpstatus = get_fpstatus_ptr(1);                \
 +        FUNC(rd, rn, rm, fpstatus);                             \
 +        tcg_temp_free_ptr(fpstatus);                            \
 +    }
 +
 +WRAP_FP_FN(gen_VMUL_F_mul, gen_helper_vfp_muls)
 +WRAP_FP_FN(gen_VMUL_F_add, gen_helper_vfp_adds)
 +WRAP_FP_FN(gen_VMUL_F_sub, gen_helper_vfp_subs)
 +
 +static bool trans_VMUL_F_2sc(DisasContext *s, arg_2scalar *a)
 +{
 +    static NeonGenTwoOpFn * const opfn[] = {
 +        NULL,
 +        NULL, /* TODO: fp16 support */
 +        gen_VMUL_F_mul,
 +        NULL,
 +    };
 +
 +    return do_2scalar(s, a, opfn[a->size], NULL);
 +}
 +
 +static bool trans_VMLA_F_2sc(DisasContext *s, arg_2scalar *a)
 +{
 +    static NeonGenTwoOpFn * const opfn[] = {
 +        NULL,
 +        NULL, /* TODO: fp16 support */
 +        gen_VMUL_F_mul,
 +        NULL,
 +    };
 +    static NeonGenTwoOpFn * const accfn[] = {
 +        NULL,
 +        NULL, /* TODO: fp16 support */
 +        gen_VMUL_F_add,
 +        NULL,
 +    };
 +
 +    return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
 +}
 +
 +static bool trans_VMLS_F_2sc(DisasContext *s, arg_2scalar *a)
 +{
 +    static NeonGenTwoOpFn * const opfn[] = {
 +        NULL,
 +        NULL, /* TODO: fp16 support */
 +        gen_VMUL_F_mul,
 +        NULL,
 +    };
 +    static NeonGenTwoOpFn * const accfn[] = {
 +        NULL,
 +        NULL, /* TODO: fp16 support */
 +        gen_VMUL_F_sub,
 +        NULL,
 +    };
 +
 +    return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                  case 0: /* Integer VMLA scalar */
                  case 4: /* Integer VMLS scalar */
                  case 8: /* Integer VMUL scalar */
 -                    return 1; /* handled by decodetree */
 -
                  case 1: /* Float VMLA scalar */
                  case 5: /* Floating point VMLS scalar */
                  case 9: /* Floating point VMUL scalar */
 -                    if (size == 1) {
 -                        return 1;
 -                    }
 -                    /* fall through */
 +                    return 1; /* handled by decodetree */
 +
                  case 12: /* VQDMULH scalar */
                  case 13: /* VQRDMULH scalar */
                      if (u && ((rd | rn) & 1)) {
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                              } else {
                                  gen_helper_neon_qdmulh_s32(tmp, cpu_env, tmp, tmp2);
                              }
 -                        } else if (op == 13) {
 +                        } else {
                              if (size == 1) {
                                  gen_helper_neon_qrdmulh_s16(tmp, cpu_env, tmp, tmp2);
                              } else {
                                  gen_helper_neon_qrdmulh_s32(tmp, cpu_env, tmp, tmp2);
                              }
 -                        } else {
 -                            TCGv_ptr fpstatus = get_fpstatus_ptr(1);
 -                            gen_helper_vfp_muls(tmp, tmp, tmp2, fpstatus);
 -                            tcg_temp_free_ptr(fpstatus);
                          }
                          tcg_temp_free_i32(tmp2);
 -                        if (op < 8) {
 -                            /* Accumulate.  */
 -                            tmp2 = neon_load_reg(rd, pass);
 -                            switch (op) {
 -                            case 1:
 -                            {
 -                                TCGv_ptr fpstatus = get_fpstatus_ptr(1);
 -                                gen_helper_vfp_adds(tmp, tmp, tmp2, fpstatus);
 -                                tcg_temp_free_ptr(fpstatus);
 -                                break;
 -                            }
 -                            case 5:
 -                            {
 -                                TCGv_ptr fpstatus = get_fpstatus_ptr(1);
 -                                gen_helper_vfp_subs(tmp, tmp2, tmp, fpstatus);
 -                                tcg_temp_free_ptr(fpstatus);
 -                                break;
 -                            }
 -                            default:
 -                                abort();
 -                            }
 -                            tcg_temp_free_i32(tmp2);
 -                        }
                          neon_store_reg(rd, pass, tmp);
                      }
                      break;
 --
-.20.1
+.34.1
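
Note on the WRAP_FP_FN hunk in the old 11/23 patch above: the macro adapts a
four-argument helper (one that needs an fp status pointer) to the
three-argument callback shape that do_2scalar() takes, so no float-specific
do_2scalar_fp() is needed. What follows is only a minimal sketch of that
adapter pattern in plain C. The types val_t and fp_status, the helpers
helper_mul/helper_add/helper_sub, and this simplified do_2scalar() are
hypothetical stand-ins, not QEMU APIs. The operand order of the accumulate
step (destination first, product second) is an assumption chosen to match the
"rd - product" order of the removed VMLS code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for TCG values and the fp status pointer. */
typedef float val_t;
typedef struct { int flags; } fp_status;

/* Three-operand callback shape, analogous to NeonGenTwoOpFn. */
typedef void TwoOpFn(val_t *rd, val_t rn, val_t rm);

static fp_status the_status;
static fp_status *get_status(void) { return &the_status; }

/* Four-argument helpers, analogous in shape to the vfp mul/add/sub helpers. */
static void helper_mul(val_t *rd, val_t rn, val_t rm, fp_status *st) { (void)st; *rd = rn * rm; }
static void helper_add(val_t *rd, val_t rn, val_t rm, fp_status *st) { (void)st; *rd = rn + rm; }
static void helper_sub(val_t *rd, val_t rn, val_t rm, fp_status *st) { (void)st; *rd = rn - rm; }

/* Wrap a 4-argument helper so it fits the 3-argument callback type. */
#define WRAP_FP_FN(WRAPNAME, FUNC)                          \
    static void WRAPNAME(val_t *rd, val_t rn, val_t rm)     \
    {                                                       \
        fp_status *st = get_status();                       \
        FUNC(rd, rn, rm, st);                               \
    }

WRAP_FP_FN(wrap_mul, helper_mul)
WRAP_FP_FN(wrap_add, helper_add)
WRAP_FP_FN(wrap_sub, helper_sub)

/* Tables indexed by element size; NULL marks an unsupported size. */
static TwoOpFn *const mul_tab[] = { NULL, NULL, wrap_mul, NULL };
static TwoOpFn *const add_tab[] = { NULL, NULL, wrap_add, NULL };
static TwoOpFn *const sub_tab[] = { NULL, NULL, wrap_sub, NULL };

/* Simplified driver: returns 0 (and leaves *rd alone) if op is NULL. */
static int do_2scalar(TwoOpFn *op, TwoOpFn *acc, val_t *rd, val_t rn, val_t scalar)
{
    val_t tmp;

    if (!op) {
        return 0;
    }
    op(&tmp, rn, scalar);
    if (acc) {
        acc(&tmp, *rd, tmp);    /* destination first, product second */
    }
    *rd = tmp;
    return 1;
}

int vmul_f(int size, val_t *rd, val_t rn, val_t s) { return do_2scalar(mul_tab[size], NULL, rd, rn, s); }
int vmla_f(int size, val_t *rd, val_t rn, val_t s) { return do_2scalar(mul_tab[size], add_tab[size], rd, rn, s); }
int vmls_f(int size, val_t *rd, val_t rn, val_t s) { return do_2scalar(mul_tab[size], sub_tab[size], rd, rn, s); }
```

In this sketch only size 2 has a table entry, mirroring the "TODO: fp16
support" NULL slots in the hunk; any other size is rejected without touching
the destination.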

-[PULL 19/23] Implement configurable descriptor size in ftgmac100
+[PULL 11/24] Remove i.MX7 IOMUX GPR device from i.MX6UL
-From: Erik Smit <erik.lucas.smit@gmail.com>
+From: Jean-Christophe Dubois <jcd@tribudubois.net>
-The hardware supports configurable descriptor sizes, configured in the DBLAC
+i.MX7 IOMUX GPR device is not equivalent to i.MX6UL IOMUXC GPR device.
-register.
+In particular, register 22 is not present on i.MX6UL and this is actually
 the only register that is really emulated in the i.MX7 IOMUX GPR device.
-Most drivers use the default 4 word descriptor, which is currently hardcoded,
+Note: The i.MX6UL code is actually also implementing the IOMUX GPR device
-but Aspeed SDK configures 8 words to store extra data.
+as an unimplemented device at the same bus address and the 2 instantiations
 were actually colliding. So we go back to the unimplemented device for now.
-Signed-off-by: Erik Smit <erik.lucas.smit@gmail.com>
+Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
-Reviewed-by: Cédric Le Goater <clg@kaod.org>
+Message-id: 48681bf51ee97646479bb261bee19abebbc8074e.1692964892.git.jcd@tribudubois.net
-[PMM: removed unnecessary parens]
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/net/ftgmac100.c | 26 ++++++++++++++++++++++++--
+ include/hw/arm/fsl-imx6ul.h |  2 --
-file changed, 24 insertions(+), 2 deletions(-)
+ hw/arm/fsl-imx6ul.c         | 11 -----------
 files changed, 13 deletions(-)
-diff --git a/hw/net/ftgmac100.c b/hw/net/ftgmac100.c
+diff --git a/include/hw/arm/fsl-imx6ul.h b/include/hw/arm/fsl-imx6ul.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/net/ftgmac100.c
+--- a/include/hw/arm/fsl-imx6ul.h
-+++ b/hw/net/ftgmac100.c
++++ b/include/hw/arm/fsl-imx6ul.h
 @@ -XXX,XX +XXX,XX @@
- #define FTGMAC100_APTC_TXPOLL_CNT(x)        (((x) >> 8) & 0xf)
+ #include "hw/misc/imx6ul_ccm.h"
- #define FTGMAC100_APTC_TXPOLL_TIME_SEL      (1 << 12)
+ #include "hw/misc/imx6_src.h"
+ #include "hw/misc/imx7_snvs.h"
-+/*
+-#include "hw/misc/imx7_gpr.h"
-+ * DMA burst length and arbitration control register
+ #include "hw/intc/imx_gpcv2.h"
-+ */
+ #include "hw/watchdog/wdt_imx2.h"
-+#define FTGMAC100_DBLAC_RXBURST_SIZE(x)     (((x) >> 8) & 0x3)
+ #include "hw/gpio/imx_gpio.h"
-+#define FTGMAC100_DBLAC_TXBURST_SIZE(x)     (((x) >> 10) & 0x3)
+@@ -XXX,XX +XXX,XX @@ struct FslIMX6ULState {
-+#define FTGMAC100_DBLAC_RXDES_SIZE(x)       ((((x) >> 12) & 0xf) * 8)
+     IMX6SRCState       src;
-+#define FTGMAC100_DBLAC_TXDES_SIZE(x)       ((((x) >> 16) & 0xf) * 8)
+     IMX7SNVSState      snvs;
-+#define FTGMAC100_DBLAC_IFG_CNT(x)          (((x) >> 20) & 0x7)
+     IMXGPCv2State      gpcv2;
-+#define FTGMAC100_DBLAC_IFG_INC             (1 << 23)
+-    IMX7GPRState       gpr;
-+
+     IMXSPIState        spi[FSL_IMX6UL_NUM_ECSPIS];
- /*
+     IMXI2CState        i2c[FSL_IMX6UL_NUM_I2CS];
-  * PHY control register
+     IMXSerialState     uart[FSL_IMX6UL_NUM_UARTS];
-  */
+diff --git a/hw/arm/fsl-imx6ul.c b/hw/arm/fsl-imx6ul.c
-@@ -XXX,XX +XXX,XX @@ static void ftgmac100_do_tx(FTGMAC100State *s, uint32_t tx_ring,
+index XXXXXXX..XXXXXXX 100644
-         if (bd.des0 & s->txdes0_edotr) {
+--- a/hw/arm/fsl-imx6ul.c
-             addr = tx_ring;
++++ b/hw/arm/fsl-imx6ul.c
-         } else {
+@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_init(Object *obj)
--            addr += sizeof(FTGMAC100Desc);
+      */
-+            addr += FTGMAC100_DBLAC_TXDES_SIZE(s->dblac);
+     object_initialize_child(obj, "snvs", &s->snvs, TYPE_IMX7_SNVS);
-         }
 -    /*
 -     * GPR
 -     */
 -    object_initialize_child(obj, "gpr", &s->gpr, TYPE_IMX7_GPR);
 -
      /*
       * GPIOs 1 to 5
       */
@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_realize(DeviceState *dev, Error **errp)
                                              FSL_IMX6UL_WDOGn_IRQ[i]));
      }
-@@ -XXX,XX +XXX,XX @@ static void ftgmac100_write(void *opaque, hwaddr addr,
+-    /*
-         s->phydata = value & 0xffff;
+-     * GPR
-         break;
+-     */
-     case FTGMAC100_DBLAC: /* DMA Burst Length and Arbitration Control */
+-    sysbus_realize(SYS_BUS_DEVICE(&s->gpr), &error_abort);
-+        if (FTGMAC100_DBLAC_TXDES_SIZE(s->dblac) < sizeof(FTGMAC100Desc)) {
+-    sysbus_mmio_map(SYS_BUS_DEVICE(&s->gpr), 0, FSL_IMX6UL_IOMUXC_GPR_ADDR);
-+            qemu_log_mask(LOG_GUEST_ERROR,
+-
-+                          "%s: transmit descriptor too small : %d bytes\n",
+     /*
-+                          __func__, FTGMAC100_DBLAC_TXDES_SIZE(s->dblac));
+      * SDMA
-+            break;
+      */
 +        }
 +        if (FTGMAC100_DBLAC_RXDES_SIZE(s->dblac) < sizeof(FTGMAC100Desc)) {
 +            qemu_log_mask(LOG_GUEST_ERROR,
 +                          "%s: receive descriptor too small : %d bytes\n",
 +                          __func__, FTGMAC100_DBLAC_RXDES_SIZE(s->dblac));
 +            break;
 +        }
          s->dblac = value;
          break;
      case FTGMAC100_REVR:  /* Feature Register */
@@ -XXX,XX +XXX,XX @@ static ssize_t ftgmac100_receive(NetClientState *nc, const uint8_t *buf,
          if (bd.des0 & s->rxdes0_edorr) {
              addr = s->rx_ring;
          } else {
 -            addr += sizeof(FTGMAC100Desc);
 +            addr += FTGMAC100_DBLAC_RXDES_SIZE(s->dblac);
          }
      }
      s->rx_descriptor = addr;
 --
-.20.1
+.34.1
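
Note on the ftgmac100 hunks in the old 19/23 patch above: the DBLAC register
carries the descriptor sizes in 4-bit fields counted in units of 8 bytes, and
the ring walk advances by that decoded size instead of by
sizeof(FTGMAC100Desc). A minimal sketch of the decode and the ring step
follows. The two macros copy the bit layout quoted in the hunk; the Desc
struct (four 32-bit words, taken from the "4 word descriptor" wording), the
function names and the sample values are assumptions for illustration. The
sketch validates the incoming value, whereas the quoted hunk tests s->dblac
before assigning it; that difference is a choice made here, not something the
patch states.

```c
#include <assert.h>
#include <stdint.h>

/* Bit layout as quoted in the hunk: 4-bit fields, in units of 8 bytes. */
#define DBLAC_RXDES_SIZE(x)  ((((x) >> 12) & 0xf) * 8)
#define DBLAC_TXDES_SIZE(x)  ((((x) >> 16) & 0xf) * 8)

/* Assumed minimum descriptor: four 32-bit words (16 bytes). */
typedef struct {
    uint32_t des0, des1, des2, des3;
} Desc;

/* Return 1 if both descriptor sizes encoded in 'value' can hold a Desc. */
int dblac_value_ok(uint32_t value)
{
    return DBLAC_TXDES_SIZE(value) >= sizeof(Desc) &&
           DBLAC_RXDES_SIZE(value) >= sizeof(Desc);
}

/*
 * Advance to the next descriptor: wrap to the ring base when the
 * end-of-ring flag is set, otherwise step by the configured size.
 */
uint32_t next_desc(uint32_t addr, uint32_t ring_base, int end_of_ring,
                   uint32_t desc_size)
{
    return end_of_ring ? ring_base : addr + desc_size;
}
```

With a field value of 2 the decoded size is 16 bytes (the 4-word default);
a field value of 4 gives the 32-byte, 8-word layout mentioned in the commit
message.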

-[PULL 10/23] target/arm: Convert Neon 2-reg-scalar integer multiplies to decodetree
+[PULL 12/24] Refactor i.MX6UL processor code
-Convert the VMLA, VMLS and VMUL insns in the Neon "2 registers and a
+From: Jean-Christophe Dubois <jcd@tribudubois.net>
 scalar" group to decodetree.  These are 32x32->32 operations where
 one of the inputs is the scalar, followed by a possible accumulate
 operation of the 32-bit result.
-The refactoring removes some of the oddities of the old decoder:
+* Add address and size definitions for most i.MX6UL devices in i.MX6UL header file.
- * operands to the operation and accumulation were often
+* Use those newly defined named constants whenever possible.
-   reversed (taking advantage of the fact that most of these ops
+* Standardize the way we init a family of unimplemented devices
-   are commutative); the new code follows the pseudocode order
+  - SAI
- * the Q bit in the insn was in a local variable 'u'; in the
+  - PWM
-   new code it is decoded into a->q
+  - CAN
 * Add/rework a few comments
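
Note on the size constants in the header hunk of this patch: most of the
rewritten *_SIZE values are the old literals respelled with KiB/MiB/GiB, but
two quoted values do change (APBH_DMA from 32 * 1024 to 4 * KiB, and the
a7mpcore-dap region from 0x100000 to 4 * KiB). A minimal sketch that checks
this arithmetic follows. The KiB/MiB/GiB macros are local stand-ins defined
here (the patch gets its own from "qemu/units.h"), and the two function names
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the unit constants used in the header hunk. */
#define KiB (INT64_C(1) << 10)
#define MiB (INT64_C(1) << 20)
#define GiB (INT64_C(1) << 30)

/* Return 1 if every respelled size equals the literal it replaces. */
int respelled_sizes_match(void)
{
    return 2 * 1024 * 1024 * 1024UL == (uint64_t)(2 * GiB)  /* MMDC */
        && 0x00060000 == 384 * KiB                          /* OCRAM alias */
        && 0x00020000 == 128 * KiB                          /* OCRAM mem */
        && 0x00008000 == 32 * KiB                           /* CAAM mem */
        && 0x00018000 == 96 * KiB;                          /* ROM */
}

/* Return 1 if the two sizes called out above really differ in value. */
int changed_sizes_differ(void)
{
    return 32 * 1024 != 4 * KiB     /* APBH DMA: 32 KiB before, 4 KiB after */
        && 0x100000 != 4 * KiB;     /* a7mpcore-dap: 1 MiB before, 4 KiB after */
}
```

Both functions return 1, which is a quick way to separate the purely cosmetic
respellings from the entries where the mapped region size is different after
the patch.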
+Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
+Message-id: d579043fbd4e4b490370783fda43fc02c8e9be75.1692964892.git.jcd@tribudubois.net
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/neon-dp.decode       |  15 ++++
+ include/hw/arm/fsl-imx6ul.h | 156 +++++++++++++++++++++++++++++++-----
- target/arm/translate-neon.inc.c | 133 ++++++++++++++++++++++++++++++++
+ hw/arm/fsl-imx6ul.c         | 147 ++++++++++++++++++++++-----------
- target/arm/translate.c          |  77 ++----------------
+files changed, 232 insertions(+), 71 deletions(-)
 files changed, 154 insertions(+), 71 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+diff --git a/include/hw/arm/fsl-imx6ul.h b/include/hw/arm/fsl-imx6ul.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
+--- a/include/hw/arm/fsl-imx6ul.h
-+++ b/target/arm/neon-dp.decode
++++ b/include/hw/arm/fsl-imx6ul.h
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+@@ -XXX,XX +XXX,XX @@
-     VQDMULL_3d   1111 001 0 1 . .. .... .... 1101 . 0 . 0 .... @3diff
+ #include "exec/memory.h"
+ #include "cpu.h"
-     VMULL_P_3d   1111 001 0 1 . .. .... .... 1110 . 0 . 0 .... @3diff
+ #include "qom/object.h"
-+
++#include "qemu/units.h"
-+    ##################################################################
-+    # 2-regs-plus-scalar grouping:
+ #define TYPE_FSL_IMX6UL "fsl-imx6ul"
-+    # 1111 001 Q 1 D sz!=11 Vn:4 Vd:4 opc:4 N 1 M 0 Vm:4
+ OBJECT_DECLARE_SIMPLE_TYPE(FslIMX6ULState, FSL_IMX6UL)
-+    ##################################################################
+@@ -XXX,XX +XXX,XX @@ enum FslIMX6ULConfiguration {
-+    &2scalar vm vn vd size q
+     FSL_IMX6UL_NUM_ADCS         = 2,
-+
+     FSL_IMX6UL_NUM_USB_PHYS     = 2,
-+    @2scalar     .... ... q:1 . . size:2 .... .... .... . . . . .... \
+     FSL_IMX6UL_NUM_USBS         = 2,
-+                 &2scalar vm=%vm_dp vn=%vn_dp vd=%vd_dp
++    FSL_IMX6UL_NUM_SAIS         = 3,
-+
++    FSL_IMX6UL_NUM_CANS         = 2,
-+    VMLA_2sc     1111 001 . 1 . .. .... .... 0000 . 1 . 0 .... @2scalar
++    FSL_IMX6UL_NUM_PWMS         = 4,
-+
+ };
-+    VMLS_2sc     1111 001 . 1 . .. .... .... 0100 . 1 . 0 .... @2scalar
-+
+ struct FslIMX6ULState {
-+    VMUL_2sc     1111 001 . 1 . .. .... .... 1000 . 1 . 0 .... @2scalar
+@@ -XXX,XX +XXX,XX @@ struct FslIMX6ULState {
-   ]
- }
+ enum FslIMX6ULMemoryMap {
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
+     FSL_IMX6UL_MMDC_ADDR            = 0x80000000,
 -    FSL_IMX6UL_MMDC_SIZE            = 2 * 1024 * 1024 * 1024UL,
 +    FSL_IMX6UL_MMDC_SIZE            = (2 * GiB),
      FSL_IMX6UL_QSPI1_MEM_ADDR       = 0x60000000,
 -    FSL_IMX6UL_EIM_ALIAS_ADDR       = 0x58000000,
 -    FSL_IMX6UL_EIM_CS_ADDR          = 0x50000000,
 -    FSL_IMX6UL_AES_ENCRYPT_ADDR     = 0x10000000,
 -    FSL_IMX6UL_QSPI1_RX_ADDR        = 0x0C000000,
 +    FSL_IMX6UL_QSPI1_MEM_SIZE       = (256 * MiB),
 -    /* AIPS-2 */
 +    FSL_IMX6UL_EIM_ALIAS_ADDR       = 0x58000000,
 +    FSL_IMX6UL_EIM_ALIAS_SIZE       = (128 * MiB),
 +
 +    FSL_IMX6UL_EIM_CS_ADDR          = 0x50000000,
 +    FSL_IMX6UL_EIM_CS_SIZE          = (128 * MiB),
 +
 +    FSL_IMX6UL_AES_ENCRYPT_ADDR     = 0x10000000,
 +    FSL_IMX6UL_AES_ENCRYPT_SIZE     = (1 * MiB),
 +
 +    FSL_IMX6UL_QSPI1_RX_ADDR        = 0x0C000000,
 +    FSL_IMX6UL_QSPI1_RX_SIZE        = (32 * MiB),
 +
 +    /* AIPS-2 Begin */
      FSL_IMX6UL_UART6_ADDR           = 0x021FC000,
 +
      FSL_IMX6UL_I2C4_ADDR            = 0x021F8000,
 +
      FSL_IMX6UL_UART5_ADDR           = 0x021F4000,
      FSL_IMX6UL_UART4_ADDR           = 0x021F0000,
      FSL_IMX6UL_UART3_ADDR           = 0x021EC000,
      FSL_IMX6UL_UART2_ADDR           = 0x021E8000,
 +
      FSL_IMX6UL_WDOG3_ADDR           = 0x021E4000,
 +
      FSL_IMX6UL_QSPI_ADDR            = 0x021E0000,
 +    FSL_IMX6UL_QSPI_SIZE            = 0x500,
 +
      FSL_IMX6UL_SYS_CNT_CTRL_ADDR    = 0x021DC000,
 +    FSL_IMX6UL_SYS_CNT_CTRL_SIZE    = (16 * KiB),
 +
      FSL_IMX6UL_SYS_CNT_CMP_ADDR     = 0x021D8000,
 +    FSL_IMX6UL_SYS_CNT_CMP_SIZE     = (16 * KiB),
 +
      FSL_IMX6UL_SYS_CNT_RD_ADDR      = 0x021D4000,
 +    FSL_IMX6UL_SYS_CNT_RD_SIZE      = (16 * KiB),
 +
      FSL_IMX6UL_TZASC_ADDR           = 0x021D0000,
 +    FSL_IMX6UL_TZASC_SIZE           = (16 * KiB),
 +
      FSL_IMX6UL_PXP_ADDR             = 0x021CC000,
 +    FSL_IMX6UL_PXP_SIZE             = (16 * KiB),
 +
      FSL_IMX6UL_LCDIF_ADDR           = 0x021C8000,
 +    FSL_IMX6UL_LCDIF_SIZE           = 0x100,
 +
      FSL_IMX6UL_CSI_ADDR             = 0x021C4000,
 +    FSL_IMX6UL_CSI_SIZE             = 0x100,
 +
      FSL_IMX6UL_CSU_ADDR             = 0x021C0000,
 +    FSL_IMX6UL_CSU_SIZE             = (16 * KiB),
 +
      FSL_IMX6UL_OCOTP_CTRL_ADDR      = 0x021BC000,
 +    FSL_IMX6UL_OCOTP_CTRL_SIZE      = (4 * KiB),
 +
      FSL_IMX6UL_EIM_ADDR             = 0x021B8000,
 +    FSL_IMX6UL_EIM_SIZE             = 0x100,
 +
      FSL_IMX6UL_SIM2_ADDR            = 0x021B4000,
 +
      FSL_IMX6UL_MMDC_CFG_ADDR        = 0x021B0000,
 +    FSL_IMX6UL_MMDC_CFG_SIZE        = (4 * KiB),
 +
      FSL_IMX6UL_ROMCP_ADDR           = 0x021AC000,
 +    FSL_IMX6UL_ROMCP_SIZE           = 0x300,
 +
      FSL_IMX6UL_I2C3_ADDR            = 0x021A8000,
      FSL_IMX6UL_I2C2_ADDR            = 0x021A4000,
      FSL_IMX6UL_I2C1_ADDR            = 0x021A0000,
 +
      FSL_IMX6UL_ADC2_ADDR            = 0x0219C000,
      FSL_IMX6UL_ADC1_ADDR            = 0x02198000,
 +    FSL_IMX6UL_ADCn_SIZE            = 0x100,
 +
      FSL_IMX6UL_USDHC2_ADDR          = 0x02194000,
      FSL_IMX6UL_USDHC1_ADDR          = 0x02190000,
 -    FSL_IMX6UL_SIM1_ADDR            = 0x0218C000,
 -    FSL_IMX6UL_ENET1_ADDR           = 0x02188000,
 -    FSL_IMX6UL_USBO2_USBMISC_ADDR   = 0x02184800,
 -    FSL_IMX6UL_USBO2_USB_ADDR       = 0x02184000,
 -    FSL_IMX6UL_USBO2_PL301_ADDR     = 0x02180000,
 -    FSL_IMX6UL_AIPS2_CFG_ADDR       = 0x0217C000,
 -    FSL_IMX6UL_CAAM_ADDR            = 0x02140000,
 -    FSL_IMX6UL_A7MPCORE_DAP_ADDR    = 0x02100000,
 -    /* AIPS-1 */
 +    FSL_IMX6UL_SIM1_ADDR            = 0x0218C000,
 +    FSL_IMX6UL_SIMn_SIZE            = (16 * KiB),
 +
 +    FSL_IMX6UL_ENET1_ADDR           = 0x02188000,
 +
 +    FSL_IMX6UL_USBO2_USBMISC_ADDR   = 0x02184800,
 +    FSL_IMX6UL_USBO2_USB1_ADDR      = 0x02184000,
 +    FSL_IMX6UL_USBO2_USB2_ADDR      = 0x02184200,
 +
 +    FSL_IMX6UL_USBO2_PL301_ADDR     = 0x02180000,
 +    FSL_IMX6UL_USBO2_PL301_SIZE     = (16 * KiB),
 +
 +    FSL_IMX6UL_AIPS2_CFG_ADDR       = 0x0217C000,
 +    FSL_IMX6UL_AIPS2_CFG_SIZE       = 0x100,
 +
 +    FSL_IMX6UL_CAAM_ADDR            = 0x02140000,
 +    FSL_IMX6UL_CAAM_SIZE            = (16 * KiB),
 +
 +    FSL_IMX6UL_A7MPCORE_DAP_ADDR    = 0x02100000,
 +    FSL_IMX6UL_A7MPCORE_DAP_SIZE    = (4 * KiB),
 +    /* AIPS-2 End */
 +
 +    /* AIPS-1 Begin */
      FSL_IMX6UL_PWM8_ADDR            = 0x020FC000,
      FSL_IMX6UL_PWM7_ADDR            = 0x020F8000,
      FSL_IMX6UL_PWM6_ADDR            = 0x020F4000,
      FSL_IMX6UL_PWM5_ADDR            = 0x020F0000,
 +
      FSL_IMX6UL_SDMA_ADDR            = 0x020EC000,
 +    FSL_IMX6UL_SDMA_SIZE            = 0x300,
 +
      FSL_IMX6UL_GPT2_ADDR            = 0x020E8000,
 +
      FSL_IMX6UL_IOMUXC_GPR_ADDR      = 0x020E4000,
 +    FSL_IMX6UL_IOMUXC_GPR_SIZE      = 0x40,
 +
      FSL_IMX6UL_IOMUXC_ADDR          = 0x020E0000,
 +    FSL_IMX6UL_IOMUXC_SIZE          = 0x700,
 +
      FSL_IMX6UL_GPC_ADDR             = 0x020DC000,
 +
      FSL_IMX6UL_SRC_ADDR             = 0x020D8000,
 +
      FSL_IMX6UL_EPIT2_ADDR           = 0x020D4000,
      FSL_IMX6UL_EPIT1_ADDR           = 0x020D0000,
 +
      FSL_IMX6UL_SNVS_HP_ADDR         = 0x020CC000,
 +
      FSL_IMX6UL_USBPHY2_ADDR         = 0x020CA000,
 -    FSL_IMX6UL_USBPHY2_SIZE         = (4 * 1024),
      FSL_IMX6UL_USBPHY1_ADDR         = 0x020C9000,
 -    FSL_IMX6UL_USBPHY1_SIZE         = (4 * 1024),
 +
      FSL_IMX6UL_ANALOG_ADDR          = 0x020C8000,
 +    FSL_IMX6UL_ANALOG_SIZE          = 0x300,
 +
      FSL_IMX6UL_CCM_ADDR             = 0x020C4000,
 +
      FSL_IMX6UL_WDOG2_ADDR           = 0x020C0000,
      FSL_IMX6UL_WDOG1_ADDR           = 0x020BC000,
 +
      FSL_IMX6UL_KPP_ADDR             = 0x020B8000,
 +    FSL_IMX6UL_KPP_SIZE             = 0x10,
 +
      FSL_IMX6UL_ENET2_ADDR           = 0x020B4000,
 +
      FSL_IMX6UL_SNVS_LP_ADDR         = 0x020B0000,
 +    FSL_IMX6UL_SNVS_LP_SIZE         = (16 * KiB),
 +
      FSL_IMX6UL_GPIO5_ADDR           = 0x020AC000,
      FSL_IMX6UL_GPIO4_ADDR           = 0x020A8000,
      FSL_IMX6UL_GPIO3_ADDR           = 0x020A4000,
      FSL_IMX6UL_GPIO2_ADDR           = 0x020A0000,
      FSL_IMX6UL_GPIO1_ADDR           = 0x0209C000,
 +
      FSL_IMX6UL_GPT1_ADDR            = 0x02098000,
 +
      FSL_IMX6UL_CAN2_ADDR            = 0x02094000,
      FSL_IMX6UL_CAN1_ADDR            = 0x02090000,
 +    FSL_IMX6UL_CANn_SIZE            = (4 * KiB),
 +
      FSL_IMX6UL_PWM4_ADDR            = 0x0208C000,
      FSL_IMX6UL_PWM3_ADDR            = 0x02088000,
      FSL_IMX6UL_PWM2_ADDR            = 0x02084000,
      FSL_IMX6UL_PWM1_ADDR            = 0x02080000,
 +    FSL_IMX6UL_PWMn_SIZE            = 0x20,
 +
      FSL_IMX6UL_AIPS1_CFG_ADDR       = 0x0207C000,
 +    FSL_IMX6UL_AIPS1_CFG_SIZE       = (16 * KiB),
 +
      FSL_IMX6UL_BEE_ADDR             = 0x02044000,
 +    FSL_IMX6UL_BEE_SIZE             = (16 * KiB),
 +
      FSL_IMX6UL_TOUCH_CTRL_ADDR      = 0x02040000,
 +    FSL_IMX6UL_TOUCH_CTRL_SIZE      = 0x100,
 +
      FSL_IMX6UL_SPBA_ADDR            = 0x0203C000,
 +    FSL_IMX6UL_SPBA_SIZE            = 0x100,
 +
      FSL_IMX6UL_ASRC_ADDR            = 0x02034000,
 +    FSL_IMX6UL_ASRC_SIZE            = 0x100,
 +
      FSL_IMX6UL_SAI3_ADDR            = 0x02030000,
      FSL_IMX6UL_SAI2_ADDR            = 0x0202C000,
      FSL_IMX6UL_SAI1_ADDR            = 0x02028000,
 +    FSL_IMX6UL_SAIn_SIZE            = 0x200,
 +
      FSL_IMX6UL_UART8_ADDR           = 0x02024000,
      FSL_IMX6UL_UART1_ADDR           = 0x02020000,
      FSL_IMX6UL_UART7_ADDR           = 0x02018000,
 +
      FSL_IMX6UL_ECSPI4_ADDR          = 0x02014000,
      FSL_IMX6UL_ECSPI3_ADDR          = 0x02010000,
      FSL_IMX6UL_ECSPI2_ADDR          = 0x0200C000,
      FSL_IMX6UL_ECSPI1_ADDR          = 0x02008000,
 +
      FSL_IMX6UL_SPDIF_ADDR           = 0x02004000,
 +    FSL_IMX6UL_SPDIF_SIZE           = 0x100,
 +    /* AIPS-1 End */
 +
 +    FSL_IMX6UL_BCH_ADDR             = 0x01808000,
 +    FSL_IMX6UL_BCH_SIZE             = 0x200,
 +
 +    FSL_IMX6UL_GPMI_ADDR            = 0x01806000,
 +    FSL_IMX6UL_GPMI_SIZE            = 0x200,
      FSL_IMX6UL_APBH_DMA_ADDR        = 0x01804000,
 -    FSL_IMX6UL_APBH_DMA_SIZE        = (32 * 1024),
 +    FSL_IMX6UL_APBH_DMA_SIZE        = (4 * KiB),
      FSL_IMX6UL_A7MPCORE_ADDR        = 0x00A00000,
      FSL_IMX6UL_OCRAM_ALIAS_ADDR     = 0x00920000,
 -    FSL_IMX6UL_OCRAM_ALIAS_SIZE     = 0x00060000,
 +    FSL_IMX6UL_OCRAM_ALIAS_SIZE     = (384 * KiB),
 +
      FSL_IMX6UL_OCRAM_MEM_ADDR       = 0x00900000,
 -    FSL_IMX6UL_OCRAM_MEM_SIZE       = 0x00020000,
 +    FSL_IMX6UL_OCRAM_MEM_SIZE       = (128 * KiB),
 +
      FSL_IMX6UL_CAAM_MEM_ADDR        = 0x00100000,
 -    FSL_IMX6UL_CAAM_MEM_SIZE        = 0x00008000,
 +    FSL_IMX6UL_CAAM_MEM_SIZE        = (32 * KiB),
 +
      FSL_IMX6UL_ROM_ADDR             = 0x00000000,
 -    FSL_IMX6UL_ROM_SIZE             = 0x00018000,
 +    FSL_IMX6UL_ROM_SIZE             = (96 * KiB),
  };
  enum FslIMX6ULIRQs {
 diff --git a/hw/arm/fsl-imx6ul.c b/hw/arm/fsl-imx6ul.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
+--- a/hw/arm/fsl-imx6ul.c
-+++ b/target/arm/translate-neon.inc.c
++++ b/hw/arm/fsl-imx6ul.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VMULL_P_3d(DisasContext *s, arg_3diff *a)
+@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_init(Object *obj)
-, 16, 0, fn_gvec);
+     object_initialize_child(obj, "snvs", &s->snvs, TYPE_IMX7_SNVS);
-     return true;
- }
+     /*
-+
+-     * GPIOs 1 to 5
-+static void gen_neon_dup_low16(TCGv_i32 var)
++     * GPIOs
-+{
+      */
-+    TCGv_i32 tmp = tcg_temp_new_i32();
+     for (i = 0; i < FSL_IMX6UL_NUM_GPIOS; i++) {
-+    tcg_gen_ext16u_i32(var, var);
+         snprintf(name, NAME_SIZE, "gpio%d", i);
-+    tcg_gen_shli_i32(tmp, var, 16);
+@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_init(Object *obj)
-+    tcg_gen_or_i32(var, var, tmp);
+     }
-+    tcg_temp_free_i32(tmp);
-+}
+     /*
-+
+-     * GPT 1, 2
-+static void gen_neon_dup_high16(TCGv_i32 var)
++     * GPTs
-+{
+      */
-+    TCGv_i32 tmp = tcg_temp_new_i32();
+     for (i = 0; i < FSL_IMX6UL_NUM_GPTS; i++) {
-+    tcg_gen_andi_i32(var, var, 0xffff0000);
+         snprintf(name, NAME_SIZE, "gpt%d", i);
-+    tcg_gen_shri_i32(tmp, var, 16);
+@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_init(Object *obj)
-+    tcg_gen_or_i32(var, var, tmp);
+     }
-+    tcg_temp_free_i32(tmp);
-+}
+     /*
-+
+-     * EPIT 1, 2
-+static inline TCGv_i32 neon_get_scalar(int size, int reg)
++     * EPITs
-+{
+      */
-+    TCGv_i32 tmp;
+     for (i = 0; i < FSL_IMX6UL_NUM_EPITS; i++) {
-+    if (size == 1) {
+         snprintf(name, NAME_SIZE, "epit%d", i + 1);
-+        tmp = neon_load_reg(reg & 7, reg >> 4);
+@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_init(Object *obj)
-+        if (reg & 8) {
+     }
-+            gen_neon_dup_high16(tmp);
-+        } else {
+     /*
-+            gen_neon_dup_low16(tmp);
+-     * eCSPI
-+        }
++     * eCSPIs
-+    } else {
+      */
-+        tmp = neon_load_reg(reg & 15, reg >> 4);
+     for (i = 0; i < FSL_IMX6UL_NUM_ECSPIS; i++) {
          snprintf(name, NAME_SIZE, "spi%d", i + 1);
@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_init(Object *obj)
      }
      /*
 -     * I2C
 +     * I2Cs
       */
      for (i = 0; i < FSL_IMX6UL_NUM_I2CS; i++) {
          snprintf(name, NAME_SIZE, "i2c%d", i + 1);
@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_init(Object *obj)
      }
      /*
 -     * UART
 +     * UARTs
       */
      for (i = 0; i < FSL_IMX6UL_NUM_UARTS; i++) {
          snprintf(name, NAME_SIZE, "uart%d", i);
@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_init(Object *obj)
      }
      /*
 -     * Ethernet
 +     * Ethernets
       */
      for (i = 0; i < FSL_IMX6UL_NUM_ETHS; i++) {
          snprintf(name, NAME_SIZE, "eth%d", i);
          object_initialize_child(obj, name, &s->eth[i], TYPE_IMX_ENET);
      }
 -    /* USB */
 +    /*
 +     * USB PHYs
 +     */
      for (i = 0; i < FSL_IMX6UL_NUM_USB_PHYS; i++) {
          snprintf(name, NAME_SIZE, "usbphy%d", i);
          object_initialize_child(obj, name, &s->usbphy[i], TYPE_IMX_USBPHY);
      }
 +
 +    /*
 +     * USBs
 +     */
      for (i = 0; i < FSL_IMX6UL_NUM_USBS; i++) {
          snprintf(name, NAME_SIZE, "usb%d", i);
          object_initialize_child(obj, name, &s->usb[i], TYPE_CHIPIDEA);
      }
      /*
 -     * SDHCI
 +     * SDHCIs
       */
      for (i = 0; i < FSL_IMX6UL_NUM_USDHCS; i++) {
          snprintf(name, NAME_SIZE, "usdhc%d", i);
@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_init(Object *obj)
      }
      /*
 -     * Watchdog
 +     * Watchdogs
       */
      for (i = 0; i < FSL_IMX6UL_NUM_WDTS; i++) {
          snprintf(name, NAME_SIZE, "wdt%d", i);
@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_realize(DeviceState *dev, Error **errp)
       * A7MPCORE DAP
       */
      create_unimplemented_device("a7mpcore-dap", FSL_IMX6UL_A7MPCORE_DAP_ADDR,
 -                                0x100000);
 +                                FSL_IMX6UL_A7MPCORE_DAP_SIZE);
      /*
 -     * GPT 1, 2
 +     * GPTs
       */
      for (i = 0; i < FSL_IMX6UL_NUM_GPTS; i++) {
          static const hwaddr FSL_IMX6UL_GPTn_ADDR[FSL_IMX6UL_NUM_GPTS] = {
@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_realize(DeviceState *dev, Error **errp)
      }
      /*
 -     * EPIT 1, 2
 +     * EPITs
       */
      for (i = 0; i < FSL_IMX6UL_NUM_EPITS; i++) {
          static const hwaddr FSL_IMX6UL_EPITn_ADDR[FSL_IMX6UL_NUM_EPITS] = {
@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_realize(DeviceState *dev, Error **errp)
      }
      /*
 -     * GPIO
 +     * GPIOs
       */
      for (i = 0; i < FSL_IMX6UL_NUM_GPIOS; i++) {
          static const hwaddr FSL_IMX6UL_GPIOn_ADDR[FSL_IMX6UL_NUM_GPIOS] = {
@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_realize(DeviceState *dev, Error **errp)
      }
      /*
 -     * IOMUXC and IOMUXC_GPR
 +     * IOMUXC
       */
 -    for (i = 0; i < 1; i++) {
 -        static const hwaddr FSL_IMX6UL_IOMUXCn_ADDR[FSL_IMX6UL_NUM_IOMUXCS] = {
 -            FSL_IMX6UL_IOMUXC_ADDR,
 -            FSL_IMX6UL_IOMUXC_GPR_ADDR,
 -        };
 -
 -        snprintf(name, NAME_SIZE, "iomuxc%d", i);
 -        create_unimplemented_device(name, FSL_IMX6UL_IOMUXCn_ADDR[i], 0x4000);
 -    }
 +    create_unimplemented_device("iomuxc", FSL_IMX6UL_IOMUXC_ADDR,
 +                                FSL_IMX6UL_IOMUXC_SIZE);
 +    create_unimplemented_device("iomuxc_gpr", FSL_IMX6UL_IOMUXC_GPR_ADDR,
 +                                FSL_IMX6UL_IOMUXC_GPR_SIZE);
      /*
       * CCM
@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_realize(DeviceState *dev, Error **errp)
      sysbus_realize(SYS_BUS_DEVICE(&s->gpcv2), &error_abort);
      sysbus_mmio_map(SYS_BUS_DEVICE(&s->gpcv2), 0, FSL_IMX6UL_GPC_ADDR);
 -    /* Initialize all ECSPI */
 +    /*
 +     * ECSPIs
 +     */
      for (i = 0; i < FSL_IMX6UL_NUM_ECSPIS; i++) {
          static const hwaddr FSL_IMX6UL_SPIn_ADDR[FSL_IMX6UL_NUM_ECSPIS] = {
              FSL_IMX6UL_ECSPI1_ADDR,
@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_realize(DeviceState *dev, Error **errp)
      }
      /*
 -     * I2C
 +     * I2Cs
       */
      for (i = 0; i < FSL_IMX6UL_NUM_I2CS; i++) {
          static const hwaddr FSL_IMX6UL_I2Cn_ADDR[FSL_IMX6UL_NUM_I2CS] = {
@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_realize(DeviceState *dev, Error **errp)
      }
      /*
 -     * UART
 +     * UARTs
       */
      for (i = 0; i < FSL_IMX6UL_NUM_UARTS; i++) {
          static const hwaddr FSL_IMX6UL_UARTn_ADDR[FSL_IMX6UL_NUM_UARTS] = {
@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_realize(DeviceState *dev, Error **errp)
      }
      /*
 -     * Ethernet
 +     * Ethernets
       *
       * We must use two loops since phy_connected affects the other interface
       * and we have to set all properties before calling sysbus_realize().
@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_realize(DeviceState *dev, Error **errp)
                                              FSL_IMX6UL_ENETn_TIMER_IRQ[i]));
      }
 -    /* USB */
 +    /*
 +     * USB PHYs
 +     */
      for (i = 0; i < FSL_IMX6UL_NUM_USB_PHYS; i++) {
 +        static const hwaddr
 +                     FSL_IMX6UL_USB_PHYn_ADDR[FSL_IMX6UL_NUM_USB_PHYS] = {
 +            FSL_IMX6UL_USBPHY1_ADDR,
 +            FSL_IMX6UL_USBPHY2_ADDR,
 +        };
 +
          sysbus_realize(SYS_BUS_DEVICE(&s->usbphy[i]), &error_abort);
          sysbus_mmio_map(SYS_BUS_DEVICE(&s->usbphy[i]), 0,
 -                        FSL_IMX6UL_USBPHY1_ADDR + i * 0x1000);
 +                        FSL_IMX6UL_USB_PHYn_ADDR[i]);
      }
 +    /*
 +     * USBs
 +     */
      for (i = 0; i < FSL_IMX6UL_NUM_USBS; i++) {
 +        static const hwaddr FSL_IMX6UL_USB02_USBn_ADDR[FSL_IMX6UL_NUM_USBS] = {
 +            FSL_IMX6UL_USBO2_USB1_ADDR,
 +            FSL_IMX6UL_USBO2_USB2_ADDR,
 +        };
 +
          static const int FSL_IMX6UL_USBn_IRQ[] = {
              FSL_IMX6UL_USB1_IRQ,
              FSL_IMX6UL_USB2_IRQ,
          };
 +
          sysbus_realize(SYS_BUS_DEVICE(&s->usb[i]), &error_abort);
          sysbus_mmio_map(SYS_BUS_DEVICE(&s->usb[i]), 0,
 -                        FSL_IMX6UL_USBO2_USB_ADDR + i * 0x200);
 +                        FSL_IMX6UL_USB02_USBn_ADDR[i]);
          sysbus_connect_irq(SYS_BUS_DEVICE(&s->usb[i]), 0,
                             qdev_get_gpio_in(DEVICE(&s->a7mpcore),
                                              FSL_IMX6UL_USBn_IRQ[i]));
      }
      /*
 -     * USDHC
 +     * USDHCs
       */
      for (i = 0; i < FSL_IMX6UL_NUM_USDHCS; i++) {
          static const hwaddr FSL_IMX6UL_USDHCn_ADDR[FSL_IMX6UL_NUM_USDHCS] = {
@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_realize(DeviceState *dev, Error **errp)
      sysbus_mmio_map(SYS_BUS_DEVICE(&s->snvs), 0, FSL_IMX6UL_SNVS_HP_ADDR);
      /*
 -     * Watchdog
 +     * Watchdogs
       */
      for (i = 0; i < FSL_IMX6UL_NUM_WDTS; i++) {
          static const hwaddr FSL_IMX6UL_WDOGn_ADDR[FSL_IMX6UL_NUM_WDTS] = {
@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_realize(DeviceState *dev, Error **errp)
              FSL_IMX6UL_WDOG2_ADDR,
              FSL_IMX6UL_WDOG3_ADDR,
          };
 +
          static const int FSL_IMX6UL_WDOGn_IRQ[FSL_IMX6UL_NUM_WDTS] = {
              FSL_IMX6UL_WDOG1_IRQ,
              FSL_IMX6UL_WDOG2_IRQ,
@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_realize(DeviceState *dev, Error **errp)
      /*
       * SDMA
       */
 -    create_unimplemented_device("sdma", FSL_IMX6UL_SDMA_ADDR, 0x4000);
 +    create_unimplemented_device("sdma", FSL_IMX6UL_SDMA_ADDR,
 +                                FSL_IMX6UL_SDMA_SIZE);
      /*
 -     * SAI (Audio SSI (Synchronous Serial Interface))
 +     * SAIs (Audio SSI (Synchronous Serial Interface))
       */
 -    create_unimplemented_device("sai1", FSL_IMX6UL_SAI1_ADDR, 0x4000);
 -    create_unimplemented_device("sai2", FSL_IMX6UL_SAI2_ADDR, 0x4000);
 -    create_unimplemented_device("sai3", FSL_IMX6UL_SAI3_ADDR, 0x4000);
 +    for (i = 0; i < FSL_IMX6UL_NUM_SAIS; i++) {
 +        static const hwaddr FSL_IMX6UL_SAIn_ADDR[FSL_IMX6UL_NUM_SAIS] = {
 +            FSL_IMX6UL_SAI1_ADDR,
 +            FSL_IMX6UL_SAI2_ADDR,
 +            FSL_IMX6UL_SAI3_ADDR,
 +        };
 +
 +        snprintf(name, NAME_SIZE, "sai%d", i);
 +        create_unimplemented_device(name, FSL_IMX6UL_SAIn_ADDR[i],
 +                                    FSL_IMX6UL_SAIn_SIZE);
 +    }
-+    return tmp;
-+}
+     /*
-+
+-     * PWM
-+static bool do_2scalar(DisasContext *s, arg_2scalar *a,
++     * PWMs
-+                       NeonGenTwoOpFn *opfn, NeonGenTwoOpFn *accfn)
+      */
-+{
+-    create_unimplemented_device("pwm1", FSL_IMX6UL_PWM1_ADDR, 0x4000);
-+    /*
+-    create_unimplemented_device("pwm2", FSL_IMX6UL_PWM2_ADDR, 0x4000);
-+     * Two registers and a scalar: perform an operation between
+-    create_unimplemented_device("pwm3", FSL_IMX6UL_PWM3_ADDR, 0x4000);
-+     * the input elements and the scalar, and then possibly
+-    create_unimplemented_device("pwm4", FSL_IMX6UL_PWM4_ADDR, 0x4000);
-+     * perform an accumulation operation of that result into the
++    for (i = 0; i < FSL_IMX6UL_NUM_PWMS; i++) {
-+     * destination.
++        static const hwaddr FSL_IMX6UL_PWMn_ADDR[FSL_IMX6UL_NUM_PWMS] = {
-+     */
++            FSL_IMX6UL_PWM1_ADDR,
-+    TCGv_i32 scalar;
++            FSL_IMX6UL_PWM2_ADDR,
-+    int pass;
++            FSL_IMX6UL_PWM3_ADDR,
-+
++            FSL_IMX6UL_PWM4_ADDR,
-+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
++        };
-+        return false;
++
 +        snprintf(name, NAME_SIZE, "pwm%d", i);
 +        create_unimplemented_device(name, FSL_IMX6UL_PWMn_ADDR[i],
 +                                    FSL_IMX6UL_PWMn_SIZE);
 +    }
-+
-+    /* UNDEF accesses to D16-D31 if they don't exist. */
+     /*
-+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+      * Audio ASRC (asynchronous sample rate converter)
-+        ((a->vd | a->vn | a->vm) & 0x10)) {
+      */
-+        return false;
+-    create_unimplemented_device("asrc", FSL_IMX6UL_ASRC_ADDR, 0x4000);
 +    create_unimplemented_device("asrc", FSL_IMX6UL_ASRC_ADDR,
 +                                FSL_IMX6UL_ASRC_SIZE);
      /*
 -     * CAN
 +     * CANs
       */
 -    create_unimplemented_device("can1", FSL_IMX6UL_CAN1_ADDR, 0x4000);
 -    create_unimplemented_device("can2", FSL_IMX6UL_CAN2_ADDR, 0x4000);
 +    for (i = 0; i < FSL_IMX6UL_NUM_CANS; i++) {
 +        static const hwaddr FSL_IMX6UL_CANn_ADDR[FSL_IMX6UL_NUM_CANS] = {
 +            FSL_IMX6UL_CAN1_ADDR,
 +            FSL_IMX6UL_CAN2_ADDR,
 +        };
 +
 +        snprintf(name, NAME_SIZE, "can%d", i);
 +        create_unimplemented_device(name, FSL_IMX6UL_CANn_ADDR[i],
 +                                    FSL_IMX6UL_CANn_SIZE);
 +    }
-+
-+    if (!opfn) {
+     /*
-+        /* Bad size (including size == 3, which is a different insn group) */
+      * APHB_DMA
-+        return false;
+@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_realize(DeviceState *dev, Error **errp)
-+    }
+         };
-+
-+    if (a->q && ((a->vd | a->vn) & 1)) {
+         snprintf(name, NAME_SIZE, "adc%d", i);
-+        return false;
+-        create_unimplemented_device(name, FSL_IMX6UL_ADCn_ADDR[i], 0x4000);
-+    }
++        create_unimplemented_device(name, FSL_IMX6UL_ADCn_ADDR[i],
-+
++                                    FSL_IMX6UL_ADCn_SIZE);
-+    if (!vfp_access_check(s)) {
+     }
-+        return true;
-+    }
+     /*
-+
+      * LCD
-+    scalar = neon_get_scalar(a->size, a->vm);
+      */
-+
+-    create_unimplemented_device("lcdif", FSL_IMX6UL_LCDIF_ADDR, 0x4000);
-+    for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
++    create_unimplemented_device("lcdif", FSL_IMX6UL_LCDIF_ADDR,
-+        TCGv_i32 tmp = neon_load_reg(a->vn, pass);
++                                FSL_IMX6UL_LCDIF_SIZE);
-+        opfn(tmp, tmp, scalar);
-+        if (accfn) {
+     /*
-+            TCGv_i32 rd = neon_load_reg(a->vd, pass);
+      * ROM memory
 +            accfn(tmp, rd, tmp);
 +            tcg_temp_free_i32(rd);
 +        }
 +        neon_store_reg(a->vd, pass, tmp);
 +    }
 +    tcg_temp_free_i32(scalar);
 +    return true;
 +}
 +
 +static bool trans_VMUL_2sc(DisasContext *s, arg_2scalar *a)
 +{
 +    static NeonGenTwoOpFn * const opfn[] = {
 +        NULL,
 +        gen_helper_neon_mul_u16,
 +        tcg_gen_mul_i32,
 +        NULL,
 +    };
 +
 +    return do_2scalar(s, a, opfn[a->size], NULL);
 +}
 +
 +static bool trans_VMLA_2sc(DisasContext *s, arg_2scalar *a)
 +{
 +    static NeonGenTwoOpFn * const opfn[] = {
 +        NULL,
 +        gen_helper_neon_mul_u16,
 +        tcg_gen_mul_i32,
 +        NULL,
 +    };
 +    static NeonGenTwoOpFn * const accfn[] = {
 +        NULL,
 +        gen_helper_neon_add_u16,
 +        tcg_gen_add_i32,
 +        NULL,
 +    };
 +
 +    return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
 +}
 +
 +static bool trans_VMLS_2sc(DisasContext *s, arg_2scalar *a)
 +{
 +    static NeonGenTwoOpFn * const opfn[] = {
 +        NULL,
 +        gen_helper_neon_mul_u16,
 +        tcg_gen_mul_i32,
 +        NULL,
 +    };
 +    static NeonGenTwoOpFn * const accfn[] = {
 +        NULL,
 +        gen_helper_neon_sub_u16,
 +        tcg_gen_sub_i32,
 +        NULL,
 +    };
 +
 +    return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_dsp_insn(DisasContext *s, uint32_t insn)
  #define VFP_DREG_N(reg, insn) VFP_DREG(reg, insn, 16,  7)
  #define VFP_DREG_M(reg, insn) VFP_DREG(reg, insn,  0,  5)
 -static void gen_neon_dup_low16(TCGv_i32 var)
 -{
 -    TCGv_i32 tmp = tcg_temp_new_i32();
 -    tcg_gen_ext16u_i32(var, var);
 -    tcg_gen_shli_i32(tmp, var, 16);
 -    tcg_gen_or_i32(var, var, tmp);
 -    tcg_temp_free_i32(tmp);
 -}
 -
 -static void gen_neon_dup_high16(TCGv_i32 var)
 -{
 -    TCGv_i32 tmp = tcg_temp_new_i32();
 -    tcg_gen_andi_i32(var, var, 0xffff0000);
 -    tcg_gen_shri_i32(tmp, var, 16);
 -    tcg_gen_or_i32(var, var, tmp);
 -    tcg_temp_free_i32(tmp);
 -}
 -
  static inline bool use_goto_tb(DisasContext *s, target_ulong dest)
  {
  #ifndef CONFIG_USER_ONLY
@@ -XXX,XX +XXX,XX @@ static void gen_exception_return(DisasContext *s, TCGv_i32 pc)
  #define CPU_V001 cpu_V0, cpu_V0, cpu_V1
 -static inline void gen_neon_add(int size, TCGv_i32 t0, TCGv_i32 t1)
 -{
 -    switch (size) {
 -    case 0: gen_helper_neon_add_u8(t0, t0, t1); break;
 -    case 1: gen_helper_neon_add_u16(t0, t0, t1); break;
 -    case 2: tcg_gen_add_i32(t0, t0, t1); break;
 -    default: abort();
 -    }
 -}
 -
 -static inline void gen_neon_rsb(int size, TCGv_i32 t0, TCGv_i32 t1)
 -{
 -    switch (size) {
 -    case 0: gen_helper_neon_sub_u8(t0, t1, t0); break;
 -    case 1: gen_helper_neon_sub_u16(t0, t1, t0); break;
 -    case 2: tcg_gen_sub_i32(t0, t1, t0); break;
 -    default: return;
 -    }
 -}
 -
  static TCGv_i32 neon_load_scratch(int scratch)
  {
      TCGv_i32 tmp = tcg_temp_new_i32();
@@ -XXX,XX +XXX,XX @@ static void neon_store_scratch(int scratch, TCGv_i32 var)
      tcg_temp_free_i32(var);
  }
 -static inline TCGv_i32 neon_get_scalar(int size, int reg)
 -{
 -    TCGv_i32 tmp;
 -    if (size == 1) {
 -        tmp = neon_load_reg(reg & 7, reg >> 4);
 -        if (reg & 8) {
 -            gen_neon_dup_high16(tmp);
 -        } else {
 -            gen_neon_dup_low16(tmp);
 -        }
 -    } else {
 -        tmp = neon_load_reg(reg & 15, reg >> 4);
 -    }
 -    return tmp;
 -}
 -
  static int gen_neon_unzip(int rd, int rm, int size, int q)
  {
      TCGv_ptr pd, pm;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                      return 1;
                  }
                  switch (op) {
 +                case 0: /* Integer VMLA scalar */
 +                case 4: /* Integer VMLS scalar */
 +                case 8: /* Integer VMUL scalar */
 +                    return 1; /* handled by decodetree */
 +
                  case 1: /* Float VMLA scalar */
                  case 5: /* Floating point VMLS scalar */
                  case 9: /* Floating point VMUL scalar */
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                          return 1;
                      }
                      /* fall through */
 -                case 0: /* Integer VMLA scalar */
 -                case 4: /* Integer VMLS scalar */
 -                case 8: /* Integer VMUL scalar */
                  case 12: /* VQDMULH scalar */
                  case 13: /* VQRDMULH scalar */
                      if (u && ((rd | rn) & 1)) {
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                              } else {
                                  gen_helper_neon_qrdmulh_s32(tmp, cpu_env, tmp, tmp2);
                              }
 -                        } else if (op & 1) {
 +                        } else {
                              TCGv_ptr fpstatus = get_fpstatus_ptr(1);
                              gen_helper_vfp_muls(tmp, tmp, tmp2, fpstatus);
                              tcg_temp_free_ptr(fpstatus);
 -                        } else {
 -                            switch (size) {
 -                            case 0: gen_helper_neon_mul_u8(tmp, tmp, tmp2); break;
 -                            case 1: gen_helper_neon_mul_u16(tmp, tmp, tmp2); break;
 -                            case 2: tcg_gen_mul_i32(tmp, tmp, tmp2); break;
 -                            default: abort();
 -                            }
                          }
                          tcg_temp_free_i32(tmp2);
                          if (op < 8) {
                              /* Accumulate.  */
                              tmp2 = neon_load_reg(rd, pass);
                              switch (op) {
 -                            case 0:
 -                                gen_neon_add(size, tmp, tmp2);
 -                                break;
                              case 1:
                              {
                                  TCGv_ptr fpstatus = get_fpstatus_ptr(1);
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                                  tcg_temp_free_ptr(fpstatus);
                                  break;
                              }
 -                            case 4:
 -                                gen_neon_rsb(size, tmp, tmp2);
 -                                break;
                              case 5:
                              {
                                  TCGv_ptr fpstatus = get_fpstatus_ptr(1);
 --
-.20.1
+.34.1

-[PULL 23/23] hw: arm: Set vendor property for IMX SDHCI emulations
+[PULL 13/24] Add i.MX6UL missing devices.
-From: Guenter Roeck <linux@roeck-us.net>
+From: Jean-Christophe Dubois <jcd@tribudubois.net>
-Set vendor property to IMX to enable IMX specific functionality
+* Add TZASC as unimplemented device.
-in sdhci code.
+  - Allow bare metal application to access this (unimplemented) device
 * Add CSU as unimplemented device.
   - Allow bare metal application to access this (unimplemented) device
 * Add 4 missing PWM devices
-Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
-Signed-off-by: Guenter Roeck <linux@roeck-us.net>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Message-id: 59e4dc56e14eccfefd379275ec19048dff9c10b3.1692964892.git.jcd@tribudubois.net
 Message-id: 20200603145258.195920-3-linux@roeck-us.net
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/arm/fsl-imx25.c  | 6 ++++++
+ include/hw/arm/fsl-imx6ul.h |  2 +-
- hw/arm/fsl-imx6.c   | 6 ++++++
+ hw/arm/fsl-imx6ul.c         | 16 ++++++++++++++++
- hw/arm/fsl-imx6ul.c | 2 ++
+files changed, 17 insertions(+), 1 deletion(-)
  hw/arm/fsl-imx7.c   | 2 ++
 files changed, 16 insertions(+)
-diff --git a/hw/arm/fsl-imx25.c b/hw/arm/fsl-imx25.c
+diff --git a/include/hw/arm/fsl-imx6ul.h b/include/hw/arm/fsl-imx6ul.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/fsl-imx25.c
+--- a/include/hw/arm/fsl-imx6ul.h
-+++ b/hw/arm/fsl-imx25.c
++++ b/include/hw/arm/fsl-imx6ul.h
-@@ -XXX,XX +XXX,XX @@ static void fsl_imx25_realize(DeviceState *dev, Error **errp)
+@@ -XXX,XX +XXX,XX @@ enum FslIMX6ULConfiguration {
-                                  &err);
+     FSL_IMX6UL_NUM_USBS         = 2,
-         object_property_set_uint(OBJECT(&s->esdhc[i]), IMX25_ESDHC_CAPABILITIES,
+     FSL_IMX6UL_NUM_SAIS         = 3,
-                                  "capareg", &err);
+     FSL_IMX6UL_NUM_CANS         = 2,
-+        object_property_set_uint(OBJECT(&s->esdhc[i]), SDHCI_VENDOR_IMX,
+-    FSL_IMX6UL_NUM_PWMS         = 4,
-+                                 "vendor", &err);
++    FSL_IMX6UL_NUM_PWMS         = 8,
-+        if (err) {
+ };
-+            error_propagate(errp, err);
-+            return;
+ struct FslIMX6ULState {
 +        }
          object_property_set_bool(OBJECT(&s->esdhc[i]), true, "realized", &err);
          if (err) {
              error_propagate(errp, err);
 diff --git a/hw/arm/fsl-imx6.c b/hw/arm/fsl-imx6.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/fsl-imx6.c
 +++ b/hw/arm/fsl-imx6.c
@@ -XXX,XX +XXX,XX @@ static void fsl_imx6_realize(DeviceState *dev, Error **errp)
                                   &err);
          object_property_set_uint(OBJECT(&s->esdhc[i]), IMX6_ESDHC_CAPABILITIES,
                                   "capareg", &err);
 +        object_property_set_uint(OBJECT(&s->esdhc[i]), SDHCI_VENDOR_IMX,
 +                                 "vendor", &err);
 +        if (err) {
 +            error_propagate(errp, err);
 +            return;
 +        }
          object_property_set_bool(OBJECT(&s->esdhc[i]), true, "realized", &err);
          if (err) {
              error_propagate(errp, err);
 diff --git a/hw/arm/fsl-imx6ul.c b/hw/arm/fsl-imx6ul.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/fsl-imx6ul.c
 +++ b/hw/arm/fsl-imx6ul.c
 @@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_realize(DeviceState *dev, Error **errp)
-             FSL_IMX6UL_USDHC2_IRQ,
+             FSL_IMX6UL_PWM2_ADDR,
              FSL_IMX6UL_PWM3_ADDR,
              FSL_IMX6UL_PWM4_ADDR,
 +            FSL_IMX6UL_PWM5_ADDR,
 +            FSL_IMX6UL_PWM6_ADDR,
 +            FSL_IMX6UL_PWM7_ADDR,
 +            FSL_IMX6UL_PWM8_ADDR,
          };
-+        object_property_set_uint(OBJECT(&s->usdhc[i]), SDHCI_VENDOR_IMX,
+         snprintf(name, NAME_SIZE, "pwm%d", i);
-+                                        "vendor", &error_abort);
+@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_realize(DeviceState *dev, Error **errp)
-         object_property_set_bool(OBJECT(&s->usdhc[i]), true, "realized",
+     create_unimplemented_device("lcdif", FSL_IMX6UL_LCDIF_ADDR,
-                                  &error_abort);
+                                 FSL_IMX6UL_LCDIF_SIZE);
-diff --git a/hw/arm/fsl-imx7.c b/hw/arm/fsl-imx7.c
++    /*
-index XXXXXXX..XXXXXXX 100644
++     * CSU
---- a/hw/arm/fsl-imx7.c
++     */
-+++ b/hw/arm/fsl-imx7.c
++    create_unimplemented_device("csu", FSL_IMX6UL_CSU_ADDR,
-@@ -XXX,XX +XXX,XX @@ static void fsl_imx7_realize(DeviceState *dev, Error **errp)
++                                FSL_IMX6UL_CSU_SIZE);
-             FSL_IMX7_USDHC3_IRQ,
++
-         };
++    /*
++     * TZASC
-+        object_property_set_uint(OBJECT(&s->usdhc[i]), SDHCI_VENDOR_IMX,
++     */
-+                                 "vendor", &error_abort);
++    create_unimplemented_device("tzasc", FSL_IMX6UL_TZASC_ADDR,
-         object_property_set_bool(OBJECT(&s->usdhc[i]), true, "realized",
++                                FSL_IMX6UL_TZASC_SIZE);
-                                  &error_abort);
++
+     /*
       * ROM memory
       */
 --
-.20.1
+.34.1

-[PULL 18/23] hw/misc/imx6ul_ccm: Implement non writable bits in CCM registers
+[PULL 14/24] Refactor i.MX7 processor code
 From: Jean-Christophe Dubois <jcd@tribudubois.net>
-Some bits of the CCM registers are non writable.
+* Add Addr and size definition for all i.MX7 devices in i.MX7 header file.
+* Use those newly defined named constants whenever possible.
-This was left undone in the initial commit (all bits of registers were
+* Standardize the way we init a family of unimplemented devices
-writable).
+  - SAI
+  - PWM
-This patch adds the required code to protect the non writable bits.
+  - CAN
 * Add/rework a few comments
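
The "standardized" initialisation of a family of unimplemented devices that the list above describes can be sketched as a table-driven loop. This is a minimal, stand-alone illustration and not the code from the patch: `create_unimplemented_device()` is replaced here by a hypothetical stub that only records its arguments, `hwaddr` and `KiB` are local stand-ins for the QEMU definitions, and `init_unimplemented_cans()` is an invented name. The CAN addresses and size are the values from the FslIMX7MemoryMap hunk of this patch, and the `"can%d"` naming mirrors the i.MX6UL loop earlier in the series.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t hwaddr;            /* stand-in for QEMU's hwaddr type */

#define KiB 1024ULL                 /* stand-in for qemu/units.h */
#define NAME_SIZE 20
#define MAX_RECORDED 8

/* Values copied from the FslIMX7MemoryMap hunk in this patch. */
enum {
    FSL_IMX7_NUM_CANS  = 2,
    FSL_IMX7_CAN1_ADDR = 0x30A00000,
    FSL_IMX7_CAN2_ADDR = 0x30A10000,
    FSL_IMX7_CANn_SIZE = (4 * KiB),
};

struct recorded_device {
    char name[NAME_SIZE];
    hwaddr base;
    hwaddr size;
};

static struct recorded_device recorded[MAX_RECORDED];
static int recorded_count;

/*
 * Hypothetical stub: the real QEMU helper maps a placeholder region;
 * this one only remembers what it was asked to create.
 */
static void create_unimplemented_device(const char *name, hwaddr base,
                                        hwaddr size)
{
    assert(recorded_count < MAX_RECORDED);
    struct recorded_device *d = &recorded[recorded_count++];
    snprintf(d->name, sizeof(d->name), "%s", name);
    d->base = base;
    d->size = size;
}

/* One address table plus one shared size constant per device family. */
static void init_unimplemented_cans(void)
{
    static const hwaddr FSL_IMX7_CANn_ADDR[FSL_IMX7_NUM_CANS] = {
        FSL_IMX7_CAN1_ADDR,
        FSL_IMX7_CAN2_ADDR,
    };
    char name[NAME_SIZE];

    for (int i = 0; i < FSL_IMX7_NUM_CANS; i++) {
        snprintf(name, NAME_SIZE, "can%d", i);
        create_unimplemented_device(name, FSL_IMX7_CANn_ADDR[i],
                                    FSL_IMX7_CANn_SIZE);
    }
}
```

With this shape, adding another instance means extending the count and the address table only. Because the name is formatted from the zero-based loop index, the regions in this sketch are called "can0" and "can1".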
 Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
-Message-id: 20200608133508.550046-1-jcd@tribudubois.net
+Message-id: 59e195d33e4d486a8d131392acd46633c8c10ed7.1692964892.git.jcd@tribudubois.net
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/misc/imx6ul_ccm.c | 76 ++++++++++++++++++++++++++++++++++++--------
+ include/hw/arm/fsl-imx7.h | 330 ++++++++++++++++++++++++++++----------
-file changed, 63 insertions(+), 13 deletions(-)
+ hw/arm/fsl-imx7.c         | 130 ++++++++++-----
 files changed, 335 insertions(+), 125 deletions(-)
-diff --git a/hw/misc/imx6ul_ccm.c b/hw/misc/imx6ul_ccm.c
+diff --git a/include/hw/arm/fsl-imx7.h b/include/hw/arm/fsl-imx7.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/misc/imx6ul_ccm.c
+--- a/include/hw/arm/fsl-imx7.h
-+++ b/hw/misc/imx6ul_ccm.c
++++ b/include/hw/arm/fsl-imx7.h
 @@ -XXX,XX +XXX,XX @@
+ #include "hw/misc/imx7_ccm.h"
- #include "trace.h"
+ #include "hw/misc/imx7_snvs.h"
+ #include "hw/misc/imx7_gpr.h"
-+static const uint32_t ccm_mask[CCM_MAX] = {
+-#include "hw/misc/imx6_src.h"
-+    [CCM_CCR] = 0xf01fef80,
+ #include "hw/watchdog/wdt_imx2.h"
-+    [CCM_CCDR] = 0xfffeffff,
+ #include "hw/gpio/imx_gpio.h"
-+    [CCM_CSR] = 0xffffffff,
+ #include "hw/char/imx_serial.h"
-+    [CCM_CCSR] = 0xfffffef2,
+@@ -XXX,XX +XXX,XX @@
-+    [CCM_CACRR] = 0xfffffff8,
+ #include "hw/usb/chipidea.h"
-+    [CCM_CBCDR] = 0xc1f8e000,
+ #include "cpu.h"
-+    [CCM_CBCMR] = 0xfc03cfff,
+ #include "qom/object.h"
-+    [CCM_CSCMR1] = 0x80700000,
++#include "qemu/units.h"
-+    [CCM_CSCMR2] = 0xe01ff003,
-+    [CCM_CSCDR1] = 0xfe00c780,
+ #define TYPE_FSL_IMX7 "fsl-imx7"
-+    [CCM_CS1CDR] = 0xfe00fe00,
+ OBJECT_DECLARE_SIMPLE_TYPE(FslIMX7State, FSL_IMX7)
-+    [CCM_CS2CDR] = 0xf8007000,
+@@ -XXX,XX +XXX,XX @@ enum FslIMX7Configuration {
-+    [CCM_CDCDR] = 0xf00fffff,
+     FSL_IMX7_NUM_ECSPIS       = 4,
-+    [CCM_CHSCCDR] = 0xfffc01ff,
+     FSL_IMX7_NUM_USBS         = 3,
-+    [CCM_CSCDR2] = 0xfe0001ff,
+     FSL_IMX7_NUM_ADCS         = 2,
-+    [CCM_CSCDR3] = 0xffffc1ff,
++    FSL_IMX7_NUM_SAIS         = 3,
-+    [CCM_CDHIPR] = 0xffffffff,
++    FSL_IMX7_NUM_CANS         = 2,
-+    [CCM_CTOR] = 0x00000000,
++    FSL_IMX7_NUM_PWMS         = 4,
-+    [CCM_CLPCR] = 0xf39ff01c,
+ };
-+    [CCM_CISR] = 0xfb85ffbe,
-+    [CCM_CIMR] = 0xfb85ffbf,
+ struct FslIMX7State {
-+    [CCM_CCOSR] = 0xfe00fe00,
+@@ -XXX,XX +XXX,XX @@ struct FslIMX7State {
-+    [CCM_CGPR] = 0xfffc3fea,
-+    [CCM_CCGR0] = 0x00000000,
+ enum FslIMX7MemoryMap {
-+    [CCM_CCGR1] = 0x00000000,
+     FSL_IMX7_MMDC_ADDR            = 0x80000000,
-+    [CCM_CCGR2] = 0x00000000,
+-    FSL_IMX7_MMDC_SIZE            = 2 * 1024 * 1024 * 1024UL,
-+    [CCM_CCGR3] = 0x00000000,
++    FSL_IMX7_MMDC_SIZE            = (2 * GiB),
-+    [CCM_CCGR4] = 0x00000000,
-+    [CCM_CCGR5] = 0x00000000,
+-    FSL_IMX7_GPIO1_ADDR           = 0x30200000,
-+    [CCM_CCGR6] = 0x00000000,
+-    FSL_IMX7_GPIO2_ADDR           = 0x30210000,
-+    [CCM_CMEOR] = 0xafffff1f,
+-    FSL_IMX7_GPIO3_ADDR           = 0x30220000,
-+};
+-    FSL_IMX7_GPIO4_ADDR           = 0x30230000,
-+
+-    FSL_IMX7_GPIO5_ADDR           = 0x30240000,
-+static const uint32_t analog_mask[CCM_ANALOG_MAX] = {
+-    FSL_IMX7_GPIO6_ADDR           = 0x30250000,
-+    [CCM_ANALOG_PLL_ARM] = 0xfff60f80,
+-    FSL_IMX7_GPIO7_ADDR           = 0x30260000,
-+    [CCM_ANALOG_PLL_USB1] = 0xfffe0fbc,
++    FSL_IMX7_QSPI1_MEM_ADDR       = 0x60000000,
-+    [CCM_ANALOG_PLL_USB2] = 0xfffe0fbc,
++    FSL_IMX7_QSPI1_MEM_SIZE       = (256 * MiB),
-+    [CCM_ANALOG_PLL_SYS] = 0xfffa0ffe,
-+    [CCM_ANALOG_PLL_SYS_SS] = 0x00000000,
+-    FSL_IMX7_IOMUXC_LPSR_GPR_ADDR = 0x30270000,
-+    [CCM_ANALOG_PLL_SYS_NUM] = 0xc0000000,
++    FSL_IMX7_PCIE1_MEM_ADDR       = 0x40000000,
-+    [CCM_ANALOG_PLL_SYS_DENOM] = 0xc0000000,
++    FSL_IMX7_PCIE1_MEM_SIZE       = (256 * MiB),
-+    [CCM_ANALOG_PLL_AUDIO] = 0xffe20f80,
-+    [CCM_ANALOG_PLL_AUDIO_NUM] = 0xc0000000,
+-    FSL_IMX7_WDOG1_ADDR           = 0x30280000,
-+    [CCM_ANALOG_PLL_AUDIO_DENOM] = 0xc0000000,
+-    FSL_IMX7_WDOG2_ADDR           = 0x30290000,
-+    [CCM_ANALOG_PLL_VIDEO] = 0xffe20f80,
+-    FSL_IMX7_WDOG3_ADDR           = 0x302A0000,
-+    [CCM_ANALOG_PLL_VIDEO_NUM] = 0xc0000000,
+-    FSL_IMX7_WDOG4_ADDR           = 0x302B0000,
-+    [CCM_ANALOG_PLL_VIDEO_DENOM] = 0xc0000000,
++    FSL_IMX7_QSPI1_RX_BUF_ADDR    = 0x34000000,
-+    [CCM_ANALOG_PLL_ENET] = 0xffc20ff0,
++    FSL_IMX7_QSPI1_RX_BUF_SIZE    = (32 * MiB),
-+    [CCM_ANALOG_PFD_480] = 0x40404040,
-+    [CCM_ANALOG_PFD_528] = 0x40404040,
+-    FSL_IMX7_IOMUXC_LPSR_ADDR     = 0x302C0000,
-+    [PMU_MISC0] = 0x01fe8306,
++    /* PCIe Peripherals */
-+    [PMU_MISC1] = 0x07fcede0,
++    FSL_IMX7_PCIE_REG_ADDR        = 0x33800000,
-+    [PMU_MISC2] = 0x005f5f5f,
-+};
+-    FSL_IMX7_GPT1_ADDR            = 0x302D0000,
-+
+-    FSL_IMX7_GPT2_ADDR            = 0x302E0000,
- static const char *imx6ul_ccm_reg_name(uint32_t reg)
+-    FSL_IMX7_GPT3_ADDR            = 0x302F0000,
- {
+-    FSL_IMX7_GPT4_ADDR            = 0x30300000,
-     static char unknown[20];
++    /* MMAP Peripherals */
-@@ -XXX,XX +XXX,XX @@ static void imx6ul_ccm_write(void *opaque, hwaddr offset, uint64_t value,
++    FSL_IMX7_DMA_APBH_ADDR        = 0x33000000,
++    FSL_IMX7_DMA_APBH_SIZE        = 0x8000,
-     trace_ccm_write_reg(imx6ul_ccm_reg_name(index), (uint32_t)value);
+-    FSL_IMX7_IOMUXC_ADDR          = 0x30330000,
--    /*
+-    FSL_IMX7_IOMUXC_GPR_ADDR      = 0x30340000,
--     * We will do a better implementation later. In particular some bits
+-    FSL_IMX7_IOMUXCn_SIZE         = 0x1000,
--     * cannot be written to.
++    /* GPV configuration */
--     */
++    FSL_IMX7_GPV6_ADDR            = 0x32600000,
--    s->ccm[index] = (uint32_t)value;
++    FSL_IMX7_GPV5_ADDR            = 0x32500000,
-+    s->ccm[index] = (s->ccm[index] & ccm_mask[index]) |
++    FSL_IMX7_GPV4_ADDR            = 0x32400000,
-+                           ((uint32_t)value & ~ccm_mask[index]);
++    FSL_IMX7_GPV3_ADDR            = 0x32300000,
 +    FSL_IMX7_GPV2_ADDR            = 0x32200000,
 +    FSL_IMX7_GPV1_ADDR            = 0x32100000,
 +    FSL_IMX7_GPV0_ADDR            = 0x32000000,
 +    FSL_IMX7_GPVn_SIZE            = (1 * MiB),
 -    FSL_IMX7_OCOTP_ADDR           = 0x30350000,
 -    FSL_IMX7_OCOTP_SIZE           = 0x10000,
 +    /* Arm Peripherals */
 +    FSL_IMX7_A7MPCORE_ADDR        = 0x31000000,
 -    FSL_IMX7_ANALOG_ADDR          = 0x30360000,
 -    FSL_IMX7_SNVS_ADDR            = 0x30370000,
 -    FSL_IMX7_CCM_ADDR             = 0x30380000,
 +    /* AIPS-3 Begin */
 -    FSL_IMX7_SRC_ADDR             = 0x30390000,
 -    FSL_IMX7_SRC_SIZE             = 0x1000,
 +    FSL_IMX7_ENET2_ADDR           = 0x30BF0000,
 +    FSL_IMX7_ENET1_ADDR           = 0x30BE0000,
 -    FSL_IMX7_ADC1_ADDR            = 0x30610000,
 -    FSL_IMX7_ADC2_ADDR            = 0x30620000,
 -    FSL_IMX7_ADCn_SIZE            = 0x1000,
 +    FSL_IMX7_SDMA_ADDR            = 0x30BD0000,
 +    FSL_IMX7_SDMA_SIZE            = (4 * KiB),
 -    FSL_IMX7_PWM1_ADDR            = 0x30660000,
 -    FSL_IMX7_PWM2_ADDR            = 0x30670000,
 -    FSL_IMX7_PWM3_ADDR            = 0x30680000,
 -    FSL_IMX7_PWM4_ADDR            = 0x30690000,
 -    FSL_IMX7_PWMn_SIZE            = 0x10000,
 +    FSL_IMX7_EIM_ADDR             = 0x30BC0000,
 +    FSL_IMX7_EIM_SIZE             = (4 * KiB),
 -    FSL_IMX7_PCIE_PHY_ADDR        = 0x306D0000,
 -    FSL_IMX7_PCIE_PHY_SIZE        = 0x10000,
 +    FSL_IMX7_QSPI_ADDR            = 0x30BB0000,
 +    FSL_IMX7_QSPI_SIZE            = 0x8000,
 -    FSL_IMX7_GPC_ADDR             = 0x303A0000,
 +    FSL_IMX7_SIM2_ADDR            = 0x30BA0000,
 +    FSL_IMX7_SIM1_ADDR            = 0x30B90000,
 +    FSL_IMX7_SIMn_SIZE            = (4 * KiB),
 +
 +    FSL_IMX7_USDHC3_ADDR          = 0x30B60000,
 +    FSL_IMX7_USDHC2_ADDR          = 0x30B50000,
 +    FSL_IMX7_USDHC1_ADDR          = 0x30B40000,
 +
 +    FSL_IMX7_USB3_ADDR            = 0x30B30000,
 +    FSL_IMX7_USBMISC3_ADDR        = 0x30B30200,
 +    FSL_IMX7_USB2_ADDR            = 0x30B20000,
 +    FSL_IMX7_USBMISC2_ADDR        = 0x30B20200,
 +    FSL_IMX7_USB1_ADDR            = 0x30B10000,
 +    FSL_IMX7_USBMISC1_ADDR        = 0x30B10200,
 +    FSL_IMX7_USBMISCn_SIZE        = 0x200,
 +
 +    FSL_IMX7_USB_PL301_ADDR       = 0x30AD0000,
 +    FSL_IMX7_USB_PL301_SIZE       = (64 * KiB),
 +
 +    FSL_IMX7_SEMAPHORE_HS_ADDR    = 0x30AC0000,
 +    FSL_IMX7_SEMAPHORE_HS_SIZE    = (64 * KiB),
 +
 +    FSL_IMX7_MUB_ADDR             = 0x30AB0000,
 +    FSL_IMX7_MUA_ADDR             = 0x30AA0000,
 +    FSL_IMX7_MUn_SIZE             = (KiB),
 +
 +    FSL_IMX7_UART7_ADDR           = 0x30A90000,
 +    FSL_IMX7_UART6_ADDR           = 0x30A80000,
 +    FSL_IMX7_UART5_ADDR           = 0x30A70000,
 +    FSL_IMX7_UART4_ADDR           = 0x30A60000,
 +
 +    FSL_IMX7_I2C4_ADDR            = 0x30A50000,
 +    FSL_IMX7_I2C3_ADDR            = 0x30A40000,
 +    FSL_IMX7_I2C2_ADDR            = 0x30A30000,
 +    FSL_IMX7_I2C1_ADDR            = 0x30A20000,
 +
 +    FSL_IMX7_CAN2_ADDR            = 0x30A10000,
 +    FSL_IMX7_CAN1_ADDR            = 0x30A00000,
 +    FSL_IMX7_CANn_SIZE            = (4 * KiB),
 +
 +    FSL_IMX7_AIPS3_CONF_ADDR      = 0x309F0000,
 +    FSL_IMX7_AIPS3_CONF_SIZE      = (64 * KiB),
      FSL_IMX7_CAAM_ADDR            = 0x30900000,
 -    FSL_IMX7_CAAM_SIZE            = 0x40000,
 +    FSL_IMX7_CAAM_SIZE            = (256 * KiB),
 -    FSL_IMX7_CAN1_ADDR            = 0x30A00000,
 -    FSL_IMX7_CAN2_ADDR            = 0x30A10000,
 -    FSL_IMX7_CANn_SIZE            = 0x10000,
 +    FSL_IMX7_SPBA_ADDR            = 0x308F0000,
 +    FSL_IMX7_SPBA_SIZE            = (4 * KiB),
 -    FSL_IMX7_I2C1_ADDR            = 0x30A20000,
 -    FSL_IMX7_I2C2_ADDR            = 0x30A30000,
 -    FSL_IMX7_I2C3_ADDR            = 0x30A40000,
 -    FSL_IMX7_I2C4_ADDR            = 0x30A50000,
 +    FSL_IMX7_SAI3_ADDR            = 0x308C0000,
 +    FSL_IMX7_SAI2_ADDR            = 0x308B0000,
 +    FSL_IMX7_SAI1_ADDR            = 0x308A0000,
 +    FSL_IMX7_SAIn_SIZE            = (4 * KiB),
 -    FSL_IMX7_ECSPI1_ADDR          = 0x30820000,
 -    FSL_IMX7_ECSPI2_ADDR          = 0x30830000,
 -    FSL_IMX7_ECSPI3_ADDR          = 0x30840000,
 -    FSL_IMX7_ECSPI4_ADDR          = 0x30630000,
 -
 -    FSL_IMX7_LCDIF_ADDR           = 0x30730000,
 -    FSL_IMX7_LCDIF_SIZE           = 0x1000,
 -
 -    FSL_IMX7_UART1_ADDR           = 0x30860000,
 +    FSL_IMX7_UART3_ADDR           = 0x30880000,
      /*
       * Some versions of the reference manual claim that UART2 is @
       * 0x30870000, but experiments with HW + DT files in upstream
@@ -XXX,XX +XXX,XX @@ enum FslIMX7MemoryMap {
       * actually located @ 0x30890000
       */
      FSL_IMX7_UART2_ADDR           = 0x30890000,
 -    FSL_IMX7_UART3_ADDR           = 0x30880000,
 -    FSL_IMX7_UART4_ADDR           = 0x30A60000,
 -    FSL_IMX7_UART5_ADDR           = 0x30A70000,
 -    FSL_IMX7_UART6_ADDR           = 0x30A80000,
 -    FSL_IMX7_UART7_ADDR           = 0x30A90000,
 +    FSL_IMX7_UART1_ADDR           = 0x30860000,
 -    FSL_IMX7_SAI1_ADDR            = 0x308A0000,
 -    FSL_IMX7_SAI2_ADDR            = 0x308B0000,
 -    FSL_IMX7_SAI3_ADDR            = 0x308C0000,
 -    FSL_IMX7_SAIn_SIZE            = 0x10000,
 +    FSL_IMX7_ECSPI3_ADDR          = 0x30840000,
 +    FSL_IMX7_ECSPI2_ADDR          = 0x30830000,
 +    FSL_IMX7_ECSPI1_ADDR          = 0x30820000,
 +    FSL_IMX7_ECSPIn_SIZE          = (4 * KiB),
 -    FSL_IMX7_ENET1_ADDR           = 0x30BE0000,
 -    FSL_IMX7_ENET2_ADDR           = 0x30BF0000,
 +    /* AIPS-3 End */
 -    FSL_IMX7_USB1_ADDR            = 0x30B10000,
 -    FSL_IMX7_USBMISC1_ADDR        = 0x30B10200,
 -    FSL_IMX7_USB2_ADDR            = 0x30B20000,
 -    FSL_IMX7_USBMISC2_ADDR        = 0x30B20200,
 -    FSL_IMX7_USB3_ADDR            = 0x30B30000,
 -    FSL_IMX7_USBMISC3_ADDR        = 0x30B30200,
 -    FSL_IMX7_USBMISCn_SIZE        = 0x200,
 +    /* AIPS-2 Begin */
 -    FSL_IMX7_USDHC1_ADDR          = 0x30B40000,
 -    FSL_IMX7_USDHC2_ADDR          = 0x30B50000,
 -    FSL_IMX7_USDHC3_ADDR          = 0x30B60000,
 +    FSL_IMX7_AXI_DEBUG_MON_ADDR   = 0x307E0000,
 +    FSL_IMX7_AXI_DEBUG_MON_SIZE   = (64 * KiB),
 -    FSL_IMX7_SDMA_ADDR            = 0x30BD0000,
 -    FSL_IMX7_SDMA_SIZE            = 0x1000,
 +    FSL_IMX7_PERFMON2_ADDR        = 0x307D0000,
 +    FSL_IMX7_PERFMON1_ADDR        = 0x307C0000,
 +    FSL_IMX7_PERFMONn_SIZE        = (64 * KiB),
 +
 +    FSL_IMX7_DDRC_ADDR            = 0x307A0000,
 +    FSL_IMX7_DDRC_SIZE            = (4 * KiB),
 +
 +    FSL_IMX7_DDRC_PHY_ADDR        = 0x30790000,
 +    FSL_IMX7_DDRC_PHY_SIZE        = (4 * KiB),
 +
 +    FSL_IMX7_TZASC_ADDR           = 0x30780000,
 +    FSL_IMX7_TZASC_SIZE           = (64 * KiB),
 +
 +    FSL_IMX7_MIPI_DSI_ADDR        = 0x30760000,
 +    FSL_IMX7_MIPI_DSI_SIZE        = (4 * KiB),
 +
 +    FSL_IMX7_MIPI_CSI_ADDR        = 0x30750000,
 +    FSL_IMX7_MIPI_CSI_SIZE        = 0x4000,
 +
 +    FSL_IMX7_LCDIF_ADDR           = 0x30730000,
 +    FSL_IMX7_LCDIF_SIZE           = 0x8000,
 +
 +    FSL_IMX7_CSI_ADDR             = 0x30710000,
 +    FSL_IMX7_CSI_SIZE             = (4 * KiB),
 +
 +    FSL_IMX7_PXP_ADDR             = 0x30700000,
 +    FSL_IMX7_PXP_SIZE             = 0x4000,
 +
 +    FSL_IMX7_EPDC_ADDR            = 0x306F0000,
 +    FSL_IMX7_EPDC_SIZE            = (4 * KiB),
 +
 +    FSL_IMX7_PCIE_PHY_ADDR        = 0x306D0000,
 +    FSL_IMX7_PCIE_PHY_SIZE        = (4 * KiB),
 +
 +    FSL_IMX7_SYSCNT_CTRL_ADDR     = 0x306C0000,
 +    FSL_IMX7_SYSCNT_CMP_ADDR      = 0x306B0000,
 +    FSL_IMX7_SYSCNT_RD_ADDR       = 0x306A0000,
 +
 +    FSL_IMX7_PWM4_ADDR            = 0x30690000,
 +    FSL_IMX7_PWM3_ADDR            = 0x30680000,
 +    FSL_IMX7_PWM2_ADDR            = 0x30670000,
 +    FSL_IMX7_PWM1_ADDR            = 0x30660000,
 +    FSL_IMX7_PWMn_SIZE            = (4 * KiB),
 +
 +    FSL_IMX7_FlEXTIMER2_ADDR      = 0x30650000,
 +    FSL_IMX7_FlEXTIMER1_ADDR      = 0x30640000,
 +    FSL_IMX7_FLEXTIMERn_SIZE      = (4 * KiB),
 +
 +    FSL_IMX7_ECSPI4_ADDR          = 0x30630000,
 +
 +    FSL_IMX7_ADC2_ADDR            = 0x30620000,
 +    FSL_IMX7_ADC1_ADDR            = 0x30610000,
 +    FSL_IMX7_ADCn_SIZE            = (4 * KiB),
 +
 +    FSL_IMX7_AIPS2_CONF_ADDR      = 0x305F0000,
 +    FSL_IMX7_AIPS2_CONF_SIZE      = (64 * KiB),
 +
 +    /* AIPS-2 End */
 +
 +    /* AIPS-1 Begin */
 +
 +    FSL_IMX7_CSU_ADDR             = 0x303E0000,
 +    FSL_IMX7_CSU_SIZE             = (64 * KiB),
 +
 +    FSL_IMX7_RDC_ADDR             = 0x303D0000,
 +    FSL_IMX7_RDC_SIZE             = (4 * KiB),
 +
 +    FSL_IMX7_SEMAPHORE2_ADDR      = 0x303C0000,
 +    FSL_IMX7_SEMAPHORE1_ADDR      = 0x303B0000,
 +    FSL_IMX7_SEMAPHOREn_SIZE      = (4 * KiB),
 +
 +    FSL_IMX7_GPC_ADDR             = 0x303A0000,
 +
 +    FSL_IMX7_SRC_ADDR             = 0x30390000,
 +    FSL_IMX7_SRC_SIZE             = (4 * KiB),
 +
 +    FSL_IMX7_CCM_ADDR             = 0x30380000,
 +
 +    FSL_IMX7_SNVS_HP_ADDR         = 0x30370000,
 +
 +    FSL_IMX7_ANALOG_ADDR          = 0x30360000,
 +
 +    FSL_IMX7_OCOTP_ADDR           = 0x30350000,
 +    FSL_IMX7_OCOTP_SIZE           = 0x10000,
 +
 +    FSL_IMX7_IOMUXC_GPR_ADDR      = 0x30340000,
 +    FSL_IMX7_IOMUXC_GPR_SIZE      = (4 * KiB),
 +
 +    FSL_IMX7_IOMUXC_ADDR          = 0x30330000,
 +    FSL_IMX7_IOMUXC_SIZE          = (4 * KiB),
 +
 +    FSL_IMX7_KPP_ADDR             = 0x30320000,
 +    FSL_IMX7_KPP_SIZE             = (4 * KiB),
 +
 +    FSL_IMX7_ROMCP_ADDR           = 0x30310000,
 +    FSL_IMX7_ROMCP_SIZE           = (4 * KiB),
 +
 +    FSL_IMX7_GPT4_ADDR            = 0x30300000,
 +    FSL_IMX7_GPT3_ADDR            = 0x302F0000,
 +    FSL_IMX7_GPT2_ADDR            = 0x302E0000,
 +    FSL_IMX7_GPT1_ADDR            = 0x302D0000,
 +
 +    FSL_IMX7_IOMUXC_LPSR_ADDR     = 0x302C0000,
 +    FSL_IMX7_IOMUXC_LPSR_SIZE     = (4 * KiB),
 +
 +    FSL_IMX7_WDOG4_ADDR           = 0x302B0000,
 +    FSL_IMX7_WDOG3_ADDR           = 0x302A0000,
 +    FSL_IMX7_WDOG2_ADDR           = 0x30290000,
 +    FSL_IMX7_WDOG1_ADDR           = 0x30280000,
 +
 +    FSL_IMX7_IOMUXC_LPSR_GPR_ADDR = 0x30270000,
 +
 +    FSL_IMX7_GPIO7_ADDR           = 0x30260000,
 +    FSL_IMX7_GPIO6_ADDR           = 0x30250000,
 +    FSL_IMX7_GPIO5_ADDR           = 0x30240000,
 +    FSL_IMX7_GPIO4_ADDR           = 0x30230000,
 +    FSL_IMX7_GPIO3_ADDR           = 0x30220000,
 +    FSL_IMX7_GPIO2_ADDR           = 0x30210000,
 +    FSL_IMX7_GPIO1_ADDR           = 0x30200000,
 +
 +    FSL_IMX7_AIPS1_CONF_ADDR      = 0x301F0000,
 +    FSL_IMX7_AIPS1_CONF_SIZE      = (64 * KiB),
 -    FSL_IMX7_A7MPCORE_ADDR        = 0x31000000,
      FSL_IMX7_A7MPCORE_DAP_ADDR    = 0x30000000,
 +    FSL_IMX7_A7MPCORE_DAP_SIZE    = (1 * MiB),
 -    FSL_IMX7_PCIE_REG_ADDR        = 0x33800000,
 -    FSL_IMX7_PCIE_REG_SIZE        = 16 * 1024,
 +    /* AIPS-1 End */
 -    FSL_IMX7_GPR_ADDR             = 0x30340000,
 +    FSL_IMX7_EIM_CS0_ADDR         = 0x28000000,
 +    FSL_IMX7_EIM_CS0_SIZE         = (128 * MiB),
 -    FSL_IMX7_DMA_APBH_ADDR        = 0x33000000,
 -    FSL_IMX7_DMA_APBH_SIZE        = 0x2000,
 +    FSL_IMX7_OCRAM_PXP_ADDR       = 0x00940000,
 +    FSL_IMX7_OCRAM_PXP_SIZE       = (32 * KiB),
 +
 +    FSL_IMX7_OCRAM_EPDC_ADDR      = 0x00920000,
 +    FSL_IMX7_OCRAM_EPDC_SIZE      = (128 * KiB),
 +
 +    FSL_IMX7_OCRAM_MEM_ADDR       = 0x00900000,
 +    FSL_IMX7_OCRAM_MEM_SIZE       = (128 * KiB),
 +
 +    FSL_IMX7_TCMU_ADDR            = 0x00800000,
 +    FSL_IMX7_TCMU_SIZE            = (32 * KiB),
 +
 +    FSL_IMX7_TCML_ADDR            = 0x007F8000,
 +    FSL_IMX7_TCML_SIZE            = (32 * KiB),
 +
 +    FSL_IMX7_OCRAM_S_ADDR         = 0x00180000,
 +    FSL_IMX7_OCRAM_S_SIZE         = (32 * KiB),
 +
 +    FSL_IMX7_CAAM_MEM_ADDR        = 0x00100000,
 +    FSL_IMX7_CAAM_MEM_SIZE        = (32 * KiB),
 +
 +    FSL_IMX7_ROM_ADDR             = 0x00000000,
 +    FSL_IMX7_ROM_SIZE             = (96 * KiB),
  };
  enum FslIMX7IRQs {
 diff --git a/hw/arm/fsl-imx7.c b/hw/arm/fsl-imx7.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/fsl-imx7.c
 +++ b/hw/arm/fsl-imx7.c
@@ -XXX,XX +XXX,XX @@ static void fsl_imx7_init(Object *obj)
      char name[NAME_SIZE];
      int i;
 +    /*
 +     * CPUs
 +     */
      for (i = 0; i < MIN(ms->smp.cpus, FSL_IMX7_NUM_CPUS); i++) {
          snprintf(name, NAME_SIZE, "cpu%d", i);
          object_initialize_child(obj, name, &s->cpu[i],
@@ -XXX,XX +XXX,XX @@ static void fsl_imx7_init(Object *obj)
                              TYPE_A15MPCORE_PRIV);
      /*
 -     * GPIOs 1 to 7
 +     * GPIOs
       */
      for (i = 0; i < FSL_IMX7_NUM_GPIOS; i++) {
          snprintf(name, NAME_SIZE, "gpio%d", i);
@@ -XXX,XX +XXX,XX @@ static void fsl_imx7_init(Object *obj)
      }
      /*
 -     * GPT1, 2, 3, 4
 +     * GPTs
       */
      for (i = 0; i < FSL_IMX7_NUM_GPTS; i++) {
          snprintf(name, NAME_SIZE, "gpt%d", i);
@@ -XXX,XX +XXX,XX @@ static void fsl_imx7_init(Object *obj)
       */
      object_initialize_child(obj, "gpcv2", &s->gpcv2, TYPE_IMX_GPCV2);
 +    /*
 +     * ECSPIs
 +     */
      for (i = 0; i < FSL_IMX7_NUM_ECSPIS; i++) {
          snprintf(name, NAME_SIZE, "spi%d", i + 1);
          object_initialize_child(obj, name, &s->spi[i], TYPE_IMX_SPI);
      }
 -
 +    /*
 +     * I2Cs
 +     */
      for (i = 0; i < FSL_IMX7_NUM_I2CS; i++) {
          snprintf(name, NAME_SIZE, "i2c%d", i + 1);
          object_initialize_child(obj, name, &s->i2c[i], TYPE_IMX_I2C);
      }
      /*
 -     * UART
 +     * UARTs
       */
      for (i = 0; i < FSL_IMX7_NUM_UARTS; i++) {
              snprintf(name, NAME_SIZE, "uart%d", i);
@@ -XXX,XX +XXX,XX @@ static void fsl_imx7_init(Object *obj)
      }
      /*
 -     * Ethernet
 +     * Ethernets
       */
      for (i = 0; i < FSL_IMX7_NUM_ETHS; i++) {
              snprintf(name, NAME_SIZE, "eth%d", i);
@@ -XXX,XX +XXX,XX @@ static void fsl_imx7_init(Object *obj)
      }
      /*
 -     * SDHCI
 +     * SDHCIs
       */
      for (i = 0; i < FSL_IMX7_NUM_USDHCS; i++) {
              snprintf(name, NAME_SIZE, "usdhc%d", i);
@@ -XXX,XX +XXX,XX @@ static void fsl_imx7_init(Object *obj)
      object_initialize_child(obj, "snvs", &s->snvs, TYPE_IMX7_SNVS);
      /*
 -     * Watchdog
 +     * Watchdogs
       */
      for (i = 0; i < FSL_IMX7_NUM_WDTS; i++) {
              snprintf(name, NAME_SIZE, "wdt%d", i);
@@ -XXX,XX +XXX,XX @@ static void fsl_imx7_init(Object *obj)
       */
      object_initialize_child(obj, "gpr", &s->gpr, TYPE_IMX7_GPR);
 +    /*
 +     * PCIE
 +     */
      object_initialize_child(obj, "pcie", &s->pcie, TYPE_DESIGNWARE_PCIE_HOST);
 +    /*
 +     * USBs
 +     */
      for (i = 0; i < FSL_IMX7_NUM_USBS; i++) {
          snprintf(name, NAME_SIZE, "usb%d", i);
          object_initialize_child(obj, name, &s->usb[i], TYPE_CHIPIDEA);
@@ -XXX,XX +XXX,XX @@ static void fsl_imx7_realize(DeviceState *dev, Error **errp)
          return;
      }
 +    /*
 +     * CPUs
 +     */
      for (i = 0; i < smp_cpus; i++) {
          o = OBJECT(&s->cpu[i]);
@@ -XXX,XX +XXX,XX @@ static void fsl_imx7_realize(DeviceState *dev, Error **errp)
       * A7MPCORE DAP
       */
      create_unimplemented_device("a7mpcore-dap", FSL_IMX7_A7MPCORE_DAP_ADDR,
 -                                0x100000);
 +                                FSL_IMX7_A7MPCORE_DAP_SIZE);
      /*
 -     * GPT1, 2, 3, 4
 +     * GPTs
       */
      for (i = 0; i < FSL_IMX7_NUM_GPTS; i++) {
          static const hwaddr FSL_IMX7_GPTn_ADDR[FSL_IMX7_NUM_GPTS] = {
@@ -XXX,XX +XXX,XX @@ static void fsl_imx7_realize(DeviceState *dev, Error **errp)
                                              FSL_IMX7_GPTn_IRQ[i]));
      }
 +    /*
 +     * GPIOs
 +     */
      for (i = 0; i < FSL_IMX7_NUM_GPIOS; i++) {
          static const hwaddr FSL_IMX7_GPIOn_ADDR[FSL_IMX7_NUM_GPIOS] = {
              FSL_IMX7_GPIO1_ADDR,
@@ -XXX,XX +XXX,XX @@ static void fsl_imx7_realize(DeviceState *dev, Error **errp)
      /*
       * IOMUXC and IOMUXC_LPSR
       */
 -    for (i = 0; i < FSL_IMX7_NUM_IOMUXCS; i++) {
 -        static const hwaddr FSL_IMX7_IOMUXCn_ADDR[FSL_IMX7_NUM_IOMUXCS] = {
 -            FSL_IMX7_IOMUXC_ADDR,
 -            FSL_IMX7_IOMUXC_LPSR_ADDR,
 -        };
 -
 -        snprintf(name, NAME_SIZE, "iomuxc%d", i);
 -        create_unimplemented_device(name, FSL_IMX7_IOMUXCn_ADDR[i],
 -                                    FSL_IMX7_IOMUXCn_SIZE);
 -    }
 +    create_unimplemented_device("iomuxc", FSL_IMX7_IOMUXC_ADDR,
 +                                FSL_IMX7_IOMUXC_SIZE);
 +    create_unimplemented_device("iomuxc_lspr", FSL_IMX7_IOMUXC_LPSR_ADDR,
 +                                FSL_IMX7_IOMUXC_LPSR_SIZE);
      /*
       * CCM
@@ -XXX,XX +XXX,XX @@ static void fsl_imx7_realize(DeviceState *dev, Error **errp)
      sysbus_realize(SYS_BUS_DEVICE(&s->gpcv2), &error_abort);
      sysbus_mmio_map(SYS_BUS_DEVICE(&s->gpcv2), 0, FSL_IMX7_GPC_ADDR);
 -    /* Initialize all ECSPI */
 +    /*
 +     * ECSPIs
 +     */
      for (i = 0; i < FSL_IMX7_NUM_ECSPIS; i++) {
          static const hwaddr FSL_IMX7_SPIn_ADDR[FSL_IMX7_NUM_ECSPIS] = {
              FSL_IMX7_ECSPI1_ADDR,
@@ -XXX,XX +XXX,XX @@ static void fsl_imx7_realize(DeviceState *dev, Error **errp)
                                              FSL_IMX7_SPIn_IRQ[i]));
      }
 +    /*
 +     * I2Cs
 +     */
      for (i = 0; i < FSL_IMX7_NUM_I2CS; i++) {
          static const hwaddr FSL_IMX7_I2Cn_ADDR[FSL_IMX7_NUM_I2CS] = {
              FSL_IMX7_I2C1_ADDR,
@@ -XXX,XX +XXX,XX @@ static void fsl_imx7_realize(DeviceState *dev, Error **errp)
      }
      /*
 -     * UART
 +     * UARTs
       */
      for (i = 0; i < FSL_IMX7_NUM_UARTS; i++) {
          static const hwaddr FSL_IMX7_UARTn_ADDR[FSL_IMX7_NUM_UARTS] = {
@@ -XXX,XX +XXX,XX @@ static void fsl_imx7_realize(DeviceState *dev, Error **errp)
      }
      /*
 -     * Ethernet
 +     * Ethernets
       *
       * We must use two loops since phy_connected affects the other interface
       * and we have to set all properties before calling sysbus_realize().
@@ -XXX,XX +XXX,XX @@ static void fsl_imx7_realize(DeviceState *dev, Error **errp)
      }
      /*
 -     * USDHC
 +     * USDHCs
       */
      for (i = 0; i < FSL_IMX7_NUM_USDHCS; i++) {
          static const hwaddr FSL_IMX7_USDHCn_ADDR[FSL_IMX7_NUM_USDHCS] = {
@@ -XXX,XX +XXX,XX @@ static void fsl_imx7_realize(DeviceState *dev, Error **errp)
       * SNVS
       */
      sysbus_realize(SYS_BUS_DEVICE(&s->snvs), &error_abort);
 -    sysbus_mmio_map(SYS_BUS_DEVICE(&s->snvs), 0, FSL_IMX7_SNVS_ADDR);
 +    sysbus_mmio_map(SYS_BUS_DEVICE(&s->snvs), 0, FSL_IMX7_SNVS_HP_ADDR);
      /*
       * SRC
@@ -XXX,XX +XXX,XX @@ static void fsl_imx7_realize(DeviceState *dev, Error **errp)
      create_unimplemented_device("src", FSL_IMX7_SRC_ADDR, FSL_IMX7_SRC_SIZE);
      /*
 -     * Watchdog
 +     * Watchdogs
       */
      for (i = 0; i < FSL_IMX7_NUM_WDTS; i++) {
          static const hwaddr FSL_IMX7_WDOGn_ADDR[FSL_IMX7_NUM_WDTS] = {
@@ -XXX,XX +XXX,XX @@ static void fsl_imx7_realize(DeviceState *dev, Error **errp)
      create_unimplemented_device("caam", FSL_IMX7_CAAM_ADDR, FSL_IMX7_CAAM_SIZE);
      /*
 -     * PWM
 +     * PWMs
       */
 -    create_unimplemented_device("pwm1", FSL_IMX7_PWM1_ADDR, FSL_IMX7_PWMn_SIZE);
 -    create_unimplemented_device("pwm2", FSL_IMX7_PWM2_ADDR, FSL_IMX7_PWMn_SIZE);
 -    create_unimplemented_device("pwm3", FSL_IMX7_PWM3_ADDR, FSL_IMX7_PWMn_SIZE);
 -    create_unimplemented_device("pwm4", FSL_IMX7_PWM4_ADDR, FSL_IMX7_PWMn_SIZE);
 +    for (i = 0; i < FSL_IMX7_NUM_PWMS; i++) {
 +        static const hwaddr FSL_IMX7_PWMn_ADDR[FSL_IMX7_NUM_PWMS] = {
 +            FSL_IMX7_PWM1_ADDR,
 +            FSL_IMX7_PWM2_ADDR,
 +            FSL_IMX7_PWM3_ADDR,
 +            FSL_IMX7_PWM4_ADDR,
 +        };
 +
 +        snprintf(name, NAME_SIZE, "pwm%d", i);
 +        create_unimplemented_device(name, FSL_IMX7_PWMn_ADDR[i],
 +                                    FSL_IMX7_PWMn_SIZE);
 +    }
      /*
 -     * CAN
 +     * CANs
       */
 -    create_unimplemented_device("can1", FSL_IMX7_CAN1_ADDR, FSL_IMX7_CANn_SIZE);
 -    create_unimplemented_device("can2", FSL_IMX7_CAN2_ADDR, FSL_IMX7_CANn_SIZE);
 +    for (i = 0; i < FSL_IMX7_NUM_CANS; i++) {
 +        static const hwaddr FSL_IMX7_CANn_ADDR[FSL_IMX7_NUM_CANS] = {
 +            FSL_IMX7_CAN1_ADDR,
 +            FSL_IMX7_CAN2_ADDR,
 +        };
 +
 +        snprintf(name, NAME_SIZE, "can%d", i);
 +        create_unimplemented_device(name, FSL_IMX7_CANn_ADDR[i],
 +                                    FSL_IMX7_CANn_SIZE);
 +    }
      /*
 -     * SAI (Audio SSI (Synchronous Serial Interface))
 +     * SAIs (Audio SSI (Synchronous Serial Interface))
       */
 -    create_unimplemented_device("sai1", FSL_IMX7_SAI1_ADDR, FSL_IMX7_SAIn_SIZE);
 -    create_unimplemented_device("sai2", FSL_IMX7_SAI2_ADDR, FSL_IMX7_SAIn_SIZE);
 -    create_unimplemented_device("sai2", FSL_IMX7_SAI3_ADDR, FSL_IMX7_SAIn_SIZE);
 +    for (i = 0; i < FSL_IMX7_NUM_SAIS; i++) {
 +        static const hwaddr FSL_IMX7_SAIn_ADDR[FSL_IMX7_NUM_SAIS] = {
 +            FSL_IMX7_SAI1_ADDR,
 +            FSL_IMX7_SAI2_ADDR,
 +            FSL_IMX7_SAI3_ADDR,
 +        };
 +
 +        snprintf(name, NAME_SIZE, "sai%d", i);
 +        create_unimplemented_device(name, FSL_IMX7_SAIn_ADDR[i],
 +                                    FSL_IMX7_SAIn_SIZE);
 +    }
      /*
       * OCOTP
@@ -XXX,XX +XXX,XX @@ static void fsl_imx7_realize(DeviceState *dev, Error **errp)
      create_unimplemented_device("ocotp", FSL_IMX7_OCOTP_ADDR,
                                  FSL_IMX7_OCOTP_SIZE);
 +    /*
 +     * GPR
 +     */
      sysbus_realize(SYS_BUS_DEVICE(&s->gpr), &error_abort);
 -    sysbus_mmio_map(SYS_BUS_DEVICE(&s->gpr), 0, FSL_IMX7_GPR_ADDR);
 +    sysbus_mmio_map(SYS_BUS_DEVICE(&s->gpr), 0, FSL_IMX7_IOMUXC_GPR_ADDR);
 +    /*
 +     * PCIE
 +     */
      sysbus_realize(SYS_BUS_DEVICE(&s->pcie), &error_abort);
      sysbus_mmio_map(SYS_BUS_DEVICE(&s->pcie), 0, FSL_IMX7_PCIE_REG_ADDR);
@@ -XXX,XX +XXX,XX @@ static void fsl_imx7_realize(DeviceState *dev, Error **errp)
      irq = qdev_get_gpio_in(DEVICE(&s->a7mpcore), FSL_IMX7_PCI_INTD_IRQ);
      sysbus_connect_irq(SYS_BUS_DEVICE(&s->pcie), 3, irq);
 -
 +    /*
 +     * USBs
 +     */
      for (i = 0; i < FSL_IMX7_NUM_USBS; i++) {
          static const hwaddr FSL_IMX7_USBMISCn_ADDR[FSL_IMX7_NUM_USBS] = {
              FSL_IMX7_USBMISC1_ADDR,
@@ -XXX,XX +XXX,XX @@ static void fsl_imx7_realize(DeviceState *dev, Error **errp)
       */
      create_unimplemented_device("pcie-phy", FSL_IMX7_PCIE_PHY_ADDR,
                                  FSL_IMX7_PCIE_PHY_SIZE);
 +
  }
- static uint64_t imx6ul_analog_read(void *opaque, hwaddr offset, unsigned size)
+ static Property fsl_imx7_properties[] = {
@@ -XXX,XX +XXX,XX @@ static void imx6ul_analog_write(void *opaque, hwaddr offset, uint64_t value,
           * the REG_NAME register. So we change the value of the
           * REG_NAME register, setting bits passed in the value.
           */
 -        s->analog[index - 1] |= value;
 +        s->analog[index - 1] |= (value & ~analog_mask[index - 1]);
          break;
      case CCM_ANALOG_PLL_ARM_CLR:
      case CCM_ANALOG_PLL_USB1_CLR:
@@ -XXX,XX +XXX,XX @@ static void imx6ul_analog_write(void *opaque, hwaddr offset, uint64_t value,
           * the REG_NAME register. So we change the value of the
           * REG_NAME register, unsetting bits passed in the value.
           */
 -        s->analog[index - 2] &= ~value;
 +        s->analog[index - 2] &= ~(value & ~analog_mask[index - 2]);
          break;
      case CCM_ANALOG_PLL_ARM_TOG:
      case CCM_ANALOG_PLL_USB1_TOG:
@@ -XXX,XX +XXX,XX @@ static void imx6ul_analog_write(void *opaque, hwaddr offset, uint64_t value,
           * the REG_NAME register. So we change the value of the
           * REG_NAME register, toggling bits passed in the value.
           */
 -        s->analog[index - 3] ^= value;
 +        s->analog[index - 3] ^= (value & ~analog_mask[index - 3]);
          break;
      default:
 -        /*
 -         * We will do a better implementation later. In particular some bits
 -         * cannot be written to.
 -         */
 -        s->analog[index] = value;
 +        s->analog[index] = (s->analog[index] & analog_mask[index]) |
 +                           (value & ~analog_mask[index]);
          break;
      }
  }
 --
-2.20.1
+2.34.1
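
The imx6ul_analog_write() hunk above keeps read-only bits intact across plain, SET, CLR and TOG writes by masking the incoming value first. A minimal sketch of that idea follows, assuming a single hypothetical register bank and a made-up ro_mask; the names REG, REG_SET, REG_CLR, REG_TOG, ro_mask and reg_write() are illustrative only and are not taken from the QEMU source. As in the hunk, a set bit in the mask marks a bit the guest cannot change.

```c
#include <stdint.h>

/* Hypothetical register layout: one value register followed by its
 * SET, CLR and TOG aliases, mirroring the REG_NAME/_SET/_CLR/_TOG
 * convention described in the comments of the hunk above. */
enum { REG, REG_SET, REG_CLR, REG_TOG, NUM_REGS };

/* Assumption for illustration: bit 31 is read-only. */
static const uint32_t ro_mask = 0x80000000u;

static void reg_write(uint32_t *regs, unsigned index, uint32_t value)
{
    /* Only bits outside the read-only mask may take effect. */
    uint32_t writable = value & ~ro_mask;

    switch (index) {
    case REG_SET:
        regs[REG] |= writable;          /* set the requested bits */
        break;
    case REG_CLR:
        regs[REG] &= ~writable;         /* clear the requested bits */
        break;
    case REG_TOG:
        regs[REG] ^= writable;          /* toggle the requested bits */
        break;
    default:
        /* Plain write: keep read-only bits, replace the rest. */
        regs[REG] = (regs[REG] & ro_mask) | writable;
        break;
    }
}
```

With this shape, a CLR write of all ones leaves only the read-only bits set, which is the behaviour the masked expressions in the hunk aim for.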

-[PULL 21/23] hw/net/imx_fec: Convert debug fprintf() to trace events
+[PULL 15/24] Add i.MX7 missing TZ devices and memory regions
 From: Jean-Christophe Dubois <jcd@tribudubois.net>
+* Add TZASC as unimplemented device.
+  - Allow bare metal application to access this (unimplemented) device
+* Add CSU as unimplemented device.
+  - Allow bare metal application to access this (unimplemented) device
+* Add various memory segments
+  - OCRAM
+  - OCRAM EPDC
+  - OCRAM PXP
+  - OCRAM S
+  - ROM
+  - CAAM
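
As a sanity check on the memory segments listed above, the sketch below tests that they do not overlap. The addresses and sizes are copied from the FslIMX7MemoryMap values in this series. The struct region type, the imx7_mem table, the local KiB macro and the regions_disjoint() helper are hypothetical names introduced for illustration and are not part of QEMU.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define KiB 1024u   /* local stand-in for the KiB unit used in the header */

struct region {
    const char *name;
    uint32_t addr;
    uint32_t size;
};

/* Values taken from FslIMX7MemoryMap, sorted by ascending address. */
static const struct region imx7_mem[] = {
    { "rom",        0x00000000u,  96 * KiB },
    { "caam",       0x00100000u,  32 * KiB },
    { "ocram_s",    0x00180000u,  32 * KiB },
    { "ocram",      0x00900000u, 128 * KiB },
    { "ocram_epdc", 0x00920000u, 128 * KiB },
    { "ocram_pxp",  0x00940000u,  32 * KiB },
};

/* Hypothetical helper: returns true when no region runs into the next
 * one. Assumes the table is sorted by ascending address. */
static bool regions_disjoint(const struct region *r, size_t n)
{
    for (size_t i = 0; i + 1 < n; i++) {
        /* Widen to 64 bits so addr + size cannot wrap around. */
        if ((uint64_t)r[i].addr + r[i].size > r[i + 1].addr) {
            return false;
        }
    }
    return true;
}
```

With these values, OCRAM, OCRAM EPDC and OCRAM PXP sit back to back: each one ends exactly where the next begins.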
 Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Message-id: f887a3483996ba06d40bd62ffdfb0ecf68621987.1692964892.git.jcd@tribudubois.net
 [PMD: Fixed 32-bit format string using PRIx32/PRIx64]
 Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/net/imx_fec.c    | 106 +++++++++++++++++++-------------------------
+ include/hw/arm/fsl-imx7.h |  7 +++++
- hw/net/trace-events |  18 ++++++++
+ hw/arm/fsl-imx7.c         | 63 +++++++++++++++++++++++++++++++++++++++
- 2 files changed, 63 insertions(+), 61 deletions(-)
+ 2 files changed, 70 insertions(+)
-diff --git a/hw/net/imx_fec.c b/hw/net/imx_fec.c
+diff --git a/include/hw/arm/fsl-imx7.h b/include/hw/arm/fsl-imx7.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/net/imx_fec.c
+--- a/include/hw/arm/fsl-imx7.h
-+++ b/hw/net/imx_fec.c
++++ b/include/hw/arm/fsl-imx7.h
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ struct FslIMX7State {
- #include "qemu/module.h"
+     IMX7GPRState       gpr;
- #include "net/checksum.h"
+     ChipideaState      usb[FSL_IMX7_NUM_USBS];
- #include "net/eth.h"
+     DesignwarePCIEHost pcie;
-+#include "trace.h"
++    MemoryRegion       rom;
++    MemoryRegion       caam;
- /* For crc32 */
++    MemoryRegion       ocram;
- #include <zlib.h>
++    MemoryRegion       ocram_epdc;
++    MemoryRegion       ocram_pxp;
--#ifndef DEBUG_IMX_FEC
++    MemoryRegion       ocram_s;
--#define DEBUG_IMX_FEC 0
++
--#endif
+     uint32_t           phy_num[FSL_IMX7_NUM_ETHS];
--
+     bool               phy_connected[FSL_IMX7_NUM_ETHS];
--#define FEC_PRINTF(fmt, args...) \
+ };
--    do { \
+diff --git a/hw/arm/fsl-imx7.c b/hw/arm/fsl-imx7.c
--        if (DEBUG_IMX_FEC) { \
+index XXXXXXX..XXXXXXX 100644
--            fprintf(stderr, "[%s]%s: " fmt , TYPE_IMX_FEC, \
+--- a/hw/arm/fsl-imx7.c
--                                             __func__, ##args); \
++++ b/hw/arm/fsl-imx7.c
--        } \
+@@ -XXX,XX +XXX,XX @@ static void fsl_imx7_realize(DeviceState *dev, Error **errp)
--    } while (0)
+     create_unimplemented_device("pcie-phy", FSL_IMX7_PCIE_PHY_ADDR,
--
+                                 FSL_IMX7_PCIE_PHY_SIZE);
--#ifndef DEBUG_IMX_PHY
--#define DEBUG_IMX_PHY 0
++    /*
--#endif
++     * CSU
--
++     */
--#define PHY_PRINTF(fmt, args...) \
++    create_unimplemented_device("csu", FSL_IMX7_CSU_ADDR,
--    do { \
++                                FSL_IMX7_CSU_SIZE);
--        if (DEBUG_IMX_PHY) { \
++
--            fprintf(stderr, "[%s.phy]%s: " fmt , TYPE_IMX_FEC, \
++    /*
--                                                 __func__, ##args); \
++     * TZASC
--        } \
++     */
--    } while (0)
++    create_unimplemented_device("tzasc", FSL_IMX7_TZASC_ADDR,
--
++                                FSL_IMX7_TZASC_SIZE);
- #define IMX_MAX_DESC    1024
++
++    /*
- static const char *imx_default_reg_name(IMXFECState *s, uint32_t index)
++     * OCRAM memory
-@@ -XXX,XX +XXX,XX @@ static void imx_eth_update(IMXFECState *s);
++     */
-  * For now we don't handle any GPIO/interrupt line, so the OS will
++    memory_region_init_ram(&s->ocram, NULL, "imx7.ocram",
-  * have to poll for the PHY status.
++                           FSL_IMX7_OCRAM_MEM_SIZE,
-  */
++                           &error_abort);
--static void phy_update_irq(IMXFECState *s)
++    memory_region_add_subregion(get_system_memory(), FSL_IMX7_OCRAM_MEM_ADDR,
-+static void imx_phy_update_irq(IMXFECState *s)
++                                &s->ocram);
- {
++
-     imx_eth_update(s);
++    /*
 +     * OCRAM EPDC memory
 +     */
 +    memory_region_init_ram(&s->ocram_epdc, NULL, "imx7.ocram_epdc",
 +                           FSL_IMX7_OCRAM_EPDC_SIZE,
 +                           &error_abort);
 +    memory_region_add_subregion(get_system_memory(), FSL_IMX7_OCRAM_EPDC_ADDR,
 +                                &s->ocram_epdc);
 +
 +    /*
 +     * OCRAM PXP memory
 +     */
 +    memory_region_init_ram(&s->ocram_pxp, NULL, "imx7.ocram_pxp",
 +                           FSL_IMX7_OCRAM_PXP_SIZE,
 +                           &error_abort);
 +    memory_region_add_subregion(get_system_memory(), FSL_IMX7_OCRAM_PXP_ADDR,
 +                                &s->ocram_pxp);
 +
 +    /*
 +     * OCRAM_S memory
 +     */
 +    memory_region_init_ram(&s->ocram_s, NULL, "imx7.ocram_s",
 +                           FSL_IMX7_OCRAM_S_SIZE,
 +                           &error_abort);
 +    memory_region_add_subregion(get_system_memory(), FSL_IMX7_OCRAM_S_ADDR,
 +                                &s->ocram_s);
 +
 +    /*
 +     * ROM memory
 +     */
 +    memory_region_init_rom(&s->rom, OBJECT(dev), "imx7.rom",
 +                           FSL_IMX7_ROM_SIZE, &error_abort);
 +    memory_region_add_subregion(get_system_memory(), FSL_IMX7_ROM_ADDR,
 +                                &s->rom);
 +
 +    /*
 +     * CAAM memory
 +     */
 +    memory_region_init_rom(&s->caam, OBJECT(dev), "imx7.caam",
 +                           FSL_IMX7_CAAM_MEM_SIZE, &error_abort);
 +    memory_region_add_subregion(get_system_memory(), FSL_IMX7_CAAM_MEM_ADDR,
 +                                &s->caam);
  }
--static void phy_update_link(IMXFECState *s)
+ static Property fsl_imx7_properties[] = {
 +static void imx_phy_update_link(IMXFECState *s)
  {
      /* Autonegotiation status mirrors link status.  */
      if (qemu_get_queue(s->nic)->link_down) {
 -        PHY_PRINTF("link is down\n");
 +        trace_imx_phy_update_link("down");
          s->phy_status &= ~0x0024;
          s->phy_int |= PHY_INT_DOWN;
      } else {
 -        PHY_PRINTF("link is up\n");
 +        trace_imx_phy_update_link("up");
          s->phy_status |= 0x0024;
          s->phy_int |= PHY_INT_ENERGYON;
          s->phy_int |= PHY_INT_AUTONEG_COMPLETE;
      }
 -    phy_update_irq(s);
 +    imx_phy_update_irq(s);
  }
  static void imx_eth_set_link(NetClientState *nc)
  {
 -    phy_update_link(IMX_FEC(qemu_get_nic_opaque(nc)));
 +    imx_phy_update_link(IMX_FEC(qemu_get_nic_opaque(nc)));
  }
 -static void phy_reset(IMXFECState *s)
 +static void imx_phy_reset(IMXFECState *s)
  {
 +    trace_imx_phy_reset();
 +
      s->phy_status = 0x7809;
      s->phy_control = 0x3000;
      s->phy_advertise = 0x01e1;
      s->phy_int_mask = 0;
      s->phy_int = 0;
 -    phy_update_link(s);
 +    imx_phy_update_link(s);
  }
 -static uint32_t do_phy_read(IMXFECState *s, int reg)
 +static uint32_t imx_phy_read(IMXFECState *s, int reg)
  {
      uint32_t val;
@@ -XXX,XX +XXX,XX @@ static uint32_t do_phy_read(IMXFECState *s, int reg)
      case 29:    /* Interrupt source.  */
          val = s->phy_int;
          s->phy_int = 0;
 -        phy_update_irq(s);
 +        imx_phy_update_irq(s);
          break;
      case 30:    /* Interrupt mask */
          val = s->phy_int_mask;
@@ -XXX,XX +XXX,XX @@ static uint32_t do_phy_read(IMXFECState *s, int reg)
          break;
      }
 -    PHY_PRINTF("read 0x%04x @ %d\n", val, reg);
 +    trace_imx_phy_read(val, reg);
      return val;
  }
 -static void do_phy_write(IMXFECState *s, int reg, uint32_t val)
 +static void imx_phy_write(IMXFECState *s, int reg, uint32_t val)
  {
 -    PHY_PRINTF("write 0x%04x @ %d\n", val, reg);
 +    trace_imx_phy_write(val, reg);
      if (reg > 31) {
          /* we only advertise one phy */
@@ -XXX,XX +XXX,XX @@ static void do_phy_write(IMXFECState *s, int reg, uint32_t val)
      switch (reg) {
      case 0:     /* Basic Control */
          if (val & 0x8000) {
 -            phy_reset(s);
 +            imx_phy_reset(s);
          } else {
              s->phy_control = val & 0x7980;
              /* Complete autonegotiation immediately.  */
@@ -XXX,XX +XXX,XX @@ static void do_phy_write(IMXFECState *s, int reg, uint32_t val)
          break;
      case 30:    /* Interrupt mask */
          s->phy_int_mask = val & 0xff;
 -        phy_update_irq(s);
 +        imx_phy_update_irq(s);
          break;
      case 17:
      case 18:
@@ -XXX,XX +XXX,XX @@ static void do_phy_write(IMXFECState *s, int reg, uint32_t val)
  static void imx_fec_read_bd(IMXFECBufDesc *bd, dma_addr_t addr)
  {
      dma_memory_read(&address_space_memory, addr, bd, sizeof(*bd));
 +
 +    trace_imx_fec_read_bd(addr, bd->flags, bd->length, bd->data);
  }
  static void imx_fec_write_bd(IMXFECBufDesc *bd, dma_addr_t addr)
@@ -XXX,XX +XXX,XX @@ static void imx_fec_write_bd(IMXFECBufDesc *bd, dma_addr_t addr)
  static void imx_enet_read_bd(IMXENETBufDesc *bd, dma_addr_t addr)
  {
      dma_memory_read(&address_space_memory, addr, bd, sizeof(*bd));
 +
 +    trace_imx_enet_read_bd(addr, bd->flags, bd->length, bd->data,
 +                   bd->option, bd->status);
  }
  static void imx_enet_write_bd(IMXENETBufDesc *bd, dma_addr_t addr)
@@ -XXX,XX +XXX,XX @@ static void imx_fec_do_tx(IMXFECState *s)
          int len;
          imx_fec_read_bd(&bd, addr);
 -        FEC_PRINTF("tx_bd %x flags %04x len %d data %08x\n",
 -                   addr, bd.flags, bd.length, bd.data);
          if ((bd.flags & ENET_BD_R) == 0) {
 +
              /* Run out of descriptors to transmit.  */
 -            FEC_PRINTF("tx_bd ran out of descriptors to transmit\n");
 +            trace_imx_eth_tx_bd_busy();
 +
              break;
          }
          len = bd.length;
@@ -XXX,XX +XXX,XX @@ static void imx_enet_do_tx(IMXFECState *s, uint32_t index)
          int len;
          imx_enet_read_bd(&bd, addr);
 -        FEC_PRINTF("tx_bd %x flags %04x len %d data %08x option %04x "
 -                   "status %04x\n", addr, bd.flags, bd.length, bd.data,
 -                   bd.option, bd.status);
          if ((bd.flags & ENET_BD_R) == 0) {
              /* Run out of descriptors to transmit.  */
 +
 +            trace_imx_eth_tx_bd_busy();
 +
              break;
          }
          len = bd.length;
@@ -XXX,XX +XXX,XX @@ static void imx_eth_enable_rx(IMXFECState *s, bool flush)
      s->regs[ENET_RDAR] = (bd.flags & ENET_BD_E) ? ENET_RDAR_RDAR : 0;
      if (!s->regs[ENET_RDAR]) {
 -        FEC_PRINTF("RX buffer full\n");
 +        trace_imx_eth_rx_bd_full();
      } else if (flush) {
          qemu_flush_queued_packets(qemu_get_queue(s->nic));
      }
@@ -XXX,XX +XXX,XX @@ static void imx_eth_reset(DeviceState *d)
      memset(s->tx_descriptor, 0, sizeof(s->tx_descriptor));
      /* We also reset the PHY */
 -    phy_reset(s);
 +    imx_phy_reset(s);
  }
  static uint32_t imx_default_read(IMXFECState *s, uint32_t index)
@@ -XXX,XX +XXX,XX @@ static uint64_t imx_eth_read(void *opaque, hwaddr offset, unsigned size)
          break;
      }
 -    FEC_PRINTF("reg[%s] => 0x%" PRIx32 "\n", imx_eth_reg_name(s, index),
 -                                              value);
 +    trace_imx_eth_read(index, imx_eth_reg_name(s, index), value);
      return value;
  }
@@ -XXX,XX +XXX,XX @@ static void imx_eth_write(void *opaque, hwaddr offset, uint64_t value,
      const bool single_tx_ring = !imx_eth_is_multi_tx_ring(s);
      uint32_t index = offset >> 2;
 -    FEC_PRINTF("reg[%s] <= 0x%" PRIx32 "\n", imx_eth_reg_name(s, index),
 -                (uint32_t)value);
 +    trace_imx_eth_write(index, imx_eth_reg_name(s, index), value);
      switch (index) {
      case ENET_EIR:
@@ -XXX,XX +XXX,XX @@ static void imx_eth_write(void *opaque, hwaddr offset, uint64_t value,
          if (extract32(value, 29, 1)) {
              /* This is a read operation */
              s->regs[ENET_MMFR] = deposit32(s->regs[ENET_MMFR], 0, 16,
 -                                           do_phy_read(s,
 +                                           imx_phy_read(s,
                                                         extract32(value,
                                                          18, 10)));
          } else {
              /* This is a write operation */
 -            do_phy_write(s, extract32(value, 18, 10), extract32(value, 0, 16));
 +            imx_phy_write(s, extract32(value, 18, 10), extract32(value, 0, 16));
          }
          /* raise the interrupt as the PHY operation is done */
          s->regs[ENET_EIR] |= ENET_INT_MII;
@@ -XXX,XX +XXX,XX @@ static bool imx_eth_can_receive(NetClientState *nc)
  {
      IMXFECState *s = IMX_FEC(qemu_get_nic_opaque(nc));
 -    FEC_PRINTF("\n");
 -
      return !!s->regs[ENET_RDAR];
  }
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_fec_receive(NetClientState *nc, const uint8_t *buf,
      unsigned int buf_len;
      size_t size = len;
 -    FEC_PRINTF("len %d\n", (int)size);
 +    trace_imx_fec_receive(size);
      if (!s->regs[ENET_RDAR]) {
          qemu_log_mask(LOG_GUEST_ERROR, "[%s]%s: Unexpected packet\n",
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_fec_receive(NetClientState *nc, const uint8_t *buf,
          bd.length = buf_len;
          size -= buf_len;
 -        FEC_PRINTF("rx_bd 0x%x length %d\n", addr, bd.length);
 +        trace_imx_fec_receive_len(addr, bd.length);
          /* The last 4 bytes are the CRC.  */
          if (size < 4) {
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_fec_receive(NetClientState *nc, const uint8_t *buf,
          if (size == 0) {
              /* Last buffer in frame.  */
              bd.flags |= flags | ENET_BD_L;
 -            FEC_PRINTF("rx frame flags %04x\n", bd.flags);
 +
 +            trace_imx_fec_receive_last(bd.flags);
 +
              s->regs[ENET_EIR] |= ENET_INT_RXF;
          } else {
              s->regs[ENET_EIR] |= ENET_INT_RXB;
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_enet_receive(NetClientState *nc, const uint8_t *buf,
      size_t size = len;
      bool shift16 = s->regs[ENET_RACC] & ENET_RACC_SHIFT16;
 -    FEC_PRINTF("len %d\n", (int)size);
 +    trace_imx_enet_receive(size);
      if (!s->regs[ENET_RDAR]) {
          qemu_log_mask(LOG_GUEST_ERROR, "[%s]%s: Unexpected packet\n",
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_enet_receive(NetClientState *nc, const uint8_t *buf,
          bd.length = buf_len;
          size -= buf_len;
 -        FEC_PRINTF("rx_bd 0x%x length %d\n", addr, bd.length);
 +        trace_imx_enet_receive_len(addr, bd.length);
          /* The last 4 bytes are the CRC.  */
          if (size < 4) {
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_enet_receive(NetClientState *nc, const uint8_t *buf,
          if (size == 0) {
              /* Last buffer in frame.  */
              bd.flags |= flags | ENET_BD_L;
 -            FEC_PRINTF("rx frame flags %04x\n", bd.flags);
 +
 +            trace_imx_enet_receive_last(bd.flags);
 +
              /* Indicate that we've updated the last buffer descriptor. */
              bd.last_buffer = ENET_BD_BDU;
              if (bd.option & ENET_BD_RX_INT) {
 diff --git a/hw/net/trace-events b/hw/net/trace-events
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/net/trace-events
 +++ b/hw/net/trace-events
@@ -XXX,XX +XXX,XX @@ i82596_receive_packet(size_t sz) "len=%zu"
  i82596_new_mac(const char *id_with_mac) "New MAC for: %s"
  i82596_set_multicast(uint16_t count) "Added %d multicast entries"
  i82596_channel_attention(void *s) "%p: Received CHANNEL ATTENTION"
 +
 +# imx_fec.c
 +imx_phy_read(uint32_t val, int reg) "0x%04"PRIx32" <= reg[%d]"
 +imx_phy_write(uint32_t val, int reg) "0x%04"PRIx32" => reg[%d]"
 +imx_phy_update_link(const char *s) "%s"
 +imx_phy_reset(void) ""
 +imx_fec_read_bd(uint64_t addr, int flags, int len, int data) "tx_bd 0x%"PRIx64" flags 0x%04x len %d data 0x%08x"
 +imx_enet_read_bd(uint64_t addr, int flags, int len, int data, int options, int status) "tx_bd 0x%"PRIx64" flags 0x%04x len %d data 0x%08x option 0x%04x status 0x%04x"
 +imx_eth_tx_bd_busy(void) "tx_bd ran out of descriptors to transmit"
 +imx_eth_rx_bd_full(void) "RX buffer is full"
 +imx_eth_read(int reg, const char *reg_name, uint32_t value) "reg[%d:%s] => 0x%08"PRIx32
 +imx_eth_write(int reg, const char *reg_name, uint64_t value) "reg[%d:%s] <= 0x%08"PRIx64
 +imx_fec_receive(size_t size) "len %zu"
 +imx_fec_receive_len(uint64_t addr, int len) "rx_bd 0x%"PRIx64" length %d"
 +imx_fec_receive_last(int last) "rx frame flags 0x%04x"
 +imx_enet_receive(size_t size) "len %zu"
 +imx_enet_receive_len(uint64_t addr, int len) "rx_bd 0x%"PRIx64" length %d"
 +imx_enet_receive_last(int last) "rx frame flags 0x%04x"
 --
-2.20.1
+2.34.1

-[PULL 07/23] target/arm: Convert Neon 3-reg-diff polynomial VMULL
+[PULL 16/24] Add i.MX7 SRC device implementation
-Convert the Neon 3-reg-diff insn polynomial VMULL. This is the last
+From: Jean-Christophe Dubois <jcd@tribudubois.net>
-insn in this group to be converted.
+The SRC device is normally used to start the secondary CPU.
 When running Linux directly, QEMU emulates the PSCI interface that U-Boot
 installs at boot time, so the fact that the SRC device is unimplemented
 is hidden: QEMU responds directly to PSCI requests without
 using the SRC device.
 But if you try to run a more bare-metal application (perhaps U-Boot itself),
 then it is not possible to start the secondary CPU, because the SRC is an
 unimplemented device.
 This patch adds the ability to start the secondary CPU through the SRC
 device so that you can use this feature in bare metal applications.
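
As a rough illustration of what a bare-metal guest might do with the device added here, the sketch below follows the behaviour of the emulated SRC in this patch: the model passes SRC_GPR3 and SRC_GPR4 to arm_set_cpu_on() as entry point and context ID when the CORE1 enable bit in SRC_A7RCR1 changes to 1. The helper name src_start_core1() is hypothetical, and the register roles are taken from the QEMU model in this patch, not checked against the reference manual.

```c
#include <assert.h>
#include <stdint.h>

/* Register indices (32-bit word offsets), copied from imx7_src.h in this patch. */
#define SRC_A7RCR1        2
#define SRC_GPR3          31
#define SRC_GPR4          32
#define SRC_MAX           39

/* R_CORE1_ENABLE_SHIFT is 1 in this patch. */
#define CORE1_ENABLE_BIT  (1u << 1)

/*
 * Hypothetical guest-side helper: ask the SRC to bring up core 1.
 * 'src' points at the start of the SRC register window; on the guest
 * this would be FSL_IMX7_SRC_ADDR (0x30390000) from fsl-imx7.h.
 */
static void src_start_core1(volatile uint32_t *src,
                            uint32_t entry, uint32_t context_id)
{
    src[SRC_GPR3] = entry;                /* start address for core 1 */
    src[SRC_GPR4] = context_id;           /* value handed to the new core */
    src[SRC_A7RCR1] |= CORE1_ENABLE_BIT;  /* 0 -> 1 transition starts the core */
}
```

The two GPR writes come first because the model reads both registers at the moment the enable bit changes. Clearing the same bit again makes the model call arm_set_cpu_off(1).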
 Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Message-id: ce9a0162defd2acee5dc7f8a674743de0cded569.1692964892.git.jcd@tribudubois.net
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/neon-dp.decode       |  2 ++
+ include/hw/arm/fsl-imx7.h  |   3 +-
- target/arm/translate-neon.inc.c | 43 +++++++++++++++++++++++
+ include/hw/misc/imx7_src.h |  66 +++++++++
- target/arm/translate.c          | 60 ++-------------------------------
+ hw/arm/fsl-imx7.c          |   8 +-
- 3 files changed, 48 insertions(+), 57 deletions(-)
+ hw/misc/imx7_src.c         | 276 +++++++++++++++++++++++++++++++++++++
+ hw/misc/meson.build        |   1 +
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+ hw/misc/trace-events       |   4 +
  6 files changed, 356 insertions(+), 2 deletions(-)
  create mode 100644 include/hw/misc/imx7_src.h
  create mode 100644 hw/misc/imx7_src.c
 diff --git a/include/hw/arm/fsl-imx7.h b/include/hw/arm/fsl-imx7.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
+--- a/include/hw/arm/fsl-imx7.h
-+++ b/target/arm/neon-dp.decode
++++ b/include/hw/arm/fsl-imx7.h
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+@@ -XXX,XX +XXX,XX @@
-     VMULL_U_3d   1111 001 1 1 . .. .... .... 1100 . 0 . 0 .... @3diff
+ #include "hw/misc/imx7_ccm.h"
+ #include "hw/misc/imx7_snvs.h"
-     VQDMULL_3d   1111 001 0 1 . .. .... .... 1101 . 0 . 0 .... @3diff
+ #include "hw/misc/imx7_gpr.h"
-+
++#include "hw/misc/imx7_src.h"
-+    VMULL_P_3d   1111 001 0 1 . .. .... .... 1110 . 0 . 0 .... @3diff
+ #include "hw/watchdog/wdt_imx2.h"
-   ]
+ #include "hw/gpio/imx_gpio.h"
- }
+ #include "hw/char/imx_serial.h"
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
+@@ -XXX,XX +XXX,XX @@ struct FslIMX7State {
      IMX7CCMState       ccm;
      IMX7AnalogState    analog;
      IMX7SNVSState      snvs;
 +    IMX7SRCState       src;
      IMXGPCv2State      gpcv2;
      IMXSPIState        spi[FSL_IMX7_NUM_ECSPIS];
      IMXI2CState        i2c[FSL_IMX7_NUM_I2CS];
@@ -XXX,XX +XXX,XX @@ enum FslIMX7MemoryMap {
      FSL_IMX7_GPC_ADDR             = 0x303A0000,
      FSL_IMX7_SRC_ADDR             = 0x30390000,
 -    FSL_IMX7_SRC_SIZE             = (4 * KiB),
      FSL_IMX7_CCM_ADDR             = 0x30380000,
 diff --git a/include/hw/misc/imx7_src.h b/include/hw/misc/imx7_src.h
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/include/hw/misc/imx7_src.h
@@ -XXX,XX +XXX,XX @@
 +/*
 + * IMX7 System Reset Controller
 + *
 + * Copyright (C) 2023 Jean-Christophe Dubois <jcd@tribudubois.net>
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2 or later.
 + * See the COPYING file in the top-level directory.
 + */
 +
 +#ifndef IMX7_SRC_H
 +#define IMX7_SRC_H
 +
 +#include "hw/sysbus.h"
 +#include "qemu/bitops.h"
 +#include "qom/object.h"
 +
 +#define SRC_SCR 0
 +#define SRC_A7RCR0 1
 +#define SRC_A7RCR1 2
 +#define SRC_M4RCR 3
 +#define SRC_ERCR 5
 +#define SRC_HSICPHY_RCR 7
 +#define SRC_USBOPHY1_RCR 8
 +#define SRC_USBOPHY2_RCR 9
 +#define SRC_MPIPHY_RCR 10
 +#define SRC_PCIEPHY_RCR 11
 +#define SRC_SBMR1 22
 +#define SRC_SRSR 23
 +#define SRC_SISR 26
 +#define SRC_SIMR 27
 +#define SRC_SBMR2 28
 +#define SRC_GPR1 29
 +#define SRC_GPR2 30
 +#define SRC_GPR3 31
 +#define SRC_GPR4 32
 +#define SRC_GPR5 33
 +#define SRC_GPR6 34
 +#define SRC_GPR7 35
 +#define SRC_GPR8 36
 +#define SRC_GPR9 37
 +#define SRC_GPR10 38
 +#define SRC_MAX 39
 +
 +/* SRC_A7RCR1 */
 +#define R_CORE1_ENABLE_SHIFT     1
 +#define R_CORE1_ENABLE_LENGTH    1
 +/* SRC_A7RCR0 */
 +#define R_CORE1_RST_SHIFT        5
 +#define R_CORE1_RST_LENGTH       1
 +#define R_CORE0_RST_SHIFT        4
 +#define R_CORE0_RST_LENGTH       1
 +
 +#define TYPE_IMX7_SRC "imx7.src"
 +OBJECT_DECLARE_SIMPLE_TYPE(IMX7SRCState, IMX7_SRC)
 +
 +struct IMX7SRCState {
 +    /* <private> */
 +    SysBusDevice parent_obj;
 +
 +    /* <public> */
 +    MemoryRegion iomem;
 +
 +    uint32_t regs[SRC_MAX];
 +};
 +
 +#endif /* IMX7_SRC_H */
 diff --git a/hw/arm/fsl-imx7.c b/hw/arm/fsl-imx7.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
+--- a/hw/arm/fsl-imx7.c
-+++ b/target/arm/translate-neon.inc.c
++++ b/hw/arm/fsl-imx7.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VQDMLSL_3d(DisasContext *s, arg_3diff *a)
+@@ -XXX,XX +XXX,XX @@ static void fsl_imx7_init(Object *obj)
+      */
-     return do_long_3d(s, a, opfn[a->size], accfn[a->size]);
+     object_initialize_child(obj, "gpcv2", &s->gpcv2, TYPE_IMX_GPCV2);
- }
-+
++    /*
-+static bool trans_VMULL_P_3d(DisasContext *s, arg_3diff *a)
++     * SRC
-+{
++     */
-+    gen_helper_gvec_3 *fn_gvec;
++    object_initialize_child(obj, "src", &s->src, TYPE_IMX7_SRC);
 +
-+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+     /*
-+        return false;
+      * ECSPIs
       */
@@ -XXX,XX +XXX,XX @@ static void fsl_imx7_realize(DeviceState *dev, Error **errp)
      /*
       * SRC
       */
 -    create_unimplemented_device("src", FSL_IMX7_SRC_ADDR, FSL_IMX7_SRC_SIZE);
 +    sysbus_realize(SYS_BUS_DEVICE(&s->src), &error_abort);
 +    sysbus_mmio_map(SYS_BUS_DEVICE(&s->src), 0, FSL_IMX7_SRC_ADDR);
      /*
       * Watchdogs
 diff --git a/hw/misc/imx7_src.c b/hw/misc/imx7_src.c
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/hw/misc/imx7_src.c
@@ -XXX,XX +XXX,XX @@
 +/*
 + * IMX7 System Reset Controller
 + *
 + * Copyright (c) 2023 Jean-Christophe Dubois <jcd@tribudubois.net>
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2 or later.
 + * See the COPYING file in the top-level directory.
 + *
 + */
 +
 +#include "qemu/osdep.h"
 +#include "hw/misc/imx7_src.h"
 +#include "migration/vmstate.h"
 +#include "qemu/bitops.h"
 +#include "qemu/log.h"
 +#include "qemu/main-loop.h"
 +#include "qemu/module.h"
 +#include "target/arm/arm-powerctl.h"
 +#include "hw/core/cpu.h"
 +#include "hw/registerfields.h"
 +
 +#include "trace.h"
 +
 +static const char *imx7_src_reg_name(uint32_t reg)
 +{
 +    static char unknown[20];
 +
 +    switch (reg) {
 +    case SRC_SCR:
 +        return "SRC_SCR";
 +    case SRC_A7RCR0:
 +        return "SRC_A7RCR0";
 +    case SRC_A7RCR1:
 +        return "SRC_A7RCR1";
 +    case SRC_M4RCR:
 +        return "SRC_M4RCR";
 +    case SRC_ERCR:
 +        return "SRC_ERCR";
 +    case SRC_HSICPHY_RCR:
 +        return "SRC_HSICPHY_RCR";
 +    case SRC_USBOPHY1_RCR:
 +        return "SRC_USBOPHY1_RCR";
 +    case SRC_USBOPHY2_RCR:
 +        return "SRC_USBOPHY2_RCR";
 +    case SRC_PCIEPHY_RCR:
 +        return "SRC_PCIEPHY_RCR";
 +    case SRC_SBMR1:
 +        return "SRC_SBMR1";
 +    case SRC_SRSR:
 +        return "SRC_SRSR";
 +    case SRC_SISR:
 +        return "SRC_SISR";
 +    case SRC_SIMR:
 +        return "SRC_SIMR";
 +    case SRC_SBMR2:
 +        return "SRC_SBMR2";
 +    case SRC_GPR1:
 +        return "SRC_GPR1";
 +    case SRC_GPR2:
 +        return "SRC_GPR2";
 +    case SRC_GPR3:
 +        return "SRC_GPR3";
 +    case SRC_GPR4:
 +        return "SRC_GPR4";
 +    case SRC_GPR5:
 +        return "SRC_GPR5";
 +    case SRC_GPR6:
 +        return "SRC_GPR6";
 +    case SRC_GPR7:
 +        return "SRC_GPR7";
 +    case SRC_GPR8:
 +        return "SRC_GPR8";
 +    case SRC_GPR9:
 +        return "SRC_GPR9";
 +    case SRC_GPR10:
 +        return "SRC_GPR10";
 +    default:
 +        sprintf(unknown, "%u ?", reg);
 +        return unknown;
 +    }
-+
++}
-+    /* UNDEF accesses to D16-D31 if they don't exist. */
++
-+    if (!dc_isar_feature(aa32_simd_r32, s) &&
++static const VMStateDescription vmstate_imx7_src = {
-+        ((a->vd | a->vn | a->vm) & 0x10)) {
++    .name = TYPE_IMX7_SRC,
-+        return false;
++    .version_id = 1,
 +    .minimum_version_id = 1,
 +    .fields = (VMStateField[]) {
 +        VMSTATE_UINT32_ARRAY(regs, IMX7SRCState, SRC_MAX),
 +        VMSTATE_END_OF_LIST()
 +    },
 +};
 +
 +static void imx7_src_reset(DeviceState *dev)
 +{
 +    IMX7SRCState *s = IMX7_SRC(dev);
 +
 +    memset(s->regs, 0, sizeof(s->regs));
 +
 +    /* Set reset values */
 +    s->regs[SRC_SCR] = 0xA0;
 +    s->regs[SRC_SRSR] = 0x1;
 +    s->regs[SRC_SIMR] = 0x1F;
 +}
 +
 +static uint64_t imx7_src_read(void *opaque, hwaddr offset, unsigned size)
 +{
 +    uint32_t value = 0;
 +    IMX7SRCState *s = (IMX7SRCState *)opaque;
 +    uint32_t index = offset >> 2;
 +
 +    if (index < SRC_MAX) {
 +        value = s->regs[index];
 +    } else {
 +        qemu_log_mask(LOG_GUEST_ERROR, "[%s]%s: Bad register at offset 0x%"
 +                      HWADDR_PRIx "\n", TYPE_IMX7_SRC, __func__, offset);
 +    }
 +
-+    if (a->vd & 1) {
++    trace_imx7_src_read(imx7_src_reg_name(index), value);
-+        return false;
++
 +    return value;
 +}
 +
 +
 +/*
 + * The reset is asynchronous so we need to defer clearing the reset
 + * bit until the work is completed.
 + */
 +
 +struct SRCSCRResetInfo {
 +    IMX7SRCState *s;
 +    uint32_t reset_bit;
 +};
 +
 +static void imx7_clear_reset_bit(CPUState *cpu, run_on_cpu_data data)
 +{
 +    struct SRCSCRResetInfo *ri = data.host_ptr;
 +    IMX7SRCState *s = ri->s;
 +
 +    assert(qemu_mutex_iothread_locked());
 +
 +    s->regs[SRC_A7RCR0] = deposit32(s->regs[SRC_A7RCR0], ri->reset_bit, 1, 0);
 +
 +    trace_imx7_src_write(imx7_src_reg_name(SRC_A7RCR0), s->regs[SRC_A7RCR0]);
 +
 +    g_free(ri);
 +}
 +
 +static void imx7_defer_clear_reset_bit(uint32_t cpuid,
 +                                       IMX7SRCState *s,
 +                                       uint32_t reset_shift)
 +{
 +    struct SRCSCRResetInfo *ri;
 +    CPUState *cpu = arm_get_cpu_by_id(cpuid);
 +
 +    if (!cpu) {
 +        return;
 +    }
 +
-+    switch (a->size) {
++    ri = g_new(struct SRCSCRResetInfo, 1);
-+    case 0:
++    ri->s = s;
-+        fn_gvec = gen_helper_neon_pmull_h;
++    ri->reset_bit = reset_shift;
 +
 +    async_run_on_cpu(cpu, imx7_clear_reset_bit, RUN_ON_CPU_HOST_PTR(ri));
 +}
 +
 +
 +static void imx7_src_write(void *opaque, hwaddr offset, uint64_t value,
 +                           unsigned size)
 +{
 +    IMX7SRCState *s = (IMX7SRCState *)opaque;
 +    uint32_t index = offset >> 2;
 +    long unsigned int change_mask;
 +    uint32_t current_value = value;
 +
 +    if (index >= SRC_MAX) {
 +        qemu_log_mask(LOG_GUEST_ERROR, "[%s]%s: Bad register at offset 0x%"
 +                      HWADDR_PRIx "\n", TYPE_IMX7_SRC, __func__, offset);
 +        return;
 +    }
 +
 +    trace_imx7_src_write(imx7_src_reg_name(index), current_value);
 +
 +    change_mask = s->regs[index] ^ (uint32_t)current_value;
 +
 +    switch (index) {
 +    case SRC_A7RCR0:
 +        if (FIELD_EX32(change_mask, CORE0, RST)) {
 +            arm_reset_cpu(0);
 +            imx7_defer_clear_reset_bit(0, s, R_CORE0_RST_SHIFT);
 +        }
 +        if (FIELD_EX32(change_mask, CORE1, RST)) {
 +            arm_reset_cpu(1);
 +            imx7_defer_clear_reset_bit(1, s, R_CORE1_RST_SHIFT);
 +        }
 +        s->regs[index] = current_value;
 +        break;
-+    case 2:
++    case SRC_A7RCR1:
-+        if (!dc_isar_feature(aa32_pmull, s)) {
++        /*
-+            return false;
++         * On real hardware when the system reset controller starts a
 +         * secondary CPU it runs through some boot ROM code which reads
 +         * the SRC_GPRX registers controlling the start address and branches
 +         * to it.
 +         * Here we are taking a short cut and branching directly to the
 +         * requested address (we don't want to run the boot ROM code inside
 +         * QEMU)
 +         */
 +        if (FIELD_EX32(change_mask, CORE1, ENABLE)) {
 +            if (FIELD_EX32(current_value, CORE1, ENABLE)) {
 +                /* CORE 1 is brought up */
 +                arm_set_cpu_on(1, s->regs[SRC_GPR3], s->regs[SRC_GPR4],
 +                               3, false);
 +            } else {
 +                /* CORE 1 is shut down */
 +                arm_set_cpu_off(1);
 +            }
 +            /* We clear the reset bits as the processor changed state */
 +            imx7_defer_clear_reset_bit(1, s, R_CORE1_RST_SHIFT);
 +            clear_bit(R_CORE1_RST_SHIFT, &change_mask);
 +        }
-+        fn_gvec = gen_helper_gvec_pmull_q;
++        s->regs[index] = current_value;
 +        break;
 +    default:
-+        return false;
++        s->regs[index] = current_value;
 +        break;
 +    }
-+
++}
-+    if (!vfp_access_check(s)) {
++
-+        return true;
++static const struct MemoryRegionOps imx7_src_ops = {
-+    }
++    .read = imx7_src_read,
-+
++    .write = imx7_src_write,
-+    tcg_gen_gvec_3_ool(neon_reg_offset(a->vd, 0),
++    .endianness = DEVICE_NATIVE_ENDIAN,
-+                       neon_reg_offset(a->vn, 0),
++    .valid = {
-+                       neon_reg_offset(a->vm, 0),
++        /*
-+                       16, 16, 0, fn_gvec);
++         * Our device would not work correctly if the guest was doing
-+    return true;
++         * unaligned access. This might not be a limitation on the real
-+}
++         * device but in practice there is no reason for a guest to access
-diff --git a/target/arm/translate.c b/target/arm/translate.c
++         * this device unaligned.
 +         */
 +        .min_access_size = 4,
 +        .max_access_size = 4,
 +        .unaligned = false,
 +    },
 +};
 +
 +static void imx7_src_realize(DeviceState *dev, Error **errp)
 +{
 +    IMX7SRCState *s = IMX7_SRC(dev);
 +
 +    memory_region_init_io(&s->iomem, OBJECT(dev), &imx7_src_ops, s,
 +                          TYPE_IMX7_SRC, 0x1000);
 +    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &s->iomem);
 +}
 +
 +static void imx7_src_class_init(ObjectClass *klass, void *data)
 +{
 +    DeviceClass *dc = DEVICE_CLASS(klass);
 +
 +    dc->realize = imx7_src_realize;
 +    dc->reset = imx7_src_reset;
 +    dc->vmsd = &vmstate_imx7_src;
 +    dc->desc = "i.MX7 System Reset Controller";
 +}
 +
 +static const TypeInfo imx7_src_info = {
 +    .name          = TYPE_IMX7_SRC,
 +    .parent        = TYPE_SYS_BUS_DEVICE,
 +    .instance_size = sizeof(IMX7SRCState),
 +    .class_init    = imx7_src_class_init,
 +};
 +
 +static void imx7_src_register_types(void)
 +{
 +    type_register_static(&imx7_src_info);
 +}
 +
 +type_init(imx7_src_register_types)
 diff --git a/hw/misc/meson.build b/hw/misc/meson.build
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/hw/misc/meson.build
-+++ b/target/arm/translate.c
++++ b/hw/misc/meson.build
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ system_ss.add(when: 'CONFIG_IMX', if_true: files(
- {
+   'imx6_src.c',
-     int op;
+   'imx6ul_ccm.c',
-     int q;
+   'imx7_ccm.c',
--    int rd, rn, rm, rd_ofs, rn_ofs, rm_ofs;
++  'imx7_src.c',
-+    int rd, rn, rm, rd_ofs, rm_ofs;
+   'imx7_gpr.c',
-     int size;
+   'imx7_snvs.c',
-     int pass;
+   'imx_ccm.c',
-     int u;
+diff --git a/hw/misc/trace-events b/hw/misc/trace-events
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+index XXXXXXX..XXXXXXX 100644
-     size = (insn >> 20) & 3;
+--- a/hw/misc/trace-events
-     vec_size = q ? 16 : 8;
++++ b/hw/misc/trace-events
-     rd_ofs = neon_reg_offset(rd, 0);
+@@ -XXX,XX +XXX,XX @@ ccm_clock_freq(uint32_t clock, uint32_t freq) "(Clock = %d) = %d"
--    rn_ofs = neon_reg_offset(rn, 0);
+ ccm_read_reg(const char *reg_name, uint32_t value) "reg[%s] <= 0x%" PRIx32
-     rm_ofs = neon_reg_offset(rm, 0);
+ ccm_write_reg(const char *reg_name, uint32_t value) "reg[%s] => 0x%" PRIx32
-     if ((insn & (1 << 23)) == 0) {
++# imx7_src.c
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
++imx7_src_read(const char *reg_name, uint32_t value) "reg[%s] => 0x%" PRIx32
-         if (size != 3) {
++imx7_src_write(const char *reg_name, uint32_t value) "reg[%s] <= 0x%" PRIx32
-             op = (insn >> 8) & 0xf;
++
-             if ((insn & (1 << 6)) == 0) {
+ # iotkit-sysinfo.c
--                /* Three registers of different lengths.  */
+ iotkit_sysinfo_read(uint64_t offset, uint64_t data, unsigned size) "IoTKit SysInfo read: offset 0x%" PRIx64 " data 0x%" PRIx64 " size %u"
--                /* undefreq: bit 0 : UNDEF if size == 0
+ iotkit_sysinfo_write(uint64_t offset, uint64_t data, unsigned size) "IoTKit SysInfo write: offset 0x%" PRIx64 " data 0x%" PRIx64 " size %u"
 -                 *           bit 1 : UNDEF if size == 1
 -                 *           bit 2 : UNDEF if size == 2
 -                 *           bit 3 : UNDEF if U == 1
 -                 * Note that [2:0] set implies 'always UNDEF'
 -                 */
 -                int undefreq;
 -                /* prewiden, src1_wide, src2_wide, undefreq */
 -                static const int neon_3reg_wide[16][4] = {
 -                    {0, 0, 0, 7}, /* VADDL: handled by decodetree */
 -                    {0, 0, 0, 7}, /* VADDW: handled by decodetree */
 -                    {0, 0, 0, 7}, /* VSUBL: handled by decodetree */
 -                    {0, 0, 0, 7}, /* VSUBW: handled by decodetree */
 -                    {0, 0, 0, 7}, /* VADDHN: handled by decodetree */
 -                    {0, 0, 0, 7}, /* VABAL */
 -                    {0, 0, 0, 7}, /* VSUBHN: handled by decodetree */
 -                    {0, 0, 0, 7}, /* VABDL */
 -                    {0, 0, 0, 7}, /* VMLAL */
 -                    {0, 0, 0, 7}, /* VQDMLAL */
 -                    {0, 0, 0, 7}, /* VMLSL */
 -                    {0, 0, 0, 7}, /* VQDMLSL */
 -                    {0, 0, 0, 7}, /* Integer VMULL */
 -                    {0, 0, 0, 7}, /* VQDMULL */
 -                    {0, 0, 0, 0xa}, /* Polynomial VMULL */
 -                    {0, 0, 0, 7}, /* Reserved: always UNDEF */
 -                };
 -
 -                undefreq = neon_3reg_wide[op][3];
 -
 -                if ((undefreq & (1 << size)) ||
 -                    ((undefreq & 8) && u)) {
 -                    return 1;
 -                }
 -                if (rd & 1) {
 -                    return 1;
 -                }
 -
 -                /* Handle polynomial VMULL in a single pass.  */
 -                if (op == 14) {
 -                    if (size == 0) {
 -                        /* VMULL.P8 */
 -                        tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, 16, 16,
 -                                           0, gen_helper_neon_pmull_h);
 -                    } else {
 -                        /* VMULL.P64 */
 -                        if (!dc_isar_feature(aa32_pmull, s)) {
 -                            return 1;
 -                        }
 -                        tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, 16, 16,
 -                                           0, gen_helper_gvec_pmull_q);
 -                    }
 -                    return 0;
 -                }
 -                abort(); /* all others handled by decodetree */
 +                /* Three registers of different lengths: handled by decodetree */
 +                return 1;
              } else {
                  /* Two registers and a scalar. NB that for ops of this form
                   * the ARM ARM labels bit 24 as Q, but it is in our variable
 --
-2.20.1
+2.34.1

-[PULL 05/23] target/arm: Convert Neon 3-reg-diff long multiplies
+[PULL 17/24] target/arm: Catch illegal-exception-return from EL3 with bad NSE/NS
-Convert the Neon 3-reg-diff insns VMULL, VMLAL and VMLSL; these perform
+The architecture requires (R_TYTWB) that an attempt to return from EL3
-a 32x32->64 multiply with possible accumulate.
+when SCR_EL3.{NSE,NS} are {1,0} is an illegal exception return. (This
 enforces that the CPU can't ever be executing below EL3 with the
 NSE,NS bits indicating an invalid security state.)
-Note that for VMLSL we do the accumulate directly with a subtraction
+We were missing this check; add it.
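
The condition described above can be sketched in isolation as follows. This is a minimal standalone illustration, not the helper from the patch: the function name el3_return_is_illegal() is hypothetical, and the bit positions (NS at bit 0, NSE at bit 62 of SCR_EL3) are stated here as assumptions matching the definitions QEMU uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed SCR_EL3 bit positions: NS is bit 0, NSE is bit 62. */
#define SCR_NS   (1ULL << 0)
#define SCR_NSE  (1ULL << 62)

/*
 * Hypothetical predicate: true when an exception return from EL3 must be
 * treated as illegal because SCR_EL3.{NSE,NS} == {1,0}.
 */
static bool el3_return_is_illegal(int cur_el, uint64_t scr_el3)
{
    return cur_el == 3 && (scr_el3 & (SCR_NS | SCR_NSE)) == SCR_NSE;
}
```

Masking with both bits and comparing against SCR_NSE alone selects exactly the {1,0} encoding; the other three combinations of NSE and NS are left to the remaining checks in the exception-return path.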
 rather than doing a negate-then-add as the old code did.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20230807150618.101357-1-peter.maydell@linaro.org
 ---
- target/arm/neon-dp.decode       |  9 +++++
+ target/arm/tcg/helper-a64.c | 9 +++++++++
- target/arm/translate-neon.inc.c | 71 +++++++++++++++++++++++++++++++++
+ 1 file changed, 9 insertions(+)
  target/arm/translate.c          | 21 +++-------
  3 files changed, 86 insertions(+), 15 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+diff --git a/target/arm/tcg/helper-a64.c b/target/arm/tcg/helper-a64.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
+--- a/target/arm/tcg/helper-a64.c
-+++ b/target/arm/neon-dp.decode
++++ b/target/arm/tcg/helper-a64.c
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+@@ -XXX,XX +XXX,XX @@ void HELPER(exception_return)(CPUARMState *env, uint64_t new_pc)
+         spsr &= ~PSTATE_SS;
-     VABDL_S_3d   1111 001 0 1 . .. .... .... 0111 . 0 . 0 .... @3diff
+     }
-     VABDL_U_3d   1111 001 1 1 . .. .... .... 0111 . 0 . 0 .... @3diff
-+
++    /*
-+    VMLAL_S_3d   1111 001 0 1 . .. .... .... 1000 . 0 . 0 .... @3diff
++     * FEAT_RME forbids return from EL3 with an invalid security state.
-+    VMLAL_U_3d   1111 001 1 1 . .. .... .... 1000 . 0 . 0 .... @3diff
++     * We don't need an explicit check for FEAT_RME here because we enforce
-+
++     * in scr_write() that you can't set the NSE bit without it.
-+    VMLSL_S_3d   1111 001 0 1 . .. .... .... 1010 . 0 . 0 .... @3diff
++     */
-+    VMLSL_U_3d   1111 001 1 1 . .. .... .... 1010 . 0 . 0 .... @3diff
++    if (cur_el == 3 && (env->cp15.scr_el3 & (SCR_NS | SCR_NSE)) == SCR_NSE) {
-+
++        goto illegal_return;
 +    VMULL_S_3d   1111 001 0 1 . .. .... .... 1100 . 0 . 0 .... @3diff
 +    VMULL_U_3d   1111 001 1 1 . .. .... .... 1100 . 0 . 0 .... @3diff
    ]
  }
 diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.inc.c
 +++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VABAL_U_3d(DisasContext *s, arg_3diff *a)
      return do_long_3d(s, a, opfn[a->size], addfn[a->size]);
  }
 +
 +static void gen_mull_s32(TCGv_i64 rd, TCGv_i32 rn, TCGv_i32 rm)
 +{
 +    TCGv_i32 lo = tcg_temp_new_i32();
 +    TCGv_i32 hi = tcg_temp_new_i32();
 +
 +    tcg_gen_muls2_i32(lo, hi, rn, rm);
 +    tcg_gen_concat_i32_i64(rd, lo, hi);
 +
 +    tcg_temp_free_i32(lo);
 +    tcg_temp_free_i32(hi);
 +}
 +
 +static void gen_mull_u32(TCGv_i64 rd, TCGv_i32 rn, TCGv_i32 rm)
 +{
 +    TCGv_i32 lo = tcg_temp_new_i32();
 +    TCGv_i32 hi = tcg_temp_new_i32();
 +
 +    tcg_gen_mulu2_i32(lo, hi, rn, rm);
 +    tcg_gen_concat_i32_i64(rd, lo, hi);
 +
 +    tcg_temp_free_i32(lo);
 +    tcg_temp_free_i32(hi);
 +}
 +
 +static bool trans_VMULL_S_3d(DisasContext *s, arg_3diff *a)
 +{
 +    static NeonGenTwoOpWidenFn * const opfn[] = {
 +        gen_helper_neon_mull_s8,
 +        gen_helper_neon_mull_s16,
 +        gen_mull_s32,
 +        NULL,
 +    };
 +
 +    return do_long_3d(s, a, opfn[a->size], NULL);
 +}
 +
 +static bool trans_VMULL_U_3d(DisasContext *s, arg_3diff *a)
 +{
 +    static NeonGenTwoOpWidenFn * const opfn[] = {
 +        gen_helper_neon_mull_u8,
 +        gen_helper_neon_mull_u16,
 +        gen_mull_u32,
 +        NULL,
 +    };
 +
 +    return do_long_3d(s, a, opfn[a->size], NULL);
 +}
 +
 +#define DO_VMLAL(INSN,MULL,ACC)                                         \
 +    static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a)        \
 +    {                                                                   \
 +        static NeonGenTwoOpWidenFn * const opfn[] = {                   \
 +            gen_helper_neon_##MULL##8,                                  \
 +            gen_helper_neon_##MULL##16,                                 \
 +            gen_##MULL##32,                                             \
 +            NULL,                                                       \
 +        };                                                              \
 +        static NeonGenTwo64OpFn * const accfn[] = {                     \
 +            gen_helper_neon_##ACC##l_u16,                               \
 +            gen_helper_neon_##ACC##l_u32,                               \
 +            tcg_gen_##ACC##_i64,                                        \
 +            NULL,                                                       \
 +        };                                                              \
 +        return do_long_3d(s, a, opfn[a->size], accfn[a->size]);         \
 +    }
 +
-+DO_VMLAL(VMLAL_S,mull_s,add)
+     new_el = el_from_spsr(spsr);
-+DO_VMLAL(VMLAL_U,mull_u,add)
+     if (new_el == -1) {
-+DO_VMLAL(VMLSL_S,mull_s,sub)
+         goto illegal_return;
 +DO_VMLAL(VMLSL_U,mull_u,sub)
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                      {0, 0, 0, 7}, /* VABAL */
                      {0, 0, 0, 7}, /* VSUBHN: handled by decodetree */
                      {0, 0, 0, 7}, /* VABDL */
 -                    {0, 0, 0, 0}, /* VMLAL */
 +                    {0, 0, 0, 7}, /* VMLAL */
                      {0, 0, 0, 9}, /* VQDMLAL */
 -                    {0, 0, 0, 0}, /* VMLSL */
 +                    {0, 0, 0, 7}, /* VMLSL */
                      {0, 0, 0, 9}, /* VQDMLSL */
 -                    {0, 0, 0, 0}, /* Integer VMULL */
 +                    {0, 0, 0, 7}, /* Integer VMULL */
                      {0, 0, 0, 9}, /* VQDMULL */
                      {0, 0, 0, 0xa}, /* Polynomial VMULL */
                      {0, 0, 0, 7}, /* Reserved: always UNDEF */
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                          tmp2 = neon_load_reg(rm, pass);
                      }
                      switch (op) {
 -                    case 8: case 9: case 10: case 11: case 12: case 13:
 -                        /* VMLAL, VQDMLAL, VMLSL, VQDMLSL, VMULL, VQDMULL */
 +                    case 9: case 11: case 13:
 +                        /* VQDMLAL, VQDMLSL, VQDMULL */
                          gen_neon_mull(cpu_V0, tmp, tmp2, size, u);
                          break;
                      default: /* 15 is RESERVED: caught earlier  */
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                          /* VQDMULL */
                          gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
                          neon_store_reg64(cpu_V0, rd + pass);
 -                    } else if (op == 5 || (op >= 8 && op <= 11)) {
 +                    } else {
                          /* Accumulate.  */
                          neon_load_reg64(cpu_V1, rd + pass);
                          switch (op) {
 -                        case 10: /* VMLSL */
 -                            gen_neon_negl(cpu_V0, size);
 -                            /* Fall through */
 -                        case 8: /* VABAL, VMLAL */
 -                            gen_neon_addl(size);
 -                            break;
                          case 9: case 11: /* VQDMLAL, VQDMLSL */
                              gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
                              if (op == 11) {
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                              abort();
                          }
                          neon_store_reg64(cpu_V0, rd + pass);
 -                    } else {
 -                        /* Write back the result.  */
 -                        neon_store_reg64(cpu_V0, rd + pass);
                      }
                  }
              } else {
 --
-.20.1
+.34.1

-[PULL 09/23] target/arm: Add missing TCG temp free in do_2shift_env_64()
+[PULL 18/24] hw/rtc/m48t59: Use 64-bit arithmetic in set_alarm()
-In commit 37bfce81b10450071 we accidentally introduced a leak of a TCG
+In the m48t59 device we almost always use 64-bit arithmetic when
-temporary in do_2shift_env_64(); free it.
+dealing with time_t deltas.  The one exception is in set_alarm(),
 which currently uses a plain 'int' to hold the difference between two
 time_t values.  Switch to int64_t instead to avoid any possible
 overflow issues.
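
As a rough illustration of the concern (not the m48t59 code itself), the sketch below keeps a time delta in 64 bits and only narrows it after checking that it fits. The helper names `fits_in_int32`, `delta_wide` and `narrow_checked` are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: true if a 64-bit delta survives storage in 32 bits. */
static int fits_in_int32(int64_t v)
{
    return v >= INT32_MIN && v <= INT32_MAX;
}

/* Hypothetical helper: compute (alarm - now) - offset at full 64-bit width. */
static int64_t delta_wide(int64_t alarm_secs, int64_t now_secs, int64_t offset)
{
    return (alarm_secs - now_secs) - offset;
}

/* Hypothetical helper: narrow only when the value is known to fit. */
static int32_t narrow_checked(int64_t v)
{
    assert(fits_in_int32(v));
    return (int32_t)v;
}
```

A delta of 3000000000 seconds is representable in the 64-bit result of `delta_wide()` but fails `fits_in_int32()`, which is the kind of value a plain 'int' holder cannot keep.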
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 ---
- target/arm/translate-neon.inc.c | 1 +
+ hw/rtc/m48t59.c | 2 +-
-file changed, 1 insertion(+)
+file changed, 1 insertion(+), 1 deletion(-)
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
+diff --git a/hw/rtc/m48t59.c b/hw/rtc/m48t59.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
+--- a/hw/rtc/m48t59.c
-+++ b/target/arm/translate-neon.inc.c
++++ b/hw/rtc/m48t59.c
-@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_64(DisasContext *s, arg_2reg_shift *a,
+@@ -XXX,XX +XXX,XX @@ static void alarm_cb (void *opaque)
-         neon_load_reg64(tmp, a->vm + pass);
-         fn(tmp, cpu_env, tmp, constimm);
+ static void set_alarm(M48t59State *NVRAM)
-         neon_store_reg64(tmp, a->vd + pass);
+ {
-+        tcg_temp_free_i64(tmp);
+-    int diff;
-     }
++    int64_t diff;
-     tcg_temp_free_i64(constimm);
+     if (NVRAM->alrm_timer != NULL) {
-     return true;
+         timer_del(NVRAM->alrm_timer);
          diff = qemu_timedate_diff(&NVRAM->alarm) - NVRAM->time_offset;
 --
-.20.1
+.34.1

-[PULL 08/23] target/arm: Add 'static' and 'const' annotations to VSHLL function arrays
+[PULL 19/24] hw/rtc/twl92230: Use int64_t for sec_offset and alm_sec
-Mark the arrays of function pointers in trans_VSHLL_S_2sh() and
+In the twl92230 device, use int64_t for the two state fields
-trans_VSHLL_U_2sh() as both 'static' and 'const'.
+sec_offset and alm_sec, because we set these to values that
 are either time_t or differences between two time_t values.
 These fields aren't saved in vmstate anywhere, so we can
 safely widen them.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 ---
- target/arm/translate-neon.inc.c | 4 ++--
+ hw/rtc/twl92230.c | 4 ++--
 file changed, 2 insertions(+), 2 deletions(-)
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
+diff --git a/hw/rtc/twl92230.c b/hw/rtc/twl92230.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
+--- a/hw/rtc/twl92230.c
-+++ b/target/arm/translate-neon.inc.c
++++ b/hw/rtc/twl92230.c
-@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
+@@ -XXX,XX +XXX,XX @@ struct MenelausState {
+         struct tm tm;
- static bool trans_VSHLL_S_2sh(DisasContext *s, arg_2reg_shift *a)
+         struct tm new;
- {
+         struct tm alm;
--    NeonGenWidenFn *widenfn[] = {
+-        int sec_offset;
-+    static NeonGenWidenFn * const widenfn[] = {
+-        int alm_sec;
-         gen_helper_neon_widen_s8,
++        int64_t sec_offset;
-         gen_helper_neon_widen_s16,
++        int64_t alm_sec;
-         tcg_gen_ext_i32_i64,
+         int next_comp;
-@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL_S_2sh(DisasContext *s, arg_2reg_shift *a)
+     } rtc;
+     uint16_t rtc_next_vmstate;
  static bool trans_VSHLL_U_2sh(DisasContext *s, arg_2reg_shift *a)
  {
 -    NeonGenWidenFn *widenfn[] = {
 +    static NeonGenWidenFn * const widenfn[] = {
          gen_helper_neon_widen_u8,
          gen_helper_neon_widen_u16,
          tcg_gen_extu_i32_i64,
 --
-.20.1
+.34.1

-[PULL 01/23] target/arm: Fix missing temp frees in do_vshll_2sh
+[PULL 20/24] hw/rtc/aspeed_rtc: Use 64-bit offset for holding time_t difference
-The widenfn() in do_vshll_2sh() does not free the input 32-bit
+In the aspeed_rtc device we store a difference between two time_t
-TCGv, so we need to do this in the calling code.
+values in an 'int'. This is not really correct when time_t could
 be 64 bits. Enlarge the field to 'int64_t'.
 This is a migration compatibility break for the aspeed boards.
 While we are changing the vmstate, remove the accidental
 duplicate of the offset field.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Cédric Le Goater <clg@kaod.org>
 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 ---
- target/arm/translate-neon.inc.c | 2 ++
+ include/hw/rtc/aspeed_rtc.h | 2 +-
-file changed, 2 insertions(+)
+ hw/rtc/aspeed_rtc.c         | 5 ++---
 files changed, 3 insertions(+), 4 deletions(-)
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
+diff --git a/include/hw/rtc/aspeed_rtc.h b/include/hw/rtc/aspeed_rtc.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
+--- a/include/hw/rtc/aspeed_rtc.h
-+++ b/target/arm/translate-neon.inc.c
++++ b/include/hw/rtc/aspeed_rtc.h
-@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
+@@ -XXX,XX +XXX,XX @@ struct AspeedRtcState {
-     tmp = tcg_temp_new_i64();
+     qemu_irq irq;
-     widenfn(tmp, rm0);
+     uint32_t reg[0x18];
-+    tcg_temp_free_i32(rm0);
+-    int offset;
-     if (a->shift != 0) {
++    int64_t offset;
-         tcg_gen_shli_i64(tmp, tmp, a->shift);
-         tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
+ };
-@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
-     neon_store_reg64(tmp, a->vd);
+diff --git a/hw/rtc/aspeed_rtc.c b/hw/rtc/aspeed_rtc.c
+index XXXXXXX..XXXXXXX 100644
-     widenfn(tmp, rm1);
+--- a/hw/rtc/aspeed_rtc.c
-+    tcg_temp_free_i32(rm1);
++++ b/hw/rtc/aspeed_rtc.c
-     if (a->shift != 0) {
+@@ -XXX,XX +XXX,XX @@ static const MemoryRegionOps aspeed_rtc_ops = {
-         tcg_gen_shli_i64(tmp, tmp, a->shift);
-         tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
+ static const VMStateDescription vmstate_aspeed_rtc = {
      .name = TYPE_ASPEED_RTC,
 -    .version_id = 1,
 +    .version_id = 2,
      .fields = (VMStateField[]) {
          VMSTATE_UINT32_ARRAY(reg, AspeedRtcState, 0x18),
 -        VMSTATE_INT32(offset, AspeedRtcState),
 -        VMSTATE_INT32(offset, AspeedRtcState),
 +        VMSTATE_INT64(offset, AspeedRtcState),
          VMSTATE_END_OF_LIST()
      }
  };
 --
-.20.1
+.34.1

-[PULL 03/23] target/arm: Convert Neon 3-reg-diff narrowing ops to decodetree
+[PULL 21/24] rtc: Use time_t for passing and returning time offsets
-Convert the narrow-to-high-half insns VADDHN, VSUBHN, VRADDHN,
+The functions qemu_get_timedate() and qemu_timedate_diff() take
-VRSUBHN in the Neon 3-registers-different-lengths group to
+and return a time offset as an integer. Coverity points out that this
-decodetree.
+means that when an RTC device implementation holds an offset
 as a time_t, as the m48t59 does, the time_t will get truncated.
 (CID 1507157, 1517772).
 The functions work with time_t internally, so make them use that type
 in their APIs.
 Note that this won't help any Y2038 issues where either the device
 model itself is keeping the offset in a 32-bit integer, or where the
 hardware under emulation has Y2038 or other rollover problems.  If we
 missed any cases of the former then hopefully Coverity will warn us
 about them since after this patch we'd be truncating a time_t in
 assignments from qemu_timedate_diff().
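
A minimal sketch of the API shape described above, assuming a simplified stand-in rather than the real softmmu/rtc.c code: the difference is computed and returned as a time_t, so nothing is narrowed at the function boundary. The names `sketch_timedate_diff` and `sketch_tm_ahead` are hypothetical, and the reference time is passed in explicitly instead of being read from a QEMU clock.

```c
#include <assert.h>
#include <time.h>

/*
 * Hypothetical stand-in: seconds from 'ref' to the local time in '*tm'.
 * Returning time_t keeps the full width of the subtraction.
 */
static time_t sketch_timedate_diff(struct tm *tm, time_t ref)
{
    time_t seconds = mktime(tm);    /* interprets *tm as local time */
    return seconds - ref;
}

/* Hypothetical helper: a local-time struct tm 'ahead' seconds after 'ref'. */
static struct tm sketch_tm_ahead(time_t ref, time_t ahead)
{
    time_t target = ref + ahead;
    struct tm *p = localtime(&target);
    assert(p != NULL);
    return *p;
}
```

With a struct tm built one hour ahead of the reference, the sketch returns 3600, matching the contract stated in the header comment for qemu_timedate_diff().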
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 ---
- target/arm/neon-dp.decode       |  6 +++
+ include/sysemu/rtc.h | 4 ++--
- target/arm/translate-neon.inc.c | 87 +++++++++++++++++++++++++++++++
+ softmmu/rtc.c        | 4 ++--
- target/arm/translate.c          | 91 ++++-----------------------------
+files changed, 4 insertions(+), 4 deletions(-)
 files changed, 104 insertions(+), 80 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+diff --git a/include/sysemu/rtc.h b/include/sysemu/rtc.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
+--- a/include/sysemu/rtc.h
-+++ b/target/arm/neon-dp.decode
++++ b/include/sysemu/rtc.h
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+@@ -XXX,XX +XXX,XX @@
+  * The behaviour of the clock whose value this function returns will
-     VSUBW_S_3d   1111 001 0 1 . .. .... .... 0011 . 0 . 0 .... @3diff
+  * depend on the -rtc command line option passed by the user.
-     VSUBW_U_3d   1111 001 1 1 . .. .... .... 0011 . 0 . 0 .... @3diff
+  */
-+
+-void qemu_get_timedate(struct tm *tm, int offset);
-+    VADDHN_3d    1111 001 0 1 . .. .... .... 0100 . 0 . 0 .... @3diff
++void qemu_get_timedate(struct tm *tm, time_t offset);
-+    VRADDHN_3d   1111 001 1 1 . .. .... .... 0100 . 0 . 0 .... @3diff
-+
+ /**
-+    VSUBHN_3d    1111 001 0 1 . .. .... .... 0110 . 0 . 0 .... @3diff
+  * qemu_timedate_diff: Return difference between a struct tm and the RTC
-+    VRSUBHN_3d   1111 001 1 1 . .. .... .... 0110 . 0 . 0 .... @3diff
+@@ -XXX,XX +XXX,XX @@ void qemu_get_timedate(struct tm *tm, int offset);
-   ]
+  * a timestamp one hour further ahead than the current RTC time
   * then this function will return 3600.
   */
 -int qemu_timedate_diff(struct tm *tm);
 +time_t qemu_timedate_diff(struct tm *tm);
  #endif
 diff --git a/softmmu/rtc.c b/softmmu/rtc.c
 index XXXXXXX..XXXXXXX 100644
 --- a/softmmu/rtc.c
 +++ b/softmmu/rtc.c
@@ -XXX,XX +XXX,XX @@ static time_t qemu_ref_timedate(QEMUClockType clock)
      return value;
  }
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
-index XXXXXXX..XXXXXXX 100644
+-void qemu_get_timedate(struct tm *tm, int offset)
---- a/target/arm/translate-neon.inc.c
++void qemu_get_timedate(struct tm *tm, time_t offset)
-+++ b/target/arm/translate-neon.inc.c
+ {
-@@ -XXX,XX +XXX,XX @@ DO_PREWIDEN(VADDW_S, s, ext, add, true)
+     time_t ti = qemu_ref_timedate(rtc_clock);
- DO_PREWIDEN(VADDW_U, u, extu, add, true)
- DO_PREWIDEN(VSUBW_S, s, ext, sub, true)
+@@ -XXX,XX +XXX,XX @@ void qemu_get_timedate(struct tm *tm, int offset)
  DO_PREWIDEN(VSUBW_U, u, extu, sub, true)
 +
 +static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
 +                         NeonGenTwo64OpFn *opfn, NeonGenNarrowFn *narrowfn)
 +{
 +    /* 3-regs different lengths, narrowing (VADDHN/VSUBHN/VRADDHN/VRSUBHN) */
 +    TCGv_i64 rn_64, rm_64;
 +    TCGv_i32 rd0, rd1;
 +
 +    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
 +        return false;
 +    }
 +
 +    /* UNDEF accesses to D16-D31 if they don't exist. */
 +    if (!dc_isar_feature(aa32_simd_r32, s) &&
 +        ((a->vd | a->vn | a->vm) & 0x10)) {
 +        return false;
 +    }
 +
 +    if (!opfn || !narrowfn) {
 +        /* size == 3 case, which is an entirely different insn group */
 +        return false;
 +    }
 +
 +    if ((a->vn | a->vm) & 1) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    rn_64 = tcg_temp_new_i64();
 +    rm_64 = tcg_temp_new_i64();
 +    rd0 = tcg_temp_new_i32();
 +    rd1 = tcg_temp_new_i32();
 +
 +    neon_load_reg64(rn_64, a->vn);
 +    neon_load_reg64(rm_64, a->vm);
 +
 +    opfn(rn_64, rn_64, rm_64);
 +
 +    narrowfn(rd0, rn_64);
 +
 +    neon_load_reg64(rn_64, a->vn + 1);
 +    neon_load_reg64(rm_64, a->vm + 1);
 +
 +    opfn(rn_64, rn_64, rm_64);
 +
 +    narrowfn(rd1, rn_64);
 +
 +    neon_store_reg(a->vd, 0, rd0);
 +    neon_store_reg(a->vd, 1, rd1);
 +
 +    tcg_temp_free_i64(rn_64);
 +    tcg_temp_free_i64(rm_64);
 +
 +    return true;
 +}
 +
 +#define DO_NARROW_3D(INSN, OP, NARROWTYPE, EXTOP)                       \
 +    static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a)        \
 +    {                                                                   \
 +        static NeonGenTwo64OpFn * const addfn[] = {                     \
 +            gen_helper_neon_##OP##l_u16,                                \
 +            gen_helper_neon_##OP##l_u32,                                \
 +            tcg_gen_##OP##_i64,                                         \
 +            NULL,                                                       \
 +        };                                                              \
 +        static NeonGenNarrowFn * const narrowfn[] = {                   \
 +            gen_helper_neon_##NARROWTYPE##_high_u8,                     \
 +            gen_helper_neon_##NARROWTYPE##_high_u16,                    \
 +            EXTOP,                                                      \
 +            NULL,                                                       \
 +        };                                                              \
 +        return do_narrow_3d(s, a, addfn[a->size], narrowfn[a->size]);   \
 +    }
 +
 +static void gen_narrow_round_high_u32(TCGv_i32 rd, TCGv_i64 rn)
 +{
 +    tcg_gen_addi_i64(rn, rn, 1u << 31);
 +    tcg_gen_extrh_i64_i32(rd, rn);
 +}
 +
 +DO_NARROW_3D(VADDHN, add, narrow, tcg_gen_extrh_i64_i32)
 +DO_NARROW_3D(VSUBHN, sub, narrow, tcg_gen_extrh_i64_i32)
 +DO_NARROW_3D(VRADDHN, add, narrow_round, gen_narrow_round_high_u32)
 +DO_NARROW_3D(VRSUBHN, sub, narrow_round, gen_narrow_round_high_u32)
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void gen_neon_addl(int size)
      }
  }
--static inline void gen_neon_subl(int size)
+-int qemu_timedate_diff(struct tm *tm)
--{
++time_t qemu_timedate_diff(struct tm *tm)
 -    switch (size) {
 -    case 0: gen_helper_neon_subl_u16(CPU_V001); break;
 -    case 1: gen_helper_neon_subl_u32(CPU_V001); break;
 -    case 2: tcg_gen_sub_i64(CPU_V001); break;
 -    default: abort();
 -    }
 -}
 -
  static inline void gen_neon_negl(TCGv_i64 var, int size)
  {
-     switch (size) {
+     time_t seconds;
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
              op = (insn >> 8) & 0xf;
              if ((insn & (1 << 6)) == 0) {
                  /* Three registers of different lengths.  */
 -                int src1_wide;
 -                int src2_wide;
                  /* undefreq: bit 0 : UNDEF if size == 0
                   *           bit 1 : UNDEF if size == 1
                   *           bit 2 : UNDEF if size == 2
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                      {0, 0, 0, 7}, /* VADDW: handled by decodetree */
                      {0, 0, 0, 7}, /* VSUBL: handled by decodetree */
                      {0, 0, 0, 7}, /* VSUBW: handled by decodetree */
 -                    {0, 1, 1, 0}, /* VADDHN */
 +                    {0, 0, 0, 7}, /* VADDHN: handled by decodetree */
                      {0, 0, 0, 0}, /* VABAL */
 -                    {0, 1, 1, 0}, /* VSUBHN */
 +                    {0, 0, 0, 7}, /* VSUBHN: handled by decodetree */
                      {0, 0, 0, 0}, /* VABDL */
                      {0, 0, 0, 0}, /* VMLAL */
                      {0, 0, 0, 9}, /* VQDMLAL */
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                      {0, 0, 0, 7}, /* Reserved: always UNDEF */
                  };
 -                src1_wide = neon_3reg_wide[op][1];
 -                src2_wide = neon_3reg_wide[op][2];
                  undefreq = neon_3reg_wide[op][3];
                  if ((undefreq & (1 << size)) ||
                      ((undefreq & 8) && u)) {
                      return 1;
                  }
 -                if ((src1_wide && (rn & 1)) ||
 -                    (src2_wide && (rm & 1)) ||
 -                    (!src2_wide && (rd & 1))) {
 +                if (rd & 1) {
                      return 1;
                  }
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                  /* Avoid overlapping operands.  Wide source operands are
                     always aligned so will never overlap with wide
                     destinations in problematic ways.  */
 -                if (rd == rm && !src2_wide) {
 +                if (rd == rm) {
                      tmp = neon_load_reg(rm, 1);
                      neon_store_scratch(2, tmp);
 -                } else if (rd == rn && !src1_wide) {
 +                } else if (rd == rn) {
                      tmp = neon_load_reg(rn, 1);
                      neon_store_scratch(2, tmp);
                  }
                  tmp3 = NULL;
                  for (pass = 0; pass < 2; pass++) {
 -                    if (src1_wide) {
 -                        neon_load_reg64(cpu_V0, rn + pass);
 -                        tmp = NULL;
 +                    if (pass == 1 && rd == rn) {
 +                        tmp = neon_load_scratch(2);
                      } else {
 -                        if (pass == 1 && rd == rn) {
 -                            tmp = neon_load_scratch(2);
 -                        } else {
 -                            tmp = neon_load_reg(rn, pass);
 -                        }
 +                        tmp = neon_load_reg(rn, pass);
                      }
 -                    if (src2_wide) {
 -                        neon_load_reg64(cpu_V1, rm + pass);
 -                        tmp2 = NULL;
 +                    if (pass == 1 && rd == rm) {
 +                        tmp2 = neon_load_scratch(2);
                      } else {
 -                        if (pass == 1 && rd == rm) {
 -                            tmp2 = neon_load_scratch(2);
 -                        } else {
 -                            tmp2 = neon_load_reg(rm, pass);
 -                        }
 +                        tmp2 = neon_load_reg(rm, pass);
                      }
                      switch (op) {
 -                    case 0: case 1: case 4: /* VADDL, VADDW, VADDHN, VRADDHN */
 -                        gen_neon_addl(size);
 -                        break;
 -                    case 2: case 3: case 6: /* VSUBL, VSUBW, VSUBHN, VRSUBHN */
 -                        gen_neon_subl(size);
 -                        break;
                      case 5: case 7: /* VABAL, VABDL */
                          switch ((size << 1) | u) {
                          case 0:
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                              abort();
                          }
                          neon_store_reg64(cpu_V0, rd + pass);
 -                    } else if (op == 4 || op == 6) {
 -                        /* Narrowing operation.  */
 -                        tmp = tcg_temp_new_i32();
 -                        if (!u) {
 -                            switch (size) {
 -                            case 0:
 -                                gen_helper_neon_narrow_high_u8(tmp, cpu_V0);
 -                                break;
 -                            case 1:
 -                                gen_helper_neon_narrow_high_u16(tmp, cpu_V0);
 -                                break;
 -                            case 2:
 -                                tcg_gen_extrh_i64_i32(tmp, cpu_V0);
 -                                break;
 -                            default: abort();
 -                            }
 -                        } else {
 -                            switch (size) {
 -                            case 0:
 -                                gen_helper_neon_narrow_round_high_u8(tmp, cpu_V0);
 -                                break;
 -                            case 1:
 -                                gen_helper_neon_narrow_round_high_u16(tmp, cpu_V0);
 -                                break;
 -                            case 2:
 -                                tcg_gen_addi_i64(cpu_V0, cpu_V0, 1u << 31);
 -                                tcg_gen_extrh_i64_i32(tmp, cpu_V0);
 -                                break;
 -                            default: abort();
 -                            }
 -                        }
 -                        if (pass == 0) {
 -                            tmp3 = tmp;
 -                        } else {
 -                            neon_store_reg(rd, 0, tmp3);
 -                            neon_store_reg(rd, 1, tmp);
 -                        }
                      } else {
                          /* Write back the result.  */
                          neon_store_reg64(cpu_V0, rd + pass);
 --
-.20.1
+.34.1

-[PULL 06/23] target/arm: Convert Neon 3-reg-diff saturating doubling multiplies
+[PULL 22/24] target/arm: Do all "ARM_FEATURE_X implies Y" checks in post_init
-Convert the Neon 3-reg-diff insns VQDMULL, VQDMLAL and VQDMLSL:
+Where architecturally one ARM_FEATURE_X flag implies another
-these are all saturating doubling long multiplies with a possible
+ARM_FEATURE_Y, we allow the CPU init function to only set X, and then
-accumulate step.
+set Y for it.  Currently we do this in two places -- we set a few
+flags in arm_cpu_post_init() because we need them to decide which
-These are the last insns in the group which use the pass-over-each
+properties to create on the CPU object, and then we do the rest in
-elements loop, so we can delete that code.
+arm_cpu_realizefn().  However, this is fragile, because it's easy to
 add a new property and not notice that this means that an X-implies-Y
 check now has to move from realize to post-init.
 As a specific example, the pmsav7-dregion property is conditional
 on ARM_FEATURE_PMSA && ARM_FEATURE_V7, which means it won't appear
 on the Cortex-M33 and -M55, because they set ARM_FEATURE_V8 and
 rely on V8-implies-V7, which doesn't happen until the realizefn.
 Move all of these X-implies-Y checks into a new function, which
 we call at the top of arm_cpu_post_init(), so the feature bits
 are available at that point.
 This does now give us the reverse issue, that if there's a feature
 bit which is enabled or disabled by the setting of a property then
 X-implies-Y features that are dependent on that property need to
 be in realize, not in this new function.  But the only one of those
 is the "EL3 implies VBAR" which is already in the right place, so
 putting things this way round seems better to me.
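
The ordering problem can be sketched with a toy feature bitmask. This is an assumption-laden illustration, not the QEMU implementation: the `FEAT_*` enum, `has()`, `with()`, `propagate()` and `wants_dregion_property()` are all hypothetical names, and only the M, PMSA, V7, V7VE and V8 implications from the patch are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits, standing in for a few ARM_FEATURE_* flags. */
enum { FEAT_M, FEAT_PMSA, FEAT_V7, FEAT_V7VE, FEAT_V8 };

static bool has(uint64_t f, int bit)
{
    return (f >> bit) & 1;
}

static uint64_t with(uint64_t f, int bit)
{
    assert(bit >= 0 && bit < 64);
    return f | (UINT64_C(1) << bit);
}

/* Apply the X-implies-Y rules; rules that feed later rules come first. */
static uint64_t propagate(uint64_t f)
{
    if (has(f, FEAT_M)) {
        f = with(f, FEAT_PMSA);
    }
    if (has(f, FEAT_V8)) {
        f = with(f, has(f, FEAT_M) ? FEAT_V7 : FEAT_V7VE);
    }
    if (has(f, FEAT_V7VE)) {
        f = with(f, FEAT_V7);
    }
    return f;
}

/* Stand-in for the decision to create a property such as pmsav7-dregion. */
static bool wants_dregion_property(uint64_t f)
{
    return has(f, FEAT_PMSA) && has(f, FEAT_V7);
}
```

For a CPU that sets only the M and V8 bits, `wants_dregion_property()` is false on the raw bits and true once `propagate()` has run, which is why the propagation has to happen before the properties are created.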
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20230724174335.2150499-2-peter.maydell@linaro.org
 ---
- target/arm/neon-dp.decode       |  6 +++
+ target/arm/cpu.c | 179 +++++++++++++++++++++++++----------------------
- target/arm/translate-neon.inc.c | 82 +++++++++++++++++++++++++++++++++
+file changed, 97 insertions(+), 82 deletions(-)
- target/arm/translate.c          | 59 ++----------------------
-files changed, 92 insertions(+), 55 deletions(-)
+diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
+--- a/target/arm/cpu.c
-+++ b/target/arm/neon-dp.decode
++++ b/target/arm/cpu.c
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+@@ -XXX,XX +XXX,XX @@ unsigned int gt_cntfrq_period_ns(ARMCPU *cpu)
-     VMLAL_S_3d   1111 001 0 1 . .. .... .... 1000 . 0 . 0 .... @3diff
+       NANOSECONDS_PER_SECOND / cpu->gt_cntfrq_hz : 1;
      VMLAL_U_3d   1111 001 1 1 . .. .... .... 1000 . 0 . 0 .... @3diff
 +    VQDMLAL_3d   1111 001 0 1 . .. .... .... 1001 . 0 . 0 .... @3diff
 +
      VMLSL_S_3d   1111 001 0 1 . .. .... .... 1010 . 0 . 0 .... @3diff
      VMLSL_U_3d   1111 001 1 1 . .. .... .... 1010 . 0 . 0 .... @3diff
 +    VQDMLSL_3d   1111 001 0 1 . .. .... .... 1011 . 0 . 0 .... @3diff
 +
      VMULL_S_3d   1111 001 0 1 . .. .... .... 1100 . 0 . 0 .... @3diff
      VMULL_U_3d   1111 001 1 1 . .. .... .... 1100 . 0 . 0 .... @3diff
 +
 +    VQDMULL_3d   1111 001 0 1 . .. .... .... 1101 . 0 . 0 .... @3diff
    ]
  }
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
-index XXXXXXX..XXXXXXX 100644
++static void arm_cpu_propagate_feature_implications(ARMCPU *cpu)
 --- a/target/arm/translate-neon.inc.c
 +++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ DO_VMLAL(VMLAL_S,mull_s,add)
  DO_VMLAL(VMLAL_U,mull_u,add)
  DO_VMLAL(VMLSL_S,mull_s,sub)
  DO_VMLAL(VMLSL_U,mull_u,sub)
 +
 +static void gen_VQDMULL_16(TCGv_i64 rd, TCGv_i32 rn, TCGv_i32 rm)
 +{
-+    gen_helper_neon_mull_s16(rd, rn, rm);
++    CPUARMState *env = &cpu->env;
-+    gen_helper_neon_addl_saturate_s32(rd, cpu_env, rd, rd);
++    bool no_aa32 = false;
 +
 +    /*
 +     * Some features automatically imply others: set the feature
 +     * bits explicitly for these cases.
 +     */
 +
 +    if (arm_feature(env, ARM_FEATURE_M)) {
 +        set_feature(env, ARM_FEATURE_PMSA);
 +    }
 +
 +    if (arm_feature(env, ARM_FEATURE_V8)) {
 +        if (arm_feature(env, ARM_FEATURE_M)) {
 +            set_feature(env, ARM_FEATURE_V7);
 +        } else {
 +            set_feature(env, ARM_FEATURE_V7VE);
 +        }
 +    }
 +
 +    /*
 +     * There exist AArch64 cpus without AArch32 support.  When KVM
 +     * queries ID_ISAR0_EL1 on such a host, the value is UNKNOWN.
 +     * Similarly, we cannot check ID_AA64PFR0 without AArch64 support.
 +     * As a general principle, we also do not make ID register
 +     * consistency checks anywhere unless using TCG, because only
 +     * for TCG would a consistency-check failure be a QEMU bug.
 +     */
 +    if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
 +        no_aa32 = !cpu_isar_feature(aa64_aa32, cpu);
 +    }
 +
 +    if (arm_feature(env, ARM_FEATURE_V7VE)) {
 +        /*
 +         * v7 Virtualization Extensions. In real hardware this implies
 +         * EL2 and also the presence of the Security Extensions.
 +         * For QEMU, for backwards-compatibility we implement some
 +         * CPUs or CPU configs which have no actual EL2 or EL3 but do
 +         * include the various other features that V7VE implies.
 +         * Presence of EL2 itself is ARM_FEATURE_EL2, and of the
 +         * Security Extensions is ARM_FEATURE_EL3.
 +         */
 +        assert(!tcg_enabled() || no_aa32 ||
 +               cpu_isar_feature(aa32_arm_div, cpu));
 +        set_feature(env, ARM_FEATURE_LPAE);
 +        set_feature(env, ARM_FEATURE_V7);
 +    }
 +    if (arm_feature(env, ARM_FEATURE_V7)) {
 +        set_feature(env, ARM_FEATURE_VAPA);
 +        set_feature(env, ARM_FEATURE_THUMB2);
 +        set_feature(env, ARM_FEATURE_MPIDR);
 +        if (!arm_feature(env, ARM_FEATURE_M)) {
 +            set_feature(env, ARM_FEATURE_V6K);
 +        } else {
 +            set_feature(env, ARM_FEATURE_V6);
 +        }
 +
 +        /*
 +         * Always define VBAR for V7 CPUs even if it doesn't exist in
 +         * non-EL3 configs. This is needed by some legacy boards.
 +         */
 +        set_feature(env, ARM_FEATURE_VBAR);
 +    }
 +    if (arm_feature(env, ARM_FEATURE_V6K)) {
 +        set_feature(env, ARM_FEATURE_V6);
 +        set_feature(env, ARM_FEATURE_MVFR);
 +    }
 +    if (arm_feature(env, ARM_FEATURE_V6)) {
 +        set_feature(env, ARM_FEATURE_V5);
 +        if (!arm_feature(env, ARM_FEATURE_M)) {
 +            assert(!tcg_enabled() || no_aa32 ||
 +                   cpu_isar_feature(aa32_jazelle, cpu));
 +            set_feature(env, ARM_FEATURE_AUXCR);
 +        }
 +    }
 +    if (arm_feature(env, ARM_FEATURE_V5)) {
 +        set_feature(env, ARM_FEATURE_V4T);
 +    }
 +    if (arm_feature(env, ARM_FEATURE_LPAE)) {
 +        set_feature(env, ARM_FEATURE_V7MP);
 +    }
 +    if (arm_feature(env, ARM_FEATURE_CBAR_RO)) {
 +        set_feature(env, ARM_FEATURE_CBAR);
 +    }
 +    if (arm_feature(env, ARM_FEATURE_THUMB2) &&
 +        !arm_feature(env, ARM_FEATURE_M)) {
 +        set_feature(env, ARM_FEATURE_THUMB_DSP);
 +    }
 +}
 +
-+static void gen_VQDMULL_32(TCGv_i64 rd, TCGv_i32 rn, TCGv_i32 rm)
+ void arm_cpu_post_init(Object *obj)
-+{
+ {
-+    gen_mull_s32(rd, rn, rm);
+     ARMCPU *cpu = ARM_CPU(obj);
-+    gen_helper_neon_addl_saturate_s64(rd, cpu_env, rd, rd);
-+}
+-    /* M profile implies PMSA. We have to do this here rather than
-+
+-     * in realize with the other feature-implication checks because
-+static bool trans_VQDMULL_3d(DisasContext *s, arg_3diff *a)
+-     * we look at the PMSA bit to see if we should add some properties.
-+{
++    /*
-+    static NeonGenTwoOpWidenFn * const opfn[] = {
++     * Some features imply others. Figure this out now, because we
-+        NULL,
++     * are going to look at the feature bits in deciding which
-+        gen_VQDMULL_16,
++     * properties to add.
-+        gen_VQDMULL_32,
+      */
-+        NULL,
+-    if (arm_feature(&cpu->env, ARM_FEATURE_M)) {
-+    };
+-        set_feature(&cpu->env, ARM_FEATURE_PMSA);
-+
+-    }
-+    return do_long_3d(s, a, opfn[a->size], NULL);
++    arm_cpu_propagate_feature_implications(cpu);
-+}
-+
+     if (arm_feature(&cpu->env, ARM_FEATURE_CBAR) ||
-+static void gen_VQDMLAL_acc_16(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
+         arm_feature(&cpu->env, ARM_FEATURE_CBAR_RO)) {
-+{
+@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
-+    gen_helper_neon_addl_saturate_s32(rd, cpu_env, rn, rm);
+     CPUARMState *env = &cpu->env;
-+}
+     int pagebits;
-+
+     Error *local_err = NULL;
-+static void gen_VQDMLAL_acc_32(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
+-    bool no_aa32 = false;
-+{
-+    gen_helper_neon_addl_saturate_s64(rd, cpu_env, rn, rm);
+     /* Use pc-relative instructions in system-mode */
-+}
+ #ifndef CONFIG_USER_ONLY
-+
+@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
-+static bool trans_VQDMLAL_3d(DisasContext *s, arg_3diff *a)
+         cpu->isar.id_isar3 = u;
-+{
+     }
-+    static NeonGenTwoOpWidenFn * const opfn[] = {
-+        NULL,
+-    /* Some features automatically imply others: */
-+        gen_VQDMULL_16,
+-    if (arm_feature(env, ARM_FEATURE_V8)) {
-+        gen_VQDMULL_32,
+-        if (arm_feature(env, ARM_FEATURE_M)) {
-+        NULL,
+-            set_feature(env, ARM_FEATURE_V7);
-+    };
+-        } else {
-+    static NeonGenTwo64OpFn * const accfn[] = {
+-            set_feature(env, ARM_FEATURE_V7VE);
-+        NULL,
+-        }
-+        gen_VQDMLAL_acc_16,
+-    }
 +        gen_VQDMLAL_acc_32,
 +        NULL,
 +    };
 +
 +    return do_long_3d(s, a, opfn[a->size], accfn[a->size]);
 +}
 +
 +static void gen_VQDMLSL_acc_16(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
 +{
 +    gen_helper_neon_negl_u32(rm, rm);
 +    gen_helper_neon_addl_saturate_s32(rd, cpu_env, rn, rm);
 +}
 +
 +static void gen_VQDMLSL_acc_32(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
 +{
 +    tcg_gen_neg_i64(rm, rm);
 +    gen_helper_neon_addl_saturate_s64(rd, cpu_env, rn, rm);
 +}
 +
 +static bool trans_VQDMLSL_3d(DisasContext *s, arg_3diff *a)
 +{
 +    static NeonGenTwoOpWidenFn * const opfn[] = {
 +        NULL,
 +        gen_VQDMULL_16,
 +        gen_VQDMULL_32,
 +        NULL,
 +    };
 +    static NeonGenTwo64OpFn * const accfn[] = {
 +        NULL,
 +        gen_VQDMLSL_acc_16,
 +        gen_VQDMLSL_acc_32,
 +        NULL,
 +    };
 +
 +    return do_long_3d(s, a, opfn[a->size], accfn[a->size]);
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                      {0, 0, 0, 7}, /* VSUBHN: handled by decodetree */
                      {0, 0, 0, 7}, /* VABDL */
                      {0, 0, 0, 7}, /* VMLAL */
 -                    {0, 0, 0, 9}, /* VQDMLAL */
 +                    {0, 0, 0, 7}, /* VQDMLAL */
                      {0, 0, 0, 7}, /* VMLSL */
 -                    {0, 0, 0, 9}, /* VQDMLSL */
 +                    {0, 0, 0, 7}, /* VQDMLSL */
                      {0, 0, 0, 7}, /* Integer VMULL */
 -                    {0, 0, 0, 9}, /* VQDMULL */
 +                    {0, 0, 0, 7}, /* VQDMULL */
                      {0, 0, 0, 0xa}, /* Polynomial VMULL */
                      {0, 0, 0, 7}, /* Reserved: always UNDEF */
                  };
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                      }
                      return 0;
                  }
 -
--                /* Avoid overlapping operands.  Wide source operands are
+-    /*
--                   always aligned so will never overlap with wide
+-     * There exist AArch64 cpus without AArch32 support.  When KVM
--                   destinations in problematic ways.  */
+-     * queries ID_ISAR0_EL1 on such a host, the value is UNKNOWN.
--                if (rd == rm) {
+-     * Similarly, we cannot check ID_AA64PFR0 without AArch64 support.
--                    tmp = neon_load_reg(rm, 1);
+-     * As a general principle, we also do not make ID register
--                    neon_store_scratch(2, tmp);
+-     * consistency checks anywhere unless using TCG, because only
--                } else if (rd == rn) {
+-     * for TCG would a consistency-check failure be a QEMU bug.
--                    tmp = neon_load_reg(rn, 1);
+-     */
--                    neon_store_scratch(2, tmp);
+-    if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
--                }
+-        no_aa32 = !cpu_isar_feature(aa64_aa32, cpu);
--                tmp3 = NULL;
+-    }
--                for (pass = 0; pass < 2; pass++) {
+-
--                    if (pass == 1 && rd == rn) {
+-    if (arm_feature(env, ARM_FEATURE_V7VE)) {
--                        tmp = neon_load_scratch(2);
+-        /* v7 Virtualization Extensions. In real hardware this implies
--                    } else {
+-         * EL2 and also the presence of the Security Extensions.
--                        tmp = neon_load_reg(rn, pass);
+-         * For QEMU, for backwards-compatibility we implement some
--                    }
+-         * CPUs or CPU configs which have no actual EL2 or EL3 but do
--                    if (pass == 1 && rd == rm) {
+-         * include the various other features that V7VE implies.
--                        tmp2 = neon_load_scratch(2);
+-         * Presence of EL2 itself is ARM_FEATURE_EL2, and of the
--                    } else {
+-         * Security Extensions is ARM_FEATURE_EL3.
--                        tmp2 = neon_load_reg(rm, pass);
+-         */
--                    }
+-        assert(!tcg_enabled() || no_aa32 ||
--                    switch (op) {
+-               cpu_isar_feature(aa32_arm_div, cpu));
--                    case 9: case 11: case 13:
+-        set_feature(env, ARM_FEATURE_LPAE);
--                        /* VQDMLAL, VQDMLSL, VQDMULL */
+-        set_feature(env, ARM_FEATURE_V7);
--                        gen_neon_mull(cpu_V0, tmp, tmp2, size, u);
+-    }
--                        break;
+-    if (arm_feature(env, ARM_FEATURE_V7)) {
--                    default: /* 15 is RESERVED: caught earlier  */
+-        set_feature(env, ARM_FEATURE_VAPA);
--                        abort();
+-        set_feature(env, ARM_FEATURE_THUMB2);
--                    }
+-        set_feature(env, ARM_FEATURE_MPIDR);
--                    if (op == 13) {
+-        if (!arm_feature(env, ARM_FEATURE_M)) {
--                        /* VQDMULL */
+-            set_feature(env, ARM_FEATURE_V6K);
--                        gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
+-        } else {
--                        neon_store_reg64(cpu_V0, rd + pass);
+-            set_feature(env, ARM_FEATURE_V6);
--                    } else {
+-        }
--                        /* Accumulate.  */
+-
--                        neon_load_reg64(cpu_V1, rd + pass);
+-        /* Always define VBAR for V7 CPUs even if it doesn't exist in
--                        switch (op) {
+-         * non-EL3 configs. This is needed by some legacy boards.
--                        case 9: case 11: /* VQDMLAL, VQDMLSL */
+-         */
--                            gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
+-        set_feature(env, ARM_FEATURE_VBAR);
--                            if (op == 11) {
+-    }
--                                gen_neon_negl(cpu_V0, size);
+-    if (arm_feature(env, ARM_FEATURE_V6K)) {
--                            }
+-        set_feature(env, ARM_FEATURE_V6);
--                            gen_neon_addl_saturate(cpu_V0, cpu_V1, size);
+-        set_feature(env, ARM_FEATURE_MVFR);
--                            break;
+-    }
--                        default:
+-    if (arm_feature(env, ARM_FEATURE_V6)) {
--                            abort();
+-        set_feature(env, ARM_FEATURE_V5);
--                        }
+-        if (!arm_feature(env, ARM_FEATURE_M)) {
--                        neon_store_reg64(cpu_V0, rd + pass);
+-            assert(!tcg_enabled() || no_aa32 ||
--                    }
+-                   cpu_isar_feature(aa32_jazelle, cpu));
--                }
+-            set_feature(env, ARM_FEATURE_AUXCR);
-+                abort(); /* all others handled by decodetree */
+-        }
-             } else {
+-    }
-                 /* Two registers and a scalar. NB that for ops of this form
+-    if (arm_feature(env, ARM_FEATURE_V5)) {
-                  * the ARM ARM labels bit 24 as Q, but it is in our variable
+-        set_feature(env, ARM_FEATURE_V4T);
 -    }
 -    if (arm_feature(env, ARM_FEATURE_LPAE)) {
 -        set_feature(env, ARM_FEATURE_V7MP);
 -    }
 -    if (arm_feature(env, ARM_FEATURE_CBAR_RO)) {
 -        set_feature(env, ARM_FEATURE_CBAR);
 -    }
 -    if (arm_feature(env, ARM_FEATURE_THUMB2) &&
 -        !arm_feature(env, ARM_FEATURE_M)) {
 -        set_feature(env, ARM_FEATURE_THUMB_DSP);
 -    }
      /*
       * We rely on no XScale CPU having VFP so we can use the same bits in the
 --
-.20.1
+.34.1

-[PULL 04/23] target/arm: Convert Neon 3-reg-diff VABAL, VABDL to decodetree
+[PULL 23/24] hw/arm/armv7m: Add mpu-ns-regions and mpu-s-regions properties
-Convert the Neon 3-reg-diff insns VABAL and VABDL to decodetree.
+M-profile CPUs generally allow configuration of the number of MPU
-Like almost all the remaining insns in this group, these are
+regions that they have.  We don't currently model this, so our
-a combination of a two-input operation which returns a double width
+implementations of some of the board models provide CPUs with the
-result and then a possible accumulation of that double width
+wrong number of regions.  RTOSes like Zephyr that hardcode the
-result into the destination.
+expected number of regions may therefore not run on the model if they
 are set up to run on real hardware.
 Add properties mpu-ns-regions and mpu-s-regions to the ARMV7M object,
 matching the ability of hardware to configure the number of Secure
 and NonSecure regions separately.  Our actual CPU implementation
 doesn't currently support that, and it happens that none of the MPS
 boards we model set the number of regions differently for Secure vs
 NonSecure, so we provide an interface to the boards and SoCs that
 won't need to change if we ever do add that functionality in future,
 but make it an error to configure the two properties to different
 values.
 (The property name on the CPU is the somewhat misnamed-for-M-profile
 "pmsav7-dregion", so we don't follow that naming convention for
 the properties here. The TRM doesn't say what the CPU configuration
 variable names are, so we pick something, and follow the lowercase
 convention we already have for properties here.)
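
The rule described above (a separate count per security state, an "unset" sentinel, and an error when the two counts differ) can be sketched in isolation. This is only an illustrative model, not the code in hw/arm/armv7m.c: the names REGIONS_UNSET, mpu_regions_config_ok() and effective_regions() are hypothetical, and UINT32_MAX stands in for the UINT_MAX default the patch uses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel: property was never set, keep the CPU's default. */
#define REGIONS_UNSET UINT32_MAX

/*
 * Return true if the pair of region counts is acceptable. When the CPU
 * has the Security Extension the two counts must match, because the
 * modelled CPU has only one region count for both security states.
 */
static bool mpu_regions_config_ok(bool has_security_ext,
                                  uint32_t ns_regions, uint32_t s_regions)
{
    if (has_security_ext && ns_regions != s_regions) {
        return false;
    }
    return true;
}

/* Pick the count to forward to the CPU, falling back to its default. */
static uint32_t effective_regions(uint32_t ns_regions, uint32_t cpu_default)
{
    return ns_regions == REGIONS_UNSET ? cpu_default : ns_regions;
}
```

With this model, leaving both properties unset passes the check trivially (both hold the sentinel), which matches the intent that boards which say nothing get the CPU's own default.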
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Message-id: 20230724174335.2150499-3-peter.maydell@linaro.org
 ---
- target/arm/translate.h          |   1 +
+ include/hw/arm/armv7m.h |  8 ++++++++
- target/arm/neon-dp.decode       |   6 ++
+ hw/arm/armv7m.c         | 21 +++++++++++++++++++++
- target/arm/translate-neon.inc.c | 132 ++++++++++++++++++++++++++++++++
+files changed, 29 insertions(+)
  target/arm/translate.c          |  31 +-------
 files changed, 142 insertions(+), 28 deletions(-)
-diff --git a/target/arm/translate.h b/target/arm/translate.h
+diff --git a/include/hw/arm/armv7m.h b/include/hw/arm/armv7m.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.h
+--- a/include/hw/arm/armv7m.h
-+++ b/target/arm/translate.h
++++ b/include/hw/arm/armv7m.h
-@@ -XXX,XX +XXX,XX @@ typedef void NeonGenTwo64OpEnvFn(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i64);
+@@ -XXX,XX +XXX,XX @@ OBJECT_DECLARE_SIMPLE_TYPE(ARMv7MState, ARMV7M)
- typedef void NeonGenNarrowFn(TCGv_i32, TCGv_i64);
+  * + Property "vfp": enable VFP (forwarded to CPU object)
- typedef void NeonGenNarrowEnvFn(TCGv_i32, TCGv_ptr, TCGv_i64);
+  * + Property "dsp": enable DSP (forwarded to CPU object)
- typedef void NeonGenWidenFn(TCGv_i64, TCGv_i32);
+  * + Property "enable-bitband": expose bitbanded IO
-+typedef void NeonGenTwoOpWidenFn(TCGv_i64, TCGv_i32, TCGv_i32);
++ * + Property "mpu-ns-regions": number of Non-Secure MPU regions (forwarded
- typedef void NeonGenTwoSingleOPFn(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_ptr);
++ *   to CPU object pmsav7-dregion property; default is whatever the default
- typedef void NeonGenTwoDoubleOPFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_ptr);
++ *   for the CPU is)
- typedef void NeonGenOneOpFn(TCGv_i64, TCGv_i64);
++ * + Property "mpu-s-regions": number of Secure MPU regions (default is
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
++ *   whatever the default for the CPU is; must currently be set to the same
 + *   value as mpu-ns-regions if the CPU implements the Security Extension)
   * + Clock input "refclk" is the external reference clock for the systick timers
   * + Clock input "cpuclk" is the main CPU clock
   */
@@ -XXX,XX +XXX,XX @@ struct ARMv7MState {
      Object *idau;
      uint32_t init_svtor;
      uint32_t init_nsvtor;
 +    uint32_t mpu_ns_regions;
 +    uint32_t mpu_s_regions;
      bool enable_bitband;
      bool start_powered_off;
      bool vfp;
 diff --git a/hw/arm/armv7m.c b/hw/arm/armv7m.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
+--- a/hw/arm/armv7m.c
-+++ b/target/arm/neon-dp.decode
++++ b/hw/arm/armv7m.c
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+@@ -XXX,XX +XXX,XX @@ static void armv7m_realize(DeviceState *dev, Error **errp)
-     VADDHN_3d    1111 001 0 1 . .. .... .... 0100 . 0 . 0 .... @3diff
+         }
-     VRADDHN_3d   1111 001 1 1 . .. .... .... 0100 . 0 . 0 .... @3diff
+     }
 +    VABAL_S_3d   1111 001 0 1 . .. .... .... 0101 . 0 . 0 .... @3diff
 +    VABAL_U_3d   1111 001 1 1 . .. .... .... 0101 . 0 . 0 .... @3diff
 +
      VSUBHN_3d    1111 001 0 1 . .. .... .... 0110 . 0 . 0 .... @3diff
      VRSUBHN_3d   1111 001 1 1 . .. .... .... 0110 . 0 . 0 .... @3diff
 +
 +    VABDL_S_3d   1111 001 0 1 . .. .... .... 0111 . 0 . 0 .... @3diff
 +    VABDL_U_3d   1111 001 1 1 . .. .... .... 0111 . 0 . 0 .... @3diff
    ]
  }
 diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.inc.c
 +++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ DO_NARROW_3D(VADDHN, add, narrow, tcg_gen_extrh_i64_i32)
  DO_NARROW_3D(VSUBHN, sub, narrow, tcg_gen_extrh_i64_i32)
  DO_NARROW_3D(VRADDHN, add, narrow_round, gen_narrow_round_high_u32)
  DO_NARROW_3D(VRSUBHN, sub, narrow_round, gen_narrow_round_high_u32)
 +
 +static bool do_long_3d(DisasContext *s, arg_3diff *a,
 +                       NeonGenTwoOpWidenFn *opfn,
 +                       NeonGenTwo64OpFn *accfn)
 +{
 +    /*
-+     * 3-regs different lengths, long operations.
++     * Real M-profile hardware can be configured with a different number of
-+     * These perform an operation on two inputs that returns a double-width
++     * MPU regions for Secure vs NonSecure. QEMU's CPU implementation doesn't
-+     * result, and then possibly perform an accumulation operation of
++     * support that yet, so catch attempts to select that.
 +     * that result into the double-width destination.
 +     */
-+    TCGv_i64 rd0, rd1, tmp;
++    if (arm_feature(&s->cpu->env, ARM_FEATURE_M_SECURITY) &&
-+    TCGv_i32 rn, rm;
++        s->mpu_ns_regions != s->mpu_s_regions) {
-+
++        error_setg(errp,
-+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
++                   "mpu-ns-regions and mpu-s-regions properties must have the same value");
-+        return false;
++        return;
 +    }
 +    if (s->mpu_ns_regions != UINT_MAX &&
 +        object_property_find(OBJECT(s->cpu), "pmsav7-dregion")) {
 +        if (!object_property_set_uint(OBJECT(s->cpu), "pmsav7-dregion",
 +                                      s->mpu_ns_regions, errp)) {
 +            return;
 +        }
 +    }
 +
-+    /* UNDEF accesses to D16-D31 if they don't exist. */
+     /*
-+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+      * Tell the CPU where the NVIC is; it will fail realize if it doesn't
-+        ((a->vd | a->vn | a->vm) & 0x10)) {
+      * have one. Similarly, tell the NVIC where its CPU is.
-+        return false;
+@@ -XXX,XX +XXX,XX @@ static Property armv7m_properties[] = {
-+    }
+                      false),
-+
+     DEFINE_PROP_BOOL("vfp", ARMv7MState, vfp, true),
-+    if (!opfn) {
+     DEFINE_PROP_BOOL("dsp", ARMv7MState, dsp, true),
-+        /* size == 3 case, which is an entirely different insn group */
++    DEFINE_PROP_UINT32("mpu-ns-regions", ARMv7MState, mpu_ns_regions, UINT_MAX),
-+        return false;
++    DEFINE_PROP_UINT32("mpu-s-regions", ARMv7MState, mpu_s_regions, UINT_MAX),
-+    }
+     DEFINE_PROP_END_OF_LIST(),
-+
+ };
-+    if (a->vd & 1) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    rd0 = tcg_temp_new_i64();
 +    rd1 = tcg_temp_new_i64();
 +
 +    rn = neon_load_reg(a->vn, 0);
 +    rm = neon_load_reg(a->vm, 0);
 +    opfn(rd0, rn, rm);
 +    tcg_temp_free_i32(rn);
 +    tcg_temp_free_i32(rm);
 +
 +    rn = neon_load_reg(a->vn, 1);
 +    rm = neon_load_reg(a->vm, 1);
 +    opfn(rd1, rn, rm);
 +    tcg_temp_free_i32(rn);
 +    tcg_temp_free_i32(rm);
 +
 +    /* Don't store results until after all loads: they might overlap */
 +    if (accfn) {
 +        tmp = tcg_temp_new_i64();
 +        neon_load_reg64(tmp, a->vd);
 +        accfn(tmp, tmp, rd0);
 +        neon_store_reg64(tmp, a->vd);
 +        neon_load_reg64(tmp, a->vd + 1);
 +        accfn(tmp, tmp, rd1);
 +        neon_store_reg64(tmp, a->vd + 1);
 +        tcg_temp_free_i64(tmp);
 +    } else {
 +        neon_store_reg64(rd0, a->vd);
 +        neon_store_reg64(rd1, a->vd + 1);
 +    }
 +
 +    tcg_temp_free_i64(rd0);
 +    tcg_temp_free_i64(rd1);
 +
 +    return true;
 +}
 +
 +static bool trans_VABDL_S_3d(DisasContext *s, arg_3diff *a)
 +{
 +    static NeonGenTwoOpWidenFn * const opfn[] = {
 +        gen_helper_neon_abdl_s16,
 +        gen_helper_neon_abdl_s32,
 +        gen_helper_neon_abdl_s64,
 +        NULL,
 +    };
 +
 +    return do_long_3d(s, a, opfn[a->size], NULL);
 +}
 +
 +static bool trans_VABDL_U_3d(DisasContext *s, arg_3diff *a)
 +{
 +    static NeonGenTwoOpWidenFn * const opfn[] = {
 +        gen_helper_neon_abdl_u16,
 +        gen_helper_neon_abdl_u32,
 +        gen_helper_neon_abdl_u64,
 +        NULL,
 +    };
 +
 +    return do_long_3d(s, a, opfn[a->size], NULL);
 +}
 +
 +static bool trans_VABAL_S_3d(DisasContext *s, arg_3diff *a)
 +{
 +    static NeonGenTwoOpWidenFn * const opfn[] = {
 +        gen_helper_neon_abdl_s16,
 +        gen_helper_neon_abdl_s32,
 +        gen_helper_neon_abdl_s64,
 +        NULL,
 +    };
 +    static NeonGenTwo64OpFn * const addfn[] = {
 +        gen_helper_neon_addl_u16,
 +        gen_helper_neon_addl_u32,
 +        tcg_gen_add_i64,
 +        NULL,
 +    };
 +
 +    return do_long_3d(s, a, opfn[a->size], addfn[a->size]);
 +}
 +
 +static bool trans_VABAL_U_3d(DisasContext *s, arg_3diff *a)
 +{
 +    static NeonGenTwoOpWidenFn * const opfn[] = {
 +        gen_helper_neon_abdl_u16,
 +        gen_helper_neon_abdl_u32,
 +        gen_helper_neon_abdl_u64,
 +        NULL,
 +    };
 +    static NeonGenTwo64OpFn * const addfn[] = {
 +        gen_helper_neon_addl_u16,
 +        gen_helper_neon_addl_u32,
 +        tcg_gen_add_i64,
 +        NULL,
 +    };
 +
 +    return do_long_3d(s, a, opfn[a->size], addfn[a->size]);
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                      {0, 0, 0, 7}, /* VSUBL: handled by decodetree */
                      {0, 0, 0, 7}, /* VSUBW: handled by decodetree */
                      {0, 0, 0, 7}, /* VADDHN: handled by decodetree */
 -                    {0, 0, 0, 0}, /* VABAL */
 +                    {0, 0, 0, 7}, /* VABAL */
                      {0, 0, 0, 7}, /* VSUBHN: handled by decodetree */
 -                    {0, 0, 0, 0}, /* VABDL */
 +                    {0, 0, 0, 7}, /* VABDL */
                      {0, 0, 0, 0}, /* VMLAL */
                      {0, 0, 0, 9}, /* VQDMLAL */
                      {0, 0, 0, 0}, /* VMLSL */
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                          tmp2 = neon_load_reg(rm, pass);
                      }
                      switch (op) {
 -                    case 5: case 7: /* VABAL, VABDL */
 -                        switch ((size << 1) | u) {
 -                        case 0:
 -                            gen_helper_neon_abdl_s16(cpu_V0, tmp, tmp2);
 -                            break;
 -                        case 1:
 -                            gen_helper_neon_abdl_u16(cpu_V0, tmp, tmp2);
 -                            break;
 -                        case 2:
 -                            gen_helper_neon_abdl_s32(cpu_V0, tmp, tmp2);
 -                            break;
 -                        case 3:
 -                            gen_helper_neon_abdl_u32(cpu_V0, tmp, tmp2);
 -                            break;
 -                        case 4:
 -                            gen_helper_neon_abdl_s64(cpu_V0, tmp, tmp2);
 -                            break;
 -                        case 5:
 -                            gen_helper_neon_abdl_u64(cpu_V0, tmp, tmp2);
 -                            break;
 -                        default: abort();
 -                        }
 -                        tcg_temp_free_i32(tmp2);
 -                        tcg_temp_free_i32(tmp);
 -                        break;
                      case 8: case 9: case 10: case 11: case 12: case 13:
                          /* VMLAL, VQDMLAL, VMLSL, VQDMLSL, VMULL, VQDMULL */
                          gen_neon_mull(cpu_V0, tmp, tmp2, size, u);
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                          case 10: /* VMLSL */
                              gen_neon_negl(cpu_V0, size);
                              /* Fall through */
 -                        case 5: case 8: /* VABAL, VMLAL */
 +                        case 8: /* VABAL, VMLAL */
                              gen_neon_addl(size);
                              break;
                          case 9: case 11: /* VQDMLAL, VQDMLSL */
 --
-.20.1
+.34.1

-[PULL 17/23] target/arm: Convert Neon VDUP (scalar) to decodetree
+[PULL 24/24] hw/arm: Set number of MPU regions correctly for an505, an521, an524
-Convert the Neon VDUP (scalar) insn to decodetree.  (Note that we
+The IoTKit, SSE200 and SSE300 all default to 8 MPU regions.  The
-can't call this just "VDUP" as we used that already in vfp.decode for
+MPS2/MPS3 FPGA images don't override these except in the case of
-the "VDUP (general purpose register)" insn.)
+AN547, which uses 16 MPU regions.
 Define properties on the ARMSSE object for the MPU regions (using the
 same names as the documented RTL configuration settings, and
 following the pattern we already have for this device of using
 all-caps names as the RTL does), and set them in the board code.
 We don't actually need to override the default except on AN547,
 but it's simpler code to have the board code set them always
 rather than tracking which board subtypes want to set them to
 a non-default value separately from what that value is.
 Tho overall effect is that for mps2-an505, mps2-an521 and mps3-an524
 we now correctly use 8 MPU regions, while mps3-an547 stays at its
 current 16 regions.
 It's possible some guest code wrongly depended on the previous
 incorrectly modeled number of memory regions. (Such guest code
 should ideally check the number of regions via the MPU_TYPE
 register.) The old behaviour can be obtained with additional
 -global arguments to QEMU:
 For mps2-an521 and mps3-an524:
  -global sse-200.CPU0_MPU_NS=16 -global sse-200.CPU0_MPU_S=16 -global sse-200.CPU1_MPU_NS=16 -global sse-200.CPU1_MPU_S=16
 For mps2-an505:
  -global sse-200.CPU0_MPU_NS=16 -global sse-200.CPU0_MPU_S=16
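
For guest code that wants to follow the advice above and read the region count rather than hardcode it, a minimal sketch of the decoding step is shown here. It assumes the architectural layout of MPU_TYPE, where the DREGION field occupies bits [15:8]; the helper name mpu_type_dregion() is hypothetical, and the register read itself is left out so the fragment stays host-testable.

```c
#include <stdint.h>

/*
 * Extract the number of data regions from a raw MPU_TYPE value.
 * DREGION is bits [15:8] of the register.
 */
static inline unsigned mpu_type_dregion(uint32_t mpu_type)
{
    return (mpu_type >> 8) & 0xffu;
}
```

On target, the argument would come from a volatile read of the MPU_TYPE register; an RTOS could then size its region tables from the result instead of a build-time constant.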
 NB that the way the implementation allows this use of -global
 is slightly fragile: if the board code explicitly sets the
 properties on the sse-200 object, this overrides the -global
 command line option. So we rely on:
  - the boards that need fixing all happen to use the SSE defaults
  - we can write the board code to only set the property if it
    is different from the default, rather than having all boards
    explicitly set the property
  - the board that does need to use a non-default value happens
    to need to set it to the same value (16) we previously used
 This works, but there are some kinds of refactoring of the
 mps2-tz.c code that would break the support for -global here.
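
The precedence described above can be modelled in a few lines. This is a simplified sketch of the ordering only (device default, then any -global value, then an explicit board setting), not QEMU's property machinery: resolve_regions() and NO_GLOBAL are hypothetical names, while MPU_REGION_DEFAULT mirrors the sentinel the patch adds to mps2-tz.c.

```c
#include <stdint.h>

/* Board-side sentinel: leave the SSE property at its default. */
#define MPU_REGION_DEFAULT UINT32_MAX
/* Hypothetical marker for "no -global given on the command line". */
#define NO_GLOBAL UINT32_MAX

/*
 * Work out the final region count. A -global value replaces the device
 * default, and an explicit board setting replaces both, which is why
 * the board code must skip the set when it wants the default.
 */
static uint32_t resolve_regions(uint32_t device_default,
                                uint32_t global_value,
                                uint32_t board_value)
{
    uint32_t regions = device_default;

    if (global_value != NO_GLOBAL) {
        regions = global_value;
    }
    if (board_value != MPU_REGION_DEFAULT) {
        regions = board_value;
    }
    return regions;
}
```

In this model a board that leaves the sentinel in place lets a user-supplied -global of 16 take effect, whereas a board that always sets the property would silently discard it.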
 Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1772
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
+Message-id: 20230724174335.2150499-4-peter.maydell@linaro.org
 ---
- target/arm/neon-dp.decode       |  7 +++++++
+ include/hw/arm/armsse.h |  5 +++++
- target/arm/translate-neon.inc.c | 26 ++++++++++++++++++++++++++
+ hw/arm/armsse.c         | 16 ++++++++++++++++
- target/arm/translate.c          | 25 +------------------------
+ hw/arm/mps2-tz.c        | 29 +++++++++++++++++++++++++++++
-files changed, 34 insertions(+), 24 deletions(-)
+files changed, 50 insertions(+)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+diff --git a/include/hw/arm/armsse.h b/include/hw/arm/armsse.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
+--- a/include/hw/arm/armsse.h
-+++ b/target/arm/neon-dp.decode
++++ b/include/hw/arm/armsse.h
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+@@ -XXX,XX +XXX,XX @@
+  *    (matching the hardware) is that for CPU0 in an IoTKit and CPU1 in an
-     VTBL         1111 001 1 1 . 11 .... .... 10 len:2 . op:1 . 0 .... \
+  *    SSE-200 both are present; CPU0 in an SSE-200 has neither.
-                  vm=%vm_dp vn=%vn_dp vd=%vd_dp
+  *    Since the IoTKit has only one CPU, it does not have the CPU1_* properties.
-+
++ *  + QOM properties "CPU0_MPU_NS", "CPU0_MPU_S", "CPU1_MPU_NS" and "CPU1_MPU_S"
-+    VDUP_scalar  1111 001 1 1 . 11 index:3 1 .... 11 000 q:1 . 0 .... \
++ *    which set the number of MPU regions on the CPUs. If there is only one
-+                 vm=%vm_dp vd=%vd_dp size=0
++ *    CPU the CPU1 properties are not present.
-+    VDUP_scalar  1111 001 1 1 . 11 index:2 10 .... 11 000 q:1 . 0 .... \
+  *  + Named GPIO inputs "EXP_IRQ" 0..n are the expansion interrupts for CPU 0,
-+                 vm=%vm_dp vd=%vd_dp size=1
+  *    which are wired to its NVIC lines 32 .. n+32
-+    VDUP_scalar  1111 001 1 1 . 11 index:1 100 .... 11 000 q:1 . 0 .... \
+  *  + Named GPIO inputs "EXP_CPU1_IRQ" 0..n are the expansion interrupts for
-+                 vm=%vm_dp vd=%vd_dp size=2
+@@ -XXX,XX +XXX,XX @@ struct ARMSSE {
-   ]
+     uint32_t exp_numirq;
+     uint32_t sram_addr_width;
-   # Subgroup for size != 0b11
+     uint32_t init_svtor;
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
++    uint32_t cpu_mpu_ns[SSE_MAX_CPUS];
 +    uint32_t cpu_mpu_s[SSE_MAX_CPUS];
      bool cpu_fpu[SSE_MAX_CPUS];
      bool cpu_dsp[SSE_MAX_CPUS];
  };
 diff --git a/hw/arm/armsse.c b/hw/arm/armsse.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
+--- a/hw/arm/armsse.c
-+++ b/target/arm/translate-neon.inc.c
++++ b/hw/arm/armsse.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VTBL(DisasContext *s, arg_VTBL *a)
+@@ -XXX,XX +XXX,XX @@ static Property iotkit_properties[] = {
-     tcg_temp_free_i32(tmp);
+     DEFINE_PROP_UINT32("init-svtor", ARMSSE, init_svtor, 0x10000000),
-     return true;
+     DEFINE_PROP_BOOL("CPU0_FPU", ARMSSE, cpu_fpu[0], true),
- }
+     DEFINE_PROP_BOOL("CPU0_DSP", ARMSSE, cpu_dsp[0], true),
-+
++    DEFINE_PROP_UINT32("CPU0_MPU_NS", ARMSSE, cpu_mpu_ns[0], 8),
-+static bool trans_VDUP_scalar(DisasContext *s, arg_VDUP_scalar *a)
++    DEFINE_PROP_UINT32("CPU0_MPU_S", ARMSSE, cpu_mpu_s[0], 8),
-+{
+     DEFINE_PROP_END_OF_LIST()
-+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+ };
-+        return false;
-+    }
+@@ -XXX,XX +XXX,XX @@ static Property sse200_properties[] = {
-+
+     DEFINE_PROP_BOOL("CPU0_DSP", ARMSSE, cpu_dsp[0], false),
-+    /* UNDEF accesses to D16-D31 if they don't exist. */
+     DEFINE_PROP_BOOL("CPU1_FPU", ARMSSE, cpu_fpu[1], true),
-+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+     DEFINE_PROP_BOOL("CPU1_DSP", ARMSSE, cpu_dsp[1], true),
-+        ((a->vd | a->vm) & 0x10)) {
++    DEFINE_PROP_UINT32("CPU0_MPU_NS", ARMSSE, cpu_mpu_ns[0], 8),
-+        return false;
++    DEFINE_PROP_UINT32("CPU0_MPU_S", ARMSSE, cpu_mpu_s[0], 8),
-+    }
++    DEFINE_PROP_UINT32("CPU1_MPU_NS", ARMSSE, cpu_mpu_ns[1], 8),
-+
++    DEFINE_PROP_UINT32("CPU1_MPU_S", ARMSSE, cpu_mpu_s[1], 8),
-+    if (a->vd & a->q) {
+     DEFINE_PROP_END_OF_LIST()
-+        return false;
+ };
-+    }
-+
+@@ -XXX,XX +XXX,XX @@ static Property sse300_properties[] = {
-+    if (!vfp_access_check(s)) {
+     DEFINE_PROP_UINT32("init-svtor", ARMSSE, init_svtor, 0x10000000),
-+        return true;
+     DEFINE_PROP_BOOL("CPU0_FPU", ARMSSE, cpu_fpu[0], true),
-+    }
+     DEFINE_PROP_BOOL("CPU0_DSP", ARMSSE, cpu_dsp[0], true),
-+
++    DEFINE_PROP_UINT32("CPU0_MPU_NS", ARMSSE, cpu_mpu_ns[0], 8),
-+    tcg_gen_gvec_dup_mem(a->size, neon_reg_offset(a->vd, 0),
++    DEFINE_PROP_UINT32("CPU0_MPU_S", ARMSSE, cpu_mpu_s[0], 8),
-+                         neon_element_offset(a->vm, a->index, a->size),
+     DEFINE_PROP_END_OF_LIST()
-+                         a->q ? 16 : 8, a->q ? 16 : 8);
+ };
-+    return true;
-+}
+@@ -XXX,XX +XXX,XX @@ static void armsse_realize(DeviceState *dev, Error **errp)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+                 return;
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                      }
                      break;
                  }
 -            } else if ((insn & (1 << 10)) == 0) {
 -                /* VTBL, VTBX: handled by decodetree */
 -                return 1;
 -            } else if ((insn & 0x380) == 0) {
 -                /* VDUP */
 -                int element;
 -                MemOp size;
 -
 -                if ((insn & (7 << 16)) == 0 || (q && (rd & 1))) {
 -                    return 1;
 -                }
 -                if (insn & (1 << 16)) {
 -                    size = MO_8;
 -                    element = (insn >> 17) & 7;
 -                } else if (insn & (1 << 17)) {
 -                    size = MO_16;
 -                    element = (insn >> 18) & 3;
 -                } else {
 -                    size = MO_32;
 -                    element = (insn >> 19) & 1;
 -                }
 -                tcg_gen_gvec_dup_mem(size, neon_reg_offset(rd, 0),
 -                                     neon_element_offset(rm, element, size),
 -                                     q ? 16 : 8, q ? 16 : 8);
              } else {
 +                /* VTBL, VTBX, VDUP: handled by decodetree */
                  return 1;
              }
          }
++        if (!object_property_set_uint(cpuobj, "mpu-ns-regions",
++                                      s->cpu_mpu_ns[i], errp)) {
++            return;
++        }
++        if (!object_property_set_uint(cpuobj, "mpu-s-regions",
++                                      s->cpu_mpu_s[i], errp)) {
++            return;
++        }
+         if (i > 0) {
+             memory_region_add_subregion_overlap(&s->cpu_container[i], 0,
+diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
+index XXXXXXX..XXXXXXX 100644
+--- a/hw/arm/mps2-tz.c
++++ b/hw/arm/mps2-tz.c
+@@ -XXX,XX +XXX,XX @@ struct MPS2TZMachineClass {
+     int uart_overflow_irq; /* number of the combined UART overflow IRQ */
+     uint32_t init_svtor; /* init-svtor setting for SSE */
+     uint32_t sram_addr_width; /* SRAM_ADDR_WIDTH setting for SSE */
++    uint32_t cpu0_mpu_ns; /* CPU0_MPU_NS setting for SSE */
++    uint32_t cpu0_mpu_s; /* CPU0_MPU_S setting for SSE */
++    uint32_t cpu1_mpu_ns; /* CPU1_MPU_NS setting for SSE */
++    uint32_t cpu1_mpu_s; /* CPU1_MPU_S setting for SSE */
+     const RAMInfo *raminfo;
+     const char *armsse_type;
+     uint32_t boot_ram_size; /* size of ram at address 0; 0 == find in raminfo */
+@@ -XXX,XX +XXX,XX @@ OBJECT_DECLARE_TYPE(MPS2TZMachineState, MPS2TZMachineClass, MPS2TZ_MACHINE)
+ #define MPS3_DDR_SIZE (2 * GiB)
+ #endif
++/* For cpu{0,1}_mpu_{ns,s}, means "leave at SSE's default value" */
++#define MPU_REGION_DEFAULT UINT32_MAX
++
+ static const uint32_t an505_oscclk[] = {
+     40000000,
+     24580000,
+@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
+                              OBJECT(system_memory), &error_abort);
+     qdev_prop_set_uint32(iotkitdev, "EXP_NUMIRQ", mmc->numirq);
+     qdev_prop_set_uint32(iotkitdev, "init-svtor", mmc->init_svtor);
++    if (mmc->cpu0_mpu_ns != MPU_REGION_DEFAULT) {
++        qdev_prop_set_uint32(iotkitdev, "CPU0_MPU_NS", mmc->cpu0_mpu_ns);
++    }
++    if (mmc->cpu0_mpu_s != MPU_REGION_DEFAULT) {
++        qdev_prop_set_uint32(iotkitdev, "CPU0_MPU_S", mmc->cpu0_mpu_s);
++    }
++    if (object_property_find(OBJECT(iotkitdev), "CPU1_MPU_NS")) {
++        if (mmc->cpu1_mpu_ns != MPU_REGION_DEFAULT) {
++            qdev_prop_set_uint32(iotkitdev, "CPU1_MPU_NS", mmc->cpu1_mpu_ns);
++        }
++        if (mmc->cpu1_mpu_s != MPU_REGION_DEFAULT) {
++            qdev_prop_set_uint32(iotkitdev, "CPU1_MPU_S", mmc->cpu1_mpu_s);
++        }
++    }
+     qdev_prop_set_uint32(iotkitdev, "SRAM_ADDR_WIDTH", mmc->sram_addr_width);
+     qdev_connect_clock_in(iotkitdev, "MAINCLK", mms->sysclk);
+     qdev_connect_clock_in(iotkitdev, "S32KCLK", mms->s32kclk);
+@@ -XXX,XX +XXX,XX @@ static void mps2tz_class_init(ObjectClass *oc, void *data)
+ {
+     MachineClass *mc = MACHINE_CLASS(oc);
+     IDAUInterfaceClass *iic = IDAU_INTERFACE_CLASS(oc);
++    MPS2TZMachineClass *mmc = MPS2TZ_MACHINE_CLASS(oc);
+     mc->init = mps2tz_common_init;
+     mc->reset = mps2_machine_reset;
+     iic->check = mps2_tz_idau_check;
++
++    /* Most machines leave these at the SSE defaults */
++    mmc->cpu0_mpu_ns = MPU_REGION_DEFAULT;
++    mmc->cpu0_mpu_s = MPU_REGION_DEFAULT;
++    mmc->cpu1_mpu_ns = MPU_REGION_DEFAULT;
++    mmc->cpu1_mpu_s = MPU_REGION_DEFAULT;
+ }
+ static void mps2tz_set_default_ram_info(MPS2TZMachineClass *mmc)
+@@ -XXX,XX +XXX,XX @@ static void mps3tz_an547_class_init(ObjectClass *oc, void *data)
+     mmc->numirq = 96;
+     mmc->uart_overflow_irq = 48;
+     mmc->init_svtor = 0x00000000;
++    mmc->cpu0_mpu_s = mmc->cpu0_mpu_ns = 16;
+     mmc->sram_addr_width = 21;
+     mmc->raminfo = an547_raminfo;
+     mmc->armsse_type = TYPE_SSE300;
 --
-.20.1
+.34.1

Mostly my decodetree stuff, but also some patches for various
smaller bugs/features from others.

thanks
-- PMM

The following changes since commit 53550e81e2cafe7c03a39526b95cd21b5194d9b1:

Merge remote-tracking branch 'remotes/berrange/tags/qcrypto-next-pull-request' into staging (2020-06-15 16:36:34 +0100)

are available in the Git repository at:

https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20200616

for you to fetch changes up to 64b397417a26509bcdff44ab94356a35c7901c79:

hw: arm: Set vendor property for IMX SDHCI emulations (2020-06-16 10:32:29 +0100)

----------------------------------------------------------------
 * hw: arm: Set vendor property for IMX SDHCI emulations
 * sd: sdhci: Implement basic vendor specific register support
 * hw/net/imx_fec: Convert debug fprintf() to trace events
 * target/arm/cpu: adjust virtual time for all KVM arm cpus
 * Implement configurable descriptor size in ftgmac100
 * hw/misc/imx6ul_ccm: Implement non writable bits in CCM registers
 * target/arm: More Neon decodetree conversion work

----------------------------------------------------------------
Erik Smit (1):
      Implement configurable descriptor size in ftgmac100

Guenter Roeck (2):
      sd: sdhci: Implement basic vendor specific register support
      hw: arm: Set vendor property for IMX SDHCI emulations

Jean-Christophe Dubois (2):
      hw/misc/imx6ul_ccm: Implement non writable bits in CCM registers
      hw/net/imx_fec: Convert debug fprintf() to trace events

Peter Maydell (17):
      target/arm: Fix missing temp frees in do_vshll_2sh
      target/arm: Convert Neon 3-reg-diff prewidening ops to decodetree
      target/arm: Convert Neon 3-reg-diff narrowing ops to decodetree
      target/arm: Convert Neon 3-reg-diff VABAL, VABDL to decodetree
      target/arm: Convert Neon 3-reg-diff long multiplies
      target/arm: Convert Neon 3-reg-diff saturating doubling multiplies
      target/arm: Convert Neon 3-reg-diff polynomial VMULL
      target/arm: Add 'static' and 'const' annotations to VSHLL function arrays
      target/arm: Add missing TCG temp free in do_2shift_env_64()
      target/arm: Convert Neon 2-reg-scalar integer multiplies to decodetree
      target/arm: Convert Neon 2-reg-scalar float multiplies to decodetree
      target/arm: Convert Neon 2-reg-scalar VQDMULH, VQRDMULH to decodetree
      target/arm: Convert Neon 2-reg-scalar VQRDMLAH, VQRDMLSH to decodetree
      target/arm: Convert Neon 2-reg-scalar long multiplies to decodetree
      target/arm: Convert Neon VEXT to decodetree
      target/arm: Convert Neon VTBL, VTBX to decodetree
      target/arm: Convert Neon VDUP (scalar) to decodetree

fangying (1):
      target/arm/cpu: adjust virtual time for all KVM arm cpus

The widenfn() in do_vshll_2sh() does not free the input 32-bit
TCGv, so we need to do this in the calling code.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
---
 target/arm/translate-neon.inc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
     tmp = tcg_temp_new_i64();
 
     widenfn(tmp, rm0);
+    tcg_temp_free_i32(rm0);
     if (a->shift != 0) {
         tcg_gen_shli_i64(tmp, tmp, a->shift);
         tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
     neon_store_reg64(tmp, a->vd);
 
     widenfn(tmp, rm1);
+    tcg_temp_free_i32(rm1);
     if (a->shift != 0) {
         tcg_gen_shli_i64(tmp, tmp, a->shift);
         tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
-- 
2.20.1

Convert the "pre-widening" insns VADDL, VSUBL, VADDW and VSUBW
in the Neon 3-registers-different-lengths group to decodetree.
These insns work by widening one or both inputs to double their
size, performing an add or subtract at the doubled size and
then storing the double-size result.

As usual, rather than copying the loop of the original decoder
(which needs awkward code to avoid problems when source and
destination registers overlap) we just unroll the two passes.
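
For illustration only, the overlap hazard that the unrolled form has to
respect can be sketched in plain C. This is a simplified model, not the
TCG code in the patch: NeonRegs, lo32(), hi32() and vaddl_u32_model()
are hypothetical names, and only the 32-bit unsigned VADDL case is
modelled.

```c
#include <stdint.h>

/* Hypothetical model of the D register file: 32 registers of 64 bits. */
typedef struct {
    uint64_t d[32];
} NeonRegs;

static uint32_t lo32(uint64_t v) { return (uint32_t)v; }
static uint32_t hi32(uint64_t v) { return (uint32_t)(v >> 32); }

/*
 * Model of VADDL.U32 Qd, Dn, Dm, where Qd occupies d[vd] and d[vd + 1].
 * Each 32-bit input lane is widened to 64 bits before the add.
 */
void vaddl_u32_model(NeonRegs *r, int vd, int vn, int vm)
{
    /* Pass 0: widen the low lanes and add at the doubled size. */
    uint64_t res0 = (uint64_t)lo32(r->d[vn]) + lo32(r->d[vm]);

    /*
     * Load the pass 1 inputs before storing the pass 0 result:
     * d[vd] may be the same register as d[vn] or d[vm].
     */
    uint32_t n1 = hi32(r->d[vn]);
    uint32_t m1 = hi32(r->d[vm]);

    r->d[vd] = res0;
    r->d[vd + 1] = (uint64_t)n1 + m1;
}
```

With vd == vn, storing the pass 0 result first would overwrite the
high lane of Dn that pass 1 still needs, which is why the loads are
hoisted above the first store.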

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  43 +++++++++++++
 target/arm/translate-neon.inc.c | 104 ++++++++++++++++++++++++++++++++
 target/arm/translate.c          |  16 ++---
 3 files changed, 151 insertions(+), 12 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ VCVT_FU_2sh      1111 001 1 1 . ...... .... 1111 0 . . 1 .... @2reg_vcvt
 # So we have a single decode line and check the cmode/op in the
 # trans function.
 Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+
+######################################################################
+# Within the "two registers, or three registers of different lengths"
+# grouping ([23,4]=0b10), bits [21:20] are either part of the opcode
+# decode: 0b11 for VEXT, two-reg-misc, VTBL, and duplicate-scalar;
+# or they are a size field for the three-reg-different-lengths and
+# two-reg-and-scalar insn groups (where size cannot be 0b11). This
+# is slightly awkward for decodetree: we handle it with this
+# non-exclusive group which contains within it two exclusive groups:
+# one for the size=0b11 patterns, and one for the size-not-0b11
+# patterns. This allows us to check that none of the insns within
+# each subgroup accidentally overlap each other. Note that all the
+# trans functions for the size-not-0b11 patterns must check and
+# return false for size==3.
+######################################################################
+{
+  # 0b11 subgroup will go here
+
+  # Subgroup for size != 0b11
+  [
+    ##################################################################
+    # 3-reg-different-length grouping:
+    # 1111 001 U 1 D sz!=11 Vn:4 Vd:4 opc:4 N 0 M 0 Vm:4
+    ##################################################################
+
+    &3diff vm vn vd size
+
+    @3diff       .... ... . . . size:2 .... .... .... . . . . .... \
+                 &3diff vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+    VADDL_S_3d   1111 001 0 1 . .. .... .... 0000 . 0 . 0 .... @3diff
+    VADDL_U_3d   1111 001 1 1 . .. .... .... 0000 . 0 . 0 .... @3diff
+
+    VADDW_S_3d   1111 001 0 1 . .. .... .... 0001 . 0 . 0 .... @3diff
+    VADDW_U_3d   1111 001 1 1 . .. .... .... 0001 . 0 . 0 .... @3diff
+
+    VSUBL_S_3d   1111 001 0 1 . .. .... .... 0010 . 0 . 0 .... @3diff
+    VSUBL_U_3d   1111 001 1 1 . .. .... .... 0010 . 0 . 0 .... @3diff
+
+    VSUBW_S_3d   1111 001 0 1 . .. .... .... 0011 . 0 . 0 .... @3diff
+    VSUBW_U_3d   1111 001 1 1 . .. .... .... 0011 . 0 . 0 .... @3diff
+  ]
+}
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_Vimm_1r(DisasContext *s, arg_1reg_imm *a)
     }
     return do_1reg_imm(s, a, fn);
 }
+
+static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
+                           NeonGenWidenFn *widenfn,
+                           NeonGenTwo64OpFn *opfn,
+                           bool src1_wide)
+{
+    /* 3-regs different lengths, prewidening case (VADDL/VSUBL/VADDW/VSUBW) */
+    TCGv_i64 rn0_64, rn1_64, rm_64;
+    TCGv_i32 rm;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!widenfn || !opfn) {
+        /* size == 3 case, which is an entirely different insn group */
+        return false;
+    }
+
+    if ((a->vd & 1) || (src1_wide && (a->vn & 1))) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    rn0_64 = tcg_temp_new_i64();
+    rn1_64 = tcg_temp_new_i64();
+    rm_64 = tcg_temp_new_i64();
+
+    if (src1_wide) {
+        neon_load_reg64(rn0_64, a->vn);
+    } else {
+        TCGv_i32 tmp = neon_load_reg(a->vn, 0);
+        widenfn(rn0_64, tmp);
+        tcg_temp_free_i32(tmp);
+    }
+    rm = neon_load_reg(a->vm, 0);
+
+    widenfn(rm_64, rm);
+    tcg_temp_free_i32(rm);
+    opfn(rn0_64, rn0_64, rm_64);
+
+    /*
+     * Load second pass inputs before storing the first pass result, to
+     * avoid incorrect results if a narrow input overlaps with the result.
+     */
+    if (src1_wide) {
+        neon_load_reg64(rn1_64, a->vn + 1);
+    } else {
+        TCGv_i32 tmp = neon_load_reg(a->vn, 1);
+        widenfn(rn1_64, tmp);
+        tcg_temp_free_i32(tmp);
+    }
+    rm = neon_load_reg(a->vm, 1);
+
+    neon_store_reg64(rn0_64, a->vd);
+
+    widenfn(rm_64, rm);
+    tcg_temp_free_i32(rm);
+    opfn(rn1_64, rn1_64, rm_64);
+    neon_store_reg64(rn1_64, a->vd + 1);
+
+    tcg_temp_free_i64(rn0_64);
+    tcg_temp_free_i64(rn1_64);
+    tcg_temp_free_i64(rm_64);
+
+    return true;
+}
+
+#define DO_PREWIDEN(INSN, S, EXT, OP, SRC1WIDE)                         \
+    static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a)        \
+    {                                                                   \
+        static NeonGenWidenFn * const widenfn[] = {                     \
+            gen_helper_neon_widen_##S##8,                               \
+            gen_helper_neon_widen_##S##16,                              \
+            tcg_gen_##EXT##_i32_i64,                                    \
+            NULL,                                                       \
+        };                                                              \
+        static NeonGenTwo64OpFn * const addfn[] = {                     \
+            gen_helper_neon_##OP##l_u16,                                \
+            gen_helper_neon_##OP##l_u32,                                \
+            tcg_gen_##OP##_i64,                                         \
+            NULL,                                                       \
+        };                                                              \
+        return do_prewiden_3d(s, a, widenfn[a->size],                   \
+                              addfn[a->size], SRC1WIDE);                \
+    }
+
+DO_PREWIDEN(VADDL_S, s, ext, add, false)
+DO_PREWIDEN(VADDL_U, u, extu, add, false)
+DO_PREWIDEN(VSUBL_S, s, ext, sub, false)
+DO_PREWIDEN(VSUBL_U, u, extu, sub, false)
+DO_PREWIDEN(VADDW_S, s, ext, add, true)
+DO_PREWIDEN(VADDW_U, u, extu, add, true)
+DO_PREWIDEN(VSUBW_S, s, ext, sub, true)
+DO_PREWIDEN(VSUBW_U, u, extu, sub, true)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 /* Three registers of different lengths.  */
                 int src1_wide;
                 int src2_wide;
-                int prewiden;
                 /* undefreq: bit 0 : UNDEF if size == 0
                  *           bit 1 : UNDEF if size == 1
                  *           bit 2 : UNDEF if size == 2
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 int undefreq;
                 /* prewiden, src1_wide, src2_wide, undefreq */
                 static const int neon_3reg_wide[16][4] = {
-                    {1, 0, 0, 0}, /* VADDL */
-                    {1, 1, 0, 0}, /* VADDW */
-                    {1, 0, 0, 0}, /* VSUBL */
-                    {1, 1, 0, 0}, /* VSUBW */
+                    {0, 0, 0, 7}, /* VADDL: handled by decodetree */
+                    {0, 0, 0, 7}, /* VADDW: handled by decodetree */
+                    {0, 0, 0, 7}, /* VSUBL: handled by decodetree */
+                    {0, 0, 0, 7}, /* VSUBW: handled by decodetree */
                     {0, 1, 1, 0}, /* VADDHN */
                     {0, 0, 0, 0}, /* VABAL */
                     {0, 1, 1, 0}, /* VSUBHN */
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     {0, 0, 0, 7}, /* Reserved: always UNDEF */
                 };
 
-                prewiden = neon_3reg_wide[op][0];
                 src1_wide = neon_3reg_wide[op][1];
                 src2_wide = neon_3reg_wide[op][2];
                 undefreq = neon_3reg_wide[op][3];
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                         } else {
                             tmp = neon_load_reg(rn, pass);
                         }
-                        if (prewiden) {
-                            gen_neon_widen(cpu_V0, tmp, size, u);
-                        }
                     }
                     if (src2_wide) {
                         neon_load_reg64(cpu_V1, rm + pass);
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                         } else {
                             tmp2 = neon_load_reg(rm, pass);
                         }
-                        if (prewiden) {
-                            gen_neon_widen(cpu_V1, tmp2, size, u);
-                        }
                     }
                     switch (op) {
                     case 0: case 1: case 4: /* VADDL, VADDW, VADDHN, VRADDHN */
-- 
2.20.1

Convert the narrow-to-high-half insns VADDHN, VSUBHN, VRADDHN,
VRSUBHN in the Neon 3-registers-different-lengths group to
decodetree.
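
As a rough sketch of what these insns compute per lane, here is a
plain C model of the size == 2 case (64-bit inputs, 32-bit results).
The function names are hypothetical and exist only for this example;
the rounding variants add 1 << 31 before taking the high half, as
gen_narrow_round_high_u32() does in the patch.

```c
#include <stdint.h>

/* VADDHN: add at 64 bits, keep the high 32 bits of the sum. */
uint32_t addhn_u64_model(uint64_t n, uint64_t m)
{
    return (uint32_t)((n + m) >> 32);
}

/* VSUBHN: subtract at 64 bits, keep the high 32 bits. */
uint32_t subhn_u64_model(uint64_t n, uint64_t m)
{
    return (uint32_t)((n - m) >> 32);
}

/* VRADDHN: as VADDHN, but round by adding half of the discarded range. */
uint32_t raddhn_u64_model(uint64_t n, uint64_t m)
{
    return (uint32_t)((n + m + (UINT64_C(1) << 31)) >> 32);
}

/* VRSUBHN: as VSUBHN, with the same rounding constant. */
uint32_t rsubhn_u64_model(uint64_t n, uint64_t m)
{
    return (uint32_t)((n - m + (UINT64_C(1) << 31)) >> 32);
}
```

The arithmetic is unsigned so that it wraps modulo 2^64, matching a
64-bit add or subtract that discards the carry.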

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  6 +++
 target/arm/translate-neon.inc.c | 87 +++++++++++++++++++++++++++++++
 target/arm/translate.c          | 91 ++++-----------------------------
 3 files changed, 104 insertions(+), 80 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
 
     VSUBW_S_3d   1111 001 0 1 . .. .... .... 0011 . 0 . 0 .... @3diff
     VSUBW_U_3d   1111 001 1 1 . .. .... .... 0011 . 0 . 0 .... @3diff
+
+    VADDHN_3d    1111 001 0 1 . .. .... .... 0100 . 0 . 0 .... @3diff
+    VRADDHN_3d   1111 001 1 1 . .. .... .... 0100 . 0 . 0 .... @3diff
+
+    VSUBHN_3d    1111 001 0 1 . .. .... .... 0110 . 0 . 0 .... @3diff
+    VRSUBHN_3d   1111 001 1 1 . .. .... .... 0110 . 0 . 0 .... @3diff
   ]
 }
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ DO_PREWIDEN(VADDW_S, s, ext, add, true)
 DO_PREWIDEN(VADDW_U, u, extu, add, true)
 DO_PREWIDEN(VSUBW_S, s, ext, sub, true)
 DO_PREWIDEN(VSUBW_U, u, extu, sub, true)
+
+static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
+                         NeonGenTwo64OpFn *opfn, NeonGenNarrowFn *narrowfn)
+{
+    /* 3-regs different lengths, narrowing (VADDHN/VSUBHN/VRADDHN/VRSUBHN) */
+    TCGv_i64 rn_64, rm_64;
+    TCGv_i32 rd0, rd1;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!opfn || !narrowfn) {
+        /* size == 3 case, which is an entirely different insn group */
+        return false;
+    }
+
+    if ((a->vn | a->vm) & 1) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    rn_64 = tcg_temp_new_i64();
+    rm_64 = tcg_temp_new_i64();
+    rd0 = tcg_temp_new_i32();
+    rd1 = tcg_temp_new_i32();
+
+    neon_load_reg64(rn_64, a->vn);
+    neon_load_reg64(rm_64, a->vm);
+
+    opfn(rn_64, rn_64, rm_64);
+
+    narrowfn(rd0, rn_64);
+
+    neon_load_reg64(rn_64, a->vn + 1);
+    neon_load_reg64(rm_64, a->vm + 1);
+
+    opfn(rn_64, rn_64, rm_64);
+
+    narrowfn(rd1, rn_64);
+
+    neon_store_reg(a->vd, 0, rd0);
+    neon_store_reg(a->vd, 1, rd1);
+
+    tcg_temp_free_i64(rn_64);
+    tcg_temp_free_i64(rm_64);
+
+    return true;
+}
+
+#define DO_NARROW_3D(INSN, OP, NARROWTYPE, EXTOP)                       \
+    static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a)        \
+    {                                                                   \
+        static NeonGenTwo64OpFn * const addfn[] = {                     \
+            gen_helper_neon_##OP##l_u16,                                \
+            gen_helper_neon_##OP##l_u32,                                \
+            tcg_gen_##OP##_i64,                                         \
+            NULL,                                                       \
+        };                                                              \
+        static NeonGenNarrowFn * const narrowfn[] = {                   \
+            gen_helper_neon_##NARROWTYPE##_high_u8,                     \
+            gen_helper_neon_##NARROWTYPE##_high_u16,                    \
+            EXTOP,                                                      \
+            NULL,                                                       \
+        };                                                              \
+        return do_narrow_3d(s, a, addfn[a->size], narrowfn[a->size]);   \
+    }
+
+static void gen_narrow_round_high_u32(TCGv_i32 rd, TCGv_i64 rn)
+{
+    tcg_gen_addi_i64(rn, rn, 1u << 31);
+    tcg_gen_extrh_i64_i32(rd, rn);
+}
+
+DO_NARROW_3D(VADDHN, add, narrow, tcg_gen_extrh_i64_i32)
+DO_NARROW_3D(VSUBHN, sub, narrow, tcg_gen_extrh_i64_i32)
+DO_NARROW_3D(VRADDHN, add, narrow_round, gen_narrow_round_high_u32)
+DO_NARROW_3D(VRSUBHN, sub, narrow_round, gen_narrow_round_high_u32)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void gen_neon_addl(int size)
     }
 }
 
-static inline void gen_neon_subl(int size)
-{
-    switch (size) {
-    case 0: gen_helper_neon_subl_u16(CPU_V001); break;
-    case 1: gen_helper_neon_subl_u32(CPU_V001); break;
-    case 2: tcg_gen_sub_i64(CPU_V001); break;
-    default: abort();
-    }
-}
-
 static inline void gen_neon_negl(TCGv_i64 var, int size)
 {
     switch (size) {
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             op = (insn >> 8) & 0xf;
             if ((insn & (1 << 6)) == 0) {
                 /* Three registers of different lengths.  */
-                int src1_wide;
-                int src2_wide;
                 /* undefreq: bit 0 : UNDEF if size == 0
                  *           bit 1 : UNDEF if size == 1
                  *           bit 2 : UNDEF if size == 2
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     {0, 0, 0, 7}, /* VADDW: handled by decodetree */
                     {0, 0, 0, 7}, /* VSUBL: handled by decodetree */
                     {0, 0, 0, 7}, /* VSUBW: handled by decodetree */
-                    {0, 1, 1, 0}, /* VADDHN */
+                    {0, 0, 0, 7}, /* VADDHN: handled by decodetree */
                     {0, 0, 0, 0}, /* VABAL */
-                    {0, 1, 1, 0}, /* VSUBHN */
+                    {0, 0, 0, 7}, /* VSUBHN: handled by decodetree */
                     {0, 0, 0, 0}, /* VABDL */
                     {0, 0, 0, 0}, /* VMLAL */
                     {0, 0, 0, 9}, /* VQDMLAL */
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     {0, 0, 0, 7}, /* Reserved: always UNDEF */
                 };
 
-                src1_wide = neon_3reg_wide[op][1];
-                src2_wide = neon_3reg_wide[op][2];
                 undefreq = neon_3reg_wide[op][3];
 
                 if ((undefreq & (1 << size)) ||
                     ((undefreq & 8) && u)) {
                     return 1;
                 }
-                if ((src1_wide && (rn & 1)) ||
-                    (src2_wide && (rm & 1)) ||
-                    (!src2_wide && (rd & 1))) {
+                if (rd & 1) {
                     return 1;
                 }
 
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 /* Avoid overlapping operands.  Wide source operands are
                    always aligned so will never overlap with wide
                    destinations in problematic ways.  */
-                if (rd == rm && !src2_wide) {
+                if (rd == rm) {
                     tmp = neon_load_reg(rm, 1);
                     neon_store_scratch(2, tmp);
-                } else if (rd == rn && !src1_wide) {
+                } else if (rd == rn) {
                     tmp = neon_load_reg(rn, 1);
                     neon_store_scratch(2, tmp);
                 }
                 tmp3 = NULL;
                 for (pass = 0; pass < 2; pass++) {
-                    if (src1_wide) {
-                        neon_load_reg64(cpu_V0, rn + pass);
-                        tmp = NULL;
+                    if (pass == 1 && rd == rn) {
+                        tmp = neon_load_scratch(2);
                     } else {
-                        if (pass == 1 && rd == rn) {
-                            tmp = neon_load_scratch(2);
-                        } else {
-                            tmp = neon_load_reg(rn, pass);
-                        }
+                        tmp = neon_load_reg(rn, pass);
                     }
-                    if (src2_wide) {
-                        neon_load_reg64(cpu_V1, rm + pass);
-                        tmp2 = NULL;
+                    if (pass == 1 && rd == rm) {
+                        tmp2 = neon_load_scratch(2);
                     } else {
-                        if (pass == 1 && rd == rm) {
-                            tmp2 = neon_load_scratch(2);
-                        } else {
-                            tmp2 = neon_load_reg(rm, pass);
-                        }
+                        tmp2 = neon_load_reg(rm, pass);
                     }
                     switch (op) {
-                    case 0: case 1: case 4: /* VADDL, VADDW, VADDHN, VRADDHN */
-                        gen_neon_addl(size);
-                        break;
-                    case 2: case 3: case 6: /* VSUBL, VSUBW, VSUBHN, VRSUBHN */
-                        gen_neon_subl(size);
-                        break;
                     case 5: case 7: /* VABAL, VABDL */
                         switch ((size << 1) | u) {
                         case 0:
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                             abort();
                         }
                         neon_store_reg64(cpu_V0, rd + pass);
-                    } else if (op == 4 || op == 6) {
-                        /* Narrowing operation.  */
-                        tmp = tcg_temp_new_i32();
-                        if (!u) {
-                            switch (size) {
-                            case 0:
-                                gen_helper_neon_narrow_high_u8(tmp, cpu_V0);
-                                break;
-                            case 1:
-                                gen_helper_neon_narrow_high_u16(tmp, cpu_V0);
-                                break;
-                            case 2:
-                                tcg_gen_extrh_i64_i32(tmp, cpu_V0);
-                                break;
-                            default: abort();
-                            }
-                        } else {
-                            switch (size) {
-                            case 0:
-                                gen_helper_neon_narrow_round_high_u8(tmp, cpu_V0);
-                                break;
-                            case 1:
-                                gen_helper_neon_narrow_round_high_u16(tmp, cpu_V0);
-                                break;
-                            case 2:
-                                tcg_gen_addi_i64(cpu_V0, cpu_V0, 1u << 31);
-                                tcg_gen_extrh_i64_i32(tmp, cpu_V0);
-                                break;
-                            default: abort();
-                            }
-                        }
-                        if (pass == 0) {
-                            tmp3 = tmp;
-                        } else {
-                            neon_store_reg(rd, 0, tmp3);
-                            neon_store_reg(rd, 1, tmp);
-                        }
                     } else {
                         /* Write back the result.  */
                         neon_store_reg64(cpu_V0, rd + pass);
-- 
2.20.1

Convert the Neon 3-reg-diff insns VABAL and VABDL to decodetree.
Like almost all the remaining insns in this group, these combine
a two-input operation that returns a double-width result with an
optional accumulation of that double-width result into the
destination.
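
A minimal per-lane sketch of that pattern in plain C, for the 32-bit
input case. These are hypothetical model functions written for this
example, not the gen_helper_neon_abdl_* helpers themselves: the
absolute difference is formed at double width, and VABAL then adds it
into the existing 64-bit destination lane.

```c
#include <stdint.h>

/* VABDL.U32 lane: |n - m| computed at 64 bits. */
uint64_t abdl_u32_model(uint32_t n, uint32_t m)
{
    return n > m ? (uint64_t)n - m : (uint64_t)m - n;
}

/* VABDL.S32 lane: widen with sign extension, then take |n - m|. */
uint64_t abdl_s32_model(int32_t n, int32_t m)
{
    int64_t diff = (int64_t)n - (int64_t)m;
    return (uint64_t)(diff < 0 ? -diff : diff);
}

/* VABAL.U32 lane: accumulate the double-width result into acc. */
uint64_t abal_u32_model(uint64_t acc, uint32_t n, uint32_t m)
{
    return acc + abdl_u32_model(n, m);
}
```

The difference of two 32-bit values always fits in 64 bits, so the
widening step cannot overflow; only the accumulation may wrap.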

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate.h          |   1 +
 target/arm/neon-dp.decode       |   6 ++
 target/arm/translate-neon.inc.c | 132 ++++++++++++++++++++++++++++++++
 target/arm/translate.c          |  31 +-------
 4 files changed, 142 insertions(+), 28 deletions(-)

diff --git a/target/arm/translate.h b/target/arm/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ typedef void NeonGenTwo64OpEnvFn(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i64);
 typedef void NeonGenNarrowFn(TCGv_i32, TCGv_i64);
 typedef void NeonGenNarrowEnvFn(TCGv_i32, TCGv_ptr, TCGv_i64);
 typedef void NeonGenWidenFn(TCGv_i64, TCGv_i32);
+typedef void NeonGenTwoOpWidenFn(TCGv_i64, TCGv_i32, TCGv_i32);
 typedef void NeonGenTwoSingleOPFn(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_ptr);
 typedef void NeonGenTwoDoubleOPFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_ptr);
 typedef void NeonGenOneOpFn(TCGv_i64, TCGv_i64);
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
     VADDHN_3d    1111 001 0 1 . .. .... .... 0100 . 0 . 0 .... @3diff
     VRADDHN_3d   1111 001 1 1 . .. .... .... 0100 . 0 . 0 .... @3diff
 
+    VABAL_S_3d   1111 001 0 1 . .. .... .... 0101 . 0 . 0 .... @3diff
+    VABAL_U_3d   1111 001 1 1 . .. .... .... 0101 . 0 . 0 .... @3diff
+
     VSUBHN_3d    1111 001 0 1 . .. .... .... 0110 . 0 . 0 .... @3diff
     VRSUBHN_3d   1111 001 1 1 . .. .... .... 0110 . 0 . 0 .... @3diff
+
+    VABDL_S_3d   1111 001 0 1 . .. .... .... 0111 . 0 . 0 .... @3diff
+    VABDL_U_3d   1111 001 1 1 . .. .... .... 0111 . 0 . 0 .... @3diff
   ]
 }
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ DO_NARROW_3D(VADDHN, add, narrow, tcg_gen_extrh_i64_i32)
 DO_NARROW_3D(VSUBHN, sub, narrow, tcg_gen_extrh_i64_i32)
 DO_NARROW_3D(VRADDHN, add, narrow_round, gen_narrow_round_high_u32)
 DO_NARROW_3D(VRSUBHN, sub, narrow_round, gen_narrow_round_high_u32)
+
+static bool do_long_3d(DisasContext *s, arg_3diff *a,
+                       NeonGenTwoOpWidenFn *opfn,
+                       NeonGenTwo64OpFn *accfn)
+{
+    /*
+     * 3-regs different lengths, long operations.
+     * These perform an operation on two inputs that returns a double-width
+     * result, and then possibly perform an accumulation operation of
+     * that result into the double-width destination.
+     */
+    TCGv_i64 rd0, rd1, tmp;
+    TCGv_i32 rn, rm;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!opfn) {
+        /* size == 3 case, which is an entirely different insn group */
+        return false;
+    }
+
+    if (a->vd & 1) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    rd0 = tcg_temp_new_i64();
+    rd1 = tcg_temp_new_i64();
+
+    rn = neon_load_reg(a->vn, 0);
+    rm = neon_load_reg(a->vm, 0);
+    opfn(rd0, rn, rm);
+    tcg_temp_free_i32(rn);
+    tcg_temp_free_i32(rm);
+
+    rn = neon_load_reg(a->vn, 1);
+    rm = neon_load_reg(a->vm, 1);
+    opfn(rd1, rn, rm);
+    tcg_temp_free_i32(rn);
+    tcg_temp_free_i32(rm);
+
+    /* Don't store results until after all loads: they might overlap */
+    if (accfn) {
+        tmp = tcg_temp_new_i64();
+        neon_load_reg64(tmp, a->vd);
+        accfn(tmp, tmp, rd0);
+        neon_store_reg64(tmp, a->vd);
+        neon_load_reg64(tmp, a->vd + 1);
+        accfn(tmp, tmp, rd1);
+        neon_store_reg64(tmp, a->vd + 1);
+        tcg_temp_free_i64(tmp);
+    } else {
+        neon_store_reg64(rd0, a->vd);
+        neon_store_reg64(rd1, a->vd + 1);
+    }
+
+    tcg_temp_free_i64(rd0);
+    tcg_temp_free_i64(rd1);
+
+    return true;
+}
+
+static bool trans_VABDL_S_3d(DisasContext *s, arg_3diff *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        gen_helper_neon_abdl_s16,
+        gen_helper_neon_abdl_s32,
+        gen_helper_neon_abdl_s64,
+        NULL,
+    };
+
+    return do_long_3d(s, a, opfn[a->size], NULL);
+}
+
+static bool trans_VABDL_U_3d(DisasContext *s, arg_3diff *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        gen_helper_neon_abdl_u16,
+        gen_helper_neon_abdl_u32,
+        gen_helper_neon_abdl_u64,
+        NULL,
+    };
+
+    return do_long_3d(s, a, opfn[a->size], NULL);
+}
+
+static bool trans_VABAL_S_3d(DisasContext *s, arg_3diff *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        gen_helper_neon_abdl_s16,
+        gen_helper_neon_abdl_s32,
+        gen_helper_neon_abdl_s64,
+        NULL,
+    };
+    static NeonGenTwo64OpFn * const addfn[] = {
+        gen_helper_neon_addl_u16,
+        gen_helper_neon_addl_u32,
+        tcg_gen_add_i64,
+        NULL,
+    };
+
+    return do_long_3d(s, a, opfn[a->size], addfn[a->size]);
+}
+
+static bool trans_VABAL_U_3d(DisasContext *s, arg_3diff *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        gen_helper_neon_abdl_u16,
+        gen_helper_neon_abdl_u32,
+        gen_helper_neon_abdl_u64,
+        NULL,
+    };
+    static NeonGenTwo64OpFn * const addfn[] = {
+        gen_helper_neon_addl_u16,
+        gen_helper_neon_addl_u32,
+        tcg_gen_add_i64,
+        NULL,
+    };
+
+    return do_long_3d(s, a, opfn[a->size], addfn[a->size]);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     {0, 0, 0, 7}, /* VSUBL: handled by decodetree */
                     {0, 0, 0, 7}, /* VSUBW: handled by decodetree */
                     {0, 0, 0, 7}, /* VADDHN: handled by decodetree */
-                    {0, 0, 0, 0}, /* VABAL */
+                    {0, 0, 0, 7}, /* VABAL: handled by decodetree */
                     {0, 0, 0, 7}, /* VSUBHN: handled by decodetree */
-                    {0, 0, 0, 0}, /* VABDL */
+                    {0, 0, 0, 7}, /* VABDL: handled by decodetree */
                     {0, 0, 0, 0}, /* VMLAL */
                     {0, 0, 0, 9}, /* VQDMLAL */
                     {0, 0, 0, 0}, /* VMLSL */
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                         tmp2 = neon_load_reg(rm, pass);
                     }
                     switch (op) {
-                    case 5: case 7: /* VABAL, VABDL */
-                        switch ((size << 1) | u) {
-                        case 0:
-                            gen_helper_neon_abdl_s16(cpu_V0, tmp, tmp2);
-                            break;
-                        case 1:
-                            gen_helper_neon_abdl_u16(cpu_V0, tmp, tmp2);
-                            break;
-                        case 2:
-                            gen_helper_neon_abdl_s32(cpu_V0, tmp, tmp2);
-                            break;
-                        case 3:
-                            gen_helper_neon_abdl_u32(cpu_V0, tmp, tmp2);
-                            break;
-                        case 4:
-                            gen_helper_neon_abdl_s64(cpu_V0, tmp, tmp2);
-                            break;
-                        case 5:
-                            gen_helper_neon_abdl_u64(cpu_V0, tmp, tmp2);
-                            break;
-                        default: abort();
-                        }
-                        tcg_temp_free_i32(tmp2);
-                        tcg_temp_free_i32(tmp);
-                        break;
                     case 8: case 9: case 10: case 11: case 12: case 13:
                         /* VMLAL, VQDMLAL, VMLSL, VQDMLSL, VMULL, VQDMULL */
                         gen_neon_mull(cpu_V0, tmp, tmp2, size, u);
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                         case 10: /* VMLSL */
                             gen_neon_negl(cpu_V0, size);
                             /* Fall through */
-                        case 5: case 8: /* VABAL, VMLAL */
+                        case 8: /* VMLAL */
                             gen_neon_addl(size);
                             break;
                         case 9: case 11: /* VQDMLAL, VQDMLSL */
-- 
2.20.1

Convert the Neon 3-reg-diff insns VMULL, VMLAL and VMLSL; these perform
a 32x32->64 multiply with possible accumulate.

Note that for VMLSL we do the accumulate directly with a subtraction
rather than doing a negate-then-add as the old code did.
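
As a rough illustration of why the two accumulate shapes agree, here is
a minimal C sketch of a single 32x32->64 lane. The function names are
made up for this note and are not QEMU helpers; the only assumption is
that the 64-bit lane wraps modulo 2^64.

```c
#include <stdint.h>

/* Widening unsigned multiply: one 32x32->64 lane (illustrative name). */
static inline uint64_t mull_u32(uint32_t n, uint32_t m)
{
    return (uint64_t)n * m;
}

/* Widening signed multiply, returned as the raw 64-bit lane value. */
static inline uint64_t mull_s32(int32_t n, int32_t m)
{
    return (uint64_t)((int64_t)n * m);
}

/* VMLAL-style accumulate: acc + n * m, wrapping modulo 2^64. */
static inline uint64_t mlal_u32(uint64_t acc, uint32_t n, uint32_t m)
{
    return acc + mull_u32(n, m);
}

/* VMLSL-style accumulate done directly with a subtraction. */
static inline uint64_t mlsl_sub(uint64_t acc, uint32_t n, uint32_t m)
{
    return acc - mull_u32(n, m);
}

/* The older shape: negate the product first, then add it. */
static inline uint64_t mlsl_negate_add(uint64_t acc, uint32_t n, uint32_t m)
{
    uint64_t prod = mull_u32(n, m);

    return acc + (0 - prod);
}
```

Because unsigned 64-bit arithmetic wraps, mlsl_sub() and
mlsl_negate_add() return the same value for every input; the direct
subtraction simply needs one operation fewer.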

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  9 +++++
 target/arm/translate-neon.inc.c | 71 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 21 +++-------
 3 files changed, 86 insertions(+), 15 deletions(-)

Convert the Neon 3-reg-diff insns VQDMULL, VQDMLAL and VQDMLSL:
these are all saturating doubling long multiplies with a possible
accumulate step.

These are the last insns in the group that use the loop which passes
over each element, so we can delete that code.
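
For readers unfamiliar with these insns, the following is a minimal C
sketch of one 16x16->32 lane of a saturating doubling long multiply and
its accumulate forms. It is a model written for this note, not the QEMU
helper code: the function names are invented, and the 'qc' pointer is a
hypothetical stand-in for the sticky saturation flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Saturating doubling multiply: sat(2 * n * m), 16x16->32. */
static inline int32_t sat_dmull_s16(int16_t n, int16_t m, bool *qc)
{
    int64_t doubled = 2 * (int64_t)n * m;

    /* Only INT16_MIN * INT16_MIN doubles to a value above INT32_MAX. */
    if (doubled > INT32_MAX) {
        *qc = true;
        return INT32_MAX;
    }
    return (int32_t)doubled;
}

/* Saturate a wide intermediate result to 32 bits. */
static inline int32_t sat_s32(int64_t v, bool *qc)
{
    if (v > INT32_MAX) {
        *qc = true;
        return INT32_MAX;
    }
    if (v < INT32_MIN) {
        *qc = true;
        return INT32_MIN;
    }
    return (int32_t)v;
}

/* VQDMLAL-style lane: sat(acc + sat(2 * n * m)). */
static inline int32_t sat_dmlal_s16(int32_t acc, int16_t n, int16_t m,
                                    bool *qc)
{
    return sat_s32((int64_t)acc + sat_dmull_s16(n, m, qc), qc);
}

/* VQDMLSL-style lane: sat(acc - sat(2 * n * m)). */
static inline int32_t sat_dmlsl_s16(int32_t acc, int16_t n, int16_t m,
                                    bool *qc)
{
    return sat_s32((int64_t)acc - sat_dmull_s16(n, m, qc), qc);
}
```

Note that there are two separate saturation points: one on the doubled
product and one on the accumulate.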

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  6 +++
 target/arm/translate-neon.inc.c | 82 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 59 ++----------------------
 3 files changed, 92 insertions(+), 55 deletions(-)

Convert the Neon 3-reg-diff insn polynomial VMULL. This is the last
insn in this group to be converted.
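
As a reminder of what "polynomial" means here, this is a small C sketch
of an 8x8->16 carry-less multiply for a single lane. The function name
is invented for this note and the code is not taken from the QEMU
helpers; it only illustrates the arithmetic over GF(2), where partial
products are combined with XOR instead of addition.

```c
#include <stdint.h>

/* Carry-less (polynomial) multiply of two 8-bit values, 16-bit result. */
static inline uint16_t pmull_8(uint8_t n, uint8_t m)
{
    uint16_t result = 0;

    for (int bit = 0; bit < 8; bit++) {
        if (m & (1u << bit)) {
            /* XOR in the shifted multiplicand: no carries propagate. */
            result ^= (uint16_t)((uint16_t)n << bit);
        }
    }
    return result;
}
```

For example, pmull_8(3, 3) is 5 rather than 9, because the two middle
partial-product bits cancel under XOR.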

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  2 ++
 target/arm/translate-neon.inc.c | 43 +++++++++++++++++++++++
 target/arm/translate.c          | 60 ++-------------------------------
 3 files changed, 48 insertions(+), 57 deletions(-)

Mark the arrays of function pointers in trans_VSHLL_S_2sh() and
trans_VSHLL_U_2sh() as both 'static' and 'const'.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-neon.inc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
 
 static bool trans_VSHLL_S_2sh(DisasContext *s, arg_2reg_shift *a)
 {
-    NeonGenWidenFn *widenfn[] = {
+    static NeonGenWidenFn * const widenfn[] = {
         gen_helper_neon_widen_s8,
         gen_helper_neon_widen_s16,
         tcg_gen_ext_i32_i64,
@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL_S_2sh(DisasContext *s, arg_2reg_shift *a)
 
 static bool trans_VSHLL_U_2sh(DisasContext *s, arg_2reg_shift *a)
 {
-    NeonGenWidenFn *widenfn[] = {
+    static NeonGenWidenFn * const widenfn[] = {
         gen_helper_neon_widen_u8,
         gen_helper_neon_widen_u16,
         tcg_gen_extu_i32_i64,
-- 
2.20.1

Convert the VMLA, VMLS and VMUL insns in the Neon "2 registers and a
scalar" group to decodetree.  These are 32x32->32 operations where
one of the inputs is the scalar, followed by a possible accumulate
operation of the 32-bit result.

The refactoring removes some of the oddities of the old decoder:
 * operands to the operation and accumulation were often
   reversed (taking advantage of the fact that most of these ops
   are commutative); the new code follows the pseudocode order
 * the Q bit in the insn was in a local variable 'u'; in the
   new code it is decoded into a->q
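
To make the operand-order point concrete, here is a minimal C model of
the "operate with the scalar, then possibly accumulate" shape. All names
are hypothetical and chosen for this note; none of them are functions
from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Accumulate step: combines the old destination with the product. */
typedef uint32_t (*acc_fn)(uint32_t d, uint32_t prod);

static inline uint32_t acc_add(uint32_t d, uint32_t prod)
{
    return d + prod;
}

static inline uint32_t acc_sub(uint32_t d, uint32_t prod)
{
    /* Pseudocode order: destination minus product, not the reverse. */
    return d - prod;
}

/*
 * Model of a 2-reg-scalar op on 32-bit elements: multiply each element
 * of vn by the scalar, then either store the product (acc == NULL) or
 * accumulate it into the existing destination element.
 */
static inline void two_reg_scalar(uint32_t *vd, const uint32_t *vn,
                                  uint32_t scalar, size_t nelem,
                                  acc_fn acc)
{
    for (size_t i = 0; i < nelem; i++) {
        uint32_t prod = vn[i] * scalar; /* 32x32->32, wraps */

        vd[i] = acc ? acc(vd[i], prod) : prod;
    }
}
```

With acc_add the order of the accumulate operands does not matter, which
is why a decoder can get away with swapping them; with acc_sub it does,
so following the pseudocode order everywhere is the less surprising
choice.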

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  15 ++++
 target/arm/translate-neon.inc.c | 133 ++++++++++++++++++++++++++++++++
 target/arm/translate.c          |  77 ++----------------
 3 files changed, 154 insertions(+), 71 deletions(-)

Convert the float versions of VMLA, VMLS and VMUL in the Neon
2-reg-scalar group to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
As noted in the comment on the WRAP_FP_FN macro, we could have
had a do_2scalar_fp() function, but for 3 insns it seemed
simpler to just do the wrapping to get hold of the fpstatus ptr.
(These are the only fp insns in the group.)
---
 target/arm/neon-dp.decode       |  3 ++
 target/arm/translate-neon.inc.c | 65 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 37 ++-----------------
 3 files changed, 71 insertions(+), 34 deletions(-)

Convert the VQDMULH and VQRDMULH insns in the 2-reg-scalar group
to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  3 +++
 target/arm/translate-neon.inc.c | 29 +++++++++++++++++++++++
 target/arm/translate.c          | 42 ++-------------------------------
 3 files changed, 34 insertions(+), 40 deletions(-)

Convert the VQRDMLAH and VQRDMLSH insns in the 2-reg-scalar
group to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  3 ++
 target/arm/translate-neon.inc.c | 74 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 38 +----------------
 3 files changed, 79 insertions(+), 36 deletions(-)

Convert the Neon 2-reg-scalar long multiplies to decodetree.
These are the last instructions in the group.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  18 ++++
 target/arm/translate-neon.inc.c | 163 ++++++++++++++++++++++++++++
 target/arm/translate.c          | 182 ++------------------------------
 3 files changed, 187 insertions(+), 176 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
 
     @2scalar     .... ... q:1 . . size:2 .... .... .... . . . . .... \
                  &2scalar vm=%vm_dp vn=%vn_dp vd=%vd_dp
+    # For the 'long' ops the Q bit is part of insn decode
+    @2scalar_q0  .... ... . . . size:2 .... .... .... . . . . .... \
+                 &2scalar vm=%vm_dp vn=%vn_dp vd=%vd_dp q=0
 
     VMLA_2sc     1111 001 . 1 . .. .... .... 0000 . 1 . 0 .... @2scalar
     VMLA_F_2sc   1111 001 . 1 . .. .... .... 0001 . 1 . 0 .... @2scalar
 
+    VMLAL_S_2sc  1111 001 0 1 . .. .... .... 0010 . 1 . 0 .... @2scalar_q0
+    VMLAL_U_2sc  1111 001 1 1 . .. .... .... 0010 . 1 . 0 .... @2scalar_q0
+
+    VQDMLAL_2sc  1111 001 0 1 . .. .... .... 0011 . 1 . 0 .... @2scalar_q0
+
     VMLS_2sc     1111 001 . 1 . .. .... .... 0100 . 1 . 0 .... @2scalar
     VMLS_F_2sc   1111 001 . 1 . .. .... .... 0101 . 1 . 0 .... @2scalar
 
+    VMLSL_S_2sc  1111 001 0 1 . .. .... .... 0110 . 1 . 0 .... @2scalar_q0
+    VMLSL_U_2sc  1111 001 1 1 . .. .... .... 0110 . 1 . 0 .... @2scalar_q0
+
+    VQDMLSL_2sc  1111 001 0 1 . .. .... .... 0111 . 1 . 0 .... @2scalar_q0
+
     VMUL_2sc     1111 001 . 1 . .. .... .... 1000 . 1 . 0 .... @2scalar
     VMUL_F_2sc   1111 001 . 1 . .. .... .... 1001 . 1 . 0 .... @2scalar
 
+    VMULL_S_2sc  1111 001 0 1 . .. .... .... 1010 . 1 . 0 .... @2scalar_q0
+    VMULL_U_2sc  1111 001 1 1 . .. .... .... 1010 . 1 . 0 .... @2scalar_q0
+
+    VQDMULL_2sc  1111 001 0 1 . .. .... .... 1011 . 1 . 0 .... @2scalar_q0
+
     VQDMULH_2sc  1111 001 . 1 . .. .... .... 1100 . 1 . 0 .... @2scalar
     VQRDMULH_2sc 1111 001 . 1 . .. .... .... 1101 . 1 . 0 .... @2scalar
 
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VQRDMLSH_2sc(DisasContext *s, arg_2scalar *a)
     };
     return do_vqrdmlah_2sc(s, a, opfn[a->size]);
 }
+
+static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
+                            NeonGenTwoOpWidenFn *opfn,
+                            NeonGenTwo64OpFn *accfn)
+{
+    /*
+     * Two registers and a scalar, long operations: perform an
+     * operation on the input elements and the scalar which produces
+     * a double-width result, and then possibly perform an accumulation
+     * operation of that result into the destination.
+     */
+    TCGv_i32 scalar, rn;
+    TCGv_i64 rn0_64, rn1_64;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!opfn) {
+        /* Bad size (including size == 3, which is a different insn group) */
+        return false;
+    }
+
+    if (a->vd & 1) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    scalar = neon_get_scalar(a->size, a->vm);
+
+    /* Load all inputs before writing any outputs, in case of overlap */
+    rn = neon_load_reg(a->vn, 0);
+    rn0_64 = tcg_temp_new_i64();
+    opfn(rn0_64, rn, scalar);
+    tcg_temp_free_i32(rn);
+
+    rn = neon_load_reg(a->vn, 1);
+    rn1_64 = tcg_temp_new_i64();
+    opfn(rn1_64, rn, scalar);
+    tcg_temp_free_i32(rn);
+    tcg_temp_free_i32(scalar);
+
+    if (accfn) {
+        TCGv_i64 t64 = tcg_temp_new_i64();
+        neon_load_reg64(t64, a->vd);
+        accfn(t64, t64, rn0_64);
+        neon_store_reg64(t64, a->vd);
+        neon_load_reg64(t64, a->vd + 1);
+        accfn(t64, t64, rn1_64);
+        neon_store_reg64(t64, a->vd + 1);
+        tcg_temp_free_i64(t64);
+    } else {
+        neon_store_reg64(rn0_64, a->vd);
+        neon_store_reg64(rn1_64, a->vd + 1);
+    }
+    tcg_temp_free_i64(rn0_64);
+    tcg_temp_free_i64(rn1_64);
+    return true;
+}
+
+static bool trans_VMULL_S_2sc(DisasContext *s, arg_2scalar *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        NULL,
+        gen_helper_neon_mull_s16,
+        gen_mull_s32,
+        NULL,
+    };
+
+    return do_2scalar_long(s, a, opfn[a->size], NULL);
+}
+
+static bool trans_VMULL_U_2sc(DisasContext *s, arg_2scalar *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        NULL,
+        gen_helper_neon_mull_u16,
+        gen_mull_u32,
+        NULL,
+    };
+
+    return do_2scalar_long(s, a, opfn[a->size], NULL);
+}
+
+#define DO_VMLAL_2SC(INSN, MULL, ACC)                                   \
+    static bool trans_##INSN##_2sc(DisasContext *s, arg_2scalar *a)     \
+    {                                                                   \
+        static NeonGenTwoOpWidenFn * const opfn[] = {                   \
+            NULL,                                                       \
+            gen_helper_neon_##MULL##16,                                 \
+            gen_##MULL##32,                                             \
+            NULL,                                                       \
+        };                                                              \
+        static NeonGenTwo64OpFn * const accfn[] = {                     \
+            NULL,                                                       \
+            gen_helper_neon_##ACC##l_u32,                               \
+            tcg_gen_##ACC##_i64,                                        \
+            NULL,                                                       \
+        };                                                              \
+        return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]);    \
+    }
+
+DO_VMLAL_2SC(VMLAL_S, mull_s, add)
+DO_VMLAL_2SC(VMLAL_U, mull_u, add)
+DO_VMLAL_2SC(VMLSL_S, mull_s, sub)
+DO_VMLAL_2SC(VMLSL_U, mull_u, sub)
+
+static bool trans_VQDMULL_2sc(DisasContext *s, arg_2scalar *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        NULL,
+        gen_VQDMULL_16,
+        gen_VQDMULL_32,
+        NULL,
+    };
+
+    return do_2scalar_long(s, a, opfn[a->size], NULL);
+}
+
+static bool trans_VQDMLAL_2sc(DisasContext *s, arg_2scalar *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        NULL,
+        gen_VQDMULL_16,
+        gen_VQDMULL_32,
+        NULL,
+    };
+    static NeonGenTwo64OpFn * const accfn[] = {
+        NULL,
+        gen_VQDMLAL_acc_16,
+        gen_VQDMLAL_acc_32,
+        NULL,
+    };
+
+    return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]);
+}
+
+static bool trans_VQDMLSL_2sc(DisasContext *s, arg_2scalar *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        NULL,
+        gen_VQDMULL_16,
+        gen_VQDMULL_32,
+        NULL,
+    };
+    static NeonGenTwo64OpFn * const accfn[] = {
+        NULL,
+        gen_VQDMLSL_acc_16,
+        gen_VQDMLSL_acc_32,
+        NULL,
+    };
+
+    return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void gen_revsh(TCGv_i32 dest, TCGv_i32 var)
     tcg_gen_ext16s_i32(dest, var);
 }
 
-/* 32x32->64 multiply.  Marks inputs as dead.  */
-static TCGv_i64 gen_mulu_i64_i32(TCGv_i32 a, TCGv_i32 b)
-{
-    TCGv_i32 lo = tcg_temp_new_i32();
-    TCGv_i32 hi = tcg_temp_new_i32();
-    TCGv_i64 ret;
-
-    tcg_gen_mulu2_i32(lo, hi, a, b);
-    tcg_temp_free_i32(a);
-    tcg_temp_free_i32(b);
-
-    ret = tcg_temp_new_i64();
-    tcg_gen_concat_i32_i64(ret, lo, hi);
-    tcg_temp_free_i32(lo);
-    tcg_temp_free_i32(hi);
-
-    return ret;
-}
-
-static TCGv_i64 gen_muls_i64_i32(TCGv_i32 a, TCGv_i32 b)
-{
-    TCGv_i32 lo = tcg_temp_new_i32();
-    TCGv_i32 hi = tcg_temp_new_i32();
-    TCGv_i64 ret;
-
-    tcg_gen_muls2_i32(lo, hi, a, b);
-    tcg_temp_free_i32(a);
-    tcg_temp_free_i32(b);
-
-    ret = tcg_temp_new_i64();
-    tcg_gen_concat_i32_i64(ret, lo, hi);
-    tcg_temp_free_i32(lo);
-    tcg_temp_free_i32(hi);
-
-    return ret;
-}
-
 /* Swap low and high halfwords.  */
 static void gen_swap_half(TCGv_i32 var)
 {
@@ -XXX,XX +XXX,XX @@ static inline void gen_neon_addl(int size)
     }
 }
 
-static inline void gen_neon_negl(TCGv_i64 var, int size)
-{
-    switch (size) {
-    case 0: gen_helper_neon_negl_u16(var, var); break;
-    case 1: gen_helper_neon_negl_u32(var, var); break;
-    case 2:
-        tcg_gen_neg_i64(var, var);
-        break;
-    default: abort();
-    }
-}
-
-static inline void gen_neon_addl_saturate(TCGv_i64 op0, TCGv_i64 op1, int size)
-{
-    switch (size) {
-    case 1: gen_helper_neon_addl_saturate_s32(op0, cpu_env, op0, op1); break;
-    case 2: gen_helper_neon_addl_saturate_s64(op0, cpu_env, op0, op1); break;
-    default: abort();
-    }
-}
-
-static inline void gen_neon_mull(TCGv_i64 dest, TCGv_i32 a, TCGv_i32 b,
-                                 int size, int u)
-{
-    TCGv_i64 tmp;
-
-    switch ((size << 1) | u) {
-    case 0: gen_helper_neon_mull_s8(dest, a, b); break;
-    case 1: gen_helper_neon_mull_u8(dest, a, b); break;
-    case 2: gen_helper_neon_mull_s16(dest, a, b); break;
-    case 3: gen_helper_neon_mull_u16(dest, a, b); break;
-    case 4:
-        tmp = gen_muls_i64_i32(a, b);
-        tcg_gen_mov_i64(dest, tmp);
-        tcg_temp_free_i64(tmp);
-        break;
-    case 5:
-        tmp = gen_mulu_i64_i32(a, b);
-        tcg_gen_mov_i64(dest, tmp);
-        tcg_temp_free_i64(tmp);
-        break;
-    default: abort();
-    }
-
-    /* gen_helper_neon_mull_[su]{8|16} do not free their parameters.
-       Don't forget to clean them now.  */
-    if (size < 2) {
-        tcg_temp_free_i32(a);
-        tcg_temp_free_i32(b);
-    }
-}
-
 static void gen_neon_narrow_op(int op, int u, int size,
                                TCGv_i32 dest, TCGv_i64 src)
 {
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
     int u;
     int vec_size;
     uint32_t imm;
-    TCGv_i32 tmp, tmp2, tmp3, tmp4, tmp5;
+    TCGv_i32 tmp, tmp2, tmp3, tmp5;
     TCGv_ptr ptr1;
     TCGv_i64 tmp64;
 
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         return 1;
     } else { /* (insn & 0x00800010 == 0x00800000) */
         if (size != 3) {
-            op = (insn >> 8) & 0xf;
-            if ((insn & (1 << 6)) == 0) {
-                /* Three registers of different lengths: handled by decodetree */
-                return 1;
-            } else {
-                /* Two registers and a scalar. NB that for ops of this form
-                 * the ARM ARM labels bit 24 as Q, but it is in our variable
-                 * 'u', not 'q'.
-                 */
-                if (size == 0) {
-                    return 1;
-                }
-                switch (op) {
-                case 0: /* Integer VMLA scalar */
-                case 4: /* Integer VMLS scalar */
-                case 8: /* Integer VMUL scalar */
-                case 1: /* Float VMLA scalar */
-                case 5: /* Floating point VMLS scalar */
-                case 9: /* Floating point VMUL scalar */
-                case 12: /* VQDMULH scalar */
-                case 13: /* VQRDMULH scalar */
-                case 14: /* VQRDMLAH scalar */
-                case 15: /* VQRDMLSH scalar */
-                    return 1; /* handled by decodetree */
-
-                case 3: /* VQDMLAL scalar */
-                case 7: /* VQDMLSL scalar */
-                case 11: /* VQDMULL scalar */
-                    if (u == 1) {
-                        return 1;
-                    }
-                    /* fall through */
-                case 2: /* VMLAL sclar */
-                case 6: /* VMLSL scalar */
-                case 10: /* VMULL scalar */
-                    if (rd & 1) {
-                        return 1;
-                    }
-                    tmp2 = neon_get_scalar(size, rm);
-                    /* We need a copy of tmp2 because gen_neon_mull
-                     * deletes it during pass 0.  */
-                    tmp4 = tcg_temp_new_i32();
-                    tcg_gen_mov_i32(tmp4, tmp2);
-                    tmp3 = neon_load_reg(rn, 1);
-
-                    for (pass = 0; pass < 2; pass++) {
-                        if (pass == 0) {
-                            tmp = neon_load_reg(rn, 0);
-                        } else {
-                            tmp = tmp3;
-                            tmp2 = tmp4;
-                        }
-                        gen_neon_mull(cpu_V0, tmp, tmp2, size, u);
-                        if (op != 11) {
-                            neon_load_reg64(cpu_V1, rd + pass);
-                        }
-                        switch (op) {
-                        case 6:
-                            gen_neon_negl(cpu_V0, size);
-                            /* Fall through */
-                        case 2:
-                            gen_neon_addl(size);
-                            break;
-                        case 3: case 7:
-                            gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
-                            if (op == 7) {
-                                gen_neon_negl(cpu_V0, size);
-                            }
-                            gen_neon_addl_saturate(cpu_V0, cpu_V1, size);
-                            break;
-                        case 10:
-                            /* no-op */
-                            break;
-                        case 11:
-                            gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
-                            break;
-                        default:
-                            abort();
-                        }
-                        neon_store_reg64(cpu_V0, rd + pass);
-                    }
-                    break;
-                default:
-                    g_assert_not_reached();
-                }
-            }
+            /*
+             * Three registers of different lengths, or two registers and
+             * a scalar: handled by decodetree
+             */
+            return 1;
         } else { /* size == 3 */
             if (!u) {
                 /* Extract.  */
-- 
2.20.1

Convert the Neon VEXT insn to decodetree. Rather than keeping the
old implementation which used fixed temporaries cpu_V0 and cpu_V1
and did the extraction with by-hand shift and logic ops, we use
the TCG extract2 insn.

We don't need to special case 0 or 8 immediates any more as the
optimizer is smart enough to throw away the dead code.
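
For reference, the extract2 approach can be modelled in plain C roughly
as follows. This is a sketch written for this note, assuming extract2
returns 64 bits taken from the 128-bit concatenation hi:lo starting at
the given bit position; the function names are invented and are not TCG
or QEMU APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a 64-bit extract2: 64 bits of hi:lo starting at bit 'pos'. */
static inline uint64_t extract2_64(uint64_t lo, uint64_t hi, unsigned pos)
{
    assert(pos < 64);
    if (pos == 0) {
        return lo; /* avoid an undefined shift by 64 in C */
    }
    return (lo >> pos) | (hi << (64 - pos));
}

/* D-register form: bytes imm..imm+7 of the 128-bit value Vm:Vn. */
static inline uint64_t vext_d(uint64_t vn, uint64_t vm, unsigned imm)
{
    assert(imm < 8);
    return extract2_64(vn, vm, imm * 8);
}

/* Q-register form: 128 bits out of the 256-bit Vm+1:Vm:Vn+1:Vn. */
static inline void vext_q(uint64_t out[2], const uint64_t vn[2],
                          const uint64_t vm[2], unsigned imm)
{
    uint64_t lo, hi;

    assert(imm < 16);
    if (imm < 8) {
        lo = extract2_64(vn[0], vn[1], imm * 8);
        hi = extract2_64(vn[1], vm[0], imm * 8);
    } else {
        lo = extract2_64(vn[1], vm[0], (imm - 8) * 8);
        hi = extract2_64(vm[0], vm[1], (imm - 8) * 8);
    }
    /* Store only after all inputs are read: out may alias vn or vm. */
    out[0] = lo;
    out[1] = hi;
}
```

In this model an immediate of 0 or 8 reduces to a plain copy of one of
the inputs, which is the case the old code handled by hand.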

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  8 +++-
 target/arm/translate-neon.inc.c | 76 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 58 +------------------------
 3 files changed, 85 insertions(+), 57 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
 # return false for size==3.
 ######################################################################
 {
-  # 0b11 subgroup will go here
+  [
+    ##################################################################
+    # Miscellaneous size=0b11 insns
+    ##################################################################
+    VEXT         1111 001 0 1 . 11 .... .... imm:4 . q:1 . 0 .... \
+                 vm=%vm_dp vn=%vn_dp vd=%vd_dp
+  ]
 
   # Subgroup for size != 0b11
   [
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VQDMLSL_2sc(DisasContext *s, arg_2scalar *a)
 
     return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]);
 }
+
+static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
+{
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if ((a->vn | a->vm | a->vd) & a->q) {
+        return false;
+    }
+
+    if (a->imm > 7 && !a->q) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    if (!a->q) {
+        /* Extract 64 bits from <Vm:Vn> */
+        TCGv_i64 left, right, dest;
+
+        left = tcg_temp_new_i64();
+        right = tcg_temp_new_i64();
+        dest = tcg_temp_new_i64();
+
+        neon_load_reg64(right, a->vn);
+        neon_load_reg64(left, a->vm);
+        tcg_gen_extract2_i64(dest, right, left, a->imm * 8);
+        neon_store_reg64(dest, a->vd);
+
+        tcg_temp_free_i64(left);
+        tcg_temp_free_i64(right);
+        tcg_temp_free_i64(dest);
+    } else {
+        /* Extract 128 bits from <Vm+1:Vm:Vn+1:Vn> */
+        TCGv_i64 left, middle, right, destleft, destright;
+
+        left = tcg_temp_new_i64();
+        middle = tcg_temp_new_i64();
+        right = tcg_temp_new_i64();
+        destleft = tcg_temp_new_i64();
+        destright = tcg_temp_new_i64();
+
+        if (a->imm < 8) {
+            neon_load_reg64(right, a->vn);
+            neon_load_reg64(middle, a->vn + 1);
+            tcg_gen_extract2_i64(destright, right, middle, a->imm * 8);
+            neon_load_reg64(left, a->vm);
+            tcg_gen_extract2_i64(destleft, middle, left, a->imm * 8);
+        } else {
+            neon_load_reg64(right, a->vn + 1);
+            neon_load_reg64(middle, a->vm);
+            tcg_gen_extract2_i64(destright, right, middle, (a->imm - 8) * 8);
+            neon_load_reg64(left, a->vm + 1);
+            tcg_gen_extract2_i64(destleft, middle, left, (a->imm - 8) * 8);
+        }
+
+        neon_store_reg64(destright, a->vd);
+        neon_store_reg64(destleft, a->vd + 1);
+
+        tcg_temp_free_i64(destright);
+        tcg_temp_free_i64(destleft);
+        tcg_temp_free_i64(right);
+        tcg_temp_free_i64(middle);
+        tcg_temp_free_i64(left);
+    }
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
     int pass;
     int u;
     int vec_size;
-    uint32_t imm;
     TCGv_i32 tmp, tmp2, tmp3, tmp5;
     TCGv_ptr ptr1;
-    TCGv_i64 tmp64;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return 1;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             return 1;
         } else { /* size == 3 */
             if (!u) {
-                /* Extract.  */
-                imm = (insn >> 8) & 0xf;
-
-                if (imm > 7 && !q)
-                    return 1;
-
-                if (q && ((rd | rn | rm) & 1)) {
-                    return 1;
-                }
-
-                if (imm == 0) {
-                    neon_load_reg64(cpu_V0, rn);
-                    if (q) {
-                        neon_load_reg64(cpu_V1, rn + 1);
-                    }
-                } else if (imm == 8) {
-                    neon_load_reg64(cpu_V0, rn + 1);
-                    if (q) {
-                        neon_load_reg64(cpu_V1, rm);
-                    }
-                } else if (q) {
-                    tmp64 = tcg_temp_new_i64();
-                    if (imm < 8) {
-                        neon_load_reg64(cpu_V0, rn);
-                        neon_load_reg64(tmp64, rn + 1);
-                    } else {
-                        neon_load_reg64(cpu_V0, rn + 1);
-                        neon_load_reg64(tmp64, rm);
-                    }
-                    tcg_gen_shri_i64(cpu_V0, cpu_V0, (imm & 7) * 8);
-                    tcg_gen_shli_i64(cpu_V1, tmp64, 64 - ((imm & 7) * 8));
-                    tcg_gen_or_i64(cpu_V0, cpu_V0, cpu_V1);
-                    if (imm < 8) {
-                        neon_load_reg64(cpu_V1, rm);
-                    } else {
-                        neon_load_reg64(cpu_V1, rm + 1);
-                        imm -= 8;
-                    }
-                    tcg_gen_shli_i64(cpu_V1, cpu_V1, 64 - (imm * 8));
-                    tcg_gen_shri_i64(tmp64, tmp64, imm * 8);
-                    tcg_gen_or_i64(cpu_V1, cpu_V1, tmp64);
-                    tcg_temp_free_i64(tmp64);
-                } else {
-                    /* BUGFIX */
-                    neon_load_reg64(cpu_V0, rn);
-                    tcg_gen_shri_i64(cpu_V0, cpu_V0, imm * 8);
-                    neon_load_reg64(cpu_V1, rm);
-                    tcg_gen_shli_i64(cpu_V1, cpu_V1, 64 - (imm * 8));
-                    tcg_gen_or_i64(cpu_V0, cpu_V0, cpu_V1);
-                }
-                neon_store_reg64(cpu_V0, rd);
-                if (q) {
-                    neon_store_reg64(cpu_V1, rd + 1);
-                }
+                /* Extract: handled by decodetree */
+                return 1;
             } else if ((insn & (1 << 11)) == 0) {
                 /* Two register misc.  */
                 op = ((insn >> 12) & 0x30) | ((insn >> 7) & 0xf);
-- 
2.20.1

Convert the Neon VTBL, VTBX instructions to decodetree.  The actual
implementation of the insn is copied across to the new trans function
unchanged except for renaming 'tmp5' to 'tmp4'.
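
For readers unfamiliar with these insns, the lookup can be modelled in
plain C as below.  This is only a sketch: neon_tbl_model() is a
hypothetical name, not the QEMU helper, and it handles a whole 8-byte D
register at once, whereas trans_VTBL calls gen_helper_neon_tbl once per
32-bit half.  'table_len' corresponds to (len + 1) * 8 and 'is_vtbx'
to the 'op' bit of the decode pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative model of VTBL/VTBX on one D register (not QEMU code).
 * table:     the consecutive D registers starting at Vn, as bytes
 * table_len: table size in bytes, 8 per table register (8..32)
 * is_vtbx:   nonzero for VTBX, zero for VTBL
 */
static void neon_tbl_model(uint8_t dest[8], const uint8_t indices[8],
                           const uint8_t *table, size_t table_len,
                           int is_vtbx)
{
    assert(table_len >= 8 && table_len <= 32 && table_len % 8 == 0);

    for (size_t i = 0; i < 8; i++) {
        uint8_t idx = indices[i];

        if (idx < table_len) {
            /* In-range index: pick that byte out of the table. */
            dest[i] = table[idx];
        } else if (!is_vtbx) {
            /* VTBL: an out-of-range index yields zero. */
            dest[i] = 0;
        }
        /* VTBX: an out-of-range index leaves the dest byte unchanged. */
    }
}
```

This mirrors why the trans function seeds 'tmp' from Vd for VTBX and
from zero for VTBL.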

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  3 ++
 target/arm/translate-neon.inc.c | 56 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 41 +++---------------------
 3 files changed, 63 insertions(+), 37 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
     ##################################################################
     VEXT         1111 001 0 1 . 11 .... .... imm:4 . q:1 . 0 .... \
                  vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+    VTBL         1111 001 1 1 . 11 .... .... 10 len:2 . op:1 . 0 .... \
+                 vm=%vm_dp vn=%vn_dp vd=%vd_dp
   ]
 
   # Subgroup for size != 0b11
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
     }
     return true;
 }
+
+static bool trans_VTBL(DisasContext *s, arg_VTBL *a)
+{
+    int n;
+    TCGv_i32 tmp, tmp2, tmp3, tmp4;
+    TCGv_ptr ptr1;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    n = a->len + 1;
+    if ((a->vn + n) > 32) {
+        /*
+         * This is UNPREDICTABLE; we choose to UNDEF to avoid the
+         * helper function running off the end of the register file.
+         */
+        return false;
+    }
+    n <<= 3;
+    if (a->op) {
+        tmp = neon_load_reg(a->vd, 0);
+    } else {
+        tmp = tcg_temp_new_i32();
+        tcg_gen_movi_i32(tmp, 0);
+    }
+    tmp2 = neon_load_reg(a->vm, 0);
+    ptr1 = vfp_reg_ptr(true, a->vn);
+    tmp4 = tcg_const_i32(n);
+    gen_helper_neon_tbl(tmp2, tmp2, tmp, ptr1, tmp4);
+    tcg_temp_free_i32(tmp);
+    if (a->op) {
+        tmp = neon_load_reg(a->vd, 1);
+    } else {
+        tmp = tcg_temp_new_i32();
+        tcg_gen_movi_i32(tmp, 0);
+    }
+    tmp3 = neon_load_reg(a->vm, 1);
+    gen_helper_neon_tbl(tmp3, tmp3, tmp, ptr1, tmp4);
+    tcg_temp_free_i32(tmp4);
+    tcg_temp_free_ptr(ptr1);
+    neon_store_reg(a->vd, 0, tmp2);
+    neon_store_reg(a->vd, 1, tmp3);
+    tcg_temp_free_i32(tmp);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
 {
     int op;
     int q;
-    int rd, rn, rm, rd_ofs, rm_ofs;
+    int rd, rm, rd_ofs, rm_ofs;
     int size;
     int pass;
     int u;
     int vec_size;
-    TCGv_i32 tmp, tmp2, tmp3, tmp5;
-    TCGv_ptr ptr1;
+    TCGv_i32 tmp, tmp2, tmp3;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return 1;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
     q = (insn & (1 << 6)) != 0;
     u = (insn >> 24) & 1;
     VFP_DREG_D(rd, insn);
-    VFP_DREG_N(rn, insn);
     VFP_DREG_M(rm, insn);
     size = (insn >> 20) & 3;
     vec_size = q ? 16 : 8;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     break;
                 }
             } else if ((insn & (1 << 10)) == 0) {
-                /* VTBL, VTBX.  */
-                int n = ((insn >> 8) & 3) + 1;
-                if ((rn + n) > 32) {
-                    /* This is UNPREDICTABLE; we choose to UNDEF to avoid the
-                     * helper function running off the end of the register file.
-                     */
-                    return 1;
-                }
-                n <<= 3;
-                if (insn & (1 << 6)) {
-                    tmp = neon_load_reg(rd, 0);
-                } else {
-                    tmp = tcg_temp_new_i32();
-                    tcg_gen_movi_i32(tmp, 0);
-                }
-                tmp2 = neon_load_reg(rm, 0);
-                ptr1 = vfp_reg_ptr(true, rn);
-                tmp5 = tcg_const_i32(n);
-                gen_helper_neon_tbl(tmp2, tmp2, tmp, ptr1, tmp5);
-                tcg_temp_free_i32(tmp);
-                if (insn & (1 << 6)) {
-                    tmp = neon_load_reg(rd, 1);
-                } else {
-                    tmp = tcg_temp_new_i32();
-                    tcg_gen_movi_i32(tmp, 0);
-                }
-                tmp3 = neon_load_reg(rm, 1);
-                gen_helper_neon_tbl(tmp3, tmp3, tmp, ptr1, tmp5);
-                tcg_temp_free_i32(tmp5);
-                tcg_temp_free_ptr(ptr1);
-                neon_store_reg(rd, 0, tmp2);
-                neon_store_reg(rd, 1, tmp3);
-                tcg_temp_free_i32(tmp);
+                /* VTBL, VTBX: handled by decodetree */
+                return 1;
             } else if ((insn & 0x380) == 0) {
                 /* VDUP */
                 int element;
-- 
2.20.1

Convert the Neon VDUP (scalar) insn to decodetree.  (Note that we
can't call this just "VDUP" as we used that already in vfp.decode for
the "VDUP (general purpose register)" insn.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  7 +++++++
 target/arm/translate-neon.inc.c | 26 ++++++++++++++++++++++++++
 target/arm/translate.c          | 25 +------------------------
 3 files changed, 34 insertions(+), 24 deletions(-)

From: Jean-Christophe Dubois <jcd@tribudubois.net>

Some bits of the CCM registers are non-writable.

The initial commit did not handle this (all bits of every register
were writable).

This patch adds the required code to protect the non-writable bits.
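
The masking scheme used in the diff below can be sketched in isolation
as follows.  The helper names are hypothetical; the convention (a bit
set in the mask is read-only and keeps its current value) is the one
the ccm_mask[] and analog_mask[] tables use.

```c
#include <stdint.h>

/* Plain write: read-only bits keep the old value, others take 'value'. */
static uint32_t masked_write(uint32_t old, uint32_t value, uint32_t ro_mask)
{
    return (old & ro_mask) | (value & ~ro_mask);
}

/* _SET alias: only writable bits can be set. */
static uint32_t masked_set(uint32_t old, uint32_t value, uint32_t ro_mask)
{
    return old | (value & ~ro_mask);
}

/* _CLR alias: only writable bits can be cleared. */
static uint32_t masked_clr(uint32_t old, uint32_t value, uint32_t ro_mask)
{
    return old & ~(value & ~ro_mask);
}

/* _TOG alias: only writable bits can be toggled. */
static uint32_t masked_tog(uint32_t old, uint32_t value, uint32_t ro_mask)
{
    return old ^ (value & ~ro_mask);
}
```

A mask of 0x00000000 (as for the CCM_CCGRn entries) therefore makes
the register fully writable, and 0xffffffff makes it read-only.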

Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Message-id: 20200608133508.550046-1-jcd@tribudubois.net
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/misc/imx6ul_ccm.c | 76 ++++++++++++++++++++++++++++++++++++--------
 1 file changed, 63 insertions(+), 13 deletions(-)

diff --git a/hw/misc/imx6ul_ccm.c b/hw/misc/imx6ul_ccm.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/imx6ul_ccm.c
+++ b/hw/misc/imx6ul_ccm.c
@@ -XXX,XX +XXX,XX @@
 
 #include "trace.h"
 
+static const uint32_t ccm_mask[CCM_MAX] = {
+    [CCM_CCR] = 0xf01fef80,
+    [CCM_CCDR] = 0xfffeffff,
+    [CCM_CSR] = 0xffffffff,
+    [CCM_CCSR] = 0xfffffef2,
+    [CCM_CACRR] = 0xfffffff8,
+    [CCM_CBCDR] = 0xc1f8e000,
+    [CCM_CBCMR] = 0xfc03cfff,
+    [CCM_CSCMR1] = 0x80700000,
+    [CCM_CSCMR2] = 0xe01ff003,
+    [CCM_CSCDR1] = 0xfe00c780,
+    [CCM_CS1CDR] = 0xfe00fe00,
+    [CCM_CS2CDR] = 0xf8007000,
+    [CCM_CDCDR] = 0xf00fffff,
+    [CCM_CHSCCDR] = 0xfffc01ff,
+    [CCM_CSCDR2] = 0xfe0001ff,
+    [CCM_CSCDR3] = 0xffffc1ff,
+    [CCM_CDHIPR] = 0xffffffff,
+    [CCM_CTOR] = 0x00000000,
+    [CCM_CLPCR] = 0xf39ff01c,
+    [CCM_CISR] = 0xfb85ffbe,
+    [CCM_CIMR] = 0xfb85ffbf,
+    [CCM_CCOSR] = 0xfe00fe00,
+    [CCM_CGPR] = 0xfffc3fea,
+    [CCM_CCGR0] = 0x00000000,
+    [CCM_CCGR1] = 0x00000000,
+    [CCM_CCGR2] = 0x00000000,
+    [CCM_CCGR3] = 0x00000000,
+    [CCM_CCGR4] = 0x00000000,
+    [CCM_CCGR5] = 0x00000000,
+    [CCM_CCGR6] = 0x00000000,
+    [CCM_CMEOR] = 0xafffff1f,
+};
+
+static const uint32_t analog_mask[CCM_ANALOG_MAX] = {
+    [CCM_ANALOG_PLL_ARM] = 0xfff60f80,
+    [CCM_ANALOG_PLL_USB1] = 0xfffe0fbc,
+    [CCM_ANALOG_PLL_USB2] = 0xfffe0fbc,
+    [CCM_ANALOG_PLL_SYS] = 0xfffa0ffe,
+    [CCM_ANALOG_PLL_SYS_SS] = 0x00000000,
+    [CCM_ANALOG_PLL_SYS_NUM] = 0xc0000000,
+    [CCM_ANALOG_PLL_SYS_DENOM] = 0xc0000000,
+    [CCM_ANALOG_PLL_AUDIO] = 0xffe20f80,
+    [CCM_ANALOG_PLL_AUDIO_NUM] = 0xc0000000,
+    [CCM_ANALOG_PLL_AUDIO_DENOM] = 0xc0000000,
+    [CCM_ANALOG_PLL_VIDEO] = 0xffe20f80,
+    [CCM_ANALOG_PLL_VIDEO_NUM] = 0xc0000000,
+    [CCM_ANALOG_PLL_VIDEO_DENOM] = 0xc0000000,
+    [CCM_ANALOG_PLL_ENET] = 0xffc20ff0,
+    [CCM_ANALOG_PFD_480] = 0x40404040,
+    [CCM_ANALOG_PFD_528] = 0x40404040,
+    [PMU_MISC0] = 0x01fe8306,
+    [PMU_MISC1] = 0x07fcede0,
+    [PMU_MISC2] = 0x005f5f5f,
+};
+
 static const char *imx6ul_ccm_reg_name(uint32_t reg)
 {
     static char unknown[20];
@@ -XXX,XX +XXX,XX @@ static void imx6ul_ccm_write(void *opaque, hwaddr offset, uint64_t value,
 
     trace_ccm_write_reg(imx6ul_ccm_reg_name(index), (uint32_t)value);
 
-    /*
-     * We will do a better implementation later. In particular some bits
-     * cannot be written to.
-     */
-    s->ccm[index] = (uint32_t)value;
+    s->ccm[index] = (s->ccm[index] & ccm_mask[index]) |
+                           ((uint32_t)value & ~ccm_mask[index]);
 }
 
 static uint64_t imx6ul_analog_read(void *opaque, hwaddr offset, unsigned size)
@@ -XXX,XX +XXX,XX @@ static void imx6ul_analog_write(void *opaque, hwaddr offset, uint64_t value,
          * the REG_NAME register. So we change the value of the
          * REG_NAME register, setting bits passed in the value.
          */
-        s->analog[index - 1] |= value;
+        s->analog[index - 1] |= (value & ~analog_mask[index - 1]);
         break;
     case CCM_ANALOG_PLL_ARM_CLR:
     case CCM_ANALOG_PLL_USB1_CLR:
@@ -XXX,XX +XXX,XX @@ static void imx6ul_analog_write(void *opaque, hwaddr offset, uint64_t value,
          * the REG_NAME register. So we change the value of the
          * REG_NAME register, unsetting bits passed in the value.
          */
-        s->analog[index - 2] &= ~value;
+        s->analog[index - 2] &= ~(value & ~analog_mask[index - 2]);
         break;
     case CCM_ANALOG_PLL_ARM_TOG:
     case CCM_ANALOG_PLL_USB1_TOG:
@@ -XXX,XX +XXX,XX @@ static void imx6ul_analog_write(void *opaque, hwaddr offset, uint64_t value,
          * the REG_NAME register. So we change the value of the
          * REG_NAME register, toggling bits passed in the value.
          */
-        s->analog[index - 3] ^= value;
+        s->analog[index - 3] ^= (value & ~analog_mask[index - 3]);
         break;
     default:
-        /*
-         * We will do a better implementation later. In particular some bits
-         * cannot be written to.
-         */
-        s->analog[index] = value;
+        s->analog[index] = (s->analog[index] & analog_mask[index]) |
+                           (value & ~analog_mask[index]);
         break;
     }
 }
-- 
2.20.1

From: Erik Smit <erik.lucas.smit@gmail.com>

The hardware supports configurable descriptor sizes, configured in the DBLAC
register.

Most drivers use the default 4-word descriptor, which is currently hardcoded,
but the Aspeed SDK configures 8 words to store extra data.
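
As a rough illustration of the arithmetic, the sketch below decodes the
two size fields and steps through a descriptor ring.  The macros follow
the ones added by this patch; next_tx_desc(), dblac_is_valid() and
DESC_MIN_SIZE are hypothetical names, and the 16-byte minimum assumes
a descriptor of four 32-bit words.

```c
#include <stdint.h>

/* Size fields are in units of 8 bytes, as in the macros in the diff. */
#define DBLAC_RXDES_SIZE(x)   ((((x) >> 12) & 0xf) * 8)
#define DBLAC_TXDES_SIZE(x)   ((((x) >> 16) & 0xf) * 8)

/* Assumption: the smallest usable descriptor is four 32-bit words. */
#define DESC_MIN_SIZE         16u

/* Both rings need a stride at least as large as one descriptor. */
static int dblac_is_valid(uint32_t dblac)
{
    return DBLAC_TXDES_SIZE(dblac) >= DESC_MIN_SIZE &&
           DBLAC_RXDES_SIZE(dblac) >= DESC_MIN_SIZE;
}

/* Advance to the next TX descriptor, wrapping at the end-of-ring flag. */
static uint32_t next_tx_desc(uint32_t addr, uint32_t ring_base,
                             int end_of_ring, uint32_t dblac)
{
    return end_of_ring ? ring_base : addr + DBLAC_TXDES_SIZE(dblac);
}
```

With a field value of 2 the stride is 16 bytes (4 words); a field
value of 4 gives the 32-byte (8-word) layout.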

Signed-off-by: Erik Smit <erik.lucas.smit@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
[PMM: removed unnecessary parens]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/net/ftgmac100.c | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/hw/net/ftgmac100.c b/hw/net/ftgmac100.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/ftgmac100.c
+++ b/hw/net/ftgmac100.c
@@ -XXX,XX +XXX,XX @@
 #define FTGMAC100_APTC_TXPOLL_CNT(x)        (((x) >> 8) & 0xf)
 #define FTGMAC100_APTC_TXPOLL_TIME_SEL      (1 << 12)
 
+/*
+ * DMA burst length and arbitration control register
+ */
+#define FTGMAC100_DBLAC_RXBURST_SIZE(x)     (((x) >> 8) & 0x3)
+#define FTGMAC100_DBLAC_TXBURST_SIZE(x)     (((x) >> 10) & 0x3)
+#define FTGMAC100_DBLAC_RXDES_SIZE(x)       ((((x) >> 12) & 0xf) * 8)
+#define FTGMAC100_DBLAC_TXDES_SIZE(x)       ((((x) >> 16) & 0xf) * 8)
+#define FTGMAC100_DBLAC_IFG_CNT(x)          (((x) >> 20) & 0x7)
+#define FTGMAC100_DBLAC_IFG_INC             (1 << 23)
+
 /*
  * PHY control register
  */
@@ -XXX,XX +XXX,XX @@ static void ftgmac100_do_tx(FTGMAC100State *s, uint32_t tx_ring,
         if (bd.des0 & s->txdes0_edotr) {
             addr = tx_ring;
         } else {
-            addr += sizeof(FTGMAC100Desc);
+            addr += FTGMAC100_DBLAC_TXDES_SIZE(s->dblac);
         }
     }
 
@@ -XXX,XX +XXX,XX @@ static void ftgmac100_write(void *opaque, hwaddr addr,
         s->phydata = value & 0xffff;
         break;
     case FTGMAC100_DBLAC: /* DMA Burst Length and Arbitration Control */
+        if (FTGMAC100_DBLAC_TXDES_SIZE(value) < sizeof(FTGMAC100Desc)) {
+            qemu_log_mask(LOG_GUEST_ERROR,
+                          "%s: transmit descriptor too small: %d bytes\n",
+                          __func__, (int)FTGMAC100_DBLAC_TXDES_SIZE(value));
+            break;
+        }
+        if (FTGMAC100_DBLAC_RXDES_SIZE(value) < sizeof(FTGMAC100Desc)) {
+            qemu_log_mask(LOG_GUEST_ERROR,
+                          "%s: receive descriptor too small: %d bytes\n",
+                          __func__, (int)FTGMAC100_DBLAC_RXDES_SIZE(value));
+            break;
+        }
         s->dblac = value;
         break;
     case FTGMAC100_REVR:  /* Feature Register */
@@ -XXX,XX +XXX,XX @@ static ssize_t ftgmac100_receive(NetClientState *nc, const uint8_t *buf,
         if (bd.des0 & s->rxdes0_edorr) {
             addr = s->rx_ring;
         } else {
-            addr += sizeof(FTGMAC100Desc);
+            addr += FTGMAC100_DBLAC_RXDES_SIZE(s->dblac);
         }
     }
     s->rx_descriptor = addr;
-- 
2.20.1

From: fangying <fangying1@huawei.com>

Virtual time adjustment was implemented for the virt-5.0 machine type,
but the cpu property was enabled only for the host-passthrough and max
cpu models.  Let's add it for any KVM arm cpu which has the generic
timer feature enabled.

Signed-off-by: Ying Fang <fangying1@huawei.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-id: 20200608121243.2076-1-fangying1@huawei.com
[PMM: minor commit message tweak, removed inaccurate
 suggested-by tag]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.c   |  6 ++++--
 target/arm/cpu64.c |  1 -
 target/arm/kvm.c   | 21 +++++++++++----------
 3 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ void arm_cpu_post_init(Object *obj)
     if (arm_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER)) {
         qdev_property_add_static(DEVICE(cpu), &arm_cpu_gt_cntfrq_property);
     }
+
+    if (kvm_enabled()) {
+        kvm_arm_add_vcpu_properties(obj);
+    }
 }
 
 static void arm_cpu_finalizefn(Object *obj)
@@ -XXX,XX +XXX,XX @@ static void arm_max_initfn(Object *obj)
 
     if (kvm_enabled()) {
         kvm_arm_set_cpu_features_from_host(cpu);
-        kvm_arm_add_vcpu_properties(obj);
     } else {
         cortex_a15_initfn(obj);
 
@@ -XXX,XX +XXX,XX @@ static void arm_host_initfn(Object *obj)
     if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
         aarch64_add_sve_properties(obj);
     }
-    kvm_arm_add_vcpu_properties(obj);
     arm_cpu_post_init(obj);
 }
 
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
 
     if (kvm_enabled()) {
         kvm_arm_set_cpu_features_from_host(cpu);
-        kvm_arm_add_vcpu_properties(obj);
     } else {
         uint64_t t;
         uint32_t u;
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -XXX,XX +XXX,XX @@ static void kvm_no_adjvtime_set(Object *obj, bool value, Error **errp)
 /* KVM VCPU properties should be prefixed with "kvm-". */
 void kvm_arm_add_vcpu_properties(Object *obj)
 {
-    if (!kvm_enabled()) {
-        return;
-    }
+    ARMCPU *cpu = ARM_CPU(obj);
+    CPUARMState *env = &cpu->env;
 
-    ARM_CPU(obj)->kvm_adjvtime = true;
-    object_property_add_bool(obj, "kvm-no-adjvtime", kvm_no_adjvtime_get,
-                             kvm_no_adjvtime_set);
-    object_property_set_description(obj, "kvm-no-adjvtime",
-                                    "Set on to disable the adjustment of "
-                                    "the virtual counter. VM stopped time "
-                                    "will be counted.");
+    if (arm_feature(env, ARM_FEATURE_GENERIC_TIMER)) {
+        cpu->kvm_adjvtime = true;
+        object_property_add_bool(obj, "kvm-no-adjvtime", kvm_no_adjvtime_get,
+                                 kvm_no_adjvtime_set);
+        object_property_set_description(obj, "kvm-no-adjvtime",
+                                        "Set on to disable the adjustment of "
+                                        "the virtual counter. VM stopped time "
+                                        "will be counted.");
+    }
 }
 
 bool kvm_arm_pmu_supported(CPUState *cpu)
-- 
2.20.1

From: Jean-Christophe Dubois <jcd@tribudubois.net>

Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
[PMD: Fixed 32-bit format string using PRIx32/PRIx64]
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/net/imx_fec.c    | 106 +++++++++++++++++++-------------------------
 hw/net/trace-events |  18 ++++++++
 2 files changed, 63 insertions(+), 61 deletions(-)

diff --git a/hw/net/imx_fec.c b/hw/net/imx_fec.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/imx_fec.c
+++ b/hw/net/imx_fec.c
@@ -XXX,XX +XXX,XX @@
 #include "qemu/module.h"
 #include "net/checksum.h"
 #include "net/eth.h"
+#include "trace.h"
 
 /* For crc32 */
 #include <zlib.h>
 
-#ifndef DEBUG_IMX_FEC
-#define DEBUG_IMX_FEC 0
-#endif
-
-#define FEC_PRINTF(fmt, args...) \
-    do { \
-        if (DEBUG_IMX_FEC) { \
-            fprintf(stderr, "[%s]%s: " fmt , TYPE_IMX_FEC, \
-                                             __func__, ##args); \
-        } \
-    } while (0)
-
-#ifndef DEBUG_IMX_PHY
-#define DEBUG_IMX_PHY 0
-#endif
-
-#define PHY_PRINTF(fmt, args...) \
-    do { \
-        if (DEBUG_IMX_PHY) { \
-            fprintf(stderr, "[%s.phy]%s: " fmt , TYPE_IMX_FEC, \
-                                                 __func__, ##args); \
-        } \
-    } while (0)
-
 #define IMX_MAX_DESC    1024
 
 static const char *imx_default_reg_name(IMXFECState *s, uint32_t index)
@@ -XXX,XX +XXX,XX @@ static void imx_eth_update(IMXFECState *s);
  * For now we don't handle any GPIO/interrupt line, so the OS will
  * have to poll for the PHY status.
  */
-static void phy_update_irq(IMXFECState *s)
+static void imx_phy_update_irq(IMXFECState *s)
 {
     imx_eth_update(s);
 }
 
-static void phy_update_link(IMXFECState *s)
+static void imx_phy_update_link(IMXFECState *s)
 {
     /* Autonegotiation status mirrors link status.  */
     if (qemu_get_queue(s->nic)->link_down) {
-        PHY_PRINTF("link is down\n");
+        trace_imx_phy_update_link("down");
         s->phy_status &= ~0x0024;
         s->phy_int |= PHY_INT_DOWN;
     } else {
-        PHY_PRINTF("link is up\n");
+        trace_imx_phy_update_link("up");
         s->phy_status |= 0x0024;
         s->phy_int |= PHY_INT_ENERGYON;
         s->phy_int |= PHY_INT_AUTONEG_COMPLETE;
     }
-    phy_update_irq(s);
+    imx_phy_update_irq(s);
 }
 
 static void imx_eth_set_link(NetClientState *nc)
 {
-    phy_update_link(IMX_FEC(qemu_get_nic_opaque(nc)));
+    imx_phy_update_link(IMX_FEC(qemu_get_nic_opaque(nc)));
 }
 
-static void phy_reset(IMXFECState *s)
+static void imx_phy_reset(IMXFECState *s)
 {
+    trace_imx_phy_reset();
+
     s->phy_status = 0x7809;
     s->phy_control = 0x3000;
     s->phy_advertise = 0x01e1;
     s->phy_int_mask = 0;
     s->phy_int = 0;
-    phy_update_link(s);
+    imx_phy_update_link(s);
 }
 
-static uint32_t do_phy_read(IMXFECState *s, int reg)
+static uint32_t imx_phy_read(IMXFECState *s, int reg)
 {
     uint32_t val;
 
@@ -XXX,XX +XXX,XX @@ static uint32_t do_phy_read(IMXFECState *s, int reg)
     case 29:    /* Interrupt source.  */
         val = s->phy_int;
         s->phy_int = 0;
-        phy_update_irq(s);
+        imx_phy_update_irq(s);
         break;
     case 30:    /* Interrupt mask */
         val = s->phy_int_mask;
@@ -XXX,XX +XXX,XX @@ static uint32_t do_phy_read(IMXFECState *s, int reg)
         break;
     }
 
-    PHY_PRINTF("read 0x%04x @ %d\n", val, reg);
+    trace_imx_phy_read(val, reg);
 
     return val;
 }
 
-static void do_phy_write(IMXFECState *s, int reg, uint32_t val)
+static void imx_phy_write(IMXFECState *s, int reg, uint32_t val)
 {
-    PHY_PRINTF("write 0x%04x @ %d\n", val, reg);
+    trace_imx_phy_write(val, reg);
 
     if (reg > 31) {
         /* we only advertise one phy */
@@ -XXX,XX +XXX,XX @@ static void do_phy_write(IMXFECState *s, int reg, uint32_t val)
     switch (reg) {
     case 0:     /* Basic Control */
         if (val & 0x8000) {
-            phy_reset(s);
+            imx_phy_reset(s);
         } else {
             s->phy_control = val & 0x7980;
             /* Complete autonegotiation immediately.  */
@@ -XXX,XX +XXX,XX @@ static void do_phy_write(IMXFECState *s, int reg, uint32_t val)
         break;
     case 30:    /* Interrupt mask */
         s->phy_int_mask = val & 0xff;
-        phy_update_irq(s);
+        imx_phy_update_irq(s);
         break;
     case 17:
     case 18:
@@ -XXX,XX +XXX,XX @@ static void do_phy_write(IMXFECState *s, int reg, uint32_t val)
 static void imx_fec_read_bd(IMXFECBufDesc *bd, dma_addr_t addr)
 {
     dma_memory_read(&address_space_memory, addr, bd, sizeof(*bd));
+
+    trace_imx_fec_read_bd(addr, bd->flags, bd->length, bd->data);
 }
 
 static void imx_fec_write_bd(IMXFECBufDesc *bd, dma_addr_t addr)
@@ -XXX,XX +XXX,XX @@ static void imx_fec_write_bd(IMXFECBufDesc *bd, dma_addr_t addr)
 static void imx_enet_read_bd(IMXENETBufDesc *bd, dma_addr_t addr)
 {
     dma_memory_read(&address_space_memory, addr, bd, sizeof(*bd));
+
+    trace_imx_enet_read_bd(addr, bd->flags, bd->length, bd->data,
+                   bd->option, bd->status);
 }
 
 static void imx_enet_write_bd(IMXENETBufDesc *bd, dma_addr_t addr)
@@ -XXX,XX +XXX,XX @@ static void imx_fec_do_tx(IMXFECState *s)
         int len;
 
         imx_fec_read_bd(&bd, addr);
-        FEC_PRINTF("tx_bd %x flags %04x len %d data %08x\n",
-                   addr, bd.flags, bd.length, bd.data);
         if ((bd.flags & ENET_BD_R) == 0) {
+
             /* Run out of descriptors to transmit.  */
-            FEC_PRINTF("tx_bd ran out of descriptors to transmit\n");
+            trace_imx_eth_tx_bd_busy();
+
             break;
         }
         len = bd.length;
@@ -XXX,XX +XXX,XX @@ static void imx_enet_do_tx(IMXFECState *s, uint32_t index)
         int len;
 
         imx_enet_read_bd(&bd, addr);
-        FEC_PRINTF("tx_bd %x flags %04x len %d data %08x option %04x "
-                   "status %04x\n", addr, bd.flags, bd.length, bd.data,
-                   bd.option, bd.status);
         if ((bd.flags & ENET_BD_R) == 0) {
             /* Run out of descriptors to transmit.  */
+
+            trace_imx_eth_tx_bd_busy();
+
             break;
         }
         len = bd.length;
@@ -XXX,XX +XXX,XX @@ static void imx_eth_enable_rx(IMXFECState *s, bool flush)
     s->regs[ENET_RDAR] = (bd.flags & ENET_BD_E) ? ENET_RDAR_RDAR : 0;
 
     if (!s->regs[ENET_RDAR]) {
-        FEC_PRINTF("RX buffer full\n");
+        trace_imx_eth_rx_bd_full();
     } else if (flush) {
         qemu_flush_queued_packets(qemu_get_queue(s->nic));
     }
@@ -XXX,XX +XXX,XX @@ static void imx_eth_reset(DeviceState *d)
     memset(s->tx_descriptor, 0, sizeof(s->tx_descriptor));
 
     /* We also reset the PHY */
-    phy_reset(s);
+    imx_phy_reset(s);
 }
 
 static uint32_t imx_default_read(IMXFECState *s, uint32_t index)
@@ -XXX,XX +XXX,XX @@ static uint64_t imx_eth_read(void *opaque, hwaddr offset, unsigned size)
         break;
     }
 
-    FEC_PRINTF("reg[%s] => 0x%" PRIx32 "\n", imx_eth_reg_name(s, index),
-                                              value);
+    trace_imx_eth_read(index, imx_eth_reg_name(s, index), value);
 
     return value;
 }
@@ -XXX,XX +XXX,XX @@ static void imx_eth_write(void *opaque, hwaddr offset, uint64_t value,
     const bool single_tx_ring = !imx_eth_is_multi_tx_ring(s);
     uint32_t index = offset >> 2;
 
-    FEC_PRINTF("reg[%s] <= 0x%" PRIx32 "\n", imx_eth_reg_name(s, index),
-                (uint32_t)value);
+    trace_imx_eth_write(index, imx_eth_reg_name(s, index), value);
 
     switch (index) {
     case ENET_EIR:
@@ -XXX,XX +XXX,XX @@ static void imx_eth_write(void *opaque, hwaddr offset, uint64_t value,
         if (extract32(value, 29, 1)) {
             /* This is a read operation */
             s->regs[ENET_MMFR] = deposit32(s->regs[ENET_MMFR], 0, 16,
-                                           do_phy_read(s,
+                                           imx_phy_read(s,
                                                        extract32(value,
                                                                  18, 10)));
         } else {
             /* This a write operation */
-            do_phy_write(s, extract32(value, 18, 10), extract32(value, 0, 16));
+            imx_phy_write(s, extract32(value, 18, 10), extract32(value, 0, 16));
         }
         /* raise the interrupt as the PHY operation is done */
         s->regs[ENET_EIR] |= ENET_INT_MII;
@@ -XXX,XX +XXX,XX @@ static bool imx_eth_can_receive(NetClientState *nc)
 {
     IMXFECState *s = IMX_FEC(qemu_get_nic_opaque(nc));
 
-    FEC_PRINTF("\n");
-
     return !!s->regs[ENET_RDAR];
 }
 
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_fec_receive(NetClientState *nc, const uint8_t *buf,
     unsigned int buf_len;
     size_t size = len;
 
-    FEC_PRINTF("len %d\n", (int)size);
+    trace_imx_fec_receive(size);
 
     if (!s->regs[ENET_RDAR]) {
         qemu_log_mask(LOG_GUEST_ERROR, "[%s]%s: Unexpected packet\n",
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_fec_receive(NetClientState *nc, const uint8_t *buf,
         bd.length = buf_len;
         size -= buf_len;
 
-        FEC_PRINTF("rx_bd 0x%x length %d\n", addr, bd.length);
+        trace_imx_fec_receive_len(addr, bd.length);
 
         /* The last 4 bytes are the CRC.  */
         if (size < 4) {
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_fec_receive(NetClientState *nc, const uint8_t *buf,
         if (size == 0) {
             /* Last buffer in frame.  */
             bd.flags |= flags | ENET_BD_L;
-            FEC_PRINTF("rx frame flags %04x\n", bd.flags);
+
+            trace_imx_fec_receive_last(bd.flags);
+
             s->regs[ENET_EIR] |= ENET_INT_RXF;
         } else {
             s->regs[ENET_EIR] |= ENET_INT_RXB;
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_enet_receive(NetClientState *nc, const uint8_t *buf,
     size_t size = len;
     bool shift16 = s->regs[ENET_RACC] & ENET_RACC_SHIFT16;
 
-    FEC_PRINTF("len %d\n", (int)size);
+    trace_imx_enet_receive(size);
 
     if (!s->regs[ENET_RDAR]) {
         qemu_log_mask(LOG_GUEST_ERROR, "[%s]%s: Unexpected packet\n",
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_enet_receive(NetClientState *nc, const uint8_t *buf,
         bd.length = buf_len;
         size -= buf_len;
 
-        FEC_PRINTF("rx_bd 0x%x length %d\n", addr, bd.length);
+        trace_imx_enet_receive_len(addr, bd.length);
 
         /* The last 4 bytes are the CRC.  */
         if (size < 4) {
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_enet_receive(NetClientState *nc, const uint8_t *buf,
         if (size == 0) {
             /* Last buffer in frame.  */
             bd.flags |= flags | ENET_BD_L;
-            FEC_PRINTF("rx frame flags %04x\n", bd.flags);
+
+            trace_imx_enet_receive_last(bd.flags);
+
             /* Indicate that we've updated the last buffer descriptor. */
             bd.last_buffer = ENET_BD_BDU;
             if (bd.option & ENET_BD_RX_INT) {
diff --git a/hw/net/trace-events b/hw/net/trace-events
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/trace-events
+++ b/hw/net/trace-events
@@ -XXX,XX +XXX,XX @@ i82596_receive_packet(size_t sz) "len=%zu"
 i82596_new_mac(const char *id_with_mac) "New MAC for: %s"
 i82596_set_multicast(uint16_t count) "Added %d multicast entries"
 i82596_channel_attention(void *s) "%p: Received CHANNEL ATTENTION"
+
+# imx_fec.c
+imx_phy_read(uint32_t val, int reg) "0x%04"PRIx32" <= reg[%d]"
+imx_phy_write(uint32_t val, int reg) "0x%04"PRIx32" => reg[%d]"
+imx_phy_update_link(const char *s) "%s"
+imx_phy_reset(void) ""
+imx_fec_read_bd(uint64_t addr, int flags, int len, int data) "tx_bd 0x%"PRIx64" flags 0x%04x len %d data 0x%08x"
+imx_enet_read_bd(uint64_t addr, int flags, int len, int data, int options, int status) "tx_bd 0x%"PRIx64" flags 0x%04x len %d data 0x%08x option 0x%04x status 0x%04x"
+imx_eth_tx_bd_busy(void) "tx_bd ran out of descriptors to transmit"
+imx_eth_rx_bd_full(void) "RX buffer is full"
+imx_eth_read(int reg, const char *reg_name, uint32_t value) "reg[%d:%s] => 0x%08"PRIx32
+imx_eth_write(int reg, const char *reg_name, uint64_t value) "reg[%d:%s] <= 0x%08"PRIx64
+imx_fec_receive(size_t size) "len %zu"
+imx_fec_receive_len(uint64_t addr, int len) "rx_bd 0x%"PRIx64" length %d"
+imx_fec_receive_last(int last) "rx frame flags 0x%04x"
+imx_enet_receive(size_t size) "len %zu"
+imx_enet_receive_len(uint64_t addr, int len) "rx_bd 0x%"PRIx64" length %d"
+imx_enet_receive_last(int last) "rx frame flags 0x%04x"
-- 
2.20.1

From: Guenter Roeck <linux@roeck-us.net>

The Linux kernel's IMX code now uses vendor specific commands.
This results in endless warnings when booting the Linux kernel.

sdhci-esdhc-imx 2194000.usdhc: esdhc_wait_for_card_clock_gate_off:
	card clock still not gate off in 100us!.

Implement support for the vendor-specific command implemented in IMX hardware
so that this warning is no longer triggered.
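
A minimal sketch of the intended interaction between the two new bits
is shown below.  The constants are the ones this patch defines;
imx_vendor_spec_update() is a hypothetical helper, and the exact
behaviour (clock reported as gated off whenever the guest is not
forcing SDCLK on) is an assumption inferred from the kernel message
quoted above, not a copy of the sdhci.c change.

```c
#include <stdint.h>

#define SDHC_IMX_CLOCK_GATE_OFF   0x00000080u   /* present-state bit */
#define ESDHC_IMX_FRC_SDCLK_ON    (1u << 8)     /* vendor-spec bit */

/*
 * Hypothetical helper: given the current present-state value and the
 * value written to the vendor-specific register, return the new
 * present-state value.
 */
static uint32_t imx_vendor_spec_update(uint32_t prnsts, uint32_t vendor_spec)
{
    if (vendor_spec & ESDHC_IMX_FRC_SDCLK_ON) {
        /* Clock forced on: it is not gated off. */
        return prnsts & ~SDHC_IMX_CLOCK_GATE_OFF;
    }
    /* Clock not forced on: report it as gated off. */
    return prnsts | SDHC_IMX_CLOCK_GATE_OFF;
}
```

Under this model a guest that clears FRC_SDCLK_ON and then polls the
present-state register sees the gate-off bit at once.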

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Message-id: 20200603145258.195920-2-linux@roeck-us.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/sd/sdhci-internal.h |  5 +++++
 include/hw/sd/sdhci.h  |  5 +++++
 hw/sd/sdhci.c          | 18 +++++++++++++++++-
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/hw/sd/sdhci-internal.h b/hw/sd/sdhci-internal.h
index XXXXXXX..XXXXXXX 100644
--- a/hw/sd/sdhci-internal.h
+++ b/hw/sd/sdhci-internal.h
@@ -XXX,XX +XXX,XX @@
 #define SDHC_CMD_INHIBIT               0x00000001
 #define SDHC_DATA_INHIBIT              0x00000002
 #define SDHC_DAT_LINE_ACTIVE           0x00000004
+#define SDHC_IMX_CLOCK_GATE_OFF        0x00000080
 #define SDHC_DOING_WRITE               0x00000100
 #define SDHC_DOING_READ                0x00000200
 #define SDHC_SPACE_AVAILABLE           0x00000400
@@ -XXX,XX +XXX,XX @@ extern const VMStateDescription sdhci_vmstate;
 
 
 #define ESDHC_MIX_CTRL                  0x48
+
 #define ESDHC_VENDOR_SPEC               0xc0
+#define ESDHC_IMX_FRC_SDCLK_ON          (1 << 8)
+
 #define ESDHC_DLL_CTRL                  0x60
 
 #define ESDHC_TUNING_CTRL               0xcc
@@ -XXX,XX +XXX,XX @@ extern const VMStateDescription sdhci_vmstate;
 #define DEFINE_SDHCI_COMMON_PROPERTIES(_state) \
     DEFINE_PROP_UINT8("sd-spec-version", _state, sd_spec_version, 2), \
     DEFINE_PROP_UINT8("uhs", _state, uhs_mode, UHS_NOT_SUPPORTED), \
+    DEFINE_PROP_UINT8("vendor", _state, vendor, SDHCI_VENDOR_NONE), \
     \
     /* Capabilities registers provide information on supported
      * features of this specific host controller implementation */ \
diff --git a/include/hw/sd/sdhci.h b/include/hw/sd/sdhci.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/sd/sdhci.h
+++ b/include/hw/sd/sdhci.h
@@ -XXX,XX +XXX,XX @@ typedef struct SDHCIState {
     uint16_t acmd12errsts; /* Auto CMD12 error status register */
     uint16_t hostctl2;     /* Host Control 2 */
     uint64_t admasysaddr;  /* ADMA System Address Register */
+    uint16_t vendor_spec;  /* Vendor specific register */
 
     /* Read-only registers */
     uint64_t capareg;      /* Capabilities Register */
@@ -XXX,XX +XXX,XX @@ typedef struct SDHCIState {
     uint32_t quirks;
     uint8_t sd_spec_version;
     uint8_t uhs_mode;
+    uint8_t vendor;        /* For vendor specific functionality */
 } SDHCIState;
 
+#define SDHCI_VENDOR_NONE       0
+#define SDHCI_VENDOR_IMX        1
+
 /*
  * Controller does not provide transfer-complete interrupt when not
  * busy.
diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/sd/sdhci.c
+++ b/hw/sd/sdhci.c
@@ -XXX,XX +XXX,XX @@ static uint64_t usdhc_read(void *opaque, hwaddr offset, unsigned size)
         }
         break;
 
+    case ESDHC_VENDOR_SPEC:
+        ret = s->vendor_spec;
+        break;
     case ESDHC_DLL_CTRL:
     case ESDHC_TUNE_CTRL_STATUS:
     case ESDHC_UNDOCUMENTED_REG27:
     case ESDHC_TUNING_CTRL:
-    case ESDHC_VENDOR_SPEC:
     case ESDHC_MIX_CTRL:
     case ESDHC_WTMK_LVL:
         ret = 0;
@@ -XXX,XX +XXX,XX @@ usdhc_write(void *opaque, hwaddr offset, uint64_t val, unsigned size)
     case ESDHC_UNDOCUMENTED_REG27:
     case ESDHC_TUNING_CTRL:
     case ESDHC_WTMK_LVL:
+        break;
+
     case ESDHC_VENDOR_SPEC:
+        s->vendor_spec = value;
+        switch (s->vendor) {
+        case SDHCI_VENDOR_IMX:
+            if (value & ESDHC_IMX_FRC_SDCLK_ON) {
+                s->prnsts &= ~SDHC_IMX_CLOCK_GATE_OFF;
+            } else {
+                s->prnsts |= SDHC_IMX_CLOCK_GATE_OFF;
+            }
+            break;
+        default:
+            break;
+        }
         break;
 
     case SDHC_HOSTCTL:
-- 
2.20.1

From: Guenter Roeck <linux@roeck-us.net>

Set vendor property to IMX to enable IMX specific functionality
in sdhci code.

Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20200603145258.195920-3-linux@roeck-us.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/fsl-imx25.c  | 6 ++++++
 hw/arm/fsl-imx6.c   | 6 ++++++
 hw/arm/fsl-imx6ul.c | 2 ++
 hw/arm/fsl-imx7.c   | 2 ++
 4 files changed, 16 insertions(+)

diff --git a/hw/arm/fsl-imx25.c b/hw/arm/fsl-imx25.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/fsl-imx25.c
+++ b/hw/arm/fsl-imx25.c
@@ -XXX,XX +XXX,XX @@ static void fsl_imx25_realize(DeviceState *dev, Error **errp)
                                  &err);
         object_property_set_uint(OBJECT(&s->esdhc[i]), IMX25_ESDHC_CAPABILITIES,
                                  "capareg", &err);
+        object_property_set_uint(OBJECT(&s->esdhc[i]), SDHCI_VENDOR_IMX,
+                                 "vendor", &err);
+        if (err) {
+            error_propagate(errp, err);
+            return;
+        }
         object_property_set_bool(OBJECT(&s->esdhc[i]), true, "realized", &err);
         if (err) {
             error_propagate(errp, err);
diff --git a/hw/arm/fsl-imx6.c b/hw/arm/fsl-imx6.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/fsl-imx6.c
+++ b/hw/arm/fsl-imx6.c
@@ -XXX,XX +XXX,XX @@ static void fsl_imx6_realize(DeviceState *dev, Error **errp)
                                  &err);
         object_property_set_uint(OBJECT(&s->esdhc[i]), IMX6_ESDHC_CAPABILITIES,
                                  "capareg", &err);
+        object_property_set_uint(OBJECT(&s->esdhc[i]), SDHCI_VENDOR_IMX,
+                                 "vendor", &err);
+        if (err) {
+            error_propagate(errp, err);
+            return;
+        }
         object_property_set_bool(OBJECT(&s->esdhc[i]), true, "realized", &err);
         if (err) {
             error_propagate(errp, err);
diff --git a/hw/arm/fsl-imx6ul.c b/hw/arm/fsl-imx6ul.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/fsl-imx6ul.c
+++ b/hw/arm/fsl-imx6ul.c
@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_realize(DeviceState *dev, Error **errp)
             FSL_IMX6UL_USDHC2_IRQ,
         };
 
+        object_property_set_uint(OBJECT(&s->usdhc[i]), SDHCI_VENDOR_IMX,
+                                 "vendor", &error_abort);
         object_property_set_bool(OBJECT(&s->usdhc[i]), true, "realized",
                                  &error_abort);
 
diff --git a/hw/arm/fsl-imx7.c b/hw/arm/fsl-imx7.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/fsl-imx7.c
+++ b/hw/arm/fsl-imx7.c
@@ -XXX,XX +XXX,XX @@ static void fsl_imx7_realize(DeviceState *dev, Error **errp)
             FSL_IMX7_USDHC3_IRQ,
         };
 
+        object_property_set_uint(OBJECT(&s->usdhc[i]), SDHCI_VENDOR_IMX,
+                                 "vendor", &error_abort);
         object_property_set_bool(OBJECT(&s->usdhc[i]), true, "realized",
                                  &error_abort);
 
-- 
2.20.1

Hi; here's the latest round of arm patches. I have also included
my patchset for the RTC devices to avoid keeping time_t and
time_t diffs in 32-bit variables.
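
For context, here is a minimal sketch of the general hazard, not code
from the RTC patches themselves. It uses int64_t in place of time_t so
that it behaves the same on every host, and the function names are
hypothetical. An offset between two times that is squeezed through 32
bits no longer round-trips once the difference exceeds the signed
32-bit range (roughly 68 years in seconds).

```c
#include <stdint.h>

/*
 * Keep the guest/host offset in 32 bits (the pattern to avoid) and
 * then reconstruct the guest time from it.
 */
static int64_t roundtrip_32(int64_t guest, int64_t host)
{
    /* Conversion to uint32_t is well defined: reduction modulo 2^32. */
    uint32_t bits = (uint32_t)(guest - host);
    /* Sign-extend by hand to get the value a 32-bit signed field holds. */
    int64_t off = (bits & 0x80000000u)
                  ? (int64_t)bits - INT64_C(4294967296)
                  : (int64_t)bits;
    return host + off;
}

/* Keep the offset in 64 bits: the guest time is recovered exactly. */
static int64_t roundtrip_64(int64_t guest, int64_t host)
{
    int64_t off = guest - host;
    return host + off;
}
```

With a small difference both variants agree; with a difference of more
than 2^31 - 1 seconds only roundtrip_64() returns the original guest
time.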

thanks
-- PMM

The following changes since commit 156618d9ea67f2f2e31d9dedd97f2dcccbe6808c:

Merge tag 'block-pull-request' of https://gitlab.com/stefanha/qemu into staging (2023-08-30 09:20:27 -0400)

are available in the Git repository at:

https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20230831

for you to fetch changes up to e73b8bb8a3e9a162f70e9ffbf922d4fafc96bbfb:

hw/arm: Set number of MPU regions correctly for an505, an521, an524 (2023-08-31 11:07:02 +0100)

----------------------------------------------------------------
target-arm queue:
 * Some of the preliminary patches for Cortex-A710 support
 * i.MX7 and i.MX6UL refactoring
 * Implement SRC device for i.MX7
 * Catch illegal-exception-return from EL3 with bad NSE/NS
 * Use 64-bit offsets for holding time_t differences in RTC devices
 * Model correct number of MPU regions for an505, an521, an524 boards

----------------------------------------------------------------
Alex Bennée (1):
      target/arm: properly document FEAT_CRC32

Jean-Christophe Dubois (6):
      Remove i.MX7 IOMUX GPR device from i.MX6UL
      Refactor i.MX6UL processor code
      Add i.MX6UL missing devices.
      Refactor i.MX7 processor code
      Add i.MX7 missing TZ devices and memory regions
      Add i.MX7 SRC device implementation

Peter Maydell (8):
      target/arm: Catch illegal-exception-return from EL3 with bad NSE/NS
      hw/rtc/m48t59: Use 64-bit arithmetic in set_alarm()
      hw/rtc/twl92230: Use int64_t for sec_offset and alm_sec
      hw/rtc/aspeed_rtc: Use 64-bit offset for holding time_t difference
      rtc: Use time_t for passing and returning time offsets
      target/arm: Do all "ARM_FEATURE_X implies Y" checks in post_init
      hw/arm/armv7m: Add mpu-ns-regions and mpu-s-regions properties
      hw/arm: Set number of MPU regions correctly for an505, an521, an524

Richard Henderson (9):
      target/arm: Reduce dcz_blocksize to uint8_t
      target/arm: Allow cpu to configure GM blocksize
      target/arm: Support more GM blocksizes
      target/arm: When tag memory is not present, set MTE=1
      target/arm: Introduce make_ccsidr64
      target/arm: Apply access checks to neoverse-n1 special registers
      target/arm: Apply access checks to neoverse-v1 special registers
      target/arm: Suppress FEAT_TRBE (Trace Buffer Extension)
      target/arm: Implement FEAT_HPDS2 as a no-op