Series comparison

-[PULL 00/23] target-arm queue
+[PULL 00/35] target-arm queue
-Mostly my decodetree stuff, but also some patches for various
+The following changes since commit 5767815218efd3cbfd409505ed824d5f356044ae:
 smaller bugs/features from others.
-thanks
+  Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging (2024-02-14 15:45:52 +0000)
 -- PMM
 The following changes since commit 53550e81e2cafe7c03a39526b95cd21b5194d9b1:
   Merge remote-tracking branch 'remotes/berrange/tags/qcrypto-next-pull-request' into staging (2020-06-15 16:36:34 +0100)
 are available in the Git repository at:
-  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20200616
+  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20240215
-for you to fetch changes up to 64b397417a26509bcdff44ab94356a35c7901c79:
+for you to fetch changes up to f780e63fe731b058fe52d43653600d8729a1b5f2:
-  hw: arm: Set vendor property for IMX SDHCI emulations (2020-06-16 10:32:29 +0100)
+  docs: Add documentation for the mps3-an536 board (2024-02-15 14:32:39 +0000)
 ----------------------------------------------------------------
- * hw: arm: Set vendor property for IMX SDHCI emulations
+target-arm queue:
- * sd: sdhci: Implement basic vendor specific register support
+ * hw/arm/xilinx_zynq: Wire FIQ between CPU <> GIC
- * hw/net/imx_fec: Convert debug fprintf() to trace events
+ * linux-user/aarch64: Choose SYNC as the preferred MTE mode
- * target/arm/cpu: adjust virtual time for all KVM arm cpus
+ * Fix some errors in SVE/SME handling of MTE tags
- * Implement configurable descriptor size in ftgmac100
+ * hw/pci-host/raven.c: Mark raven_io_ops as implementing unaligned accesses
- * hw/misc/imx6ul_ccm: Implement non writable bits in CCM registers
+ * hw/block/tc58128: Don't emit deprecation warning under qtest
- * target/arm: More Neon decodetree conversion work
+ * tests/qtest: Fix handling of npcm7xx and GMAC tests
  * hw/arm/virt: Wire up non-secure EL2 virtual timer IRQ
  * tests/qtest/npcm7xx_emc-test: Connect all NICs to a backend
  * Don't assert on vmload/vmsave of M-profile CPUs
  * hw/arm/smmuv3: add support for stage 1 access fault
  * hw/arm/stellaris: QOM cleanups
  * Use new CBAR encoding for all v8 CPUs, not all aarch64 CPUs
  * Improve Cortex_R52 IMPDEF sysreg modelling
  * Allow access to SPSR_hyp from hyp mode
  * New board model mps3-an536 (Cortex-R52)
 ----------------------------------------------------------------
-Erik Smit (1):
+Luc Michel (1):
-      Implement configurable descriptor size in ftgmac100
+      hw/arm/smmuv3: add support for stage 1 access fault
-Guenter Roeck (2):
+Nabih Estefan (1):
-      sd: sdhci: Implement basic vendor specific register support
+      tests/qtest: Fix GMAC test to run on a machine in upstream QEMU
       hw: arm: Set vendor property for IMX SDHCI emulations
-Jean-Christophe Dubois (2):
+Peter Maydell (22):
-      hw/misc/imx6ul_ccm: Implement non writable bits in CCM registers
+      hw/pci-host/raven.c: Mark raven_io_ops as implementing unaligned accesses
-      hw/net/imx_fec: Convert debug fprintf() to trace events
+      hw/block/tc58128: Don't emit deprecation warning under qtest
       tests/qtest/meson.build: Don't include qtests_npcm7xx in qtests_aarch64
       tests/qtest/bios-tables-test: Allow changes to virt GTDT
       hw/arm/virt: Wire up non-secure EL2 virtual timer IRQ
       tests/qtest/bios-tables-tests: Update virt golden reference
       hw/arm/npcm7xx: Call qemu_configure_nic_device() for GMAC modules
       tests/qtest/npcm7xx_emc-test: Connect all NICs to a backend
       target/arm: Don't get MDCR_EL2 in pmu_counter_enabled() before checking ARM_FEATURE_PMU
       target/arm: Use new CBAR encoding for all v8 CPUs, not all aarch64 CPUs
       target/arm: The Cortex-R52 has a read-only CBAR
       target/arm: Add Cortex-R52 IMPDEF sysregs
       target/arm: Allow access to SPSR_hyp from hyp mode
       hw/misc/mps2-scc: Fix condition for CFG3 register
       hw/misc/mps2-scc: Factor out which-board conditionals
       hw/misc/mps2-scc: Make changes needed for AN536 FPGA image
       hw/arm/mps3r: Initial skeleton for mps3-an536 board
       hw/arm/mps3r: Add CPUs, GIC, and per-CPU RAM
       hw/arm/mps3r: Add UARTs
       hw/arm/mps3r: Add GPIO, watchdog, dual-timer, I2C devices
       hw/arm/mps3r: Add remaining devices
       docs: Add documentation for the mps3-an536 board
-Peter Maydell (17):
+Philippe Mathieu-Daudé (5):
-      target/arm: Fix missing temp frees in do_vshll_2sh
+      hw/arm/xilinx_zynq: Wire FIQ between CPU <> GIC
-      target/arm: Convert Neon 3-reg-diff prewidening ops to decodetree
+      hw/arm/stellaris: Convert ADC controller to Resettable interface
-      target/arm: Convert Neon 3-reg-diff narrowing ops to decodetree
+      hw/arm/stellaris: Convert I2C controller to Resettable interface
-      target/arm: Convert Neon 3-reg-diff VABAL, VABDL to decodetree
+      hw/arm/stellaris: Add missing QOM 'machine' parent
-      target/arm: Convert Neon 3-reg-diff long multiplies
+      hw/arm/stellaris: Add missing QOM 'SoC' parent
       target/arm: Convert Neon 3-reg-diff saturating doubling multiplies
       target/arm: Convert Neon 3-reg-diff polynomial VMULL
       target/arm: Add 'static' and 'const' annotations to VSHLL function arrays
       target/arm: Add missing TCG temp free in do_2shift_env_64()
       target/arm: Convert Neon 2-reg-scalar integer multiplies to decodetree
       target/arm: Convert Neon 2-reg-scalar float multiplies to decodetree
       target/arm: Convert Neon 2-reg-scalar VQDMULH, VQRDMULH to decodetree
       target/arm: Convert Neon 2-reg-scalar VQRDMLAH, VQRDMLSH to decodetree
       target/arm: Convert Neon 2-reg-scalar long multiplies to decodetree
       target/arm: Convert Neon VEXT to decodetree
       target/arm: Convert Neon VTBL, VTBX to decodetree
       target/arm: Convert Neon VDUP (scalar) to decodetree
-fangying (1):
+Richard Henderson (6):
-      target/arm/cpu: adjust virtual time for all KVM arm cpus
+      linux-user/aarch64: Choose SYNC as the preferred MTE mode
       target/arm: Fix nregs computation in do_{ld,st}_zpa
       target/arm: Adjust and validate mtedesc sizem1
       target/arm: Split out make_svemte_desc
       target/arm: Handle mte in do_ldrq, do_ldro
       target/arm: Fix SVE/SME gross MTE suppression checks
- hw/sd/sdhci-internal.h          |    5 +
+ MAINTAINERS                             |   3 +-
- include/hw/sd/sdhci.h           |    5 +
+ docs/system/arm/mps2.rst                |  37 +-
- target/arm/translate.h          |    1 +
+ configs/devices/arm-softmmu/default.mak |   1 +
- target/arm/neon-dp.decode       |  130 +++++
+ hw/arm/smmuv3-internal.h                |   1 +
- hw/arm/fsl-imx25.c              |    6 +
+ include/hw/arm/smmu-common.h            |   1 +
- hw/arm/fsl-imx6.c               |    6 +
+ include/hw/arm/virt.h                   |   2 +
- hw/arm/fsl-imx6ul.c             |    2 +
+ include/hw/misc/mps2-scc.h              |   1 +
- hw/arm/fsl-imx7.c               |    2 +
+ linux-user/aarch64/target_prctl.h       |  29 +-
- hw/misc/imx6ul_ccm.c            |   76 ++-
+ target/arm/internals.h                  |   2 +-
- hw/net/ftgmac100.c              |   26 +-
+ target/arm/tcg/translate-a64.h          |   2 +
- hw/net/imx_fec.c                |  106 ++--
+ hw/arm/mps3r.c                          | 640 ++++++++++++++++++++++++++++++++
- hw/sd/sdhci.c                   |   18 +-
+ hw/arm/npcm7xx.c                        |   1 +
- target/arm/cpu.c                |    6 +-
+ hw/arm/smmu-common.c                    |  11 +
- target/arm/cpu64.c              |    1 -
+ hw/arm/smmuv3.c                         |   1 +
- target/arm/kvm.c                |   21 +-
+ hw/arm/stellaris.c                      |  47 ++-
- target/arm/translate-neon.inc.c | 1148 ++++++++++++++++++++++++++++++++++++++-
+ hw/arm/virt-acpi-build.c                |  20 +-
- target/arm/translate.c          |  684 +----------------------
+ hw/arm/virt.c                           |  60 ++-
- hw/net/trace-events             |   18 +
+ hw/arm/xilinx_zynq.c                    |   2 +
-files changed, 1495 insertions(+), 766 deletions(-)
+ hw/block/tc58128.c                      |   4 +-
  hw/misc/mps2-scc.c                      | 138 ++++++-
  hw/pci-host/raven.c                     |   1 +
  target/arm/helper.c                     |  14 +-
  target/arm/tcg/cpu32.c                  | 109 ++++++
  target/arm/tcg/op_helper.c              |  43 ++-
  target/arm/tcg/sme_helper.c             |   8 +-
  target/arm/tcg/sve_helper.c             |  12 +-
  target/arm/tcg/translate-sme.c          |  15 +-
  target/arm/tcg/translate-sve.c          |  83 +++--
  target/arm/tcg/translate.c              |  19 +-
  tests/qtest/npcm7xx_emc-test.c          |   5 +-
  tests/qtest/npcm_gmac-test.c            |  84 +----
  hw/arm/Kconfig                          |   5 +
  hw/arm/meson.build                      |   1 +
  tests/data/acpi/virt/FACP               | Bin 276 -> 276 bytes
  tests/data/acpi/virt/GTDT               | Bin 96 -> 104 bytes
  tests/qtest/meson.build                 |   4 +-
 files changed, 1184 insertions(+), 222 deletions(-)
  create mode 100644 hw/arm/mps3r.c

-[PULL 01/23] target/arm: Fix missing temp frees in do_vshll_2sh
+[PULL 01/35] hw/arm/xilinx_zynq: Wire FIQ between CPU <> GIC
-The widenfn() in do_vshll_2sh() does not free the input 32-bit
+From: Philippe Mathieu-Daudé <philmd@linaro.org>
 TCGv, so we need to do this in the calling code.
+Similarly to commits dadbb58f59..5ae79fe825 for other ARM boards,
+connect FIQ output of the GIC CPU interfaces to the CPU.
+Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
+Message-id: 20240130152548.17855-1-philmd@linaro.org
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 ---
- target/arm/translate-neon.inc.c | 2 ++
+ hw/arm/xilinx_zynq.c | 2 ++
 file changed, 2 insertions(+)
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
+diff --git a/hw/arm/xilinx_zynq.c b/hw/arm/xilinx_zynq.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
+--- a/hw/arm/xilinx_zynq.c
-+++ b/target/arm/translate-neon.inc.c
++++ b/hw/arm/xilinx_zynq.c
-@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
+@@ -XXX,XX +XXX,XX @@ static void zynq_init(MachineState *machine)
-     tmp = tcg_temp_new_i64();
+     sysbus_mmio_map(busdev, 0, MPCORE_PERIPHBASE);
+     sysbus_connect_irq(busdev, 0,
-     widenfn(tmp, rm0);
+                        qdev_get_gpio_in(DEVICE(cpu), ARM_CPU_IRQ));
-+    tcg_temp_free_i32(rm0);
++    sysbus_connect_irq(busdev, 1,
-     if (a->shift != 0) {
++                       qdev_get_gpio_in(DEVICE(cpu), ARM_CPU_FIQ));
-         tcg_gen_shli_i64(tmp, tmp, a->shift);
-         tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
+     for (n = 0; n < 64; n++) {
-@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
+         pic[n] = qdev_get_gpio_in(dev, n);
      neon_store_reg64(tmp, a->vd);
      widenfn(tmp, rm1);
 +    tcg_temp_free_i32(rm1);
      if (a->shift != 0) {
          tcg_gen_shli_i64(tmp, tmp, a->shift);
          tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
 --
-.20.1
+.34.1

-New patch
+[PULL 02/35] linux-user/aarch64: Choose SYNC as the preferred MTE mode
+From: Richard Henderson <richard.henderson@linaro.org>
+The API does not generate an error for setting ASYNC | SYNC; that merely
+constrains the selection vs the per-cpu default.  For qemu linux-user,
+choose SYNC as the default.
+Cc: qemu-stable@nongnu.org
+Reported-by: Gustavo Romero <gustavo.romero@linaro.org>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Tested-by: Gustavo Romero <gustavo.romero@linaro.org>
+Message-id: 20240207025210.8837-2-richard.henderson@linaro.org
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ linux-user/aarch64/target_prctl.h | 29 +++++++++++++++++------------
+file changed, 17 insertions(+), 12 deletions(-)
+diff --git a/linux-user/aarch64/target_prctl.h b/linux-user/aarch64/target_prctl.h
+index XXXXXXX..XXXXXXX 100644
+--- a/linux-user/aarch64/target_prctl.h
++++ b/linux-user/aarch64/target_prctl.h
+@@ -XXX,XX +XXX,XX @@ static abi_long do_prctl_set_tagged_addr_ctrl(CPUArchState *env, abi_long arg2)
+     env->tagged_addr_enable = arg2 & PR_TAGGED_ADDR_ENABLE;
+     if (cpu_isar_feature(aa64_mte, cpu)) {
+-        switch (arg2 & PR_MTE_TCF_MASK) {
+-        case PR_MTE_TCF_NONE:
+-        case PR_MTE_TCF_SYNC:
+-        case PR_MTE_TCF_ASYNC:
+-            break;
+-        default:
+-            return -EINVAL;
+-        }
+-
+         /*
+          * Write PR_MTE_TCF to SCTLR_EL1[TCF0].
+-         * Note that the syscall values are consistent with hw.
++         *
++         * The kernel has a per-cpu configuration for the sysadmin,
++         * /sys/devices/system/cpu/cpu<N>/mte_tcf_preferred,
++         * which qemu does not implement.
++         *
++         * Because there is no performance difference between the modes, and
++         * because SYNC is most useful for debugging MTE errors, choose SYNC
++         * as the preferred mode.  With this preference, and the way the API
++         * uses only two bits, there is no way for the program to select
++         * ASYMM mode.
+          */
+-        env->cp15.sctlr_el[1] =
+-            deposit64(env->cp15.sctlr_el[1], 38, 2, arg2 >> PR_MTE_TCF_SHIFT);
++        unsigned tcf = 0;
++        if (arg2 & PR_MTE_TCF_SYNC) {
++            tcf = 1;
++        } else if (arg2 & PR_MTE_TCF_ASYNC) {
++            tcf = 2;
++        }
++        env->cp15.sctlr_el[1] = deposit64(env->cp15.sctlr_el[1], 38, 2, tcf);
+         /*
+          * Write PR_MTE_TAG to GCR_EL1[Exclude].
+--
+.34.1
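
The mode selection in patch 02 can be sketched outside QEMU as follows. This is a minimal model, not the QEMU code: the PR_MTE_TCF_* values mirror the Linux prctl ABI, while deposit_bits(), select_tcf() and write_tcf0() are hypothetical stand-ins (deposit_bits() plays the role of QEMU's deposit64()).

```c
#include <assert.h>
#include <stdint.h>

/* Values as defined by the Linux prctl ABI (linux/prctl.h). */
#define PR_MTE_TCF_SHIFT 1
#define PR_MTE_TCF_SYNC  (1UL << PR_MTE_TCF_SHIFT)
#define PR_MTE_TCF_ASYNC (1UL << (PR_MTE_TCF_SHIFT + 1))

/* Hypothetical stand-in for QEMU's deposit64(): replace LENGTH bits at START. */
static uint64_t deposit_bits(uint64_t value, int start, int length,
                             uint64_t field)
{
    assert(start >= 0 && length > 0 && length < 64 && start + length <= 64);
    uint64_t mask = ((UINT64_C(1) << length) - 1) << start;
    return (value & ~mask) | ((field << start) & mask);
}

/*
 * Pick the TCF0 encoding from the prctl argument.  SYNC wins whenever it
 * is requested, so SYNC | ASYNC resolves to SYNC instead of being rejected.
 */
static unsigned select_tcf(unsigned long arg2)
{
    if (arg2 & PR_MTE_TCF_SYNC) {
        return 1;   /* synchronous tag check faults */
    }
    if (arg2 & PR_MTE_TCF_ASYNC) {
        return 2;   /* asynchronous tag check faults */
    }
    return 0;       /* tag check faults disabled */
}

/* Write the chosen mode into SCTLR_EL1.TCF0, bits [39:38]. */
static uint64_t write_tcf0(uint64_t sctlr_el1, unsigned long arg2)
{
    return deposit_bits(sctlr_el1, 38, 2, select_tcf(arg2));
}
```

Because only the values 0, 1 and 2 can be produced, the sketch also shows why the program has no way to select ASYMM mode (encoding 3).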

-[PULL 18/23] hw/misc/imx6ul_ccm: Implement non writable bits in CCM registers
+[PULL 03/35] target/arm: Fix nregs computation in do_{ld,st}_zpa
-From: Jean-Christophe Dubois <jcd@tribudubois.net>
+From: Richard Henderson <richard.henderson@linaro.org>
-Some bits of the CCM registers are non writable.
+The field is encoded as [0-3], which is convenient for
 indexing our array of function pointers, but the true
 value is [1-4].  Adjust before calling do_mem_zpa.
-This was left undone in the initial commit (all bits of registers were
+Add an assert, and move the comment re passing ZT to
-writable).
+the helper back next to the relevant code.
-This patch adds the required code to protect the non writable bits.
+Cc: qemu-stable@nongnu.org
+Fixes: 206adacfb8d ("target/arm: Add mte helpers for sve scalar + int loads")
-Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20200608133508.550046-1-jcd@tribudubois.net
+Tested-by: Gustavo Romero <gustavo.romero@linaro.org>
 Message-id: 20240207025210.8837-3-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/misc/imx6ul_ccm.c | 76 ++++++++++++++++++++++++++++++++++++--------
+ target/arm/tcg/translate-sve.c | 16 ++++++++--------
-file changed, 63 insertions(+), 13 deletions(-)
+file changed, 8 insertions(+), 8 deletions(-)
-diff --git a/hw/misc/imx6ul_ccm.c b/hw/misc/imx6ul_ccm.c
+diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/misc/imx6ul_ccm.c
+--- a/target/arm/tcg/translate-sve.c
-+++ b/hw/misc/imx6ul_ccm.c
++++ b/target/arm/tcg/translate-sve.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
+     TCGv_ptr t_pg;
- #include "trace.h"
+     int desc = 0;
 +static const uint32_t ccm_mask[CCM_MAX] = {
 +    [CCM_CCR] = 0xf01fef80,
 +    [CCM_CCDR] = 0xfffeffff,
 +    [CCM_CSR] = 0xffffffff,
 +    [CCM_CCSR] = 0xfffffef2,
 +    [CCM_CACRR] = 0xfffffff8,
 +    [CCM_CBCDR] = 0xc1f8e000,
 +    [CCM_CBCMR] = 0xfc03cfff,
 +    [CCM_CSCMR1] = 0x80700000,
 +    [CCM_CSCMR2] = 0xe01ff003,
 +    [CCM_CSCDR1] = 0xfe00c780,
 +    [CCM_CS1CDR] = 0xfe00fe00,
 +    [CCM_CS2CDR] = 0xf8007000,
 +    [CCM_CDCDR] = 0xf00fffff,
 +    [CCM_CHSCCDR] = 0xfffc01ff,
 +    [CCM_CSCDR2] = 0xfe0001ff,
 +    [CCM_CSCDR3] = 0xffffc1ff,
 +    [CCM_CDHIPR] = 0xffffffff,
 +    [CCM_CTOR] = 0x00000000,
 +    [CCM_CLPCR] = 0xf39ff01c,
 +    [CCM_CISR] = 0xfb85ffbe,
 +    [CCM_CIMR] = 0xfb85ffbf,
 +    [CCM_CCOSR] = 0xfe00fe00,
 +    [CCM_CGPR] = 0xfffc3fea,
 +    [CCM_CCGR0] = 0x00000000,
 +    [CCM_CCGR1] = 0x00000000,
 +    [CCM_CCGR2] = 0x00000000,
 +    [CCM_CCGR3] = 0x00000000,
 +    [CCM_CCGR4] = 0x00000000,
 +    [CCM_CCGR5] = 0x00000000,
 +    [CCM_CCGR6] = 0x00000000,
 +    [CCM_CMEOR] = 0xafffff1f,
 +};
 +
 +static const uint32_t analog_mask[CCM_ANALOG_MAX] = {
 +    [CCM_ANALOG_PLL_ARM] = 0xfff60f80,
 +    [CCM_ANALOG_PLL_USB1] = 0xfffe0fbc,
 +    [CCM_ANALOG_PLL_USB2] = 0xfffe0fbc,
 +    [CCM_ANALOG_PLL_SYS] = 0xfffa0ffe,
 +    [CCM_ANALOG_PLL_SYS_SS] = 0x00000000,
 +    [CCM_ANALOG_PLL_SYS_NUM] = 0xc0000000,
 +    [CCM_ANALOG_PLL_SYS_DENOM] = 0xc0000000,
 +    [CCM_ANALOG_PLL_AUDIO] = 0xffe20f80,
 +    [CCM_ANALOG_PLL_AUDIO_NUM] = 0xc0000000,
 +    [CCM_ANALOG_PLL_AUDIO_DENOM] = 0xc0000000,
 +    [CCM_ANALOG_PLL_VIDEO] = 0xffe20f80,
 +    [CCM_ANALOG_PLL_VIDEO_NUM] = 0xc0000000,
 +    [CCM_ANALOG_PLL_VIDEO_DENOM] = 0xc0000000,
 +    [CCM_ANALOG_PLL_ENET] = 0xffc20ff0,
 +    [CCM_ANALOG_PFD_480] = 0x40404040,
 +    [CCM_ANALOG_PFD_528] = 0x40404040,
 +    [PMU_MISC0] = 0x01fe8306,
 +    [PMU_MISC1] = 0x07fcede0,
 +    [PMU_MISC2] = 0x005f5f5f,
 +};
 +
  static const char *imx6ul_ccm_reg_name(uint32_t reg)
  {
      static char unknown[20];
@@ -XXX,XX +XXX,XX @@ static void imx6ul_ccm_write(void *opaque, hwaddr offset, uint64_t value,
      trace_ccm_write_reg(imx6ul_ccm_reg_name(index), (uint32_t)value);
 -    /*
--     * We will do a better implementation later. In particular some bits
+-     * For e.g. LD4, there are not enough arguments to pass all 4
--     * cannot be written to.
+-     * registers as pointers, so encode the regno into the data field.
 -     * For consistency, do this even for LD1.
 -     */
--    s->ccm[index] = (uint32_t)value;
++    assert(mte_n >= 1 && mte_n <= 4);
-+    s->ccm[index] = (s->ccm[index] & ccm_mask[index]) |
+     if (s->mte_active[0]) {
-+                           ((uint32_t)value & ~ccm_mask[index]);
+         int msz = dtype_msz(dtype);
@@ -XXX,XX +XXX,XX @@ static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
          addr = clean_data_tbi(s, addr);
      }
 +    /*
 +     * For e.g. LD4, there are not enough arguments to pass all 4
 +     * registers as pointers, so encode the regno into the data field.
 +     * For consistency, do this even for LD1.
 +     */
      desc = simd_desc(vsz, vsz, zt | desc);
      t_pg = tcg_temp_new_ptr();
@@ -XXX,XX +XXX,XX @@ static void do_ld_zpa(DisasContext *s, int zt, int pg,
       * accessible via the instruction encoding.
       */
      assert(fn != NULL);
 -    do_mem_zpa(s, zt, pg, addr, dtype, nreg, false, fn);
 +    do_mem_zpa(s, zt, pg, addr, dtype, nreg + 1, false, fn);
  }
- static uint64_t imx6ul_analog_read(void *opaque, hwaddr offset, unsigned size)
+ static bool trans_LD_zprr(DisasContext *s, arg_rprr_load *a)
-@@ -XXX,XX +XXX,XX @@ static void imx6ul_analog_write(void *opaque, hwaddr offset, uint64_t value,
+@@ -XXX,XX +XXX,XX @@ static void do_st_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
-          * the REG_NAME register. So we change the value of the
+     if (nreg == 0) {
-          * REG_NAME register, setting bits passed in the value.
+         /* ST1 */
-          */
+         fn = fn_single[s->mte_active[0]][be][msz][esz];
--        s->analog[index - 1] |= value;
+-        nreg = 1;
-+        s->analog[index - 1] |= (value & ~analog_mask[index - 1]);
+     } else {
-         break;
+         /* ST2, ST3, ST4 -- msz == esz, enforced by encoding */
-     case CCM_ANALOG_PLL_ARM_CLR:
+         assert(msz == esz);
-     case CCM_ANALOG_PLL_USB1_CLR:
+         fn = fn_multiple[s->mte_active[0]][be][nreg - 1][msz];
@@ -XXX,XX +XXX,XX @@ static void imx6ul_analog_write(void *opaque, hwaddr offset, uint64_t value,
           * the REG_NAME register. So we change the value of the
           * REG_NAME register, unsetting bits passed in the value.
           */
 -        s->analog[index - 2] &= ~value;
 +        s->analog[index - 2] &= ~(value & ~analog_mask[index - 2]);
          break;
      case CCM_ANALOG_PLL_ARM_TOG:
      case CCM_ANALOG_PLL_USB1_TOG:
@@ -XXX,XX +XXX,XX @@ static void imx6ul_analog_write(void *opaque, hwaddr offset, uint64_t value,
           * the REG_NAME register. So we change the value of the
           * REG_NAME register, toggling bits passed in the value.
           */
 -        s->analog[index - 3] ^= value;
 +        s->analog[index - 3] ^= (value & ~analog_mask[index - 3]);
          break;
      default:
 -        /*
 -         * We will do a better implementation later. In particular some bits
 -         * cannot be written to.
 -         */
 -        s->analog[index] = value;
 +        s->analog[index] = (s->analog[index] & analog_mask[index]) |
 +                           (value & ~analog_mask[index]);
          break;
      }
+     assert(fn != NULL);
+-    do_mem_zpa(s, zt, pg, addr, msz_dtype(s, msz), nreg, true, fn);
++    do_mem_zpa(s, zt, pg, addr, msz_dtype(s, msz), nreg + 1, true, fn);
  }
+ static bool trans_ST_zprr(DisasContext *s, arg_rprr_store *a)
 --
-.20.1
+.34.1
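
The imx6ul_ccm patch on the removed side of this comparison protects non-writable bits with a per-register mask. A minimal sketch of that idiom, assuming (as in the patch) that a set mask bit marks a read-only bit; the function names here are hypothetical and not taken from QEMU:

```c
#include <assert.h>
#include <stdint.h>

/* Plain write: keep the read-only bits of OLD, take the rest from VALUE. */
static uint32_t masked_write(uint32_t old, uint32_t value, uint32_t ro_mask)
{
    return (old & ro_mask) | (value & ~ro_mask);
}

/* _SET alias: set only the writable bits that VALUE asks for. */
static uint32_t masked_set(uint32_t old, uint32_t value, uint32_t ro_mask)
{
    return old | (value & ~ro_mask);
}

/* _CLR alias: clear only the writable bits that VALUE asks for. */
static uint32_t masked_clear(uint32_t old, uint32_t value, uint32_t ro_mask)
{
    return old & ~(value & ~ro_mask);
}

/* _TOG alias: toggle only the writable bits that VALUE asks for. */
static uint32_t masked_toggle(uint32_t old, uint32_t value, uint32_t ro_mask)
{
    return old ^ (value & ~ro_mask);
}

/* Usage example: the top 16 bits are read-only and survive every write. */
static void masked_write_selfcheck(void)
{
    assert(masked_write(0xAAAA5555u, 0xFFFFFFFFu, 0xFFFF0000u) == 0xAAAAFFFFu);
    assert(masked_clear(0xAAAAFFFFu, 0xFFFFFFFFu, 0xFFFF0000u) == 0xAAAA0000u);
}
```

A mask of 0x00000000 (as used for the CCGR registers in the patch) makes every bit writable, and 0xffffffff makes the register fully read-only.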

-New patch
+[PULL 04/35] target/arm: Adjust and validate mtedesc sizem1
+From: Richard Henderson <richard.henderson@linaro.org>
+When we added SVE_MTEDESC_SHIFT, we effectively limited the
+maximum size of MTEDESC.  Adjust SIZEM1 to consume the remaining
+bits (32 - 10 - 5 - 12 == 5).  Assert that the data to be stored
+fits within the field (expecting 8 * 4 - 1 == 31, exact fit).
+Cc: qemu-stable@nongnu.org
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Tested-by: Gustavo Romero <gustavo.romero@linaro.org>
+Message-id: 20240207025210.8837-4-richard.henderson@linaro.org
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ target/arm/internals.h         | 2 +-
+ target/arm/tcg/translate-sve.c | 7 ++++---
+files changed, 5 insertions(+), 4 deletions(-)
+diff --git a/target/arm/internals.h b/target/arm/internals.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/internals.h
++++ b/target/arm/internals.h
+@@ -XXX,XX +XXX,XX @@ FIELD(MTEDESC, TBI,   4, 2)
+ FIELD(MTEDESC, TCMA,  6, 2)
+ FIELD(MTEDESC, WRITE, 8, 1)
+ FIELD(MTEDESC, ALIGN, 9, 3)
+-FIELD(MTEDESC, SIZEM1, 12, SIMD_DATA_BITS - 12)  /* size - 1 */
++FIELD(MTEDESC, SIZEM1, 12, SIMD_DATA_BITS - SVE_MTEDESC_SHIFT - 12)  /* size - 1 */
+ bool mte_probe(CPUARMState *env, uint32_t desc, uint64_t ptr);
+ uint64_t mte_check(CPUARMState *env, uint32_t desc, uint64_t ptr, uintptr_t ra);
+diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-sve.c
++++ b/target/arm/tcg/translate-sve.c
+@@ -XXX,XX +XXX,XX @@ static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
+ {
+     unsigned vsz = vec_full_reg_size(s);
+     TCGv_ptr t_pg;
++    uint32_t sizem1;
+     int desc = 0;
+     assert(mte_n >= 1 && mte_n <= 4);
++    sizem1 = (mte_n << dtype_msz(dtype)) - 1;
++    assert(sizem1 <= R_MTEDESC_SIZEM1_MASK >> R_MTEDESC_SIZEM1_SHIFT);
+     if (s->mte_active[0]) {
+-        int msz = dtype_msz(dtype);
+-
+         desc = FIELD_DP32(desc, MTEDESC, MIDX, get_mem_index(s));
+         desc = FIELD_DP32(desc, MTEDESC, TBI, s->tbid);
+         desc = FIELD_DP32(desc, MTEDESC, TCMA, s->tcma);
+         desc = FIELD_DP32(desc, MTEDESC, WRITE, is_write);
+-        desc = FIELD_DP32(desc, MTEDESC, SIZEM1, (mte_n << msz) - 1);
++        desc = FIELD_DP32(desc, MTEDESC, SIZEM1, sizem1);
+         desc <<= SVE_MTEDESC_SHIFT;
+     } else {
+         addr = clean_data_tbi(s, addr);
+--
+.34.1
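
The bit budget described in patch 04, together with the nreg + 1 adjustment from patch 03, can be checked with a small standalone model. The SIMD_DATA_SHIFT value of 10 and the SVE_MTEDESC_SHIFT value of 5 are read off the commit message arithmetic (32 - 10 - 5 - 12 == 5) and should be treated as assumptions; sve_sizem1() is a hypothetical helper, not a QEMU function.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants, matching the arithmetic in the commit message. */
#define SIMD_DATA_SHIFT      10
#define SIMD_DATA_BITS       (32 - SIMD_DATA_SHIFT)
#define SVE_MTEDESC_SHIFT    5
#define MTEDESC_SIZEM1_SHIFT 12
#define MTEDESC_SIZEM1_BITS  \
    (SIMD_DATA_BITS - SVE_MTEDESC_SHIFT - MTEDESC_SIZEM1_SHIFT)
#define MTEDESC_SIZEM1_MAX   ((1u << MTEDESC_SIZEM1_BITS) - 1)

/*
 * Compute "size - 1" for an SVE contiguous load/store.
 * nreg_field is the instruction encoding [0-3]; the real register
 * count is [1-4].  msz is log2 of the memory element size (0..3).
 */
static uint32_t sve_sizem1(uint32_t nreg_field, uint32_t msz)
{
    uint32_t nregs = nreg_field + 1;
    uint32_t sizem1;

    assert(nregs >= 1 && nregs <= 4);
    assert(msz <= 3);
    sizem1 = (nregs << msz) - 1;
    /* The value must fit in the SIZEM1 field of the descriptor. */
    assert(sizem1 <= MTEDESC_SIZEM1_MAX);
    return sizem1;
}
```

With these numbers the largest case, four registers of 8-byte elements, gives 8 * 4 - 1 == 31, which is exactly the largest value a 5-bit field can hold.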

-New patch
+[PULL 05/35] target/arm: Split out make_svemte_desc
+From: Richard Henderson <richard.henderson@linaro.org>
+Share code that creates mtedesc and embeds within simd_desc.
+Cc: qemu-stable@nongnu.org
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Tested-by: Gustavo Romero <gustavo.romero@linaro.org>
+Message-id: 20240207025210.8837-5-richard.henderson@linaro.org
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ target/arm/tcg/translate-a64.h |  2 ++
+ target/arm/tcg/translate-sme.c | 15 +++--------
+ target/arm/tcg/translate-sve.c | 47 ++++++++++++++++++----------------
+files changed, 31 insertions(+), 33 deletions(-)
+diff --git a/target/arm/tcg/translate-a64.h b/target/arm/tcg/translate-a64.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-a64.h
++++ b/target/arm/tcg/translate-a64.h
+@@ -XXX,XX +XXX,XX @@ bool logic_imm_decode_wmask(uint64_t *result, unsigned int immn,
+ bool sve_access_check(DisasContext *s);
+ bool sme_enabled_check(DisasContext *s);
+ bool sme_enabled_check_with_svcr(DisasContext *s, unsigned);
++uint32_t make_svemte_desc(DisasContext *s, unsigned vsz, uint32_t nregs,
++                          uint32_t msz, bool is_write, uint32_t data);
+ /* This function corresponds to CheckStreamingSVEEnabled. */
+ static inline bool sme_sm_enabled_check(DisasContext *s)
+diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-sme.c
++++ b/target/arm/tcg/translate-sme.c
+@@ -XXX,XX +XXX,XX @@ static bool trans_LDST1(DisasContext *s, arg_LDST1 *a)
+     TCGv_ptr t_za, t_pg;
+     TCGv_i64 addr;
+-    int svl, desc = 0;
++    uint32_t desc;
+     bool be = s->be_data == MO_BE;
+     bool mte = s->mte_active[0];
+@@ -XXX,XX +XXX,XX @@ static bool trans_LDST1(DisasContext *s, arg_LDST1 *a)
+     tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), a->esz);
+     tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, a->rn));
+-    if (mte) {
+-        desc = FIELD_DP32(desc, MTEDESC, MIDX, get_mem_index(s));
+-        desc = FIELD_DP32(desc, MTEDESC, TBI, s->tbid);
+-        desc = FIELD_DP32(desc, MTEDESC, TCMA, s->tcma);
+-        desc = FIELD_DP32(desc, MTEDESC, WRITE, a->st);
+-        desc = FIELD_DP32(desc, MTEDESC, SIZEM1, (1 << a->esz) - 1);
+-        desc <<= SVE_MTEDESC_SHIFT;
+-    } else {
++    if (!mte) {
+         addr = clean_data_tbi(s, addr);
+     }
+-    svl = streaming_vec_reg_size(s);
+-    desc = simd_desc(svl, svl, desc);
++
++    desc = make_svemte_desc(s, streaming_vec_reg_size(s), 1, a->esz, a->st, 0);
+     fns[a->esz][be][a->v][mte][a->st](tcg_env, t_za, t_pg, addr,
+                                       tcg_constant_i32(desc));
+diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-sve.c
++++ b/target/arm/tcg/translate-sve.c
+@@ -XXX,XX +XXX,XX @@ static const uint8_t dtype_esz[16] = {
+, 2, 1, 3
+ };
+-static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
+-                       int dtype, uint32_t mte_n, bool is_write,
+-                       gen_helper_gvec_mem *fn)
++uint32_t make_svemte_desc(DisasContext *s, unsigned vsz, uint32_t nregs,
++                          uint32_t msz, bool is_write, uint32_t data)
+ {
+-    unsigned vsz = vec_full_reg_size(s);
+-    TCGv_ptr t_pg;
+     uint32_t sizem1;
+-    int desc = 0;
++    uint32_t desc = 0;
+-    assert(mte_n >= 1 && mte_n <= 4);
+-    sizem1 = (mte_n << dtype_msz(dtype)) - 1;
++    /* Assert all of the data fits, with or without MTE enabled. */
++    assert(nregs >= 1 && nregs <= 4);
++    sizem1 = (nregs << msz) - 1;
+     assert(sizem1 <= R_MTEDESC_SIZEM1_MASK >> R_MTEDESC_SIZEM1_SHIFT);
++    assert(data < 1u << SVE_MTEDESC_SHIFT);
++
+     if (s->mte_active[0]) {
+         desc = FIELD_DP32(desc, MTEDESC, MIDX, get_mem_index(s));
+         desc = FIELD_DP32(desc, MTEDESC, TBI, s->tbid);
+@@ -XXX,XX +XXX,XX @@ static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
+         desc = FIELD_DP32(desc, MTEDESC, WRITE, is_write);
+         desc = FIELD_DP32(desc, MTEDESC, SIZEM1, sizem1);
+         desc <<= SVE_MTEDESC_SHIFT;
+-    } else {
++    }
++    return simd_desc(vsz, vsz, desc | data);
++}
++
++static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
++                       int dtype, uint32_t nregs, bool is_write,
++                       gen_helper_gvec_mem *fn)
++{
++    TCGv_ptr t_pg;
++    uint32_t desc;
++
++    if (!s->mte_active[0]) {
+         addr = clean_data_tbi(s, addr);
+     }
+@@ -XXX,XX +XXX,XX @@ static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
+      * registers as pointers, so encode the regno into the data field.
+      * For consistency, do this even for LD1.
+      */
+-    desc = simd_desc(vsz, vsz, zt | desc);
++    desc = make_svemte_desc(s, vec_full_reg_size(s), nregs,
++                            dtype_msz(dtype), is_write, zt);
+     t_pg = tcg_temp_new_ptr();
+     tcg_gen_addi_ptr(t_pg, tcg_env, pred_full_reg_offset(s, pg));
+@@ -XXX,XX +XXX,XX @@ static void do_mem_zpz(DisasContext *s, int zt, int pg, int zm,
+                        int scale, TCGv_i64 scalar, int msz, bool is_write,
+                        gen_helper_gvec_mem_scatter *fn)
+ {
+-    unsigned vsz = vec_full_reg_size(s);
+     TCGv_ptr t_zm = tcg_temp_new_ptr();
+     TCGv_ptr t_pg = tcg_temp_new_ptr();
+     TCGv_ptr t_zt = tcg_temp_new_ptr();
+-    int desc = 0;
+-
+-    if (s->mte_active[0]) {
+-        desc = FIELD_DP32(desc, MTEDESC, MIDX, get_mem_index(s));
+-        desc = FIELD_DP32(desc, MTEDESC, TBI, s->tbid);
+-        desc = FIELD_DP32(desc, MTEDESC, TCMA, s->tcma);
+-        desc = FIELD_DP32(desc, MTEDESC, WRITE, is_write);
+-        desc = FIELD_DP32(desc, MTEDESC, SIZEM1, (1 << msz) - 1);
+-        desc <<= SVE_MTEDESC_SHIFT;
+-    }
+-    desc = simd_desc(vsz, vsz, desc | scale);
++    uint32_t desc;
+     tcg_gen_addi_ptr(t_pg, tcg_env, pred_full_reg_offset(s, pg));
+     tcg_gen_addi_ptr(t_zm, tcg_env, vec_full_reg_offset(s, zm));
+     tcg_gen_addi_ptr(t_zt, tcg_env, vec_full_reg_offset(s, zt));
++
++    desc = make_svemte_desc(s, vec_full_reg_size(s), 1, msz, is_write, scale);
+     fn(tcg_env, t_zt, t_pg, t_zm, scalar, tcg_constant_i32(desc));
+ }
+--
+.34.1
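
The packing that make_svemte_desc centralises can be modelled as below. This is a simplified sketch under stated assumptions: it omits the final simd_desc() wrapping, struct mte_ctx and pack_field() are hypothetical, the MIDX field position (bits [3:0]) is assumed rather than shown in the diff, and the remaining field positions follow the FIELD() lines quoted in patch 04.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SVE_MTEDESC_SHIFT 5   /* assumed, as in the patch 04 arithmetic */

/* Hypothetical stand-in for the DisasContext fields the helper reads. */
struct mte_ctx {
    bool mte_active;
    uint32_t midx, tbi, tcma;
};

/* OR a value into an (initially zero) bitfield, checking that it fits. */
static uint32_t pack_field(uint32_t desc, unsigned shift, unsigned bits,
                           uint32_t val)
{
    assert(val < (1u << bits));
    return desc | (val << shift);
}

/* Build the "data" word: mtedesc in the high bits, caller data in the low 5. */
static uint32_t svemte_data(const struct mte_ctx *c, uint32_t nregs,
                            uint32_t msz, bool is_write, uint32_t data)
{
    uint32_t desc = 0;
    uint32_t sizem1;

    /* Validate everything with or without MTE enabled. */
    assert(nregs >= 1 && nregs <= 4);
    sizem1 = (nregs << msz) - 1;
    assert(sizem1 < (1u << 5));
    assert(data < (1u << SVE_MTEDESC_SHIFT));

    if (c->mte_active) {
        desc = pack_field(desc, 0, 4, c->midx);   /* MIDX: position assumed */
        desc = pack_field(desc, 4, 2, c->tbi);    /* TBI */
        desc = pack_field(desc, 6, 2, c->tcma);   /* TCMA */
        desc = pack_field(desc, 8, 1, is_write);  /* WRITE */
        desc = pack_field(desc, 12, 5, sizem1);   /* SIZEM1 */
        desc <<= SVE_MTEDESC_SHIFT;
    }
    return desc | data;
}
```

Doing the range checks before the mte_active test mirrors the comment in the patch: a descriptor that would overflow is caught even on configurations where MTE is off.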

-New patch
+[PULL 06/35] target/arm: Handle mte in do_ldrq, do_ldro
+From: Richard Henderson <richard.henderson@linaro.org>
+These functions "use the standard load helpers", but
+fail to clean_data_tbi or populate mtedesc.
+Cc: qemu-stable@nongnu.org
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Tested-by: Gustavo Romero <gustavo.romero@linaro.org>
+Message-id: 20240207025210.8837-6-richard.henderson@linaro.org
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ target/arm/tcg/translate-sve.c | 15 +++++++++++++--
+file changed, 13 insertions(+), 2 deletions(-)
+diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/translate-sve.c
++++ b/target/arm/tcg/translate-sve.c
+@@ -XXX,XX +XXX,XX @@ static void do_ldrq(DisasContext *s, int zt, int pg, TCGv_i64 addr, int dtype)
+     unsigned vsz = vec_full_reg_size(s);
+     TCGv_ptr t_pg;
+     int poff;
++    uint32_t desc;
+     /* Load the first quadword using the normal predicated load helpers.  */
++    if (!s->mte_active[0]) {
++        addr = clean_data_tbi(s, addr);
++    }
++
+     poff = pred_full_reg_offset(s, pg);
+     if (vsz > 16) {
+         /*
+@@ -XXX,XX +XXX,XX @@ static void do_ldrq(DisasContext *s, int zt, int pg, TCGv_i64 addr, int dtype)
+     gen_helper_gvec_mem *fn
+         = ldr_fns[s->mte_active[0]][s->be_data == MO_BE][dtype][0];
+-    fn(tcg_env, t_pg, addr, tcg_constant_i32(simd_desc(16, 16, zt)));
++    desc = make_svemte_desc(s, 16, 1, dtype_msz(dtype), false, zt);
++    fn(tcg_env, t_pg, addr, tcg_constant_i32(desc));
+     /* Replicate that first quadword.  */
+     if (vsz > 16) {
+@@ -XXX,XX +XXX,XX @@ static void do_ldro(DisasContext *s, int zt, int pg, TCGv_i64 addr, int dtype)
+     unsigned vsz_r32;
+     TCGv_ptr t_pg;
+     int poff, doff;
++    uint32_t desc;
+     if (vsz < 32) {
+         /*
+@@ -XXX,XX +XXX,XX @@ static void do_ldro(DisasContext *s, int zt, int pg, TCGv_i64 addr, int dtype)
+     }
+     /* Load the first octaword using the normal predicated load helpers.  */
++    if (!s->mte_active[0]) {
++        addr = clean_data_tbi(s, addr);
++    }
+     poff = pred_full_reg_offset(s, pg);
+     if (vsz > 32) {
+@@ -XXX,XX +XXX,XX @@ static void do_ldro(DisasContext *s, int zt, int pg, TCGv_i64 addr, int dtype)
+     gen_helper_gvec_mem *fn
+         = ldr_fns[s->mte_active[0]][s->be_data == MO_BE][dtype][0];
+-    fn(tcg_env, t_pg, addr, tcg_constant_i32(simd_desc(32, 32, zt)));
++    desc = make_svemte_desc(s, 32, 1, dtype_msz(dtype), false, zt);
++    fn(tcg_env, t_pg, addr, tcg_constant_i32(desc));
+     /*
+      * Replicate that first octaword.
+--
+.34.1

-New patch
+[PULL 07/35] target/arm: Fix SVE/SME gross MTE suppression checks
+From: Richard Henderson <richard.henderson@linaro.org>
+The TBI and TCMA bits are located within mtedesc, not desc.
+Cc: qemu-stable@nongnu.org
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Tested-by: Gustavo Romero <gustavo.romero@linaro.org>
+Message-id: 20240207025210.8837-7-richard.henderson@linaro.org
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ target/arm/tcg/sme_helper.c |  8 ++++----
+ target/arm/tcg/sve_helper.c | 12 ++++++------
+files changed, 10 insertions(+), 10 deletions(-)
+diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/sme_helper.c
++++ b/target/arm/tcg/sme_helper.c
+@@ -XXX,XX +XXX,XX @@ void sme_ld1_mte(CPUARMState *env, void *za, uint64_t *vg,
+     desc = extract32(desc, 0, SIMD_DATA_SHIFT + SVE_MTEDESC_SHIFT);
+     /* Perform gross MTE suppression early. */
+-    if (!tbi_check(desc, bit55) ||
+-        tcma_check(desc, bit55, allocation_tag_from_addr(addr))) {
++    if (!tbi_check(mtedesc, bit55) ||
++        tcma_check(mtedesc, bit55, allocation_tag_from_addr(addr))) {
+         mtedesc = 0;
+     }
+@@ -XXX,XX +XXX,XX @@ void sme_st1_mte(CPUARMState *env, void *za, uint64_t *vg, target_ulong addr,
+     desc = extract32(desc, 0, SIMD_DATA_SHIFT + SVE_MTEDESC_SHIFT);
+     /* Perform gross MTE suppression early. */
+-    if (!tbi_check(desc, bit55) ||
+-        tcma_check(desc, bit55, allocation_tag_from_addr(addr))) {
++    if (!tbi_check(mtedesc, bit55) ||
++        tcma_check(mtedesc, bit55, allocation_tag_from_addr(addr))) {
+         mtedesc = 0;
+     }
+diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/sve_helper.c
++++ b/target/arm/tcg/sve_helper.c
+@@ -XXX,XX +XXX,XX @@ void sve_ldN_r_mte(CPUARMState *env, uint64_t *vg, target_ulong addr,
+     desc = extract32(desc, 0, SIMD_DATA_SHIFT + SVE_MTEDESC_SHIFT);
+     /* Perform gross MTE suppression early. */
+-    if (!tbi_check(desc, bit55) ||
+-        tcma_check(desc, bit55, allocation_tag_from_addr(addr))) {
++    if (!tbi_check(mtedesc, bit55) ||
++        tcma_check(mtedesc, bit55, allocation_tag_from_addr(addr))) {
+         mtedesc = 0;
+     }
+@@ -XXX,XX +XXX,XX @@ void sve_ldnfff1_r_mte(CPUARMState *env, void *vg, target_ulong addr,
+     desc = extract32(desc, 0, SIMD_DATA_SHIFT + SVE_MTEDESC_SHIFT);
+     /* Perform gross MTE suppression early. */
+-    if (!tbi_check(desc, bit55) ||
+-        tcma_check(desc, bit55, allocation_tag_from_addr(addr))) {
++    if (!tbi_check(mtedesc, bit55) ||
++        tcma_check(mtedesc, bit55, allocation_tag_from_addr(addr))) {
+         mtedesc = 0;
+     }
+@@ -XXX,XX +XXX,XX @@ void sve_stN_r_mte(CPUARMState *env, uint64_t *vg, target_ulong addr,
+     desc = extract32(desc, 0, SIMD_DATA_SHIFT + SVE_MTEDESC_SHIFT);
+     /* Perform gross MTE suppression early. */
+-    if (!tbi_check(desc, bit55) ||
+-        tcma_check(desc, bit55, allocation_tag_from_addr(addr))) {
++    if (!tbi_check(mtedesc, bit55) ||
++        tcma_check(mtedesc, bit55, allocation_tag_from_addr(addr))) {
+         mtedesc = 0;
+     }
+--
+.34.1
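
Note on the fix in patch 07: the helpers receive one packed 32-bit word and split it into a low SIMD part (desc) and a high MTE part (mtedesc), so testing a TBI or TCMA bit on the low part reads unrelated data. The sketch below illustrates this with a made-up layout. DEMO_SPLIT, DEMO_TBI_SHIFT and the demo_* functions are hypothetical names and values chosen for illustration; they are not QEMU's actual constants or helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical layout, for illustration only: the low DEMO_SPLIT bits
 * hold the SIMD descriptor, everything above holds the MTE descriptor.
 */
#define DEMO_SPLIT      22u
#define DEMO_TBI_SHIFT  4u   /* assumed position of a 2-bit TBI field */

/* High part of the packed word: this is where the TBI field lives. */
static inline uint32_t demo_mtedesc(uint32_t packed)
{
    return packed >> DEMO_SPLIT;
}

/* Low part of the packed word: SIMD sizes and data, no TBI field. */
static inline uint32_t demo_simd_desc(uint32_t packed)
{
    return packed & ((1u << DEMO_SPLIT) - 1u);
}

/* Test the TBI bit selected by address bit 55. */
static inline bool demo_tbi_check(uint32_t mtedesc, unsigned bit55)
{
    assert(bit55 <= 1);
    return (mtedesc >> (DEMO_TBI_SHIFT + bit55)) & 1u;
}
```

With TBI0 set only in the high part, demo_tbi_check(demo_mtedesc(packed), 0) is true while the same check on demo_simd_desc(packed) is false, which mirrors the class of bug the patch corrects.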

-[PULL 23/23] hw: arm: Set vendor property for IMX SDHCI emulations
+[PULL 08/35] hw/pci-host/raven.c: Mark raven_io_ops as implementing unaligned accesses
-From: Guenter Roeck <linux@roeck-us.net>
+The raven_io_ops MemoryRegionOps is the only one in the source tree
 which sets .valid.unaligned to indicate that it should support
 unaligned accesses and which does not also set .impl.unaligned to
 indicate that its read and write functions can do the unaligned
 handling themselves.  This is a problem, because at the moment the
 core memory system does not implement the support for handling
 unaligned accesses by doing a series of aligned accesses and
 combining them (system/memory.c:access_with_adjusted_size() has a
 TODO comment noting this).
-Set vendor property to IMX to enable IMX specific functionality
+Fortunately raven_io_read() and raven_io_write() will correctly deal
-in sdhci code.
+with the case of being passed an unaligned address, so we can fix the
 missing unaligned access support by setting .impl.unaligned in the
 MemoryRegionOps struct.
-Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Fixes: 9a1839164c9c8f06 ("raven: Implement non-contiguous I/O region")
 Signed-off-by: Guenter Roeck <linux@roeck-us.net>
 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Message-id: 20200603145258.195920-3-linux@roeck-us.net
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Tested-by: Cédric Le Goater <clg@redhat.com>
+Reviewed-by: Cédric Le Goater <clg@redhat.com>
+Message-id: 20240112134640.1775041-1-peter.maydell@linaro.org
 ---
- hw/arm/fsl-imx25.c  | 6 ++++++
+ hw/pci-host/raven.c | 1 +
- hw/arm/fsl-imx6.c   | 6 ++++++
+file changed, 1 insertion(+)
  hw/arm/fsl-imx6ul.c | 2 ++
  hw/arm/fsl-imx7.c   | 2 ++
 files changed, 16 insertions(+)
-diff --git a/hw/arm/fsl-imx25.c b/hw/arm/fsl-imx25.c
+diff --git a/hw/pci-host/raven.c b/hw/pci-host/raven.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/fsl-imx25.c
+--- a/hw/pci-host/raven.c
-+++ b/hw/arm/fsl-imx25.c
++++ b/hw/pci-host/raven.c
-@@ -XXX,XX +XXX,XX @@ static void fsl_imx25_realize(DeviceState *dev, Error **errp)
+@@ -XXX,XX +XXX,XX @@ static const MemoryRegionOps raven_io_ops = {
-                                  &err);
+     .write = raven_io_write,
-         object_property_set_uint(OBJECT(&s->esdhc[i]), IMX25_ESDHC_CAPABILITIES,
+     .endianness = DEVICE_LITTLE_ENDIAN,
-                                  "capareg", &err);
+     .impl.max_access_size = 4,
-+        object_property_set_uint(OBJECT(&s->esdhc[i]), SDHCI_VENDOR_IMX,
++    .impl.unaligned = true,
-+                                 "vendor", &err);
+     .valid.unaligned = true,
-+        if (err) {
+ };
 +            error_propagate(errp, err);
 +            return;
 +        }
          object_property_set_bool(OBJECT(&s->esdhc[i]), true, "realized", &err);
          if (err) {
              error_propagate(errp, err);
 diff --git a/hw/arm/fsl-imx6.c b/hw/arm/fsl-imx6.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/fsl-imx6.c
 +++ b/hw/arm/fsl-imx6.c
@@ -XXX,XX +XXX,XX @@ static void fsl_imx6_realize(DeviceState *dev, Error **errp)
                                   &err);
          object_property_set_uint(OBJECT(&s->esdhc[i]), IMX6_ESDHC_CAPABILITIES,
                                   "capareg", &err);
 +        object_property_set_uint(OBJECT(&s->esdhc[i]), SDHCI_VENDOR_IMX,
 +                                 "vendor", &err);
 +        if (err) {
 +            error_propagate(errp, err);
 +            return;
 +        }
          object_property_set_bool(OBJECT(&s->esdhc[i]), true, "realized", &err);
          if (err) {
              error_propagate(errp, err);
 diff --git a/hw/arm/fsl-imx6ul.c b/hw/arm/fsl-imx6ul.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/fsl-imx6ul.c
 +++ b/hw/arm/fsl-imx6ul.c
@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_realize(DeviceState *dev, Error **errp)
              FSL_IMX6UL_USDHC2_IRQ,
          };
 +        object_property_set_uint(OBJECT(&s->usdhc[i]), SDHCI_VENDOR_IMX,
 +                                        "vendor", &error_abort);
          object_property_set_bool(OBJECT(&s->usdhc[i]), true, "realized",
                                   &error_abort);
 diff --git a/hw/arm/fsl-imx7.c b/hw/arm/fsl-imx7.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/fsl-imx7.c
 +++ b/hw/arm/fsl-imx7.c
@@ -XXX,XX +XXX,XX @@ static void fsl_imx7_realize(DeviceState *dev, Error **errp)
              FSL_IMX7_USDHC3_IRQ,
          };
 +        object_property_set_uint(OBJECT(&s->usdhc[i]), SDHCI_VENDOR_IMX,
 +                                 "vendor", &error_abort);
          object_property_set_bool(OBJECT(&s->usdhc[i]), true, "realized",
                                   &error_abort);
 --
-.20.1
+.34.1
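
Note on patch 08: the commit message says the core memory system does not yet emulate an unaligned access by issuing a series of aligned accesses and combining them. A minimal sketch of that general technique follows, assuming a little-endian device that accepts only 4-byte aligned reads. demo_regs and the demo_* functions are hypothetical stand-ins, not code from system/memory.c or raven.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical backing store standing in for device registers. */
static const uint8_t demo_regs[8] = {
    0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88
};

/* The only access the pretend device supports: aligned, little-endian. */
static uint32_t demo_read_aligned32(uint32_t addr)
{
    assert((addr & 3u) == 0 && addr + 4u <= sizeof(demo_regs));
    return (uint32_t)demo_regs[addr]
         | (uint32_t)demo_regs[addr + 1] << 8
         | (uint32_t)demo_regs[addr + 2] << 16
         | (uint32_t)demo_regs[addr + 3] << 24;
}

/* Build a 32-bit read at any address from at most two aligned reads. */
static uint32_t demo_read_unaligned32(uint32_t addr)
{
    uint32_t base = addr & ~3u;
    unsigned shift = (addr & 3u) * 8u;
    uint32_t lo = demo_read_aligned32(base);

    if (shift == 0) {
        return lo;              /* already aligned, one access suffices */
    }
    /* Take the upper bytes of the first word, lower bytes of the next. */
    uint32_t hi = demo_read_aligned32(base + 4u);
    return (lo >> shift) | (hi << (32u - shift));
}
```

The patch itself does not add such a fallback. It sets .impl.unaligned because raven_io_read() and raven_io_write() already cope with unaligned addresses on their own.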

-[PULL 16/23] target/arm: Convert Neon VTBL, VTBX to decodetree
+[PULL 09/35] hw/block/tc58128: Don't emit deprecation warning under qtest
-Convert the Neon VTBL, VTBX instructions to decodetree.  The actual
+Suppress the deprecation warning when we're running under qtest,
-implementation of the insn is copied across to the new trans function
+to avoid "make check" including warning messages in its output.
 unchanged except for renaming 'tmp5' to 'tmp4'.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Message-id: 20240206154151.155620-1-peter.maydell@linaro.org
 ---
- target/arm/neon-dp.decode       |  3 ++
+ hw/block/tc58128.c | 4 +++-
- target/arm/translate-neon.inc.c | 56 +++++++++++++++++++++++++++++++++
+file changed, 3 insertions(+), 1 deletion(-)
  target/arm/translate.c          | 41 +++---------------------
 files changed, 63 insertions(+), 37 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+diff --git a/hw/block/tc58128.c b/hw/block/tc58128.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
+--- a/hw/block/tc58128.c
-+++ b/target/arm/neon-dp.decode
++++ b/hw/block/tc58128.c
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+@@ -XXX,XX +XXX,XX @@ static sh7750_io_device tc58128 = {
-     ##################################################################
-     VEXT         1111 001 0 1 . 11 .... .... imm:4 . q:1 . 0 .... \
+ int tc58128_init(struct SH7750State *s, const char *zone1, const char *zone2)
-                  vm=%vm_dp vn=%vn_dp vd=%vd_dp
+ {
-+
+-    warn_report_once("The TC58128 flash device is deprecated");
-+    VTBL         1111 001 1 1 . 11 .... .... 10 len:2 . op:1 . 0 .... \
++    if (!qtest_enabled()) {
-+                 vm=%vm_dp vn=%vn_dp vd=%vd_dp
++        warn_report_once("The TC58128 flash device is deprecated");
    ]
    # Subgroup for size != 0b11
 diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.inc.c
 +++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
      }
      return true;
  }
 +
 +static bool trans_VTBL(DisasContext *s, arg_VTBL *a)
 +{
 +    int n;
 +    TCGv_i32 tmp, tmp2, tmp3, tmp4;
 +    TCGv_ptr ptr1;
 +
 +    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
 +        return false;
 +    }
-+
+     init_dev(&tc58128_devs[0], zone1);
-+    /* UNDEF accesses to D16-D31 if they don't exist. */
+     init_dev(&tc58128_devs[1], zone2);
-+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+     return sh7750_register_io_device(s, &tc58128);
 +        ((a->vd | a->vn | a->vm) & 0x10)) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    n = a->len + 1;
 +    if ((a->vn + n) > 32) {
 +        /*
 +         * This is UNPREDICTABLE; we choose to UNDEF to avoid the
 +         * helper function running off the end of the register file.
 +         */
 +        return false;
 +    }
 +    n <<= 3;
 +    if (a->op) {
 +        tmp = neon_load_reg(a->vd, 0);
 +    } else {
 +        tmp = tcg_temp_new_i32();
 +        tcg_gen_movi_i32(tmp, 0);
 +    }
 +    tmp2 = neon_load_reg(a->vm, 0);
 +    ptr1 = vfp_reg_ptr(true, a->vn);
 +    tmp4 = tcg_const_i32(n);
 +    gen_helper_neon_tbl(tmp2, tmp2, tmp, ptr1, tmp4);
 +    tcg_temp_free_i32(tmp);
 +    if (a->op) {
 +        tmp = neon_load_reg(a->vd, 1);
 +    } else {
 +        tmp = tcg_temp_new_i32();
 +        tcg_gen_movi_i32(tmp, 0);
 +    }
 +    tmp3 = neon_load_reg(a->vm, 1);
 +    gen_helper_neon_tbl(tmp3, tmp3, tmp, ptr1, tmp4);
 +    tcg_temp_free_i32(tmp4);
 +    tcg_temp_free_ptr(ptr1);
 +    neon_store_reg(a->vd, 0, tmp2);
 +    neon_store_reg(a->vd, 1, tmp3);
 +    tcg_temp_free_i32(tmp);
 +    return true;
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
  {
      int op;
      int q;
 -    int rd, rn, rm, rd_ofs, rm_ofs;
 +    int rd, rm, rd_ofs, rm_ofs;
      int size;
      int pass;
      int u;
      int vec_size;
 -    TCGv_i32 tmp, tmp2, tmp3, tmp5;
 -    TCGv_ptr ptr1;
 +    TCGv_i32 tmp, tmp2, tmp3;
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
          return 1;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
      q = (insn & (1 << 6)) != 0;
      u = (insn >> 24) & 1;
      VFP_DREG_D(rd, insn);
 -    VFP_DREG_N(rn, insn);
      VFP_DREG_M(rm, insn);
      size = (insn >> 20) & 3;
      vec_size = q ? 16 : 8;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                      break;
                  }
              } else if ((insn & (1 << 10)) == 0) {
 -                /* VTBL, VTBX.  */
 -                int n = ((insn >> 8) & 3) + 1;
 -                if ((rn + n) > 32) {
 -                    /* This is UNPREDICTABLE; we choose to UNDEF to avoid the
 -                     * helper function running off the end of the register file.
 -                     */
 -                    return 1;
 -                }
 -                n <<= 3;
 -                if (insn & (1 << 6)) {
 -                    tmp = neon_load_reg(rd, 0);
 -                } else {
 -                    tmp = tcg_temp_new_i32();
 -                    tcg_gen_movi_i32(tmp, 0);
 -                }
 -                tmp2 = neon_load_reg(rm, 0);
 -                ptr1 = vfp_reg_ptr(true, rn);
 -                tmp5 = tcg_const_i32(n);
 -                gen_helper_neon_tbl(tmp2, tmp2, tmp, ptr1, tmp5);
 -                tcg_temp_free_i32(tmp);
 -                if (insn & (1 << 6)) {
 -                    tmp = neon_load_reg(rd, 1);
 -                } else {
 -                    tmp = tcg_temp_new_i32();
 -                    tcg_gen_movi_i32(tmp, 0);
 -                }
 -                tmp3 = neon_load_reg(rm, 1);
 -                gen_helper_neon_tbl(tmp3, tmp3, tmp, ptr1, tmp5);
 -                tcg_temp_free_i32(tmp5);
 -                tcg_temp_free_ptr(ptr1);
 -                neon_store_reg(rd, 0, tmp2);
 -                neon_store_reg(rd, 1, tmp3);
 -                tcg_temp_free_i32(tmp);
 +                /* VTBL, VTBX: handled by decodetree */
 +                return 1;
              } else if ((insn & 0x380) == 0) {
                  /* VDUP */
                  int element;
 --
-.20.1
+.34.1

-New patch
+[PULL 10/35] tests/qtest/meson.build: Don't include qtests_npcm7xx in qtests_aarch64
+We deliberately don't include qtests_npcm7xx in qtests_aarch64,
+because we already get the coverage of those tests via qtests_arm,
+and we don't want to use extra CI minutes testing them twice.
+In commit 327b680877b79c4b we added it to qtests_aarch64; revert
+that change.
+Fixes: 327b680877b79c4b ("tests/qtest: Creating qtest for GMAC Module")
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
+Message-id: 20240206163043.315535-1-peter.maydell@linaro.org
+---
+ tests/qtest/meson.build | 1 -
+file changed, 1 deletion(-)
+diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
+index XXXXXXX..XXXXXXX 100644
+--- a/tests/qtest/meson.build
++++ b/tests/qtest/meson.build
+@@ -XXX,XX +XXX,XX @@ qtests_aarch64 = \
+   (config_all_devices.has_key('CONFIG_RASPI') ? ['bcm2835-dma-test'] : []) +  \
+   (config_all_accel.has_key('CONFIG_TCG') and                                            \
+    config_all_devices.has_key('CONFIG_TPM_TIS_I2C') ? ['tpm-tis-i2c-test'] : []) + \
+-  (config_all_devices.has_key('CONFIG_NPCM7XX') ? qtests_npcm7xx : []) + \
+   ['arm-cpu-features',
+    'numa-test',
+    'boot-serial-test',
+--
+.34.1

-New patch
+[PULL 11/35] tests/qtest/bios-tables-test: Allow changes to virt GTDT
+Allow changes to the virt GTDT -- we are going to add the IRQ
+entry for a new timer to it.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
+Message-id: 20240122143537.233498-2-peter.maydell@linaro.org
+---
+ tests/qtest/bios-tables-test-allowed-diff.h | 2 ++
+file changed, 2 insertions(+)
+diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
+index XXXXXXX..XXXXXXX 100644
+--- a/tests/qtest/bios-tables-test-allowed-diff.h
++++ b/tests/qtest/bios-tables-test-allowed-diff.h
+@@ -1 +1,3 @@
+ /* List of comma-separated changed AML files to ignore */
++"tests/data/acpi/virt/FACP",
++"tests/data/acpi/virt/GTDT",
+--
+.34.1

-New patch
+[PULL 12/35] hw/arm/virt: Wire up non-secure EL2 virtual timer IRQ
+Armv8.1+ CPUs have the Virtual Host Extension (VHE) which adds a
 non-secure EL2 virtual timer.  We implemented the timer itself in the
 CPU model, but never wired up its IRQ line to the GIC.
 Wire up the IRQ line (this is always safe whether the CPU has the
 interrupt or not, since it always creates the outbound IRQ line).
 Report it to the guest via dtb and ACPI if the CPU has the feature.
 The DTB binding is documented in the kernel's
 Documentation/devicetree/bindings/timer/arm\,arch_timer.yaml
 and the ACPI table entries are documented in the ACPI specification
 version 6.3 or later.
 Because the IRQ line ACPI binding is new in 6.3, we need to bump the
 FADT table rev to show that we might be using 6.3 features.
 Note that exposing this IRQ in the DTB will trigger a bug in EDK2
 versions prior to edk2-stable202311, for users who use the virt board
 with 'virtualization=on' to enable EL2 emulation and are booting an
 EDK2 guest BIOS, if that EDK2 has assertions enabled.  The effect is
 that EDK2 will assert on bootup:
  ASSERT [ArmTimerDxe] /home/kraxel/projects/qemu/roms/edk2/ArmVirtPkg/Library/ArmVirtTimerFdtClientLib/ArmVirtTimerFdtClientLib.c(72): PropSize == 36 || PropSize == 48
 If you see that assertion you should do one of:
  * update your EDK2 binaries to edk2-stable202311 or newer
  * use the 'virt-8.2' versioned machine type
  * not use 'virtualization=on'
 (The versions shipped with QEMU itself have the fix.)
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
 Message-id: 20240122143537.233498-3-peter.maydell@linaro.org
 ---
  include/hw/arm/virt.h    |  2 ++
  hw/arm/virt-acpi-build.c | 20 ++++++++++----
  hw/arm/virt.c            | 60 ++++++++++++++++++++++++++++++++++------
 files changed, 67 insertions(+), 15 deletions(-)
 diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
 index XXXXXXX..XXXXXXX 100644
 --- a/include/hw/arm/virt.h
 +++ b/include/hw/arm/virt.h
@@ -XXX,XX +XXX,XX @@ struct VirtMachineClass {
      /* Machines < 6.2 have no support for describing cpu topology to guest */
      bool no_cpu_topology;
      bool no_tcg_lpa2;
 +    bool no_ns_el2_virt_timer_irq;
  };
  struct VirtMachineState {
@@ -XXX,XX +XXX,XX @@ struct VirtMachineState {
      PCIBus *bus;
      char *oem_id;
      char *oem_table_id;
 +    bool ns_el2_virt_timer_irq;
  };
  #define VIRT_ECAM_ID(high) (high ? VIRT_HIGH_PCIE_ECAM : VIRT_PCIE_ECAM)
 diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/virt-acpi-build.c
 +++ b/hw/arm/virt-acpi-build.c
@@ -XXX,XX +XXX,XX @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
  }
  /*
 - * ACPI spec, Revision 5.1
 - * 5.2.24 Generic Timer Description Table (GTDT)
 + * ACPI spec, Revision 6.5
 + * 5.2.25 Generic Timer Description Table (GTDT)
   */
  static void
  build_gtdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
@@ -XXX,XX +XXX,XX @@ build_gtdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
      uint32_t irqflags = vmc->claim_edge_triggered_timers ?
          1 : /* Interrupt is Edge triggered */
          0;  /* Interrupt is Level triggered  */
 -    AcpiTable table = { .sig = "GTDT", .rev = 2, .oem_id = vms->oem_id,
 +    AcpiTable table = { .sig = "GTDT", .rev = 3, .oem_id = vms->oem_id,
                          .oem_table_id = vms->oem_table_id };
      acpi_table_begin(&table, table_data);
@@ -XXX,XX +XXX,XX @@ build_gtdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
      build_append_int_noprefix(table_data, 0, 4);
      /* Platform Timer Offset */
      build_append_int_noprefix(table_data, 0, 4);
 -
 +    if (vms->ns_el2_virt_timer_irq) {
 +        /* Virtual EL2 Timer GSIV */
 +        build_append_int_noprefix(table_data, ARCH_TIMER_NS_EL2_VIRT_IRQ, 4);
 +        /* Virtual EL2 Timer Flags */
 +        build_append_int_noprefix(table_data, irqflags, 4);
 +    } else {
 +        build_append_int_noprefix(table_data, 0, 4);
 +        build_append_int_noprefix(table_data, 0, 4);
 +    }
      acpi_table_end(linker, &table);
  }
@@ -XXX,XX +XXX,XX @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
  static void build_fadt_rev6(GArray *table_data, BIOSLinker *linker,
                              VirtMachineState *vms, unsigned dsdt_tbl_offset)
  {
 -    /* ACPI v6.0 */
 +    /* ACPI v6.3 */
      AcpiFadtData fadt = {
          .rev = 6,
 -        .minor_ver = 0,
 +        .minor_ver = 3,
          .flags = 1 << ACPI_FADT_F_HW_REDUCED_ACPI,
          .xdsdt_tbl_offset = &dsdt_tbl_offset,
      };
 diff --git a/hw/arm/virt.c b/hw/arm/virt.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/virt.c
 +++ b/hw/arm/virt.c
@@ -XXX,XX +XXX,XX @@ static void create_randomness(MachineState *ms, const char *node)
      qemu_fdt_setprop(ms->fdt, node, "rng-seed", seed.rng, sizeof(seed.rng));
  }
 +/*
 + * The CPU object always exposes the NS EL2 virt timer IRQ line,
 + * but we don't want to advertise it to the guest in the dtb or ACPI
 + * table unless it's really going to do something.
 + */
 +static bool ns_el2_virt_timer_present(void)
 +{
 +    ARMCPU *cpu = ARM_CPU(qemu_get_cpu(0));
 +    CPUARMState *env = &cpu->env;
 +
 +    return arm_feature(env, ARM_FEATURE_AARCH64) &&
 +        arm_feature(env, ARM_FEATURE_EL2) && cpu_isar_feature(aa64_vh, cpu);
 +}
 +
  static void create_fdt(VirtMachineState *vms)
  {
      MachineState *ms = MACHINE(vms);
@@ -XXX,XX +XXX,XX @@ static void fdt_add_timer_nodes(const VirtMachineState *vms)
                                  "arm,armv7-timer");
      }
      qemu_fdt_setprop(ms->fdt, "/timer", "always-on", NULL, 0);
 -    qemu_fdt_setprop_cells(ms->fdt, "/timer", "interrupts",
 -                           GIC_FDT_IRQ_TYPE_PPI,
 -                           INTID_TO_PPI(ARCH_TIMER_S_EL1_IRQ), irqflags,
 -                           GIC_FDT_IRQ_TYPE_PPI,
 -                           INTID_TO_PPI(ARCH_TIMER_NS_EL1_IRQ), irqflags,
 -                           GIC_FDT_IRQ_TYPE_PPI,
 -                           INTID_TO_PPI(ARCH_TIMER_VIRT_IRQ), irqflags,
 -                           GIC_FDT_IRQ_TYPE_PPI,
 -                           INTID_TO_PPI(ARCH_TIMER_NS_EL2_IRQ), irqflags);
 +    if (vms->ns_el2_virt_timer_irq) {
 +        qemu_fdt_setprop_cells(ms->fdt, "/timer", "interrupts",
 +                               GIC_FDT_IRQ_TYPE_PPI,
 +                               INTID_TO_PPI(ARCH_TIMER_S_EL1_IRQ), irqflags,
 +                               GIC_FDT_IRQ_TYPE_PPI,
 +                               INTID_TO_PPI(ARCH_TIMER_NS_EL1_IRQ), irqflags,
 +                               GIC_FDT_IRQ_TYPE_PPI,
 +                               INTID_TO_PPI(ARCH_TIMER_VIRT_IRQ), irqflags,
 +                               GIC_FDT_IRQ_TYPE_PPI,
 +                               INTID_TO_PPI(ARCH_TIMER_NS_EL2_IRQ), irqflags,
 +                               GIC_FDT_IRQ_TYPE_PPI,
 +                               INTID_TO_PPI(ARCH_TIMER_NS_EL2_VIRT_IRQ), irqflags);
 +    } else {
 +        qemu_fdt_setprop_cells(ms->fdt, "/timer", "interrupts",
 +                               GIC_FDT_IRQ_TYPE_PPI,
 +                               INTID_TO_PPI(ARCH_TIMER_S_EL1_IRQ), irqflags,
 +                               GIC_FDT_IRQ_TYPE_PPI,
 +                               INTID_TO_PPI(ARCH_TIMER_NS_EL1_IRQ), irqflags,
 +                               GIC_FDT_IRQ_TYPE_PPI,
 +                               INTID_TO_PPI(ARCH_TIMER_VIRT_IRQ), irqflags,
 +                               GIC_FDT_IRQ_TYPE_PPI,
 +                               INTID_TO_PPI(ARCH_TIMER_NS_EL2_IRQ), irqflags);
 +    }
  }
  static void fdt_add_cpu_nodes(const VirtMachineState *vms)
@@ -XXX,XX +XXX,XX @@ static void create_gic(VirtMachineState *vms, MemoryRegion *mem)
              [GTIMER_VIRT] = ARCH_TIMER_VIRT_IRQ,
              [GTIMER_HYP]  = ARCH_TIMER_NS_EL2_IRQ,
              [GTIMER_SEC]  = ARCH_TIMER_S_EL1_IRQ,
 +            [GTIMER_HYPVIRT] = ARCH_TIMER_NS_EL2_VIRT_IRQ,
          };
          for (unsigned irq = 0; irq < ARRAY_SIZE(timer_irq); irq++) {
@@ -XXX,XX +XXX,XX @@ static void machvirt_init(MachineState *machine)
          qdev_realize(DEVICE(cpuobj), NULL, &error_fatal);
          object_unref(cpuobj);
      }
 +
 +    /* Now we've created the CPUs we can see if they have the hypvirt timer */
 +    vms->ns_el2_virt_timer_irq = ns_el2_virt_timer_present() &&
 +        !vmc->no_ns_el2_virt_timer_irq;
 +
      fdt_add_timer_nodes(vms);
      fdt_add_cpu_nodes(vms);
@@ -XXX,XX +XXX,XX @@ DEFINE_VIRT_MACHINE_AS_LATEST(9, 0)
  static void virt_machine_8_2_options(MachineClass *mc)
  {
 +    VirtMachineClass *vmc = VIRT_MACHINE_CLASS(OBJECT_CLASS(mc));
 +
      virt_machine_9_0_options(mc);
      compat_props_add(mc->compat_props, hw_compat_8_2, hw_compat_8_2_len);
 +    /*
 +     * Don't expose NS_EL2_VIRT timer IRQ in DTB or ACPI on 8.2 and
 +     * earlier machines. (Exposing it tickles a bug in older EDK2
 +     * guest BIOS binaries.)
 +     */
 +    vmc->no_ns_el2_virt_timer_irq = true;
  }
  DEFINE_VIRT_MACHINE(8, 2)
 --
 .34.1
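
Note on the EDK2 assertion quoted in patch 12: fdt_add_timer_nodes() writes three values per timer interrupt (type, number, flags), so the size of the "interrupts" property grows with each added timer. The arithmetic below is a sketch, assuming each value is stored as one 32-bit device tree cell. The demo_* names are hypothetical, and the accepted sizes are taken from the assertion text in the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    DEMO_CELLS_PER_IRQ = 3,              /* type, number, flags */
    DEMO_CELL_BYTES = sizeof(uint32_t)   /* assumed: one 32-bit cell each */
};

/* Size in bytes of an "interrupts" property describing n_irqs timers. */
static size_t demo_interrupts_prop_size(unsigned n_irqs)
{
    return (size_t)n_irqs * DEMO_CELLS_PER_IRQ * DEMO_CELL_BYTES;
}

/* The sizes the quoted assertion accepts. */
static int demo_old_edk2_accepts(size_t prop_size)
{
    return prop_size == 36 || prop_size == 48;
}
```

Under these assumptions four timers give 48 bytes, which the assertion accepts, while five timers give 60 bytes, which it rejects. That is consistent with the failure appearing only once the NS EL2 virtual timer IRQ is listed.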

-New patch
+[PULL 13/35] tests/qtest/bios-tables-tests: Update virt golden reference
+Update the virt golden reference files to say that the FACP is ACPI
 v6.3, and the GTDT table is a revision 3 table with space for the
 virtual EL2 timer.
 Diffs from iasl:
@@ -XXX,XX +XXX,XX @@
  /*
   * Intel ACPI Component Architecture
   * AML/ASL+ Disassembler version 20200925 (64-bit version)
   * Copyright (c) 2000 - 2020 Intel Corporation
   *
 - * Disassembly of tests/data/acpi/virt/FACP, Mon Jan 22 13:48:40 2024
 + * Disassembly of /tmp/aml-W8RZH2, Mon Jan 22 13:48:40 2024
   *
   * ACPI Data Table [FACP]
   *
   * Format: [HexOffset DecimalOffset ByteLength]  FieldName : FieldValue
   */
  [000h 0000   4]                    Signature : "FACP"    [Fixed ACPI Description Table (FADT)]
  [004h 0004   4]                 Table Length : 00000114
  [008h 0008   1]                     Revision : 06
 -[009h 0009   1]                     Checksum : 15
 +[009h 0009   1]                     Checksum : 12
  [00Ah 0010   6]                       Oem ID : "BOCHS "
  [010h 0016   8]                 Oem Table ID : "BXPC    "
  [018h 0024   4]                 Oem Revision : 00000001
  [01Ch 0028   4]              Asl Compiler ID : "BXPC"
  [020h 0032   4]        Asl Compiler Revision : 00000001
  [024h 0036   4]                 FACS Address : 00000000
  [028h 0040   4]                 DSDT Address : 00000000
  [02Ch 0044   1]                        Model : 00
  [02Dh 0045   1]                   PM Profile : 00 [Unspecified]
  [02Eh 0046   2]                SCI Interrupt : 0000
  [030h 0048   4]             SMI Command Port : 00000000
  [034h 0052   1]            ACPI Enable Value : 00
  [035h 0053   1]           ACPI Disable Value : 00
  [036h 0054   1]               S4BIOS Command : 00
  [037h 0055   1]              P-State Control : 00
@@ -XXX,XX +XXX,XX @@
       Use APIC Physical Destination Mode (V4) : 0
                         Hardware Reduced (V5) : 1
                        Low Power S0 Idle (V5) : 0
  [074h 0116  12]               Reset Register : [Generic Address Structure]
  [074h 0116   1]                     Space ID : 00 [SystemMemory]
  [075h 0117   1]                    Bit Width : 00
  [076h 0118   1]                   Bit Offset : 00
  [077h 0119   1]         Encoded Access Width : 00 [Undefined/Legacy]
  [078h 0120   8]                      Address : 0000000000000000
  [080h 0128   1]         Value to cause reset : 00
  [081h 0129   2]    ARM Flags (decoded below) : 0003
                                PSCI Compliant : 1
                         Must use HVC for PSCI : 1
 -[083h 0131   1]          FADT Minor Revision : 00
 +[083h 0131   1]          FADT Minor Revision : 03
  [084h 0132   8]                 FACS Address : 0000000000000000
  [08Ch 0140   8]                 DSDT Address : 0000000000000000
  [094h 0148  12]             PM1A Event Block : [Generic Address Structure]
  [094h 0148   1]                     Space ID : 00 [SystemMemory]
  [095h 0149   1]                    Bit Width : 00
  [096h 0150   1]                   Bit Offset : 00
  [097h 0151   1]         Encoded Access Width : 00 [Undefined/Legacy]
  [098h 0152   8]                      Address : 0000000000000000
  [0A0h 0160  12]             PM1B Event Block : [Generic Address Structure]
  [0A0h 0160   1]                     Space ID : 00 [SystemMemory]
  [0A1h 0161   1]                    Bit Width : 00
  [0A2h 0162   1]                   Bit Offset : 00
  [0A3h 0163   1]         Encoded Access Width : 00 [Undefined/Legacy]
  [0A4h 0164   8]                      Address : 0000000000000000
@@ -XXX,XX +XXX,XX @@
  [0F5h 0245   1]                    Bit Width : 00
  [0F6h 0246   1]                   Bit Offset : 00
  [0F7h 0247   1]         Encoded Access Width : 00 [Undefined/Legacy]
  [0F8h 0248   8]                      Address : 0000000000000000
  [100h 0256  12]        Sleep Status Register : [Generic Address Structure]
  [100h 0256   1]                     Space ID : 00 [SystemMemory]
  [101h 0257   1]                    Bit Width : 00
  [102h 0258   1]                   Bit Offset : 00
  [103h 0259   1]         Encoded Access Width : 00 [Undefined/Legacy]
  [104h 0260   8]                      Address : 0000000000000000
  [10Ch 0268   8]                Hypervisor ID : 00000000554D4551
  Raw Table Data: Length 276 (0x114)
 -    0000: 46 41 43 50 14 01 00 00 06 15 42 4F 43 48 53 20  // FACP......BOCHS
 +    0000: 46 41 43 50 14 01 00 00 06 12 42 4F 43 48 53 20  // FACP......BOCHS
      0010: 42 58 50 43 20 20 20 20 01 00 00 00 42 58 50 43  // BXPC    ....BXPC
      0020: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // ................
      0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // ................
      0040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // ................
      0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // ................
      0060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // ................
      0070: 00 00 10 00 00 00 00 00 00 00 00 00 00 00 00 00  // ................
 -    0080: 00 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // ................
 +    0080: 00 03 00 03 00 00 00 00 00 00 00 00 00 00 00 00  // ................
      0090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // ................
      00A0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // ................
      00B0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // ................
      00C0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // ................
      00D0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // ................
      00E0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // ................
      00F0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // ................
      0100: 00 00 00 00 00 00 00 00 00 00 00 00 51 45 4D 55  // ............QEMU
      0110: 00 00 00 00                                      // ....
@@ -XXX,XX +XXX,XX @@
  /*
   * Intel ACPI Component Architecture
   * AML/ASL+ Disassembler version 20200925 (64-bit version)
   * Copyright (c) 2000 - 2020 Intel Corporation
   *
 - * Disassembly of tests/data/acpi/virt/GTDT, Mon Jan 22 13:48:40 2024
 + * Disassembly of /tmp/aml-XDSZH2, Mon Jan 22 13:48:40 2024
   *
   * ACPI Data Table [GTDT]
   *
   * Format: [HexOffset DecimalOffset ByteLength]  FieldName : FieldValue
   */
  [000h 0000   4]                    Signature : "GTDT"    [Generic Timer Description Table]
 -[004h 0004   4]                 Table Length : 00000060
 -[008h 0008   1]                     Revision : 02
 -[009h 0009   1]                     Checksum : 9C
 +[004h 0004   4]                 Table Length : 00000068
 +[008h 0008   1]                     Revision : 03
 +[009h 0009   1]                     Checksum : 93
  [00Ah 0010   6]                       Oem ID : "BOCHS "
  [010h 0016   8]                 Oem Table ID : "BXPC    "
  [018h 0024   4]                 Oem Revision : 00000001
  [01Ch 0028   4]              Asl Compiler ID : "BXPC"
  [020h 0032   4]        Asl Compiler Revision : 00000001
  [024h 0036   8]        Counter Block Address : FFFFFFFFFFFFFFFF
  [02Ch 0044   4]                     Reserved : 00000000
  [030h 0048   4]         Secure EL1 Interrupt : 0000001D
  [034h 0052   4]    EL1 Flags (decoded below) : 00000000
                                  Trigger Mode : 0
                                      Polarity : 0
                                     Always On : 0
  [038h 0056   4]     Non-Secure EL1 Interrupt : 0000001E
@@ -XXX,XX +XXX,XX @@
  [040h 0064   4]      Virtual Timer Interrupt : 0000001B
  [044h 0068   4]     VT Flags (decoded below) : 00000000
                                  Trigger Mode : 0
                                      Polarity : 0
                                     Always On : 0
  [048h 0072   4]     Non-Secure EL2 Interrupt : 0000001A
  [04Ch 0076   4]   NEL2 Flags (decoded below) : 00000000
                                  Trigger Mode : 0
                                      Polarity : 0
                                     Always On : 0
  [050h 0080   8]   Counter Read Block Address : FFFFFFFFFFFFFFFF
  [058h 0088   4]         Platform Timer Count : 00000000
  [05Ch 0092   4]        Platform Timer Offset : 00000000
 +[060h 0096   4]       Virtual EL2 Timer GSIV : 00000000
 +[064h 0100   4]      Virtual EL2 Timer Flags : 00000000
 -Raw Table Data: Length 96 (0x60)
 +Raw Table Data: Length 104 (0x68)
 -    0000: 47 54 44 54 60 00 00 00 02 9C 42 4F 43 48 53 20  // GTDT`.....BOCHS
 +    0000: 47 54 44 54 68 00 00 00 03 93 42 4F 43 48 53 20  // GTDTh.....BOCHS
      0010: 42 58 50 43 20 20 20 20 01 00 00 00 42 58 50 43  // BXPC    ....BXPC
      0020: 01 00 00 00 FF FF FF FF FF FF FF FF 00 00 00 00  // ................
      0030: 1D 00 00 00 00 00 00 00 1E 00 00 00 04 00 00 00  // ................
      0040: 1B 00 00 00 00 00 00 00 1A 00 00 00 00 00 00 00  // ................
      0050: FF FF FF FF FF FF FF FF 00 00 00 00 00 00 00 00  // ................
 +    0060: 00 00 00 00 00 00 00 00                          // ........
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
 Message-id: 20240122143537.233498-4-peter.maydell@linaro.org
 ---
  tests/qtest/bios-tables-test-allowed-diff.h |   2 --
  tests/data/acpi/virt/FACP                   | Bin 276 -> 276 bytes
  tests/data/acpi/virt/GTDT                   | Bin 96 -> 104 bytes
  3 files changed, 2 deletions(-)
 diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
 index XXXXXXX..XXXXXXX 100644
 --- a/tests/qtest/bios-tables-test-allowed-diff.h
 +++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1,3 +1 @@
  /* List of comma-separated changed AML files to ignore */
 -"tests/data/acpi/virt/FACP",
 -"tests/data/acpi/virt/GTDT",
 diff --git a/tests/data/acpi/virt/FACP b/tests/data/acpi/virt/FACP
 index XXXXXXX..XXXXXXX 100644
 GIT binary patch
 delta 25
 gcmbQjG=+)F&CxkPgpq-PO=u!l<;2F$$vli407<0<)c^nh
 delta 28
 kcmbQjG=+)F&CxkPgpq-PO>`nx<-|!<6Akz$^DuG%0AAS!ssI20
 diff --git a/tests/data/acpi/virt/GTDT b/tests/data/acpi/virt/GTDT
 index XXXXXXX..XXXXXXX 100644
 GIT binary patch
 delta 25
 bcmYeu;BpUf3CUn!U|^m+kt>V?$N&QXMtB4L
 delta 16
 Xcmc~u;BpUf2}xjJU|^avkt+-UB60)u
 --
 .34.1
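
A quick way to sanity-check the regenerated GTDT blob is to confirm that
its header is self-consistent: the Length field should match the file
size, and all bytes of the table should sum to zero modulo 256, which is
the standard ACPI checksum rule. The sketch below does that for the new
104-byte table, with the bytes copied from the raw dump in this patch.
The helper names acpi_table_sum() and acpi_table_length() are made up for
this illustration and are not QEMU functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Raw bytes of the new revision 3 GTDT, copied from the dump above. */
static const uint8_t gtdt_rev3[] = {
    0x47, 0x54, 0x44, 0x54, 0x68, 0x00, 0x00, 0x00,
    0x03, 0x93, 0x42, 0x4F, 0x43, 0x48, 0x53, 0x20,
    0x42, 0x58, 0x50, 0x43, 0x20, 0x20, 0x20, 0x20,
    0x01, 0x00, 0x00, 0x00, 0x42, 0x58, 0x50, 0x43,
    0x01, 0x00, 0x00, 0x00, 0xFF, 0xFF, 0xFF, 0xFF,
    0xFF, 0xFF, 0xFF, 0xFF, 0x00, 0x00, 0x00, 0x00,
    0x1D, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x1E, 0x00, 0x00, 0x00, 0x04, 0x00, 0x00, 0x00,
    0x1B, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x1A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
};

/*
 * Hypothetical helper: 8-bit wrapping sum of every byte in the table,
 * including the Checksum byte at offset 9. A valid table sums to 0.
 */
static uint8_t acpi_table_sum(const uint8_t *table, size_t len)
{
    uint8_t sum = 0;

    for (size_t i = 0; i < len; i++) {
        sum += table[i];
    }
    return sum;
}

/* Hypothetical helper: little-endian 32-bit Length field at offset 4. */
static uint32_t acpi_table_length(const uint8_t *table)
{
    return (uint32_t)table[4] |
           ((uint32_t)table[5] << 8) |
           ((uint32_t)table[6] << 16) |
           ((uint32_t)table[7] << 24);
}
```

The same check applies to the old 96-byte table: its Length, Revision and
Checksum bytes differ from the new ones by -8, -1 and +9 respectively, so
the eight added zero bytes leave the sum unchanged.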

-New patch
+[PULL 14/35] hw/arm/npcm7xx: Call qemu_configure_nic_device() for GMAC modules
+The patchset adding the GMAC ethernet to this SoC crossed in the
+mail with the patchset cleaning up the NIC handling. When we
+create the GMAC modules we must call qemu_configure_nic_device()
+so that the user has the opportunity to use the -nic commandline
+option to create a network backend and connect it to the GMACs.
+Add the missing call.
+Fixes: 21e5326a7c ("hw/arm: Add GMAC devices to NPCM7XX SoC")
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
+Message-id: 20240206171231.396392-2-peter.maydell@linaro.org
+---
+ hw/arm/npcm7xx.c | 1 +
 + 1 file changed, 1 insertion(+)
+diff --git a/hw/arm/npcm7xx.c b/hw/arm/npcm7xx.c
+index XXXXXXX..XXXXXXX 100644
+--- a/hw/arm/npcm7xx.c
++++ b/hw/arm/npcm7xx.c
+@@ -XXX,XX +XXX,XX @@ static void npcm7xx_realize(DeviceState *dev, Error **errp)
+     for (i = 0; i < ARRAY_SIZE(s->gmac); i++) {
+         SysBusDevice *sbd = SYS_BUS_DEVICE(&s->gmac[i]);
++        qemu_configure_nic_device(DEVICE(sbd), false, NULL);
+         /*
+          * The device exists regardless of whether it's connected to a QEMU
+          * netdev backend. So always instantiate it even if there is no
+--
+.34.1

-New patch
+[PULL 15/35] tests/qtest/npcm7xx_emc-test: Connect all NICs to a backend
+Currently QEMU will warn if there is a NIC on the board that
+is not connected to a backend. By default the '-nic user' will
+get used for all NICs, but if you manually connect a specific
+NIC to a specific backend, then the other NICs on the board
+have no backend and will be warned about:
+qemu-system-arm: warning: nic npcm7xx-emc.1 has no peer
+qemu-system-arm: warning: nic npcm-gmac.0 has no peer
+qemu-system-arm: warning: nic npcm-gmac.1 has no peer
+So suppress those warnings by manually connecting every NIC
+on the board to some backend.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
+Reviewed-by: Thomas Huth <thuth@redhat.com>
+Message-id: 20240206171231.396392-3-peter.maydell@linaro.org
+---
+ tests/qtest/npcm7xx_emc-test.c | 5 ++++-
 + 1 file changed, 4 insertions(+), 1 deletion(-)
+diff --git a/tests/qtest/npcm7xx_emc-test.c b/tests/qtest/npcm7xx_emc-test.c
+index XXXXXXX..XXXXXXX 100644
+--- a/tests/qtest/npcm7xx_emc-test.c
++++ b/tests/qtest/npcm7xx_emc-test.c
+@@ -XXX,XX +XXX,XX @@ static int *packet_test_init(int module_num, GString *cmd_line)
+      * KISS and use -nic. The driver accepts 'emc0' and 'emc1' as aliases
+      * in the 'model' field to specify the device to match.
+      */
+-    g_string_append_printf(cmd_line, " -nic socket,fd=%d,model=emc%d ",
++    g_string_append_printf(cmd_line, " -nic socket,fd=%d,model=emc%d "
++                           "-nic user,model=npcm7xx-emc "
++                           "-nic user,model=npcm-gmac "
++                           "-nic user,model=npcm-gmac",
+                            test_sockets[1], module_num);
+     g_test_queue_destroy(packet_test_clear, test_sockets);
+--
+.34.1
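
To make the effect of the change above concrete, here is a small
standalone sketch of the command-line fragment the test now builds. It
uses plain snprintf() rather than GLib's g_string_append_printf() so that
it compiles on its own; build_nic_args() is a name invented for this
illustration. The first -nic selects one EMC through the 'emcN' alias and
attaches it to the socket, and the remaining three -nic options give the
other EMC and the two GMACs a user-mode backend so that none of them is
left without a peer.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for the g_string_append_printf() call in
 * packet_test_init(): format the -nic options for socket fd 'fd' and
 * EMC module 'module_num' into 'buf'. Returns what snprintf() returns.
 */
static int build_nic_args(char *buf, size_t len, int fd, int module_num)
{
    return snprintf(buf, len,
                    " -nic socket,fd=%d,model=emc%d "
                    "-nic user,model=npcm7xx-emc "
                    "-nic user,model=npcm-gmac "
                    "-nic user,model=npcm-gmac",
                    fd, module_num);
}
```

Note that adjacent string literals are concatenated by the compiler, so
each literal except the last needs its trailing space, exactly as in the
patch.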

-[PULL 15/23] target/arm: Convert Neon VEXT to decodetree
+[PULL 16/35] target/arm: Don't get MDCR_EL2 in pmu_counter_enabled() before checking ARM_FEATURE_PMU
-Convert the Neon VEXT insn to decodetree. Rather than keeping the
+It doesn't make sense to read the value of MDCR_EL2 on a non-A-profile
-old implementation which used fixed temporaries cpu_V0 and cpu_V1
+CPU, and in fact if you try to do it we will assert:
 and did the extraction with by-hand shift and logic ops, we use
 the TCG extract2 insn.
-We don't need to special case 0 or 8 immediates any more as the
+#6  0x00007ffff4b95e96 in __GI___assert_fail
-optimizer is smart enough to throw away the dead code.
+    (assertion=0x5555565a8c70 "!arm_feature(env, ARM_FEATURE_M)", file=0x5555565a6e5c "../../target/arm/helper.c", line=12600, function=0x5555565a9560 <__PRETTY_FUNCTION__.0> "arm_security_space_below_el3") at ./assert/assert.c:101
 #7  0x0000555555ebf412 in arm_security_space_below_el3 (env=0x555557bc8190) at ../../target/arm/helper.c:12600
 #8  0x0000555555ea6f89 in arm_is_el2_enabled (env=0x555557bc8190) at ../../target/arm/cpu.h:2595
 #9  0x0000555555ea942f in arm_mdcr_el2_eff (env=0x555557bc8190) at ../../target/arm/internals.h:1512
+We might call pmu_counter_enabled() on an M-profile CPU (for example
+from the migration pre/post hooks in machine.c); this should always
+return false because these CPUs don't set ARM_FEATURE_PMU.
+Avoid the assertion by not calling arm_mdcr_el2_eff() before we
+have done the early return for "PMU not present".
+This fixes an assertion failure if you try to do a loadvm or
+savevm for an M-profile board.
+Cc: qemu-stable@nongnu.org
+Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2155
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20240208153346.970021-1-peter.maydell@linaro.org
 ---
- target/arm/neon-dp.decode       |  8 +++-
+ target/arm/helper.c | 12 ++++++++++--
- target/arm/translate-neon.inc.c | 76 +++++++++++++++++++++++++++++++++
 + 1 file changed, 10 insertions(+), 2 deletions(-)
  target/arm/translate.c          | 58 +------------------------
 files changed, 85 insertions(+), 57 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
+--- a/target/arm/helper.c
-+++ b/target/arm/neon-dp.decode
++++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+@@ -XXX,XX +XXX,XX @@ static bool pmu_counter_enabled(CPUARMState *env, uint8_t counter)
- # return false for size==3.
+     bool enabled, prohibited = false, filtered;
- ######################################################################
+     bool secure = arm_is_secure(env);
- {
+     int el = arm_current_el(env);
--  # 0b11 subgroup will go here
+-    uint64_t mdcr_el2 = arm_mdcr_el2_eff(env);
-+  [
+-    uint8_t hpmn = mdcr_el2 & MDCR_HPMN;
-+    ##################################################################
++    uint64_t mdcr_el2;
-+    # Miscellaneous size=0b11 insns
++    uint8_t hpmn;
-+    ##################################################################
-+    VEXT         1111 001 0 1 . 11 .... .... imm:4 . q:1 . 0 .... \
++    /*
-+                 vm=%vm_dp vn=%vn_dp vd=%vd_dp
++     * We might be called for M-profile cores where MDCR_EL2 doesn't
-+  ]
++     * exist and arm_mdcr_el2_eff() will assert, so this early-exit check
++     * must be before we read that value.
-   # Subgroup for size != 0b11
++     */
-   [
+     if (!arm_feature(env, ARM_FEATURE_PMU)) {
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
+         return false;
-index XXXXXXX..XXXXXXX 100644
+     }
---- a/target/arm/translate-neon.inc.c
-+++ b/target/arm/translate-neon.inc.c
++    mdcr_el2 = arm_mdcr_el2_eff(env);
-@@ -XXX,XX +XXX,XX @@ static bool trans_VQDMLSL_2sc(DisasContext *s, arg_2scalar *a)
++    hpmn = mdcr_el2 & MDCR_HPMN;
      return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]);
  }
 +
-+static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
+     if (!arm_feature(env, ARM_FEATURE_EL2) ||
-+{
+             (counter < hpmn || counter == 31)) {
-+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+         e = env->cp15.c9_pmcr & PMCRE;
 +        return false;
 +    }
 +
 +    /* UNDEF accesses to D16-D31 if they don't exist. */
 +    if (!dc_isar_feature(aa32_simd_r32, s) &&
 +        ((a->vd | a->vn | a->vm) & 0x10)) {
 +        return false;
 +    }
 +
 +    if ((a->vn | a->vm | a->vd) & a->q) {
 +        return false;
 +    }
 +
 +    if (a->imm > 7 && !a->q) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    if (!a->q) {
 +        /* Extract 64 bits from <Vm:Vn> */
 +        TCGv_i64 left, right, dest;
 +
 +        left = tcg_temp_new_i64();
 +        right = tcg_temp_new_i64();
 +        dest = tcg_temp_new_i64();
 +
 +        neon_load_reg64(right, a->vn);
 +        neon_load_reg64(left, a->vm);
 +        tcg_gen_extract2_i64(dest, right, left, a->imm * 8);
 +        neon_store_reg64(dest, a->vd);
 +
 +        tcg_temp_free_i64(left);
 +        tcg_temp_free_i64(right);
 +        tcg_temp_free_i64(dest);
 +    } else {
 +        /* Extract 128 bits from <Vm+1:Vm:Vn+1:Vn> */
 +        TCGv_i64 left, middle, right, destleft, destright;
 +
 +        left = tcg_temp_new_i64();
 +        middle = tcg_temp_new_i64();
 +        right = tcg_temp_new_i64();
 +        destleft = tcg_temp_new_i64();
 +        destright = tcg_temp_new_i64();
 +
 +        if (a->imm < 8) {
 +            neon_load_reg64(right, a->vn);
 +            neon_load_reg64(middle, a->vn + 1);
 +            tcg_gen_extract2_i64(destright, right, middle, a->imm * 8);
 +            neon_load_reg64(left, a->vm);
 +            tcg_gen_extract2_i64(destleft, middle, left, a->imm * 8);
 +        } else {
 +            neon_load_reg64(right, a->vn + 1);
 +            neon_load_reg64(middle, a->vm);
 +            tcg_gen_extract2_i64(destright, right, middle, (a->imm - 8) * 8);
 +            neon_load_reg64(left, a->vm + 1);
 +            tcg_gen_extract2_i64(destleft, middle, left, (a->imm - 8) * 8);
 +        }
 +
 +        neon_store_reg64(destright, a->vd);
 +        neon_store_reg64(destleft, a->vd + 1);
 +
 +        tcg_temp_free_i64(destright);
 +        tcg_temp_free_i64(destleft);
 +        tcg_temp_free_i64(right);
 +        tcg_temp_free_i64(middle);
 +        tcg_temp_free_i64(left);
 +    }
 +    return true;
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
      int pass;
      int u;
      int vec_size;
 -    uint32_t imm;
      TCGv_i32 tmp, tmp2, tmp3, tmp5;
      TCGv_ptr ptr1;
 -    TCGv_i64 tmp64;
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
          return 1;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
              return 1;
          } else { /* size == 3 */
              if (!u) {
 -                /* Extract.  */
 -                imm = (insn >> 8) & 0xf;
 -
 -                if (imm > 7 && !q)
 -                    return 1;
 -
 -                if (q && ((rd | rn | rm) & 1)) {
 -                    return 1;
 -                }
 -
 -                if (imm == 0) {
 -                    neon_load_reg64(cpu_V0, rn);
 -                    if (q) {
 -                        neon_load_reg64(cpu_V1, rn + 1);
 -                    }
 -                } else if (imm == 8) {
 -                    neon_load_reg64(cpu_V0, rn + 1);
 -                    if (q) {
 -                        neon_load_reg64(cpu_V1, rm);
 -                    }
 -                } else if (q) {
 -                    tmp64 = tcg_temp_new_i64();
 -                    if (imm < 8) {
 -                        neon_load_reg64(cpu_V0, rn);
 -                        neon_load_reg64(tmp64, rn + 1);
 -                    } else {
 -                        neon_load_reg64(cpu_V0, rn + 1);
 -                        neon_load_reg64(tmp64, rm);
 -                    }
 -                    tcg_gen_shri_i64(cpu_V0, cpu_V0, (imm & 7) * 8);
 -                    tcg_gen_shli_i64(cpu_V1, tmp64, 64 - ((imm & 7) * 8));
 -                    tcg_gen_or_i64(cpu_V0, cpu_V0, cpu_V1);
 -                    if (imm < 8) {
 -                        neon_load_reg64(cpu_V1, rm);
 -                    } else {
 -                        neon_load_reg64(cpu_V1, rm + 1);
 -                        imm -= 8;
 -                    }
 -                    tcg_gen_shli_i64(cpu_V1, cpu_V1, 64 - (imm * 8));
 -                    tcg_gen_shri_i64(tmp64, tmp64, imm * 8);
 -                    tcg_gen_or_i64(cpu_V1, cpu_V1, tmp64);
 -                    tcg_temp_free_i64(tmp64);
 -                } else {
 -                    /* BUGFIX */
 -                    neon_load_reg64(cpu_V0, rn);
 -                    tcg_gen_shri_i64(cpu_V0, cpu_V0, imm * 8);
 -                    neon_load_reg64(cpu_V1, rm);
 -                    tcg_gen_shli_i64(cpu_V1, cpu_V1, 64 - (imm * 8));
 -                    tcg_gen_or_i64(cpu_V0, cpu_V0, cpu_V1);
 -                }
 -                neon_store_reg64(cpu_V0, rd);
 -                if (q) {
 -                    neon_store_reg64(cpu_V1, rd + 1);
 -                }
 +                /* Extract: handled by decodetree */
 +                return 1;
              } else if ((insn & (1 << 11)) == 0) {
                  /* Two register misc.  */
                  op = ((insn >> 12) & 0x30) | ((insn >> 7) & 0xf);
 --
-.20.1
+.34.1
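
For readers unfamiliar with the TCG extract2 operation used by the VEXT
conversion, the following is a plain C model of what the 64-bit (D
register) case computes. It is only a sketch of the semantics, not the
TCG implementation: extract2_u64() and vext_d() are names invented here,
and the model assumes an offset in the range 0..63 bits, which is what
the trans function guarantees for the D register case by rejecting
imm > 7.

```c
#include <stdint.h>

/*
 * Model of extract2: treat hi:lo as a 128-bit value and return the
 * 64 bits starting at bit 'ofs' (0..63). The ofs == 0 case is handled
 * separately because shifting a 64-bit value by 64 is undefined in C.
 */
static uint64_t extract2_u64(uint64_t lo, uint64_t hi, unsigned ofs)
{
    if (ofs == 0) {
        return lo;
    }
    return (lo >> ofs) | (hi << (64 - ofs));
}

/*
 * Model of VEXT on D registers: take 8 bytes from the pair Vm:Vn,
 * starting at byte 'imm' (0..7) of Vn.
 */
static uint64_t vext_d(uint64_t vn, uint64_t vm, unsigned imm)
{
    return extract2_u64(vn, vm, imm * 8);
}
```

With Vn holding bytes 0x00..0x07 and Vm holding bytes 0x08..0x0f, an
immediate of 3 selects bytes 0x03..0x0a. The Q register case in the patch
applies the same operation twice, once for each 64-bit half of the
result.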

-[PULL 21/23] hw/net/imx_fec: Convert debug fprintf() to trace events
+[PULL 17/35] tests/qtest: Fix GMAC test to run on a machine in upstream QEMU
-From: Jean-Christophe Dubois <jcd@tribudubois.net>
+From: Nabih Estefan <nabihestefan@google.com>
-Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
 +Fix the npcm_gmac-test.c file to run on a Nuvoton 7xx machine instead
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+of 8xx. Also fix comments referencing this and values expecting 8xx.
-Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-[PMD: Fixed 32-bit format string using PRIx32/PRIx64]
+Change-Id: Iabd0fba14910c3f1e883c4a9521350f3db9ffab8
-Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 +Signed-off-by: Nabih Estefan <nabihestefan@google.com>
 Reviewed-by: Tyrone Ting <kfting@nuvoton.com>
 Message-id: 20240208194759.2858582-2-nabihestefan@google.com
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 [PMM: commit message tweaks]
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/net/imx_fec.c    | 106 +++++++++++++++++++-------------------------
+ tests/qtest/npcm_gmac-test.c | 84 +-----------------------------------
- hw/net/trace-events |  18 ++++++++
+ tests/qtest/meson.build      |  3 +-
-files changed, 63 insertions(+), 61 deletions(-)
+files changed, 4 insertions(+), 83 deletions(-)
-diff --git a/hw/net/imx_fec.c b/hw/net/imx_fec.c
+diff --git a/tests/qtest/npcm_gmac-test.c b/tests/qtest/npcm_gmac-test.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/net/imx_fec.c
+--- a/tests/qtest/npcm_gmac-test.c
-+++ b/hw/net/imx_fec.c
++++ b/tests/qtest/npcm_gmac-test.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ typedef struct TestData {
- #include "qemu/module.h"
+     const GMACModule *module;
- #include "net/checksum.h"
+ } TestData;
- #include "net/eth.h"
-+#include "trace.h"
+-/* Values extracted from hw/arm/npcm8xx.c */
++/* Values extracted from hw/arm/npcm7xx.c */
- /* For crc32 */
+ static const GMACModule gmac_module_list[] = {
- #include <zlib.h>
+     {
+         .irq        = 14,
--#ifndef DEBUG_IMX_FEC
+@@ -XXX,XX +XXX,XX @@ static const GMACModule gmac_module_list[] = {
--#define DEBUG_IMX_FEC 0
+         .irq        = 15,
--#endif
+         .base_addr  = 0xf0804000
      },
 -    {
 -        .irq        = 16,
 -        .base_addr  = 0xf0806000
 -    },
 -    {
 -        .irq        = 17,
 -        .base_addr  = 0xf0808000
 -    }
  };
  /* Returns the index of the GMAC module. */
@@ -XXX,XX +XXX,XX @@ static uint32_t gmac_read(QTestState *qts, const GMACModule *mod,
      return qtest_readl(qts, mod->base_addr + regno);
  }
 -static uint16_t pcs_read(QTestState *qts, const GMACModule *mod,
 -                          NPCMRegister regno)
 -{
 -    uint32_t write_value = (regno & 0x3ffe00) >> 9;
 -    qtest_writel(qts, PCS_BASE_ADDRESS + NPCM_PCS_IND_AC_BA, write_value);
 -    uint32_t read_offset = regno & 0x1ff;
 -    return qtest_readl(qts, PCS_BASE_ADDRESS + read_offset);
 -}
 -
--#define FEC_PRINTF(fmt, args...) \
+ /* Check that GMAC registers are reset to default value */
  static void test_init(gconstpointer test_data)
  {
      const TestData *td = test_data;
      const GMACModule *mod = td->module;
 -    QTestState *qts = qtest_init("-machine npcm845-evb");
 +    QTestState *qts = qtest_init("-machine npcm750-evb");
  #define CHECK_REG32(regno, value) \
      do { \
          g_assert_cmphex(gmac_read(qts, mod, (regno)), ==, (value)); \
      } while (0)
 -#define CHECK_REG_PCS(regno, value) \
 -    do { \
--        if (DEBUG_IMX_FEC) { \
+-        g_assert_cmphex(pcs_read(qts, mod, (regno)), ==, (value)); \
 -            fprintf(stderr, "[%s]%s: " fmt , TYPE_IMX_FEC, \
 -                                             __func__, ##args); \
 -        } \
 -    } while (0)
 -
--#ifndef DEBUG_IMX_PHY
+     CHECK_REG32(NPCM_DMA_BUS_MODE, 0x00020100);
--#define DEBUG_IMX_PHY 0
+     CHECK_REG32(NPCM_DMA_XMT_POLL_DEMAND, 0);
--#endif
+     CHECK_REG32(NPCM_DMA_RCV_POLL_DEMAND, 0);
@@ -XXX,XX +XXX,XX @@ static void test_init(gconstpointer test_data)
      CHECK_REG32(NPCM_GMAC_PTP_TAR, 0);
      CHECK_REG32(NPCM_GMAC_PTP_TTSR, 0);
 -    /* TODO Add registers PCS */
 -    if (mod->base_addr == 0xf0802000) {
 -        CHECK_REG_PCS(NPCM_PCS_SR_CTL_ID1, 0x699e);
 -        CHECK_REG_PCS(NPCM_PCS_SR_CTL_ID2, 0);
 -        CHECK_REG_PCS(NPCM_PCS_SR_CTL_STS, 0x8000);
 -
--#define PHY_PRINTF(fmt, args...) \
+-        CHECK_REG_PCS(NPCM_PCS_SR_MII_CTRL, 0x1140);
--    do { \
+-        CHECK_REG_PCS(NPCM_PCS_SR_MII_STS, 0x0109);
--        if (DEBUG_IMX_PHY) { \
+-        CHECK_REG_PCS(NPCM_PCS_SR_MII_DEV_ID1, 0x699e);
--            fprintf(stderr, "[%s.phy]%s: " fmt , TYPE_IMX_FEC, \
+-        CHECK_REG_PCS(NPCM_PCS_SR_MII_DEV_ID2, 0x0ced0);
--                                                 __func__, ##args); \
+-        CHECK_REG_PCS(NPCM_PCS_SR_MII_AN_ADV, 0x0020);
--        } \
+-        CHECK_REG_PCS(NPCM_PCS_SR_MII_LP_BABL, 0);
--    } while (0)
+-        CHECK_REG_PCS(NPCM_PCS_SR_MII_AN_EXPN, 0);
 -        CHECK_REG_PCS(NPCM_PCS_SR_MII_EXT_STS, 0xc000);
 -
- #define IMX_MAX_DESC    1024
+-        CHECK_REG_PCS(NPCM_PCS_SR_TIM_SYNC_ABL, 0x0003);
+-        CHECK_REG_PCS(NPCM_PCS_SR_TIM_SYNC_TX_MAX_DLY_LWR, 0x0038);
- static const char *imx_default_reg_name(IMXFECState *s, uint32_t index)
+-        CHECK_REG_PCS(NPCM_PCS_SR_TIM_SYNC_TX_MAX_DLY_UPR, 0);
-@@ -XXX,XX +XXX,XX @@ static void imx_eth_update(IMXFECState *s);
+-        CHECK_REG_PCS(NPCM_PCS_SR_TIM_SYNC_TX_MIN_DLY_LWR, 0x0038);
-  * For now we don't handle any GPIO/interrupt line, so the OS will
+-        CHECK_REG_PCS(NPCM_PCS_SR_TIM_SYNC_TX_MIN_DLY_UPR, 0);
-  * have to poll for the PHY status.
+-        CHECK_REG_PCS(NPCM_PCS_SR_TIM_SYNC_RX_MAX_DLY_LWR, 0x0058);
-  */
+-        CHECK_REG_PCS(NPCM_PCS_SR_TIM_SYNC_RX_MAX_DLY_UPR, 0);
--static void phy_update_irq(IMXFECState *s)
+-        CHECK_REG_PCS(NPCM_PCS_SR_TIM_SYNC_RX_MIN_DLY_LWR, 0x0048);
-+static void imx_phy_update_irq(IMXFECState *s)
+-        CHECK_REG_PCS(NPCM_PCS_SR_TIM_SYNC_RX_MIN_DLY_UPR, 0);
- {
+-
-     imx_eth_update(s);
+-        CHECK_REG_PCS(NPCM_PCS_VR_MII_MMD_DIG_CTRL1, 0x2400);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_AN_CTRL, 0);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_AN_INTR_STS, 0x000a);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_TC, 0);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_DBG_CTRL, 0);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_EEE_MCTRL0, 0x899c);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_EEE_TXTIMER, 0);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_EEE_RXTIMER, 0);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_LINK_TIMER_CTRL, 0);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_EEE_MCTRL1, 0);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_DIG_STS, 0x0010);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_ICG_ERRCNT1, 0);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_MISC_STS, 0);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_RX_LSTS, 0);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_TX_BSTCTRL0, 0x00a);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_TX_LVLCTRL0, 0x007f);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_TX_GENCTRL0, 0x0001);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_TX_GENCTRL1, 0);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_TX_STS, 0);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_RX_GENCTRL0, 0x0100);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_RX_GENCTRL1, 0x1100);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_RX_LOS_CTRL0, 0x000e);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_MPLL_CTRL0, 0x0100);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_MPLL_CTRL1, 0x0032);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_MPLL_STS, 0x0001);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_MISC_CTRL2, 0);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_LVL_CTRL, 0x0019);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_MISC_CTRL0, 0);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_MISC_CTRL1, 0);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_DIG_CTRL2, 0);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_DIG_ERRCNT_SEL, 0);
 -    }
 -
      qtest_quit(qts);
  }
--static void phy_update_link(IMXFECState *s)
+diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
 +static void imx_phy_update_link(IMXFECState *s)
  {
      /* Autonegotiation status mirrors link status.  */
      if (qemu_get_queue(s->nic)->link_down) {
 -        PHY_PRINTF("link is down\n");
 +        trace_imx_phy_update_link("down");
          s->phy_status &= ~0x0024;
          s->phy_int |= PHY_INT_DOWN;
      } else {
 -        PHY_PRINTF("link is up\n");
 +        trace_imx_phy_update_link("up");
          s->phy_status |= 0x0024;
          s->phy_int |= PHY_INT_ENERGYON;
          s->phy_int |= PHY_INT_AUTONEG_COMPLETE;
      }
 -    phy_update_irq(s);
 +    imx_phy_update_irq(s);
  }
  static void imx_eth_set_link(NetClientState *nc)
  {
 -    phy_update_link(IMX_FEC(qemu_get_nic_opaque(nc)));
 +    imx_phy_update_link(IMX_FEC(qemu_get_nic_opaque(nc)));
  }
 -static void phy_reset(IMXFECState *s)
 +static void imx_phy_reset(IMXFECState *s)
  {
 +    trace_imx_phy_reset();
 +
      s->phy_status = 0x7809;
      s->phy_control = 0x3000;
      s->phy_advertise = 0x01e1;
      s->phy_int_mask = 0;
      s->phy_int = 0;
 -    phy_update_link(s);
 +    imx_phy_update_link(s);
  }
 -static uint32_t do_phy_read(IMXFECState *s, int reg)
 +static uint32_t imx_phy_read(IMXFECState *s, int reg)
  {
      uint32_t val;
@@ -XXX,XX +XXX,XX @@ static uint32_t do_phy_read(IMXFECState *s, int reg)
      case 29:    /* Interrupt source.  */
          val = s->phy_int;
          s->phy_int = 0;
 -        phy_update_irq(s);
 +        imx_phy_update_irq(s);
          break;
      case 30:    /* Interrupt mask */
          val = s->phy_int_mask;
@@ -XXX,XX +XXX,XX @@ static uint32_t do_phy_read(IMXFECState *s, int reg)
          break;
      }
 -    PHY_PRINTF("read 0x%04x @ %d\n", val, reg);
 +    trace_imx_phy_read(val, reg);
      return val;
  }
 -static void do_phy_write(IMXFECState *s, int reg, uint32_t val)
 +static void imx_phy_write(IMXFECState *s, int reg, uint32_t val)
  {
 -    PHY_PRINTF("write 0x%04x @ %d\n", val, reg);
 +    trace_imx_phy_write(val, reg);
      if (reg > 31) {
          /* we only advertise one phy */
@@ -XXX,XX +XXX,XX @@ static void do_phy_write(IMXFECState *s, int reg, uint32_t val)
      switch (reg) {
      case 0:     /* Basic Control */
          if (val & 0x8000) {
 -            phy_reset(s);
 +            imx_phy_reset(s);
          } else {
              s->phy_control = val & 0x7980;
              /* Complete autonegotiation immediately.  */
@@ -XXX,XX +XXX,XX @@ static void do_phy_write(IMXFECState *s, int reg, uint32_t val)
          break;
      case 30:    /* Interrupt mask */
          s->phy_int_mask = val & 0xff;
 -        phy_update_irq(s);
 +        imx_phy_update_irq(s);
          break;
      case 17:
      case 18:
@@ -XXX,XX +XXX,XX @@ static void do_phy_write(IMXFECState *s, int reg, uint32_t val)
  static void imx_fec_read_bd(IMXFECBufDesc *bd, dma_addr_t addr)
  {
      dma_memory_read(&address_space_memory, addr, bd, sizeof(*bd));
 +
 +    trace_imx_fec_read_bd(addr, bd->flags, bd->length, bd->data);
  }
  static void imx_fec_write_bd(IMXFECBufDesc *bd, dma_addr_t addr)
@@ -XXX,XX +XXX,XX @@ static void imx_fec_write_bd(IMXFECBufDesc *bd, dma_addr_t addr)
  static void imx_enet_read_bd(IMXENETBufDesc *bd, dma_addr_t addr)
  {
      dma_memory_read(&address_space_memory, addr, bd, sizeof(*bd));
 +
 +    trace_imx_enet_read_bd(addr, bd->flags, bd->length, bd->data,
 +                   bd->option, bd->status);
  }
  static void imx_enet_write_bd(IMXENETBufDesc *bd, dma_addr_t addr)
@@ -XXX,XX +XXX,XX @@ static void imx_fec_do_tx(IMXFECState *s)
          int len;
          imx_fec_read_bd(&bd, addr);
 -        FEC_PRINTF("tx_bd %x flags %04x len %d data %08x\n",
 -                   addr, bd.flags, bd.length, bd.data);
          if ((bd.flags & ENET_BD_R) == 0) {
 +
              /* Run out of descriptors to transmit.  */
 -            FEC_PRINTF("tx_bd ran out of descriptors to transmit\n");
 +            trace_imx_eth_tx_bd_busy();
 +
              break;
          }
          len = bd.length;
@@ -XXX,XX +XXX,XX @@ static void imx_enet_do_tx(IMXFECState *s, uint32_t index)
          int len;
          imx_enet_read_bd(&bd, addr);
 -        FEC_PRINTF("tx_bd %x flags %04x len %d data %08x option %04x "
 -                   "status %04x\n", addr, bd.flags, bd.length, bd.data,
 -                   bd.option, bd.status);
          if ((bd.flags & ENET_BD_R) == 0) {
              /* Run out of descriptors to transmit.  */
 +
 +            trace_imx_eth_tx_bd_busy();
 +
              break;
          }
          len = bd.length;
@@ -XXX,XX +XXX,XX @@ static void imx_eth_enable_rx(IMXFECState *s, bool flush)
      s->regs[ENET_RDAR] = (bd.flags & ENET_BD_E) ? ENET_RDAR_RDAR : 0;
      if (!s->regs[ENET_RDAR]) {
 -        FEC_PRINTF("RX buffer full\n");
 +        trace_imx_eth_rx_bd_full();
      } else if (flush) {
          qemu_flush_queued_packets(qemu_get_queue(s->nic));
      }
@@ -XXX,XX +XXX,XX @@ static void imx_eth_reset(DeviceState *d)
      memset(s->tx_descriptor, 0, sizeof(s->tx_descriptor));
      /* We also reset the PHY */
 -    phy_reset(s);
 +    imx_phy_reset(s);
  }
  static uint32_t imx_default_read(IMXFECState *s, uint32_t index)
@@ -XXX,XX +XXX,XX @@ static uint64_t imx_eth_read(void *opaque, hwaddr offset, unsigned size)
          break;
      }
 -    FEC_PRINTF("reg[%s] => 0x%" PRIx32 "\n", imx_eth_reg_name(s, index),
 -                                              value);
 +    trace_imx_eth_read(index, imx_eth_reg_name(s, index), value);
      return value;
  }
@@ -XXX,XX +XXX,XX @@ static void imx_eth_write(void *opaque, hwaddr offset, uint64_t value,
      const bool single_tx_ring = !imx_eth_is_multi_tx_ring(s);
      uint32_t index = offset >> 2;
 -    FEC_PRINTF("reg[%s] <= 0x%" PRIx32 "\n", imx_eth_reg_name(s, index),
 -                (uint32_t)value);
 +    trace_imx_eth_write(index, imx_eth_reg_name(s, index), value);
      switch (index) {
      case ENET_EIR:
@@ -XXX,XX +XXX,XX @@ static void imx_eth_write(void *opaque, hwaddr offset, uint64_t value,
          if (extract32(value, 29, 1)) {
              /* This is a read operation */
              s->regs[ENET_MMFR] = deposit32(s->regs[ENET_MMFR], 0, 16,
 -                                           do_phy_read(s,
 +                                           imx_phy_read(s,
                                                         extract32(value,
                                                                   18, 10)));
          } else {
              /* This a write operation */
 -            do_phy_write(s, extract32(value, 18, 10), extract32(value, 0, 16));
 +            imx_phy_write(s, extract32(value, 18, 10), extract32(value, 0, 16));
          }
          /* raise the interrupt as the PHY operation is done */
          s->regs[ENET_EIR] |= ENET_INT_MII;
@@ -XXX,XX +XXX,XX @@ static bool imx_eth_can_receive(NetClientState *nc)
  {
      IMXFECState *s = IMX_FEC(qemu_get_nic_opaque(nc));
 -    FEC_PRINTF("\n");
 -
      return !!s->regs[ENET_RDAR];
  }
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_fec_receive(NetClientState *nc, const uint8_t *buf,
      unsigned int buf_len;
      size_t size = len;
 -    FEC_PRINTF("len %d\n", (int)size);
 +    trace_imx_fec_receive(size);
      if (!s->regs[ENET_RDAR]) {
          qemu_log_mask(LOG_GUEST_ERROR, "[%s]%s: Unexpected packet\n",
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_fec_receive(NetClientState *nc, const uint8_t *buf,
          bd.length = buf_len;
          size -= buf_len;
 -        FEC_PRINTF("rx_bd 0x%x length %d\n", addr, bd.length);
 +        trace_imx_fec_receive_len(addr, bd.length);
          /* The last 4 bytes are the CRC.  */
          if (size < 4) {
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_fec_receive(NetClientState *nc, const uint8_t *buf,
          if (size == 0) {
              /* Last buffer in frame.  */
              bd.flags |= flags | ENET_BD_L;
 -            FEC_PRINTF("rx frame flags %04x\n", bd.flags);
 +
 +            trace_imx_fec_receive_last(bd.flags);
 +
              s->regs[ENET_EIR] |= ENET_INT_RXF;
          } else {
              s->regs[ENET_EIR] |= ENET_INT_RXB;
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_enet_receive(NetClientState *nc, const uint8_t *buf,
      size_t size = len;
      bool shift16 = s->regs[ENET_RACC] & ENET_RACC_SHIFT16;
 -    FEC_PRINTF("len %d\n", (int)size);
 +    trace_imx_enet_receive(size);
      if (!s->regs[ENET_RDAR]) {
          qemu_log_mask(LOG_GUEST_ERROR, "[%s]%s: Unexpected packet\n",
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_enet_receive(NetClientState *nc, const uint8_t *buf,
          bd.length = buf_len;
          size -= buf_len;
 -        FEC_PRINTF("rx_bd 0x%x length %d\n", addr, bd.length);
 +        trace_imx_enet_receive_len(addr, bd.length);
          /* The last 4 bytes are the CRC.  */
          if (size < 4) {
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_enet_receive(NetClientState *nc, const uint8_t *buf,
          if (size == 0) {
              /* Last buffer in frame.  */
              bd.flags |= flags | ENET_BD_L;
 -            FEC_PRINTF("rx frame flags %04x\n", bd.flags);
 +
 +            trace_imx_enet_receive_last(bd.flags);
 +
              /* Indicate that we've updated the last buffer descriptor. */
              bd.last_buffer = ENET_BD_BDU;
              if (bd.option & ENET_BD_RX_INT) {
 diff --git a/hw/net/trace-events b/hw/net/trace-events
 index XXXXXXX..XXXXXXX 100644
---- a/hw/net/trace-events
+--- a/tests/qtest/meson.build
-+++ b/hw/net/trace-events
++++ b/tests/qtest/meson.build
-@@ -XXX,XX +XXX,XX @@ i82596_receive_packet(size_t sz) "len=%zu"
+@@ -XXX,XX +XXX,XX @@ qtests_npcm7xx = \
- i82596_new_mac(const char *id_with_mac) "New MAC for: %s"
+    'npcm7xx_sdhci-test',
- i82596_set_multicast(uint16_t count) "Added %d multicast entries"
+    'npcm7xx_smbus-test',
- i82596_channel_attention(void *s) "%p: Received CHANNEL ATTENTION"
+    'npcm7xx_timer-test',
-+
+-   'npcm7xx_watchdog_timer-test'] + \
-+# imx_fec.c
++   'npcm7xx_watchdog_timer-test',
-+imx_phy_read(uint32_t val, int reg) "0x%04"PRIx32" <= reg[%d]"
++   'npcm_gmac-test'] + \
-+imx_phy_write(uint32_t val, int reg) "0x%04"PRIx32" => reg[%d]"
+    (slirp.found() ? ['npcm7xx_emc-test'] : [])
-+imx_phy_update_link(const char *s) "%s"
+ qtests_aspeed = \
-+imx_phy_reset(void) ""
+   ['aspeed_hace-test',
 +imx_fec_read_bd(uint64_t addr, int flags, int len, int data) "tx_bd 0x%"PRIx64" flags 0x%04x len %d data 0x%08x"
 +imx_enet_read_bd(uint64_t addr, int flags, int len, int data, int options, int status) "tx_bd 0x%"PRIx64" flags 0x%04x len %d data 0x%08x option 0x%04x status 0x%04x"
 +imx_eth_tx_bd_busy(void) "tx_bd ran out of descriptors to transmit"
 +imx_eth_rx_bd_full(void) "RX buffer is full"
 +imx_eth_read(int reg, const char *reg_name, uint32_t value) "reg[%d:%s] => 0x%08"PRIx32
 +imx_eth_write(int reg, const char *reg_name, uint64_t value) "reg[%d:%s] <= 0x%08"PRIx64
 +imx_fec_receive(size_t size) "len %zu"
 +imx_fec_receive_len(uint64_t addr, int len) "rx_bd 0x%"PRIx64" length %d"
 +imx_fec_receive_last(int last) "rx frame flags 0x%04x"
 +imx_enet_receive(size_t size) "len %zu"
 +imx_enet_receive_len(uint64_t addr, int len) "rx_bd 0x%"PRIx64" length %d"
 +imx_enet_receive_last(int last) "rx frame flags 0x%04x"
 --
-.20.1
+.34.1
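
Editorial sketch, not part of either series: the ENET_MMFR case in imx_eth_write() above slices one 32-bit value into an operation bit (bit 29), a 10-bit field starting at bit 18 that is handed to the PHY read/write helpers, and 16 data bits. The standalone C below mirrors that bit slicing under stated assumptions. extract_bits(), MiiFrame and decode_mmfr() are hypothetical names invented for this note; QEMU's own extract32() is deliberately not used so the snippet builds on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: return 'length' bits of 'value' starting at bit 'start'. */
static uint32_t extract_bits(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (~0U >> (32 - length));
}

/* Hypothetical container for the decoded fields. */
typedef struct {
    bool is_read;   /* bit 29 set: read operation, otherwise write */
    uint32_t reg;   /* bits [27:18]: field passed to the PHY helpers */
    uint32_t data;  /* bits [15:0]: data for a write */
} MiiFrame;

static MiiFrame decode_mmfr(uint32_t value)
{
    MiiFrame f;

    f.is_read = extract_bits(value, 29, 1) != 0;
    f.reg = extract_bits(value, 18, 10);
    f.data = extract_bits(value, 0, 16);
    return f;
}
```

A caller would branch on f.is_read and pass f.reg (and f.data for writes) to whatever PHY model it has, which is the shape of the code in the hunk.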

-New patch
+[PULL 18/35] hw/arm/smmuv3: add support for stage 1 access fault
+From: Luc Michel <luc.michel@amd.com>
+An access fault is raised when the Access Flag is not set in the
+looked-up PTE and the AFFD field is not set in the corresponding context
+descriptor. This was already implemented for stage 2. Implement it for
+stage 1 as well.
+Signed-off-by: Luc Michel <luc.michel@amd.com>
+Reviewed-by: Mostafa Saleh <smostafa@google.com>
+Reviewed-by: Eric Auger <eric.auger@redhat.com>
+Tested-by: Mostafa Saleh <smostafa@google.com>
+Message-id: 20240213082211.3330400-1-luc.michel@amd.com
+[PMM: tweaked comment text]
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ hw/arm/smmuv3-internal.h     |  1 +
+ include/hw/arm/smmu-common.h |  1 +
+ hw/arm/smmu-common.c         | 11 +++++++++++
+ hw/arm/smmuv3.c              |  1 +
+ 4 files changed, 14 insertions(+)
+diff --git a/hw/arm/smmuv3-internal.h b/hw/arm/smmuv3-internal.h
+index XXXXXXX..XXXXXXX 100644
+--- a/hw/arm/smmuv3-internal.h
++++ b/hw/arm/smmuv3-internal.h
+@@ -XXX,XX +XXX,XX @@ static inline int pa_range(STE *ste)
+ #define CD_EPD(x, sel)   extract32((x)->word[0], (16 * (sel)) + 14, 1)
+ #define CD_ENDI(x)       extract32((x)->word[0], 15, 1)
+ #define CD_IPS(x)        extract32((x)->word[1], 0 , 3)
++#define CD_AFFD(x)       extract32((x)->word[1], 3 , 1)
+ #define CD_TBI(x)        extract32((x)->word[1], 6 , 2)
+ #define CD_HD(x)         extract32((x)->word[1], 10 , 1)
+ #define CD_HA(x)         extract32((x)->word[1], 11 , 1)
+diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
+index XXXXXXX..XXXXXXX 100644
+--- a/include/hw/arm/smmu-common.h
++++ b/include/hw/arm/smmu-common.h
+@@ -XXX,XX +XXX,XX @@ typedef struct SMMUTransCfg {
+     bool disabled;             /* smmu is disabled */
+     bool bypassed;             /* translation is bypassed */
+     bool aborted;              /* translation is aborted */
++    bool affd;                 /* AF fault disable */
+     uint32_t iotlb_hits;       /* counts IOTLB hits */
+     uint32_t iotlb_misses;     /* counts IOTLB misses*/
+     /* Used by stage-1 only. */
+diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
+index XXXXXXX..XXXXXXX 100644
+--- a/hw/arm/smmu-common.c
++++ b/hw/arm/smmu-common.c
+@@ -XXX,XX +XXX,XX @@ static int smmu_ptw_64_s1(SMMUTransCfg *cfg,
+                                      pte_addr, pte, iova, gpa,
+                                      block_size >> 20);
+         }
++
++        /*
++         * QEMU does not currently implement HTTU, so if AFFD and PTE.AF
++         * are 0 we take an Access flag fault. (5.4. Context Descriptor)
++         * An Access flag fault takes priority over a Permission fault.
++         */
++        if (!PTE_AF(pte) && !cfg->affd) {
++            info->type = SMMU_PTW_ERR_ACCESS;
++            goto error;
++        }
++
+         ap = PTE_AP(pte);
+         if (is_permission_fault(ap, perm)) {
+             info->type = SMMU_PTW_ERR_PERMISSION;
+diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
+index XXXXXXX..XXXXXXX 100644
+--- a/hw/arm/smmuv3.c
++++ b/hw/arm/smmuv3.c
+@@ -XXX,XX +XXX,XX @@ static int decode_cd(SMMUTransCfg *cfg, CD *cd, SMMUEventInfo *event)
+     cfg->oas = MIN(oas2bits(SMMU_IDR5_OAS), cfg->oas);
+     cfg->tbi = CD_TBI(cd);
+     cfg->asid = CD_ASID(cd);
++    cfg->affd = CD_AFFD(cd);
+     trace_smmuv3_decode_cd(cfg->oas);
+--
+.34.1
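
Editorial sketch, not part of the series: the stage 1 change above boils down to an ordering rule. If the leaf descriptor has AF clear and the context descriptor does not set AFFD, the walk reports an Access flag fault before permissions are looked at. The standalone C below restates that rule. WalkResult, pte_af() and check_leaf() are hypothetical names for this note, and the AF position (bit 10 of a VMSAv8-64 leaf descriptor) is an assumption of the sketch rather than something taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result codes; these are not QEMU's SMMU_PTW_ERR_* values. */
typedef enum {
    WALK_OK,
    WALK_ERR_ACCESS,
    WALK_ERR_PERMISSION,
} WalkResult;

/* Assumption: the Access Flag is bit 10 of the leaf descriptor. */
static bool pte_af(uint64_t pte)
{
    return (pte >> 10) & 1;
}

/*
 * 'affd' is the AF fault disable bit from the context descriptor.
 * 'perm_ok' stands in for the result of the permission check.
 */
static WalkResult check_leaf(uint64_t pte, bool affd, bool perm_ok)
{
    if (!pte_af(pte) && !affd) {
        return WALK_ERR_ACCESS;      /* takes priority over permission */
    }
    if (!perm_ok) {
        return WALK_ERR_PERMISSION;
    }
    return WALK_OK;
}
```

With AF clear, AFFD clear and a failing permission check, the sketch reports the access fault, matching the priority stated in the patch comment.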

-[PULL 20/23] target/arm/cpu: adjust virtual time for all KVM arm cpus
+[PULL 19/35] hw/arm/stellaris: Convert ADC controller to Resettable interface
-From: fangying <fangying1@huawei.com>
+From: Philippe Mathieu-Daudé <philmd@linaro.org>
-Virtual time adjustment was implemented for virt-5.0 machine type,
+Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-but the cpu property was enabled only for host-passthrough and max
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-cpu model.  Let's add it for any KVM arm cpu which has the generic
+Message-id: 20240213155214.13619-2-philmd@linaro.org
 timer feature enabled.
 Signed-off-by: Ying Fang <fangying1@huawei.com>
 Reviewed-by: Andrew Jones <drjones@redhat.com>
 Message-id: 20200608121243.2076-1-fangying1@huawei.com
 [PMM: minor commit message tweak, removed inaccurate
  suggested-by tag]
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/cpu.c   |  6 ++++--
+ hw/arm/stellaris.c | 6 ++++--
- target/arm/cpu64.c |  1 -
+ 1 file changed, 4 insertions(+), 2 deletions(-)
  target/arm/kvm.c   | 21 +++++++++++----------
  3 files changed, 15 insertions(+), 13 deletions(-)
-diff --git a/target/arm/cpu.c b/target/arm/cpu.c
+diff --git a/hw/arm/stellaris.c b/hw/arm/stellaris.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.c
+--- a/hw/arm/stellaris.c
-+++ b/target/arm/cpu.c
++++ b/hw/arm/stellaris.c
-@@ -XXX,XX +XXX,XX @@ void arm_cpu_post_init(Object *obj)
+@@ -XXX,XX +XXX,XX @@ static void stellaris_adc_trigger(void *opaque, int irq, int level)
      if (arm_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER)) {
          qdev_property_add_static(DEVICE(cpu), &arm_cpu_gt_cntfrq_property);
      }
-+
-+    if (kvm_enabled()) {
-+        kvm_arm_add_vcpu_properties(obj);
-+    }
  }
- static void arm_cpu_finalizefn(Object *obj)
+-static void stellaris_adc_reset(StellarisADCState *s)
-@@ -XXX,XX +XXX,XX @@ static void arm_max_initfn(Object *obj)
++static void stellaris_adc_reset_hold(Object *obj)
+ {
-     if (kvm_enabled()) {
++    StellarisADCState *s = STELLARIS_ADC(obj);
-         kvm_arm_set_cpu_features_from_host(cpu);
+     int n;
--        kvm_arm_add_vcpu_properties(obj);
-     } else {
+     for (n = 0; n < 4; n++) {
-         cortex_a15_initfn(obj);
+@@ -XXX,XX +XXX,XX @@ static void stellaris_adc_init(Object *obj)
+     memory_region_init_io(&s->iomem, obj, &stellaris_adc_ops, s,
-@@ -XXX,XX +XXX,XX @@ static void arm_host_initfn(Object *obj)
+                           "adc", 0x1000);
-     if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
+     sysbus_init_mmio(sbd, &s->iomem);
-         aarch64_add_sve_properties(obj);
+-    stellaris_adc_reset(s);
-     }
+     qdev_init_gpio_in(dev, stellaris_adc_trigger, 1);
 -    kvm_arm_add_vcpu_properties(obj);
      arm_cpu_post_init(obj);
  }
-diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
+@@ -XXX,XX +XXX,XX @@ static const TypeInfo stellaris_i2c_info = {
-index XXXXXXX..XXXXXXX 100644
+ static void stellaris_adc_class_init(ObjectClass *klass, void *data)
 --- a/target/arm/cpu64.c
 +++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
      if (kvm_enabled()) {
          kvm_arm_set_cpu_features_from_host(cpu);
 -        kvm_arm_add_vcpu_properties(obj);
      } else {
          uint64_t t;
          uint32_t u;
 diff --git a/target/arm/kvm.c b/target/arm/kvm.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/kvm.c
 +++ b/target/arm/kvm.c
@@ -XXX,XX +XXX,XX @@ static void kvm_no_adjvtime_set(Object *obj, bool value, Error **errp)
  /* KVM VCPU properties should be prefixed with "kvm-". */
  void kvm_arm_add_vcpu_properties(Object *obj)
  {
--    if (!kvm_enabled()) {
+     DeviceClass *dc = DEVICE_CLASS(klass);
--        return;
++    ResettableClass *rc = RESETTABLE_CLASS(klass);
--    }
-+    ARMCPU *cpu = ARM_CPU(obj);
++    rc->phases.hold = stellaris_adc_reset_hold;
-+    CPUARMState *env = &cpu->env;
+     dc->vmsd = &vmstate_stellaris_adc;
 -    ARM_CPU(obj)->kvm_adjvtime = true;
 -    object_property_add_bool(obj, "kvm-no-adjvtime", kvm_no_adjvtime_get,
 -                             kvm_no_adjvtime_set);
 -    object_property_set_description(obj, "kvm-no-adjvtime",
 -                                    "Set on to disable the adjustment of "
 -                                    "the virtual counter. VM stopped time "
 -                                    "will be counted.");
 +    if (arm_feature(env, ARM_FEATURE_GENERIC_TIMER)) {
 +        cpu->kvm_adjvtime = true;
 +        object_property_add_bool(obj, "kvm-no-adjvtime", kvm_no_adjvtime_get,
 +                                 kvm_no_adjvtime_set);
 +        object_property_set_description(obj, "kvm-no-adjvtime",
 +                                        "Set on to disable the adjustment of "
 +                                        "the virtual counter. VM stopped time "
 +                                        "will be counted.");
 +    }
  }
- bool kvm_arm_pmu_supported(CPUState *cpu)
 --
-.20.1
+.34.1

-[PULL 10/23] target/arm: Convert Neon 2-reg-scalar integer multiplies to decodetree
+[PULL 20/35] hw/arm/stellaris: Convert I2C controller to Resettable interface
-Convert the VMLA, VMLS and VMUL insns in the Neon "2 registers and a
+From: Philippe Mathieu-Daudé <philmd@linaro.org>
 scalar" group to decodetree.  These are 32x32->32 operations where
 one of the inputs is the scalar, followed by a possible accumulate
 operation of the 32-bit result.
-The refactoring removes some of the oddities of the old decoder:
+Suggested-by: Peter Maydell <peter.maydell@linaro.org>
- * operands to the operation and accumulation were often
+Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-   reversed (taking advantage of the fact that most of these ops
+Message-id: 20240213155214.13619-3-philmd@linaro.org
-   are commutative); the new code follows the pseudocode order
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
- * the Q bit in the insn was in a local variable 'u'; in the
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-   new code it is decoded into a->q
+---
  hw/arm/stellaris.c | 26 ++++++++++++++++++++++----
  1 file changed, 22 insertions(+), 4 deletions(-)
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+diff --git a/hw/arm/stellaris.c b/hw/arm/stellaris.c
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
  target/arm/neon-dp.decode       |  15 ++++
  target/arm/translate-neon.inc.c | 133 ++++++++++++++++++++++++++++++++
  target/arm/translate.c          |  77 ++----------------
  3 files changed, 154 insertions(+), 71 deletions(-)
 diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
+--- a/hw/arm/stellaris.c
-+++ b/target/arm/neon-dp.decode
++++ b/hw/arm/stellaris.c
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+@@ -XXX,XX +XXX,XX @@ static void stellaris_sys_instance_init(Object *obj)
-     VQDMULL_3d   1111 001 0 1 . .. .... .... 1101 . 0 . 0 .... @3diff
+     s->sysclk = qdev_init_clock_out(DEVICE(s), "SYSCLK");
+ }
-     VMULL_P_3d   1111 001 0 1 . .. .... .... 1110 . 0 . 0 .... @3diff
 -/* I2C controller.  */
 +/*
 + * I2C controller.
 + * ??? For now we only implement the master interface.
 + */
  #define TYPE_STELLARIS_I2C "stellaris-i2c"
  OBJECT_DECLARE_SIMPLE_TYPE(stellaris_i2c_state, STELLARIS_I2C)
@@ -XXX,XX +XXX,XX @@ static void stellaris_i2c_write(void *opaque, hwaddr offset,
      stellaris_i2c_update(s);
  }
 -static void stellaris_i2c_reset(stellaris_i2c_state *s)
 +static void stellaris_i2c_reset_enter(Object *obj, ResetType type)
  {
 +    stellaris_i2c_state *s = STELLARIS_I2C(obj);
 +
-+    ##################################################################
+     if (s->mcs & STELLARIS_I2C_MCS_BUSBSY)
-+    # 2-regs-plus-scalar grouping:
+         i2c_end_transfer(s->bus);
 +    # 1111 001 Q 1 D sz!=11 Vn:4 Vd:4 opc:4 N 1 M 0 Vm:4
 +    ##################################################################
 +    &2scalar vm vn vd size q
 +
 +    @2scalar     .... ... q:1 . . size:2 .... .... .... . . . . .... \
 +                 &2scalar vm=%vm_dp vn=%vn_dp vd=%vd_dp
 +
 +    VMLA_2sc     1111 001 . 1 . .. .... .... 0000 . 1 . 0 .... @2scalar
 +
 +    VMLS_2sc     1111 001 . 1 . .. .... .... 0100 . 1 . 0 .... @2scalar
 +
 +    VMUL_2sc     1111 001 . 1 . .. .... .... 1000 . 1 . 0 .... @2scalar
    ]
  }
 diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.inc.c
 +++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VMULL_P_3d(DisasContext *s, arg_3diff *a)
 , 16, 0, fn_gvec);
      return true;
  }
 +
 +static void gen_neon_dup_low16(TCGv_i32 var)
 +{
 +    TCGv_i32 tmp = tcg_temp_new_i32();
 +    tcg_gen_ext16u_i32(var, var);
 +    tcg_gen_shli_i32(tmp, var, 16);
 +    tcg_gen_or_i32(var, var, tmp);
 +    tcg_temp_free_i32(tmp);
 +}
 +
-+static void gen_neon_dup_high16(TCGv_i32 var)
++static void stellaris_i2c_reset_hold(Object *obj)
 +{
-+    TCGv_i32 tmp = tcg_temp_new_i32();
++    stellaris_i2c_state *s = STELLARIS_I2C(obj);
-+    tcg_gen_andi_i32(var, var, 0xffff0000);
-+    tcg_gen_shri_i32(tmp, var, 16);
+     s->msa = 0;
-+    tcg_gen_or_i32(var, var, tmp);
+     s->mcs = 0;
-+    tcg_temp_free_i32(tmp);
+@@ -XXX,XX +XXX,XX @@ static void stellaris_i2c_reset(stellaris_i2c_state *s)
      s->mimr = 0;
      s->mris = 0;
      s->mcr = 0;
 +}
 +
-+static inline TCGv_i32 neon_get_scalar(int size, int reg)
++static void stellaris_i2c_reset_exit(Object *obj)
 +{
-+    TCGv_i32 tmp;
++    stellaris_i2c_state *s = STELLARIS_I2C(obj);
 +    if (size == 1) {
 +        tmp = neon_load_reg(reg & 7, reg >> 4);
 +        if (reg & 8) {
 +            gen_neon_dup_high16(tmp);
 +        } else {
 +            gen_neon_dup_low16(tmp);
 +        }
 +    } else {
 +        tmp = neon_load_reg(reg & 15, reg >> 4);
 +    }
 +    return tmp;
 +}
 +
-+static bool do_2scalar(DisasContext *s, arg_2scalar *a,
+     stellaris_i2c_update(s);
-+                       NeonGenTwoOpFn *opfn, NeonGenTwoOpFn *accfn)
+ }
-+{
-+    /*
+@@ -XXX,XX +XXX,XX @@ static void stellaris_i2c_init(Object *obj)
-+     * Two registers and a scalar: perform an operation between
+     memory_region_init_io(&s->iomem, obj, &stellaris_i2c_ops, s,
-+     * the input elements and the scalar, and then possibly
+                           "i2c", 0x1000);
-+     * perform an accumulation operation of that result into the
+     sysbus_init_mmio(sbd, &s->iomem);
-+     * destination.
+-    /* ??? For now we only implement the master interface.  */
-+     */
+-    stellaris_i2c_reset(s);
-+    TCGv_i32 scalar;
+ }
-+    int pass;
-+
+ /* Analogue to Digital Converter.  This is only partially implemented,
-+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+@@ -XXX,XX +XXX,XX @@ type_init(stellaris_machine_init)
-+        return false;
+ static void stellaris_i2c_class_init(ObjectClass *klass, void *data)
 +    }
 +
 +    /* UNDEF accesses to D16-D31 if they don't exist. */
 +    if (!dc_isar_feature(aa32_simd_r32, s) &&
 +        ((a->vd | a->vn | a->vm) & 0x10)) {
 +        return false;
 +    }
 +
 +    if (!opfn) {
 +        /* Bad size (including size == 3, which is a different insn group) */
 +        return false;
 +    }
 +
 +    if (a->q && ((a->vd | a->vn) & 1)) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    scalar = neon_get_scalar(a->size, a->vm);
 +
 +    for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 +        TCGv_i32 tmp = neon_load_reg(a->vn, pass);
 +        opfn(tmp, tmp, scalar);
 +        if (accfn) {
 +            TCGv_i32 rd = neon_load_reg(a->vd, pass);
 +            accfn(tmp, rd, tmp);
 +            tcg_temp_free_i32(rd);
 +        }
 +        neon_store_reg(a->vd, pass, tmp);
 +    }
 +    tcg_temp_free_i32(scalar);
 +    return true;
 +}
 +
 +static bool trans_VMUL_2sc(DisasContext *s, arg_2scalar *a)
 +{
 +    static NeonGenTwoOpFn * const opfn[] = {
 +        NULL,
 +        gen_helper_neon_mul_u16,
 +        tcg_gen_mul_i32,
 +        NULL,
 +    };
 +
 +    return do_2scalar(s, a, opfn[a->size], NULL);
 +}
 +
 +static bool trans_VMLA_2sc(DisasContext *s, arg_2scalar *a)
 +{
 +    static NeonGenTwoOpFn * const opfn[] = {
 +        NULL,
 +        gen_helper_neon_mul_u16,
 +        tcg_gen_mul_i32,
 +        NULL,
 +    };
 +    static NeonGenTwoOpFn * const accfn[] = {
 +        NULL,
 +        gen_helper_neon_add_u16,
 +        tcg_gen_add_i32,
 +        NULL,
 +    };
 +
 +    return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
 +}
 +
 +static bool trans_VMLS_2sc(DisasContext *s, arg_2scalar *a)
 +{
 +    static NeonGenTwoOpFn * const opfn[] = {
 +        NULL,
 +        gen_helper_neon_mul_u16,
 +        tcg_gen_mul_i32,
 +        NULL,
 +    };
 +    static NeonGenTwoOpFn * const accfn[] = {
 +        NULL,
 +        gen_helper_neon_sub_u16,
 +        tcg_gen_sub_i32,
 +        NULL,
 +    };
 +
 +    return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_dsp_insn(DisasContext *s, uint32_t insn)
  #define VFP_DREG_N(reg, insn) VFP_DREG(reg, insn, 16,  7)
  #define VFP_DREG_M(reg, insn) VFP_DREG(reg, insn,  0,  5)
 -static void gen_neon_dup_low16(TCGv_i32 var)
 -{
 -    TCGv_i32 tmp = tcg_temp_new_i32();
 -    tcg_gen_ext16u_i32(var, var);
 -    tcg_gen_shli_i32(tmp, var, 16);
 -    tcg_gen_or_i32(var, var, tmp);
 -    tcg_temp_free_i32(tmp);
 -}
 -
 -static void gen_neon_dup_high16(TCGv_i32 var)
 -{
 -    TCGv_i32 tmp = tcg_temp_new_i32();
 -    tcg_gen_andi_i32(var, var, 0xffff0000);
 -    tcg_gen_shri_i32(tmp, var, 16);
 -    tcg_gen_or_i32(var, var, tmp);
 -    tcg_temp_free_i32(tmp);
 -}
 -
  static inline bool use_goto_tb(DisasContext *s, target_ulong dest)
  {
- #ifndef CONFIG_USER_ONLY
+     DeviceClass *dc = DEVICE_CLASS(klass);
-@@ -XXX,XX +XXX,XX @@ static void gen_exception_return(DisasContext *s, TCGv_i32 pc)
++    ResettableClass *rc = RESETTABLE_CLASS(klass);
- #define CPU_V001 cpu_V0, cpu_V0, cpu_V1
++    rc->phases.enter = stellaris_i2c_reset_enter;
++    rc->phases.hold = stellaris_i2c_reset_hold;
--static inline void gen_neon_add(int size, TCGv_i32 t0, TCGv_i32 t1)
++    rc->phases.exit = stellaris_i2c_reset_exit;
--{
+     dc->vmsd = &vmstate_stellaris_i2c;
 -    switch (size) {
 -    case 0: gen_helper_neon_add_u8(t0, t0, t1); break;
 -    case 1: gen_helper_neon_add_u16(t0, t0, t1); break;
 -    case 2: tcg_gen_add_i32(t0, t0, t1); break;
 -    default: abort();
 -    }
 -}
 -
 -static inline void gen_neon_rsb(int size, TCGv_i32 t0, TCGv_i32 t1)
 -{
 -    switch (size) {
 -    case 0: gen_helper_neon_sub_u8(t0, t1, t0); break;
 -    case 1: gen_helper_neon_sub_u16(t0, t1, t0); break;
 -    case 2: tcg_gen_sub_i32(t0, t1, t0); break;
 -    default: return;
 -    }
 -}
 -
  static TCGv_i32 neon_load_scratch(int scratch)
  {
      TCGv_i32 tmp = tcg_temp_new_i32();
@@ -XXX,XX +XXX,XX @@ static void neon_store_scratch(int scratch, TCGv_i32 var)
      tcg_temp_free_i32(var);
  }
--static inline TCGv_i32 neon_get_scalar(int size, int reg)
--{
--    TCGv_i32 tmp;
--    if (size == 1) {
--        tmp = neon_load_reg(reg & 7, reg >> 4);
--        if (reg & 8) {
--            gen_neon_dup_high16(tmp);
--        } else {
--            gen_neon_dup_low16(tmp);
--        }
--    } else {
--        tmp = neon_load_reg(reg & 15, reg >> 4);
--    }
--    return tmp;
--}
--
- static int gen_neon_unzip(int rd, int rm, int size, int q)
- {
-     TCGv_ptr pd, pm;
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
-                     return 1;
-                 }
-                 switch (op) {
-+                case 0: /* Integer VMLA scalar */
-+                case 4: /* Integer VMLS scalar */
-+                case 8: /* Integer VMUL scalar */
-+                    return 1; /* handled by decodetree */
-+
-                 case 1: /* Float VMLA scalar */
-                 case 5: /* Floating point VMLS scalar */
-                 case 9: /* Floating point VMUL scalar */
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
-                         return 1;
-                     }
-                     /* fall through */
--                case 0: /* Integer VMLA scalar */
--                case 4: /* Integer VMLS scalar */
--                case 8: /* Integer VMUL scalar */
-                 case 12: /* VQDMULH scalar */
-                 case 13: /* VQRDMULH scalar */
-                     if (u && ((rd | rn) & 1)) {
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
-                             } else {
-                                 gen_helper_neon_qrdmulh_s32(tmp, cpu_env, tmp, tmp2);
-                             }
--                        } else if (op & 1) {
-+                        } else {
-                             TCGv_ptr fpstatus = get_fpstatus_ptr(1);
-                             gen_helper_vfp_muls(tmp, tmp, tmp2, fpstatus);
-                             tcg_temp_free_ptr(fpstatus);
--                        } else {
--                            switch (size) {
--                            case 0: gen_helper_neon_mul_u8(tmp, tmp, tmp2); break;
--                            case 1: gen_helper_neon_mul_u16(tmp, tmp, tmp2); break;
--                            case 2: tcg_gen_mul_i32(tmp, tmp, tmp2); break;
--                            default: abort();
--                            }
-                         }
-                         tcg_temp_free_i32(tmp2);
-                         if (op < 8) {
-                             /* Accumulate.  */
-                             tmp2 = neon_load_reg(rd, pass);
-                             switch (op) {
--                            case 0:
--                                gen_neon_add(size, tmp, tmp2);
--                                break;
-                             case 1:
-                             {
-                                 TCGv_ptr fpstatus = get_fpstatus_ptr(1);
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
-                                 tcg_temp_free_ptr(fpstatus);
-                                 break;
-                             }
--                            case 4:
--                                gen_neon_rsb(size, tmp, tmp2);
--                                break;
-                             case 5:
-                             {
-                                 TCGv_ptr fpstatus = get_fpstatus_ptr(1);
 --
-.20.1
+.34.1
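
Editorial sketch, not part of the series: the Stellaris I2C conversion above splits one reset function into three phases. The enter phase ends a transfer that is still in flight, the hold phase zeroes the registers, and the exit phase recomputes the interrupt line. The toy model below shows only that ordering. ToyI2C, the toy_* functions and the TOY_MCS_BUSBSY value are hypothetical, and the interrupt rule in toy_reset_exit() (raw status masked by the interrupt mask) is an assumption about what the update helper does, not something shown in the hunk. None of this uses the real Resettable API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MCS_BUSBSY 0x40u  /* illustrative bit value, an assumption */

typedef struct {
    uint32_t msa, mcs, mimr, mris, mcr;
    bool transfer_active;  /* stands in for the state of the I2C bus */
    bool irq_level;        /* stands in for the outbound IRQ line */
} ToyI2C;

/* Enter: touch other objects' view of us, here by ending the transfer. */
static void toy_reset_enter(ToyI2C *s)
{
    if (s->mcs & TOY_MCS_BUSBSY) {
        s->transfer_active = false;
    }
}

/* Hold: put the device's own registers into their reset state. */
static void toy_reset_hold(ToyI2C *s)
{
    s->msa = 0;
    s->mcs = 0;
    s->mimr = 0;
    s->mris = 0;
    s->mcr = 0;
}

/* Exit: recompute outputs from the now-reset registers. */
static void toy_reset_exit(ToyI2C *s)
{
    s->irq_level = (s->mris & s->mimr) != 0;
}

static void toy_reset(ToyI2C *s)
{
    assert(s != NULL);
    toy_reset_enter(s);
    toy_reset_hold(s);
    toy_reset_exit(s);
}
```

The point of the ordering is that the enter phase still sees the pre-reset MCS value, which is why the busy check cannot move into the hold phase after the registers are cleared.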

-[PULL 19/23] Implement configurable descriptor size in ftgmac100
+[PULL 21/35] hw/arm/stellaris: Add missing QOM 'machine' parent
-From: Erik Smit <erik.lucas.smit@gmail.com>
+From: Philippe Mathieu-Daudé <philmd@linaro.org>
-The hardware supports configurable descriptor sizes, configured in the DBLAC
+QDev objects created with qdev_new() need to manually add
-register.
+their parent relationship with object_property_add_child().
-Most drivers use the default 4 word descriptor, which is currently hardcoded,
+This commit plugs the devices which aren't part of the SoC;
-but Aspeed SDK configures 8 words to store extra data.
+they will be plugged into a SoC container in the next one.
-Signed-off-by: Erik Smit <erik.lucas.smit@gmail.com>
+Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-Reviewed-by: Cédric Le Goater <clg@kaod.org>
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-[PMM: removed unnecessary parens]
+Message-id: 20240213155214.13619-4-philmd@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/net/ftgmac100.c | 26 ++++++++++++++++++++++++--
+ hw/arm/stellaris.c | 4 ++++
- 1 file changed, 24 insertions(+), 2 deletions(-)
+ 1 file changed, 4 insertions(+)
-diff --git a/hw/net/ftgmac100.c b/hw/net/ftgmac100.c
+diff --git a/hw/arm/stellaris.c b/hw/arm/stellaris.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/net/ftgmac100.c
+--- a/hw/arm/stellaris.c
-+++ b/hw/net/ftgmac100.c
++++ b/hw/arm/stellaris.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
- #define FTGMAC100_APTC_TXPOLL_CNT(x)        (((x) >> 8) & 0xf)
+                                    &error_fatal);
- #define FTGMAC100_APTC_TXPOLL_TIME_SEL      (1 << 12)
+             ssddev = qdev_new("ssd0323");
-+/*
++            object_property_add_child(OBJECT(ms), "oled", OBJECT(ssddev));
-+ * DMA burst length and arbitration control register
+             qdev_prop_set_uint8(ssddev, "cs", 1);
-+ */
+             qdev_realize_and_unref(ssddev, bus, &error_fatal);
-+#define FTGMAC100_DBLAC_RXBURST_SIZE(x)     (((x) >> 8) & 0x3)
-+#define FTGMAC100_DBLAC_TXBURST_SIZE(x)     (((x) >> 10) & 0x3)
+             gpio_d_splitter = qdev_new(TYPE_SPLIT_IRQ);
-+#define FTGMAC100_DBLAC_RXDES_SIZE(x)       ((((x) >> 12) & 0xf) * 8)
++            object_property_add_child(OBJECT(ms), "splitter",
-+#define FTGMAC100_DBLAC_TXDES_SIZE(x)       ((((x) >> 16) & 0xf) * 8)
++                                      OBJECT(gpio_d_splitter));
-+#define FTGMAC100_DBLAC_IFG_CNT(x)          (((x) >> 20) & 0x7)
+             qdev_prop_set_uint32(gpio_d_splitter, "num-lines", 2);
-+#define FTGMAC100_DBLAC_IFG_INC             (1 << 23)
+             qdev_realize_and_unref(gpio_d_splitter, NULL, &error_fatal);
-+
+             qdev_connect_gpio_out(
- /*
+@@ -XXX,XX +XXX,XX @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
-  * PHY control register
+         DeviceState *gpad;
-  */
-@@ -XXX,XX +XXX,XX @@ static void ftgmac100_do_tx(FTGMAC100State *s, uint32_t tx_ring,
+         gpad = qdev_new(TYPE_STELLARIS_GAMEPAD);
-         if (bd.des0 & s->txdes0_edotr) {
++        object_property_add_child(OBJECT(ms), "gamepad", OBJECT(gpad));
-             addr = tx_ring;
+         for (i = 0; i < ARRAY_SIZE(gpad_keycode); i++) {
-         } else {
+             qlist_append_int(gpad_keycode_list, gpad_keycode[i]);
 -            addr += sizeof(FTGMAC100Desc);
 +            addr += FTGMAC100_DBLAC_TXDES_SIZE(s->dblac);
          }
-     }
-@@ -XXX,XX +XXX,XX @@ static void ftgmac100_write(void *opaque, hwaddr addr,
-         s->phydata = value & 0xffff;
-         break;
-     case FTGMAC100_DBLAC: /* DMA Burst Length and Arbitration Control */
-+        if (FTGMAC100_DBLAC_TXDES_SIZE(s->dblac) < sizeof(FTGMAC100Desc)) {
-+            qemu_log_mask(LOG_GUEST_ERROR,
-+                          "%s: transmit descriptor too small : %d bytes\n",
-+                          __func__, FTGMAC100_DBLAC_TXDES_SIZE(s->dblac));
-+            break;
-+        }
-+        if (FTGMAC100_DBLAC_RXDES_SIZE(s->dblac) < sizeof(FTGMAC100Desc)) {
-+            qemu_log_mask(LOG_GUEST_ERROR,
-+                          "%s: receive descriptor too small : %d bytes\n",
-+                          __func__, FTGMAC100_DBLAC_RXDES_SIZE(s->dblac));
-+            break;
-+        }
-         s->dblac = value;
-         break;
-     case FTGMAC100_REVR:  /* Feature Register */
-@@ -XXX,XX +XXX,XX @@ static ssize_t ftgmac100_receive(NetClientState *nc, const uint8_t *buf,
-         if (bd.des0 & s->rxdes0_edorr) {
-             addr = s->rx_ring;
-         } else {
--            addr += sizeof(FTGMAC100Desc);
-+            addr += FTGMAC100_DBLAC_RXDES_SIZE(s->dblac);
-         }
-     }
-     s->rx_descriptor = addr;
 --
-.20.1
+.34.1
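
As a rough illustration of the descriptor-size handling in the ftgmac100 patch above, the sketch below decodes the receive and transmit descriptor sizes from a DBLAC register value and uses the transmit size as the ring stride. The field positions follow the macros in the patch. The 16-byte minimum (`DESC_MIN_BYTES`) and the function names are assumptions made for this example, not QEMU identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field layout as in the patch: a 4-bit count of 8-byte units. */
#define DBLAC_RXDES_SIZE(x) ((((x) >> 12) & 0xf) * 8)
#define DBLAC_TXDES_SIZE(x) ((((x) >> 16) & 0xf) * 8)

/* Assumed size of the in-memory descriptor structure (hypothetical). */
#define DESC_MIN_BYTES 16u

/* Return true if both descriptor sizes can hold a full descriptor. */
static bool dblac_value_ok(uint32_t value)
{
    return DBLAC_TXDES_SIZE(value) >= DESC_MIN_BYTES &&
           DBLAC_RXDES_SIZE(value) >= DESC_MIN_BYTES;
}

/* Advance a transmit ring address by the configured descriptor size. */
static uint32_t next_tx_desc(uint32_t addr, uint32_t dblac)
{
    return addr + DBLAC_TXDES_SIZE(dblac);
}
```

The sketch validates the candidate value before it would be stored, which is one possible ordering; it only shows the arithmetic and is not the device model's actual control flow.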

-[PULL 11/23] target/arm: Convert Neon 2-reg-scalar float multiplies to decodetree
+[PULL 22/35] hw/arm/stellaris: Add missing QOM 'SoC' parent
-Convert the float versions of VMLA, VMLS and VMUL in the Neon
+From: Philippe Mathieu-Daudé <philmd@linaro.org>
 -reg-scalar group to decodetree.
+QDev objects created with qdev_new() need to manually add
+their parent relationship with object_property_add_child().
+Since we don't model the SoC, just use a QOM container.
+Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Message-id: 20240213155214.13619-5-philmd@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
-As noted in the comment on the WRAP_FP_FN macro, we could have
+ hw/arm/stellaris.c | 11 ++++++++++-
-had a do_2scalar_fp() function, but for 3 insns it seemed
+file changed, 10 insertions(+), 1 deletion(-)
 simpler to just do the wrapping to get hold of the fpstatus ptr.
 (These are the only fp insns in the group.)
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
  target/arm/neon-dp.decode       |  3 ++
  target/arm/translate-neon.inc.c | 65 +++++++++++++++++++++++++++++++++
  target/arm/translate.c          | 37 ++-----------------
 files changed, 71 insertions(+), 34 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+diff --git a/hw/arm/stellaris.c b/hw/arm/stellaris.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
+--- a/hw/arm/stellaris.c
-+++ b/target/arm/neon-dp.decode
++++ b/hw/arm/stellaris.c
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+@@ -XXX,XX +XXX,XX @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
-                  &2scalar vm=%vm_dp vn=%vn_dp vd=%vd_dp
+      * 400fe000 system control
+      */
-     VMLA_2sc     1111 001 . 1 . .. .... .... 0000 . 1 . 0 .... @2scalar
-+    VMLA_F_2sc   1111 001 . 1 . .. .... .... 0001 . 1 . 0 .... @2scalar
++    Object *soc_container;
+     DeviceState *gpio_dev[7], *nvic;
-     VMLS_2sc     1111 001 . 1 . .. .... .... 0100 . 1 . 0 .... @2scalar
+     qemu_irq gpio_in[7][8];
-+    VMLS_F_2sc   1111 001 . 1 . .. .... .... 0101 . 1 . 0 .... @2scalar
+     qemu_irq gpio_out[7][8];
+@@ -XXX,XX +XXX,XX @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
-     VMUL_2sc     1111 001 . 1 . .. .... .... 1000 . 1 . 0 .... @2scalar
+     flash_size = (((board->dc0 & 0xffff) + 1) << 1) * 1024;
-+    VMUL_F_2sc   1111 001 . 1 . .. .... .... 1001 . 1 . 0 .... @2scalar
+     sram_size = ((board->dc0 >> 18) + 1) * 1024;
-   ]
- }
++    soc_container = object_new("container");
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
++    object_property_add_child(OBJECT(ms), "soc", soc_container);
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.inc.c
 +++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VMLS_2sc(DisasContext *s, arg_2scalar *a)
      return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
  }
 +
-+/*
+     /* Flash programming is done via the SCU, so pretend it is ROM.  */
-+ * Rather than have a float-specific version of do_2scalar just for
+     memory_region_init_rom(flash, NULL, "stellaris.flash", flash_size,
-+ * three insns, we wrap a NeonGenTwoSingleOpFn to turn it into
+                            &error_fatal);
-+ * a NeonGenTwoOpFn.
+@@ -XXX,XX +XXX,XX @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
-+ */
+      * need its sysclk output.
-+#define WRAP_FP_FN(WRAPNAME, FUNC)                              \
+      */
-+    static void WRAPNAME(TCGv_i32 rd, TCGv_i32 rn, TCGv_i32 rm) \
+     ssys_dev = qdev_new(TYPE_STELLARIS_SYS);
-+    {                                                           \
++    object_property_add_child(soc_container, "sys", OBJECT(ssys_dev));
-+        TCGv_ptr fpstatus = get_fpstatus_ptr(1);                \
-+        FUNC(rd, rn, rm, fpstatus);                             \
+     /*
-+        tcg_temp_free_ptr(fpstatus);                            \
+      * Most devices come preprogrammed with a MAC address in the user data.
-+    }
+@@ -XXX,XX +XXX,XX @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
-+
+     sysbus_realize_and_unref(SYS_BUS_DEVICE(ssys_dev), &error_fatal);
-+WRAP_FP_FN(gen_VMUL_F_mul, gen_helper_vfp_muls)
-+WRAP_FP_FN(gen_VMUL_F_add, gen_helper_vfp_adds)
+     nvic = qdev_new(TYPE_ARMV7M);
-+WRAP_FP_FN(gen_VMUL_F_sub, gen_helper_vfp_subs)
++    object_property_add_child(soc_container, "v7m", OBJECT(nvic));
-+
+     qdev_prop_set_uint32(nvic, "num-irq", NUM_IRQ_LINES);
-+static bool trans_VMUL_F_2sc(DisasContext *s, arg_2scalar *a)
+     qdev_prop_set_uint8(nvic, "num-prio-bits", NUM_PRIO_BITS);
-+{
+     qdev_prop_set_string(nvic, "cpu-type", ms->cpu_type);
-+    static NeonGenTwoOpFn * const opfn[] = {
+@@ -XXX,XX +XXX,XX @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
-+        NULL,
-+        NULL, /* TODO: fp16 support */
+             dev = qdev_new(TYPE_STELLARIS_GPTM);
-+        gen_VMUL_F_mul,
+             sbd = SYS_BUS_DEVICE(dev);
-+        NULL,
++            object_property_add_child(soc_container, "gptm[*]", OBJECT(dev));
-+    };
+             qdev_connect_clock_in(dev, "clk",
-+
+                                   qdev_get_clock_out(ssys_dev, "SYSCLK"));
-+    return do_2scalar(s, a, opfn[a->size], NULL);
+             sysbus_realize_and_unref(sbd, &error_fatal);
-+}
+@@ -XXX,XX +XXX,XX @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
-+
-+static bool trans_VMLA_F_2sc(DisasContext *s, arg_2scalar *a)
+     if (board->dc1 & (1 << 3)) { /* watchdog present */
-+{
+         dev = qdev_new(TYPE_LUMINARY_WATCHDOG);
 +    static NeonGenTwoOpFn * const opfn[] = {
 +        NULL,
 +        NULL, /* TODO: fp16 support */
 +        gen_VMUL_F_mul,
 +        NULL,
 +    };
 +    static NeonGenTwoOpFn * const accfn[] = {
 +        NULL,
 +        NULL, /* TODO: fp16 support */
 +        gen_VMUL_F_add,
 +        NULL,
 +    };
 +
 +    return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
 +}
 +
 +static bool trans_VMLS_F_2sc(DisasContext *s, arg_2scalar *a)
 +{
 +    static NeonGenTwoOpFn * const opfn[] = {
 +        NULL,
 +        NULL, /* TODO: fp16 support */
 +        gen_VMUL_F_mul,
 +        NULL,
 +    };
 +    static NeonGenTwoOpFn * const accfn[] = {
 +        NULL,
 +        NULL, /* TODO: fp16 support */
 +        gen_VMUL_F_sub,
 +        NULL,
 +    };
 +
 +    return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
 +}
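
The wrapping idea described in the WRAP_FP_FN comment can be sketched outside of TCG as follows. This is a minimal stand-alone illustration, assuming made-up types and names (`status_t`, `two_op_fn`, `WRAP_STATUS_FN`, `apply`); it only shows how a macro can adapt a function that needs an extra status argument to a narrower function-pointer type, and how a NULL table entry can signal an unsupported size.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the floating-point status the helpers need. */
typedef struct { int ops; } status_t;
static status_t global_status;

/* The narrower signature that the generic dispatcher expects. */
typedef uint32_t two_op_fn(uint32_t a, uint32_t b);

/* A helper that needs the extra status pointer. */
static uint32_t add_with_status(uint32_t a, uint32_t b, status_t *st)
{
    st->ops++;
    return a + b;
}

/* Generate a two-argument wrapper that supplies the status pointer. */
#define WRAP_STATUS_FN(WRAPNAME, FUNC)               \
    static uint32_t WRAPNAME(uint32_t a, uint32_t b) \
    {                                                \
        status_t *st = &global_status;               \
        return FUNC(a, b, st);                       \
    }

WRAP_STATUS_FN(wrapped_add, add_with_status)

/* Per-size dispatch table; NULL marks sizes with no implementation. */
static two_op_fn * const opfn[] = { NULL, NULL, wrapped_add, NULL };

/* Return 0 and leave *out untouched if the size has no handler. */
static int apply(unsigned size, uint32_t a, uint32_t b, uint32_t *out)
{
    if (size >= sizeof(opfn) / sizeof(opfn[0]) || !opfn[size]) {
        return 0;
    }
    *out = opfn[size](a, b);
    return 1;
}
```

The trade-off is the one the cover note mentions: for a handful of functions, generating wrappers is shorter than writing a second dispatcher that threads the status pointer through.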
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                  case 0: /* Integer VMLA scalar */
                  case 4: /* Integer VMLS scalar */
                  case 8: /* Integer VMUL scalar */
 -                    return 1; /* handled by decodetree */
 -
-                 case 1: /* Float VMLA scalar */
++        object_property_add_child(soc_container, "wdg", OBJECT(dev));
-                 case 5: /* Floating point VMLS scalar */
+         qdev_connect_clock_in(dev, "WDOGCLK",
-                 case 9: /* Floating point VMUL scalar */
+                               qdev_get_clock_out(ssys_dev, "SYSCLK"));
--                    if (size == 1) {
--                        return 1;
+@@ -XXX,XX +XXX,XX @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
--                    }
+             SysBusDevice *sbd;
--                    /* fall through */
-+                    return 1; /* handled by decodetree */
+             dev = qdev_new("pl011_luminary");
-+
++            object_property_add_child(soc_container, "uart[*]", OBJECT(dev));
-                 case 12: /* VQDMULH scalar */
+             sbd = SYS_BUS_DEVICE(dev);
-                 case 13: /* VQRDMULH scalar */
+             qdev_prop_set_chr(dev, "chardev", serial_hd(i));
-                     if (u && ((rd | rn) & 1)) {
+             sysbus_realize_and_unref(sbd, &error_fatal);
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
-                             } else {
+         DeviceState *enet;
-                                 gen_helper_neon_qdmulh_s32(tmp, cpu_env, tmp, tmp2);
-                             }
+         enet = qdev_new("stellaris_enet");
--                        } else if (op == 13) {
++        object_property_add_child(soc_container, "enet", OBJECT(enet));
-+                        } else {
+         if (nd) {
-                             if (size == 1) {
+             qdev_set_nic_properties(enet, nd);
-                                 gen_helper_neon_qrdmulh_s16(tmp, cpu_env, tmp, tmp2);
+         } else {
                              } else {
                                  gen_helper_neon_qrdmulh_s32(tmp, cpu_env, tmp, tmp2);
                              }
 -                        } else {
 -                            TCGv_ptr fpstatus = get_fpstatus_ptr(1);
 -                            gen_helper_vfp_muls(tmp, tmp, tmp2, fpstatus);
 -                            tcg_temp_free_ptr(fpstatus);
                          }
                          tcg_temp_free_i32(tmp2);
 -                        if (op < 8) {
 -                            /* Accumulate.  */
 -                            tmp2 = neon_load_reg(rd, pass);
 -                            switch (op) {
 -                            case 1:
 -                            {
 -                                TCGv_ptr fpstatus = get_fpstatus_ptr(1);
 -                                gen_helper_vfp_adds(tmp, tmp, tmp2, fpstatus);
 -                                tcg_temp_free_ptr(fpstatus);
 -                                break;
 -                            }
 -                            case 5:
 -                            {
 -                                TCGv_ptr fpstatus = get_fpstatus_ptr(1);
 -                                gen_helper_vfp_subs(tmp, tmp2, tmp, fpstatus);
 -                                tcg_temp_free_ptr(fpstatus);
 -                                break;
 -                            }
 -                            default:
 -                                abort();
 -                            }
 -                            tcg_temp_free_i32(tmp2);
 -                        }
                          neon_store_reg(rd, pass, tmp);
                      }
                      break;
 --
-.20.1
+.34.1

-[PULL 13/23] target/arm: Convert Neon 2-reg-scalar VQRDMLAH, VQRDMLSH to decodetree
+[PULL 23/35] target/arm: Use new CBAR encoding for all v8 CPUs, not all aarch64 CPUs
-Convert the VQRDMLAH and VQRDMLSH insns in the 2-reg-scalar
+We support two different encodings for the AArch32 IMPDEF
-group to decodetree.
+CBAR register -- older cores like the Cortex A9, A7, A15
 have this at 4, c15, c0, 0; newer cores like the
 Cortex A35, A53, A57 and A72 have it at 1 c15 c0 0.
 When we implemented this we picked which encoding to
 use based on whether the CPU set ARM_FEATURE_AARCH64.
 However this isn't right for three cases:
  * the qemu-system-arm 'max' CPU, which is supposed to be
    a variant on a Cortex-A57; it ought to use the same
    encoding the A57 does and which the AArch64 'max'
    exposes to AArch32 guest code
  * the Cortex-R52, which is AArch32-only but has the CBAR
    at the newer encoding (and where we incorrectly are
    not yet setting ARM_FEATURE_CBAR_RO anyway)
  * any possible future support for other v8 AArch32
    only CPUs, or for supporting "boot the CPU into
    AArch32 mode" on our existing cores like the A57 etc
 Make the decision of the encoding be based on whether
 the CPU implements the ARM_FEATURE_V8 flag instead.
 This changes the behaviour only for the qemu-system-arm
 '-cpu max'. We don't expect anybody to be relying on the
 old behaviour because:
  * it's not what the real hardware Cortex-A57 does
    (and that's what our ID register claims we are)
  * we don't implement the memory-mapped GICv3 support
    which is the only thing that exists at the peripheral
    base address pointed to by the register
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20240206132931.38376-2-peter.maydell@linaro.org
 ---
- target/arm/neon-dp.decode       |  3 ++
+ target/arm/helper.c | 2 +-
- target/arm/translate-neon.inc.c | 74 +++++++++++++++++++++++++++++++++
+file changed, 1 insertion(+), 1 deletion(-)
  target/arm/translate.c          | 38 +----------------
 files changed, 79 insertions(+), 36 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
+--- a/target/arm/helper.c
-+++ b/target/arm/neon-dp.decode
++++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
+          * AArch64 cores we might need to add a specific feature flag
-     VQDMULH_2sc  1111 001 . 1 . .. .... .... 1100 . 1 . 0 .... @2scalar
+          * to indicate cores with "flavour 2" CBAR.
-     VQRDMULH_2sc 1111 001 . 1 . .. .... .... 1101 . 1 . 0 .... @2scalar
+          */
-+
+-        if (arm_feature(env, ARM_FEATURE_AARCH64)) {
-+    VQRDMLAH_2sc 1111 001 . 1 . .. .... .... 1110 . 1 . 0 .... @2scalar
++        if (arm_feature(env, ARM_FEATURE_V8)) {
-+    VQRDMLSH_2sc 1111 001 . 1 . .. .... .... 1111 . 1 . 0 .... @2scalar
+             /* 32 bit view is [31:18] 0...0 [43:32]. */
-   ]
+             uint32_t cbar32 = (extract64(cpu->reset_cbar, 18, 14) << 18)
- }
+                 | extract64(cpu->reset_cbar, 32, 12);
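
The "32 bit view" computed in the hunk above can be tried in isolation with the sketch below. It assumes a local stand-in for QEMU's extract64() (here called `extract64_sketch`) and a hypothetical wrapper name `cbar32_view`; the bit positions are taken from the comment and expression in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for extract64(): return 'length' bits of 'value' from bit 'start'. */
static uint64_t extract64_sketch(uint64_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 64 - start);
    return (value >> start) & (~0ULL >> (64 - length));
}

/*
 * Build the 32-bit view: bits [31:18] of the base address stay in place,
 * and bits [43:32] are folded into bits [11:0]; bits [17:12] are zero.
 */
static uint32_t cbar32_view(uint64_t reset_cbar)
{
    return (uint32_t)((extract64_sketch(reset_cbar, 18, 14) << 18)
                      | extract64_sketch(reset_cbar, 32, 12));
}
```

For a base address below 4GB whose low 18 bits are zero, the view is the address itself; any address bits above bit 31 appear in the low 12 bits of the result.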
 diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.inc.c
 +++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VQRDMULH_2sc(DisasContext *s, arg_2scalar *a)
      return do_2scalar(s, a, opfn[a->size], NULL);
  }
 +
 +static bool do_vqrdmlah_2sc(DisasContext *s, arg_2scalar *a,
 +                            NeonGenThreeOpEnvFn *opfn)
 +{
 +    /*
 +     * VQRDMLAH/VQRDMLSH: this is like do_2scalar, but the opfn
 +     * performs a kind of fused op-then-accumulate using a helper
 +     * function that takes all of rd, rn and the scalar at once.
 +     */
 +    TCGv_i32 scalar;
 +    int pass;
 +
 +    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
 +        return false;
 +    }
 +
 +    if (!dc_isar_feature(aa32_rdm, s)) {
 +        return false;
 +    }
 +
 +    /* UNDEF accesses to D16-D31 if they don't exist. */
 +    if (!dc_isar_feature(aa32_simd_r32, s) &&
 +        ((a->vd | a->vn | a->vm) & 0x10)) {
 +        return false;
 +    }
 +
 +    if (!opfn) {
 +        /* Bad size (including size == 3, which is a different insn group) */
 +        return false;
 +    }
 +
 +    if (a->q && ((a->vd | a->vn) & 1)) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    scalar = neon_get_scalar(a->size, a->vm);
 +
 +    for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 +        TCGv_i32 rn = neon_load_reg(a->vn, pass);
 +        TCGv_i32 rd = neon_load_reg(a->vd, pass);
 +        opfn(rd, cpu_env, rn, scalar, rd);
 +        tcg_temp_free_i32(rn);
 +        neon_store_reg(a->vd, pass, rd);
 +    }
 +    tcg_temp_free_i32(scalar);
 +
 +    return true;
 +}
 +
 +static bool trans_VQRDMLAH_2sc(DisasContext *s, arg_2scalar *a)
 +{
 +    static NeonGenThreeOpEnvFn *opfn[] = {
 +        NULL,
 +        gen_helper_neon_qrdmlah_s16,
 +        gen_helper_neon_qrdmlah_s32,
 +        NULL,
 +    };
 +    return do_vqrdmlah_2sc(s, a, opfn[a->size]);
 +}
 +
 +static bool trans_VQRDMLSH_2sc(DisasContext *s, arg_2scalar *a)
 +{
 +    static NeonGenThreeOpEnvFn *opfn[] = {
 +        NULL,
 +        gen_helper_neon_qrdmlsh_s16,
 +        gen_helper_neon_qrdmlsh_s32,
 +        NULL,
 +    };
 +    return do_vqrdmlah_2sc(s, a, opfn[a->size]);
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                  case 9: /* Floating point VMUL scalar */
                  case 12: /* VQDMULH scalar */
                  case 13: /* VQRDMULH scalar */
 +                case 14: /* VQRDMLAH scalar */
 +                case 15: /* VQRDMLSH scalar */
                      return 1; /* handled by decodetree */
                  case 3: /* VQDMLAL scalar */
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                          neon_store_reg64(cpu_V0, rd + pass);
                      }
                      break;
 -                case 14: /* VQRDMLAH scalar */
 -                case 15: /* VQRDMLSH scalar */
 -                    {
 -                        NeonGenThreeOpEnvFn *fn;
 -
 -                        if (!dc_isar_feature(aa32_rdm, s)) {
 -                            return 1;
 -                        }
 -                        if (u && ((rd | rn) & 1)) {
 -                            return 1;
 -                        }
 -                        if (op == 14) {
 -                            if (size == 1) {
 -                                fn = gen_helper_neon_qrdmlah_s16;
 -                            } else {
 -                                fn = gen_helper_neon_qrdmlah_s32;
 -                            }
 -                        } else {
 -                            if (size == 1) {
 -                                fn = gen_helper_neon_qrdmlsh_s16;
 -                            } else {
 -                                fn = gen_helper_neon_qrdmlsh_s32;
 -                            }
 -                        }
 -
 -                        tmp2 = neon_get_scalar(size, rm);
 -                        for (pass = 0; pass < (u ? 4 : 2); pass++) {
 -                            tmp = neon_load_reg(rn, pass);
 -                            tmp3 = neon_load_reg(rd, pass);
 -                            fn(tmp, cpu_env, tmp, tmp2, tmp3);
 -                            tcg_temp_free_i32(tmp3);
 -                            neon_store_reg(rd, pass, tmp);
 -                        }
 -                        tcg_temp_free_i32(tmp2);
 -                    }
 -                    break;
                  default:
                      g_assert_not_reached();
                  }
 --
-.20.1
+.34.1

-[PULL 09/23] target/arm: Add missing TCG temp free in do_2shift_env_64()
+[PULL 24/35] target/arm: The Cortex-R52 has a read-only CBAR
-In commit 37bfce81b10450071 we accidentally introduced a leak of a TCG
+The Cortex-R52 implements the Configuration Base Address Register
-temporary in do_2shift_env_64(); free it.
+(CBAR), as a read-only register.  Add ARM_FEATURE_CBAR_RO to this CPU
 type, so that our implementation provides the register and the
 associated qdev property.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20240206132931.38376-3-peter.maydell@linaro.org
 ---
- target/arm/translate-neon.inc.c | 1 +
+ target/arm/tcg/cpu32.c | 1 +
 file changed, 1 insertion(+)
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
+diff --git a/target/arm/tcg/cpu32.c b/target/arm/tcg/cpu32.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
+--- a/target/arm/tcg/cpu32.c
-+++ b/target/arm/translate-neon.inc.c
++++ b/target/arm/tcg/cpu32.c
-@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_64(DisasContext *s, arg_2reg_shift *a,
+@@ -XXX,XX +XXX,XX @@ static void cortex_r52_initfn(Object *obj)
-         neon_load_reg64(tmp, a->vm + pass);
+     set_feature(&cpu->env, ARM_FEATURE_PMSA);
-         fn(tmp, cpu_env, tmp, constimm);
+     set_feature(&cpu->env, ARM_FEATURE_NEON);
-         neon_store_reg64(tmp, a->vd + pass);
+     set_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER);
-+        tcg_temp_free_i64(tmp);
++    set_feature(&cpu->env, ARM_FEATURE_CBAR_RO);
-     }
+     cpu->midr = 0x411fd133; /* r1p3 */
-     tcg_temp_free_i64(constimm);
+     cpu->revidr = 0x00000000;
-     return true;
+     cpu->reset_fpsid = 0x41034023;
 --
-.20.1
+.34.1

-[PULL 12/23] target/arm: Convert Neon 2-reg-scalar VQDMULH, VQRDMULH to decodetree
+[PULL 25/35] target/arm: Add Cortex-R52 IMPDEF sysregs
-Convert the VQDMULH and VQRDMULH insns in the 2-reg-scalar group
+Add the Cortex-R52 IMPDEF sysregs, by defining them here and
-to decodetree.
+also by enabling the AUXCR feature which defines the ACTLR
 and HACTLR registers. As is our usual practice, we make these
 simple reads-as-zero stubs for now.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20240206132931.38376-4-peter.maydell@linaro.org
 ---
- target/arm/neon-dp.decode       |  3 +++
+ target/arm/tcg/cpu32.c | 108 +++++++++++++++++++++++++++++++++++++++++
- target/arm/translate-neon.inc.c | 29 +++++++++++++++++++++++
+file changed, 108 insertions(+)
  target/arm/translate.c          | 42 ++-------------------------------
 files changed, 34 insertions(+), 40 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+diff --git a/target/arm/tcg/cpu32.c b/target/arm/tcg/cpu32.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
+--- a/target/arm/tcg/cpu32.c
-+++ b/target/arm/neon-dp.decode
++++ b/target/arm/tcg/cpu32.c
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+@@ -XXX,XX +XXX,XX @@ static void cortex_r5_initfn(Object *obj)
+     define_arm_cp_regs(cpu, cortexr5_cp_reginfo);
-     VMUL_2sc     1111 001 . 1 . .. .... .... 1000 . 1 . 0 .... @2scalar
+ }
-     VMUL_F_2sc   1111 001 . 1 . .. .... .... 1001 . 1 . 0 .... @2scalar
 +static const ARMCPRegInfo cortex_r52_cp_reginfo[] = {
 +    { .name = "CPUACTLR", .cp = 15, .opc1 = 0, .crm = 15,
 +      .access = PL1_RW, .type = ARM_CP_CONST | ARM_CP_64BIT, .resetvalue = 0 },
 +    { .name = "IMP_ATCMREGIONR",
 +      .cp = 15, .opc1 = 0, .crn = 9, .crm = 1, .opc2 = 0,
 +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_BTCMREGIONR",
 +      .cp = 15, .opc1 = 0, .crn = 9, .crm = 1, .opc2 = 1,
 +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_CTCMREGIONR",
 +      .cp = 15, .opc1 = 0, .crn = 9, .crm = 1, .opc2 = 2,
 +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_CSCTLR",
 +      .cp = 15, .opc1 = 1, .crn = 9, .crm = 1, .opc2 = 0,
 +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_BPCTLR",
 +      .cp = 15, .opc1 = 1, .crn = 9, .crm = 1, .opc2 = 1,
 +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_MEMPROTCLR",
 +      .cp = 15, .opc1 = 1, .crn = 9, .crm = 1, .opc2 = 2,
 +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_SLAVEPCTLR",
 +      .cp = 15, .opc1 = 0, .crn = 11, .crm = 0, .opc2 = 0,
 +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_PERIPHREGIONR",
 +      .cp = 15, .opc1 = 0, .crn = 15, .crm = 0, .opc2 = 0,
 +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_FLASHIFREGIONR",
 +      .cp = 15, .opc1 = 0, .crn = 15, .crm = 0, .opc2 = 1,
 +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_BUILDOPTR",
 +      .cp = 15, .opc1 = 0, .crn = 15, .crm = 2, .opc2 = 0,
 +      .access = PL1_R, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_PINOPTR",
 +      .cp = 15, .opc1 = 0, .crn = 15, .crm = 2, .opc2 = 7,
 +      .access = PL1_R, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_QOSR",
 +      .cp = 15, .opc1 = 1, .crn = 15, .crm = 3, .opc2 = 1,
 +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_BUSTIMEOUTR",
 +      .cp = 15, .opc1 = 1, .crn = 15, .crm = 3, .opc2 = 2,
 +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_INTMONR",
 +      .cp = 15, .opc1 = 1, .crn = 15, .crm = 3, .opc2 = 4,
 +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_ICERR0",
 +      .cp = 15, .opc1 = 2, .crn = 15, .crm = 0, .opc2 = 0,
 +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_ICERR1",
 +      .cp = 15, .opc1 = 2, .crn = 15, .crm = 0, .opc2 = 1,
 +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_DCERR0",
 +      .cp = 15, .opc1 = 2, .crn = 15, .crm = 1, .opc2 = 0,
 +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_DCERR1",
 +      .cp = 15, .opc1 = 2, .crn = 15, .crm = 1, .opc2 = 1,
 +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_TCMERR0",
 +      .cp = 15, .opc1 = 2, .crn = 15, .crm = 2, .opc2 = 0,
 +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_TCMERR1",
 +      .cp = 15, .opc1 = 2, .crn = 15, .crm = 2, .opc2 = 1,
 +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_TCMSYNDR0",
 +      .cp = 15, .opc1 = 2, .crn = 15, .crm = 2, .opc2 = 2,
 +      .access = PL1_R, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_TCMSYNDR1",
 +      .cp = 15, .opc1 = 2, .crn = 15, .crm = 2, .opc2 = 3,
 +      .access = PL1_R, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_FLASHERR0",
 +      .cp = 15, .opc1 = 2, .crn = 15, .crm = 3, .opc2 = 0,
 +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_FLASHERR1",
 +      .cp = 15, .opc1 = 2, .crn = 15, .crm = 3, .opc2 = 1,
 +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_CDBGDR0",
 +      .cp = 15, .opc1 = 3, .crn = 15, .crm = 0, .opc2 = 0,
 +      .access = PL1_R, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_CBDGBR1",
 +      .cp = 15, .opc1 = 3, .crn = 15, .crm = 0, .opc2 = 1,
 +      .access = PL1_R, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_TESTR0",
 +      .cp = 15, .opc1 = 4, .crn = 15, .crm = 0, .opc2 = 0,
 +      .access = PL1_R, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_TESTR1",
 +      .cp = 15, .opc1 = 4, .crn = 15, .crm = 0, .opc2 = 1,
 +      .access = PL1_W, .type = ARM_CP_NOP, .resetvalue = 0 },
 +    { .name = "IMP_CDBGDCI",
 +      .cp = 15, .opc1 = 0, .crn = 15, .crm = 15, .opc2 = 0,
 +      .access = PL1_W, .type = ARM_CP_NOP, .resetvalue = 0 },
 +    { .name = "IMP_CDBGDCT",
 +      .cp = 15, .opc1 = 3, .crn = 15, .crm = 2, .opc2 = 0,
 +      .access = PL1_W, .type = ARM_CP_NOP, .resetvalue = 0 },
 +    { .name = "IMP_CDBGICT",
 +      .cp = 15, .opc1 = 3, .crn = 15, .crm = 2, .opc2 = 1,
 +      .access = PL1_W, .type = ARM_CP_NOP, .resetvalue = 0 },
 +    { .name = "IMP_CDBGDCD",
 +      .cp = 15, .opc1 = 3, .crn = 15, .crm = 4, .opc2 = 0,
 +      .access = PL1_W, .type = ARM_CP_NOP, .resetvalue = 0 },
 +    { .name = "IMP_CDBGICD",
 +      .cp = 15, .opc1 = 3, .crn = 15, .crm = 4, .opc2 = 1,
 +      .access = PL1_W, .type = ARM_CP_NOP, .resetvalue = 0 },
 +};
 +
-+    VQDMULH_2sc  1111 001 . 1 . .. .... .... 1100 . 1 . 0 .... @2scalar
++
-+    VQRDMULH_2sc 1111 001 . 1 . .. .... .... 1101 . 1 . 0 .... @2scalar
+ static void cortex_r52_initfn(Object *obj)
-   ]
+ {
      ARMCPU *cpu = ARM_CPU(obj);
@@ -XXX,XX +XXX,XX @@ static void cortex_r52_initfn(Object *obj)
      set_feature(&cpu->env, ARM_FEATURE_NEON);
      set_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER);
      set_feature(&cpu->env, ARM_FEATURE_CBAR_RO);
 +    set_feature(&cpu->env, ARM_FEATURE_AUXCR);
      cpu->midr = 0x411fd133; /* r1p3 */
      cpu->revidr = 0x00000000;
      cpu->reset_fpsid = 0x41034023;
@@ -XXX,XX +XXX,XX @@ static void cortex_r52_initfn(Object *obj)
      cpu->pmsav7_dregion = 16;
      cpu->pmsav8r_hdregion = 16;
 +
 +    define_arm_cp_regs(cpu, cortex_r52_cp_reginfo);
  }
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
-index XXXXXXX..XXXXXXX 100644
+ static void cortex_r5f_initfn(Object *obj)
 --- a/target/arm/translate-neon.inc.c
 +++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VMLS_F_2sc(DisasContext *s, arg_2scalar *a)
      return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
  }
 +
 +WRAP_ENV_FN(gen_VQDMULH_16, gen_helper_neon_qdmulh_s16)
 +WRAP_ENV_FN(gen_VQDMULH_32, gen_helper_neon_qdmulh_s32)
 +WRAP_ENV_FN(gen_VQRDMULH_16, gen_helper_neon_qrdmulh_s16)
 +WRAP_ENV_FN(gen_VQRDMULH_32, gen_helper_neon_qrdmulh_s32)
 +
 +static bool trans_VQDMULH_2sc(DisasContext *s, arg_2scalar *a)
 +{
 +    static NeonGenTwoOpFn * const opfn[] = {
 +        NULL,
 +        gen_VQDMULH_16,
 +        gen_VQDMULH_32,
 +        NULL,
 +    };
 +
 +    return do_2scalar(s, a, opfn[a->size], NULL);
 +}
 +
 +static bool trans_VQRDMULH_2sc(DisasContext *s, arg_2scalar *a)
 +{
 +    static NeonGenTwoOpFn * const opfn[] = {
 +        NULL,
 +        gen_VQRDMULH_16,
 +        gen_VQRDMULH_32,
 +        NULL,
 +    };
 +
 +    return do_2scalar(s, a, opfn[a->size], NULL);
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void gen_exception_return(DisasContext *s, TCGv_i32 pc)
  #define CPU_V001 cpu_V0, cpu_V0, cpu_V1
 -static TCGv_i32 neon_load_scratch(int scratch)
 -{
 -    TCGv_i32 tmp = tcg_temp_new_i32();
 -    tcg_gen_ld_i32(tmp, cpu_env, offsetof(CPUARMState, vfp.scratch[scratch]));
 -    return tmp;
 -}
 -
 -static void neon_store_scratch(int scratch, TCGv_i32 var)
 -{
 -    tcg_gen_st_i32(var, cpu_env, offsetof(CPUARMState, vfp.scratch[scratch]));
 -    tcg_temp_free_i32(var);
 -}
 -
  static int gen_neon_unzip(int rd, int rm, int size, int q)
  {
      TCGv_ptr pd, pm;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                  case 1: /* Float VMLA scalar */
                  case 5: /* Floating point VMLS scalar */
                  case 9: /* Floating point VMUL scalar */
 -                    return 1; /* handled by decodetree */
 -
                  case 12: /* VQDMULH scalar */
                  case 13: /* VQRDMULH scalar */
 -                    if (u && ((rd | rn) & 1)) {
 -                        return 1;
 -                    }
 -                    tmp = neon_get_scalar(size, rm);
 -                    neon_store_scratch(0, tmp);
 -                    for (pass = 0; pass < (u ? 4 : 2); pass++) {
 -                        tmp = neon_load_scratch(0);
 -                        tmp2 = neon_load_reg(rn, pass);
 -                        if (op == 12) {
 -                            if (size == 1) {
 -                                gen_helper_neon_qdmulh_s16(tmp, cpu_env, tmp, tmp2);
 -                            } else {
 -                                gen_helper_neon_qdmulh_s32(tmp, cpu_env, tmp, tmp2);
 -                            }
 -                        } else {
 -                            if (size == 1) {
 -                                gen_helper_neon_qrdmulh_s16(tmp, cpu_env, tmp, tmp2);
 -                            } else {
 -                                gen_helper_neon_qrdmulh_s32(tmp, cpu_env, tmp, tmp2);
 -                            }
 -                        }
 -                        tcg_temp_free_i32(tmp2);
 -                        neon_store_reg(rd, pass, tmp);
 -                    }
 -                    break;
 +                    return 1; /* handled by decodetree */
 +
                  case 3: /* VQDMLAL scalar */
                  case 7: /* VQDMLSL scalar */
                  case 11: /* VQDMULL scalar */
 --
-.20.1
+.34.1

-[PULL 22/23] sd: sdhci: Implement basic vendor specific register support
+[PULL 26/35] target/arm: Allow access to SPSR_hyp from hyp mode
-From: Guenter Roeck <linux@roeck-us.net>
+Architecturally, the AArch32 MSR/MRS to/from banked register
 instructions are UNPREDICTABLE for attempts to access a banked
 register that the guest could access in a more direct way (e.g.
 using this insn to access r8_fiq when already in FIQ mode).  QEMU has
 chosen to UNDEF on all of these.
-The Linux kernel's IMX code now uses vendor specific commands.
+However, for the case of accessing SPSR_hyp from hyp mode, it turns
-This results in endless warnings when booting the Linux kernel.
+out that real hardware permits this, with the same effect as if the
 guest had directly written to SPSR. Further, there is some
 guest code out there that assumes it can do this, because it
 happens to work on hardware: an example Cortex-R52 startup code
 fragment uses this, and it got copied into various other places,
 including Zephyr. Zephyr was fixed to not use this:
  https://github.com/zephyrproject-rtos/zephyr/issues/47330
 but other examples are still out there, like the selftest
 binary for the MPS3-AN536.
-sdhci-esdhc-imx 2194000.usdhc: esdhc_wait_for_card_clock_gate_off:
+So that such guest code can run under QEMU, permit
-    card clock still not gate off in 100us!.
+this UNPREDICTABLE access instead of UNDEFing it.
-Implement support for the vendor specific command implemented in IMX hardware
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-to be able to avoid this warning.
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20240206132931.38376-5-peter.maydell@linaro.org
 ---
  target/arm/tcg/op_helper.c | 43 ++++++++++++++++++++++++++------------
  target/arm/tcg/translate.c | 19 +++++++++++------
 files changed, 43 insertions(+), 19 deletions(-)
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+diff --git a/target/arm/tcg/op_helper.c b/target/arm/tcg/op_helper.c
 Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Signed-off-by: Guenter Roeck <linux@roeck-us.net>
 Message-id: 20200603145258.195920-2-linux@roeck-us.net
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  hw/sd/sdhci-internal.h |  5 +++++
  include/hw/sd/sdhci.h  |  5 +++++
  hw/sd/sdhci.c          | 18 +++++++++++++++++-
 files changed, 27 insertions(+), 1 deletion(-)
 diff --git a/hw/sd/sdhci-internal.h b/hw/sd/sdhci-internal.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/sd/sdhci-internal.h
+--- a/target/arm/tcg/op_helper.c
-+++ b/hw/sd/sdhci-internal.h
++++ b/target/arm/tcg/op_helper.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static void msr_mrs_banked_exc_checks(CPUARMState *env, uint32_t tgtmode,
- #define SDHC_CMD_INHIBIT               0x00000001
+      */
- #define SDHC_DATA_INHIBIT              0x00000002
+     int curmode = env->uncached_cpsr & CPSR_M;
- #define SDHC_DAT_LINE_ACTIVE           0x00000004
-+#define SDHC_IMX_CLOCK_GATE_OFF        0x00000080
+-    if (regno == 17) {
- #define SDHC_DOING_WRITE               0x00000100
+-        /* ELR_Hyp: a special case because access from tgtmode is OK */
- #define SDHC_DOING_READ                0x00000200
+-        if (curmode != ARM_CPU_MODE_HYP && curmode != ARM_CPU_MODE_MON) {
- #define SDHC_SPACE_AVAILABLE           0x00000400
+-            goto undef;
-@@ -XXX,XX +XXX,XX @@ extern const VMStateDescription sdhci_vmstate;
++    if (tgtmode == ARM_CPU_MODE_HYP) {
++        /*
++         * Handle Hyp target regs first because some are special cases
- #define ESDHC_MIX_CTRL                  0x48
++         * which don't want the usual "not accessible from tgtmode" check.
-+
++         */
- #define ESDHC_VENDOR_SPEC               0xc0
++        switch (regno) {
-+#define ESDHC_IMX_FRC_SDCLK_ON          (1 << 8)
++        case 16 ... 17: /* ELR_Hyp, SPSR_Hyp */
-+
++            if (curmode != ARM_CPU_MODE_HYP && curmode != ARM_CPU_MODE_MON) {
- #define ESDHC_DLL_CTRL                  0x60
++                goto undef;
++            }
- #define ESDHC_TUNING_CTRL               0xcc
++            break;
-@@ -XXX,XX +XXX,XX @@ extern const VMStateDescription sdhci_vmstate;
++        case 13:
- #define DEFINE_SDHCI_COMMON_PROPERTIES(_state) \
++            if (curmode != ARM_CPU_MODE_MON) {
-     DEFINE_PROP_UINT8("sd-spec-version", _state, sd_spec_version, 2), \
++                goto undef;
      DEFINE_PROP_UINT8("uhs", _state, uhs_mode, UHS_NOT_SUPPORTED), \
 +    DEFINE_PROP_UINT8("vendor", _state, vendor, SDHCI_VENDOR_NONE), \
      \
      /* Capabilities registers provide information on supported
       * features of this specific host controller implementation */ \
 diff --git a/include/hw/sd/sdhci.h b/include/hw/sd/sdhci.h
 index XXXXXXX..XXXXXXX 100644
 --- a/include/hw/sd/sdhci.h
 +++ b/include/hw/sd/sdhci.h
@@ -XXX,XX +XXX,XX @@ typedef struct SDHCIState {
      uint16_t acmd12errsts; /* Auto CMD12 error status register */
      uint16_t hostctl2;     /* Host Control 2 */
      uint64_t admasysaddr;  /* ADMA System Address Register */
 +    uint16_t vendor_spec;  /* Vendor specific register */
      /* Read-only registers */
      uint64_t capareg;      /* Capabilities Register */
@@ -XXX,XX +XXX,XX @@ typedef struct SDHCIState {
      uint32_t quirks;
      uint8_t sd_spec_version;
      uint8_t uhs_mode;
 +    uint8_t vendor;        /* For vendor specific functionality */
  } SDHCIState;
 +#define SDHCI_VENDOR_NONE       0
 +#define SDHCI_VENDOR_IMX        1
 +
  /*
   * Controller does not provide transfer-complete interrupt when not
   * busy.
 diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/sd/sdhci.c
 +++ b/hw/sd/sdhci.c
@@ -XXX,XX +XXX,XX @@ static uint64_t usdhc_read(void *opaque, hwaddr offset, unsigned size)
          }
          break;
 +    case ESDHC_VENDOR_SPEC:
 +        ret = s->vendor_spec;
 +        break;
      case ESDHC_DLL_CTRL:
      case ESDHC_TUNE_CTRL_STATUS:
      case ESDHC_UNDOCUMENTED_REG27:
      case ESDHC_TUNING_CTRL:
 -    case ESDHC_VENDOR_SPEC:
      case ESDHC_MIX_CTRL:
      case ESDHC_WTMK_LVL:
          ret = 0;
@@ -XXX,XX +XXX,XX @@ usdhc_write(void *opaque, hwaddr offset, uint64_t val, unsigned size)
      case ESDHC_UNDOCUMENTED_REG27:
      case ESDHC_TUNING_CTRL:
      case ESDHC_WTMK_LVL:
 +        break;
 +
      case ESDHC_VENDOR_SPEC:
 +        s->vendor_spec = value;
 +        switch (s->vendor) {
 +        case SDHCI_VENDOR_IMX:
 +            if (value & ESDHC_IMX_FRC_SDCLK_ON) {
 +                s->prnsts &= ~SDHC_IMX_CLOCK_GATE_OFF;
 +            } else {
 +                s->prnsts |= SDHC_IMX_CLOCK_GATE_OFF;
 +            }
 +            break;
 +        default:
-+            break;
++            g_assert_not_reached();
          }
          return;
      }
@@ -XXX,XX +XXX,XX @@ static void msr_mrs_banked_exc_checks(CPUARMState *env, uint32_t tgtmode,
          }
      }
 -    if (tgtmode == ARM_CPU_MODE_HYP) {
 -        /* SPSR_Hyp, r13_hyp: accessible from Monitor mode only */
 -        if (curmode != ARM_CPU_MODE_MON) {
 -            goto undef;
 -        }
 -    }
 -
      return;
  undef:
@@ -XXX,XX +XXX,XX @@ void HELPER(msr_banked)(CPUARMState *env, uint32_t value, uint32_t tgtmode,
      switch (regno) {
      case 16: /* SPSRs */
 -        env->banked_spsr[bank_number(tgtmode)] = value;
 +        if (tgtmode == (env->uncached_cpsr & CPSR_M)) {
 +            /* Only happens for SPSR_Hyp access in Hyp mode */
 +            env->spsr = value;
 +        } else {
 +            env->banked_spsr[bank_number(tgtmode)] = value;
 +        }
          break;
+     case 17: /* ELR_Hyp */
-     case SDHC_HOSTCTL:
+         env->elr_el[2] = value;
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(mrs_banked)(CPUARMState *env, uint32_t tgtmode, uint32_t regno)
      switch (regno) {
      case 16: /* SPSRs */
 -        return env->banked_spsr[bank_number(tgtmode)];
 +        if (tgtmode == (env->uncached_cpsr & CPSR_M)) {
 +            /* Only happens for SPSR_Hyp access in Hyp mode */
 +            return env->spsr;
 +        } else {
 +            return env->banked_spsr[bank_number(tgtmode)];
 +        }
      case 17: /* ELR_Hyp */
          return env->elr_el[2];
      case 13:
 diff --git a/target/arm/tcg/translate.c b/target/arm/tcg/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/translate.c
 +++ b/target/arm/tcg/translate.c
@@ -XXX,XX +XXX,XX @@ static bool msr_banked_access_decode(DisasContext *s, int r, int sysm, int rn,
          break;
      case ARM_CPU_MODE_HYP:
          /*
 -         * SPSR_hyp and r13_hyp can only be accessed from Monitor mode
 -         * (and so we can forbid accesses from EL2 or below). elr_hyp
 -         * can be accessed also from Hyp mode, so forbid accesses from
 -         * EL0 or EL1.
 +         * r13_hyp can only be accessed from Monitor mode, and so we
 +         * can forbid accesses from EL2 or below.
 +         * elr_hyp can be accessed also from Hyp mode, so forbid
 +         * accesses from EL0 or EL1.
 +         * SPSR_hyp is supposed to be in the same category as r13_hyp
 +         * and UNPREDICTABLE if accessed from anything except Monitor
 +         * mode. However there is some real-world code that will do
 +         * it because at least some hardware happens to permit the
 +         * access. (Notably a standard Cortex-R52 startup code fragment
 +         * does this.) So we permit SPSR_hyp from Hyp mode also, to allow
 +         * this (incorrect) guest code to run.
           */
 -        if (!arm_dc_feature(s, ARM_FEATURE_EL2) || s->current_el < 2 ||
 -            (s->current_el < 3 && *regno != 17)) {
 +        if (!arm_dc_feature(s, ARM_FEATURE_EL2) || s->current_el < 2
 +            || (s->current_el < 3 && *regno != 16 && *regno != 17)) {
              goto undef;
          }
          break;
 --
-.20.1
+.34.1
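The access rule described in the commit message above can be sketched as a standalone C fragment. This is only an illustration under stated assumptions, not the QEMU code: `ToyCPU`, `hyp_banked_access_ok()` and `write_spsr_hyp()` are hypothetical names, the CPU state is cut down to two fields, and the `default` case simply refuses the access. The mode values are the AArch32 CPSR.M encodings.

```c
#include <stdbool.h>
#include <stdint.h>

/* AArch32 CPSR.M mode encodings used in this sketch. */
enum {
    MODE_SVC = 0x13,
    MODE_MON = 0x16,
    MODE_HYP = 0x1a,
};

/* Hypothetical, cut-down CPU state: only the two SPSR copies that matter. */
typedef struct {
    uint32_t curmode;          /* current mode (CPSR.M) */
    uint32_t spsr;             /* SPSR of the current mode */
    uint32_t banked_spsr_hyp;  /* SPSR_hyp when not in Hyp mode */
} ToyCPU;

/*
 * Is an MSR/MRS (banked) access to a Hyp-mode register permitted?
 * regno 16 is SPSR_hyp, 17 is ELR_hyp, 13 is r13_hyp.
 */
static bool hyp_banked_access_ok(uint32_t curmode, int regno)
{
    switch (regno) {
    case 16: /* SPSR_hyp: now also permitted from Hyp mode */
    case 17: /* ELR_hyp */
        return curmode == MODE_HYP || curmode == MODE_MON;
    case 13: /* r13_hyp: Monitor mode only */
        return curmode == MODE_MON;
    default:
        return false; /* assumption: anything else is refused in this toy */
    }
}

/* Returns false where the real implementation would UNDEF. */
static bool write_spsr_hyp(ToyCPU *cpu, uint32_t value)
{
    if (!hyp_banked_access_ok(cpu->curmode, 16)) {
        return false;
    }
    if (cpu->curmode == MODE_HYP) {
        /* Target mode is the current mode: same effect as writing SPSR. */
        cpu->spsr = value;
    } else {
        cpu->banked_spsr_hyp = value;
    }
    return true;
}
```

In this sketch a write from Hyp mode lands in `spsr`, a write from Monitor mode lands in the banked copy, and a write from any other mode is refused.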

-[PULL 17/23] target/arm: Convert Neon VDUP (scalar) to decodetree
+[PULL 27/35] hw/misc/mps2-scc: Fix condition for CFG3 register
-Convert the Neon VDUP (scalar) insn to decodetree.  (Note that we
+We currently guard the CFG3 register read with
-can't call this just "VDUP" as we used that already in vfp.decode for
+ (scc_partno(s) == 0x524 && scc_partno(s) == 0x547)
-the "VDUP (general purpose register" insn.)
+which is clearly wrong as it is never true.
+This register is present on all board types except AN524
+and AN547; correct the condition.
+Fixes: 6ac80818941829c0 ("hw/misc/mps2-scc: Implement changes for AN547")
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20240206132931.38376-6-peter.maydell@linaro.org
 ---
- target/arm/neon-dp.decode       |  7 +++++++
+ hw/misc/mps2-scc.c | 2 +-
- target/arm/translate-neon.inc.c | 26 ++++++++++++++++++++++++++
+file changed, 1 insertion(+), 1 deletion(-)
  target/arm/translate.c          | 25 +------------------------
 files changed, 34 insertions(+), 24 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+diff --git a/hw/misc/mps2-scc.c b/hw/misc/mps2-scc.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
+--- a/hw/misc/mps2-scc.c
-+++ b/target/arm/neon-dp.decode
++++ b/hw/misc/mps2-scc.c
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+@@ -XXX,XX +XXX,XX @@ static uint64_t mps2_scc_read(void *opaque, hwaddr offset, unsigned size)
+         r = s->cfg2;
-     VTBL         1111 001 1 1 . 11 .... .... 10 len:2 . op:1 . 0 .... \
+         break;
-                  vm=%vm_dp vn=%vn_dp vd=%vd_dp
+     case A_CFG3:
-+
+-        if (scc_partno(s) == 0x524 && scc_partno(s) == 0x547) {
-+    VDUP_scalar  1111 001 1 1 . 11 index:3 1 .... 11 000 q:1 . 0 .... \
++        if (scc_partno(s) == 0x524 || scc_partno(s) == 0x547) {
-+                 vm=%vm_dp vd=%vd_dp size=0
+             /* CFG3 reserved on AN524 */
-+    VDUP_scalar  1111 001 1 1 . 11 index:2 10 .... 11 000 q:1 . 0 .... \
+             goto bad_offset;
 +                 vm=%vm_dp vd=%vd_dp size=1
 +    VDUP_scalar  1111 001 1 1 . 11 index:1 100 .... 11 000 q:1 . 0 .... \
 +                 vm=%vm_dp vd=%vd_dp size=2
    ]
    # Subgroup for size != 0b11
 diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.inc.c
 +++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VTBL(DisasContext *s, arg_VTBL *a)
      tcg_temp_free_i32(tmp);
      return true;
  }
 +
 +static bool trans_VDUP_scalar(DisasContext *s, arg_VDUP_scalar *a)
 +{
 +    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
 +        return false;
 +    }
 +
 +    /* UNDEF accesses to D16-D31 if they don't exist. */
 +    if (!dc_isar_feature(aa32_simd_r32, s) &&
 +        ((a->vd | a->vm) & 0x10)) {
 +        return false;
 +    }
 +
 +    if (a->vd & a->q) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    tcg_gen_gvec_dup_mem(a->size, neon_reg_offset(a->vd, 0),
 +                         neon_element_offset(a->vm, a->index, a->size),
 +                         a->q ? 16 : 8, a->q ? 16 : 8);
 +    return true;
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                      }
                      break;
                  }
 -            } else if ((insn & (1 << 10)) == 0) {
 -                /* VTBL, VTBX: handled by decodetree */
 -                return 1;
 -            } else if ((insn & 0x380) == 0) {
 -                /* VDUP */
 -                int element;
 -                MemOp size;
 -
 -                if ((insn & (7 << 16)) == 0 || (q && (rd & 1))) {
 -                    return 1;
 -                }
 -                if (insn & (1 << 16)) {
 -                    size = MO_8;
 -                    element = (insn >> 17) & 7;
 -                } else if (insn & (1 << 17)) {
 -                    size = MO_16;
 -                    element = (insn >> 18) & 3;
 -                } else {
 -                    size = MO_32;
 -                    element = (insn >> 19) & 1;
 -                }
 -                tcg_gen_gvec_dup_mem(size, neon_reg_offset(rd, 0),
 -                                     neon_element_offset(rm, element, size),
 -                                     q ? 16 : 8, q ? 16 : 8);
              } else {
 +                /* VTBL, VTBX, VDUP: handled by decodetree */
                  return 1;
              }
          }
 --
-.20.1
+.34.1
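To see why the old CFG3 guard could never fire, the two conditions can be compared side by side. This is a minimal sketch: the function names are hypothetical, and the part number is passed in directly rather than extracted from the device state.

```c
#include <stdbool.h>

/* Old guard: one value cannot equal two different constants at once. */
static bool cfg3_reserved_buggy(int partno)
{
    return partno == 0x524 && partno == 0x547;
}

/* Corrected guard: CFG3 is reserved on either of the two images. */
static bool cfg3_reserved_fixed(int partno)
{
    return partno == 0x524 || partno == 0x547;
}
```

With the old form the `goto bad_offset` path is dead code, so CFG3 reads were accepted on every image.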

-[PULL 05/23] target/arm: Convert Neon 3-reg-diff long multiplies
+[PULL 28/35] hw/misc/mps2-scc: Factor out which-board conditionals
-Convert the Neon 3-reg-diff insns VMULL, VMLAL and VMLSL; these perform
+The MPS SCC device has a lot of different flavours for the various
-a 32x32->64 multiply with possible accumulate.
+different MPS FPGA images, which look mostly similar but have
 differences in how particular registers are handled.  Currently we
 deal with this with a lot of open-coded checks on scc_partno(), but
 as we add more board types this is getting a bit hard to read.
-Note that for VMLSL we do the accumulate directly with a subtraction
+Factor out the conditions into some functions which we can
-rather than doing a negate-then-add as the old code did.
+give more descriptive names to.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20240206132931.38376-7-peter.maydell@linaro.org
 ---
- target/arm/neon-dp.decode       |  9 +++++
+ hw/misc/mps2-scc.c | 45 +++++++++++++++++++++++++++++++--------------
- target/arm/translate-neon.inc.c | 71 +++++++++++++++++++++++++++++++++
+file changed, 31 insertions(+), 14 deletions(-)
  target/arm/translate.c          | 21 +++-------
 files changed, 86 insertions(+), 15 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+diff --git a/hw/misc/mps2-scc.c b/hw/misc/mps2-scc.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
+--- a/hw/misc/mps2-scc.c
-+++ b/target/arm/neon-dp.decode
++++ b/hw/misc/mps2-scc.c
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+@@ -XXX,XX +XXX,XX @@ static int scc_partno(MPS2SCC *s)
+     return extract32(s->id, 4, 8);
      VABDL_S_3d   1111 001 0 1 . .. .... .... 0111 . 0 . 0 .... @3diff
      VABDL_U_3d   1111 001 1 1 . .. .... .... 0111 . 0 . 0 .... @3diff
 +
 +    VMLAL_S_3d   1111 001 0 1 . .. .... .... 1000 . 0 . 0 .... @3diff
 +    VMLAL_U_3d   1111 001 1 1 . .. .... .... 1000 . 0 . 0 .... @3diff
 +
 +    VMLSL_S_3d   1111 001 0 1 . .. .... .... 1010 . 0 . 0 .... @3diff
 +    VMLSL_U_3d   1111 001 1 1 . .. .... .... 1010 . 0 . 0 .... @3diff
 +
 +    VMULL_S_3d   1111 001 0 1 . .. .... .... 1100 . 0 . 0 .... @3diff
 +    VMULL_U_3d   1111 001 1 1 . .. .... .... 1100 . 0 . 0 .... @3diff
    ]
  }
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
-index XXXXXXX..XXXXXXX 100644
++/* Is CFG_REG2 present? */
---- a/target/arm/translate-neon.inc.c
++static bool have_cfg2(MPS2SCC *s)
 +++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VABAL_U_3d(DisasContext *s, arg_3diff *a)
      return do_long_3d(s, a, opfn[a->size], addfn[a->size]);
  }
 +
 +static void gen_mull_s32(TCGv_i64 rd, TCGv_i32 rn, TCGv_i32 rm)
 +{
-+    TCGv_i32 lo = tcg_temp_new_i32();
++    return scc_partno(s) == 0x524 || scc_partno(s) == 0x547;
 +    TCGv_i32 hi = tcg_temp_new_i32();
 +
 +    tcg_gen_muls2_i32(lo, hi, rn, rm);
 +    tcg_gen_concat_i32_i64(rd, lo, hi);
 +
 +    tcg_temp_free_i32(lo);
 +    tcg_temp_free_i32(hi);
 +}
 +
-+static void gen_mull_u32(TCGv_i64 rd, TCGv_i32 rn, TCGv_i32 rm)
++/* Is CFG_REG3 present? */
 +static bool have_cfg3(MPS2SCC *s)
 +{
-+    TCGv_i32 lo = tcg_temp_new_i32();
++    return scc_partno(s) != 0x524 && scc_partno(s) != 0x547;
 +    TCGv_i32 hi = tcg_temp_new_i32();
 +
 +    tcg_gen_mulu2_i32(lo, hi, rn, rm);
 +    tcg_gen_concat_i32_i64(rd, lo, hi);
 +
 +    tcg_temp_free_i32(lo);
 +    tcg_temp_free_i32(hi);
 +}
 +
-+static bool trans_VMULL_S_3d(DisasContext *s, arg_3diff *a)
++/* Is CFG_REG5 present? */
 +static bool have_cfg5(MPS2SCC *s)
 +{
-+    static NeonGenTwoOpWidenFn * const opfn[] = {
++    return scc_partno(s) == 0x524 || scc_partno(s) == 0x547;
 +        gen_helper_neon_mull_s8,
 +        gen_helper_neon_mull_s16,
 +        gen_mull_s32,
 +        NULL,
 +    };
 +
 +    return do_long_3d(s, a, opfn[a->size], NULL);
 +}
 +
-+static bool trans_VMULL_U_3d(DisasContext *s, arg_3diff *a)
++/* Is CFG_REG6 present? */
 +static bool have_cfg6(MPS2SCC *s)
 +{
-+    static NeonGenTwoOpWidenFn * const opfn[] = {
++    return scc_partno(s) == 0x524;
 +        gen_helper_neon_mull_u8,
 +        gen_helper_neon_mull_u16,
 +        gen_mull_u32,
 +        NULL,
 +    };
 +
 +    return do_long_3d(s, a, opfn[a->size], NULL);
 +}
 +
-+#define DO_VMLAL(INSN,MULL,ACC)                                         \
+ /* Handle a write via the SYS_CFG channel to the specified function/device.
-+    static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a)        \
+  * Return false on error (reported to guest via SYS_CFGCTRL ERROR bit).
-+    {                                                                   \
+  */
-+        static NeonGenTwoOpWidenFn * const opfn[] = {                   \
+@@ -XXX,XX +XXX,XX @@ static uint64_t mps2_scc_read(void *opaque, hwaddr offset, unsigned size)
-+            gen_helper_neon_##MULL##8,                                  \
+         r = s->cfg1;
-+            gen_helper_neon_##MULL##16,                                 \
+         break;
-+            gen_##MULL##32,                                             \
+     case A_CFG2:
-+            NULL,                                                       \
+-        if (scc_partno(s) != 0x524 && scc_partno(s) != 0x547) {
-+        };                                                              \
+-            /* CFG2 reserved on other boards */
-+        static NeonGenTwo64OpFn * const accfn[] = {                     \
++        if (!have_cfg2(s)) {
-+            gen_helper_neon_##ACC##l_u16,                               \
+             goto bad_offset;
-+            gen_helper_neon_##ACC##l_u32,                               \
+         }
-+            tcg_gen_##ACC##_i64,                                        \
+         r = s->cfg2;
-+            NULL,                                                       \
+         break;
-+        };                                                              \
+     case A_CFG3:
-+        return do_long_3d(s, a, opfn[a->size], accfn[a->size]);         \
+-        if (scc_partno(s) == 0x524 || scc_partno(s) == 0x547) {
-+    }
+-            /* CFG3 reserved on AN524 */
-+
++        if (!have_cfg3(s)) {
-+DO_VMLAL(VMLAL_S,mull_s,add)
+             goto bad_offset;
-+DO_VMLAL(VMLAL_U,mull_u,add)
+         }
-+DO_VMLAL(VMLSL_S,mull_s,sub)
+         /* These are user-settable DIP switches on the board. We don't
-+DO_VMLAL(VMLSL_U,mull_u,sub)
+@@ -XXX,XX +XXX,XX @@ static uint64_t mps2_scc_read(void *opaque, hwaddr offset, unsigned size)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+         r = s->cfg4;
-index XXXXXXX..XXXXXXX 100644
+         break;
---- a/target/arm/translate.c
+     case A_CFG5:
-+++ b/target/arm/translate.c
+-        if (scc_partno(s) != 0x524 && scc_partno(s) != 0x547) {
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+-            /* CFG5 reserved on other boards */
-                     {0, 0, 0, 7}, /* VABAL */
++        if (!have_cfg5(s)) {
-                     {0, 0, 0, 7}, /* VSUBHN: handled by decodetree */
+             goto bad_offset;
-                     {0, 0, 0, 7}, /* VABDL */
+         }
--                    {0, 0, 0, 0}, /* VMLAL */
+         r = s->cfg5;
-+                    {0, 0, 0, 7}, /* VMLAL */
+         break;
-                     {0, 0, 0, 9}, /* VQDMLAL */
+     case A_CFG6:
--                    {0, 0, 0, 0}, /* VMLSL */
+-        if (scc_partno(s) != 0x524) {
-+                    {0, 0, 0, 7}, /* VMLSL */
+-            /* CFG6 reserved on other boards */
-                     {0, 0, 0, 9}, /* VQDMLSL */
++        if (!have_cfg6(s)) {
--                    {0, 0, 0, 0}, /* Integer VMULL */
+             goto bad_offset;
-+                    {0, 0, 0, 7}, /* Integer VMULL */
+         }
-                     {0, 0, 0, 9}, /* VQDMULL */
+         r = s->cfg6;
-                     {0, 0, 0, 0xa}, /* Polynomial VMULL */
+@@ -XXX,XX +XXX,XX @@ static void mps2_scc_write(void *opaque, hwaddr offset, uint64_t value,
-                     {0, 0, 0, 7}, /* Reserved: always UNDEF */
+         }
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+         break;
-                         tmp2 = neon_load_reg(rm, pass);
+     case A_CFG2:
-                     }
+-        if (scc_partno(s) != 0x524 && scc_partno(s) != 0x547) {
-                     switch (op) {
+-            /* CFG2 reserved on other boards */
--                    case 8: case 9: case 10: case 11: case 12: case 13:
++        if (!have_cfg2(s)) {
--                        /* VMLAL, VQDMLAL, VMLSL, VQDMLSL, VMULL, VQDMULL */
+             goto bad_offset;
-+                    case 9: case 11: case 13:
+         }
-+                        /* VQDMLAL, VQDMLSL, VQDMULL */
+         /* AN524: QSPI Select signal */
-                         gen_neon_mull(cpu_V0, tmp, tmp2, size, u);
+         s->cfg2 = value;
-                         break;
+         break;
-                     default: /* 15 is RESERVED: caught earlier  */
+     case A_CFG5:
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+-        if (scc_partno(s) != 0x524 && scc_partno(s) != 0x547) {
-                         /* VQDMULL */
+-            /* CFG5 reserved on other boards */
-                         gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
++        if (!have_cfg5(s)) {
-                         neon_store_reg64(cpu_V0, rd + pass);
+             goto bad_offset;
--                    } else if (op == 5 || (op >= 8 && op <= 11)) {
+         }
-+                    } else {
+         /* AN524: ACLK frequency in Hz */
-                         /* Accumulate.  */
+         s->cfg5 = value;
-                         neon_load_reg64(cpu_V1, rd + pass);
+         break;
-                         switch (op) {
+     case A_CFG6:
--                        case 10: /* VMLSL */
+-        if (scc_partno(s) != 0x524) {
--                            gen_neon_negl(cpu_V0, size);
+-            /* CFG6 reserved on other boards */
--                            /* Fall through */
++        if (!have_cfg6(s)) {
--                        case 8: /* VABAL, VMLAL */
+             goto bad_offset;
--                            gen_neon_addl(size);
+         }
--                            break;
+         /* AN524: Clock divider for BRAM */
                          case 9: case 11: /* VQDMLAL, VQDMLSL */
                              gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
                              if (op == 11) {
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                              abort();
                          }
                          neon_store_reg64(cpu_V0, rd + pass);
 -                    } else {
 -                        /* Write back the result.  */
 -                        neon_store_reg64(cpu_V0, rd + pass);
                      }
                  }
              } else {
 --
-.20.1
+.34.1
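The refactoring above boils down to replacing repeated part-number comparisons with named predicates. A self-contained sketch follows, assuming the part number is passed in as a plain integer; the patch itself takes an `MPS2SCC *` and calls `scc_partno()`. The helper `toy_cfg6_accessible()` is hypothetical and only shows how a call site reads afterwards.

```c
#include <stdbool.h>

/* Is CFG_REG2 present? */
static bool have_cfg2(int partno)
{
    return partno == 0x524 || partno == 0x547;
}

/* Is CFG_REG3 present? */
static bool have_cfg3(int partno)
{
    return partno != 0x524 && partno != 0x547;
}

/* Is CFG_REG5 present? */
static bool have_cfg5(int partno)
{
    return partno == 0x524 || partno == 0x547;
}

/* Is CFG_REG6 present? */
static bool have_cfg6(int partno)
{
    return partno == 0x524;
}

/* Hypothetical call site: the board check now reads as a question. */
static bool toy_cfg6_accessible(int partno)
{
    if (!have_cfg6(partno)) {
        return false; /* the device model would take its bad_offset path */
    }
    return true;
}
```

Adding a new board type then means touching only the predicates, not every `case` in the read and write handlers.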

-[PULL 08/23] target/arm: Add 'static' and 'const' annotations to VSHLL function arrays
+[PULL 29/35] hw/misc/mps2-scc: Make changes needed for AN536 FPGA image
-Mark the arrays of function pointers in trans_VSHLL_S_2sh() and
+The MPS2 SCC device is broadly the same for all FPGA images, but has
-trans_VSHLL_U_2sh() as both 'static' and 'const'.
+minor differences in the behaviour of the CFG registers depending on
 the image. In many cases we don't really care about the functionality
 controlled by these registers and a reads-as-written or similar
 behaviour is sufficient for the moment.
 For the AN536 the required behaviour is:
  * A_CFG0 has CPU reset and halt bits
     - implement as reads-as-written for the moment
  * A_CFG1 has flash or ATCM address 0 remap handling
     - QEMU doesn't model this; implement as reads-as-written
  * A_CFG2 has QSPI select (like AN524)
     - implemented (no behaviour, as with AN524)
  * A_CFG3 is MCC_MSB_ADDR "additional MCC addressing bits"
     - QEMU doesn't care about these, so use the existing
       RAZ behaviour for convenience
  * A_CFG4 is board rev (like all other images)
     - no change needed
  * A_CFG5 is ACLK frequency in Hz (like AN524)
     - implemented as reads-as-written, as for other boards
  * A_CFG6 is core 0 vector table base address
     - implemented as reads-as-written for the moment
  * A_CFG7 is core 1 vector table base address
     - implemented as reads-as-written for the moment
 Make the changes necessary for this; leave TODO comments where
 appropriate to indicate where we might want to come back and
 implement things like CPU reset.
 The other aspects of the device specific to this FPGA image (like the
 values of the board ID and similar registers) will be set via the
 device's qdev properties.
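As an illustration of what "reads-as-written" means for a register that exists only on one image, here is a minimal sketch for CFG7. `ToySCC`, `toy_cfg7_write()` and `toy_cfg7_read()` are hypothetical names; only the `have_cfg7()` condition (part number 0x536) is taken from the patch, and the sketch does not claim to match the real read/write handlers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, cut-down device state. */
typedef struct {
    int partno;     /* e.g. 0x536 for the AN536 image */
    uint32_t cfg7;  /* last value the guest wrote */
} ToySCC;

/* Is CFG_REG7 present? */
static bool have_cfg7(const ToySCC *s)
{
    return s->partno == 0x536;
}

/* Store the value unchanged; false means "bad offset" on this image. */
static bool toy_cfg7_write(ToySCC *s, uint32_t value)
{
    if (!have_cfg7(s)) {
        return false;
    }
    s->cfg7 = value;
    return true;
}

/* Return whatever was last written; no side effects are modelled. */
static bool toy_cfg7_read(const ToySCC *s, uint32_t *out)
{
    if (!have_cfg7(s)) {
        return false;
    }
    *out = s->cfg7;
    return true;
}
```

The guest sees the value it stored, but nothing else in the model reacts to it, which is why the commit message leaves TODO markers for behaviour such as CPU reset.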
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
+Message-id: 20240206132931.38376-8-peter.maydell@linaro.org
 ---
- target/arm/translate-neon.inc.c | 4 ++--
+ include/hw/misc/mps2-scc.h |   1 +
-file changed, 2 insertions(+), 2 deletions(-)
+ hw/misc/mps2-scc.c         | 101 +++++++++++++++++++++++++++++++++----
+files changed, 92 insertions(+), 10 deletions(-)
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
 diff --git a/include/hw/misc/mps2-scc.h b/include/hw/misc/mps2-scc.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
+--- a/include/hw/misc/mps2-scc.h
-+++ b/target/arm/translate-neon.inc.c
++++ b/include/hw/misc/mps2-scc.h
-@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
+@@ -XXX,XX +XXX,XX @@ struct MPS2SCC {
+     uint32_t cfg4;
- static bool trans_VSHLL_S_2sh(DisasContext *s, arg_2reg_shift *a)
+     uint32_t cfg5;
- {
+     uint32_t cfg6;
--    NeonGenWidenFn *widenfn[] = {
++    uint32_t cfg7;
-+    static NeonGenWidenFn * const widenfn[] = {
+     uint32_t cfgdata_rtn;
-         gen_helper_neon_widen_s8,
+     uint32_t cfgdata_out;
-         gen_helper_neon_widen_s16,
+     uint32_t cfgctrl;
-         tcg_gen_ext_i32_i64,
+diff --git a/hw/misc/mps2-scc.c b/hw/misc/mps2-scc.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL_S_2sh(DisasContext *s, arg_2reg_shift *a)
+index XXXXXXX..XXXXXXX 100644
+--- a/hw/misc/mps2-scc.c
- static bool trans_VSHLL_U_2sh(DisasContext *s, arg_2reg_shift *a)
++++ b/hw/misc/mps2-scc.c
- {
+@@ -XXX,XX +XXX,XX @@ REG32(CFG3, 0xc)
--    NeonGenWidenFn *widenfn[] = {
+ REG32(CFG4, 0x10)
-+    static NeonGenWidenFn * const widenfn[] = {
+ REG32(CFG5, 0x14)
-         gen_helper_neon_widen_u8,
+ REG32(CFG6, 0x18)
-         gen_helper_neon_widen_u16,
++REG32(CFG7, 0x1c)
-         tcg_gen_extu_i32_i64,
+ REG32(CFGDATA_RTN, 0xa0)
  REG32(CFGDATA_OUT, 0xa4)
  REG32(CFGCTRL, 0xa8)
@@ -XXX,XX +XXX,XX @@ static int scc_partno(MPS2SCC *s)
  /* Is CFG_REG2 present? */
  static bool have_cfg2(MPS2SCC *s)
  {
 -    return scc_partno(s) == 0x524 || scc_partno(s) == 0x547;
 +    return scc_partno(s) == 0x524 || scc_partno(s) == 0x547 ||
 +        scc_partno(s) == 0x536;
  }
  /* Is CFG_REG3 present? */
  static bool have_cfg3(MPS2SCC *s)
  {
 -    return scc_partno(s) != 0x524 && scc_partno(s) != 0x547;
 +    return scc_partno(s) != 0x524 && scc_partno(s) != 0x547 &&
 +        scc_partno(s) != 0x536;
  }
  /* Is CFG_REG5 present? */
  static bool have_cfg5(MPS2SCC *s)
  {
 -    return scc_partno(s) == 0x524 || scc_partno(s) == 0x547;
 +    return scc_partno(s) == 0x524 || scc_partno(s) == 0x547 ||
 +        scc_partno(s) == 0x536;
  }
  /* Is CFG_REG6 present? */
  static bool have_cfg6(MPS2SCC *s)
  {
 -    return scc_partno(s) == 0x524;
 +    return scc_partno(s) == 0x524 || scc_partno(s) == 0x536;
 +}
 +
 +/* Is CFG_REG7 present? */
 +static bool have_cfg7(MPS2SCC *s)
 +{
 +    return scc_partno(s) == 0x536;
 +}
 +
 +/* Does CFG_REG0 drive the 'remap' GPIO output? */
 +static bool cfg0_is_remap(MPS2SCC *s)
 +{
 +    return scc_partno(s) != 0x536;
 +}
 +
 +/* Is CFG_REG1 driving a set of LEDs? */
 +static bool cfg1_is_leds(MPS2SCC *s)
 +{
 +    return scc_partno(s) != 0x536;
  }
  /* Handle a write via the SYS_CFG channel to the specified function/device.
@@ -XXX,XX +XXX,XX @@ static uint64_t mps2_scc_read(void *opaque, hwaddr offset, unsigned size)
          if (!have_cfg3(s)) {
              goto bad_offset;
          }
 -        /* These are user-settable DIP switches on the board. We don't
 +        /*
 +         * These are user-settable DIP switches on the board. We don't
           * model that, so just return zeroes.
 +         *
 +         * TODO: for AN536 this is MCC_MSB_ADDR "additional MCC addressing
 +         * bits". These change which part of the DDR4 the motherboard
 +         * configuration controller can see in its memory map (see the
 +         * appnote section 2.4). QEMU doesn't model the MCC at all, so these
 +         * bits are not interesting to us; read-as-zero is as good as anything
 +         * else.
           */
          r = 0;
          break;
@@ -XXX,XX +XXX,XX @@ static uint64_t mps2_scc_read(void *opaque, hwaddr offset, unsigned size)
          }
          r = s->cfg6;
          break;
 +    case A_CFG7:
 +        if (!have_cfg7(s)) {
 +            goto bad_offset;
 +        }
 +        r = s->cfg7;
 +        break;
      case A_CFGDATA_RTN:
          r = s->cfgdata_rtn;
          break;
@@ -XXX,XX +XXX,XX @@ static void mps2_scc_write(void *opaque, hwaddr offset, uint64_t value,
           * we always reflect bit 0 in the 'remap' GPIO output line,
           * and let the board wire it up or not as it chooses.
           * TODO on some boards bit 1 is CPU_WAIT.
 +         *
 +         * TODO: on the AN536 this register controls reset and halt
 +         * for both CPUs. For the moment we don't implement this, so the
 +         * register just reads as written.
           */
          s->cfg0 = value;
 -        qemu_set_irq(s->remap, s->cfg0 & 1);
 +        if (cfg0_is_remap(s)) {
 +            qemu_set_irq(s->remap, s->cfg0 & 1);
 +        }
          break;
      case A_CFG1:
          s->cfg1 = value;
 -        for (size_t i = 0; i < ARRAY_SIZE(s->led); i++) {
 -            led_set_state(s->led[i], extract32(value, i, 1));
 +        /*
 +         * On most boards this register drives LEDs.
 +         *
 +         * TODO: for AN536 this controls whether flash and ATCM are
 +         * enabled or disabled on reset. QEMU doesn't model this, and
 +         * always wires up RAM in the ATCM area and ROM in the flash area.
 +         */
 +        if (cfg1_is_leds(s)) {
 +            for (size_t i = 0; i < ARRAY_SIZE(s->led); i++) {
 +                led_set_state(s->led[i], extract32(value, i, 1));
 +            }
          }
          break;
      case A_CFG2:
          if (!have_cfg2(s)) {
              goto bad_offset;
          }
 -        /* AN524: QSPI Select signal */
 +        /* AN524, AN536: QSPI Select signal */
          s->cfg2 = value;
          break;
      case A_CFG5:
          if (!have_cfg5(s)) {
              goto bad_offset;
          }
 -        /* AN524: ACLK frequency in Hz */
 +        /* AN524, AN536: ACLK frequency in Hz */
          s->cfg5 = value;
          break;
      case A_CFG6:
@@ -XXX,XX +XXX,XX @@ static void mps2_scc_write(void *opaque, hwaddr offset, uint64_t value,
              goto bad_offset;
          }
          /* AN524: Clock divider for BRAM */
 +        /* AN536: Core 0 vector table base address */
          s->cfg6 = value;
          break;
 +    case A_CFG7:
 +        if (!have_cfg7(s)) {
 +            goto bad_offset;
 +        }
 +        /* AN536: Core 1 vector table base address */
 +        s->cfg7 = value;
 +        break;
      case A_CFGDATA_OUT:
@@ -XXX,XX +XXX,XX @@ static void mps2_scc_finalize(Object *obj)
      g_free(s->oscclk_reset);
  }
 +static bool cfg7_needed(void *opaque)
 +{
 +    MPS2SCC *s = opaque;
 +
 +    return have_cfg7(s);
 +}
 +
 +static const VMStateDescription vmstate_cfg7 = {
 +    .name = "mps2-scc/cfg7",
 +    .version_id = 1,
 +    .minimum_version_id = 1,
 +    .needed = cfg7_needed,
 +    .fields = (const VMStateField[]) {
 +        VMSTATE_UINT32(cfg7, MPS2SCC),
 +        VMSTATE_END_OF_LIST()
 +    }
 +};
 +
  static const VMStateDescription mps2_scc_vmstate = {
      .name = "mps2-scc",
      .version_id = 3,
@@ -XXX,XX +XXX,XX @@ static const VMStateDescription mps2_scc_vmstate = {
          VMSTATE_VARRAY_UINT32(oscclk, MPS2SCC, num_oscclk,
                                0, vmstate_info_uint32, uint32_t),
          VMSTATE_END_OF_LIST()
 +    },
 +    .subsections = (const VMStateDescription * const []) {
 +        &vmstate_cfg7,
 +        NULL
      }
  };
 --
-.20.1
+.34.1
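
The SCC change above gates each optional CFG register on the FPGA image's part number. The following is a minimal standalone sketch of that gating, not the QEMU code itself: `DemoSCC`, the `DEMO_A_*` constants and `demo_reg_present()` are hypothetical names introduced for illustration, while the part numbers and register offsets are taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for MPS2SCC: only the part number matters here. */
typedef struct DemoSCC {
    uint32_t partno;    /* e.g. 0x524 for AN524, 0x536 for AN536 */
} DemoSCC;

/* Register offsets as listed in the REG32() definitions of the patch. */
enum {
    DEMO_A_CFG3 = 0x0c,
    DEMO_A_CFG5 = 0x14,
    DEMO_A_CFG6 = 0x18,
    DEMO_A_CFG7 = 0x1c,
};

/*
 * Return true if the register at 'offset' exists on this FPGA image.
 * Offsets that are not gated in this sketch are treated as present.
 */
static bool demo_reg_present(const DemoSCC *s, uint32_t offset)
{
    uint32_t p = s->partno;

    switch (offset) {
    case DEMO_A_CFG3:
        return p != 0x524 && p != 0x547 && p != 0x536;
    case DEMO_A_CFG5:
        return p == 0x524 || p == 0x547 || p == 0x536;
    case DEMO_A_CFG6:
        return p == 0x524 || p == 0x536;
    case DEMO_A_CFG7:
        return p == 0x536;
    default:
        return true;
    }
}
```

In the device model itself, a false result corresponds to the `goto bad_offset` path in the read and write handlers.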

-[PULL 14/23] target/arm: Convert Neon 2-reg-scalar long multiplies to decodetree
+[PULL 30/35] hw/arm/mps3r: Initial skeleton for mps3-an536 board
-Convert the Neon 2-reg-scalar long multiplies to decodetree.
+The AN536 is another FPGA image for the MPS3 development board. Unlike
-These are the last instructions in the group.
+the existing FPGA images we already model, this board uses a Cortex-R
 family CPU, and it does not use any equivalent to the M-profile
 "Subsystem for Embedded" SoC-equivalent that we model in hw/arm/armsse.c.
 It's therefore more convenient for us to model it as a completely
 separate C file.
 This commit adds the basic skeleton of the board model, and the
 code to create all the RAM and ROM. We assume that we're probably
 going to want to add more images in future, so use the same
 base class/subclass setup that mps2-tz.c uses, even though at
 the moment there's only a single subclass.
 Following commits will add the CPUs and the peripherals.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Message-id: 20240206132931.38376-9-peter.maydell@linaro.org
 ---
- target/arm/neon-dp.decode       |  18 ++++
+ MAINTAINERS                             |   3 +-
- target/arm/translate-neon.inc.c | 163 ++++++++++++++++++++++++++++
+ configs/devices/arm-softmmu/default.mak |   1 +
- target/arm/translate.c          | 182 ++------------------------------
+ hw/arm/mps3r.c                          | 239 ++++++++++++++++++++++++
-files changed, 187 insertions(+), 176 deletions(-)
+ hw/arm/Kconfig                          |   5 +
+ hw/arm/meson.build                      |   1 +
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+files changed, 248 insertions(+), 1 deletion(-)
  create mode 100644 hw/arm/mps3r.c
 diff --git a/MAINTAINERS b/MAINTAINERS
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
+--- a/MAINTAINERS
-+++ b/target/arm/neon-dp.decode
++++ b/MAINTAINERS
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+@@ -XXX,XX +XXX,XX @@ F: include/hw/misc/imx7_*.h
+ F: hw/pci-host/designware.c
-     @2scalar     .... ... q:1 . . size:2 .... .... .... . . . . .... \
+ F: include/hw/pci-host/designware.h
-                  &2scalar vm=%vm_dp vn=%vn_dp vd=%vd_dp
-+    # For the 'long' ops the Q bit is part of insn decode
+-MPS2
-+    @2scalar_q0  .... ... . . . size:2 .... .... .... . . . . .... \
++MPS2 / MPS3
-+                 &2scalar vm=%vm_dp vn=%vn_dp vd=%vd_dp q=0
+ M: Peter Maydell <peter.maydell@linaro.org>
+ L: qemu-arm@nongnu.org
-     VMLA_2sc     1111 001 . 1 . .. .... .... 0000 . 1 . 0 .... @2scalar
+ S: Maintained
-     VMLA_F_2sc   1111 001 . 1 . .. .... .... 0001 . 1 . 0 .... @2scalar
+ F: hw/arm/mps2.c
+ F: hw/arm/mps2-tz.c
-+    VMLAL_S_2sc  1111 001 0 1 . .. .... .... 0010 . 1 . 0 .... @2scalar_q0
++F: hw/arm/mps3r.c
-+    VMLAL_U_2sc  1111 001 1 1 . .. .... .... 0010 . 1 . 0 .... @2scalar_q0
+ F: hw/misc/mps2-*.c
-+
+ F: include/hw/misc/mps2-*.h
-+    VQDMLAL_2sc  1111 001 0 1 . .. .... .... 0011 . 1 . 0 .... @2scalar_q0
+ F: hw/arm/armsse.c
-+
+diff --git a/configs/devices/arm-softmmu/default.mak b/configs/devices/arm-softmmu/default.mak
      VMLS_2sc     1111 001 . 1 . .. .... .... 0100 . 1 . 0 .... @2scalar
      VMLS_F_2sc   1111 001 . 1 . .. .... .... 0101 . 1 . 0 .... @2scalar
 +    VMLSL_S_2sc  1111 001 0 1 . .. .... .... 0110 . 1 . 0 .... @2scalar_q0
 +    VMLSL_U_2sc  1111 001 1 1 . .. .... .... 0110 . 1 . 0 .... @2scalar_q0
 +
 +    VQDMLSL_2sc  1111 001 0 1 . .. .... .... 0111 . 1 . 0 .... @2scalar_q0
 +
      VMUL_2sc     1111 001 . 1 . .. .... .... 1000 . 1 . 0 .... @2scalar
      VMUL_F_2sc   1111 001 . 1 . .. .... .... 1001 . 1 . 0 .... @2scalar
 +    VMULL_S_2sc  1111 001 0 1 . .. .... .... 1010 . 1 . 0 .... @2scalar_q0
 +    VMULL_U_2sc  1111 001 1 1 . .. .... .... 1010 . 1 . 0 .... @2scalar_q0
 +
 +    VQDMULL_2sc  1111 001 0 1 . .. .... .... 1011 . 1 . 0 .... @2scalar_q0
 +
      VQDMULH_2sc  1111 001 . 1 . .. .... .... 1100 . 1 . 0 .... @2scalar
      VQRDMULH_2sc 1111 001 . 1 . .. .... .... 1101 . 1 . 0 .... @2scalar
 diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
+--- a/configs/devices/arm-softmmu/default.mak
-+++ b/target/arm/translate-neon.inc.c
++++ b/configs/devices/arm-softmmu/default.mak
-@@ -XXX,XX +XXX,XX @@ static bool trans_VQRDMLSH_2sc(DisasContext *s, arg_2scalar *a)
+@@ -XXX,XX +XXX,XX @@ CONFIG_ARM_VIRT=y
-     };
+ # CONFIG_INTEGRATOR=n
-     return do_vqrdmlah_2sc(s, a, opfn[a->size]);
+ # CONFIG_FSL_IMX31=n
- }
+ # CONFIG_MUSICPAL=n
-+
++# CONFIG_MPS3R=n
-+static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
+ # CONFIG_MUSCA=n
-+                            NeonGenTwoOpWidenFn *opfn,
+ # CONFIG_CHEETAH=n
-+                            NeonGenTwo64OpFn *accfn)
+ # CONFIG_SX1=n
 diff --git a/hw/arm/mps3r.c b/hw/arm/mps3r.c
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/hw/arm/mps3r.c
@@ -XXX,XX +XXX,XX @@
 +/*
 + * Arm MPS3 board emulation for Cortex-R-based FPGA images.
 + * (For M-profile images see mps2.c and mps2-tz.c.)
 + *
 + * Copyright (c) 2017 Linaro Limited
 + * Written by Peter Maydell
 + *
 + *  This program is free software; you can redistribute it and/or modify
 + *  it under the terms of the GNU General Public License version 2 or
 + *  (at your option) any later version.
 + */
 +
 +/*
 + * The MPS3 is an FPGA based dev board. This file handles FPGA images
 + * which use the Cortex-R CPUs. We model these separately from the
 + * M-profile images, because on M-profile the FPGA image is based on
 + * a "Subsystem for Embedded" which is similar to an SoC, whereas
 + * the R-profile FPGA images don't have that abstraction layer.
 + *
 + * We model the following FPGA images here:
 + *  "mps3-an536" -- dual Cortex-R52 as documented in Arm Application Note AN536
 + *
 + * Application Note AN536:
 + * https://developer.arm.com/documentation/dai0536/latest/
 + */
 +
 +#include "qemu/osdep.h"
 +#include "qemu/units.h"
 +#include "qapi/error.h"
 +#include "exec/address-spaces.h"
 +#include "cpu.h"
 +#include "hw/boards.h"
 +#include "hw/arm/boot.h"
 +
 +/* Define the layout of RAM and ROM in a board */
 +typedef struct RAMInfo {
 +    const char *name;
 +    hwaddr base;
 +    hwaddr size;
 +    int mrindex; /* index into rams[]; -1 for the system RAM block */
 +    int flags;
 +} RAMInfo;
 +
 +/*
 + * The MPS3 DDR is 3GiB, but on a 32-bit host QEMU doesn't permit
 + * emulation of that much guest RAM, so artificially make it smaller.
 + */
 +#if HOST_LONG_BITS == 32
 +#define MPS3_DDR_SIZE (1 * GiB)
 +#else
 +#define MPS3_DDR_SIZE (3 * GiB)
 +#endif
 +
 +/*
 + * Flag values:
 + * IS_MAIN: this is the main machine RAM
 + * IS_ROM: this area is read-only
 + */
 +#define IS_MAIN 1
 +#define IS_ROM 2
 +
 +#define MPS3R_RAM_MAX 9
 +
 +typedef enum MPS3RFPGAType {
 +    FPGA_AN536,
 +} MPS3RFPGAType;
 +
 +struct MPS3RMachineClass {
 +    MachineClass parent;
 +    MPS3RFPGAType fpga_type;
 +    const RAMInfo *raminfo;
 +};
 +
 +struct MPS3RMachineState {
 +    MachineState parent;
 +    MemoryRegion ram[MPS3R_RAM_MAX];
 +};
 +
 +#define TYPE_MPS3R_MACHINE "mps3r"
 +#define TYPE_MPS3R_AN536_MACHINE MACHINE_TYPE_NAME("mps3-an536")
 +
 +OBJECT_DECLARE_TYPE(MPS3RMachineState, MPS3RMachineClass, MPS3R_MACHINE)
 +
 +static const RAMInfo an536_raminfo[] = {
 +    {
 +        .name = "ATCM",
 +        .base = 0x00000000,
 +        .size = 0x00008000,
 +        .mrindex = 0,
 +    }, {
 +        /* We model the QSPI flash as simple ROM for now */
 +        .name = "QSPI",
 +        .base = 0x08000000,
 +        .size = 0x00800000,
 +        .flags = IS_ROM,
 +        .mrindex = 1,
 +    }, {
 +        .name = "BRAM",
 +        .base = 0x10000000,
 +        .size = 0x00080000,
 +        .mrindex = 2,
 +    }, {
 +        .name = "DDR",
 +        .base = 0x20000000,
 +        .size = MPS3_DDR_SIZE,
 +        .mrindex = -1,
 +    }, {
 +        .name = "ATCM0",
 +        .base = 0xee000000,
 +        .size = 0x00008000,
 +        .mrindex = 3,
 +    }, {
 +        .name = "BTCM0",
 +        .base = 0xee100000,
 +        .size = 0x00008000,
 +        .mrindex = 4,
 +    }, {
 +        .name = "CTCM0",
 +        .base = 0xee200000,
 +        .size = 0x00008000,
 +        .mrindex = 5,
 +    }, {
 +        .name = "ATCM1",
 +        .base = 0xee400000,
 +        .size = 0x00008000,
 +        .mrindex = 6,
 +    }, {
 +        .name = "BTCM1",
 +        .base = 0xee500000,
 +        .size = 0x00008000,
 +        .mrindex = 7,
 +    }, {
 +        .name = "CTCM1",
 +        .base = 0xee600000,
 +        .size = 0x00008000,
 +        .mrindex = 8,
 +    }, {
 +        .name = NULL,
 +    }
 +};
 +
 +static MemoryRegion *mr_for_raminfo(MPS3RMachineState *mms,
 +                                    const RAMInfo *raminfo)
 +{
 +    /* Return an initialized MemoryRegion for the RAMInfo. */
 +    MemoryRegion *ram;
 +
 +    if (raminfo->mrindex < 0) {
 +        /* Means this RAMInfo is for QEMU's "system memory" */
 +        MachineState *machine = MACHINE(mms);
 +        assert(!(raminfo->flags & IS_ROM));
 +        return machine->ram;
 +    }
 +
 +    assert(raminfo->mrindex < MPS3R_RAM_MAX);
 +    ram = &mms->ram[raminfo->mrindex];
 +
 +    memory_region_init_ram(ram, NULL, raminfo->name,
 +                           raminfo->size, &error_fatal);
 +    if (raminfo->flags & IS_ROM) {
 +        memory_region_set_readonly(ram, true);
 +    }
 +    return ram;
 +}
 +
 +static void mps3r_common_init(MachineState *machine)
 +{
 +    MPS3RMachineState *mms = MPS3R_MACHINE(machine);
 +    MPS3RMachineClass *mmc = MPS3R_MACHINE_GET_CLASS(mms);
 +    MemoryRegion *sysmem = get_system_memory();
 +
 +    for (const RAMInfo *ri = mmc->raminfo; ri->name; ri++) {
 +        MemoryRegion *mr = mr_for_raminfo(mms, ri);
 +        memory_region_add_subregion(sysmem, ri->base, mr);
 +    }
 +}
 +
 +static void mps3r_set_default_ram_info(MPS3RMachineClass *mmc)
 +{
 +    /*
-+     * Two registers and a scalar, long operations: perform an
++     * Set mc->default_ram_size and default_ram_id from the
-+     * operation on the input elements and the scalar which produces
++     * information in mmc->raminfo.
 +     * a double-width result, and then possibly perform an accumulation
 +     * operation of that result into the destination.
 +     */
-+    TCGv_i32 scalar, rn;
++    MachineClass *mc = MACHINE_CLASS(mmc);
-+    TCGv_i64 rn0_64, rn1_64;
++    const RAMInfo *p;
 +
-+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
++    for (p = mmc->raminfo; p->name; p++) {
-+        return false;
++        if (p->mrindex < 0) {
-+    }
++            /* Found the entry for "system memory" */
-+
++            mc->default_ram_size = p->size;
-+    /* UNDEF accesses to D16-D31 if they don't exist. */
++            mc->default_ram_id = p->name;
-+    if (!dc_isar_feature(aa32_simd_r32, s) &&
++            return;
-+        ((a->vd | a->vn | a->vm) & 0x10)) {
++        }
-+        return false;
++    }
-+    }
++    g_assert_not_reached();
-+
++}
-+    if (!opfn) {
++
-+        /* Bad size (including size == 3, which is a different insn group) */
++static void mps3r_class_init(ObjectClass *oc, void *data)
-+        return false;
++{
-+    }
++    MachineClass *mc = MACHINE_CLASS(oc);
 +
-+    if (a->vd & 1) {
++    mc->init = mps3r_common_init;
-+        return false;
++}
-+    }
++
-+
++static void mps3r_an536_class_init(ObjectClass *oc, void *data)
-+    if (!vfp_access_check(s)) {
++{
-+        return true;
++    MachineClass *mc = MACHINE_CLASS(oc);
-+    }
++    MPS3RMachineClass *mmc = MPS3R_MACHINE_CLASS(oc);
-+
++    static const char * const valid_cpu_types[] = {
-+    scalar = neon_get_scalar(a->size, a->vm);
++        ARM_CPU_TYPE_NAME("cortex-r52"),
-+
++        NULL
 +    /* Load all inputs before writing any outputs, in case of overlap */
 +    rn = neon_load_reg(a->vn, 0);
 +    rn0_64 = tcg_temp_new_i64();
 +    opfn(rn0_64, rn, scalar);
 +    tcg_temp_free_i32(rn);
 +
 +    rn = neon_load_reg(a->vn, 1);
 +    rn1_64 = tcg_temp_new_i64();
 +    opfn(rn1_64, rn, scalar);
 +    tcg_temp_free_i32(rn);
 +    tcg_temp_free_i32(scalar);
 +
 +    if (accfn) {
 +        TCGv_i64 t64 = tcg_temp_new_i64();
 +        neon_load_reg64(t64, a->vd);
 +        accfn(t64, t64, rn0_64);
 +        neon_store_reg64(t64, a->vd);
 +        neon_load_reg64(t64, a->vd + 1);
 +        accfn(t64, t64, rn1_64);
 +        neon_store_reg64(t64, a->vd + 1);
 +        tcg_temp_free_i64(t64);
 +    } else {
 +        neon_store_reg64(rn0_64, a->vd);
 +        neon_store_reg64(rn1_64, a->vd + 1);
 +    }
 +    tcg_temp_free_i64(rn0_64);
 +    tcg_temp_free_i64(rn1_64);
 +    return true;
 +}
 +
 +static bool trans_VMULL_S_2sc(DisasContext *s, arg_2scalar *a)
 +{
 +    static NeonGenTwoOpWidenFn * const opfn[] = {
 +        NULL,
 +        gen_helper_neon_mull_s16,
 +        gen_mull_s32,
 +        NULL,
 +    };
 +
-+    return do_2scalar_long(s, a, opfn[a->size], NULL);
++    mc->desc = "ARM MPS3 with AN536 FPGA image for Cortex-R52";
-+}
++    mc->default_cpus = 2;
-+
++    mc->min_cpus = mc->default_cpus;
-+static bool trans_VMULL_U_2sc(DisasContext *s, arg_2scalar *a)
++    mc->max_cpus = mc->default_cpus;
-+{
++    mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-r52");
-+    static NeonGenTwoOpWidenFn * const opfn[] = {
++    mc->valid_cpu_types = valid_cpu_types;
-+        NULL,
++    mmc->raminfo = an536_raminfo;
-+        gen_helper_neon_mull_u16,
++    mps3r_set_default_ram_info(mmc);
-+        gen_mull_u32,
++}
-+        NULL,
++
-+    };
++static const TypeInfo mps3r_machine_types[] = {
-+
++    {
-+    return do_2scalar_long(s, a, opfn[a->size], NULL);
++        .name = TYPE_MPS3R_MACHINE,
-+}
++        .parent = TYPE_MACHINE,
-+
++        .abstract = true,
-+#define DO_VMLAL_2SC(INSN, MULL, ACC)                                   \
++        .instance_size = sizeof(MPS3RMachineState),
-+    static bool trans_##INSN##_2sc(DisasContext *s, arg_2scalar *a)     \
++        .class_size = sizeof(MPS3RMachineClass),
-+    {                                                                   \
++        .class_init = mps3r_class_init,
-+        static NeonGenTwoOpWidenFn * const opfn[] = {                   \
++    }, {
-+            NULL,                                                       \
++        .name = TYPE_MPS3R_AN536_MACHINE,
-+            gen_helper_neon_##MULL##16,                                 \
++        .parent = TYPE_MPS3R_MACHINE,
-+            gen_##MULL##32,                                             \
++        .class_init = mps3r_an536_class_init,
-+            NULL,                                                       \
++    },
-+        };                                                              \
++};
-+        static NeonGenTwo64OpFn * const accfn[] = {                     \
++
-+            NULL,                                                       \
++DEFINE_TYPES(mps3r_machine_types);
-+            gen_helper_neon_##ACC##l_u32,                               \
+diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
 +            tcg_gen_##ACC##_i64,                                        \
 +            NULL,                                                       \
 +        };                                                              \
 +        return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]);    \
 +    }
 +
 +DO_VMLAL_2SC(VMLAL_S, mull_s, add)
 +DO_VMLAL_2SC(VMLAL_U, mull_u, add)
 +DO_VMLAL_2SC(VMLSL_S, mull_s, sub)
 +DO_VMLAL_2SC(VMLSL_U, mull_u, sub)
 +
 +static bool trans_VQDMULL_2sc(DisasContext *s, arg_2scalar *a)
 +{
 +    static NeonGenTwoOpWidenFn * const opfn[] = {
 +        NULL,
 +        gen_VQDMULL_16,
 +        gen_VQDMULL_32,
 +        NULL,
 +    };
 +
 +    return do_2scalar_long(s, a, opfn[a->size], NULL);
 +}
 +
 +static bool trans_VQDMLAL_2sc(DisasContext *s, arg_2scalar *a)
 +{
 +    static NeonGenTwoOpWidenFn * const opfn[] = {
 +        NULL,
 +        gen_VQDMULL_16,
 +        gen_VQDMULL_32,
 +        NULL,
 +    };
 +    static NeonGenTwo64OpFn * const accfn[] = {
 +        NULL,
 +        gen_VQDMLAL_acc_16,
 +        gen_VQDMLAL_acc_32,
 +        NULL,
 +    };
 +
 +    return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]);
 +}
 +
 +static bool trans_VQDMLSL_2sc(DisasContext *s, arg_2scalar *a)
 +{
 +    static NeonGenTwoOpWidenFn * const opfn[] = {
 +        NULL,
 +        gen_VQDMULL_16,
 +        gen_VQDMULL_32,
 +        NULL,
 +    };
 +    static NeonGenTwo64OpFn * const accfn[] = {
 +        NULL,
 +        gen_VQDMLSL_acc_16,
 +        gen_VQDMLSL_acc_32,
 +        NULL,
 +    };
 +
 +    return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]);
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/hw/arm/Kconfig
-+++ b/target/arm/translate.c
++++ b/hw/arm/Kconfig
-@@ -XXX,XX +XXX,XX @@ static void gen_revsh(TCGv_i32 dest, TCGv_i32 var)
+@@ -XXX,XX +XXX,XX @@ config MAINSTONE
-     tcg_gen_ext16s_i32(dest, var);
+     select PFLASH_CFI01
- }
+     select SMC91C111
--/* 32x32->64 multiply.  Marks inputs as dead.  */
++config MPS3R
--static TCGv_i64 gen_mulu_i64_i32(TCGv_i32 a, TCGv_i32 b)
++    bool
--{
++    default y
--    TCGv_i32 lo = tcg_temp_new_i32();
++    depends on TCG && ARM
--    TCGv_i32 hi = tcg_temp_new_i32();
++
--    TCGv_i64 ret;
+ config MUSCA
--
+     bool
--    tcg_gen_mulu2_i32(lo, hi, a, b);
+     default y
--    tcg_temp_free_i32(a);
+diff --git a/hw/arm/meson.build b/hw/arm/meson.build
--    tcg_temp_free_i32(b);
+index XXXXXXX..XXXXXXX 100644
--
+--- a/hw/arm/meson.build
--    ret = tcg_temp_new_i64();
++++ b/hw/arm/meson.build
--    tcg_gen_concat_i32_i64(ret, lo, hi);
+@@ -XXX,XX +XXX,XX @@ arm_ss.add(when: 'CONFIG_HIGHBANK', if_true: files('highbank.c'))
--    tcg_temp_free_i32(lo);
+ arm_ss.add(when: 'CONFIG_INTEGRATOR', if_true: files('integratorcp.c'))
--    tcg_temp_free_i32(hi);
+ arm_ss.add(when: 'CONFIG_MAINSTONE', if_true: files('mainstone.c'))
--
+ arm_ss.add(when: 'CONFIG_MICROBIT', if_true: files('microbit.c'))
--    return ret;
++arm_ss.add(when: 'CONFIG_MPS3R', if_true: files('mps3r.c'))
--}
+ arm_ss.add(when: 'CONFIG_MUSICPAL', if_true: files('musicpal.c'))
--
+ arm_ss.add(when: 'CONFIG_NETDUINOPLUS2', if_true: files('netduinoplus2.c'))
--static TCGv_i64 gen_muls_i64_i32(TCGv_i32 a, TCGv_i32 b)
+ arm_ss.add(when: 'CONFIG_OLIMEX_STM32_H405', if_true: files('olimex-stm32-h405.c'))
 -{
 -    TCGv_i32 lo = tcg_temp_new_i32();
 -    TCGv_i32 hi = tcg_temp_new_i32();
 -    TCGv_i64 ret;
 -
 -    tcg_gen_muls2_i32(lo, hi, a, b);
 -    tcg_temp_free_i32(a);
 -    tcg_temp_free_i32(b);
 -
 -    ret = tcg_temp_new_i64();
 -    tcg_gen_concat_i32_i64(ret, lo, hi);
 -    tcg_temp_free_i32(lo);
 -    tcg_temp_free_i32(hi);
 -
 -    return ret;
 -}
 -
  /* Swap low and high halfwords.  */
  static void gen_swap_half(TCGv_i32 var)
  {
@@ -XXX,XX +XXX,XX @@ static inline void gen_neon_addl(int size)
      }
  }
 -static inline void gen_neon_negl(TCGv_i64 var, int size)
 -{
 -    switch (size) {
 -    case 0: gen_helper_neon_negl_u16(var, var); break;
 -    case 1: gen_helper_neon_negl_u32(var, var); break;
 -    case 2:
 -        tcg_gen_neg_i64(var, var);
 -        break;
 -    default: abort();
 -    }
 -}
 -
 -static inline void gen_neon_addl_saturate(TCGv_i64 op0, TCGv_i64 op1, int size)
 -{
 -    switch (size) {
 -    case 1: gen_helper_neon_addl_saturate_s32(op0, cpu_env, op0, op1); break;
 -    case 2: gen_helper_neon_addl_saturate_s64(op0, cpu_env, op0, op1); break;
 -    default: abort();
 -    }
 -}
 -
 -static inline void gen_neon_mull(TCGv_i64 dest, TCGv_i32 a, TCGv_i32 b,
 -                                 int size, int u)
 -{
 -    TCGv_i64 tmp;
 -
 -    switch ((size << 1) | u) {
 -    case 0: gen_helper_neon_mull_s8(dest, a, b); break;
 -    case 1: gen_helper_neon_mull_u8(dest, a, b); break;
 -    case 2: gen_helper_neon_mull_s16(dest, a, b); break;
 -    case 3: gen_helper_neon_mull_u16(dest, a, b); break;
 -    case 4:
 -        tmp = gen_muls_i64_i32(a, b);
 -        tcg_gen_mov_i64(dest, tmp);
 -        tcg_temp_free_i64(tmp);
 -        break;
 -    case 5:
 -        tmp = gen_mulu_i64_i32(a, b);
 -        tcg_gen_mov_i64(dest, tmp);
 -        tcg_temp_free_i64(tmp);
 -        break;
 -    default: abort();
 -    }
 -
 -    /* gen_helper_neon_mull_[su]{8|16} do not free their parameters.
 -       Don't forget to clean them now.  */
 -    if (size < 2) {
 -        tcg_temp_free_i32(a);
 -        tcg_temp_free_i32(b);
 -    }
 -}
 -
  static void gen_neon_narrow_op(int op, int u, int size,
                                 TCGv_i32 dest, TCGv_i64 src)
  {
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
      int u;
      int vec_size;
      uint32_t imm;
 -    TCGv_i32 tmp, tmp2, tmp3, tmp4, tmp5;
 +    TCGv_i32 tmp, tmp2, tmp3, tmp5;
      TCGv_ptr ptr1;
      TCGv_i64 tmp64;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
          return 1;
      } else { /* (insn & 0x00800010 == 0x00800000) */
          if (size != 3) {
 -            op = (insn >> 8) & 0xf;
 -            if ((insn & (1 << 6)) == 0) {
 -                /* Three registers of different lengths: handled by decodetree */
 -                return 1;
 -            } else {
 -                /* Two registers and a scalar. NB that for ops of this form
 -                 * the ARM ARM labels bit 24 as Q, but it is in our variable
 -                 * 'u', not 'q'.
 -                 */
 -                if (size == 0) {
 -                    return 1;
 -                }
 -                switch (op) {
 -                case 0: /* Integer VMLA scalar */
 -                case 4: /* Integer VMLS scalar */
 -                case 8: /* Integer VMUL scalar */
 -                case 1: /* Float VMLA scalar */
 -                case 5: /* Floating point VMLS scalar */
 -                case 9: /* Floating point VMUL scalar */
 -                case 12: /* VQDMULH scalar */
 -                case 13: /* VQRDMULH scalar */
 -                case 14: /* VQRDMLAH scalar */
 -                case 15: /* VQRDMLSH scalar */
 -                    return 1; /* handled by decodetree */
 -
 -                case 3: /* VQDMLAL scalar */
 -                case 7: /* VQDMLSL scalar */
 -                case 11: /* VQDMULL scalar */
 -                    if (u == 1) {
 -                        return 1;
 -                    }
 -                    /* fall through */
 -                case 2: /* VMLAL scalar */
 -                case 6: /* VMLSL scalar */
 -                case 10: /* VMULL scalar */
 -                    if (rd & 1) {
 -                        return 1;
 -                    }
 -                    tmp2 = neon_get_scalar(size, rm);
 -                    /* We need a copy of tmp2 because gen_neon_mull
 -                     * deletes it during pass 0.  */
 -                    tmp4 = tcg_temp_new_i32();
 -                    tcg_gen_mov_i32(tmp4, tmp2);
 -                    tmp3 = neon_load_reg(rn, 1);
 -
 -                    for (pass = 0; pass < 2; pass++) {
 -                        if (pass == 0) {
 -                            tmp = neon_load_reg(rn, 0);
 -                        } else {
 -                            tmp = tmp3;
 -                            tmp2 = tmp4;
 -                        }
 -                        gen_neon_mull(cpu_V0, tmp, tmp2, size, u);
 -                        if (op != 11) {
 -                            neon_load_reg64(cpu_V1, rd + pass);
 -                        }
 -                        switch (op) {
 -                        case 6:
 -                            gen_neon_negl(cpu_V0, size);
 -                            /* Fall through */
 -                        case 2:
 -                            gen_neon_addl(size);
 -                            break;
 -                        case 3: case 7:
 -                            gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
 -                            if (op == 7) {
 -                                gen_neon_negl(cpu_V0, size);
 -                            }
 -                            gen_neon_addl_saturate(cpu_V0, cpu_V1, size);
 -                            break;
 -                        case 10:
 -                            /* no-op */
 -                            break;
 -                        case 11:
 -                            gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
 -                            break;
 -                        default:
 -                            abort();
 -                        }
 -                        neon_store_reg64(cpu_V0, rd + pass);
 -                    }
 -                    break;
 -                default:
 -                    g_assert_not_reached();
 -                }
 -            }
 +            /*
 +             * Three registers of different lengths, or two registers and
 +             * a scalar: handled by decodetree
 +             */
 +            return 1;
          } else { /* size == 3 */
              if (!u) {
                  /* Extract.  */
 --
-.20.1
+.34.1
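
The board skeleton above describes its memory map as a NULL-terminated table and finds the system RAM block by looking for a negative `mrindex`. The following is a minimal standalone sketch of that table walk, assuming nothing from QEMU: `DemoRAMInfo`, `demo_raminfo` and `demo_find_system_ram()` are hypothetical names, the table is abbreviated to three entries, and the DDR size uses the 1 GiB value the patch picks for 32-bit hosts.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced version of the RAMInfo struct in the patch. */
typedef struct DemoRAMInfo {
    const char *name;   /* NULL terminates the table */
    uint64_t base;
    uint64_t size;
    int mrindex;        /* index of a board-owned region; -1 for system RAM */
} DemoRAMInfo;

/* Abbreviated layout: three of the AN536 entries plus the terminator. */
static const DemoRAMInfo demo_raminfo[] = {
    { .name = "ATCM", .base = 0x00000000, .size = 0x00008000, .mrindex = 0 },
    { .name = "BRAM", .base = 0x10000000, .size = 0x00080000, .mrindex = 2 },
    { .name = "DDR",  .base = 0x20000000, .size = 0x40000000, .mrindex = -1 },
    { .name = NULL },
};

/*
 * Walk the table until the NULL-name terminator and return the entry
 * that stands for system RAM, or NULL if the table has none.
 */
static const DemoRAMInfo *demo_find_system_ram(const DemoRAMInfo *table)
{
    for (const DemoRAMInfo *p = table; p->name; p++) {
        if (p->mrindex < 0) {
            return p;
        }
    }
    return NULL;
}
```

The patch treats a table without a system RAM entry as a programming error and asserts; the sketch returns NULL instead so the caller can decide how to handle it.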

-[PULL 06/23] target/arm: Convert Neon 3-reg-diff saturating doubling multiplies
+[PULL 31/35] hw/arm/mps3r: Add CPUs, GIC, and per-CPU RAM
-Convert the Neon 3-reg-diff insns VQDMULL, VQDMLAL and VQDMLSL:
+Create the CPUs, the GIC, and the per-CPU RAM block for
-these are all saturating doubling long multiplies with a possible
+the mps3-an536 board.
 accumulate step.
 These are the last insns in the group which use the pass-over-each
 elements loop, so we can delete that code.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20240206132931.38376-10-peter.maydell@linaro.org
 ---
- target/arm/neon-dp.decode       |  6 +++
+ hw/arm/mps3r.c | 180 ++++++++++++++++++++++++++++++++++++++++++++++++-
- target/arm/translate-neon.inc.c | 82 +++++++++++++++++++++++++++++++++
+1 file changed, 177 insertions(+), 3 deletions(-)
  target/arm/translate.c          | 59 ++----------------------
 3 files changed, 92 insertions(+), 55 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+diff --git a/hw/arm/mps3r.c b/hw/arm/mps3r.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
+--- a/hw/arm/mps3r.c
-+++ b/target/arm/neon-dp.decode
++++ b/hw/arm/mps3r.c
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+@@ -XXX,XX +XXX,XX @@
-     VMLAL_S_3d   1111 001 0 1 . .. .... .... 1000 . 0 . 0 .... @3diff
+ #include "qemu/osdep.h"
-     VMLAL_U_3d   1111 001 1 1 . .. .... .... 1000 . 0 . 0 .... @3diff
+ #include "qemu/units.h"
+ #include "qapi/error.h"
-+    VQDMLAL_3d   1111 001 0 1 . .. .... .... 1001 . 0 . 0 .... @3diff
++#include "qapi/qmp/qlist.h"
-+
+ #include "exec/address-spaces.h"
-     VMLSL_S_3d   1111 001 0 1 . .. .... .... 1010 . 0 . 0 .... @3diff
+ #include "cpu.h"
-     VMLSL_U_3d   1111 001 1 1 . .. .... .... 1010 . 0 . 0 .... @3diff
+ #include "hw/boards.h"
++#include "hw/qdev-properties.h"
-+    VQDMLSL_3d   1111 001 0 1 . .. .... .... 1011 . 0 . 0 .... @3diff
+ #include "hw/arm/boot.h"
-+
++#include "hw/arm/bsa.h"
-     VMULL_S_3d   1111 001 0 1 . .. .... .... 1100 . 0 . 0 .... @3diff
++#include "hw/intc/arm_gicv3.h"
-     VMULL_U_3d   1111 001 1 1 . .. .... .... 1100 . 0 . 0 .... @3diff
-+
+ /* Define the layout of RAM and ROM in a board */
-+    VQDMULL_3d   1111 001 0 1 . .. .... .... 1101 . 0 . 0 .... @3diff
+ typedef struct RAMInfo {
-   ]
+@@ -XXX,XX +XXX,XX @@ typedef struct RAMInfo {
  #define IS_ROM 2
  #define MPS3R_RAM_MAX 9
 +#define MPS3R_CPU_MAX 2
 +
 +#define PERIPHBASE 0xf0000000
 +#define NUM_SPIS 96
  typedef enum MPS3RFPGAType {
      FPGA_AN536,
@@ -XXX,XX +XXX,XX @@ struct MPS3RMachineClass {
      MachineClass parent;
      MPS3RFPGAType fpga_type;
      const RAMInfo *raminfo;
 +    hwaddr loader_start;
  };
  struct MPS3RMachineState {
      MachineState parent;
 +    struct arm_boot_info bootinfo;
      MemoryRegion ram[MPS3R_RAM_MAX];
 +    Object *cpu[MPS3R_CPU_MAX];
 +    MemoryRegion cpu_sysmem[MPS3R_CPU_MAX];
 +    MemoryRegion sysmem_alias[MPS3R_CPU_MAX];
 +    MemoryRegion cpu_ram[MPS3R_CPU_MAX];
 +    GICv3State gic;
  };
  #define TYPE_MPS3R_MACHINE "mps3r"
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *mr_for_raminfo(MPS3RMachineState *mms,
      return ram;
  }
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
-index XXXXXXX..XXXXXXX 100644
++/*
---- a/target/arm/translate-neon.inc.c
++ * There is no defined secondary boot protocol for Linux for the AN536,
-+++ b/target/arm/translate-neon.inc.c
++ * because real hardware has a restriction that atomic operations between
-@@ -XXX,XX +XXX,XX @@ DO_VMLAL(VMLAL_S,mull_s,add)
++ * the two CPUs do not function correctly, and so true SMP is not
- DO_VMLAL(VMLAL_U,mull_u,add)
++ * possible. Therefore for cases where the user is directly booting
- DO_VMLAL(VMLSL_S,mull_s,sub)
++ * a kernel, we treat the system as essentially uniprocessor, and
- DO_VMLAL(VMLSL_U,mull_u,sub)
++ * put the secondary CPU into power-off state (as if the user on the
-+
++ * real hardware had configured the secondary to be halted via the
-+static void gen_VQDMULL_16(TCGv_i64 rd, TCGv_i32 rn, TCGv_i32 rm)
++ * SCC config registers).
 + *
 + * Note that the default secondary boot code would not work here anyway
 + * as it assumes a GICv2, and we have a GICv3.
 + */
 +static void mps3r_write_secondary_boot(ARMCPU *cpu,
 +                                       const struct arm_boot_info *info)
 +{
-+    gen_helper_neon_mull_s16(rd, rn, rm);
++    /*
-+    gen_helper_neon_addl_saturate_s32(rd, cpu_env, rd, rd);
++     * Power the secondary CPU off. This means we don't need to write any
 +     * boot code into guest memory. Note that the 'cpu' argument to this
 +     * function is the primary CPU we passed to arm_load_kernel(), not
 +     * the secondary. Loop around all the other CPUs, as the boot.c
 +     * code does for the "disable secondaries if PSCI is enabled" case.
 +     */
 +    for (CPUState *cs = first_cpu; cs; cs = CPU_NEXT(cs)) {
 +        if (cs != first_cpu) {
 +            object_property_set_bool(OBJECT(cs), "start-powered-off", true,
 +                                     &error_abort);
 +        }
 +    }
 +}
 +
-+static void gen_VQDMULL_32(TCGv_i64 rd, TCGv_i32 rn, TCGv_i32 rm)
++static void mps3r_secondary_cpu_reset(ARMCPU *cpu,
 +                                      const struct arm_boot_info *info)
 +{
-+    gen_mull_s32(rd, rn, rm);
++    /* We don't need to do anything here because the CPU will be off */
 +    gen_helper_neon_addl_saturate_s64(rd, cpu_env, rd, rd);
 +}
 +
-+static bool trans_VQDMULL_3d(DisasContext *s, arg_3diff *a)
++static void create_gic(MPS3RMachineState *mms, MemoryRegion *sysmem)
 +{
-+    static NeonGenTwoOpWidenFn * const opfn[] = {
++    MachineState *machine = MACHINE(mms);
-+        NULL,
++    DeviceState *gicdev;
-+        gen_VQDMULL_16,
++    QList *redist_region_count;
-+        gen_VQDMULL_32,
++
-+        NULL,
++    object_initialize_child(OBJECT(mms), "gic", &mms->gic, TYPE_ARM_GICV3);
-+    };
++    gicdev = DEVICE(&mms->gic);
-+
++    qdev_prop_set_uint32(gicdev, "num-cpu", machine->smp.cpus);
-+    return do_long_3d(s, a, opfn[a->size], NULL);
++    qdev_prop_set_uint32(gicdev, "num-irq", NUM_SPIS + GIC_INTERNAL);
 +    redist_region_count = qlist_new();
 +    qlist_append_int(redist_region_count, machine->smp.cpus);
 +    qdev_prop_set_array(gicdev, "redist-region-count", redist_region_count);
 +    object_property_set_link(OBJECT(&mms->gic), "sysmem",
 +                             OBJECT(sysmem), &error_fatal);
 +    sysbus_realize(SYS_BUS_DEVICE(&mms->gic), &error_fatal);
 +    sysbus_mmio_map(SYS_BUS_DEVICE(&mms->gic), 0, PERIPHBASE);
 +    sysbus_mmio_map(SYS_BUS_DEVICE(&mms->gic), 1, PERIPHBASE + 0x100000);
 +    /*
 +     * Wire the outputs from each CPU's generic timer and the GICv3
 +     * maintenance interrupt signal to the appropriate GIC PPI inputs,
 +     * and the GIC's IRQ/FIQ/VIRQ/VFIQ interrupt outputs to the CPU's inputs.
 +     */
 +    for (int i = 0; i < machine->smp.cpus; i++) {
 +        DeviceState *cpudev = DEVICE(mms->cpu[i]);
 +        SysBusDevice *gicsbd = SYS_BUS_DEVICE(&mms->gic);
 +        int intidbase = NUM_SPIS + i * GIC_INTERNAL;
 +        int irq;
 +        /*
 +         * Mapping from the output timer irq lines from the CPU to the
 +         * GIC PPI inputs used for this board. This isn't a BSA board,
 +         * but it uses the standard convention for the PPI numbers.
 +         */
 +        const int timer_irq[] = {
 +            [GTIMER_PHYS] = ARCH_TIMER_NS_EL1_IRQ,
 +            [GTIMER_VIRT] = ARCH_TIMER_VIRT_IRQ,
 +            [GTIMER_HYP]  = ARCH_TIMER_NS_EL2_IRQ,
 +        };
 +
 +        for (irq = 0; irq < ARRAY_SIZE(timer_irq); irq++) {
 +            qdev_connect_gpio_out(cpudev, irq,
 +                                  qdev_get_gpio_in(gicdev,
 +                                                   intidbase + timer_irq[irq]));
 +        }
 +
 +        qdev_connect_gpio_out_named(cpudev, "gicv3-maintenance-interrupt", 0,
 +                                    qdev_get_gpio_in(gicdev,
 +                                                     intidbase + ARCH_GIC_MAINT_IRQ));
 +
 +        qdev_connect_gpio_out_named(cpudev, "pmu-interrupt", 0,
 +                                    qdev_get_gpio_in(gicdev,
 +                                                     intidbase + VIRTUAL_PMU_IRQ));
 +
 +        sysbus_connect_irq(gicsbd, i,
 +                           qdev_get_gpio_in(cpudev, ARM_CPU_IRQ));
 +        sysbus_connect_irq(gicsbd, i + machine->smp.cpus,
 +                           qdev_get_gpio_in(cpudev, ARM_CPU_FIQ));
 +        sysbus_connect_irq(gicsbd, i + 2 * machine->smp.cpus,
 +                           qdev_get_gpio_in(cpudev, ARM_CPU_VIRQ));
 +        sysbus_connect_irq(gicsbd, i + 3 * machine->smp.cpus,
 +                           qdev_get_gpio_in(cpudev, ARM_CPU_VFIQ));
 +    }
 +}
 +
-+static void gen_VQDMLAL_acc_16(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
+ static void mps3r_common_init(MachineState *machine)
-+{
+ {
-+    gen_helper_neon_addl_saturate_s32(rd, cpu_env, rn, rm);
+     MPS3RMachineState *mms = MPS3R_MACHINE(machine);
-+}
+@@ -XXX,XX +XXX,XX @@ static void mps3r_common_init(MachineState *machine)
-+
+         MemoryRegion *mr = mr_for_raminfo(mms, ri);
-+static void gen_VQDMLAL_acc_32(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
+         memory_region_add_subregion(sysmem, ri->base, mr);
-+{
+     }
-+    gen_helper_neon_addl_saturate_s64(rd, cpu_env, rn, rm);
++
-+}
++    assert(machine->smp.cpus <= MPS3R_CPU_MAX);
-+
++    for (int i = 0; i < machine->smp.cpus; i++) {
-+static bool trans_VQDMLAL_3d(DisasContext *s, arg_3diff *a)
++        g_autofree char *sysmem_name = g_strdup_printf("cpu-%d-memory", i);
-+{
++        g_autofree char *ramname = g_strdup_printf("cpu-%d-memory", i);
-+    static NeonGenTwoOpWidenFn * const opfn[] = {
++        g_autofree char *alias_name = g_strdup_printf("sysmem-alias-%d", i);
-+        NULL,
++
-+        gen_VQDMULL_16,
++        /*
-+        gen_VQDMULL_32,
++         * Each CPU has some private RAM/peripherals, so create the container
-+        NULL,
++         * which will house those, with the whole-machine system memory being
-+    };
++         * used where there's no CPU-specific device. Note that we need the
-+    static NeonGenTwo64OpFn * const accfn[] = {
++         * sysmem_alias aliases because we can't put one MR (the original
-+        NULL,
++         * 'sysmem') into more than one other MR.
-+        gen_VQDMLAL_acc_16,
++         */
-+        gen_VQDMLAL_acc_32,
++        memory_region_init(&mms->cpu_sysmem[i], OBJECT(machine),
-+        NULL,
++                           sysmem_name, UINT64_MAX);
-+    };
++        memory_region_init_alias(&mms->sysmem_alias[i], OBJECT(machine),
-+
++                                 alias_name, sysmem, 0, UINT64_MAX);
-+    return do_long_3d(s, a, opfn[a->size], accfn[a->size]);
++        memory_region_add_subregion_overlap(&mms->cpu_sysmem[i], 0,
-+}
++                                            &mms->sysmem_alias[i], -1);
 +
-+static void gen_VQDMLSL_acc_16(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
++        mms->cpu[i] = object_new(machine->cpu_type);
-+{
++        object_property_set_link(mms->cpu[i], "memory",
-+    gen_helper_neon_negl_u32(rm, rm);
++                                 OBJECT(&mms->cpu_sysmem[i]), &error_abort);
-+    gen_helper_neon_addl_saturate_s32(rd, cpu_env, rn, rm);
++        object_property_set_int(mms->cpu[i], "reset-cbar",
-+}
++                                PERIPHBASE, &error_abort);
-+
++        qdev_realize(DEVICE(mms->cpu[i]), NULL, &error_fatal);
-+static void gen_VQDMLSL_acc_32(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
++        object_unref(mms->cpu[i]);
-+{
++
-+    tcg_gen_neg_i64(rm, rm);
++        /* Per-CPU RAM */
-+    gen_helper_neon_addl_saturate_s64(rd, cpu_env, rn, rm);
++        memory_region_init_ram(&mms->cpu_ram[i], NULL, ramname,
-+}
++                               0x1000, &error_fatal);
-+
++        memory_region_add_subregion(&mms->cpu_sysmem[i], 0xe7c01000,
-+static bool trans_VQDMLSL_3d(DisasContext *s, arg_3diff *a)
++                                    &mms->cpu_ram[i]);
-+{
++    }
-+    static NeonGenTwoOpWidenFn * const opfn[] = {
++
-+        NULL,
++    create_gic(mms, sysmem);
-+        gen_VQDMULL_16,
++
-+        gen_VQDMULL_32,
++    mms->bootinfo.ram_size = machine->ram_size;
-+        NULL,
++    mms->bootinfo.board_id = -1;
-+    };
++    mms->bootinfo.loader_start = mmc->loader_start;
-+    static NeonGenTwo64OpFn * const accfn[] = {
++    mms->bootinfo.write_secondary_boot = mps3r_write_secondary_boot;
-+        NULL,
++    mms->bootinfo.secondary_cpu_reset_hook = mps3r_secondary_cpu_reset;
-+        gen_VQDMLSL_acc_16,
++    arm_load_kernel(ARM_CPU(mms->cpu[0]), machine, &mms->bootinfo);
-+        gen_VQDMLSL_acc_32,
+ }
-+        NULL,
-+    };
+ static void mps3r_set_default_ram_info(MPS3RMachineClass *mmc)
-+
+@@ -XXX,XX +XXX,XX @@ static void mps3r_set_default_ram_info(MPS3RMachineClass *mmc)
-+    return do_long_3d(s, a, opfn[a->size], accfn[a->size]);
+             /* Found the entry for "system memory" */
-+}
+             mc->default_ram_size = p->size;
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+             mc->default_ram_id = p->name;
-index XXXXXXX..XXXXXXX 100644
++            mmc->loader_start = p->base;
---- a/target/arm/translate.c
+             return;
-+++ b/target/arm/translate.c
+         }
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+     }
-                     {0, 0, 0, 7}, /* VSUBHN: handled by decodetree */
+@@ -XXX,XX +XXX,XX @@ static void mps3r_an536_class_init(ObjectClass *oc, void *data)
-                     {0, 0, 0, 7}, /* VABDL */
+     };
-                     {0, 0, 0, 7}, /* VMLAL */
--                    {0, 0, 0, 9}, /* VQDMLAL */
+     mc->desc = "ARM MPS3 with AN536 FPGA image for Cortex-R52";
-+                    {0, 0, 0, 7}, /* VQDMLAL */
+-    mc->default_cpus = 2;
-                     {0, 0, 0, 7}, /* VMLSL */
+-    mc->min_cpus = mc->default_cpus;
--                    {0, 0, 0, 9}, /* VQDMLSL */
+-    mc->max_cpus = mc->default_cpus;
-+                    {0, 0, 0, 7}, /* VQDMLSL */
++    /*
-                     {0, 0, 0, 7}, /* Integer VMULL */
++     * In the real FPGA image there are always two cores, but the standard
--                    {0, 0, 0, 9}, /* VQDMULL */
++     * initial setting for the SCC SYSCON 0x000 register is 0x21, meaning
-+                    {0, 0, 0, 7}, /* VQDMULL */
++     * that the second core is held in reset and halted. Many images built for
-                     {0, 0, 0, 0xa}, /* Polynomial VMULL */
++     * the board do not expect the second core to run at startup (especially
-                     {0, 0, 0, 7}, /* Reserved: always UNDEF */
++     * since on the real FPGA image it is not possible to use LDREX/STREX
-                 };
++     * in RAM between the two cores, so a true SMP setup isn't supported).
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
++     *
-                     }
++     * As QEMU's equivalent of this, we support both -smp 1 and -smp 2,
-                     return 0;
++     * with the default being -smp 1. This seems a more intuitive UI for
-                 }
++     * QEMU users than, for instance, having a machine property to allow
--
++     * the user to set the initial value of the SYSCON 0x000 register.
--                /* Avoid overlapping operands.  Wide source operands are
++     */
--                   always aligned so will never overlap with wide
++    mc->default_cpus = 1;
--                   destinations in problematic ways.  */
++    mc->min_cpus = 1;
--                if (rd == rm) {
++    mc->max_cpus = 2;
--                    tmp = neon_load_reg(rm, 1);
+     mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-r52");
--                    neon_store_scratch(2, tmp);
+     mc->valid_cpu_types = valid_cpu_types;
--                } else if (rd == rn) {
+     mmc->raminfo = an536_raminfo;
 -                    tmp = neon_load_reg(rn, 1);
 -                    neon_store_scratch(2, tmp);
 -                }
 -                tmp3 = NULL;
 -                for (pass = 0; pass < 2; pass++) {
 -                    if (pass == 1 && rd == rn) {
 -                        tmp = neon_load_scratch(2);
 -                    } else {
 -                        tmp = neon_load_reg(rn, pass);
 -                    }
 -                    if (pass == 1 && rd == rm) {
 -                        tmp2 = neon_load_scratch(2);
 -                    } else {
 -                        tmp2 = neon_load_reg(rm, pass);
 -                    }
 -                    switch (op) {
 -                    case 9: case 11: case 13:
 -                        /* VQDMLAL, VQDMLSL, VQDMULL */
 -                        gen_neon_mull(cpu_V0, tmp, tmp2, size, u);
 -                        break;
 -                    default: /* 15 is RESERVED: caught earlier  */
 -                        abort();
 -                    }
 -                    if (op == 13) {
 -                        /* VQDMULL */
 -                        gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
 -                        neon_store_reg64(cpu_V0, rd + pass);
 -                    } else {
 -                        /* Accumulate.  */
 -                        neon_load_reg64(cpu_V1, rd + pass);
 -                        switch (op) {
 -                        case 9: case 11: /* VQDMLAL, VQDMLSL */
 -                            gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
 -                            if (op == 11) {
 -                                gen_neon_negl(cpu_V0, size);
 -                            }
 -                            gen_neon_addl_saturate(cpu_V0, cpu_V1, size);
 -                            break;
 -                        default:
 -                            abort();
 -                        }
 -                        neon_store_reg64(cpu_V0, rd + pass);
 -                    }
 -                }
 +                abort(); /* all others handled by decodetree */
              } else {
                  /* Two registers and a scalar. NB that for ops of this form
                   * the ARM ARM labels bit 24 as Q, but it is in our variable
 --
-2.20.1
+2.34.1
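
Note on the interrupt numbering used by create_gic() in the mps3r patch
above: the per-CPU lines are looked up on the GIC device at index
NUM_SPIS + cpu * GIC_INTERNAL + INTID, i.e. the shared SPIs come first and
each CPU then gets its own block of private lines. The following is only a
standalone sketch of that arithmetic, not code from the patch. It assumes
GIC_INTERNAL is 32 (the value is not shown in the patch), and
gic_ppi_input() is a hypothetical helper name used for illustration.

```c
#include <assert.h>

#define NUM_SPIS     96   /* from the patch */
#define GIC_INTERNAL 32   /* assumption: private (SGI + PPI) lines per CPU */

/*
 * Hypothetical helper: index of the GIC GPIO input that corresponds to
 * private interrupt 'intid' (0..GIC_INTERNAL-1) of CPU 'cpu'.
 * This mirrors "intidbase + <irq>" in create_gic().
 */
static int gic_ppi_input(int cpu, int intid)
{
    assert(cpu >= 0);
    assert(intid >= 0 && intid < GIC_INTERNAL);
    return NUM_SPIS + cpu * GIC_INTERNAL + intid;
}
```

With these values, INTID 16 of CPU 0 would be input 112, and the same INTID
on CPU 1 would be input 144; inputs below NUM_SPIS are the shared
peripheral interrupts.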

-[PULL 04/23] target/arm: Convert Neon 3-reg-diff VABAL, VABDL to decodetree
+[PULL 32/35] hw/arm/mps3r: Add UARTs
-Convert the Neon 3-reg-diff insns VABAL and VABDL to decodetree.
+This board has a lot of UARTs: there is one UART per CPU in the
-Like almost all the remaining insns in this group, these are
+per-CPU peripheral part of the address map, whose interrupts are
-a combination of a two-input operation which returns a double width
+connected as per-CPU interrupt lines.  Then there are 4 UARTs in the
-result and then a possible accumulation of that double width
+normal part of the peripheral space, whose interrupts are shared
-result into the destination.
+peripheral interrupts.
 Connect and wire them all up; this involves some OR gates where
 multiple overflow interrupts are wired into one GIC input.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Message-id: 20240206132931.38376-11-peter.maydell@linaro.org
 ---
- target/arm/translate.h          |   1 +
+ hw/arm/mps3r.c | 94 ++++++++++++++++++++++++++++++++++++++++++++++++++
- target/arm/neon-dp.decode       |   6 ++
+1 file changed, 94 insertions(+)
  target/arm/translate-neon.inc.c | 132 ++++++++++++++++++++++++++++++++
  target/arm/translate.c          |  31 +-------
 4 files changed, 142 insertions(+), 28 deletions(-)
-diff --git a/target/arm/translate.h b/target/arm/translate.h
+diff --git a/hw/arm/mps3r.c b/hw/arm/mps3r.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.h
+--- a/hw/arm/mps3r.c
-+++ b/target/arm/translate.h
++++ b/hw/arm/mps3r.c
-@@ -XXX,XX +XXX,XX @@ typedef void NeonGenTwo64OpEnvFn(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i64);
+@@ -XXX,XX +XXX,XX @@
- typedef void NeonGenNarrowFn(TCGv_i32, TCGv_i64);
+ #include "qapi/qmp/qlist.h"
- typedef void NeonGenNarrowEnvFn(TCGv_i32, TCGv_ptr, TCGv_i64);
+ #include "exec/address-spaces.h"
- typedef void NeonGenWidenFn(TCGv_i64, TCGv_i32);
+ #include "cpu.h"
-+typedef void NeonGenTwoOpWidenFn(TCGv_i64, TCGv_i32, TCGv_i32);
++#include "sysemu/sysemu.h"
- typedef void NeonGenTwoSingleOPFn(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_ptr);
+ #include "hw/boards.h"
- typedef void NeonGenTwoDoubleOPFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_ptr);
++#include "hw/or-irq.h"
- typedef void NeonGenOneOpFn(TCGv_i64, TCGv_i64);
+ #include "hw/qdev-properties.h"
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+ #include "hw/arm/boot.h"
-index XXXXXXX..XXXXXXX 100644
+ #include "hw/arm/bsa.h"
---- a/target/arm/neon-dp.decode
++#include "hw/char/cmsdk-apb-uart.h"
-+++ b/target/arm/neon-dp.decode
+ #include "hw/intc/arm_gicv3.h"
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
-     VADDHN_3d    1111 001 0 1 . .. .... .... 0100 . 0 . 0 .... @3diff
+ /* Define the layout of RAM and ROM in a board */
-     VRADDHN_3d   1111 001 1 1 . .. .... .... 0100 . 0 . 0 .... @3diff
+@@ -XXX,XX +XXX,XX @@ typedef struct RAMInfo {
-+    VABAL_S_3d   1111 001 0 1 . .. .... .... 0101 . 0 . 0 .... @3diff
+ #define MPS3R_RAM_MAX 9
-+    VABAL_U_3d   1111 001 1 1 . .. .... .... 0101 . 0 . 0 .... @3diff
+ #define MPS3R_CPU_MAX 2
 +#define MPS3R_UART_MAX 4 /* shared UART count */
  #define PERIPHBASE 0xf0000000
  #define NUM_SPIS 96
@@ -XXX,XX +XXX,XX @@ struct MPS3RMachineState {
      MemoryRegion sysmem_alias[MPS3R_CPU_MAX];
      MemoryRegion cpu_ram[MPS3R_CPU_MAX];
      GICv3State gic;
 +    /* per-CPU UARTs followed by the shared UARTs */
 +    CMSDKAPBUART uart[MPS3R_CPU_MAX + MPS3R_UART_MAX];
 +    OrIRQState cpu_uart_oflow[MPS3R_CPU_MAX];
 +    OrIRQState uart_oflow;
  };
  #define TYPE_MPS3R_MACHINE "mps3r"
@@ -XXX,XX +XXX,XX @@ struct MPS3RMachineState {
  OBJECT_DECLARE_TYPE(MPS3RMachineState, MPS3RMachineClass, MPS3R_MACHINE)
 +/*
 + * Main clock frequency CLK in Hz (50MHz). In the image there are also
 + * ACLK, MCLK, GPUCLK and PERIPHCLK at the same frequency; for our
 + * model we just roll them all into one.
 + */
 +#define CLK_FRQ 50000000
 +
-     VSUBHN_3d    1111 001 0 1 . .. .... .... 0110 . 0 . 0 .... @3diff
+ static const RAMInfo an536_raminfo[] = {
-     VRSUBHN_3d   1111 001 1 1 . .. .... .... 0110 . 0 . 0 .... @3diff
+     {
          .name = "ATCM",
@@ -XXX,XX +XXX,XX @@ static void create_gic(MPS3RMachineState *mms, MemoryRegion *sysmem)
      }
  }
 +/*
 + * Create UART uartno, and map it into the MemoryRegion mem at address baseaddr.
 + * The qemu_irq arguments are where we connect the various IRQs from the UART.
 + */
 +static void create_uart(MPS3RMachineState *mms, int uartno, MemoryRegion *mem,
 +                        hwaddr baseaddr, qemu_irq txirq, qemu_irq rxirq,
 +                        qemu_irq txoverirq, qemu_irq rxoverirq,
 +                        qemu_irq combirq)
 +{
 +    g_autofree char *s = g_strdup_printf("uart%d", uartno);
 +    SysBusDevice *sbd;
 +
-+    VABDL_S_3d   1111 001 0 1 . .. .... .... 0111 . 0 . 0 .... @3diff
++    assert(uartno < ARRAY_SIZE(mms->uart));
-+    VABDL_U_3d   1111 001 1 1 . .. .... .... 0111 . 0 . 0 .... @3diff
++    object_initialize_child(OBJECT(mms), s, &mms->uart[uartno],
-   ]
++                            TYPE_CMSDK_APB_UART);
- }
++    qdev_prop_set_uint32(DEVICE(&mms->uart[uartno]), "pclk-frq", CLK_FRQ);
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
++    qdev_prop_set_chr(DEVICE(&mms->uart[uartno]), "chardev", serial_hd(uartno));
-index XXXXXXX..XXXXXXX 100644
++    sbd = SYS_BUS_DEVICE(&mms->uart[uartno]);
---- a/target/arm/translate-neon.inc.c
++    sysbus_realize(sbd, &error_fatal);
-+++ b/target/arm/translate-neon.inc.c
++    memory_region_add_subregion(mem, baseaddr,
-@@ -XXX,XX +XXX,XX @@ DO_NARROW_3D(VADDHN, add, narrow, tcg_gen_extrh_i64_i32)
++                                sysbus_mmio_get_region(sbd, 0));
- DO_NARROW_3D(VSUBHN, sub, narrow, tcg_gen_extrh_i64_i32)
++    sysbus_connect_irq(sbd, 0, txirq);
- DO_NARROW_3D(VRADDHN, add, narrow_round, gen_narrow_round_high_u32)
++    sysbus_connect_irq(sbd, 1, rxirq);
- DO_NARROW_3D(VRSUBHN, sub, narrow_round, gen_narrow_round_high_u32)
++    sysbus_connect_irq(sbd, 2, txoverirq);
-+
++    sysbus_connect_irq(sbd, 3, rxoverirq);
-+static bool do_long_3d(DisasContext *s, arg_3diff *a,
++    sysbus_connect_irq(sbd, 4, combirq);
 +                       NeonGenTwoOpWidenFn *opfn,
 +                       NeonGenTwo64OpFn *accfn)
 +{
 +    /*
 +     * 3-regs different lengths, long operations.
 +     * These perform an operation on two inputs that returns a double-width
 +     * result, and then possibly perform an accumulation operation of
 +     * that result into the double-width destination.
 +     */
 +    TCGv_i64 rd0, rd1, tmp;
 +    TCGv_i32 rn, rm;
 +
 +    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
 +        return false;
 +    }
 +
 +    /* UNDEF accesses to D16-D31 if they don't exist. */
 +    if (!dc_isar_feature(aa32_simd_r32, s) &&
 +        ((a->vd | a->vn | a->vm) & 0x10)) {
 +        return false;
 +    }
 +
 +    if (!opfn) {
 +        /* size == 3 case, which is an entirely different insn group */
 +        return false;
 +    }
 +
 +    if (a->vd & 1) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    rd0 = tcg_temp_new_i64();
 +    rd1 = tcg_temp_new_i64();
 +
 +    rn = neon_load_reg(a->vn, 0);
 +    rm = neon_load_reg(a->vm, 0);
 +    opfn(rd0, rn, rm);
 +    tcg_temp_free_i32(rn);
 +    tcg_temp_free_i32(rm);
 +
 +    rn = neon_load_reg(a->vn, 1);
 +    rm = neon_load_reg(a->vm, 1);
 +    opfn(rd1, rn, rm);
 +    tcg_temp_free_i32(rn);
 +    tcg_temp_free_i32(rm);
 +
 +    /* Don't store results until after all loads: they might overlap */
 +    if (accfn) {
 +        tmp = tcg_temp_new_i64();
 +        neon_load_reg64(tmp, a->vd);
 +        accfn(tmp, tmp, rd0);
 +        neon_store_reg64(tmp, a->vd);
 +        neon_load_reg64(tmp, a->vd + 1);
 +        accfn(tmp, tmp, rd1);
 +        neon_store_reg64(tmp, a->vd + 1);
 +        tcg_temp_free_i64(tmp);
 +    } else {
 +        neon_store_reg64(rd0, a->vd);
 +        neon_store_reg64(rd1, a->vd + 1);
 +    }
 +
 +    tcg_temp_free_i64(rd0);
 +    tcg_temp_free_i64(rd1);
 +
 +    return true;
 +}
 +
-+static bool trans_VABDL_S_3d(DisasContext *s, arg_3diff *a)
+ static void mps3r_common_init(MachineState *machine)
-+{
+ {
-+    static NeonGenTwoOpWidenFn * const opfn[] = {
+     MPS3RMachineState *mms = MPS3R_MACHINE(machine);
-+        gen_helper_neon_abdl_s16,
+     MPS3RMachineClass *mmc = MPS3R_MACHINE_GET_CLASS(mms);
-+        gen_helper_neon_abdl_s32,
+     MemoryRegion *sysmem = get_system_memory();
-+        gen_helper_neon_abdl_s64,
++    DeviceState *gicdev;
-+        NULL,
-+    };
+     for (const RAMInfo *ri = mmc->raminfo; ri->name; ri++) {
          MemoryRegion *mr = mr_for_raminfo(mms, ri);
@@ -XXX,XX +XXX,XX @@ static void mps3r_common_init(MachineState *machine)
      }
      create_gic(mms, sysmem);
 +    gicdev = DEVICE(&mms->gic);
 +
-+    return do_long_3d(s, a, opfn[a->size], NULL);
++    /*
-+}
++     * UARTs 0 and 1 are per-CPU; their interrupts are wired to
 +     * the relevant CPU's PPI 0..3, aka INTID 16..19
 +     */
 +    for (int i = 0; i < machine->smp.cpus; i++) {
 +        int intidbase = NUM_SPIS + i * GIC_INTERNAL;
 +        g_autofree char *s = g_strdup_printf("cpu-uart-oflow-orgate%d", i);
 +        DeviceState *orgate;
 +
-+static bool trans_VABDL_U_3d(DisasContext *s, arg_3diff *a)
++        /* The two overflow IRQs from the UART are ORed together into PPI 3 */
-+{
++        object_initialize_child(OBJECT(mms), s, &mms->cpu_uart_oflow[i],
-+    static NeonGenTwoOpWidenFn * const opfn[] = {
++                                TYPE_OR_IRQ);
-+        gen_helper_neon_abdl_u16,
++        orgate = DEVICE(&mms->cpu_uart_oflow[i]);
-+        gen_helper_neon_abdl_u32,
++        qdev_prop_set_uint32(orgate, "num-lines", 2);
-+        gen_helper_neon_abdl_u64,
++        qdev_realize(orgate, NULL, &error_fatal);
-+        NULL,
++        qdev_connect_gpio_out(orgate, 0,
-+    };
++                              qdev_get_gpio_in(gicdev, intidbase + 19));
 +
-+    return do_long_3d(s, a, opfn[a->size], NULL);
++        create_uart(mms, i, &mms->cpu_sysmem[i], 0xe7c00000,
-+}
++                    qdev_get_gpio_in(gicdev, intidbase + 17), /* tx */
 +                    qdev_get_gpio_in(gicdev, intidbase + 16), /* rx */
 +                    qdev_get_gpio_in(orgate, 0), /* txover */
 +                    qdev_get_gpio_in(orgate, 1), /* rxover */
 +                    qdev_get_gpio_in(gicdev, intidbase + 18) /* combined */);
 +    }
 +    /*
 +     * UARTs 2 to 5 are whole-system; all overflow IRQs are ORed
 +     * together into IRQ 17
 +     */
 +    object_initialize_child(OBJECT(mms), "uart-oflow-orgate",
 +                            &mms->uart_oflow, TYPE_OR_IRQ);
 +    qdev_prop_set_uint32(DEVICE(&mms->uart_oflow), "num-lines",
 +                         MPS3R_UART_MAX * 2);
 +    qdev_realize(DEVICE(&mms->uart_oflow), NULL, &error_fatal);
 +    qdev_connect_gpio_out(DEVICE(&mms->uart_oflow), 0,
 +                          qdev_get_gpio_in(gicdev, 17));
 +
-+static bool trans_VABAL_S_3d(DisasContext *s, arg_3diff *a)
++    for (int i = 0; i < MPS3R_UART_MAX; i++) {
-+{
++        hwaddr baseaddr = 0xe0205000 + i * 0x1000;
-+    static NeonGenTwoOpWidenFn * const opfn[] = {
++        int rxirq = 5 + i * 2, txirq = 6 + i * 2, combirq = 13 + i;
 +        gen_helper_neon_abdl_s16,
 +        gen_helper_neon_abdl_s32,
 +        gen_helper_neon_abdl_s64,
 +        NULL,
 +    };
 +    static NeonGenTwo64OpFn * const addfn[] = {
 +        gen_helper_neon_addl_u16,
 +        gen_helper_neon_addl_u32,
 +        tcg_gen_add_i64,
 +        NULL,
 +    };
 +
-+    return do_long_3d(s, a, opfn[a->size], addfn[a->size]);
++        create_uart(mms, i + MPS3R_CPU_MAX, sysmem, baseaddr,
-+}
++                    qdev_get_gpio_in(gicdev, txirq),
-+
++                    qdev_get_gpio_in(gicdev, rxirq),
-+static bool trans_VABAL_U_3d(DisasContext *s, arg_3diff *a)
++                    qdev_get_gpio_in(DEVICE(&mms->uart_oflow), i * 2),
-+{
++                    qdev_get_gpio_in(DEVICE(&mms->uart_oflow), i * 2 + 1),
-+    static NeonGenTwoOpWidenFn * const opfn[] = {
++                    qdev_get_gpio_in(gicdev, combirq));
-+        gen_helper_neon_abdl_u16,
++    }
-+        gen_helper_neon_abdl_u32,
-+        gen_helper_neon_abdl_u64,
+     mms->bootinfo.ram_size = machine->ram_size;
-+        NULL,
+     mms->bootinfo.board_id = -1;
 +    };
 +    static NeonGenTwo64OpFn * const addfn[] = {
 +        gen_helper_neon_addl_u16,
 +        gen_helper_neon_addl_u32,
 +        tcg_gen_add_i64,
 +        NULL,
 +    };
 +
 +    return do_long_3d(s, a, opfn[a->size], addfn[a->size]);
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                      {0, 0, 0, 7}, /* VSUBL: handled by decodetree */
                      {0, 0, 0, 7}, /* VSUBW: handled by decodetree */
                      {0, 0, 0, 7}, /* VADDHN: handled by decodetree */
 -                    {0, 0, 0, 0}, /* VABAL */
 +                    {0, 0, 0, 7}, /* VABAL */
                      {0, 0, 0, 7}, /* VSUBHN: handled by decodetree */
 -                    {0, 0, 0, 0}, /* VABDL */
 +                    {0, 0, 0, 7}, /* VABDL */
                      {0, 0, 0, 0}, /* VMLAL */
                      {0, 0, 0, 9}, /* VQDMLAL */
                      {0, 0, 0, 0}, /* VMLSL */
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                          tmp2 = neon_load_reg(rm, pass);
                      }
                      switch (op) {
 -                    case 5: case 7: /* VABAL, VABDL */
 -                        switch ((size << 1) | u) {
 -                        case 0:
 -                            gen_helper_neon_abdl_s16(cpu_V0, tmp, tmp2);
 -                            break;
 -                        case 1:
 -                            gen_helper_neon_abdl_u16(cpu_V0, tmp, tmp2);
 -                            break;
 -                        case 2:
 -                            gen_helper_neon_abdl_s32(cpu_V0, tmp, tmp2);
 -                            break;
 -                        case 3:
 -                            gen_helper_neon_abdl_u32(cpu_V0, tmp, tmp2);
 -                            break;
 -                        case 4:
 -                            gen_helper_neon_abdl_s64(cpu_V0, tmp, tmp2);
 -                            break;
 -                        case 5:
 -                            gen_helper_neon_abdl_u64(cpu_V0, tmp, tmp2);
 -                            break;
 -                        default: abort();
 -                        }
 -                        tcg_temp_free_i32(tmp2);
 -                        tcg_temp_free_i32(tmp);
 -                        break;
                      case 8: case 9: case 10: case 11: case 12: case 13:
                          /* VMLAL, VQDMLAL, VMLSL, VQDMLSL, VMULL, VQDMULL */
                          gen_neon_mull(cpu_V0, tmp, tmp2, size, u);
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                          case 10: /* VMLSL */
                              gen_neon_negl(cpu_V0, size);
                              /* Fall through */
 -                        case 5: case 8: /* VABAL, VMLAL */
 +                        case 8: /* VMLAL */
                              gen_neon_addl(size);
                              break;
                          case 9: case 11: /* VQDMLAL, VQDMLSL */
 --
-.20.1
+.34.1

-[PULL 07/23] target/arm: Convert Neon 3-reg-diff polynomial VMULL
+[PULL 33/35] hw/arm/mps3r: Add GPIO, watchdog, dual-timer, I2C devices
-Convert the Neon 3-reg-diff insn polynomial VMULL. This is the last
+Add the GPIO, watchdog, dual-timer and I2C devices to the mps3-an536
-insn in this group to be converted.
+board.  These are all simple devices that just need to be created and
 wired up.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Message-id: 20240206132931.38376-12-peter.maydell@linaro.org
 ---
- target/arm/neon-dp.decode       |  2 ++
+ hw/arm/mps3r.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++
- target/arm/translate-neon.inc.c | 43 +++++++++++++++++++++++
+file changed, 59 insertions(+)
  target/arm/translate.c          | 60 ++-------------------------------
 files changed, 48 insertions(+), 57 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+diff --git a/hw/arm/mps3r.c b/hw/arm/mps3r.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
+--- a/hw/arm/mps3r.c
-+++ b/target/arm/neon-dp.decode
++++ b/hw/arm/mps3r.c
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+@@ -XXX,XX +XXX,XX @@
-     VMULL_U_3d   1111 001 1 1 . .. .... .... 1100 . 0 . 0 .... @3diff
+ #include "sysemu/sysemu.h"
+ #include "hw/boards.h"
-     VQDMULL_3d   1111 001 0 1 . .. .... .... 1101 . 0 . 0 .... @3diff
+ #include "hw/or-irq.h"
 +#include "hw/qdev-clock.h"
  #include "hw/qdev-properties.h"
  #include "hw/arm/boot.h"
  #include "hw/arm/bsa.h"
  #include "hw/char/cmsdk-apb-uart.h"
 +#include "hw/i2c/arm_sbcon_i2c.h"
  #include "hw/intc/arm_gicv3.h"
 +#include "hw/misc/unimp.h"
 +#include "hw/timer/cmsdk-apb-dualtimer.h"
 +#include "hw/watchdog/cmsdk-apb-watchdog.h"
  /* Define the layout of RAM and ROM in a board */
  typedef struct RAMInfo {
@@ -XXX,XX +XXX,XX @@ struct MPS3RMachineState {
      CMSDKAPBUART uart[MPS3R_CPU_MAX + MPS3R_UART_MAX];
      OrIRQState cpu_uart_oflow[MPS3R_CPU_MAX];
      OrIRQState uart_oflow;
 +    CMSDKAPBWatchdog watchdog;
 +    CMSDKAPBDualTimer dualtimer;
 +    ArmSbconI2CState i2c[5];
 +    Clock *clk;
  };
  #define TYPE_MPS3R_MACHINE "mps3r"
@@ -XXX,XX +XXX,XX @@ static void mps3r_common_init(MachineState *machine)
      MemoryRegion *sysmem = get_system_memory();
      DeviceState *gicdev;
 +    mms->clk = clock_new(OBJECT(machine), "CLK");
 +    clock_set_hz(mms->clk, CLK_FRQ);
 +
-+    VMULL_P_3d   1111 001 0 1 . .. .... .... 1110 . 0 . 0 .... @3diff
+     for (const RAMInfo *ri = mmc->raminfo; ri->name; ri++) {
-   ]
+         MemoryRegion *mr = mr_for_raminfo(mms, ri);
- }
+         memory_region_add_subregion(sysmem, ri->base, mr);
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
+@@ -XXX,XX +XXX,XX @@ static void mps3r_common_init(MachineState *machine)
-index XXXXXXX..XXXXXXX 100644
+                     qdev_get_gpio_in(gicdev, combirq));
---- a/target/arm/translate-neon.inc.c
+     }
-+++ b/target/arm/translate-neon.inc.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VQDMLSL_3d(DisasContext *s, arg_3diff *a)
++    for (int i = 0; i < 4; i++) {
++        /* CMSDK GPIO controllers */
-     return do_long_3d(s, a, opfn[a->size], accfn[a->size]);
++        g_autofree char *s = g_strdup_printf("gpio%d", i);
- }
++        create_unimplemented_device(s, 0xe0000000 + i * 0x1000, 0x1000);
 +
 +static bool trans_VMULL_P_3d(DisasContext *s, arg_3diff *a)
 +{
 +    gen_helper_gvec_3 *fn_gvec;
 +
 +    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
 +        return false;
 +    }
 +
-+    /* UNDEF accesses to D16-D31 if they don't exist. */
++    object_initialize_child(OBJECT(mms), "watchdog", &mms->watchdog,
-+    if (!dc_isar_feature(aa32_simd_r32, s) &&
++                            TYPE_CMSDK_APB_WATCHDOG);
-+        ((a->vd | a->vn | a->vm) & 0x10)) {
++    qdev_connect_clock_in(DEVICE(&mms->watchdog), "WDOGCLK", mms->clk);
-+        return false;
++    sysbus_realize(SYS_BUS_DEVICE(&mms->watchdog), &error_fatal);
 +    sysbus_connect_irq(SYS_BUS_DEVICE(&mms->watchdog), 0,
 +                       qdev_get_gpio_in(gicdev, 0));
 +    sysbus_mmio_map(SYS_BUS_DEVICE(&mms->watchdog), 0, 0xe0100000);
 +
 +    object_initialize_child(OBJECT(mms), "dualtimer", &mms->dualtimer,
 +                            TYPE_CMSDK_APB_DUALTIMER);
 +    qdev_connect_clock_in(DEVICE(&mms->dualtimer), "TIMCLK", mms->clk);
 +    sysbus_realize(SYS_BUS_DEVICE(&mms->dualtimer), &error_fatal);
 +    sysbus_connect_irq(SYS_BUS_DEVICE(&mms->dualtimer), 0,
 +                       qdev_get_gpio_in(gicdev, 3));
 +    sysbus_connect_irq(SYS_BUS_DEVICE(&mms->dualtimer), 1,
 +                       qdev_get_gpio_in(gicdev, 1));
 +    sysbus_connect_irq(SYS_BUS_DEVICE(&mms->dualtimer), 2,
 +                       qdev_get_gpio_in(gicdev, 2));
 +    sysbus_mmio_map(SYS_BUS_DEVICE(&mms->dualtimer), 0, 0xe0101000);
 +
 +    for (int i = 0; i < ARRAY_SIZE(mms->i2c); i++) {
 +        static const hwaddr i2cbase[] = {0xe0102000,    /* Touch */
 +                                         0xe0103000,    /* Audio */
 +                                         0xe0107000,    /* Shield0 */
 +                                         0xe0108000,    /* Shield1 */
 +                                         0xe0109000};   /* DDR4 EEPROM */
 +        g_autofree char *s = g_strdup_printf("i2c%d", i);
 +
 +        object_initialize_child(OBJECT(mms), s, &mms->i2c[i],
 +                                TYPE_ARM_SBCON_I2C);
 +        sysbus_realize(SYS_BUS_DEVICE(&mms->i2c[i]), &error_fatal);
 +        sysbus_mmio_map(SYS_BUS_DEVICE(&mms->i2c[i]), 0, i2cbase[i]);
 +        if (i != 2 && i != 3) {
 +            /*
 +             * internal-only bus: mark it full to avoid user-created
 +             * i2c devices being plugged into it.
 +             */
 +            qbus_mark_full(qdev_get_child_bus(DEVICE(&mms->i2c[i]), "i2c"));
 +        }
 +    }
 +
-+    if (a->vd & 1) {
+     mms->bootinfo.ram_size = machine->ram_size;
-+        return false;
+     mms->bootinfo.board_id = -1;
-+    }
+     mms->bootinfo.loader_start = mmc->loader_start;
 +
 +    switch (a->size) {
 +    case 0:
 +        fn_gvec = gen_helper_neon_pmull_h;
 +        break;
 +    case 2:
 +        if (!dc_isar_feature(aa32_pmull, s)) {
 +            return false;
 +        }
 +        fn_gvec = gen_helper_gvec_pmull_q;
 +        break;
 +    default:
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    tcg_gen_gvec_3_ool(neon_reg_offset(a->vd, 0),
 +                       neon_reg_offset(a->vn, 0),
 +                       neon_reg_offset(a->vm, 0),
 +                       16, 16, 0, fn_gvec);
 +    return true;
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
  {
      int op;
      int q;
 -    int rd, rn, rm, rd_ofs, rn_ofs, rm_ofs;
 +    int rd, rn, rm, rd_ofs, rm_ofs;
      int size;
      int pass;
      int u;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
      size = (insn >> 20) & 3;
      vec_size = q ? 16 : 8;
      rd_ofs = neon_reg_offset(rd, 0);
 -    rn_ofs = neon_reg_offset(rn, 0);
      rm_ofs = neon_reg_offset(rm, 0);
      if ((insn & (1 << 23)) == 0) {
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
          if (size != 3) {
              op = (insn >> 8) & 0xf;
              if ((insn & (1 << 6)) == 0) {
 -                /* Three registers of different lengths.  */
 -                /* undefreq: bit 0 : UNDEF if size == 0
 -                 *           bit 1 : UNDEF if size == 1
 -                 *           bit 2 : UNDEF if size == 2
 -                 *           bit 3 : UNDEF if U == 1
 -                 * Note that [2:0] set implies 'always UNDEF'
 -                 */
 -                int undefreq;
 -                /* prewiden, src1_wide, src2_wide, undefreq */
 -                static const int neon_3reg_wide[16][4] = {
 -                    {0, 0, 0, 7}, /* VADDL: handled by decodetree */
 -                    {0, 0, 0, 7}, /* VADDW: handled by decodetree */
 -                    {0, 0, 0, 7}, /* VSUBL: handled by decodetree */
 -                    {0, 0, 0, 7}, /* VSUBW: handled by decodetree */
 -                    {0, 0, 0, 7}, /* VADDHN: handled by decodetree */
 -                    {0, 0, 0, 7}, /* VABAL */
 -                    {0, 0, 0, 7}, /* VSUBHN: handled by decodetree */
 -                    {0, 0, 0, 7}, /* VABDL */
 -                    {0, 0, 0, 7}, /* VMLAL */
 -                    {0, 0, 0, 7}, /* VQDMLAL */
 -                    {0, 0, 0, 7}, /* VMLSL */
 -                    {0, 0, 0, 7}, /* VQDMLSL */
 -                    {0, 0, 0, 7}, /* Integer VMULL */
 -                    {0, 0, 0, 7}, /* VQDMULL */
 -                    {0, 0, 0, 0xa}, /* Polynomial VMULL */
 -                    {0, 0, 0, 7}, /* Reserved: always UNDEF */
 -                };
 -
 -                undefreq = neon_3reg_wide[op][3];
 -
 -                if ((undefreq & (1 << size)) ||
 -                    ((undefreq & 8) && u)) {
 -                    return 1;
 -                }
 -                if (rd & 1) {
 -                    return 1;
 -                }
 -
 -                /* Handle polynomial VMULL in a single pass.  */
 -                if (op == 14) {
 -                    if (size == 0) {
 -                        /* VMULL.P8 */
 -                        tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, 16, 16,
 -                                           0, gen_helper_neon_pmull_h);
 -                    } else {
 -                        /* VMULL.P64 */
 -                        if (!dc_isar_feature(aa32_pmull, s)) {
 -                            return 1;
 -                        }
 -                        tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, 16, 16,
 -                                           0, gen_helper_gvec_pmull_q);
 -                    }
 -                    return 0;
 -                }
 -                abort(); /* all others handled by decodetree */
 +                /* Three registers of different lengths: handled by decodetree */
 +                return 1;
              } else {
                  /* Two registers and a scalar. NB that for ops of this form
                   * the ARM ARM labels bit 24 as Q, but it is in our variable
 --
-.20.1
+.34.1
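
The removed neon_3reg_wide[] table encoded its UNDEF conditions in the
"undefreq" mask described in the deleted comment. As a minimal sketch of
how that mask is read (undefreq_is_undef() is a hypothetical helper
written for illustration, not a function in QEMU):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not part of QEMU: evaluate a legacy "undefreq"
 * mask for a given size field and U bit.
 *   bit 0 : UNDEF if size == 0
 *   bit 1 : UNDEF if size == 1
 *   bit 2 : UNDEF if size == 2
 *   bit 3 : UNDEF if U == 1
 */
static bool undefreq_is_undef(unsigned undefreq, unsigned size, bool u)
{
    /* size == 3 belongs to a different insn group and never gets here */
    assert(size < 3);
    return (undefreq & (1u << size)) || ((undefreq & 8u) && u);
}
```

With the polynomial VMULL entry (0xa) this rejects size == 1 and U == 1,
which lines up with trans_VMULL_P_3d() accepting only size 0 and size 2.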

-[PULL 02/23] target/arm: Convert Neon 3-reg-diff prewidening ops to decodetree
+[PULL 34/35] hw/arm/mps3r: Add remaining devices
-Convert the "pre-widening" insns VADDL, VSUBL, VADDW and VSUBW
+Add the remaining devices (or unimplemented-device stubs) for
-in the Neon 3-registers-different-lengths group to decodetree.
+this board: SPI controllers, SCC, FPGAIO, I2S, RTC, the
-These insns work by widening one or both inputs to double their
+QSPI write-config block, and ethernet.
 size, performing an add or subtract at the doubled size and
 then storing the double-size result.
 As usual, rather than copying the loop of the original decoder
 (which needs awkward code to avoid problems when source and
 destination registers overlap) we just unroll the two passes.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Message-id: 20240206132931.38376-13-peter.maydell@linaro.org
 ---
- target/arm/neon-dp.decode       |  43 +++++++++++++
+ hw/arm/mps3r.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++
- target/arm/translate-neon.inc.c | 104 ++++++++++++++++++++++++++++++++
+file changed, 74 insertions(+)
  target/arm/translate.c          |  16 ++---
 files changed, 151 insertions(+), 12 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+diff --git a/hw/arm/mps3r.c b/hw/arm/mps3r.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
+--- a/hw/arm/mps3r.c
-+++ b/target/arm/neon-dp.decode
++++ b/hw/arm/mps3r.c
-@@ -XXX,XX +XXX,XX @@ VCVT_FU_2sh      1111 001 1 1 . ...... .... 1111 0 . . 1 .... @2reg_vcvt
+@@ -XXX,XX +XXX,XX @@
- # So we have a single decode line and check the cmode/op in the
+ #include "hw/char/cmsdk-apb-uart.h"
- # trans function.
+ #include "hw/i2c/arm_sbcon_i2c.h"
- Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+ #include "hw/intc/arm_gicv3.h"
 +#include "hw/misc/mps2-scc.h"
 +#include "hw/misc/mps2-fpgaio.h"
  #include "hw/misc/unimp.h"
 +#include "hw/net/lan9118.h"
 +#include "hw/rtc/pl031.h"
 +#include "hw/ssi/pl022.h"
  #include "hw/timer/cmsdk-apb-dualtimer.h"
  #include "hw/watchdog/cmsdk-apb-watchdog.h"
@@ -XXX,XX +XXX,XX @@ struct MPS3RMachineState {
      CMSDKAPBWatchdog watchdog;
      CMSDKAPBDualTimer dualtimer;
      ArmSbconI2CState i2c[5];
 +    PL022State spi[3];
 +    MPS2SCC scc;
 +    MPS2FPGAIO fpgaio;
 +    UnimplementedDeviceState i2s_audio;
 +    PL031State rtc;
      Clock *clk;
  };
@@ -XXX,XX +XXX,XX @@ static const RAMInfo an536_raminfo[] = {
      }
  };
 +static const int an536_oscclk[] = {
 +    24000000, /* 24MHz reference for RTC and timers */
 +    50000000, /* 50MHz ACLK */
 +    50000000, /* 50MHz MCLK */
 +    50000000, /* 50MHz GPUCLK */
 +    24576000, /* 24.576MHz AUDCLK */
 +    23750000, /* 23.75MHz HDLCDCLK */
 +    100000000, /* 100MHz DDR4_REF_CLK */
 +};
 +
-+######################################################################
+ static MemoryRegion *mr_for_raminfo(MPS3RMachineState *mms,
-+# Within the "two registers, or three registers of different lengths"
+                                     const RAMInfo *raminfo)
-+# grouping ([23,4]=0b10), bits [21:20] are either part of the opcode
+ {
-+# decode: 0b11 for VEXT, two-reg-misc, VTBL, and duplicate-scalar;
+@@ -XXX,XX +XXX,XX @@ static void mps3r_common_init(MachineState *machine)
-+# or they are a size field for the three-reg-different-lengths and
+     MPS3RMachineClass *mmc = MPS3R_MACHINE_GET_CLASS(mms);
-+# two-reg-and-scalar insn groups (where size cannot be 0b11). This
+     MemoryRegion *sysmem = get_system_memory();
-+# is slightly awkward for decodetree: we handle it with this
+     DeviceState *gicdev;
-+# non-exclusive group which contains within it two exclusive groups:
++    QList *oscclk;
-+# one for the size=0b11 patterns, and one for the size-not-0b11
-+# patterns. This allows us to check that none of the insns within
+     mms->clk = clock_new(OBJECT(machine), "CLK");
-+# each subgroup accidentally overlap each other. Note that all the
+     clock_set_hz(mms->clk, CLK_FRQ);
-+# trans functions for the size-not-0b11 patterns must check and
+@@ -XXX,XX +XXX,XX @@ static void mps3r_common_init(MachineState *machine)
-+# return false for size==3.
+         }
-+######################################################################
+     }
-+{
-+  # 0b11 subgroup will go here
++    for (int i = 0; i < ARRAY_SIZE(mms->spi); i++) {
 +        g_autofree char *s = g_strdup_printf("spi%d", i);
 +        hwaddr baseaddr = 0xe0104000 + i * 0x1000;
 +
-+  # Subgroup for size != 0b11
++        object_initialize_child(OBJECT(mms), s, &mms->spi[i], TYPE_PL022);
-+  [
++        sysbus_realize(SYS_BUS_DEVICE(&mms->spi[i]), &error_fatal);
-+    ##################################################################
++        sysbus_mmio_map(SYS_BUS_DEVICE(&mms->spi[i]), 0, baseaddr);
-+    # 3-reg-different-length grouping:
++        sysbus_connect_irq(SYS_BUS_DEVICE(&mms->spi[i]), 0,
-+    # 1111 001 U 1 D sz!=11 Vn:4 Vd:4 opc:4 N 0 M 0 Vm:4
++                           qdev_get_gpio_in(gicdev, 22 + i));
 +    ##################################################################
 +
 +    &3diff vm vn vd size
 +
 +    @3diff       .... ... . . . size:2 .... .... .... . . . . .... \
 +                 &3diff vm=%vm_dp vn=%vn_dp vd=%vd_dp
 +
 +    VADDL_S_3d   1111 001 0 1 . .. .... .... 0000 . 0 . 0 .... @3diff
 +    VADDL_U_3d   1111 001 1 1 . .. .... .... 0000 . 0 . 0 .... @3diff
 +
 +    VADDW_S_3d   1111 001 0 1 . .. .... .... 0001 . 0 . 0 .... @3diff
 +    VADDW_U_3d   1111 001 1 1 . .. .... .... 0001 . 0 . 0 .... @3diff
 +
 +    VSUBL_S_3d   1111 001 0 1 . .. .... .... 0010 . 0 . 0 .... @3diff
 +    VSUBL_U_3d   1111 001 1 1 . .. .... .... 0010 . 0 . 0 .... @3diff
 +
 +    VSUBW_S_3d   1111 001 0 1 . .. .... .... 0011 . 0 . 0 .... @3diff
 +    VSUBW_U_3d   1111 001 1 1 . .. .... .... 0011 . 0 . 0 .... @3diff
 +  ]
 +}
 diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.inc.c
 +++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_Vimm_1r(DisasContext *s, arg_1reg_imm *a)
      }
      return do_1reg_imm(s, a, fn);
  }
 +
 +static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
 +                           NeonGenWidenFn *widenfn,
 +                           NeonGenTwo64OpFn *opfn,
 +                           bool src1_wide)
 +{
 +    /* 3-regs different lengths, prewidening case (VADDL/VSUBL/VADDW/VSUBW) */
 +    TCGv_i64 rn0_64, rn1_64, rm_64;
 +    TCGv_i32 rm;
 +
 +    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
 +        return false;
 +    }
 +
-+    /* UNDEF accesses to D16-D31 if they don't exist. */
++    object_initialize_child(OBJECT(mms), "scc", &mms->scc, TYPE_MPS2_SCC);
-+    if (!dc_isar_feature(aa32_simd_r32, s) &&
++    qdev_prop_set_uint32(DEVICE(&mms->scc), "scc-cfg0", 0);
-+        ((a->vd | a->vn | a->vm) & 0x10)) {
++    qdev_prop_set_uint32(DEVICE(&mms->scc), "scc-cfg4", 0x2);
-+        return false;
++    qdev_prop_set_uint32(DEVICE(&mms->scc), "scc-aid", 0x00200008);
 +    qdev_prop_set_uint32(DEVICE(&mms->scc), "scc-id", 0x41055360);
 +    oscclk = qlist_new();
 +    for (int i = 0; i < ARRAY_SIZE(an536_oscclk); i++) {
 +        qlist_append_int(oscclk, an536_oscclk[i]);
 +    }
++    qdev_prop_set_array(DEVICE(&mms->scc), "oscclk", oscclk);
++    sysbus_realize(SYS_BUS_DEVICE(&mms->scc), &error_fatal);
++    sysbus_mmio_map(SYS_BUS_DEVICE(&mms->scc), 0, 0xe0200000);
 +
-+    if (!widenfn || !opfn) {
++    create_unimplemented_device("i2s-audio", 0xe0201000, 0x1000);
 +        /* size == 3 case, which is an entirely different insn group */
 +        return false;
 +    }
 +
-+    if ((a->vd & 1) || (src1_wide && (a->vn & 1))) {
++    object_initialize_child(OBJECT(mms), "fpgaio", &mms->fpgaio,
-+        return false;
++                            TYPE_MPS2_FPGAIO);
-+    }
++    qdev_prop_set_uint32(DEVICE(&mms->fpgaio), "prescale-clk", an536_oscclk[1]);
 +    qdev_prop_set_uint32(DEVICE(&mms->fpgaio), "num-leds", 10);
 +    qdev_prop_set_bit(DEVICE(&mms->fpgaio), "has-switches", true);
 +    qdev_prop_set_bit(DEVICE(&mms->fpgaio), "has-dbgctrl", false);
 +    sysbus_realize(SYS_BUS_DEVICE(&mms->fpgaio), &error_fatal);
 +    sysbus_mmio_map(SYS_BUS_DEVICE(&mms->fpgaio), 0, 0xe0202000);
 +
-+    if (!vfp_access_check(s)) {
++    create_unimplemented_device("clcd", 0xe0209000, 0x1000);
 +        return true;
 +    }
 +
-+    rn0_64 = tcg_temp_new_i64();
++    object_initialize_child(OBJECT(mms), "rtc", &mms->rtc, TYPE_PL031);
-+    rn1_64 = tcg_temp_new_i64();
++    sysbus_realize(SYS_BUS_DEVICE(&mms->rtc), &error_fatal);
-+    rm_64 = tcg_temp_new_i64();
++    sysbus_mmio_map(SYS_BUS_DEVICE(&mms->rtc), 0, 0xe020a000);
-+
++    sysbus_connect_irq(SYS_BUS_DEVICE(&mms->rtc), 0,
-+    if (src1_wide) {
++                       qdev_get_gpio_in(gicdev, 4));
 +        neon_load_reg64(rn0_64, a->vn);
 +    } else {
 +        TCGv_i32 tmp = neon_load_reg(a->vn, 0);
 +        widenfn(rn0_64, tmp);
 +        tcg_temp_free_i32(tmp);
 +    }
 +    rm = neon_load_reg(a->vm, 0);
 +
 +    widenfn(rm_64, rm);
 +    tcg_temp_free_i32(rm);
 +    opfn(rn0_64, rn0_64, rm_64);
 +
 +    /*
-+     * Load second pass inputs before storing the first pass result, to
++     * In hardware this is a LAN9220; the LAN9118 is software compatible
-+     * avoid incorrect results if a narrow input overlaps with the result.
++     * except that it doesn't support the checksum-offload feature.
 +     */
-+    if (src1_wide) {
++    lan9118_init(0xe0300000,
-+        neon_load_reg64(rn1_64, a->vn + 1);
++                 qdev_get_gpio_in(gicdev, 18));
 +    } else {
 +        TCGv_i32 tmp = neon_load_reg(a->vn, 1);
 +        widenfn(rn1_64, tmp);
 +        tcg_temp_free_i32(tmp);
 +    }
 +    rm = neon_load_reg(a->vm, 1);
 +
-+    neon_store_reg64(rn0_64, a->vd);
++    create_unimplemented_device("usb", 0xe0301000, 0x1000);
 +    create_unimplemented_device("qspi-write-config", 0xe0600000, 0x1000);
 +
-+    widenfn(rm_64, rm);
+     mms->bootinfo.ram_size = machine->ram_size;
-+    tcg_temp_free_i32(rm);
+     mms->bootinfo.board_id = -1;
-+    opfn(rn1_64, rn1_64, rm_64);
+     mms->bootinfo.loader_start = mmc->loader_start;
 +    neon_store_reg64(rn1_64, a->vd + 1);
 +
 +    tcg_temp_free_i64(rn0_64);
 +    tcg_temp_free_i64(rn1_64);
 +    tcg_temp_free_i64(rm_64);
 +
 +    return true;
 +}
 +
 +#define DO_PREWIDEN(INSN, S, EXT, OP, SRC1WIDE)                         \
 +    static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a)        \
 +    {                                                                   \
 +        static NeonGenWidenFn * const widenfn[] = {                     \
 +            gen_helper_neon_widen_##S##8,                               \
 +            gen_helper_neon_widen_##S##16,                              \
 +            tcg_gen_##EXT##_i32_i64,                                    \
 +            NULL,                                                       \
 +        };                                                              \
 +        static NeonGenTwo64OpFn * const addfn[] = {                     \
 +            gen_helper_neon_##OP##l_u16,                                \
 +            gen_helper_neon_##OP##l_u32,                                \
 +            tcg_gen_##OP##_i64,                                         \
 +            NULL,                                                       \
 +        };                                                              \
 +        return do_prewiden_3d(s, a, widenfn[a->size],                   \
 +                              addfn[a->size], SRC1WIDE);                \
 +    }
 +
 +DO_PREWIDEN(VADDL_S, s, ext, add, false)
 +DO_PREWIDEN(VADDL_U, u, extu, add, false)
 +DO_PREWIDEN(VSUBL_S, s, ext, sub, false)
 +DO_PREWIDEN(VSUBL_U, u, extu, sub, false)
 +DO_PREWIDEN(VADDW_S, s, ext, add, true)
 +DO_PREWIDEN(VADDW_U, u, extu, add, true)
 +DO_PREWIDEN(VSUBW_S, s, ext, sub, true)
 +DO_PREWIDEN(VSUBW_U, u, extu, sub, true)
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                  /* Three registers of different lengths.  */
                  int src1_wide;
                  int src2_wide;
 -                int prewiden;
                  /* undefreq: bit 0 : UNDEF if size == 0
                   *           bit 1 : UNDEF if size == 1
                   *           bit 2 : UNDEF if size == 2
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                  int undefreq;
                  /* prewiden, src1_wide, src2_wide, undefreq */
                  static const int neon_3reg_wide[16][4] = {
 -                    {1, 0, 0, 0}, /* VADDL */
 -                    {1, 1, 0, 0}, /* VADDW */
 -                    {1, 0, 0, 0}, /* VSUBL */
 -                    {1, 1, 0, 0}, /* VSUBW */
 +                    {0, 0, 0, 7}, /* VADDL: handled by decodetree */
 +                    {0, 0, 0, 7}, /* VADDW: handled by decodetree */
 +                    {0, 0, 0, 7}, /* VSUBL: handled by decodetree */
 +                    {0, 0, 0, 7}, /* VSUBW: handled by decodetree */
                      {0, 1, 1, 0}, /* VADDHN */
                      {0, 0, 0, 0}, /* VABAL */
                      {0, 1, 1, 0}, /* VSUBHN */
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                      {0, 0, 0, 7}, /* Reserved: always UNDEF */
                  };
 -                prewiden = neon_3reg_wide[op][0];
                  src1_wide = neon_3reg_wide[op][1];
                  src2_wide = neon_3reg_wide[op][2];
                  undefreq = neon_3reg_wide[op][3];
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                          } else {
                              tmp = neon_load_reg(rn, pass);
                          }
 -                        if (prewiden) {
 -                            gen_neon_widen(cpu_V0, tmp, size, u);
 -                        }
                      }
                      if (src2_wide) {
                          neon_load_reg64(cpu_V1, rm + pass);
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                          } else {
                              tmp2 = neon_load_reg(rm, pass);
                          }
 -                        if (prewiden) {
 -                            gen_neon_widen(cpu_V1, tmp2, size, u);
 -                        }
                      }
                      switch (op) {
                      case 0: case 1: case 4: /* VADDL, VADDW, VADDHN, VRADDHN */
 --
-.20.1
+.34.1
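
The prewidening commit message describes widening the inputs, operating
at the doubled size, and storing the double-size result. A plain C model
of that behaviour for the signed 8-bit add case might look like the
sketch below; vaddl_s8() and its array-based register model are
assumptions made for illustration and do not mirror the TCG code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical reference model, not QEMU code: signed 8-bit
 * widening add. Each of the 8 narrow lanes of n and m is widened
 * to 16 bits before the add, so 127 + 127 does not wrap.
 */
static void vaddl_s8(int16_t dst[8], const int8_t n[8], const int8_t m[8])
{
    int16_t tmp[8];

    assert(dst && n && m);
    for (int i = 0; i < 8; i++) {
        tmp[i] = (int16_t)((int16_t)n[i] + (int16_t)m[i]);
    }
    /* All inputs are read before any part of the result is stored. */
    memcpy(dst, tmp, sizeof(tmp));
}
```

Reading every input lane before writing the result is the same ordering
concern the patch handles by loading the second-pass inputs before
storing the first-pass result.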

-[PULL 03/23] target/arm: Convert Neon 3-reg-diff narrowing ops to decodetree
+[PULL 35/35] docs: Add documentation for the mps3-an536 board
-Convert the narrow-to-high-half insns VADDHN, VSUBHN, VRADDHN,
+Add documentation for the mps3-an536 board type.
 VRSUBHN in the Neon 3-registers-different-lengths group to
 decodetree.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Message-id: 20240206132931.38376-14-peter.maydell@linaro.org
 ---
- target/arm/neon-dp.decode       |  6 +++
+ docs/system/arm/mps2.rst | 37 ++++++++++++++++++++++++++++++++++---
- target/arm/translate-neon.inc.c | 87 +++++++++++++++++++++++++++++++
+file changed, 34 insertions(+), 3 deletions(-)
  target/arm/translate.c          | 91 ++++-----------------------------
 files changed, 104 insertions(+), 80 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+diff --git a/docs/system/arm/mps2.rst b/docs/system/arm/mps2.rst
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
+--- a/docs/system/arm/mps2.rst
-+++ b/target/arm/neon-dp.decode
++++ b/docs/system/arm/mps2.rst
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+@@ -XXX,XX +XXX,XX @@
+-Arm MPS2 and MPS3 boards (``mps2-an385``, ``mps2-an386``, ``mps2-an500``, ``mps2-an505``, ``mps2-an511``, ``mps2-an521``, ``mps3-an524``, ``mps3-an547``)
-     VSUBW_S_3d   1111 001 0 1 . .. .... .... 0011 . 0 . 0 .... @3diff
+-=========================================================================================================================================================
-     VSUBW_U_3d   1111 001 1 1 . .. .... .... 0011 . 0 . 0 .... @3diff
++Arm MPS2 and MPS3 boards (``mps2-an385``, ``mps2-an386``, ``mps2-an500``, ``mps2-an505``, ``mps2-an511``, ``mps2-an521``, ``mps3-an524``, ``mps3-an536``, ``mps3-an547``)
 +=========================================================================================================================================================================
 -These board models all use Arm M-profile CPUs.
 +These board models use Arm M-profile or R-profile CPUs.
  The Arm MPS2, MPS2+ and MPS3 dev boards are FPGA based (the 2+ has a
  bigger FPGA but is otherwise the same as the 2; the 3 has a bigger
@@ -XXX,XX +XXX,XX @@ FPGA image.
  QEMU models the following FPGA images:
 +FPGA images using M-profile CPUs:
 +
-+    VADDHN_3d    1111 001 0 1 . .. .... .... 0100 . 0 . 0 .... @3diff
+ ``mps2-an385``
-+    VRADDHN_3d   1111 001 1 1 . .. .... .... 0100 . 0 . 0 .... @3diff
+   Cortex-M3 as documented in Arm Application Note AN385
  ``mps2-an386``
@@ -XXX,XX +XXX,XX @@ QEMU models the following FPGA images:
  ``mps3-an547``
    Cortex-M55 on an MPS3, as documented in Arm Application Note AN547
 +FPGA images using R-profile CPUs:
 +
-+    VSUBHN_3d    1111 001 0 1 . .. .... .... 0110 . 0 . 0 .... @3diff
++``mps3-an536``
-+    VRSUBHN_3d   1111 001 1 1 . .. .... .... 0110 . 0 . 0 .... @3diff
++  Dual Cortex-R52 on an MPS3, as documented in Arm Application Note AN536
    ]
  }
 diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.inc.c
 +++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ DO_PREWIDEN(VADDW_S, s, ext, add, true)
  DO_PREWIDEN(VADDW_U, u, extu, add, true)
  DO_PREWIDEN(VSUBW_S, s, ext, sub, true)
  DO_PREWIDEN(VSUBW_U, u, extu, sub, true)
 +
-+static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
+ Differences between QEMU and real hardware:
-+                         NeonGenTwo64OpFn *opfn, NeonGenNarrowFn *narrowfn)
-+{
+ - AN385/AN386 remapping of low 16K of memory to either ZBT SSRAM1 or to
-+    /* 3-regs different lengths, narrowing (VADDHN/VSUBHN/VRADDHN/VRSUBHN) */
+@@ -XXX,XX +XXX,XX @@ Differences between QEMU and real hardware:
-+    TCGv_i64 rn_64, rm_64;
+   flash, but only as simple ROM, so attempting to rewrite the flash
-+    TCGv_i32 rd0, rd1;
+   from the guest will fail
  - QEMU does not model the USB controller in MPS3 boards
 +- AN536 does not support runtime control of CPU reset and halt via
 +  the SCC CFG_REG0 register.
 +- AN536 does not support enabling or disabling the flash and ATCM
 +  interfaces via the SCC CFG_REG1 register.
 +- AN536 does not support setting of the initial vector table
 +  base address via the SCC CFG_REG6 and CFG_REG7 register config,
 +  and does not provide a mechanism for specifying these values at
 +  startup, so all guest images must be built to start from TCM
 +  (i.e. to expect the interrupt vector base at 0 from reset).
 +- AN536 defaults to only creating a single CPU; this is the equivalent
 +  of the way the real FPGA image usually runs with the second Cortex-R52
 +  held in halt via the initial SCC CFG_REG0 register setting. You can
 +  create the second CPU with ``-smp 2``; both CPUs will then start
 +  execution immediately on startup.
 +
-+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
++Note that for the AN536 the first UART is accessible only by
-+        return false;
++CPU0, and the second UART is accessible only by CPU1. The
-+    }
++first UART shared between both CPUs is the third
-+
++UART. Guest software might therefore be built to use either
-+    /* UNDEF accesses to D16-D31 if they don't exist. */
++the first UART or the third UART; if you don't see any output
-+    if (!dc_isar_feature(aa32_simd_r32, s) &&
++from the UART you are looking at, try one of the others.
-+        ((a->vd | a->vn | a->vm) & 0x10)) {
++(Even if the AN536 machine is started with a single CPU and so
-+        return false;
++no "CPU1-only UART", the UART numbering remains the same,
-+    }
++with the third UART being the first of the shared ones.)
-+
-+    if (!opfn || !narrowfn) {
+ Machine-specific options
-+        /* size == 3 case, which is an entirely different insn group */
+ """"""""""""""""""""""""
 +        return false;
 +    }
 +
 +    if ((a->vn | a->vm) & 1) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    rn_64 = tcg_temp_new_i64();
 +    rm_64 = tcg_temp_new_i64();
 +    rd0 = tcg_temp_new_i32();
 +    rd1 = tcg_temp_new_i32();
 +
 +    neon_load_reg64(rn_64, a->vn);
 +    neon_load_reg64(rm_64, a->vm);
 +
 +    opfn(rn_64, rn_64, rm_64);
 +
 +    narrowfn(rd0, rn_64);
 +
 +    neon_load_reg64(rn_64, a->vn + 1);
 +    neon_load_reg64(rm_64, a->vm + 1);
 +
 +    opfn(rn_64, rn_64, rm_64);
 +
 +    narrowfn(rd1, rn_64);
 +
 +    neon_store_reg(a->vd, 0, rd0);
 +    neon_store_reg(a->vd, 1, rd1);
 +
 +    tcg_temp_free_i64(rn_64);
 +    tcg_temp_free_i64(rm_64);
 +
 +    return true;
 +}
 +
 +#define DO_NARROW_3D(INSN, OP, NARROWTYPE, EXTOP)                       \
 +    static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a)        \
 +    {                                                                   \
 +        static NeonGenTwo64OpFn * const addfn[] = {                     \
 +            gen_helper_neon_##OP##l_u16,                                \
 +            gen_helper_neon_##OP##l_u32,                                \
 +            tcg_gen_##OP##_i64,                                         \
 +            NULL,                                                       \
 +        };                                                              \
 +        static NeonGenNarrowFn * const narrowfn[] = {                   \
 +            gen_helper_neon_##NARROWTYPE##_high_u8,                     \
 +            gen_helper_neon_##NARROWTYPE##_high_u16,                    \
 +            EXTOP,                                                      \
 +            NULL,                                                       \
 +        };                                                              \
 +        return do_narrow_3d(s, a, addfn[a->size], narrowfn[a->size]);   \
 +    }
 +
 +static void gen_narrow_round_high_u32(TCGv_i32 rd, TCGv_i64 rn)
 +{
 +    tcg_gen_addi_i64(rn, rn, 1u << 31);
 +    tcg_gen_extrh_i64_i32(rd, rn);
 +}
 +
 +DO_NARROW_3D(VADDHN, add, narrow, tcg_gen_extrh_i64_i32)
 +DO_NARROW_3D(VSUBHN, sub, narrow, tcg_gen_extrh_i64_i32)
 +DO_NARROW_3D(VRADDHN, add, narrow_round, gen_narrow_round_high_u32)
 +DO_NARROW_3D(VRSUBHN, sub, narrow_round, gen_narrow_round_high_u32)
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void gen_neon_addl(int size)
      }
  }
 -static inline void gen_neon_subl(int size)
 -{
 -    switch (size) {
 -    case 0: gen_helper_neon_subl_u16(CPU_V001); break;
 -    case 1: gen_helper_neon_subl_u32(CPU_V001); break;
 -    case 2: tcg_gen_sub_i64(CPU_V001); break;
 -    default: abort();
 -    }
 -}
 -
  static inline void gen_neon_negl(TCGv_i64 var, int size)
  {
      switch (size) {
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
              op = (insn >> 8) & 0xf;
              if ((insn & (1 << 6)) == 0) {
                  /* Three registers of different lengths.  */
 -                int src1_wide;
 -                int src2_wide;
                  /* undefreq: bit 0 : UNDEF if size == 0
                   *           bit 1 : UNDEF if size == 1
                   *           bit 2 : UNDEF if size == 2
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                      {0, 0, 0, 7}, /* VADDW: handled by decodetree */
                      {0, 0, 0, 7}, /* VSUBL: handled by decodetree */
                      {0, 0, 0, 7}, /* VSUBW: handled by decodetree */
 -                    {0, 1, 1, 0}, /* VADDHN */
 +                    {0, 0, 0, 7}, /* VADDHN: handled by decodetree */
                      {0, 0, 0, 0}, /* VABAL */
 -                    {0, 1, 1, 0}, /* VSUBHN */
 +                    {0, 0, 0, 7}, /* VSUBHN: handled by decodetree */
                      {0, 0, 0, 0}, /* VABDL */
                      {0, 0, 0, 0}, /* VMLAL */
                      {0, 0, 0, 9}, /* VQDMLAL */
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                      {0, 0, 0, 7}, /* Reserved: always UNDEF */
                  };
 -                src1_wide = neon_3reg_wide[op][1];
 -                src2_wide = neon_3reg_wide[op][2];
                  undefreq = neon_3reg_wide[op][3];
                  if ((undefreq & (1 << size)) ||
                      ((undefreq & 8) && u)) {
                      return 1;
                  }
 -                if ((src1_wide && (rn & 1)) ||
 -                    (src2_wide && (rm & 1)) ||
 -                    (!src2_wide && (rd & 1))) {
 +                if (rd & 1) {
                      return 1;
                  }
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                  /* Avoid overlapping operands.  Wide source operands are
                     always aligned so will never overlap with wide
                     destinations in problematic ways.  */
 -                if (rd == rm && !src2_wide) {
 +                if (rd == rm) {
                      tmp = neon_load_reg(rm, 1);
                      neon_store_scratch(2, tmp);
 -                } else if (rd == rn && !src1_wide) {
 +                } else if (rd == rn) {
                      tmp = neon_load_reg(rn, 1);
                      neon_store_scratch(2, tmp);
                  }
                  tmp3 = NULL;
                  for (pass = 0; pass < 2; pass++) {
 -                    if (src1_wide) {
 -                        neon_load_reg64(cpu_V0, rn + pass);
 -                        tmp = NULL;
 +                    if (pass == 1 && rd == rn) {
 +                        tmp = neon_load_scratch(2);
                      } else {
 -                        if (pass == 1 && rd == rn) {
 -                            tmp = neon_load_scratch(2);
 -                        } else {
 -                            tmp = neon_load_reg(rn, pass);
 -                        }
 +                        tmp = neon_load_reg(rn, pass);
                      }
 -                    if (src2_wide) {
 -                        neon_load_reg64(cpu_V1, rm + pass);
 -                        tmp2 = NULL;
 +                    if (pass == 1 && rd == rm) {
 +                        tmp2 = neon_load_scratch(2);
                      } else {
 -                        if (pass == 1 && rd == rm) {
 -                            tmp2 = neon_load_scratch(2);
 -                        } else {
 -                            tmp2 = neon_load_reg(rm, pass);
 -                        }
 +                        tmp2 = neon_load_reg(rm, pass);
                      }
                      switch (op) {
 -                    case 0: case 1: case 4: /* VADDL, VADDW, VADDHN, VRADDHN */
 -                        gen_neon_addl(size);
 -                        break;
 -                    case 2: case 3: case 6: /* VSUBL, VSUBW, VSUBHN, VRSUBHN */
 -                        gen_neon_subl(size);
 -                        break;
                      case 5: case 7: /* VABAL, VABDL */
                          switch ((size << 1) | u) {
                          case 0:
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                              abort();
                          }
                          neon_store_reg64(cpu_V0, rd + pass);
 -                    } else if (op == 4 || op == 6) {
 -                        /* Narrowing operation.  */
 -                        tmp = tcg_temp_new_i32();
 -                        if (!u) {
 -                            switch (size) {
 -                            case 0:
 -                                gen_helper_neon_narrow_high_u8(tmp, cpu_V0);
 -                                break;
 -                            case 1:
 -                                gen_helper_neon_narrow_high_u16(tmp, cpu_V0);
 -                                break;
 -                            case 2:
 -                                tcg_gen_extrh_i64_i32(tmp, cpu_V0);
 -                                break;
 -                            default: abort();
 -                            }
 -                        } else {
 -                            switch (size) {
 -                            case 0:
 -                                gen_helper_neon_narrow_round_high_u8(tmp, cpu_V0);
 -                                break;
 -                            case 1:
 -                                gen_helper_neon_narrow_round_high_u16(tmp, cpu_V0);
 -                                break;
 -                            case 2:
 -                                tcg_gen_addi_i64(cpu_V0, cpu_V0, 1u << 31);
 -                                tcg_gen_extrh_i64_i32(tmp, cpu_V0);
 -                                break;
 -                            default: abort();
 -                            }
 -                        }
 -                        if (pass == 0) {
 -                            tmp3 = tmp;
 -                        } else {
 -                            neon_store_reg(rd, 0, tmp3);
 -                            neon_store_reg(rd, 1, tmp);
 -                        }
                      } else {
                          /* Write back the result.  */
                          neon_store_reg64(cpu_V0, rd + pass);
 --
-.20.1
+.34.1

Mostly my decodetree stuff, but also some patches for various
smaller bugs/features from others.

thanks
-- PMM

The following changes since commit 53550e81e2cafe7c03a39526b95cd21b5194d9b1:

Merge remote-tracking branch 'remotes/berrange/tags/qcrypto-next-pull-request' into staging (2020-06-15 16:36:34 +0100)

are available in the Git repository at:

https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20200616

for you to fetch changes up to 64b397417a26509bcdff44ab94356a35c7901c79:

hw: arm: Set vendor property for IMX SDHCI emulations (2020-06-16 10:32:29 +0100)

----------------------------------------------------------------
 * hw: arm: Set vendor property for IMX SDHCI emulations
 * sd: sdhci: Implement basic vendor specific register support
 * hw/net/imx_fec: Convert debug fprintf() to trace events
 * target/arm/cpu: adjust virtual time for all KVM arm cpus
 * Implement configurable descriptor size in ftgmac100
 * hw/misc/imx6ul_ccm: Implement non writable bits in CCM registers
 * target/arm: More Neon decodetree conversion work

----------------------------------------------------------------
Erik Smit (1):
      Implement configurable descriptor size in ftgmac100

Guenter Roeck (2):
      sd: sdhci: Implement basic vendor specific register support
      hw: arm: Set vendor property for IMX SDHCI emulations

Jean-Christophe Dubois (2):
      hw/misc/imx6ul_ccm: Implement non writable bits in CCM registers
      hw/net/imx_fec: Convert debug fprintf() to trace events

Peter Maydell (17):
      target/arm: Fix missing temp frees in do_vshll_2sh
      target/arm: Convert Neon 3-reg-diff prewidening ops to decodetree
      target/arm: Convert Neon 3-reg-diff narrowing ops to decodetree
      target/arm: Convert Neon 3-reg-diff VABAL, VABDL to decodetree
      target/arm: Convert Neon 3-reg-diff long multiplies
      target/arm: Convert Neon 3-reg-diff saturating doubling multiplies
      target/arm: Convert Neon 3-reg-diff polynomial VMULL
      target/arm: Add 'static' and 'const' annotations to VSHLL function arrays
      target/arm: Add missing TCG temp free in do_2shift_env_64()
      target/arm: Convert Neon 2-reg-scalar integer multiplies to decodetree
      target/arm: Convert Neon 2-reg-scalar float multiplies to decodetree
      target/arm: Convert Neon 2-reg-scalar VQDMULH, VQRDMULH to decodetree
      target/arm: Convert Neon 2-reg-scalar VQRDMLAH, VQRDMLSH to decodetree
      target/arm: Convert Neon 2-reg-scalar long multiplies to decodetree
      target/arm: Convert Neon VEXT to decodetree
      target/arm: Convert Neon VTBL, VTBX to decodetree
      target/arm: Convert Neon VDUP (scalar) to decodetree

fangying (1):
      target/arm/cpu: adjust virtual time for all KVM arm cpus

The widenfn() in do_vshll_2sh() does not free the input 32-bit
TCGv, so we need to do this in the calling code.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
---
 target/arm/translate-neon.inc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
     tmp = tcg_temp_new_i64();
 
     widenfn(tmp, rm0);
+    tcg_temp_free_i32(rm0);
     if (a->shift != 0) {
         tcg_gen_shli_i64(tmp, tmp, a->shift);
         tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
     neon_store_reg64(tmp, a->vd);
 
     widenfn(tmp, rm1);
+    tcg_temp_free_i32(rm1);
     if (a->shift != 0) {
         tcg_gen_shli_i64(tmp, tmp, a->shift);
         tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
-- 
2.20.1

Convert the "pre-widening" insns VADDL, VSUBL, VADDW and VSUBW
in the Neon 3-registers-different-lengths group to decodetree.
These insns work by widening one or both inputs to double their
size, performing an add or subtract at the doubled size and
then storing the double-size result.

As usual, rather than copying the loop of the original decoder
(which needs awkward code to avoid problems when source and
destination registers overlap) we just unroll the two passes.
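
For readers unfamiliar with these insns, the arithmetic described above can
be modelled in plain C. This is only an illustrative sketch for the unsigned
8-bit element size, not code from this patch: the ref_* names are
hypothetical, and the model assumes the output array does not alias either
input.

```c
#include <stdint.h>

/*
 * Hypothetical reference model of VADDL.U8 (not QEMU code): both
 * narrow inputs are widened to 16 bits and the add is done at the
 * doubled size, so the result of two 8-bit values always fits.
 */
static void ref_vaddl_u8(uint16_t out[8], const uint8_t n[8],
                         const uint8_t m[8])
{
    for (int i = 0; i < 8; i++) {
        out[i] = (uint16_t)(n[i] + m[i]);
    }
}

/*
 * Hypothetical reference model of VADDW.U8: the first input is
 * already double-size, so only the second input is widened. The
 * add wraps modulo 2^16, like any other 16-bit lane operation.
 */
static void ref_vaddw_u8(uint16_t out[8], const uint16_t n[8],
                         const uint8_t m[8])
{
    for (int i = 0; i < 8; i++) {
        out[i] = (uint16_t)(n[i] + m[i]);
    }
}
```

VSUBL and VSUBW follow the same shape with a subtract in place of the add;
the signed variants sign-extend the narrow inputs instead of zero-extending.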

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  43 +++++++++++++
 target/arm/translate-neon.inc.c | 104 ++++++++++++++++++++++++++++++++
 target/arm/translate.c          |  16 ++---
 3 files changed, 151 insertions(+), 12 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ VCVT_FU_2sh      1111 001 1 1 . ...... .... 1111 0 . . 1 .... @2reg_vcvt
 # So we have a single decode line and check the cmode/op in the
 # trans function.
 Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+
+######################################################################
+# Within the "two registers, or three registers of different lengths"
+# grouping ([23,4]=0b10), bits [21:20] are either part of the opcode
+# decode: 0b11 for VEXT, two-reg-misc, VTBL, and duplicate-scalar;
+# or they are a size field for the three-reg-different-lengths and
+# two-reg-and-scalar insn groups (where size cannot be 0b11). This
+# is slightly awkward for decodetree: we handle it with this
+# non-exclusive group which contains within it two exclusive groups:
+# one for the size=0b11 patterns, and one for the size-not-0b11
+# patterns. This allows us to check that none of the insns within
+# each subgroup accidentally overlap each other. Note that all the
+# trans functions for the size-not-0b11 patterns must check and
+# return false for size==3.
+######################################################################
+{
+  # 0b11 subgroup will go here
+
+  # Subgroup for size != 0b11
+  [
+    ##################################################################
+    # 3-reg-different-length grouping:
+    # 1111 001 U 1 D sz!=11 Vn:4 Vd:4 opc:4 N 0 M 0 Vm:4
+    ##################################################################
+
+    &3diff vm vn vd size
+
+    @3diff       .... ... . . . size:2 .... .... .... . . . . .... \
+                 &3diff vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+    VADDL_S_3d   1111 001 0 1 . .. .... .... 0000 . 0 . 0 .... @3diff
+    VADDL_U_3d   1111 001 1 1 . .. .... .... 0000 . 0 . 0 .... @3diff
+
+    VADDW_S_3d   1111 001 0 1 . .. .... .... 0001 . 0 . 0 .... @3diff
+    VADDW_U_3d   1111 001 1 1 . .. .... .... 0001 . 0 . 0 .... @3diff
+
+    VSUBL_S_3d   1111 001 0 1 . .. .... .... 0010 . 0 . 0 .... @3diff
+    VSUBL_U_3d   1111 001 1 1 . .. .... .... 0010 . 0 . 0 .... @3diff
+
+    VSUBW_S_3d   1111 001 0 1 . .. .... .... 0011 . 0 . 0 .... @3diff
+    VSUBW_U_3d   1111 001 1 1 . .. .... .... 0011 . 0 . 0 .... @3diff
+  ]
+}
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_Vimm_1r(DisasContext *s, arg_1reg_imm *a)
     }
     return do_1reg_imm(s, a, fn);
 }
+
+static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
+                           NeonGenWidenFn *widenfn,
+                           NeonGenTwo64OpFn *opfn,
+                           bool src1_wide)
+{
+    /* 3-regs different lengths, prewidening case (VADDL/VSUBL/VADDW/VSUBW) */
+    TCGv_i64 rn0_64, rn1_64, rm_64;
+    TCGv_i32 rm;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!widenfn || !opfn) {
+        /* size == 3 case, which is an entirely different insn group */
+        return false;
+    }
+
+    if ((a->vd & 1) || (src1_wide && (a->vn & 1))) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    rn0_64 = tcg_temp_new_i64();
+    rn1_64 = tcg_temp_new_i64();
+    rm_64 = tcg_temp_new_i64();
+
+    if (src1_wide) {
+        neon_load_reg64(rn0_64, a->vn);
+    } else {
+        TCGv_i32 tmp = neon_load_reg(a->vn, 0);
+        widenfn(rn0_64, tmp);
+        tcg_temp_free_i32(tmp);
+    }
+    rm = neon_load_reg(a->vm, 0);
+
+    widenfn(rm_64, rm);
+    tcg_temp_free_i32(rm);
+    opfn(rn0_64, rn0_64, rm_64);
+
+    /*
+     * Load second pass inputs before storing the first pass result, to
+     * avoid incorrect results if a narrow input overlaps with the result.
+     */
+    if (src1_wide) {
+        neon_load_reg64(rn1_64, a->vn + 1);
+    } else {
+        TCGv_i32 tmp = neon_load_reg(a->vn, 1);
+        widenfn(rn1_64, tmp);
+        tcg_temp_free_i32(tmp);
+    }
+    rm = neon_load_reg(a->vm, 1);
+
+    neon_store_reg64(rn0_64, a->vd);
+
+    widenfn(rm_64, rm);
+    tcg_temp_free_i32(rm);
+    opfn(rn1_64, rn1_64, rm_64);
+    neon_store_reg64(rn1_64, a->vd + 1);
+
+    tcg_temp_free_i64(rn0_64);
+    tcg_temp_free_i64(rn1_64);
+    tcg_temp_free_i64(rm_64);
+
+    return true;
+}
+
+#define DO_PREWIDEN(INSN, S, EXT, OP, SRC1WIDE)                         \
+    static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a)        \
+    {                                                                   \
+        static NeonGenWidenFn * const widenfn[] = {                     \
+            gen_helper_neon_widen_##S##8,                               \
+            gen_helper_neon_widen_##S##16,                              \
+            tcg_gen_##EXT##_i32_i64,                                    \
+            NULL,                                                       \
+        };                                                              \
+        static NeonGenTwo64OpFn * const addfn[] = {                     \
+            gen_helper_neon_##OP##l_u16,                                \
+            gen_helper_neon_##OP##l_u32,                                \
+            tcg_gen_##OP##_i64,                                         \
+            NULL,                                                       \
+        };                                                              \
+        return do_prewiden_3d(s, a, widenfn[a->size],                   \
+                              addfn[a->size], SRC1WIDE);                \
+    }
+
+DO_PREWIDEN(VADDL_S, s, ext, add, false)
+DO_PREWIDEN(VADDL_U, u, extu, add, false)
+DO_PREWIDEN(VSUBL_S, s, ext, sub, false)
+DO_PREWIDEN(VSUBL_U, u, extu, sub, false)
+DO_PREWIDEN(VADDW_S, s, ext, add, true)
+DO_PREWIDEN(VADDW_U, u, extu, add, true)
+DO_PREWIDEN(VSUBW_S, s, ext, sub, true)
+DO_PREWIDEN(VSUBW_U, u, extu, sub, true)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 /* Three registers of different lengths.  */
                 int src1_wide;
                 int src2_wide;
-                int prewiden;
                 /* undefreq: bit 0 : UNDEF if size == 0
                  *           bit 1 : UNDEF if size == 1
                  *           bit 2 : UNDEF if size == 2
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 int undefreq;
                 /* prewiden, src1_wide, src2_wide, undefreq */
                 static const int neon_3reg_wide[16][4] = {
-                    {1, 0, 0, 0}, /* VADDL */
-                    {1, 1, 0, 0}, /* VADDW */
-                    {1, 0, 0, 0}, /* VSUBL */
-                    {1, 1, 0, 0}, /* VSUBW */
+                    {0, 0, 0, 7}, /* VADDL: handled by decodetree */
+                    {0, 0, 0, 7}, /* VADDW: handled by decodetree */
+                    {0, 0, 0, 7}, /* VSUBL: handled by decodetree */
+                    {0, 0, 0, 7}, /* VSUBW: handled by decodetree */
                     {0, 1, 1, 0}, /* VADDHN */
                     {0, 0, 0, 0}, /* VABAL */
                     {0, 1, 1, 0}, /* VSUBHN */
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     {0, 0, 0, 7}, /* Reserved: always UNDEF */
                 };
 
-                prewiden = neon_3reg_wide[op][0];
                 src1_wide = neon_3reg_wide[op][1];
                 src2_wide = neon_3reg_wide[op][2];
                 undefreq = neon_3reg_wide[op][3];
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                         } else {
                             tmp = neon_load_reg(rn, pass);
                         }
-                        if (prewiden) {
-                            gen_neon_widen(cpu_V0, tmp, size, u);
-                        }
                     }
                     if (src2_wide) {
                         neon_load_reg64(cpu_V1, rm + pass);
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                         } else {
                             tmp2 = neon_load_reg(rm, pass);
                         }
-                        if (prewiden) {
-                            gen_neon_widen(cpu_V1, tmp2, size, u);
-                        }
                     }
                     switch (op) {
                     case 0: case 1: case 4: /* VADDL, VADDW, VADDHN, VRADDHN */
-- 
2.20.1

Convert the narrow-to-high-half insns VADDHN, VSUBHN, VRADDHN,
VRSUBHN in the Neon 3-registers-different-lengths group to
decodetree.
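
As an illustration of what "narrow to high half" means, here is a minimal
plain-C sketch of the per-element arithmetic for the size == 2 case (64-bit
inputs, 32-bit result). The ref_* function names are hypothetical and are not
part of this patch; the rounding constant matches the 1u << 31 used by
gen_narrow_round_high_u32() in the diff below.

```c
#include <stdint.h>

/* VADDHN model: add at 64 bits, keep only the high 32 bits. */
static uint32_t ref_addhn_u64(uint64_t n, uint64_t m)
{
    return (uint32_t)((n + m) >> 32);
}

/* VSUBHN model: subtract at 64 bits, keep only the high 32 bits. */
static uint32_t ref_subhn_u64(uint64_t n, uint64_t m)
{
    return (uint32_t)((n - m) >> 32);
}

/*
 * VRADDHN model: as VADDHN, but round to nearest by adding half of
 * the value of the lowest retained bit (1 << 31) before truncating.
 */
static uint32_t ref_raddhn_u64(uint64_t n, uint64_t m)
{
    return (uint32_t)((n + m + (UINT64_C(1) << 31)) >> 32);
}

/* VRSUBHN model: the rounding variant of VSUBHN. */
static uint32_t ref_rsubhn_u64(uint64_t n, uint64_t m)
{
    return (uint32_t)((n - m + (UINT64_C(1) << 31)) >> 32);
}
```

The smaller element sizes work the same way on each lane packed into the
64-bit value, which is why the patch uses helper functions for them rather
than a single 64-bit TCG op.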

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  6 +++
 target/arm/translate-neon.inc.c | 87 +++++++++++++++++++++++++++++++
 target/arm/translate.c          | 91 ++++-----------------------------
 3 files changed, 104 insertions(+), 80 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
 
     VSUBW_S_3d   1111 001 0 1 . .. .... .... 0011 . 0 . 0 .... @3diff
     VSUBW_U_3d   1111 001 1 1 . .. .... .... 0011 . 0 . 0 .... @3diff
+
+    VADDHN_3d    1111 001 0 1 . .. .... .... 0100 . 0 . 0 .... @3diff
+    VRADDHN_3d   1111 001 1 1 . .. .... .... 0100 . 0 . 0 .... @3diff
+
+    VSUBHN_3d    1111 001 0 1 . .. .... .... 0110 . 0 . 0 .... @3diff
+    VRSUBHN_3d   1111 001 1 1 . .. .... .... 0110 . 0 . 0 .... @3diff
   ]
 }
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ DO_PREWIDEN(VADDW_S, s, ext, add, true)
 DO_PREWIDEN(VADDW_U, u, extu, add, true)
 DO_PREWIDEN(VSUBW_S, s, ext, sub, true)
 DO_PREWIDEN(VSUBW_U, u, extu, sub, true)
+
+static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
+                         NeonGenTwo64OpFn *opfn, NeonGenNarrowFn *narrowfn)
+{
+    /* 3-regs different lengths, narrowing (VADDHN/VSUBHN/VRADDHN/VRSUBHN) */
+    TCGv_i64 rn_64, rm_64;
+    TCGv_i32 rd0, rd1;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!opfn || !narrowfn) {
+        /* size == 3 case, which is an entirely different insn group */
+        return false;
+    }
+
+    if ((a->vn | a->vm) & 1) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    rn_64 = tcg_temp_new_i64();
+    rm_64 = tcg_temp_new_i64();
+    rd0 = tcg_temp_new_i32();
+    rd1 = tcg_temp_new_i32();
+
+    neon_load_reg64(rn_64, a->vn);
+    neon_load_reg64(rm_64, a->vm);
+
+    opfn(rn_64, rn_64, rm_64);
+
+    narrowfn(rd0, rn_64);
+
+    neon_load_reg64(rn_64, a->vn + 1);
+    neon_load_reg64(rm_64, a->vm + 1);
+
+    opfn(rn_64, rn_64, rm_64);
+
+    narrowfn(rd1, rn_64);
+
+    neon_store_reg(a->vd, 0, rd0);
+    neon_store_reg(a->vd, 1, rd1);
+
+    tcg_temp_free_i64(rn_64);
+    tcg_temp_free_i64(rm_64);
+
+    return true;
+}
+
+#define DO_NARROW_3D(INSN, OP, NARROWTYPE, EXTOP)                       \
+    static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a)        \
+    {                                                                   \
+        static NeonGenTwo64OpFn * const addfn[] = {                     \
+            gen_helper_neon_##OP##l_u16,                                \
+            gen_helper_neon_##OP##l_u32,                                \
+            tcg_gen_##OP##_i64,                                         \
+            NULL,                                                       \
+        };                                                              \
+        static NeonGenNarrowFn * const narrowfn[] = {                   \
+            gen_helper_neon_##NARROWTYPE##_high_u8,                     \
+            gen_helper_neon_##NARROWTYPE##_high_u16,                    \
+            EXTOP,                                                      \
+            NULL,                                                       \
+        };                                                              \
+        return do_narrow_3d(s, a, addfn[a->size], narrowfn[a->size]);   \
+    }
+
+static void gen_narrow_round_high_u32(TCGv_i32 rd, TCGv_i64 rn)
+{
+    tcg_gen_addi_i64(rn, rn, 1u << 31);
+    tcg_gen_extrh_i64_i32(rd, rn);
+}
+
+DO_NARROW_3D(VADDHN, add, narrow, tcg_gen_extrh_i64_i32)
+DO_NARROW_3D(VSUBHN, sub, narrow, tcg_gen_extrh_i64_i32)
+DO_NARROW_3D(VRADDHN, add, narrow_round, gen_narrow_round_high_u32)
+DO_NARROW_3D(VRSUBHN, sub, narrow_round, gen_narrow_round_high_u32)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void gen_neon_addl(int size)
     }
 }
 
-static inline void gen_neon_subl(int size)
-{
-    switch (size) {
-    case 0: gen_helper_neon_subl_u16(CPU_V001); break;
-    case 1: gen_helper_neon_subl_u32(CPU_V001); break;
-    case 2: tcg_gen_sub_i64(CPU_V001); break;
-    default: abort();
-    }
-}
-
 static inline void gen_neon_negl(TCGv_i64 var, int size)
 {
     switch (size) {
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             op = (insn >> 8) & 0xf;
             if ((insn & (1 << 6)) == 0) {
                 /* Three registers of different lengths.  */
-                int src1_wide;
-                int src2_wide;
                 /* undefreq: bit 0 : UNDEF if size == 0
                  *           bit 1 : UNDEF if size == 1
                  *           bit 2 : UNDEF if size == 2
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     {0, 0, 0, 7}, /* VADDW: handled by decodetree */
                     {0, 0, 0, 7}, /* VSUBL: handled by decodetree */
                     {0, 0, 0, 7}, /* VSUBW: handled by decodetree */
-                    {0, 1, 1, 0}, /* VADDHN */
+                    {0, 0, 0, 7}, /* VADDHN: handled by decodetree */
                     {0, 0, 0, 0}, /* VABAL */
-                    {0, 1, 1, 0}, /* VSUBHN */
+                    {0, 0, 0, 7}, /* VSUBHN: handled by decodetree */
                     {0, 0, 0, 0}, /* VABDL */
                     {0, 0, 0, 0}, /* VMLAL */
                     {0, 0, 0, 9}, /* VQDMLAL */
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     {0, 0, 0, 7}, /* Reserved: always UNDEF */
                 };
 
-                src1_wide = neon_3reg_wide[op][1];
-                src2_wide = neon_3reg_wide[op][2];
                 undefreq = neon_3reg_wide[op][3];
 
                 if ((undefreq & (1 << size)) ||
                     ((undefreq & 8) && u)) {
                     return 1;
                 }
-                if ((src1_wide && (rn & 1)) ||
-                    (src2_wide && (rm & 1)) ||
-                    (!src2_wide && (rd & 1))) {
+                if (rd & 1) {
                     return 1;
                 }
 
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 /* Avoid overlapping operands.  Wide source operands are
                    always aligned so will never overlap with wide
                    destinations in problematic ways.  */
-                if (rd == rm && !src2_wide) {
+                if (rd == rm) {
                     tmp = neon_load_reg(rm, 1);
                     neon_store_scratch(2, tmp);
-                } else if (rd == rn && !src1_wide) {
+                } else if (rd == rn) {
                     tmp = neon_load_reg(rn, 1);
                     neon_store_scratch(2, tmp);
                 }
                 tmp3 = NULL;
                 for (pass = 0; pass < 2; pass++) {
-                    if (src1_wide) {
-                        neon_load_reg64(cpu_V0, rn + pass);
-                        tmp = NULL;
+                    if (pass == 1 && rd == rn) {
+                        tmp = neon_load_scratch(2);
                     } else {
-                        if (pass == 1 && rd == rn) {
-                            tmp = neon_load_scratch(2);
-                        } else {
-                            tmp = neon_load_reg(rn, pass);
-                        }
+                        tmp = neon_load_reg(rn, pass);
                     }
-                    if (src2_wide) {
-                        neon_load_reg64(cpu_V1, rm + pass);
-                        tmp2 = NULL;
+                    if (pass == 1 && rd == rm) {
+                        tmp2 = neon_load_scratch(2);
                     } else {
-                        if (pass == 1 && rd == rm) {
-                            tmp2 = neon_load_scratch(2);
-                        } else {
-                            tmp2 = neon_load_reg(rm, pass);
-                        }
+                        tmp2 = neon_load_reg(rm, pass);
                     }
                     switch (op) {
-                    case 0: case 1: case 4: /* VADDL, VADDW, VADDHN, VRADDHN */
-                        gen_neon_addl(size);
-                        break;
-                    case 2: case 3: case 6: /* VSUBL, VSUBW, VSUBHN, VRSUBHN */
-                        gen_neon_subl(size);
-                        break;
                     case 5: case 7: /* VABAL, VABDL */
                         switch ((size << 1) | u) {
                         case 0:
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                             abort();
                         }
                         neon_store_reg64(cpu_V0, rd + pass);
-                    } else if (op == 4 || op == 6) {
-                        /* Narrowing operation.  */
-                        tmp = tcg_temp_new_i32();
-                        if (!u) {
-                            switch (size) {
-                            case 0:
-                                gen_helper_neon_narrow_high_u8(tmp, cpu_V0);
-                                break;
-                            case 1:
-                                gen_helper_neon_narrow_high_u16(tmp, cpu_V0);
-                                break;
-                            case 2:
-                                tcg_gen_extrh_i64_i32(tmp, cpu_V0);
-                                break;
-                            default: abort();
-                            }
-                        } else {
-                            switch (size) {
-                            case 0:
-                                gen_helper_neon_narrow_round_high_u8(tmp, cpu_V0);
-                                break;
-                            case 1:
-                                gen_helper_neon_narrow_round_high_u16(tmp, cpu_V0);
-                                break;
-                            case 2:
-                                tcg_gen_addi_i64(cpu_V0, cpu_V0, 1u << 31);
-                                tcg_gen_extrh_i64_i32(tmp, cpu_V0);
-                                break;
-                            default: abort();
-                            }
-                        }
-                        if (pass == 0) {
-                            tmp3 = tmp;
-                        } else {
-                            neon_store_reg(rd, 0, tmp3);
-                            neon_store_reg(rd, 1, tmp);
-                        }
                     } else {
                         /* Write back the result.  */
                         neon_store_reg64(cpu_V0, rd + pass);
-- 
2.20.1

Convert the Neon 3-reg-diff insns VABAL and VABDL to decodetree.
Like almost all the remaining insns in this group, these combine
a two-input operation that returns a double-width result with an
optional accumulation of that result into the destination.
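
For illustration only (not part of the patch), the pattern described
above can be sketched per lane in plain C. This assumes 16-bit signed
inputs and a 32-bit wide lane; the names ref_abdl_s16_lane and
ref_abal_s16_lane are hypothetical and do not exist in QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-lane reference model; not QEMU code. */

/* Two-input op with a double-width result: |n - m| widened to 32 bits. */
static uint32_t ref_abdl_s16_lane(int16_t n, int16_t m)
{
    int32_t d = (int32_t)n - (int32_t)m;   /* |d| <= 65535, no overflow */
    return (uint32_t)(d < 0 ? -d : d);
}

/* Optional accumulation of the wide result into the wide destination. */
static uint32_t ref_abal_s16_lane(uint32_t acc, int16_t n, int16_t m)
{
    return acc + ref_abdl_s16_lane(n, m);  /* wraps modulo 2^32 */
}

static void ref_abdl_examples(void)
{
    assert(ref_abdl_s16_lane(-32768, 32767) == 65535u);
    assert(ref_abal_s16_lane(0xffffffffu, 5, 7) == 1u);
}
```

The VABDL-style form is the first function alone; the VABAL-style form
is the same operation followed by the accumulate step.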

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate.h          |   1 +
 target/arm/neon-dp.decode       |   6 ++
 target/arm/translate-neon.inc.c | 132 ++++++++++++++++++++++++++++++++
 target/arm/translate.c          |  31 +-------
 4 files changed, 142 insertions(+), 28 deletions(-)

diff --git a/target/arm/translate.h b/target/arm/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ typedef void NeonGenTwo64OpEnvFn(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i64);
 typedef void NeonGenNarrowFn(TCGv_i32, TCGv_i64);
 typedef void NeonGenNarrowEnvFn(TCGv_i32, TCGv_ptr, TCGv_i64);
 typedef void NeonGenWidenFn(TCGv_i64, TCGv_i32);
+typedef void NeonGenTwoOpWidenFn(TCGv_i64, TCGv_i32, TCGv_i32);
 typedef void NeonGenTwoSingleOPFn(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_ptr);
 typedef void NeonGenTwoDoubleOPFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_ptr);
 typedef void NeonGenOneOpFn(TCGv_i64, TCGv_i64);
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
     VADDHN_3d    1111 001 0 1 . .. .... .... 0100 . 0 . 0 .... @3diff
     VRADDHN_3d   1111 001 1 1 . .. .... .... 0100 . 0 . 0 .... @3diff
 
+    VABAL_S_3d   1111 001 0 1 . .. .... .... 0101 . 0 . 0 .... @3diff
+    VABAL_U_3d   1111 001 1 1 . .. .... .... 0101 . 0 . 0 .... @3diff
+
     VSUBHN_3d    1111 001 0 1 . .. .... .... 0110 . 0 . 0 .... @3diff
     VRSUBHN_3d   1111 001 1 1 . .. .... .... 0110 . 0 . 0 .... @3diff
+
+    VABDL_S_3d   1111 001 0 1 . .. .... .... 0111 . 0 . 0 .... @3diff
+    VABDL_U_3d   1111 001 1 1 . .. .... .... 0111 . 0 . 0 .... @3diff
   ]
 }
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ DO_NARROW_3D(VADDHN, add, narrow, tcg_gen_extrh_i64_i32)
 DO_NARROW_3D(VSUBHN, sub, narrow, tcg_gen_extrh_i64_i32)
 DO_NARROW_3D(VRADDHN, add, narrow_round, gen_narrow_round_high_u32)
 DO_NARROW_3D(VRSUBHN, sub, narrow_round, gen_narrow_round_high_u32)
+
+static bool do_long_3d(DisasContext *s, arg_3diff *a,
+                       NeonGenTwoOpWidenFn *opfn,
+                       NeonGenTwo64OpFn *accfn)
+{
+    /*
+     * 3-regs different lengths, long operations.
+     * These perform an operation on two inputs that returns a double-width
+     * result, and then possibly perform an accumulation operation of
+     * that result into the double-width destination.
+     */
+    TCGv_i64 rd0, rd1, tmp;
+    TCGv_i32 rn, rm;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!opfn) {
+        /* size == 3 case, which is an entirely different insn group */
+        return false;
+    }
+
+    if (a->vd & 1) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    rd0 = tcg_temp_new_i64();
+    rd1 = tcg_temp_new_i64();
+
+    rn = neon_load_reg(a->vn, 0);
+    rm = neon_load_reg(a->vm, 0);
+    opfn(rd0, rn, rm);
+    tcg_temp_free_i32(rn);
+    tcg_temp_free_i32(rm);
+
+    rn = neon_load_reg(a->vn, 1);
+    rm = neon_load_reg(a->vm, 1);
+    opfn(rd1, rn, rm);
+    tcg_temp_free_i32(rn);
+    tcg_temp_free_i32(rm);
+
+    /* Don't store results until after all loads: they might overlap */
+    if (accfn) {
+        tmp = tcg_temp_new_i64();
+        neon_load_reg64(tmp, a->vd);
+        accfn(tmp, tmp, rd0);
+        neon_store_reg64(tmp, a->vd);
+        neon_load_reg64(tmp, a->vd + 1);
+        accfn(tmp, tmp, rd1);
+        neon_store_reg64(tmp, a->vd + 1);
+        tcg_temp_free_i64(tmp);
+    } else {
+        neon_store_reg64(rd0, a->vd);
+        neon_store_reg64(rd1, a->vd + 1);
+    }
+
+    tcg_temp_free_i64(rd0);
+    tcg_temp_free_i64(rd1);
+
+    return true;
+}
+
+static bool trans_VABDL_S_3d(DisasContext *s, arg_3diff *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        gen_helper_neon_abdl_s16,
+        gen_helper_neon_abdl_s32,
+        gen_helper_neon_abdl_s64,
+        NULL,
+    };
+
+    return do_long_3d(s, a, opfn[a->size], NULL);
+}
+
+static bool trans_VABDL_U_3d(DisasContext *s, arg_3diff *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        gen_helper_neon_abdl_u16,
+        gen_helper_neon_abdl_u32,
+        gen_helper_neon_abdl_u64,
+        NULL,
+    };
+
+    return do_long_3d(s, a, opfn[a->size], NULL);
+}
+
+static bool trans_VABAL_S_3d(DisasContext *s, arg_3diff *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        gen_helper_neon_abdl_s16,
+        gen_helper_neon_abdl_s32,
+        gen_helper_neon_abdl_s64,
+        NULL,
+    };
+    static NeonGenTwo64OpFn * const addfn[] = {
+        gen_helper_neon_addl_u16,
+        gen_helper_neon_addl_u32,
+        tcg_gen_add_i64,
+        NULL,
+    };
+
+    return do_long_3d(s, a, opfn[a->size], addfn[a->size]);
+}
+
+static bool trans_VABAL_U_3d(DisasContext *s, arg_3diff *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        gen_helper_neon_abdl_u16,
+        gen_helper_neon_abdl_u32,
+        gen_helper_neon_abdl_u64,
+        NULL,
+    };
+    static NeonGenTwo64OpFn * const addfn[] = {
+        gen_helper_neon_addl_u16,
+        gen_helper_neon_addl_u32,
+        tcg_gen_add_i64,
+        NULL,
+    };
+
+    return do_long_3d(s, a, opfn[a->size], addfn[a->size]);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     {0, 0, 0, 7}, /* VSUBL: handled by decodetree */
                     {0, 0, 0, 7}, /* VSUBW: handled by decodetree */
                     {0, 0, 0, 7}, /* VADDHN: handled by decodetree */
-                    {0, 0, 0, 0}, /* VABAL */
+                    {0, 0, 0, 7}, /* VABAL: handled by decodetree */
                     {0, 0, 0, 7}, /* VSUBHN: handled by decodetree */
-                    {0, 0, 0, 0}, /* VABDL */
+                    {0, 0, 0, 7}, /* VABDL: handled by decodetree */
                     {0, 0, 0, 0}, /* VMLAL */
                     {0, 0, 0, 9}, /* VQDMLAL */
                     {0, 0, 0, 0}, /* VMLSL */
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                         tmp2 = neon_load_reg(rm, pass);
                     }
                     switch (op) {
-                    case 5: case 7: /* VABAL, VABDL */
-                        switch ((size << 1) | u) {
-                        case 0:
-                            gen_helper_neon_abdl_s16(cpu_V0, tmp, tmp2);
-                            break;
-                        case 1:
-                            gen_helper_neon_abdl_u16(cpu_V0, tmp, tmp2);
-                            break;
-                        case 2:
-                            gen_helper_neon_abdl_s32(cpu_V0, tmp, tmp2);
-                            break;
-                        case 3:
-                            gen_helper_neon_abdl_u32(cpu_V0, tmp, tmp2);
-                            break;
-                        case 4:
-                            gen_helper_neon_abdl_s64(cpu_V0, tmp, tmp2);
-                            break;
-                        case 5:
-                            gen_helper_neon_abdl_u64(cpu_V0, tmp, tmp2);
-                            break;
-                        default: abort();
-                        }
-                        tcg_temp_free_i32(tmp2);
-                        tcg_temp_free_i32(tmp);
-                        break;
                     case 8: case 9: case 10: case 11: case 12: case 13:
                         /* VMLAL, VQDMLAL, VMLSL, VQDMLSL, VMULL, VQDMULL */
                         gen_neon_mull(cpu_V0, tmp, tmp2, size, u);
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                         case 10: /* VMLSL */
                             gen_neon_negl(cpu_V0, size);
                             /* Fall through */
-                        case 5: case 8: /* VABAL, VMLAL */
+                        case 8: /* VMLAL */
                             gen_neon_addl(size);
                             break;
                         case 9: case 11: /* VQDMLAL, VQDMLSL */
-- 
2.20.1

Convert the Neon 3-reg-diff insns VMULL, VMLAL and VMLSL; these perform
a 32x32->64 multiply with possible accumulate.

Note that for VMLSL we do the accumulate directly with a subtraction
rather than doing a negate-then-add as the old code did.
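
As a sanity check of why the two approaches agree, here is a small C
model of a single 64-bit lane (a sketch only; all function names are
hypothetical). In wrapping 64-bit arithmetic, adding the negated
product and subtracting the product give the same bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a signed 32x32->64 multiply; not QEMU code. */
static uint64_t ref_mull_s32(int32_t a, int32_t b)
{
    /* The product magnitude is at most 2^62, so int64_t cannot overflow. */
    return (uint64_t)((int64_t)a * (int64_t)b);
}

/* Old-style accumulate: negate the product, then add it. */
static uint64_t ref_mlsl_negate_then_add(uint64_t acc, int32_t a, int32_t b)
{
    uint64_t neg = 0 - ref_mull_s32(a, b);
    return acc + neg;
}

/* New-style accumulate: subtract the product directly. */
static uint64_t ref_mlsl_subtract(uint64_t acc, int32_t a, int32_t b)
{
    return acc - ref_mull_s32(a, b);
}

static void ref_mlsl_examples(void)
{
    assert(ref_mlsl_subtract(10, 3, 4) == 0xfffffffffffffffeull);
    assert(ref_mlsl_negate_then_add(10, 3, 4) == ref_mlsl_subtract(10, 3, 4));
    assert(ref_mlsl_subtract(0, INT32_MIN, INT32_MIN) == 0xc000000000000000ull);
}
```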

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  9 +++++
 target/arm/translate-neon.inc.c | 71 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 21 +++-------
 3 files changed, 86 insertions(+), 15 deletions(-)

Convert the Neon 3-reg-diff insns VQDMULL, VQDMLAL and VQDMLSL:
these are all saturating doubling long multiplies with a possible
accumulate step.

These are the last insns in the group that use the loop which passes
over each element, so we can delete that code.
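
For readers unfamiliar with the operation, a per-lane C sketch of a
saturating doubling long multiply with optional accumulate follows.
It assumes 16-bit inputs and a 32-bit result; the function names and
the "qc" sticky-saturation flag are hypothetical stand-ins, not the
helpers used by the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Saturate to int32_t and record saturation in a sticky flag. */
static int32_t ref_sat32(int64_t v, int *qc)
{
    if (v > INT32_MAX) { *qc = 1; return INT32_MAX; }
    if (v < INT32_MIN) { *qc = 1; return INT32_MIN; }
    return (int32_t)v;
}

/* Doubling long multiply: sat(2 * a * b). */
static int32_t ref_qdmull_s16(int16_t a, int16_t b, int *qc)
{
    return ref_sat32(2 * (int64_t)a * (int64_t)b, qc);
}

/* Accumulating forms: saturate the product, then saturate the sum. */
static int32_t ref_qdmlal_s16(int32_t acc, int16_t a, int16_t b, int *qc)
{
    int32_t prod = ref_qdmull_s16(a, b, qc);
    return ref_sat32((int64_t)acc + prod, qc);
}

static int32_t ref_qdmlsl_s16(int32_t acc, int16_t a, int16_t b, int *qc)
{
    int32_t prod = ref_qdmull_s16(a, b, qc);
    return ref_sat32((int64_t)acc - prod, qc);
}

static void ref_qdmull_examples(void)
{
    int qc = 0;
    assert(ref_qdmull_s16(2, 3, &qc) == 12 && qc == 0);
    /* The only input pair whose doubled product overflows 32 bits. */
    assert(ref_qdmull_s16(-32768, -32768, &qc) == INT32_MAX && qc == 1);
    qc = 0;
    assert(ref_qdmlal_s16(INT32_MAX, 1, 1, &qc) == INT32_MAX && qc == 1);
    qc = 0;
    assert(ref_qdmlsl_s16(INT32_MIN, 1, 1, &qc) == INT32_MIN && qc == 1);
}
```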

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  6 +++
 target/arm/translate-neon.inc.c | 82 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 59 ++----------------------
 3 files changed, 92 insertions(+), 55 deletions(-)

Convert the Neon 3-reg-diff insn polynomial VMULL. This is the last
insn in this group to be converted.
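
A polynomial (carry-less) multiply treats each operand as a polynomial
over GF(2), so partial products are combined with XOR instead of
addition. A minimal 8x8->16 sketch in C, with a hypothetical function
name:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one 8x8->16 polynomial multiply lane. */
static uint16_t ref_pmull_8(uint8_t a, uint8_t b)
{
    uint16_t r = 0;

    for (int i = 0; i < 8; i++) {
        if (b & (1u << i)) {
            /* XOR in the shifted operand: addition without carries. */
            r ^= (uint16_t)((uint16_t)a << i);
        }
    }
    return r;
}

static void ref_pmull_examples(void)
{
    assert(ref_pmull_8(3, 3) == 5);           /* (x+1)^2 = x^2 + 1 */
    assert(ref_pmull_8(0xff, 0xff) == 0x5555);
    assert(ref_pmull_8(2, 0x80) == 0x100);
}
```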

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  2 ++
 target/arm/translate-neon.inc.c | 43 +++++++++++++++++++++++
 target/arm/translate.c          | 60 ++-------------------------------
 3 files changed, 48 insertions(+), 57 deletions(-)

Mark the arrays of function pointers in trans_VSHLL_S_2sh() and
trans_VSHLL_U_2sh() as both 'static' and 'const'.
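
The general idiom, sketched outside QEMU with hypothetical names: a
function-local table declared 'static ... * const' has static storage
duration, so it is typically not re-initialised on every call, and its
pointer entries cannot be modified after initialisation.

```c
#include <assert.h>
#include <stddef.h>

typedef int BinOpFn(int, int);

static int op_add(int a, int b) { return a + b; }
static int op_sub(int a, int b) { return a - b; }

/* Returns 1 and writes *out on success, 0 if 'size' has no handler. */
static int ref_dispatch(unsigned size, int a, int b, int *out)
{
    /* 'static': one copy for the whole program; '* const': read-only slots */
    static BinOpFn * const fns[] = {
        op_add,
        op_sub,
        NULL,
        NULL,
    };

    if (size >= sizeof(fns) / sizeof(fns[0]) || !fns[size]) {
        return 0;
    }
    *out = fns[size](a, b);
    return 1;
}

static void ref_dispatch_examples(void)
{
    int out = 0;
    assert(ref_dispatch(0, 5, 3, &out) == 1 && out == 8);
    assert(ref_dispatch(1, 5, 3, &out) == 1 && out == 2);
    assert(ref_dispatch(3, 5, 3, &out) == 0);
}
```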

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-neon.inc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
 
 static bool trans_VSHLL_S_2sh(DisasContext *s, arg_2reg_shift *a)
 {
-    NeonGenWidenFn *widenfn[] = {
+    static NeonGenWidenFn * const widenfn[] = {
         gen_helper_neon_widen_s8,
         gen_helper_neon_widen_s16,
         tcg_gen_ext_i32_i64,
@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL_S_2sh(DisasContext *s, arg_2reg_shift *a)
 
 static bool trans_VSHLL_U_2sh(DisasContext *s, arg_2reg_shift *a)
 {
-    NeonGenWidenFn *widenfn[] = {
+    static NeonGenWidenFn * const widenfn[] = {
         gen_helper_neon_widen_u8,
         gen_helper_neon_widen_u16,
         tcg_gen_extu_i32_i64,
-- 
2.20.1

Convert the VMLA, VMLS and VMUL insns in the Neon "2 registers and a
scalar" group to decodetree.  These are 32x32->32 operations where
one of the inputs is the scalar, followed by a possible accumulate
operation of the 32-bit result.

The refactoring removes some of the oddities of the old decoder:
 * operands to the operation and accumulation were often
   reversed (taking advantage of the fact that most of these ops
   are commutative); the new code follows the pseudocode order
 * the Q bit in the insn was in a local variable 'u'; in the
   new code it is decoded into a->q
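
To make the "pseudocode order" point concrete, here is a per-lane C
sketch (hypothetical names, 32-bit lanes assumed). The multiply takes
the vector element first and the scalar second; the accumulate takes
the destination first and the product second, which matters for the
non-commutative subtract.

```c
#include <assert.h>
#include <stdint.h>

/* VMUL-style: low 32 bits of element * scalar. */
static uint32_t ref_mul_lane(uint32_t n, uint32_t scalar)
{
    return n * scalar;
}

/* VMLA-style: destination + product. */
static uint32_t ref_mla_lane(uint32_t d, uint32_t n, uint32_t scalar)
{
    return d + ref_mul_lane(n, scalar);
}

/* VMLS-style: destination - product (not product - destination). */
static uint32_t ref_mls_lane(uint32_t d, uint32_t n, uint32_t scalar)
{
    return d - ref_mul_lane(n, scalar);
}

static void ref_2scalar_examples(void)
{
    assert(ref_mul_lane(0x10000u, 0x10000u) == 0);  /* high bits dropped */
    assert(ref_mla_lane(10, 3, 4) == 22);
    assert(ref_mls_lane(10, 3, 4) == 0xfffffffeu);  /* 10 - 12, wrapped */
}
```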

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  15 ++++
 target/arm/translate-neon.inc.c | 133 ++++++++++++++++++++++++++++++++
 target/arm/translate.c          |  77 ++----------------
 3 files changed, 154 insertions(+), 71 deletions(-)

Convert the float versions of VMLA, VMLS and VMUL in the Neon
2-reg-scalar group to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
As noted in the comment on the WRAP_FP_FN macro, we could have
had a do_2scalar_fp() function, but for 3 insns it seemed
simpler to just do the wrapping to get hold of the fpstatus ptr.
(These are the only fp insns in the group.)
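
The wrapping idea, sketched generically in C. This is not the
WRAP_FP_FN macro itself: every name below (FakeFpStatus,
WRAP_WITH_STATUS, fake_mul and so on) is a hypothetical stand-in. The
point is only that a macro can generate a three-argument wrapper which
fetches the extra status pointer and forwards to a four-argument
function.

```c
#include <assert.h>

/* Hypothetical stand-in for a floating-point status block. */
typedef struct { int flags; } FakeFpStatus;

static FakeFpStatus fake_status;

static FakeFpStatus *fake_get_fpstatus(void)
{
    return &fake_status;
}

/* The underlying op wants a status pointer as a fourth argument... */
static void fake_mul(float *d, float n, float m, FakeFpStatus *st)
{
    (void)st;
    *d = n * m;
}

/* ...but the generic caller only knows this three-argument shape. */
typedef void ThreeArgFn(float *d, float n, float m);

#define WRAP_WITH_STATUS(WRAPNAME, FUNC)                 \
    static void WRAPNAME(float *d, float n, float m)     \
    {                                                    \
        FakeFpStatus *st = fake_get_fpstatus();          \
        FUNC(d, n, m, st);                               \
    }

WRAP_WITH_STATUS(wrapped_mul, fake_mul)

static float ref_run_op(ThreeArgFn *fn, float n, float m)
{
    float d = 0.0f;
    fn(&d, n, m);
    return d;
}

static void ref_wrap_examples(void)
{
    assert(ref_run_op(wrapped_mul, 1.5f, 2.0f) == 3.0f);
}
```
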
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  3 ++
 target/arm/translate-neon.inc.c | 65 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 37 ++-----------------
 3 files changed, 71 insertions(+), 34 deletions(-)

Convert the VQDMULH and VQRDMULH insns in the 2-reg-scalar group
to decodetree.
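
For reference, a per-lane C sketch of what these two insns compute,
assuming 16-bit lanes. The names and the "qc" sticky flag are
hypothetical; the helpers in the patch are not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* floor(v / 65536) without relying on signed right-shift behaviour. */
static int64_t ref_floor_shr16(int64_t v)
{
    return v >= 0 ? v >> 16 : -((-v + 65535) >> 16);
}

static int16_t ref_sat16(int64_t v, int *qc)
{
    if (v > INT16_MAX) { *qc = 1; return INT16_MAX; }
    if (v < INT16_MIN) { *qc = 1; return INT16_MIN; }
    return (int16_t)v;
}

/* round == 0: VQDMULH-style; round != 0: VQRDMULH-style. */
static int16_t ref_qdmulh_s16(int16_t a, int16_t b, int round, int *qc)
{
    int64_t p = 2 * (int64_t)a * (int64_t)b;

    if (round) {
        p += 1 << 15;               /* half of the discarded low part */
    }
    return ref_sat16(ref_floor_shr16(p), qc);
}

static void ref_qdmulh_examples(void)
{
    int qc = 0;
    assert(ref_qdmulh_s16(0x4000, 0x4000, 0, &qc) == 0x2000);
    assert(ref_qdmulh_s16(1, 0x4000, 0, &qc) == 0);
    assert(ref_qdmulh_s16(1, 0x4000, 1, &qc) == 1);   /* rounds up */
    assert(qc == 0);
    assert(ref_qdmulh_s16(-32768, -32768, 0, &qc) == INT16_MAX && qc == 1);
}
```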

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  3 +++
 target/arm/translate-neon.inc.c | 29 +++++++++++++++++++++++
 target/arm/translate.c          | 42 ++-------------------------------
 3 files changed, 34 insertions(+), 40 deletions(-)

Convert the VQRDMLAH and VQRDMLSH insns in the 2-reg-scalar
group to decodetree.
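
A per-lane C sketch of the rounding doubling multiply-accumulate
returning the high half, assuming 16-bit lanes and following my
reading of the architectural pseudocode (accumulator scaled up by the
element size, product added or subtracted, rounding constant added,
then a single saturation). All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* floor(v / 65536) without relying on signed right-shift behaviour. */
static int64_t ref_floor_shr16(int64_t v)
{
    return v >= 0 ? v >> 16 : -((-v + 65535) >> 16);
}

static int16_t ref_sat16(int64_t v, int *qc)
{
    if (v > INT16_MAX) { *qc = 1; return INT16_MAX; }
    if (v < INT16_MIN) { *qc = 1; return INT16_MIN; }
    return (int16_t)v;
}

/* sub == 0: VQRDMLAH-style; sub != 0: VQRDMLSH-style. */
static int16_t ref_qrdmlxh_s16(int16_t acc, int16_t a, int16_t b,
                               int sub, int *qc)
{
    int64_t prod = 2 * (int64_t)a * (int64_t)b;
    int64_t sum = (int64_t)acc * 65536 + (sub ? -prod : prod) + (1 << 15);

    return ref_sat16(ref_floor_shr16(sum), qc);
}

static void ref_qrdmlxh_examples(void)
{
    int qc = 0;
    assert(ref_qrdmlxh_s16(1, 0x4000, 0x4000, 0, &qc) == 8193);
    assert(ref_qrdmlxh_s16(1, 0x4000, 0x4000, 1, &qc) == -8191);
    assert(qc == 0);
    assert(ref_qrdmlxh_s16(32767, -32768, -32768, 0, &qc) == INT16_MAX);
    assert(qc == 1);
}
```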

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  3 ++
 target/arm/translate-neon.inc.c | 74 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 38 +----------------
 3 files changed, 79 insertions(+), 36 deletions(-)

Convert the Neon 2-reg-scalar long multiplies to decodetree.
These are the last instructions in the group.
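
One detail worth illustrating is why a long operation should load all
of its inputs before storing any outputs: the double-width destination
may overlap the source register. The C model below is a sketch with a
hypothetical two-word register type and hypothetical function names;
it contrasts an interleaved load/store order with a loads-first order
when the destination and source are the same register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register model: each D register is two 32-bit words. */
typedef struct { uint32_t w[2]; } DReg;

static void ref_store64(DReg *r, uint64_t v)
{
    r->w[0] = (uint32_t)v;
    r->w[1] = (uint32_t)(v >> 32);
}

/* Unsafe when vd == vn: pass 0's store clobbers pass 1's input. */
static void ref_mull_interleaved(DReg *regs, int vd, int vn, uint32_t scalar)
{
    for (int pass = 0; pass < 2; pass++) {
        uint64_t r = (uint64_t)regs[vn].w[pass] * scalar;
        ref_store64(&regs[vd + pass], r);
    }
}

/* Safe: both results are computed before anything is written back. */
static void ref_mull_loads_first(DReg *regs, int vd, int vn, uint32_t scalar)
{
    uint64_t r0 = (uint64_t)regs[vn].w[0] * scalar;
    uint64_t r1 = (uint64_t)regs[vn].w[1] * scalar;

    ref_store64(&regs[vd], r0);
    ref_store64(&regs[vd + 1], r1);
}

/* Returns the value left in D1 after multiplying D0 = {2, 3} by 10. */
static uint64_t ref_overlap_demo(int loads_first)
{
    DReg regs[2] = { { { 2, 3 } }, { { 0, 0 } } };

    if (loads_first) {
        ref_mull_loads_first(regs, 0, 0, 10);
    } else {
        ref_mull_interleaved(regs, 0, 0, 10);
    }
    return regs[1].w[0] | ((uint64_t)regs[1].w[1] << 32);
}
```

With the loads-first order the second result is 3 * 10; with the
interleaved order the second input has already been overwritten by the
high word of the first result.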

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  18 ++++
 target/arm/translate-neon.inc.c | 163 ++++++++++++++++++++++++++++
 target/arm/translate.c          | 182 ++------------------------------
 3 files changed, 187 insertions(+), 176 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
 
     @2scalar     .... ... q:1 . . size:2 .... .... .... . . . . .... \
                  &2scalar vm=%vm_dp vn=%vn_dp vd=%vd_dp
+    # For the 'long' ops the Q bit is part of insn decode
+    @2scalar_q0  .... ... . . . size:2 .... .... .... . . . . .... \
+                 &2scalar vm=%vm_dp vn=%vn_dp vd=%vd_dp q=0
 
     VMLA_2sc     1111 001 . 1 . .. .... .... 0000 . 1 . 0 .... @2scalar
     VMLA_F_2sc   1111 001 . 1 . .. .... .... 0001 . 1 . 0 .... @2scalar
 
+    VMLAL_S_2sc  1111 001 0 1 . .. .... .... 0010 . 1 . 0 .... @2scalar_q0
+    VMLAL_U_2sc  1111 001 1 1 . .. .... .... 0010 . 1 . 0 .... @2scalar_q0
+
+    VQDMLAL_2sc  1111 001 0 1 . .. .... .... 0011 . 1 . 0 .... @2scalar_q0
+
     VMLS_2sc     1111 001 . 1 . .. .... .... 0100 . 1 . 0 .... @2scalar
     VMLS_F_2sc   1111 001 . 1 . .. .... .... 0101 . 1 . 0 .... @2scalar
 
+    VMLSL_S_2sc  1111 001 0 1 . .. .... .... 0110 . 1 . 0 .... @2scalar_q0
+    VMLSL_U_2sc  1111 001 1 1 . .. .... .... 0110 . 1 . 0 .... @2scalar_q0
+
+    VQDMLSL_2sc  1111 001 0 1 . .. .... .... 0111 . 1 . 0 .... @2scalar_q0
+
     VMUL_2sc     1111 001 . 1 . .. .... .... 1000 . 1 . 0 .... @2scalar
     VMUL_F_2sc   1111 001 . 1 . .. .... .... 1001 . 1 . 0 .... @2scalar
 
+    VMULL_S_2sc  1111 001 0 1 . .. .... .... 1010 . 1 . 0 .... @2scalar_q0
+    VMULL_U_2sc  1111 001 1 1 . .. .... .... 1010 . 1 . 0 .... @2scalar_q0
+
+    VQDMULL_2sc  1111 001 0 1 . .. .... .... 1011 . 1 . 0 .... @2scalar_q0
+
     VQDMULH_2sc  1111 001 . 1 . .. .... .... 1100 . 1 . 0 .... @2scalar
     VQRDMULH_2sc 1111 001 . 1 . .. .... .... 1101 . 1 . 0 .... @2scalar
 
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VQRDMLSH_2sc(DisasContext *s, arg_2scalar *a)
     };
     return do_vqrdmlah_2sc(s, a, opfn[a->size]);
 }
+
+static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
+                            NeonGenTwoOpWidenFn *opfn,
+                            NeonGenTwo64OpFn *accfn)
+{
+    /*
+     * Two registers and a scalar, long operations: perform an
+     * operation on the input elements and the scalar which produces
+     * a double-width result, and then possibly perform an accumulation
+     * operation of that result into the destination.
+     */
+    TCGv_i32 scalar, rn;
+    TCGv_i64 rn0_64, rn1_64;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!opfn) {
+        /* Bad size (including size == 3, which is a different insn group) */
+        return false;
+    }
+
+    if (a->vd & 1) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    scalar = neon_get_scalar(a->size, a->vm);
+
+    /* Load all inputs before writing any outputs, in case of overlap */
+    rn = neon_load_reg(a->vn, 0);
+    rn0_64 = tcg_temp_new_i64();
+    opfn(rn0_64, rn, scalar);
+    tcg_temp_free_i32(rn);
+
+    rn = neon_load_reg(a->vn, 1);
+    rn1_64 = tcg_temp_new_i64();
+    opfn(rn1_64, rn, scalar);
+    tcg_temp_free_i32(rn);
+    tcg_temp_free_i32(scalar);
+
+    if (accfn) {
+        TCGv_i64 t64 = tcg_temp_new_i64();
+        neon_load_reg64(t64, a->vd);
+        accfn(t64, t64, rn0_64);
+        neon_store_reg64(t64, a->vd);
+        neon_load_reg64(t64, a->vd + 1);
+        accfn(t64, t64, rn1_64);
+        neon_store_reg64(t64, a->vd + 1);
+        tcg_temp_free_i64(t64);
+    } else {
+        neon_store_reg64(rn0_64, a->vd);
+        neon_store_reg64(rn1_64, a->vd + 1);
+    }
+    tcg_temp_free_i64(rn0_64);
+    tcg_temp_free_i64(rn1_64);
+    return true;
+}
+
+static bool trans_VMULL_S_2sc(DisasContext *s, arg_2scalar *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        NULL,
+        gen_helper_neon_mull_s16,
+        gen_mull_s32,
+        NULL,
+    };
+
+    return do_2scalar_long(s, a, opfn[a->size], NULL);
+}
+
+static bool trans_VMULL_U_2sc(DisasContext *s, arg_2scalar *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        NULL,
+        gen_helper_neon_mull_u16,
+        gen_mull_u32,
+        NULL,
+    };
+
+    return do_2scalar_long(s, a, opfn[a->size], NULL);
+}
+
+#define DO_VMLAL_2SC(INSN, MULL, ACC)                                   \
+    static bool trans_##INSN##_2sc(DisasContext *s, arg_2scalar *a)     \
+    {                                                                   \
+        static NeonGenTwoOpWidenFn * const opfn[] = {                   \
+            NULL,                                                       \
+            gen_helper_neon_##MULL##16,                                 \
+            gen_##MULL##32,                                             \
+            NULL,                                                       \
+        };                                                              \
+        static NeonGenTwo64OpFn * const accfn[] = {                     \
+            NULL,                                                       \
+            gen_helper_neon_##ACC##l_u32,                               \
+            tcg_gen_##ACC##_i64,                                        \
+            NULL,                                                       \
+        };                                                              \
+        return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]);    \
+    }
+
+DO_VMLAL_2SC(VMLAL_S, mull_s, add)
+DO_VMLAL_2SC(VMLAL_U, mull_u, add)
+DO_VMLAL_2SC(VMLSL_S, mull_s, sub)
+DO_VMLAL_2SC(VMLSL_U, mull_u, sub)
+
+static bool trans_VQDMULL_2sc(DisasContext *s, arg_2scalar *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        NULL,
+        gen_VQDMULL_16,
+        gen_VQDMULL_32,
+        NULL,
+    };
+
+    return do_2scalar_long(s, a, opfn[a->size], NULL);
+}
+
+static bool trans_VQDMLAL_2sc(DisasContext *s, arg_2scalar *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        NULL,
+        gen_VQDMULL_16,
+        gen_VQDMULL_32,
+        NULL,
+    };
+    static NeonGenTwo64OpFn * const accfn[] = {
+        NULL,
+        gen_VQDMLAL_acc_16,
+        gen_VQDMLAL_acc_32,
+        NULL,
+    };
+
+    return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]);
+}
+
+static bool trans_VQDMLSL_2sc(DisasContext *s, arg_2scalar *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        NULL,
+        gen_VQDMULL_16,
+        gen_VQDMULL_32,
+        NULL,
+    };
+    static NeonGenTwo64OpFn * const accfn[] = {
+        NULL,
+        gen_VQDMLSL_acc_16,
+        gen_VQDMLSL_acc_32,
+        NULL,
+    };
+
+    return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void gen_revsh(TCGv_i32 dest, TCGv_i32 var)
     tcg_gen_ext16s_i32(dest, var);
 }
 
-/* 32x32->64 multiply.  Marks inputs as dead.  */
-static TCGv_i64 gen_mulu_i64_i32(TCGv_i32 a, TCGv_i32 b)
-{
-    TCGv_i32 lo = tcg_temp_new_i32();
-    TCGv_i32 hi = tcg_temp_new_i32();
-    TCGv_i64 ret;
-
-    tcg_gen_mulu2_i32(lo, hi, a, b);
-    tcg_temp_free_i32(a);
-    tcg_temp_free_i32(b);
-
-    ret = tcg_temp_new_i64();
-    tcg_gen_concat_i32_i64(ret, lo, hi);
-    tcg_temp_free_i32(lo);
-    tcg_temp_free_i32(hi);
-
-    return ret;
-}
-
-static TCGv_i64 gen_muls_i64_i32(TCGv_i32 a, TCGv_i32 b)
-{
-    TCGv_i32 lo = tcg_temp_new_i32();
-    TCGv_i32 hi = tcg_temp_new_i32();
-    TCGv_i64 ret;
-
-    tcg_gen_muls2_i32(lo, hi, a, b);
-    tcg_temp_free_i32(a);
-    tcg_temp_free_i32(b);
-
-    ret = tcg_temp_new_i64();
-    tcg_gen_concat_i32_i64(ret, lo, hi);
-    tcg_temp_free_i32(lo);
-    tcg_temp_free_i32(hi);
-
-    return ret;
-}
-
 /* Swap low and high halfwords.  */
 static void gen_swap_half(TCGv_i32 var)
 {
@@ -XXX,XX +XXX,XX @@ static inline void gen_neon_addl(int size)
     }
 }
 
-static inline void gen_neon_negl(TCGv_i64 var, int size)
-{
-    switch (size) {
-    case 0: gen_helper_neon_negl_u16(var, var); break;
-    case 1: gen_helper_neon_negl_u32(var, var); break;
-    case 2:
-        tcg_gen_neg_i64(var, var);
-        break;
-    default: abort();
-    }
-}
-
-static inline void gen_neon_addl_saturate(TCGv_i64 op0, TCGv_i64 op1, int size)
-{
-    switch (size) {
-    case 1: gen_helper_neon_addl_saturate_s32(op0, cpu_env, op0, op1); break;
-    case 2: gen_helper_neon_addl_saturate_s64(op0, cpu_env, op0, op1); break;
-    default: abort();
-    }
-}
-
-static inline void gen_neon_mull(TCGv_i64 dest, TCGv_i32 a, TCGv_i32 b,
-                                 int size, int u)
-{
-    TCGv_i64 tmp;
-
-    switch ((size << 1) | u) {
-    case 0: gen_helper_neon_mull_s8(dest, a, b); break;
-    case 1: gen_helper_neon_mull_u8(dest, a, b); break;
-    case 2: gen_helper_neon_mull_s16(dest, a, b); break;
-    case 3: gen_helper_neon_mull_u16(dest, a, b); break;
-    case 4:
-        tmp = gen_muls_i64_i32(a, b);
-        tcg_gen_mov_i64(dest, tmp);
-        tcg_temp_free_i64(tmp);
-        break;
-    case 5:
-        tmp = gen_mulu_i64_i32(a, b);
-        tcg_gen_mov_i64(dest, tmp);
-        tcg_temp_free_i64(tmp);
-        break;
-    default: abort();
-    }
-
-    /* gen_helper_neon_mull_[su]{8|16} do not free their parameters.
-       Don't forget to clean them now.  */
-    if (size < 2) {
-        tcg_temp_free_i32(a);
-        tcg_temp_free_i32(b);
-    }
-}
-
 static void gen_neon_narrow_op(int op, int u, int size,
                                TCGv_i32 dest, TCGv_i64 src)
 {
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
     int u;
     int vec_size;
     uint32_t imm;
-    TCGv_i32 tmp, tmp2, tmp3, tmp4, tmp5;
+    TCGv_i32 tmp, tmp2, tmp3, tmp5;
     TCGv_ptr ptr1;
     TCGv_i64 tmp64;
 
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         return 1;
     } else { /* (insn & 0x00800010 == 0x00800000) */
         if (size != 3) {
-            op = (insn >> 8) & 0xf;
-            if ((insn & (1 << 6)) == 0) {
-                /* Three registers of different lengths: handled by decodetree */
-                return 1;
-            } else {
-                /* Two registers and a scalar. NB that for ops of this form
-                 * the ARM ARM labels bit 24 as Q, but it is in our variable
-                 * 'u', not 'q'.
-                 */
-                if (size == 0) {
-                    return 1;
-                }
-                switch (op) {
-                case 0: /* Integer VMLA scalar */
-                case 4: /* Integer VMLS scalar */
-                case 8: /* Integer VMUL scalar */
-                case 1: /* Float VMLA scalar */
-                case 5: /* Floating point VMLS scalar */
-                case 9: /* Floating point VMUL scalar */
-                case 12: /* VQDMULH scalar */
-                case 13: /* VQRDMULH scalar */
-                case 14: /* VQRDMLAH scalar */
-                case 15: /* VQRDMLSH scalar */
-                    return 1; /* handled by decodetree */
-
-                case 3: /* VQDMLAL scalar */
-                case 7: /* VQDMLSL scalar */
-                case 11: /* VQDMULL scalar */
-                    if (u == 1) {
-                        return 1;
-                    }
-                    /* fall through */
-                case 2: /* VMLAL sclar */
-                case 6: /* VMLSL scalar */
-                case 10: /* VMULL scalar */
-                    if (rd & 1) {
-                        return 1;
-                    }
-                    tmp2 = neon_get_scalar(size, rm);
-                    /* We need a copy of tmp2 because gen_neon_mull
-                     * deletes it during pass 0.  */
-                    tmp4 = tcg_temp_new_i32();
-                    tcg_gen_mov_i32(tmp4, tmp2);
-                    tmp3 = neon_load_reg(rn, 1);
-
-                    for (pass = 0; pass < 2; pass++) {
-                        if (pass == 0) {
-                            tmp = neon_load_reg(rn, 0);
-                        } else {
-                            tmp = tmp3;
-                            tmp2 = tmp4;
-                        }
-                        gen_neon_mull(cpu_V0, tmp, tmp2, size, u);
-                        if (op != 11) {
-                            neon_load_reg64(cpu_V1, rd + pass);
-                        }
-                        switch (op) {
-                        case 6:
-                            gen_neon_negl(cpu_V0, size);
-                            /* Fall through */
-                        case 2:
-                            gen_neon_addl(size);
-                            break;
-                        case 3: case 7:
-                            gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
-                            if (op == 7) {
-                                gen_neon_negl(cpu_V0, size);
-                            }
-                            gen_neon_addl_saturate(cpu_V0, cpu_V1, size);
-                            break;
-                        case 10:
-                            /* no-op */
-                            break;
-                        case 11:
-                            gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
-                            break;
-                        default:
-                            abort();
-                        }
-                        neon_store_reg64(cpu_V0, rd + pass);
-                    }
-                    break;
-                default:
-                    g_assert_not_reached();
-                }
-            }
+            /*
+             * Three registers of different lengths, or two registers and
+             * a scalar: handled by decodetree
+             */
+            return 1;
         } else { /* size == 3 */
             if (!u) {
                 /* Extract.  */
-- 
2.20.1

Convert the Neon VEXT insn to decodetree. Rather than keeping the
old implementation which used fixed temporaries cpu_V0 and cpu_V1
and did the extraction with by-hand shift and logic ops, we use
the TCG extract2 insn.

We don't need to special case 0 or 8 immediates any more as the
optimizer is smart enough to throw away the dead code.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
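For readers unfamiliar with extract2, the following is a minimal
standalone sketch of the arithmetic the new code relies on. It is a
plain C model, not QEMU code: extract2_u64(), vext_d() and vext_q() are
hypothetical names chosen for illustration, and the model assumes the
offset is in the range 0..64, as it is for every call in this patch.

```c
#include <stdint.h>

/*
 * Model of the value produced by tcg_gen_extract2_i64(dest, lo, hi, ofs):
 * the 64 bits of the 128-bit quantity hi:lo that start at bit 'ofs'.
 * Assumes 0 <= ofs <= 64.
 */
static uint64_t extract2_u64(uint64_t lo, uint64_t hi, unsigned ofs)
{
    if (ofs == 0) {
        return lo;      /* a 64-bit shift by 64 would be undefined in C */
    }
    if (ofs == 64) {
        return hi;
    }
    return (lo >> ofs) | (hi << (64 - ofs));
}

/* D-register form: bytes imm..imm+7 of Vm:Vn, with imm in 0..7. */
static uint64_t vext_d(uint64_t vn, uint64_t vm, unsigned imm)
{
    return extract2_u64(vn, vm, imm * 8);
}

/*
 * Q-register form: bytes imm..imm+15 of Vm+1:Vm:Vn+1:Vn, imm in 0..15.
 * Element [0] of each array is the low 64 bits of the Q register.
 */
static void vext_q(uint64_t vd[2], const uint64_t vn[2],
                   const uint64_t vm[2], unsigned imm)
{
    uint64_t right, middle, left;

    if (imm < 8) {
        right = vn[0];
        middle = vn[1];
        left = vm[0];
    } else {
        right = vn[1];
        middle = vm[0];
        left = vm[1];
        imm -= 8;
    }
    vd[0] = extract2_u64(right, middle, imm * 8);
    vd[1] = extract2_u64(middle, left, imm * 8);
}
```

In this model an immediate of 0 or 8 simply degenerates into a plain
copy of one input, which is why the translator no longer needs a
separate code path for those cases.
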
 target/arm/neon-dp.decode       |  8 +++-
 target/arm/translate-neon.inc.c | 76 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 58 +------------------------
 3 files changed, 85 insertions(+), 57 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
 # return false for size==3.
 ######################################################################
 {
-  # 0b11 subgroup will go here
+  [
+    ##################################################################
+    # Miscellaneous size=0b11 insns
+    ##################################################################
+    VEXT         1111 001 0 1 . 11 .... .... imm:4 . q:1 . 0 .... \
+                 vm=%vm_dp vn=%vn_dp vd=%vd_dp
+  ]
 
   # Subgroup for size != 0b11
   [
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VQDMLSL_2sc(DisasContext *s, arg_2scalar *a)
 
     return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]);
 }
+
+static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
+{
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if ((a->vn | a->vm | a->vd) & a->q) {
+        return false;
+    }
+
+    if (a->imm > 7 && !a->q) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    if (!a->q) {
+        /* Extract 64 bits from <Vm:Vn> */
+        TCGv_i64 left, right, dest;
+
+        left = tcg_temp_new_i64();
+        right = tcg_temp_new_i64();
+        dest = tcg_temp_new_i64();
+
+        neon_load_reg64(right, a->vn);
+        neon_load_reg64(left, a->vm);
+        tcg_gen_extract2_i64(dest, right, left, a->imm * 8);
+        neon_store_reg64(dest, a->vd);
+
+        tcg_temp_free_i64(left);
+        tcg_temp_free_i64(right);
+        tcg_temp_free_i64(dest);
+    } else {
+        /* Extract 128 bits from <Vm+1:Vm:Vn+1:Vn> */
+        TCGv_i64 left, middle, right, destleft, destright;
+
+        left = tcg_temp_new_i64();
+        middle = tcg_temp_new_i64();
+        right = tcg_temp_new_i64();
+        destleft = tcg_temp_new_i64();
+        destright = tcg_temp_new_i64();
+
+        if (a->imm < 8) {
+            neon_load_reg64(right, a->vn);
+            neon_load_reg64(middle, a->vn + 1);
+            tcg_gen_extract2_i64(destright, right, middle, a->imm * 8);
+            neon_load_reg64(left, a->vm);
+            tcg_gen_extract2_i64(destleft, middle, left, a->imm * 8);
+        } else {
+            neon_load_reg64(right, a->vn + 1);
+            neon_load_reg64(middle, a->vm);
+            tcg_gen_extract2_i64(destright, right, middle, (a->imm - 8) * 8);
+            neon_load_reg64(left, a->vm + 1);
+            tcg_gen_extract2_i64(destleft, middle, left, (a->imm - 8) * 8);
+        }
+
+        neon_store_reg64(destright, a->vd);
+        neon_store_reg64(destleft, a->vd + 1);
+
+        tcg_temp_free_i64(destright);
+        tcg_temp_free_i64(destleft);
+        tcg_temp_free_i64(right);
+        tcg_temp_free_i64(middle);
+        tcg_temp_free_i64(left);
+    }
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
     int pass;
     int u;
     int vec_size;
-    uint32_t imm;
     TCGv_i32 tmp, tmp2, tmp3, tmp5;
     TCGv_ptr ptr1;
-    TCGv_i64 tmp64;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return 1;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             return 1;
         } else { /* size == 3 */
             if (!u) {
-                /* Extract.  */
-                imm = (insn >> 8) & 0xf;
-
-                if (imm > 7 && !q)
-                    return 1;
-
-                if (q && ((rd | rn | rm) & 1)) {
-                    return 1;
-                }
-
-                if (imm == 0) {
-                    neon_load_reg64(cpu_V0, rn);
-                    if (q) {
-                        neon_load_reg64(cpu_V1, rn + 1);
-                    }
-                } else if (imm == 8) {
-                    neon_load_reg64(cpu_V0, rn + 1);
-                    if (q) {
-                        neon_load_reg64(cpu_V1, rm);
-                    }
-                } else if (q) {
-                    tmp64 = tcg_temp_new_i64();
-                    if (imm < 8) {
-                        neon_load_reg64(cpu_V0, rn);
-                        neon_load_reg64(tmp64, rn + 1);
-                    } else {
-                        neon_load_reg64(cpu_V0, rn + 1);
-                        neon_load_reg64(tmp64, rm);
-                    }
-                    tcg_gen_shri_i64(cpu_V0, cpu_V0, (imm & 7) * 8);
-                    tcg_gen_shli_i64(cpu_V1, tmp64, 64 - ((imm & 7) * 8));
-                    tcg_gen_or_i64(cpu_V0, cpu_V0, cpu_V1);
-                    if (imm < 8) {
-                        neon_load_reg64(cpu_V1, rm);
-                    } else {
-                        neon_load_reg64(cpu_V1, rm + 1);
-                        imm -= 8;
-                    }
-                    tcg_gen_shli_i64(cpu_V1, cpu_V1, 64 - (imm * 8));
-                    tcg_gen_shri_i64(tmp64, tmp64, imm * 8);
-                    tcg_gen_or_i64(cpu_V1, cpu_V1, tmp64);
-                    tcg_temp_free_i64(tmp64);
-                } else {
-                    /* BUGFIX */
-                    neon_load_reg64(cpu_V0, rn);
-                    tcg_gen_shri_i64(cpu_V0, cpu_V0, imm * 8);
-                    neon_load_reg64(cpu_V1, rm);
-                    tcg_gen_shli_i64(cpu_V1, cpu_V1, 64 - (imm * 8));
-                    tcg_gen_or_i64(cpu_V0, cpu_V0, cpu_V1);
-                }
-                neon_store_reg64(cpu_V0, rd);
-                if (q) {
-                    neon_store_reg64(cpu_V1, rd + 1);
-                }
+                /* Extract: handled by decodetree */
+                return 1;
             } else if ((insn & (1 << 11)) == 0) {
                 /* Two register misc.  */
                 op = ((insn >> 12) & 0x30) | ((insn >> 7) & 0xf);
-- 
2.20.1

Convert the Neon VTBL, VTBX instructions to decodetree.  The actual
implementation of the insn is copied across to the new trans function
unchanged except for renaming 'tmp5' to 'tmp4'.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
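As a reading aid, here is a rough standalone model of the architectural
behaviour the moved code implements. It is only a sketch: vtbl_model()
and vtbl_regs_valid() are hypothetical names, the table is passed as a
flat byte array rather than as consecutive D registers, and it is not
the code of the neon_tbl helper itself.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * table:   n_regs * 8 bytes taken from Dn..Dn+n_regs-1 (n_regs = len + 1)
 * indices: the 8 bytes of Dm
 * dest:    the 8 bytes of Dd; for VTBX, bytes with out-of-range indices
 *          keep their old value, for VTBL they become zero
 */
static void vtbl_model(uint8_t dest[8], const uint8_t *table,
                       unsigned n_regs, const uint8_t indices[8],
                       bool is_vtbx)
{
    unsigned table_len = n_regs * 8;
    unsigned i;

    for (i = 0; i < 8; i++) {
        uint8_t idx = indices[i];

        if (idx < table_len) {
            dest[i] = table[idx];
        } else if (!is_vtbx) {
            dest[i] = 0;
        }
    }
}

/*
 * Mirror of the range check in the patch: the register list must not
 * run past D31, otherwise the insn is treated as UNDEF.
 */
static bool vtbl_regs_valid(unsigned vn, unsigned len)
{
    return vn + len + 1 <= 32;
}
```

The 'op' field in the decode pattern selects between the two variants;
in the trans function a set 'op' bit loads the old Vd contents, which
corresponds to is_vtbx being true in this model.
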
 target/arm/neon-dp.decode       |  3 ++
 target/arm/translate-neon.inc.c | 56 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 41 +++---------------------
 3 files changed, 63 insertions(+), 37 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
     ##################################################################
     VEXT         1111 001 0 1 . 11 .... .... imm:4 . q:1 . 0 .... \
                  vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+    VTBL         1111 001 1 1 . 11 .... .... 10 len:2 . op:1 . 0 .... \
+                 vm=%vm_dp vn=%vn_dp vd=%vd_dp
   ]
 
   # Subgroup for size != 0b11
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
     }
     return true;
 }
+
+static bool trans_VTBL(DisasContext *s, arg_VTBL *a)
+{
+    int n;
+    TCGv_i32 tmp, tmp2, tmp3, tmp4;
+    TCGv_ptr ptr1;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    n = a->len + 1;
+    if ((a->vn + n) > 32) {
+        /*
+         * This is UNPREDICTABLE; we choose to UNDEF to avoid the
+         * helper function running off the end of the register file.
+         */
+        return false;
+    }
+    n <<= 3;
+    if (a->op) {
+        tmp = neon_load_reg(a->vd, 0);
+    } else {
+        tmp = tcg_temp_new_i32();
+        tcg_gen_movi_i32(tmp, 0);
+    }
+    tmp2 = neon_load_reg(a->vm, 0);
+    ptr1 = vfp_reg_ptr(true, a->vn);
+    tmp4 = tcg_const_i32(n);
+    gen_helper_neon_tbl(tmp2, tmp2, tmp, ptr1, tmp4);
+    tcg_temp_free_i32(tmp);
+    if (a->op) {
+        tmp = neon_load_reg(a->vd, 1);
+    } else {
+        tmp = tcg_temp_new_i32();
+        tcg_gen_movi_i32(tmp, 0);
+    }
+    tmp3 = neon_load_reg(a->vm, 1);
+    gen_helper_neon_tbl(tmp3, tmp3, tmp, ptr1, tmp4);
+    tcg_temp_free_i32(tmp4);
+    tcg_temp_free_ptr(ptr1);
+    neon_store_reg(a->vd, 0, tmp2);
+    neon_store_reg(a->vd, 1, tmp3);
+    tcg_temp_free_i32(tmp);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
 {
     int op;
     int q;
-    int rd, rn, rm, rd_ofs, rm_ofs;
+    int rd, rm, rd_ofs, rm_ofs;
     int size;
     int pass;
     int u;
     int vec_size;
-    TCGv_i32 tmp, tmp2, tmp3, tmp5;
-    TCGv_ptr ptr1;
+    TCGv_i32 tmp, tmp2, tmp3;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return 1;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
     q = (insn & (1 << 6)) != 0;
     u = (insn >> 24) & 1;
     VFP_DREG_D(rd, insn);
-    VFP_DREG_N(rn, insn);
     VFP_DREG_M(rm, insn);
     size = (insn >> 20) & 3;
     vec_size = q ? 16 : 8;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     break;
                 }
             } else if ((insn & (1 << 10)) == 0) {
-                /* VTBL, VTBX.  */
-                int n = ((insn >> 8) & 3) + 1;
-                if ((rn + n) > 32) {
-                    /* This is UNPREDICTABLE; we choose to UNDEF to avoid the
-                     * helper function running off the end of the register file.
-                     */
-                    return 1;
-                }
-                n <<= 3;
-                if (insn & (1 << 6)) {
-                    tmp = neon_load_reg(rd, 0);
-                } else {
-                    tmp = tcg_temp_new_i32();
-                    tcg_gen_movi_i32(tmp, 0);
-                }
-                tmp2 = neon_load_reg(rm, 0);
-                ptr1 = vfp_reg_ptr(true, rn);
-                tmp5 = tcg_const_i32(n);
-                gen_helper_neon_tbl(tmp2, tmp2, tmp, ptr1, tmp5);
-                tcg_temp_free_i32(tmp);
-                if (insn & (1 << 6)) {
-                    tmp = neon_load_reg(rd, 1);
-                } else {
-                    tmp = tcg_temp_new_i32();
-                    tcg_gen_movi_i32(tmp, 0);
-                }
-                tmp3 = neon_load_reg(rm, 1);
-                gen_helper_neon_tbl(tmp3, tmp3, tmp, ptr1, tmp5);
-                tcg_temp_free_i32(tmp5);
-                tcg_temp_free_ptr(ptr1);
-                neon_store_reg(rd, 0, tmp2);
-                neon_store_reg(rd, 1, tmp3);
-                tcg_temp_free_i32(tmp);
+                /* VTBL, VTBX: handled by decodetree */
+                return 1;
             } else if ((insn & 0x380) == 0) {
                 /* VDUP */
                 int element;
-- 
2.20.1

Convert the Neon VDUP (scalar) insn to decodetree.  (Note that we
can't call this just "VDUP" as we used that already in vfp.decode for
the "VDUP (general purpose register)" insn.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  7 +++++++
 target/arm/translate-neon.inc.c | 26 ++++++++++++++++++++++++++
 target/arm/translate.c          | 25 +------------------------
 3 files changed, 34 insertions(+), 24 deletions(-)

From: Jean-Christophe Dubois <jcd@tribudubois.net>

Some bits of the CCM registers are not writable.

The initial commit left this unimplemented (all register bits were
writable).

This patch adds the code needed to protect the non-writable bits.

Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Message-id: 20200608133508.550046-1-jcd@tribudubois.net
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
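The masking scheme used below can be sketched in isolation as follows.
This is a standalone illustration rather than the device code: the
helper names are hypothetical, and a set bit in ro_mask marks a
read-only bit, matching the way ccm_mask[] and analog_mask[] are used
in this patch.

```c
#include <stdint.h>

/* Plain write: read-only bits keep their old value. */
static uint32_t masked_write(uint32_t old, uint32_t value, uint32_t ro_mask)
{
    return (old & ro_mask) | (value & ~ro_mask);
}

/* REG_SET alias: set only the writable bits passed in 'value'. */
static uint32_t masked_set(uint32_t old, uint32_t value, uint32_t ro_mask)
{
    return old | (value & ~ro_mask);
}

/* REG_CLR alias: clear only the writable bits passed in 'value'. */
static uint32_t masked_clr(uint32_t old, uint32_t value, uint32_t ro_mask)
{
    return old & ~(value & ~ro_mask);
}

/* REG_TOG alias: toggle only the writable bits passed in 'value'. */
static uint32_t masked_tog(uint32_t old, uint32_t value, uint32_t ro_mask)
{
    return old ^ (value & ~ro_mask);
}
```

With the CCM_CCR mask from the table (0xf01fef80), writing 0xffffffff
over 0x12345678 gives 0x1ff4567f: only the bits that are clear in the
mask take the new value. A mask of 0x00000000, as used for the CCGR
registers, leaves every bit writable.
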
 hw/misc/imx6ul_ccm.c | 76 ++++++++++++++++++++++++++++++++++++--------
 1 file changed, 63 insertions(+), 13 deletions(-)

diff --git a/hw/misc/imx6ul_ccm.c b/hw/misc/imx6ul_ccm.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/imx6ul_ccm.c
+++ b/hw/misc/imx6ul_ccm.c
@@ -XXX,XX +XXX,XX @@
 
 #include "trace.h"
 
+static const uint32_t ccm_mask[CCM_MAX] = {
+    [CCM_CCR] = 0xf01fef80,
+    [CCM_CCDR] = 0xfffeffff,
+    [CCM_CSR] = 0xffffffff,
+    [CCM_CCSR] = 0xfffffef2,
+    [CCM_CACRR] = 0xfffffff8,
+    [CCM_CBCDR] = 0xc1f8e000,
+    [CCM_CBCMR] = 0xfc03cfff,
+    [CCM_CSCMR1] = 0x80700000,
+    [CCM_CSCMR2] = 0xe01ff003,
+    [CCM_CSCDR1] = 0xfe00c780,
+    [CCM_CS1CDR] = 0xfe00fe00,
+    [CCM_CS2CDR] = 0xf8007000,
+    [CCM_CDCDR] = 0xf00fffff,
+    [CCM_CHSCCDR] = 0xfffc01ff,
+    [CCM_CSCDR2] = 0xfe0001ff,
+    [CCM_CSCDR3] = 0xffffc1ff,
+    [CCM_CDHIPR] = 0xffffffff,
+    [CCM_CTOR] = 0x00000000,
+    [CCM_CLPCR] = 0xf39ff01c,
+    [CCM_CISR] = 0xfb85ffbe,
+    [CCM_CIMR] = 0xfb85ffbf,
+    [CCM_CCOSR] = 0xfe00fe00,
+    [CCM_CGPR] = 0xfffc3fea,
+    [CCM_CCGR0] = 0x00000000,
+    [CCM_CCGR1] = 0x00000000,
+    [CCM_CCGR2] = 0x00000000,
+    [CCM_CCGR3] = 0x00000000,
+    [CCM_CCGR4] = 0x00000000,
+    [CCM_CCGR5] = 0x00000000,
+    [CCM_CCGR6] = 0x00000000,
+    [CCM_CMEOR] = 0xafffff1f,
+};
+
+static const uint32_t analog_mask[CCM_ANALOG_MAX] = {
+    [CCM_ANALOG_PLL_ARM] = 0xfff60f80,
+    [CCM_ANALOG_PLL_USB1] = 0xfffe0fbc,
+    [CCM_ANALOG_PLL_USB2] = 0xfffe0fbc,
+    [CCM_ANALOG_PLL_SYS] = 0xfffa0ffe,
+    [CCM_ANALOG_PLL_SYS_SS] = 0x00000000,
+    [CCM_ANALOG_PLL_SYS_NUM] = 0xc0000000,
+    [CCM_ANALOG_PLL_SYS_DENOM] = 0xc0000000,
+    [CCM_ANALOG_PLL_AUDIO] = 0xffe20f80,
+    [CCM_ANALOG_PLL_AUDIO_NUM] = 0xc0000000,
+    [CCM_ANALOG_PLL_AUDIO_DENOM] = 0xc0000000,
+    [CCM_ANALOG_PLL_VIDEO] = 0xffe20f80,
+    [CCM_ANALOG_PLL_VIDEO_NUM] = 0xc0000000,
+    [CCM_ANALOG_PLL_VIDEO_DENOM] = 0xc0000000,
+    [CCM_ANALOG_PLL_ENET] = 0xffc20ff0,
+    [CCM_ANALOG_PFD_480] = 0x40404040,
+    [CCM_ANALOG_PFD_528] = 0x40404040,
+    [PMU_MISC0] = 0x01fe8306,
+    [PMU_MISC1] = 0x07fcede0,
+    [PMU_MISC2] = 0x005f5f5f,
+};
+
 static const char *imx6ul_ccm_reg_name(uint32_t reg)
 {
     static char unknown[20];
@@ -XXX,XX +XXX,XX @@ static void imx6ul_ccm_write(void *opaque, hwaddr offset, uint64_t value,
 
     trace_ccm_write_reg(imx6ul_ccm_reg_name(index), (uint32_t)value);
 
-    /*
-     * We will do a better implementation later. In particular some bits
-     * cannot be written to.
-     */
-    s->ccm[index] = (uint32_t)value;
+    s->ccm[index] = (s->ccm[index] & ccm_mask[index]) |
+                           ((uint32_t)value & ~ccm_mask[index]);
 }
 
 static uint64_t imx6ul_analog_read(void *opaque, hwaddr offset, unsigned size)
@@ -XXX,XX +XXX,XX @@ static void imx6ul_analog_write(void *opaque, hwaddr offset, uint64_t value,
          * the REG_NAME register. So we change the value of the
          * REG_NAME register, setting bits passed in the value.
          */
-        s->analog[index - 1] |= value;
+        s->analog[index - 1] |= (value & ~analog_mask[index - 1]);
         break;
     case CCM_ANALOG_PLL_ARM_CLR:
     case CCM_ANALOG_PLL_USB1_CLR:
@@ -XXX,XX +XXX,XX @@ static void imx6ul_analog_write(void *opaque, hwaddr offset, uint64_t value,
          * the REG_NAME register. So we change the value of the
          * REG_NAME register, unsetting bits passed in the value.
          */
-        s->analog[index - 2] &= ~value;
+        s->analog[index - 2] &= ~(value & ~analog_mask[index - 2]);
         break;
     case CCM_ANALOG_PLL_ARM_TOG:
     case CCM_ANALOG_PLL_USB1_TOG:
@@ -XXX,XX +XXX,XX @@ static void imx6ul_analog_write(void *opaque, hwaddr offset, uint64_t value,
          * the REG_NAME register. So we change the value of the
          * REG_NAME register, toggling bits passed in the value.
          */
-        s->analog[index - 3] ^= value;
+        s->analog[index - 3] ^= (value & ~analog_mask[index - 3]);
         break;
     default:
-        /*
-         * We will do a better implementation later. In particular some bits
-         * cannot be written to.
-         */
-        s->analog[index] = value;
+        s->analog[index] = (s->analog[index] & analog_mask[index]) |
+                           (value & ~analog_mask[index]);
         break;
     }
 }
-- 
2.20.1

From: Erik Smit <erik.lucas.smit@gmail.com>

The hardware supports configurable descriptor sizes, configured in the DBLAC
register.

Most drivers use the default 4-word descriptor, which is currently hardcoded,
but the Aspeed SDK configures 8 words to store extra data.

Signed-off-by: Erik Smit <erik.lucas.smit@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
[PMM: removed unnecessary parens]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
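A small standalone sketch of the size decoding may help when reading
the hunks below. The field positions are the ones used by the new
FTGMAC100_DBLAC_*DES_SIZE macros; the function names are hypothetical,
and DESC_MIN_BYTES assumes that the default descriptor is four 32-bit
words (16 bytes).

```c
#include <stdbool.h>
#include <stdint.h>

#define DESC_MIN_BYTES 16u  /* assumed: four 32-bit descriptor words */

/* RX descriptor size in bytes: DBLAC bits [15:12], in units of 8 bytes. */
static uint32_t dblac_rxdes_size(uint32_t dblac)
{
    return ((dblac >> 12) & 0xf) * 8;
}

/* TX descriptor size in bytes: DBLAC bits [19:16], in units of 8 bytes. */
static uint32_t dblac_txdes_size(uint32_t dblac)
{
    return ((dblac >> 16) & 0xf) * 8;
}

/* A candidate register value is usable only if both sizes are large enough. */
static bool dblac_sizes_valid(uint32_t dblac)
{
    return dblac_txdes_size(dblac) >= DESC_MIN_BYTES &&
           dblac_rxdes_size(dblac) >= DESC_MIN_BYTES;
}

/* Step to the next descriptor, wrapping at the end-of-ring marker. */
static uint32_t next_desc(uint32_t addr, uint32_t ring_base,
                          bool end_of_ring, uint32_t desc_size)
{
    return end_of_ring ? ring_base : addr + desc_size;
}
```

A field value of 2 therefore selects the 4-word (16 byte) layout and a
field value of 4 selects the 8-word (32 byte) layout mentioned above.
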
 hw/net/ftgmac100.c | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/hw/net/ftgmac100.c b/hw/net/ftgmac100.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/ftgmac100.c
+++ b/hw/net/ftgmac100.c
@@ -XXX,XX +XXX,XX @@
 #define FTGMAC100_APTC_TXPOLL_CNT(x)        (((x) >> 8) & 0xf)
 #define FTGMAC100_APTC_TXPOLL_TIME_SEL      (1 << 12)
 
+/*
+ * DMA burst length and arbitration control register
+ */
+#define FTGMAC100_DBLAC_RXBURST_SIZE(x)     (((x) >> 8) & 0x3)
+#define FTGMAC100_DBLAC_TXBURST_SIZE(x)     (((x) >> 10) & 0x3)
+#define FTGMAC100_DBLAC_RXDES_SIZE(x)       ((((x) >> 12) & 0xf) * 8)
+#define FTGMAC100_DBLAC_TXDES_SIZE(x)       ((((x) >> 16) & 0xf) * 8)
+#define FTGMAC100_DBLAC_IFG_CNT(x)          (((x) >> 20) & 0x7)
+#define FTGMAC100_DBLAC_IFG_INC             (1 << 23)
+
 /*
  * PHY control register
  */
@@ -XXX,XX +XXX,XX @@ static void ftgmac100_do_tx(FTGMAC100State *s, uint32_t tx_ring,
         if (bd.des0 & s->txdes0_edotr) {
             addr = tx_ring;
         } else {
-            addr += sizeof(FTGMAC100Desc);
+            addr += FTGMAC100_DBLAC_TXDES_SIZE(s->dblac);
         }
     }
 
@@ -XXX,XX +XXX,XX @@ static void ftgmac100_write(void *opaque, hwaddr addr,
         s->phydata = value & 0xffff;
         break;
     case FTGMAC100_DBLAC: /* DMA Burst Length and Arbitration Control */
+        if (FTGMAC100_DBLAC_TXDES_SIZE(value) < sizeof(FTGMAC100Desc)) {
+            qemu_log_mask(LOG_GUEST_ERROR,
+                          "%s: transmit descriptor too small: %d bytes\n",
+                          __func__, (int)FTGMAC100_DBLAC_TXDES_SIZE(value));
+            break;
+        }
+        if (FTGMAC100_DBLAC_RXDES_SIZE(value) < sizeof(FTGMAC100Desc)) {
+            qemu_log_mask(LOG_GUEST_ERROR,
+                          "%s: receive descriptor too small: %d bytes\n",
+                          __func__, (int)FTGMAC100_DBLAC_RXDES_SIZE(value));
+            break;
+        }
         s->dblac = value;
         break;
     case FTGMAC100_REVR:  /* Feature Register */
@@ -XXX,XX +XXX,XX @@ static ssize_t ftgmac100_receive(NetClientState *nc, const uint8_t *buf,
         if (bd.des0 & s->rxdes0_edorr) {
             addr = s->rx_ring;
         } else {
-            addr += sizeof(FTGMAC100Desc);
+            addr += FTGMAC100_DBLAC_RXDES_SIZE(s->dblac);
         }
     }
     s->rx_descriptor = addr;
-- 
2.20.1

From: fangying <fangying1@huawei.com>

Virtual time adjustment was implemented for the virt-5.0 machine type,
but the cpu property was enabled only for the host-passthrough and max
cpu models.  Let's add it for any KVM arm cpu which has the generic
timer feature enabled.

Signed-off-by: Ying Fang <fangying1@huawei.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-id: 20200608121243.2076-1-fangying1@huawei.com
[PMM: minor commit message tweak, removed inaccurate
 suggested-by tag]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.c   |  6 ++++--
 target/arm/cpu64.c |  1 -
 target/arm/kvm.c   | 21 +++++++++++----------
 3 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ void arm_cpu_post_init(Object *obj)
     if (arm_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER)) {
         qdev_property_add_static(DEVICE(cpu), &arm_cpu_gt_cntfrq_property);
     }
+
+    if (kvm_enabled()) {
+        kvm_arm_add_vcpu_properties(obj);
+    }
 }
 
 static void arm_cpu_finalizefn(Object *obj)
@@ -XXX,XX +XXX,XX @@ static void arm_max_initfn(Object *obj)
 
     if (kvm_enabled()) {
         kvm_arm_set_cpu_features_from_host(cpu);
-        kvm_arm_add_vcpu_properties(obj);
     } else {
         cortex_a15_initfn(obj);
 
@@ -XXX,XX +XXX,XX @@ static void arm_host_initfn(Object *obj)
     if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
         aarch64_add_sve_properties(obj);
     }
-    kvm_arm_add_vcpu_properties(obj);
     arm_cpu_post_init(obj);
 }
 
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
 
     if (kvm_enabled()) {
         kvm_arm_set_cpu_features_from_host(cpu);
-        kvm_arm_add_vcpu_properties(obj);
     } else {
         uint64_t t;
         uint32_t u;
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -XXX,XX +XXX,XX @@ static void kvm_no_adjvtime_set(Object *obj, bool value, Error **errp)
 /* KVM VCPU properties should be prefixed with "kvm-". */
 void kvm_arm_add_vcpu_properties(Object *obj)
 {
-    if (!kvm_enabled()) {
-        return;
-    }
+    ARMCPU *cpu = ARM_CPU(obj);
+    CPUARMState *env = &cpu->env;
 
-    ARM_CPU(obj)->kvm_adjvtime = true;
-    object_property_add_bool(obj, "kvm-no-adjvtime", kvm_no_adjvtime_get,
-                             kvm_no_adjvtime_set);
-    object_property_set_description(obj, "kvm-no-adjvtime",
-                                    "Set on to disable the adjustment of "
-                                    "the virtual counter. VM stopped time "
-                                    "will be counted.");
+    if (arm_feature(env, ARM_FEATURE_GENERIC_TIMER)) {
+        cpu->kvm_adjvtime = true;
+        object_property_add_bool(obj, "kvm-no-adjvtime", kvm_no_adjvtime_get,
+                                 kvm_no_adjvtime_set);
+        object_property_set_description(obj, "kvm-no-adjvtime",
+                                        "Set on to disable the adjustment of "
+                                        "the virtual counter. VM stopped time "
+                                        "will be counted.");
+    }
 }
 
 bool kvm_arm_pmu_supported(CPUState *cpu)
-- 
2.20.1

From: Jean-Christophe Dubois <jcd@tribudubois.net>

Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
[PMD: Fixed 32-bit format string using PRIx32/PRIx64]
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/net/imx_fec.c    | 106 +++++++++++++++++++-------------------------
 hw/net/trace-events |  18 ++++++++
 2 files changed, 63 insertions(+), 61 deletions(-)

diff --git a/hw/net/imx_fec.c b/hw/net/imx_fec.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/imx_fec.c
+++ b/hw/net/imx_fec.c
@@ -XXX,XX +XXX,XX @@
 #include "qemu/module.h"
 #include "net/checksum.h"
 #include "net/eth.h"
+#include "trace.h"
 
 /* For crc32 */
 #include <zlib.h>
 
-#ifndef DEBUG_IMX_FEC
-#define DEBUG_IMX_FEC 0
-#endif
-
-#define FEC_PRINTF(fmt, args...) \
-    do { \
-        if (DEBUG_IMX_FEC) { \
-            fprintf(stderr, "[%s]%s: " fmt , TYPE_IMX_FEC, \
-                                             __func__, ##args); \
-        } \
-    } while (0)
-
-#ifndef DEBUG_IMX_PHY
-#define DEBUG_IMX_PHY 0
-#endif
-
-#define PHY_PRINTF(fmt, args...) \
-    do { \
-        if (DEBUG_IMX_PHY) { \
-            fprintf(stderr, "[%s.phy]%s: " fmt , TYPE_IMX_FEC, \
-                                                 __func__, ##args); \
-        } \
-    } while (0)
-
 #define IMX_MAX_DESC    1024
 
 static const char *imx_default_reg_name(IMXFECState *s, uint32_t index)
@@ -XXX,XX +XXX,XX @@ static void imx_eth_update(IMXFECState *s);
  * For now we don't handle any GPIO/interrupt line, so the OS will
  * have to poll for the PHY status.
  */
-static void phy_update_irq(IMXFECState *s)
+static void imx_phy_update_irq(IMXFECState *s)
 {
     imx_eth_update(s);
 }
 
-static void phy_update_link(IMXFECState *s)
+static void imx_phy_update_link(IMXFECState *s)
 {
     /* Autonegotiation status mirrors link status.  */
     if (qemu_get_queue(s->nic)->link_down) {
-        PHY_PRINTF("link is down\n");
+        trace_imx_phy_update_link("down");
         s->phy_status &= ~0x0024;
         s->phy_int |= PHY_INT_DOWN;
     } else {
-        PHY_PRINTF("link is up\n");
+        trace_imx_phy_update_link("up");
         s->phy_status |= 0x0024;
         s->phy_int |= PHY_INT_ENERGYON;
         s->phy_int |= PHY_INT_AUTONEG_COMPLETE;
     }
-    phy_update_irq(s);
+    imx_phy_update_irq(s);
 }
 
 static void imx_eth_set_link(NetClientState *nc)
 {
-    phy_update_link(IMX_FEC(qemu_get_nic_opaque(nc)));
+    imx_phy_update_link(IMX_FEC(qemu_get_nic_opaque(nc)));
 }
 
-static void phy_reset(IMXFECState *s)
+static void imx_phy_reset(IMXFECState *s)
 {
+    trace_imx_phy_reset();
+
     s->phy_status = 0x7809;
     s->phy_control = 0x3000;
     s->phy_advertise = 0x01e1;
     s->phy_int_mask = 0;
     s->phy_int = 0;
-    phy_update_link(s);
+    imx_phy_update_link(s);
 }
 
-static uint32_t do_phy_read(IMXFECState *s, int reg)
+static uint32_t imx_phy_read(IMXFECState *s, int reg)
 {
     uint32_t val;
 
@@ -XXX,XX +XXX,XX @@ static uint32_t do_phy_read(IMXFECState *s, int reg)
     case 29:    /* Interrupt source.  */
         val = s->phy_int;
         s->phy_int = 0;
-        phy_update_irq(s);
+        imx_phy_update_irq(s);
         break;
     case 30:    /* Interrupt mask */
         val = s->phy_int_mask;
@@ -XXX,XX +XXX,XX @@ static uint32_t do_phy_read(IMXFECState *s, int reg)
         break;
     }
 
-    PHY_PRINTF("read 0x%04x @ %d\n", val, reg);
+    trace_imx_phy_read(val, reg);
 
     return val;
 }
 
-static void do_phy_write(IMXFECState *s, int reg, uint32_t val)
+static void imx_phy_write(IMXFECState *s, int reg, uint32_t val)
 {
-    PHY_PRINTF("write 0x%04x @ %d\n", val, reg);
+    trace_imx_phy_write(val, reg);
 
     if (reg > 31) {
         /* we only advertise one phy */
@@ -XXX,XX +XXX,XX @@ static void do_phy_write(IMXFECState *s, int reg, uint32_t val)
     switch (reg) {
     case 0:     /* Basic Control */
         if (val & 0x8000) {
-            phy_reset(s);
+            imx_phy_reset(s);
         } else {
             s->phy_control = val & 0x7980;
             /* Complete autonegotiation immediately.  */
@@ -XXX,XX +XXX,XX @@ static void do_phy_write(IMXFECState *s, int reg, uint32_t val)
         break;
     case 30:    /* Interrupt mask */
         s->phy_int_mask = val & 0xff;
-        phy_update_irq(s);
+        imx_phy_update_irq(s);
         break;
     case 17:
     case 18:
@@ -XXX,XX +XXX,XX @@ static void do_phy_write(IMXFECState *s, int reg, uint32_t val)
 static void imx_fec_read_bd(IMXFECBufDesc *bd, dma_addr_t addr)
 {
     dma_memory_read(&address_space_memory, addr, bd, sizeof(*bd));
+
+    trace_imx_fec_read_bd(addr, bd->flags, bd->length, bd->data);
 }
 
 static void imx_fec_write_bd(IMXFECBufDesc *bd, dma_addr_t addr)
@@ -XXX,XX +XXX,XX @@ static void imx_fec_write_bd(IMXFECBufDesc *bd, dma_addr_t addr)
 static void imx_enet_read_bd(IMXENETBufDesc *bd, dma_addr_t addr)
 {
     dma_memory_read(&address_space_memory, addr, bd, sizeof(*bd));
+
+    trace_imx_enet_read_bd(addr, bd->flags, bd->length, bd->data,
+                   bd->option, bd->status);
 }
 
 static void imx_enet_write_bd(IMXENETBufDesc *bd, dma_addr_t addr)
@@ -XXX,XX +XXX,XX @@ static void imx_fec_do_tx(IMXFECState *s)
         int len;
 
         imx_fec_read_bd(&bd, addr);
-        FEC_PRINTF("tx_bd %x flags %04x len %d data %08x\n",
-                   addr, bd.flags, bd.length, bd.data);
         if ((bd.flags & ENET_BD_R) == 0) {
+
             /* Run out of descriptors to transmit.  */
-            FEC_PRINTF("tx_bd ran out of descriptors to transmit\n");
+            trace_imx_eth_tx_bd_busy();
+
             break;
         }
         len = bd.length;
@@ -XXX,XX +XXX,XX @@ static void imx_enet_do_tx(IMXFECState *s, uint32_t index)
         int len;
 
         imx_enet_read_bd(&bd, addr);
-        FEC_PRINTF("tx_bd %x flags %04x len %d data %08x option %04x "
-                   "status %04x\n", addr, bd.flags, bd.length, bd.data,
-                   bd.option, bd.status);
         if ((bd.flags & ENET_BD_R) == 0) {
             /* Run out of descriptors to transmit.  */
+
+            trace_imx_eth_tx_bd_busy();
+
             break;
         }
         len = bd.length;
@@ -XXX,XX +XXX,XX @@ static void imx_eth_enable_rx(IMXFECState *s, bool flush)
     s->regs[ENET_RDAR] = (bd.flags & ENET_BD_E) ? ENET_RDAR_RDAR : 0;
 
     if (!s->regs[ENET_RDAR]) {
-        FEC_PRINTF("RX buffer full\n");
+        trace_imx_eth_rx_bd_full();
     } else if (flush) {
         qemu_flush_queued_packets(qemu_get_queue(s->nic));
     }
@@ -XXX,XX +XXX,XX @@ static void imx_eth_reset(DeviceState *d)
     memset(s->tx_descriptor, 0, sizeof(s->tx_descriptor));
 
     /* We also reset the PHY */
-    phy_reset(s);
+    imx_phy_reset(s);
 }
 
 static uint32_t imx_default_read(IMXFECState *s, uint32_t index)
@@ -XXX,XX +XXX,XX @@ static uint64_t imx_eth_read(void *opaque, hwaddr offset, unsigned size)
         break;
     }
 
-    FEC_PRINTF("reg[%s] => 0x%" PRIx32 "\n", imx_eth_reg_name(s, index),
-                                              value);
+    trace_imx_eth_read(index, imx_eth_reg_name(s, index), value);
 
     return value;
 }
@@ -XXX,XX +XXX,XX @@ static void imx_eth_write(void *opaque, hwaddr offset, uint64_t value,
     const bool single_tx_ring = !imx_eth_is_multi_tx_ring(s);
     uint32_t index = offset >> 2;
 
-    FEC_PRINTF("reg[%s] <= 0x%" PRIx32 "\n", imx_eth_reg_name(s, index),
-                (uint32_t)value);
+    trace_imx_eth_write(index, imx_eth_reg_name(s, index), value);
 
     switch (index) {
     case ENET_EIR:
@@ -XXX,XX +XXX,XX @@ static void imx_eth_write(void *opaque, hwaddr offset, uint64_t value,
         if (extract32(value, 29, 1)) {
             /* This is a read operation */
             s->regs[ENET_MMFR] = deposit32(s->regs[ENET_MMFR], 0, 16,
-                                           do_phy_read(s,
+                                           imx_phy_read(s,
                                                        extract32(value,
                                                                  18, 10)));
         } else {
             /* This a write operation */
-            do_phy_write(s, extract32(value, 18, 10), extract32(value, 0, 16));
+            imx_phy_write(s, extract32(value, 18, 10), extract32(value, 0, 16));
         }
         /* raise the interrupt as the PHY operation is done */
         s->regs[ENET_EIR] |= ENET_INT_MII;
@@ -XXX,XX +XXX,XX @@ static bool imx_eth_can_receive(NetClientState *nc)
 {
     IMXFECState *s = IMX_FEC(qemu_get_nic_opaque(nc));
 
-    FEC_PRINTF("\n");
-
     return !!s->regs[ENET_RDAR];
 }
 
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_fec_receive(NetClientState *nc, const uint8_t *buf,
     unsigned int buf_len;
     size_t size = len;
 
-    FEC_PRINTF("len %d\n", (int)size);
+    trace_imx_fec_receive(size);
 
     if (!s->regs[ENET_RDAR]) {
         qemu_log_mask(LOG_GUEST_ERROR, "[%s]%s: Unexpected packet\n",
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_fec_receive(NetClientState *nc, const uint8_t *buf,
         bd.length = buf_len;
         size -= buf_len;
 
-        FEC_PRINTF("rx_bd 0x%x length %d\n", addr, bd.length);
+        trace_imx_fec_receive_len(addr, bd.length);
 
         /* The last 4 bytes are the CRC.  */
         if (size < 4) {
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_fec_receive(NetClientState *nc, const uint8_t *buf,
         if (size == 0) {
             /* Last buffer in frame.  */
             bd.flags |= flags | ENET_BD_L;
-            FEC_PRINTF("rx frame flags %04x\n", bd.flags);
+
+            trace_imx_fec_receive_last(bd.flags);
+
             s->regs[ENET_EIR] |= ENET_INT_RXF;
         } else {
             s->regs[ENET_EIR] |= ENET_INT_RXB;
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_enet_receive(NetClientState *nc, const uint8_t *buf,
     size_t size = len;
     bool shift16 = s->regs[ENET_RACC] & ENET_RACC_SHIFT16;
 
-    FEC_PRINTF("len %d\n", (int)size);
+    trace_imx_enet_receive(size);
 
     if (!s->regs[ENET_RDAR]) {
         qemu_log_mask(LOG_GUEST_ERROR, "[%s]%s: Unexpected packet\n",
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_enet_receive(NetClientState *nc, const uint8_t *buf,
         bd.length = buf_len;
         size -= buf_len;
 
-        FEC_PRINTF("rx_bd 0x%x length %d\n", addr, bd.length);
+        trace_imx_enet_receive_len(addr, bd.length);
 
         /* The last 4 bytes are the CRC.  */
         if (size < 4) {
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_enet_receive(NetClientState *nc, const uint8_t *buf,
         if (size == 0) {
             /* Last buffer in frame.  */
             bd.flags |= flags | ENET_BD_L;
-            FEC_PRINTF("rx frame flags %04x\n", bd.flags);
+
+            trace_imx_enet_receive_last(bd.flags);
+
             /* Indicate that we've updated the last buffer descriptor. */
             bd.last_buffer = ENET_BD_BDU;
             if (bd.option & ENET_BD_RX_INT) {
diff --git a/hw/net/trace-events b/hw/net/trace-events
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/trace-events
+++ b/hw/net/trace-events
@@ -XXX,XX +XXX,XX @@ i82596_receive_packet(size_t sz) "len=%zu"
 i82596_new_mac(const char *id_with_mac) "New MAC for: %s"
 i82596_set_multicast(uint16_t count) "Added %d multicast entries"
 i82596_channel_attention(void *s) "%p: Received CHANNEL ATTENTION"
+
+# imx_fec.c
+imx_phy_read(uint32_t val, int reg) "0x%04"PRIx32" <= reg[%d]"
+imx_phy_write(uint32_t val, int reg) "0x%04"PRIx32" => reg[%d]"
+imx_phy_update_link(const char *s) "%s"
+imx_phy_reset(void) ""
+imx_fec_read_bd(uint64_t addr, int flags, int len, int data) "tx_bd 0x%"PRIx64" flags 0x%04x len %d data 0x%08x"
+imx_enet_read_bd(uint64_t addr, int flags, int len, int data, int options, int status) "tx_bd 0x%"PRIx64" flags 0x%04x len %d data 0x%08x option 0x%04x status 0x%04x"
+imx_eth_tx_bd_busy(void) "tx_bd ran out of descriptors to transmit"
+imx_eth_rx_bd_full(void) "RX buffer is full"
+imx_eth_read(int reg, const char *reg_name, uint32_t value) "reg[%d:%s] => 0x%08"PRIx32
+imx_eth_write(int reg, const char *reg_name, uint64_t value) "reg[%d:%s] <= 0x%08"PRIx64
+imx_fec_receive(size_t size) "len %zu"
+imx_fec_receive_len(uint64_t addr, int len) "rx_bd 0x%"PRIx64" length %d"
+imx_fec_receive_last(int last) "rx frame flags 0x%04x"
+imx_enet_receive(size_t size) "len %zu"
+imx_enet_receive_len(uint64_t addr, int len) "rx_bd 0x%"PRIx64" length %d"
+imx_enet_receive_last(int last) "rx frame flags 0x%04x"
-- 
2.20.1
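
Note on the trace-events entries above: each line pairs a typed argument
list with a printf-style format string, and the C code then calls
trace_<name>() with matching arguments. The following is only a minimal
sketch of that pairing for the imx_phy_write event. The helper
sketch_trace_imx_phy_write() is hypothetical: it is a hand-written
stand-in that formats into a caller-supplied buffer, not the code QEMU's
trace tooling actually generates, and it ignores trace backends entirely.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for the helper generated from the trace-events line
 *   imx_phy_write(uint32_t val, int reg) "0x%04"PRIx32" => reg[%d]"
 * The argument list and format string are copied from the patch; the
 * buffer-based interface is an assumption made for illustration only.
 */
static int sketch_trace_imx_phy_write(char *buf, size_t len,
                                      uint32_t val, int reg)
{
    /* Same format string as the trace-events entry. */
    return snprintf(buf, len, "0x%04" PRIx32 " => reg[%d]", val, reg);
}
```

Compared with the old PHY_PRINTF("write 0x%04x @ %d\n", val, reg) call,
the argument types are declared once next to the format string, so a
mismatch between the two is visible in a single place.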

From: Guenter Roeck <linux@roeck-us.net>

The Linux kernel's IMX SDHCI driver now uses vendor-specific commands.
Because these are not emulated, booting the kernel produces endless
warnings such as:

sdhci-esdhc-imx 2194000.usdhc: esdhc_wait_for_card_clock_gate_off:
	card clock still not gate off in 100us!.

Implement basic support for the vendor-specific register found in IMX
hardware so that this warning is no longer triggered.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Message-id: 20200603145258.195920-2-linux@roeck-us.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/sd/sdhci-internal.h |  5 +++++
 include/hw/sd/sdhci.h  |  5 +++++
 hw/sd/sdhci.c          | 18 +++++++++++++++++-
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/hw/sd/sdhci-internal.h b/hw/sd/sdhci-internal.h
index XXXXXXX..XXXXXXX 100644
--- a/hw/sd/sdhci-internal.h
+++ b/hw/sd/sdhci-internal.h
@@ -XXX,XX +XXX,XX @@
 #define SDHC_CMD_INHIBIT               0x00000001
 #define SDHC_DATA_INHIBIT              0x00000002
 #define SDHC_DAT_LINE_ACTIVE           0x00000004
+#define SDHC_IMX_CLOCK_GATE_OFF        0x00000080
 #define SDHC_DOING_WRITE               0x00000100
 #define SDHC_DOING_READ                0x00000200
 #define SDHC_SPACE_AVAILABLE           0x00000400
@@ -XXX,XX +XXX,XX @@ extern const VMStateDescription sdhci_vmstate;
 
 
 #define ESDHC_MIX_CTRL                  0x48
+
 #define ESDHC_VENDOR_SPEC               0xc0
+#define ESDHC_IMX_FRC_SDCLK_ON          (1 << 8)
+
 #define ESDHC_DLL_CTRL                  0x60
 
 #define ESDHC_TUNING_CTRL               0xcc
@@ -XXX,XX +XXX,XX @@ extern const VMStateDescription sdhci_vmstate;
 #define DEFINE_SDHCI_COMMON_PROPERTIES(_state) \
     DEFINE_PROP_UINT8("sd-spec-version", _state, sd_spec_version, 2), \
     DEFINE_PROP_UINT8("uhs", _state, uhs_mode, UHS_NOT_SUPPORTED), \
+    DEFINE_PROP_UINT8("vendor", _state, vendor, SDHCI_VENDOR_NONE), \
     \
     /* Capabilities registers provide information on supported
      * features of this specific host controller implementation */ \
diff --git a/include/hw/sd/sdhci.h b/include/hw/sd/sdhci.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/sd/sdhci.h
+++ b/include/hw/sd/sdhci.h
@@ -XXX,XX +XXX,XX @@ typedef struct SDHCIState {
     uint16_t acmd12errsts; /* Auto CMD12 error status register */
     uint16_t hostctl2;     /* Host Control 2 */
     uint64_t admasysaddr;  /* ADMA System Address Register */
+    uint16_t vendor_spec;  /* Vendor specific register */
 
     /* Read-only registers */
     uint64_t capareg;      /* Capabilities Register */
@@ -XXX,XX +XXX,XX @@ typedef struct SDHCIState {
     uint32_t quirks;
     uint8_t sd_spec_version;
     uint8_t uhs_mode;
+    uint8_t vendor;        /* For vendor specific functionality */
 } SDHCIState;
 
+#define SDHCI_VENDOR_NONE       0
+#define SDHCI_VENDOR_IMX        1
+
 /*
  * Controller does not provide transfer-complete interrupt when not
  * busy.
diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/sd/sdhci.c
+++ b/hw/sd/sdhci.c
@@ -XXX,XX +XXX,XX @@ static uint64_t usdhc_read(void *opaque, hwaddr offset, unsigned size)
         }
         break;
 
+    case ESDHC_VENDOR_SPEC:
+        ret = s->vendor_spec;
+        break;
     case ESDHC_DLL_CTRL:
     case ESDHC_TUNE_CTRL_STATUS:
     case ESDHC_UNDOCUMENTED_REG27:
     case ESDHC_TUNING_CTRL:
-    case ESDHC_VENDOR_SPEC:
     case ESDHC_MIX_CTRL:
     case ESDHC_WTMK_LVL:
         ret = 0;
@@ -XXX,XX +XXX,XX @@ usdhc_write(void *opaque, hwaddr offset, uint64_t val, unsigned size)
     case ESDHC_UNDOCUMENTED_REG27:
     case ESDHC_TUNING_CTRL:
     case ESDHC_WTMK_LVL:
+        break;
+
     case ESDHC_VENDOR_SPEC:
+        s->vendor_spec = value;
+        switch (s->vendor) {
+        case SDHCI_VENDOR_IMX:
+            if (value & ESDHC_IMX_FRC_SDCLK_ON) {
+                s->prnsts &= ~SDHC_IMX_CLOCK_GATE_OFF;
+            } else {
+                s->prnsts |= SDHC_IMX_CLOCK_GATE_OFF;
+            }
+            break;
+        default:
+            break;
+        }
         break;
 
     case SDHC_HOSTCTL:
-- 
2.20.1
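
Note on the ESDHC_VENDOR_SPEC handling in the patch above: a write stores
the value and, for IMX controllers only, mirrors the FRC_SDCLK_ON bit
(inverted) into the clock-gate-off bit of the present-state register.
A minimal standalone sketch of that logic follows. The constants are
copied from the patch; SketchSDHCI and sketch_vendor_spec_write() are
hypothetical names introduced here, and the struct is a cut-down
stand-in for SDHCIState holding only the fields this logic touches.

```c
#include <assert.h>
#include <stdint.h>

/* Constants copied from the patch. */
#define SDHC_IMX_CLOCK_GATE_OFF   0x00000080
#define ESDHC_IMX_FRC_SDCLK_ON    (1 << 8)
#define SDHCI_VENDOR_NONE         0
#define SDHCI_VENDOR_IMX          1

/* Hypothetical cut-down stand-in for SDHCIState. */
typedef struct {
    uint32_t prnsts;       /* present state register */
    uint16_t vendor_spec;  /* vendor specific register */
    uint8_t vendor;        /* SDHCI_VENDOR_* */
} SketchSDHCI;

/* Mirrors the ESDHC_VENDOR_SPEC case added to usdhc_write(). */
static void sketch_vendor_spec_write(SketchSDHCI *s, uint32_t value)
{
    s->vendor_spec = (uint16_t)value;

    if (s->vendor == SDHCI_VENDOR_IMX) {
        if (value & ESDHC_IMX_FRC_SDCLK_ON) {
            /* Clock forced on: report the card clock as not gated off. */
            s->prnsts &= ~(uint32_t)SDHC_IMX_CLOCK_GATE_OFF;
        } else {
            /* Clock not forced on: report the gate-off status bit. */
            s->prnsts |= SDHC_IMX_CLOCK_GATE_OFF;
        }
    }
    /* Other vendors: the value is stored but has no side effect. */
}
```

With vendor left at SDHCI_VENDOR_NONE the status bit never changes, which
is why the board code has to set the "vendor" property before the guest
can observe the gate-off state.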

From: Guenter Roeck <linux@roeck-us.net>

Set the vendor property to IMX to enable IMX-specific functionality
in the sdhci code.

Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20200603145258.195920-3-linux@roeck-us.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/fsl-imx25.c  | 6 ++++++
 hw/arm/fsl-imx6.c   | 6 ++++++
 hw/arm/fsl-imx6ul.c | 2 ++
 hw/arm/fsl-imx7.c   | 2 ++
 4 files changed, 16 insertions(+)

diff --git a/hw/arm/fsl-imx25.c b/hw/arm/fsl-imx25.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/fsl-imx25.c
+++ b/hw/arm/fsl-imx25.c
@@ -XXX,XX +XXX,XX @@ static void fsl_imx25_realize(DeviceState *dev, Error **errp)
                                  &err);
         object_property_set_uint(OBJECT(&s->esdhc[i]), IMX25_ESDHC_CAPABILITIES,
                                  "capareg", &err);
+        object_property_set_uint(OBJECT(&s->esdhc[i]), SDHCI_VENDOR_IMX,
+                                 "vendor", &err);
+        if (err) {
+            error_propagate(errp, err);
+            return;
+        }
         object_property_set_bool(OBJECT(&s->esdhc[i]), true, "realized", &err);
         if (err) {
             error_propagate(errp, err);
diff --git a/hw/arm/fsl-imx6.c b/hw/arm/fsl-imx6.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/fsl-imx6.c
+++ b/hw/arm/fsl-imx6.c
@@ -XXX,XX +XXX,XX @@ static void fsl_imx6_realize(DeviceState *dev, Error **errp)
                                  &err);
         object_property_set_uint(OBJECT(&s->esdhc[i]), IMX6_ESDHC_CAPABILITIES,
                                  "capareg", &err);
+        object_property_set_uint(OBJECT(&s->esdhc[i]), SDHCI_VENDOR_IMX,
+                                 "vendor", &err);
+        if (err) {
+            error_propagate(errp, err);
+            return;
+        }
         object_property_set_bool(OBJECT(&s->esdhc[i]), true, "realized", &err);
         if (err) {
             error_propagate(errp, err);
diff --git a/hw/arm/fsl-imx6ul.c b/hw/arm/fsl-imx6ul.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/fsl-imx6ul.c
+++ b/hw/arm/fsl-imx6ul.c
@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_realize(DeviceState *dev, Error **errp)
             FSL_IMX6UL_USDHC2_IRQ,
         };
 
+        object_property_set_uint(OBJECT(&s->usdhc[i]), SDHCI_VENDOR_IMX,
+                                        "vendor", &error_abort);
         object_property_set_bool(OBJECT(&s->usdhc[i]), true, "realized",
                                  &error_abort);
 
diff --git a/hw/arm/fsl-imx7.c b/hw/arm/fsl-imx7.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/fsl-imx7.c
+++ b/hw/arm/fsl-imx7.c
@@ -XXX,XX +XXX,XX @@ static void fsl_imx7_realize(DeviceState *dev, Error **errp)
             FSL_IMX7_USDHC3_IRQ,
         };
 
+        object_property_set_uint(OBJECT(&s->usdhc[i]), SDHCI_VENDOR_IMX,
+                                 "vendor", &error_abort);
         object_property_set_bool(OBJECT(&s->usdhc[i]), true, "realized",
                                  &error_abort);
 
-- 
2.20.1

The following changes since commit 5767815218efd3cbfd409505ed824d5f356044ae:

  Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging (2024-02-14 15:45:52 +0000)

are available in the Git repository at:

  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20240215

for you to fetch changes up to f780e63fe731b058fe52d43653600d8729a1b5f2:

  docs: Add documentation for the mps3-an536 board (2024-02-15 14:32:39 +0000)

----------------------------------------------------------------
target-arm queue:
 * hw/arm/xilinx_zynq: Wire FIQ between CPU <> GIC
 * linux-user/aarch64: Choose SYNC as the preferred MTE mode
 * Fix some errors in SVE/SME handling of MTE tags
 * hw/pci-host/raven.c: Mark raven_io_ops as implementing unaligned accesses
 * hw/block/tc58128: Don't emit deprecation warning under qtest
 * tests/qtest: Fix handling of npcm7xx and GMAC tests
 * hw/arm/virt: Wire up non-secure EL2 virtual timer IRQ
 * tests/qtest/npcm7xx_emc-test: Connect all NICs to a backend
 * Don't assert on vmload/vmsave of M-profile CPUs
 * hw/arm/smmuv3: add support for stage 1 access fault
 * hw/arm/stellaris: QOM cleanups
 * Use new CBAR encoding for all v8 CPUs, not all aarch64 CPUs
 * Improve Cortex_R52 IMPDEF sysreg modelling
 * Allow access to SPSR_hyp from hyp mode
 * New board model mps3-an536 (Cortex-R52)

----------------------------------------------------------------
Luc Michel (1):
      hw/arm/smmuv3: add support for stage 1 access fault

Nabih Estefan (1):
      tests/qtest: Fix GMAC test to run on a machine in upstream QEMU

Peter Maydell (22):
      hw/pci-host/raven.c: Mark raven_io_ops as implementing unaligned accesses
      hw/block/tc58128: Don't emit deprecation warning under qtest
      tests/qtest/meson.build: Don't include qtests_npcm7xx in qtests_aarch64
      tests/qtest/bios-tables-test: Allow changes to virt GTDT
      hw/arm/virt: Wire up non-secure EL2 virtual timer IRQ
      tests/qtest/bios-tables-tests: Update virt golden reference
      hw/arm/npcm7xx: Call qemu_configure_nic_device() for GMAC modules
      tests/qtest/npcm7xx_emc-test: Connect all NICs to a backend
      target/arm: Don't get MDCR_EL2 in pmu_counter_enabled() before checking ARM_FEATURE_PMU
      target/arm: Use new CBAR encoding for all v8 CPUs, not all aarch64 CPUs
      target/arm: The Cortex-R52 has a read-only CBAR
      target/arm: Add Cortex-R52 IMPDEF sysregs
      target/arm: Allow access to SPSR_hyp from hyp mode
      hw/misc/mps2-scc: Fix condition for CFG3 register
      hw/misc/mps2-scc: Factor out which-board conditionals
      hw/misc/mps2-scc: Make changes needed for AN536 FPGA image
      hw/arm/mps3r: Initial skeleton for mps3-an536 board
      hw/arm/mps3r: Add CPUs, GIC, and per-CPU RAM
      hw/arm/mps3r: Add UARTs
      hw/arm/mps3r: Add GPIO, watchdog, dual-timer, I2C devices
      hw/arm/mps3r: Add remaining devices
      docs: Add documentation for the mps3-an536 board

Philippe Mathieu-Daudé (5):
      hw/arm/xilinx_zynq: Wire FIQ between CPU <> GIC
      hw/arm/stellaris: Convert ADC controller to Resettable interface
      hw/arm/stellaris: Convert I2C controller to Resettable interface
      hw/arm/stellaris: Add missing QOM 'machine' parent
      hw/arm/stellaris: Add missing QOM 'SoC' parent

Richard Henderson (6):
      linux-user/aarch64: Choose SYNC as the preferred MTE mode
      target/arm: Fix nregs computation in do_{ld,st}_zpa
      target/arm: Adjust and validate mtedesc sizem1
      target/arm: Split out make_svemte_desc
      target/arm: Handle mte in do_ldrq, do_ldro
      target/arm: Fix SVE/SME gross MTE suppression checks

From: Philippe Mathieu-Daudé <philmd@linaro.org>

Similarly to commits dadbb58f59..5ae79fe825 for other ARM boards,
connect the FIQ output of the GIC CPU interfaces to the CPU.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20240130152548.17855-1-philmd@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/xilinx_zynq.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/arm/xilinx_zynq.c b/hw/arm/xilinx_zynq.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/xilinx_zynq.c
+++ b/hw/arm/xilinx_zynq.c
@@ -XXX,XX +XXX,XX @@ static void zynq_init(MachineState *machine)
     sysbus_mmio_map(busdev, 0, MPCORE_PERIPHBASE);
     sysbus_connect_irq(busdev, 0,
                        qdev_get_gpio_in(DEVICE(cpu), ARM_CPU_IRQ));
+    sysbus_connect_irq(busdev, 1,
+                       qdev_get_gpio_in(DEVICE(cpu), ARM_CPU_FIQ));
 
     for (n = 0; n < 64; n++) {
         pic[n] = qdev_get_gpio_in(dev, n);
-- 
2.34.1
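
Note on the wiring in the patch above: output 0 of the interrupt
controller was already connected to the CPU's IRQ input, and the patch
adds the connection from output 1 to the FIQ input. The toy model below
is only a sketch of why that matters: an output line with nothing
connected to it silently drops the signal. All names (SketchCPU,
SketchGIC, sketch_connect, sketch_raise) are hypothetical, and none of
this uses QEMU's qemu_irq or sysbus API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU input numbering for this sketch only. */
enum { SKETCH_CPU_IRQ = 0, SKETCH_CPU_FIQ = 1, SKETCH_CPU_NUM_INPUTS };

typedef struct {
    int level[SKETCH_CPU_NUM_INPUTS];  /* current level of each input */
} SketchCPU;

typedef struct {
    int *out[2];  /* out[0]: IRQ output, out[1]: FIQ output; NULL if unwired */
} SketchGIC;

/* Wire interrupt controller output n to one CPU input line. */
static void sketch_connect(SketchGIC *gic, int n, int *cpu_input)
{
    gic->out[n] = cpu_input;
}

/* Drive output n; an unwired output has no effect on the CPU. */
static void sketch_raise(SketchGIC *gic, int n, int level)
{
    if (gic->out[n] != NULL) {
        *gic->out[n] = level;
    }
}
```

In this model, raising output 1 before it is connected leaves the CPU's
FIQ level untouched, which corresponds to the pre-patch board where only
the IRQ line was wired.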