Series comparison

-[PULL 00/44] target-arm queue
+[PULL 00/35] target-arm queue
-First set of arm patches for 6.2. I have a lot more in my
+The following changes since commit 5767815218efd3cbfd409505ed824d5f356044ae:
 to-review queue still...
--- PMM
+  Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging (2024-02-14 15:45:52 +0000)
 The following changes since commit d42685765653ec155fdf60910662f8830bdb2cef:
   Open 6.2 development tree (2021-08-25 10:25:12 +0100)
 are available in the Git repository at:
-  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20210825
+  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20240215
-for you to fetch changes up to 24b1a6aa43615be22c7ee66bd68ec5675f6a6a9a:
+for you to fetch changes up to f780e63fe731b058fe52d43653600d8729a1b5f2:
-  docs: Document how to use gdb with unix sockets (2021-08-25 10:48:51 +0100)
+  docs: Add documentation for the mps3-an536 board (2024-02-15 14:32:39 +0000)
 ----------------------------------------------------------------
 target-arm queue:
- * More MVE emulation work
+ * hw/arm/xilinx_zynq: Wire FIQ between CPU <> GIC
- * Implement M-profile trapping on division by zero
+ * linux-user/aarch64: Choose SYNC as the preferred MTE mode
- * kvm: use RCU_READ_LOCK_GUARD() in kvm_arch_fixup_msi_route()
+ * Fix some errors in SVE/SME handling of MTE tags
- * hw/char/pl011: add support for sending break
+ * hw/pci-host/raven.c: Mark raven_io_ops as implementing unaligned accesses
- * fsl-imx6ul: Instantiate SAI1/2/3 and ASRC as unimplemented devices
+ * hw/block/tc58128: Don't emit deprecation warning under qtest
- * hw/dma/pl330: Add memory region to replace default
+ * tests/qtest: Fix handling of npcm7xx and GMAC tests
- * sbsa-ref: Rename SBSA_GWDT enum value
+ * hw/arm/virt: Wire up non-secure EL2 virtual timer IRQ
- * fsl-imx7: Instantiate SAI1/2/3 as unimplemented devices
+ * tests/qtest/npcm7xx_emc-test: Connect all NICs to a backend
- * docs: Document how to use gdb with unix sockets
+ * Don't assert on vmload/vmsave of M-profile CPUs
  * hw/arm/smmuv3: add support for stage 1 access fault
  * hw/arm/stellaris: QOM cleanups
  * Use new CBAR encoding for all v8 CPUs, not all aarch64 CPUs
  * Improve Cortex_R52 IMPDEF sysreg modelling
  * Allow access to SPSR_hyp from hyp mode
  * New board model mps3-an536 (Cortex-R52)
 ----------------------------------------------------------------
-Eduardo Habkost (1):
+Luc Michel (1):
-      sbsa-ref: Rename SBSA_GWDT enum value
+      hw/arm/smmuv3: add support for stage 1 access fault
-Guenter Roeck (2):
+Nabih Estefan (1):
-      fsl-imx6ul: Instantiate SAI1/2/3 and ASRC as unimplemented devices
+      tests/qtest: Fix GMAC test to run on a machine in upstream QEMU
       fsl-imx7: Instantiate SAI1/2/3 as unimplemented devices
-Hamza Mahfooz (1):
+Peter Maydell (22):
-      target/arm: kvm: use RCU_READ_LOCK_GUARD() in kvm_arch_fixup_msi_route()
+      hw/pci-host/raven.c: Mark raven_io_ops as implementing unaligned accesses
       hw/block/tc58128: Don't emit deprecation warning under qtest
       tests/qtest/meson.build: Don't include qtests_npcm7xx in qtests_aarch64
       tests/qtest/bios-tables-test: Allow changes to virt GTDT
       hw/arm/virt: Wire up non-secure EL2 virtual timer IRQ
       tests/qtest/bios-tables-tests: Update virt golden reference
       hw/arm/npcm7xx: Call qemu_configure_nic_device() for GMAC modules
       tests/qtest/npcm7xx_emc-test: Connect all NICs to a backend
       target/arm: Don't get MDCR_EL2 in pmu_counter_enabled() before checking ARM_FEATURE_PMU
       target/arm: Use new CBAR encoding for all v8 CPUs, not all aarch64 CPUs
       target/arm: The Cortex-R52 has a read-only CBAR
       target/arm: Add Cortex-R52 IMPDEF sysregs
       target/arm: Allow access to SPSR_hyp from hyp mode
       hw/misc/mps2-scc: Fix condition for CFG3 register
       hw/misc/mps2-scc: Factor out which-board conditionals
       hw/misc/mps2-scc: Make changes needed for AN536 FPGA image
       hw/arm/mps3r: Initial skeleton for mps3-an536 board
       hw/arm/mps3r: Add CPUs, GIC, and per-CPU RAM
       hw/arm/mps3r: Add UARTs
       hw/arm/mps3r: Add GPIO, watchdog, dual-timer, I2C devices
       hw/arm/mps3r: Add remaining devices
       docs: Add documentation for the mps3-an536 board
-Jan Luebbe (1):
+Philippe Mathieu-Daudé (5):
-      hw/char/pl011: add support for sending break
+      hw/arm/xilinx_zynq: Wire FIQ between CPU <> GIC
       hw/arm/stellaris: Convert ADC controller to Resettable interface
       hw/arm/stellaris: Convert I2C controller to Resettable interface
       hw/arm/stellaris: Add missing QOM 'machine' parent
       hw/arm/stellaris: Add missing QOM 'SoC' parent
-Peter Maydell (37):
+Richard Henderson (6):
-      target/arm: Note that we handle VMOVL as a special case of VSHLL
+      linux-user/aarch64: Choose SYNC as the preferred MTE mode
-      target/arm: Print MVE VPR in CPU dumps
+      target/arm: Fix nregs computation in do_{ld,st}_zpa
-      target/arm: Fix MVE VSLI by 0 and VSRI by <dt>
+      target/arm: Adjust and validate mtedesc sizem1
-      target/arm: Fix signed VADDV
+      target/arm: Split out make_svemte_desc
-      target/arm: Fix mask handling for MVE narrowing operations
+      target/arm: Handle mte in do_ldrq, do_ldro
-      target/arm: Fix 48-bit saturating shifts
+      target/arm: Fix SVE/SME gross MTE suppression checks
       target/arm: Fix MVE 48-bit SQRSHRL for small right shifts
       target/arm: Fix calculation of LTP mask when LR is 0
       target/arm: Factor out mve_eci_mask()
       target/arm: Fix VPT advance when ECI is non-zero
       target/arm: Fix VLDRB/H/W for predicated elements
       target/arm: Implement MVE VMULL (polynomial)
       target/arm: Implement MVE incrementing/decrementing dup insns
       target/arm: Factor out gen_vpst()
       target/arm: Implement MVE integer vector comparisons
       target/arm: Implement MVE integer vector-vs-scalar comparisons
       target/arm: Implement MVE VPSEL
       target/arm: Implement MVE VMLAS
       target/arm: Implement MVE shift-by-scalar
       target/arm: Move 'x' and 'a' bit definitions into vmlaldav formats
       target/arm: Implement MVE integer min/max across vector
       target/arm: Implement MVE VABAV
       target/arm: Implement MVE narrowing moves
       target/arm: Rename MVEGenDualAccOpFn to MVEGenLongDualAccOpFn
       target/arm: Implement MVE VMLADAV and VMLSLDAV
       target/arm: Implement MVE VMLA
       target/arm: Implement MVE saturating doubling multiply accumulates
       target/arm: Implement MVE VQABS, VQNEG
       target/arm: Implement MVE VMAXA, VMINA
       target/arm: Implement MVE VMOV to/from 2 general-purpose registers
       target/arm: Implement MVE VPNOT
       target/arm: Implement MVE VCTP
       target/arm: Implement MVE scatter-gather insns
       target/arm: Implement MVE scatter-gather immediate forms
       target/arm: Implement MVE interleaving loads/stores
       target/arm: Re-indent sdiv and udiv helpers
       target/arm: Implement M-profile trapping on division by zero
-Sebastian Meyer (1):
+ MAINTAINERS                             |   3 +-
-      docs: Document how to use gdb with unix sockets
+ docs/system/arm/mps2.rst                |  37 +-
  configs/devices/arm-softmmu/default.mak |   1 +
  hw/arm/smmuv3-internal.h                |   1 +
  include/hw/arm/smmu-common.h            |   1 +
  include/hw/arm/virt.h                   |   2 +
  include/hw/misc/mps2-scc.h              |   1 +
  linux-user/aarch64/target_prctl.h       |  29 +-
  target/arm/internals.h                  |   2 +-
  target/arm/tcg/translate-a64.h          |   2 +
  hw/arm/mps3r.c                          | 640 ++++++++++++++++++++++++++++++++
  hw/arm/npcm7xx.c                        |   1 +
  hw/arm/smmu-common.c                    |  11 +
  hw/arm/smmuv3.c                         |   1 +
  hw/arm/stellaris.c                      |  47 ++-
  hw/arm/virt-acpi-build.c                |  20 +-
  hw/arm/virt.c                           |  60 ++-
  hw/arm/xilinx_zynq.c                    |   2 +
  hw/block/tc58128.c                      |   4 +-
  hw/misc/mps2-scc.c                      | 138 ++++++-
  hw/pci-host/raven.c                     |   1 +
  target/arm/helper.c                     |  14 +-
  target/arm/tcg/cpu32.c                  | 109 ++++++
  target/arm/tcg/op_helper.c              |  43 ++-
  target/arm/tcg/sme_helper.c             |   8 +-
  target/arm/tcg/sve_helper.c             |  12 +-
  target/arm/tcg/translate-sme.c          |  15 +-
  target/arm/tcg/translate-sve.c          |  83 +++--
  target/arm/tcg/translate.c              |  19 +-
  tests/qtest/npcm7xx_emc-test.c          |   5 +-
  tests/qtest/npcm_gmac-test.c            |  84 +----
  hw/arm/Kconfig                          |   5 +
  hw/arm/meson.build                      |   1 +
  tests/data/acpi/virt/FACP               | Bin 276 -> 276 bytes
  tests/data/acpi/virt/GTDT               | Bin 96 -> 104 bytes
  tests/qtest/meson.build                 |   4 +-
 files changed, 1184 insertions(+), 222 deletions(-)
  create mode 100644 hw/arm/mps3r.c
-Wen, Jianxian (1):
-      hw/dma/pl330: Add memory region to replace default
- docs/system/gdb.rst        |   26 +-
- include/hw/arm/fsl-imx7.h  |    5 +
- target/arm/cpu.h           |    1 +
- target/arm/helper-mve.h    |  283 ++++++++++
- target/arm/helper.h        |    4 +-
- target/arm/translate-a32.h |    2 +
- target/arm/vec_internal.h  |   11 +
- target/arm/mve.decode      |  226 +++++++-
- target/arm/t32.decode      |    1 +
- hw/arm/exynos4210.c        |    3 +
- hw/arm/fsl-imx6ul.c        |   12 +
- hw/arm/fsl-imx7.c          |    7 +
- hw/arm/sbsa-ref.c          |    6 +-
- hw/arm/xilinx_zynq.c       |    3 +
- hw/char/pl011.c            |    6 +
- hw/dma/pl330.c             |   26 +-
- target/arm/cpu.c           |    3 +
- target/arm/helper.c        |   34 +-
- target/arm/kvm.c           |   17 +-
- target/arm/m_helper.c      |    4 +
- target/arm/mve_helper.c    | 1254 ++++++++++++++++++++++++++++++++++++++++++--
- target/arm/translate-mve.c |  877 ++++++++++++++++++++++++++++++-
- target/arm/translate-vfp.c |    2 +-
- target/arm/translate.c     |   37 +-
- target/arm/vec_helper.c    |   14 +-
-files changed, 2746 insertions(+), 118 deletions(-)

-[PULL 41/44] hw/dma/pl330: Add memory region to replace default
+[PULL 01/35] hw/arm/xilinx_zynq: Wire FIQ between CPU <> GIC
-From: "Wen, Jianxian" <Jianxian.Wen@verisilicon.com>
+From: Philippe Mathieu-Daudé <philmd@linaro.org>
-Add a 'memory' link property that can be connected to an IOMMU region to support SMMU translation.
+Similarly to commits dadbb58f59..5ae79fe825 for other ARM boards,
 connect FIQ output of the GIC CPU interfaces to the CPU.
-Signed-off-by: Jianxian Wen <jianxian.wen@verisilicon.com>
+Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Message-id: 20240130152548.17855-1-philmd@linaro.org
-Message-id: 4C23C17B8E87E74E906A25A3254A03F4FA1FEC31@SHASXM03.verisilicon.com
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/arm/exynos4210.c  |  3 +++
+ hw/arm/xilinx_zynq.c | 2 ++
- hw/arm/xilinx_zynq.c |  3 +++
+file changed, 2 insertions(+)
  hw/dma/pl330.c       | 26 ++++++++++++++++++++++----
 files changed, 28 insertions(+), 4 deletions(-)
-diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
-index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/exynos4210.c
-+++ b/hw/arm/exynos4210.c
-@@ -XXX,XX +XXX,XX @@ static DeviceState *pl330_create(uint32_t base, qemu_or_irq *orgate,
-     int i;
-     dev = qdev_new("pl330");
-+    object_property_set_link(OBJECT(dev), "memory",
-+                             OBJECT(get_system_memory()),
-+                             &error_fatal);
-     qdev_prop_set_uint8(dev, "num_events", nevents);
-     qdev_prop_set_uint8(dev, "num_chnls",  8);
-     qdev_prop_set_uint8(dev, "num_periph_req",  nreq);
 diff --git a/hw/arm/xilinx_zynq.c b/hw/arm/xilinx_zynq.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/xilinx_zynq.c
 +++ b/hw/arm/xilinx_zynq.c
 @@ -XXX,XX +XXX,XX @@ static void zynq_init(MachineState *machine)
-     sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0, pic[39-IRQ_OFFSET]);
+     sysbus_mmio_map(busdev, 0, MPCORE_PERIPHBASE);
+     sysbus_connect_irq(busdev, 0,
-     dev = qdev_new("pl330");
+                        qdev_get_gpio_in(DEVICE(cpu), ARM_CPU_IRQ));
-+    object_property_set_link(OBJECT(dev), "memory",
++    sysbus_connect_irq(busdev, 1,
-+                             OBJECT(address_space_mem),
++                       qdev_get_gpio_in(DEVICE(cpu), ARM_CPU_FIQ));
-+                             &error_fatal);
-     qdev_prop_set_uint8(dev, "num_chnls",  8);
+     for (n = 0; n < 64; n++) {
-     qdev_prop_set_uint8(dev, "num_periph_req",  4);
+         pic[n] = qdev_get_gpio_in(dev, n);
      qdev_prop_set_uint8(dev, "num_events",  16);
 diff --git a/hw/dma/pl330.c b/hw/dma/pl330.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/dma/pl330.c
 +++ b/hw/dma/pl330.c
@@ -XXX,XX +XXX,XX @@ struct PL330State {
      uint8_t num_faulting;
      uint8_t periph_busy[PL330_PERIPH_NUM];
 +    /* Memory region that DMA operations access */
 +    MemoryRegion *mem_mr;
 +    AddressSpace *mem_as;
  };
  #define TYPE_PL330 "pl330"
@@ -XXX,XX +XXX,XX @@ static inline const PL330InsnDesc *pl330_fetch_insn(PL330Chan *ch)
      uint8_t opcode;
      int i;
 -    dma_memory_read(&address_space_memory, ch->pc, &opcode, 1);
 +    dma_memory_read(ch->parent->mem_as, ch->pc, &opcode, 1);
      for (i = 0; insn_desc[i].size; i++) {
          if ((opcode & insn_desc[i].opmask) == insn_desc[i].opcode) {
              return &insn_desc[i];
@@ -XXX,XX +XXX,XX @@ static inline void pl330_exec_insn(PL330Chan *ch, const PL330InsnDesc *insn)
      uint8_t buf[PL330_INSN_MAXSIZE];
      assert(insn->size <= PL330_INSN_MAXSIZE);
 -    dma_memory_read(&address_space_memory, ch->pc, buf, insn->size);
 +    dma_memory_read(ch->parent->mem_as, ch->pc, buf, insn->size);
      insn->exec(ch, buf[0], &buf[1], insn->size - 1);
  }
@@ -XXX,XX +XXX,XX @@ static int pl330_exec_cycle(PL330Chan *channel)
      if (q != NULL && q->len <= pl330_fifo_num_free(&s->fifo)) {
          int len = q->len - (q->addr & (q->len - 1));
 -        dma_memory_read(&address_space_memory, q->addr, buf, len);
 +        dma_memory_read(s->mem_as, q->addr, buf, len);
          trace_pl330_exec_cycle(q->addr, len);
          if (trace_event_get_state_backends(TRACE_PL330_HEXDUMP)) {
              pl330_hexdump(buf, len);
@@ -XXX,XX +XXX,XX @@ static int pl330_exec_cycle(PL330Chan *channel)
              fifo_res = pl330_fifo_get(&s->fifo, buf, len, q->tag);
          }
          if (fifo_res == PL330_FIFO_OK || q->z) {
 -            dma_memory_write(&address_space_memory, q->addr, buf, len);
 +            dma_memory_write(s->mem_as, q->addr, buf, len);
              trace_pl330_exec_cycle(q->addr, len);
              if (trace_event_get_state_backends(TRACE_PL330_HEXDUMP)) {
                  pl330_hexdump(buf, len);
@@ -XXX,XX +XXX,XX @@ static void pl330_realize(DeviceState *dev, Error **errp)
                            "dma", PL330_IOMEM_SIZE);
      sysbus_init_mmio(SYS_BUS_DEVICE(dev), &s->iomem);
 +    if (!s->mem_mr) {
 +        error_setg(errp, "'memory' link is not set");
 +        return;
 +    } else if (s->mem_mr == get_system_memory()) {
 +        /* Avoid creating new AS for system memory. */
 +        s->mem_as = &address_space_memory;
 +    } else {
 +        s->mem_as = g_new0(AddressSpace, 1);
 +        address_space_init(s->mem_as, s->mem_mr,
 +                           memory_region_name(s->mem_mr));
 +    }
 +
      s->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, pl330_exec_cycle_timer, s);
      s->cfg[0] = (s->mgr_ns_at_rst ? 0x4 : 0) |
@@ -XXX,XX +XXX,XX @@ static Property pl330_properties[] = {
      DEFINE_PROP_UINT8("rd_q_dep", PL330State, rd_q_dep, 16),
      DEFINE_PROP_UINT16("data_buffer_dep", PL330State, data_buffer_dep, 256),
 +    DEFINE_PROP_LINK("memory", PL330State, mem_mr,
 +                     TYPE_MEMORY_REGION, MemoryRegion *),
 +
      DEFINE_PROP_END_OF_LIST(),
  };
 --
-.20.1
+.34.1

-[PULL 33/44] target/arm: Implement MVE scatter-gather insns
+[PULL 02/35] linux-user/aarch64: Choose SYNC as the preferred MTE mode
-Implement the MVE gather-loads and scatter-stores which
+From: Richard Henderson <richard.henderson@linaro.org>
 form the address by adding a base value from a scalar
 register to an offset in each element of a vector.
+The API does not generate an error for setting ASYNC | SYNC; that merely
+constrains the selection vs the per-cpu default.  For qemu linux-user,
+choose SYNC as the default.
+Cc: qemu-stable@nongnu.org
+Reported-by: Gustavo Romero <gustavo.romero@linaro.org>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Tested-by: Gustavo Romero <gustavo.romero@linaro.org>
+Message-id: 20240207025210.8837-2-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
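Note on the 02/35 change: the selection rule in the commit message (SYNC
wins when both SYNC and ASYNC are requested) can be sketched in isolation
as below. This is a minimal standalone model, not the QEMU code. The
PR_MTE_TCF_* values are assumed to match the Linux prctl UAPI, and the
helper names mte_pick_tcf0() and sctlr_set_tcf0() are hypothetical. The
field position (2 bits at bit 38) is taken from the deposit64() call that
the patch replaces.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match the Linux <linux/prctl.h> UAPI values. */
#define PR_MTE_TCF_SYNC   (1UL << 1)
#define PR_MTE_TCF_ASYNC  (1UL << 2)

/*
 * Hypothetical helper: map the prctl flag bits to the 2-bit TCF0
 * encoding (0 = none, 1 = sync, 2 = async). SYNC is preferred when
 * both mode bits are set.
 */
static inline unsigned mte_pick_tcf0(unsigned long arg2)
{
    if (arg2 & PR_MTE_TCF_SYNC) {
        return 1;
    }
    if (arg2 & PR_MTE_TCF_ASYNC) {
        return 2;
    }
    return 0;
}

/* Hypothetical helper: write a 2-bit value into bits [39:38]. */
static inline uint64_t sctlr_set_tcf0(uint64_t sctlr, unsigned tcf)
{
    assert(tcf < 4);
    const uint64_t mask = (uint64_t)3 << 38;
    return (sctlr & ~mask) | ((uint64_t)tcf << 38);
}
```

With this preference a guest program has no way to end up in ASYMM mode
(encoding 3), which is consistent with the comment added by the patch.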
- target/arm/helper-mve.h    |  32 +++++++++
+ linux-user/aarch64/target_prctl.h | 29 +++++++++++++++++------------
- target/arm/mve.decode      |  12 ++++
+file changed, 17 insertions(+), 12 deletions(-)
  target/arm/mve_helper.c    | 129 +++++++++++++++++++++++++++++++++++++
  target/arm/translate-mve.c |  97 ++++++++++++++++++++++++++++
 files changed, 270 insertions(+)
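Note on the 33/44 change: the address formation used by the gather loads
(base plus a per-element offset, optionally scaled by the memory element
size, with predicated-off lanes zeroed) can be sketched as below. This is
a simplified standalone model, not the QEMU helper. It assumes a flat
byte array in place of guest memory, one predicate bit per lane rather
than the per-byte mask the real helpers use, and no ECI handling. The
names gather_ldh_uw() and mem_read16() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 4  /* four 32-bit elements in a 128-bit vector */

/* Read a little-endian 16-bit value from the byte array. */
static inline uint16_t mem_read16(const uint8_t *mem, uint32_t addr)
{
    return (uint16_t)(mem[addr] | (mem[addr + 1] << 8));
}

/*
 * Halfword gather load, zero-extended into 32-bit elements.
 * If 'scaled' is nonzero, each offset is shifted left by 1 first
 * (the _os_ form for halfword accesses). Lanes whose predicate bit
 * is clear are written as zero and do not access memory.
 */
static inline void gather_ldh_uw(uint32_t d[LANES],
                                 const uint32_t offs[LANES],
                                 const uint8_t *mem, size_t mem_size,
                                 uint32_t base, int scaled, unsigned pred)
{
    for (unsigned e = 0; e < LANES; e++) {
        uint32_t addr = base + (scaled ? offs[e] << 1 : offs[e]);

        if (pred & (1u << e)) {
            assert((size_t)addr + 1 < mem_size);
            d[e] = mem_read16(mem, addr);
        } else {
            d[e] = 0;
        }
    }
}
```

The store direction differs only in that predicated-off lanes are skipped
rather than zeroed, since there is no destination register to clear.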
-diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
+diff --git a/linux-user/aarch64/target_prctl.h b/linux-user/aarch64/target_prctl.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper-mve.h
+--- a/linux-user/aarch64/target_prctl.h
-+++ b/target/arm/helper-mve.h
++++ b/linux-user/aarch64/target_prctl.h
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vstrb_h, TCG_CALL_NO_WG, void, env, ptr, i32)
+@@ -XXX,XX +XXX,XX @@ static abi_long do_prctl_set_tagged_addr_ctrl(CPUArchState *env, abi_long arg2)
- DEF_HELPER_FLAGS_3(mve_vstrb_w, TCG_CALL_NO_WG, void, env, ptr, i32)
+     env->tagged_addr_enable = arg2 & PR_TAGGED_ADDR_ENABLE;
- DEF_HELPER_FLAGS_3(mve_vstrh_w, TCG_CALL_NO_WG, void, env, ptr, i32)
+     if (cpu_isar_feature(aa64_mte, cpu)) {
-+DEF_HELPER_FLAGS_4(mve_vldrb_sg_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+-        switch (arg2 & PR_MTE_TCF_MASK) {
-+DEF_HELPER_FLAGS_4(mve_vldrb_sg_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+-        case PR_MTE_TCF_NONE:
-+DEF_HELPER_FLAGS_4(mve_vldrh_sg_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+-        case PR_MTE_TCF_SYNC:
-+
+-        case PR_MTE_TCF_ASYNC:
-+DEF_HELPER_FLAGS_4(mve_vldrb_sg_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+-            break;
-+DEF_HELPER_FLAGS_4(mve_vldrb_sg_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+-        default:
-+DEF_HELPER_FLAGS_4(mve_vldrb_sg_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+-            return -EINVAL;
-+DEF_HELPER_FLAGS_4(mve_vldrh_sg_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+-        }
-+DEF_HELPER_FLAGS_4(mve_vldrh_sg_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+-
-+DEF_HELPER_FLAGS_4(mve_vldrw_sg_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+         /*
-+DEF_HELPER_FLAGS_4(mve_vldrd_sg_ud, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+          * Write PR_MTE_TCF to SCTLR_EL1[TCF0].
-+
+-         * Note that the syscall values are consistent with hw.
-+DEF_HELPER_FLAGS_4(mve_vstrb_sg_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
++         *
-+DEF_HELPER_FLAGS_4(mve_vstrb_sg_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
++         * The kernel has a per-cpu configuration for the sysadmin,
-+DEF_HELPER_FLAGS_4(mve_vstrb_sg_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
++         * /sys/devices/system/cpu/cpu<N>/mte_tcf_preferred,
-+DEF_HELPER_FLAGS_4(mve_vstrh_sg_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
++         * which qemu does not implement.
-+DEF_HELPER_FLAGS_4(mve_vstrh_sg_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
++         *
-+DEF_HELPER_FLAGS_4(mve_vstrw_sg_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
++         * Because there is no performance difference between the modes, and
-+DEF_HELPER_FLAGS_4(mve_vstrd_sg_ud, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
++         * because SYNC is most useful for debugging MTE errors, choose SYNC
-+
++         * as the preferred mode.  With this preference, and the way the API
-+DEF_HELPER_FLAGS_4(mve_vldrh_sg_os_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
++         * uses only two bits, there is no way for the program to select
-+
++         * ASYMM mode.
-+DEF_HELPER_FLAGS_4(mve_vldrh_sg_os_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+          */
-+DEF_HELPER_FLAGS_4(mve_vldrh_sg_os_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+-        env->cp15.sctlr_el[1] =
-+DEF_HELPER_FLAGS_4(mve_vldrw_sg_os_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+-            deposit64(env->cp15.sctlr_el[1], 38, 2, arg2 >> PR_MTE_TCF_SHIFT);
-+DEF_HELPER_FLAGS_4(mve_vldrd_sg_os_ud, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
++        unsigned tcf = 0;
-+
++        if (arg2 & PR_MTE_TCF_SYNC) {
-+DEF_HELPER_FLAGS_4(mve_vstrh_sg_os_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
++            tcf = 1;
-+DEF_HELPER_FLAGS_4(mve_vstrh_sg_os_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
++        } else if (arg2 & PR_MTE_TCF_ASYNC) {
-+DEF_HELPER_FLAGS_4(mve_vstrw_sg_os_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
++            tcf = 2;
 +DEF_HELPER_FLAGS_4(mve_vstrd_sg_os_ud, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +
  DEF_HELPER_FLAGS_3(mve_vdup, TCG_CALL_NO_WG, void, env, ptr, i32)
  DEF_HELPER_FLAGS_4(mve_vidupb, TCG_CALL_NO_WG, i32, env, ptr, i32, i32)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@
  &shl_scalar qda rm size
  &vmaxv qm rda size
  &vabav qn qm rda size
 +&vldst_sg qd qm rn size msize os
 +
 +# scatter-gather memory size is in bits 6 and 4
 +%sg_msize 6:1 4:1
  @vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
  # Note that both Rn and Qd are 3 bits only (no D bit)
  @vldst_wn ... u:1 ... . . . . l:1 . rn:3 qd:3 . ... .. imm:7 &vldr_vstr
 +@vldst_sg .... .... .... rn:4 .... ... size:2 ... ... os:1 &vldst_sg \
 +          qd=%qd qm=%qm msize=%sg_msize
 +
  @1op .... .... .... size:2 .. .... .... .... .... &1op qd=%qd qm=%qm
  @1op_nosz .... .... .... .... .... .... .... .... &1op qd=%qd qm=%qm size=0
  @2op .... .... .. size:2 .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn
@@ -XXX,XX +XXX,XX @@ VLDR_VSTR        1110110 1 a:1 . w:1 . .... ... 111101 .......   @vldr_vstr \
  VLDR_VSTR        1110110 1 a:1 . w:1 . .... ... 111110 .......   @vldr_vstr \
                   size=2 p=1
 +# gather loads/scatter stores
 +VLDR_S_sg        111 0 1100 1 . 01 .... ... 0 111 . .... .... @vldst_sg
 +VLDR_U_sg        111 1 1100 1 . 01 .... ... 0 111 . .... .... @vldst_sg
 +VSTR_sg          111 0 1100 1 . 00 .... ... 0 111 . .... .... @vldst_sg
 +
  # Moves between 2 32-bit vector lanes and 2 general purpose registers
  VMOV_to_2gp      1110 1100 0 . 00 rt2:4 ... 0 1111 000 idx:1 rt:4 qd=%qd
  VMOV_from_2gp    1110 1100 0 . 01 rt2:4 ... 0 1111 000 idx:1 rt:4 qd=%qd
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, 2, stw, 4, int32_t)
  #undef DO_VLDR
  #undef DO_VSTR
 +/*
 + * Gather loads/scatter stores. Here each element of Qm specifies
 + * an offset to use from the base register Rm. In the _os_ versions
 + * that offset is scaled by the element size.
 + * For loads, predicated lanes are zeroed instead of retaining
 + * their previous values.
 + */
 +#define DO_VLDR_SG(OP, LDTYPE, ESIZE, TYPE, OFFTYPE, ADDRFN)            \
 +    void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm,         \
 +                          uint32_t base)                                \
 +    {                                                                   \
 +        TYPE *d = vd;                                                   \
 +        OFFTYPE *m = vm;                                                \
 +        uint16_t mask = mve_element_mask(env);                          \
 +        uint16_t eci_mask = mve_eci_mask(env);                          \
 +        unsigned e;                                                     \
 +        uint32_t addr;                                                  \
 +        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE, eci_mask >>= ESIZE) { \
 +            if (!(eci_mask & 1)) {                                      \
 +                continue;                                               \
 +            }                                                           \
 +            addr = ADDRFN(base, m[H##ESIZE(e)]);                        \
 +            d[H##ESIZE(e)] = (mask & 1) ?                               \
 +                cpu_##LDTYPE##_data_ra(env, addr, GETPC()) : 0;         \
 +        }                                                               \
 +        mve_advance_vpt(env);                                           \
 +    }
 +
 +/* We know here TYPE is unsigned so always the same as the offset type */
 +#define DO_VSTR_SG(OP, STTYPE, ESIZE, TYPE, ADDRFN)                     \
 +    void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm,         \
 +                          uint32_t base)                                \
 +    {                                                                   \
 +        TYPE *d = vd;                                                   \
 +        TYPE *m = vm;                                                   \
 +        uint16_t mask = mve_element_mask(env);                          \
 +        unsigned e;                                                     \
 +        uint32_t addr;                                                  \
 +        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
 +            addr = ADDRFN(base, m[H##ESIZE(e)]);                        \
 +            if (mask & 1) {                                             \
 +                cpu_##STTYPE##_data_ra(env, addr, d[H##ESIZE(e)], GETPC()); \
 +            }                                                           \
 +        }                                                               \
 +        mve_advance_vpt(env);                                           \
 +    }
 +
 +/*
 + * 64-bit accesses are slightly different: they are done as two 32-bit
 + * accesses, controlled by the predicate mask for the relevant beat,
 + * and with a single 32-bit offset in the first of the two Qm elements.
 + * Note that for QEMU our IMPDEF AIRCR.ENDIANNESS is always 0 (little).
 + */
 +#define DO_VLDR64_SG(OP, ADDRFN)                                        \
 +    void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm,         \
 +                          uint32_t base)                                \
 +    {                                                                   \
 +        uint32_t *d = vd;                                               \
 +        uint32_t *m = vm;                                               \
 +        uint16_t mask = mve_element_mask(env);                          \
 +        uint16_t eci_mask = mve_eci_mask(env);                          \
 +        unsigned e;                                                     \
 +        uint32_t addr;                                                  \
 +        for (e = 0; e < 16 / 4; e++, mask >>= 4, eci_mask >>= 4) {      \
 +            if (!(eci_mask & 1)) {                                      \
 +                continue;                                               \
 +            }                                                           \
 +            addr = ADDRFN(base, m[H4(e & ~1)]);                         \
 +            addr += 4 * (e & 1);                                        \
 +            d[H4(e)] = (mask & 1) ? cpu_ldl_data_ra(env, addr, GETPC()) : 0; \
 +        }                                                               \
 +        mve_advance_vpt(env);                                           \
 +    }
 +
 +#define DO_VSTR64_SG(OP, ADDRFN)                                        \
 +    void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm,         \
 +                          uint32_t base)                                \
 +    {                                                                   \
 +        uint32_t *d = vd;                                               \
 +        uint32_t *m = vm;                                               \
 +        uint16_t mask = mve_element_mask(env);                          \
 +        unsigned e;                                                     \
 +        uint32_t addr;                                                  \
 +        for (e = 0; e < 16 / 4; e++, mask >>= 4) {                      \
 +            addr = ADDRFN(base, m[H4(e & ~1)]);                         \
 +            addr += 4 * (e & 1);                                        \
 +            if (mask & 1) {                                             \
 +                cpu_stl_data_ra(env, addr, d[H4(e)], GETPC());          \
 +            }                                                           \
 +        }                                                               \
 +        mve_advance_vpt(env);                                           \
 +    }
 +
 +#define ADDR_ADD(BASE, OFFSET) ((BASE) + (OFFSET))
 +#define ADDR_ADD_OSH(BASE, OFFSET) ((BASE) + ((OFFSET) << 1))
 +#define ADDR_ADD_OSW(BASE, OFFSET) ((BASE) + ((OFFSET) << 2))
 +#define ADDR_ADD_OSD(BASE, OFFSET) ((BASE) + ((OFFSET) << 3))
 +
 +DO_VLDR_SG(vldrb_sg_sh, ldsb, 2, int16_t, uint16_t, ADDR_ADD)
 +DO_VLDR_SG(vldrb_sg_sw, ldsb, 4, int32_t, uint32_t, ADDR_ADD)
 +DO_VLDR_SG(vldrh_sg_sw, ldsw, 4, int32_t, uint32_t, ADDR_ADD)
 +
 +DO_VLDR_SG(vldrb_sg_ub, ldub, 1, uint8_t, uint8_t, ADDR_ADD)
 +DO_VLDR_SG(vldrb_sg_uh, ldub, 2, uint16_t, uint16_t, ADDR_ADD)
 +DO_VLDR_SG(vldrb_sg_uw, ldub, 4, uint32_t, uint32_t, ADDR_ADD)
 +DO_VLDR_SG(vldrh_sg_uh, lduw, 2, uint16_t, uint16_t, ADDR_ADD)
 +DO_VLDR_SG(vldrh_sg_uw, lduw, 4, uint32_t, uint32_t, ADDR_ADD)
 +DO_VLDR_SG(vldrw_sg_uw, ldl, 4, uint32_t, uint32_t, ADDR_ADD)
 +DO_VLDR64_SG(vldrd_sg_ud, ADDR_ADD)
 +
 +DO_VLDR_SG(vldrh_sg_os_sw, ldsw, 4, int32_t, uint32_t, ADDR_ADD_OSH)
 +DO_VLDR_SG(vldrh_sg_os_uh, lduw, 2, uint16_t, uint16_t, ADDR_ADD_OSH)
 +DO_VLDR_SG(vldrh_sg_os_uw, lduw, 4, uint32_t, uint32_t, ADDR_ADD_OSH)
 +DO_VLDR_SG(vldrw_sg_os_uw, ldl, 4, uint32_t, uint32_t, ADDR_ADD_OSW)
 +DO_VLDR64_SG(vldrd_sg_os_ud, ADDR_ADD_OSD)
 +
 +DO_VSTR_SG(vstrb_sg_ub, stb, 1, uint8_t, ADDR_ADD)
 +DO_VSTR_SG(vstrb_sg_uh, stb, 2, uint16_t, ADDR_ADD)
 +DO_VSTR_SG(vstrb_sg_uw, stb, 4, uint32_t, ADDR_ADD)
 +DO_VSTR_SG(vstrh_sg_uh, stw, 2, uint16_t, ADDR_ADD)
 +DO_VSTR_SG(vstrh_sg_uw, stw, 4, uint32_t, ADDR_ADD)
 +DO_VSTR_SG(vstrw_sg_uw, stl, 4, uint32_t, ADDR_ADD)
 +DO_VSTR64_SG(vstrd_sg_ud, ADDR_ADD)
 +
 +DO_VSTR_SG(vstrh_sg_os_uh, stw, 2, uint16_t, ADDR_ADD_OSH)
 +DO_VSTR_SG(vstrh_sg_os_uw, stw, 4, uint32_t, ADDR_ADD_OSH)
 +DO_VSTR_SG(vstrw_sg_os_uw, stl, 4, uint32_t, ADDR_ADD_OSW)
 +DO_VSTR64_SG(vstrd_sg_os_ud, ADDR_ADD_OSD)
 +
  /*
   * The mergemask(D, R, M) macro performs the operation "*D = R" but
   * storing only the bytes which correspond to 1 bits in M,
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ static inline int vidup_imm(DisasContext *s, int x)
  #include "decode-mve.c.inc"
  typedef void MVEGenLdStFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
 +typedef void MVEGenLdStSGFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
  typedef void MVEGenOneOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
  typedef void MVEGenTwoOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr);
  typedef void MVEGenTwoOpScalarFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
@@ -XXX,XX +XXX,XX @@ DO_VLDST_WIDE_NARROW(VLDSTB_H, vldrb_sh, vldrb_uh, vstrb_h, MO_8)
  DO_VLDST_WIDE_NARROW(VLDSTB_W, vldrb_sw, vldrb_uw, vstrb_w, MO_8)
  DO_VLDST_WIDE_NARROW(VLDSTH_W, vldrh_sw, vldrh_uw, vstrh_w, MO_16)
 +static bool do_ldst_sg(DisasContext *s, arg_vldst_sg *a, MVEGenLdStSGFn fn)
 +{
 +    TCGv_i32 addr;
 +    TCGv_ptr qd, qm;
 +
 +    if (!dc_isar_feature(aa32_mve, s) ||
 +        !mve_check_qreg_bank(s, a->qd | a->qm) ||
 +        !fn || a->rn == 15) {
 +        /* Rn case is UNPREDICTABLE */
 +        return false;
 +    }
 +
 +    if (!mve_eci_check(s) || !vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    addr = load_reg(s, a->rn);
 +
 +    qd = mve_qreg_ptr(a->qd);
 +    qm = mve_qreg_ptr(a->qm);
 +    fn(cpu_env, qd, qm, addr);
 +    tcg_temp_free_ptr(qd);
 +    tcg_temp_free_ptr(qm);
 +    tcg_temp_free_i32(addr);
 +    mve_update_eci(s);
 +    return true;
 +}
 +
 +/*
 + * The naming scheme here is "vldrb_sg_sh == in-memory byte loads
 + * sign-extended to halfword elements in register". _os_ indicates that
 + * the offsets in Qm should be scaled by the element size.
 + */
 +/* This macro is just to make the arrays more compact in these functions */
 +#define F(N) gen_helper_mve_##N
 +
 +/* VLDRB/VSTRB (ie msize 1) with OS=1 is UNPREDICTABLE; we UNDEF */
 +static bool trans_VLDR_S_sg(DisasContext *s, arg_vldst_sg *a)
 +{
 +    static MVEGenLdStSGFn * const fns[2][4][4] = { {
 +            { NULL, F(vldrb_sg_sh), F(vldrb_sg_sw), NULL },
 +            { NULL, NULL,           F(vldrh_sg_sw), NULL },
 +            { NULL, NULL,           NULL,           NULL },
 +            { NULL, NULL,           NULL,           NULL }
 +        }, {
 +            { NULL, NULL,              NULL,              NULL },
 +            { NULL, NULL,              F(vldrh_sg_os_sw), NULL },
 +            { NULL, NULL,              NULL,              NULL },
 +            { NULL, NULL,              NULL,              NULL }
 +        }
-+    };
++        env->cp15.sctlr_el[1] = deposit64(env->cp15.sctlr_el[1], 38, 2, tcf);
-+    if (a->qd == a->qm) {
-+        return false; /* UNPREDICTABLE */
+         /*
-+    }
+          * Write PR_MTE_TAG to GCR_EL1[Exclude].
 +    return do_ldst_sg(s, a, fns[a->os][a->msize][a->size]);
 +}
 +
 +static bool trans_VLDR_U_sg(DisasContext *s, arg_vldst_sg *a)
 +{
 +    static MVEGenLdStSGFn * const fns[2][4][4] = { {
 +            { F(vldrb_sg_ub), F(vldrb_sg_uh), F(vldrb_sg_uw), NULL },
 +            { NULL,           F(vldrh_sg_uh), F(vldrh_sg_uw), NULL },
 +            { NULL,           NULL,           F(vldrw_sg_uw), NULL },
 +            { NULL,           NULL,           NULL,           F(vldrd_sg_ud) }
 +        }, {
 +            { NULL, NULL,              NULL,              NULL },
 +            { NULL, F(vldrh_sg_os_uh), F(vldrh_sg_os_uw), NULL },
 +            { NULL, NULL,              F(vldrw_sg_os_uw), NULL },
 +            { NULL, NULL,              NULL,              F(vldrd_sg_os_ud) }
 +        }
 +    };
 +    if (a->qd == a->qm) {
 +        return false; /* UNPREDICTABLE */
 +    }
 +    return do_ldst_sg(s, a, fns[a->os][a->msize][a->size]);
 +}
 +
 +static bool trans_VSTR_sg(DisasContext *s, arg_vldst_sg *a)
 +{
 +    static MVEGenLdStSGFn * const fns[2][4][4] = { {
 +            { F(vstrb_sg_ub), F(vstrb_sg_uh), F(vstrb_sg_uw), NULL },
 +            { NULL,           F(vstrh_sg_uh), F(vstrh_sg_uw), NULL },
 +            { NULL,           NULL,           F(vstrw_sg_uw), NULL },
 +            { NULL,           NULL,           NULL,           F(vstrd_sg_ud) }
 +        }, {
 +            { NULL, NULL,              NULL,              NULL },
 +            { NULL, F(vstrh_sg_os_uh), F(vstrh_sg_os_uw), NULL },
 +            { NULL, NULL,              F(vstrw_sg_os_uw), NULL },
 +            { NULL, NULL,              NULL,              F(vstrd_sg_os_ud) }
 +        }
 +    };
 +    return do_ldst_sg(s, a, fns[a->os][a->msize][a->size]);
 +}
 +
 +#undef F
 +
  static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
  {
      TCGv_ptr qd;
 --
-.20.1
+.34.1
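
Note on the scatter-gather helpers in the patch above: the variants differ only in how a base address and a per-lane offset from Qm are combined into an effective address. The sketch below illustrates that arithmetic. ADDR_ADD_OSW and ADDR_ADD_OSD are copied from the patch. ADDR_ADD and ADDR_ADD_OSH are not visible in this excerpt, so their definitions here are assumptions inferred from the naming scheme, and sg_lane_addr() is a hypothetical helper used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed: unscaled offset (definition not shown in this excerpt). */
#define ADDR_ADD(BASE, OFFSET) ((BASE) + (OFFSET))
/* Assumed: halfword-scaled offset, by analogy with OSW/OSD below. */
#define ADDR_ADD_OSH(BASE, OFFSET) ((BASE) + ((OFFSET) << 1))
/* From the patch: word- and doubleword-scaled offsets. */
#define ADDR_ADD_OSW(BASE, OFFSET) ((BASE) + ((OFFSET) << 2))
#define ADDR_ADD_OSD(BASE, OFFSET) ((BASE) + ((OFFSET) << 3))

/*
 * Hypothetical helper: effective address of one lane.
 * msize_log2 is log2 of the in-memory access size; scaled selects
 * the _os_ form, where the offset is multiplied by the access size.
 */
static inline uint32_t sg_lane_addr(uint32_t base, uint32_t offset,
                                    unsigned msize_log2, int scaled)
{
    return base + (scaled ? offset << msize_log2 : offset);
}
```

With a base of 0x1000 and a lane offset of 3, the unscaled form addresses 0x1003, while the doubleword-scaled form addresses 0x1000 + (3 << 3).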

-[PULL 43/44] fsl-imx7: Instantiate SAI1/2/3 as unimplemented devices
+[PULL 03/35] target/arm: Fix nregs computation in do_{ld,st}_zpa
-From: Guenter Roeck <linux@roeck-us.net>
+From: Richard Henderson <richard.henderson@linaro.org>
-Instantiate SAI1/2/3 as unimplemented devices to avoid Linux kernel crashes
+The field is encoded as [0-3], which is convenient for
-such as the following.
+indexing our array of function pointers, but the true
 value is [1-4].  Adjust before calling do_mem_zpa.
-Unhandled fault: external abort on non-linefetch (0x808) at 0xd19b0000
+Add an assert, and move the comment re passing ZT to
-pgd = (ptrval)
+the helper back next to the relevant code.
 [d19b0000] *pgd=82711811, *pte=308a0653, *ppte=308a0453
 Internal error: : 808 [#1] SMP ARM
 Modules linked in:
 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0-rc5 #1
 ...
 [<c095e974>] (regmap_mmio_write32le) from [<c095eb48>] (regmap_mmio_write+0x3c/0x54)
 [<c095eb48>] (regmap_mmio_write) from [<c09580f4>] (_regmap_write+0x4c/0x1f0)
 [<c09580f4>] (_regmap_write) from [<c0959b28>] (regmap_write+0x3c/0x60)
 [<c0959b28>] (regmap_write) from [<c0d41130>] (fsl_sai_runtime_resume+0x9c/0x1ec)
 [<c0d41130>] (fsl_sai_runtime_resume) from [<c0942464>] (__rpm_callback+0x3c/0x108)
 [<c0942464>] (__rpm_callback) from [<c0942590>] (rpm_callback+0x60/0x64)
 [<c0942590>] (rpm_callback) from [<c0942b60>] (rpm_resume+0x5cc/0x808)
 [<c0942b60>] (rpm_resume) from [<c0942dfc>] (__pm_runtime_resume+0x60/0xa0)
 [<c0942dfc>] (__pm_runtime_resume) from [<c0d4231c>] (fsl_sai_probe+0x2b8/0x65c)
 [<c0d4231c>] (fsl_sai_probe) from [<c0935b08>] (platform_probe+0x58/0xb8)
 [<c0935b08>] (platform_probe) from [<c0933264>] (really_probe.part.0+0x9c/0x334)
 [<c0933264>] (really_probe.part.0) from [<c093359c>] (__driver_probe_device+0xa0/0x138)
 [<c093359c>] (__driver_probe_device) from [<c0933664>] (driver_probe_device+0x30/0xc8)
 [<c0933664>] (driver_probe_device) from [<c0933c88>] (__driver_attach+0x90/0x130)
 [<c0933c88>] (__driver_attach) from [<c0931060>] (bus_for_each_dev+0x78/0xb8)
 [<c0931060>] (bus_for_each_dev) from [<c093254c>] (bus_add_driver+0xf0/0x1d8)
 [<c093254c>] (bus_add_driver) from [<c0934a30>] (driver_register+0x88/0x118)
 [<c0934a30>] (driver_register) from [<c01022c0>] (do_one_initcall+0x7c/0x3a4)
 [<c01022c0>] (do_one_initcall) from [<c1601204>] (kernel_init_freeable+0x198/0x22c)
 [<c1601204>] (kernel_init_freeable) from [<c0f5ff2c>] (kernel_init+0x10/0x128)
 [<c0f5ff2c>] (kernel_init) from [<c010013c>] (ret_from_fork+0x14/0x38)
-Signed-off-by: Guenter Roeck <linux@roeck-us.net>
+Cc: qemu-stable@nongnu.org
-Message-id: 20210810175607.538090-1-linux@roeck-us.net
+Fixes: 206adacfb8d ("target/arm: Add mte helpers for sve scalar + int loads")
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Tested-by: Gustavo Romero <gustavo.romero@linaro.org>
 Message-id: 20240207025210.8837-3-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- include/hw/arm/fsl-imx7.h | 5 +++++
+ target/arm/tcg/translate-sve.c | 16 ++++++++--------
- hw/arm/fsl-imx7.c         | 7 +++++++
+file changed, 8 insertions(+), 8 deletions(-)
 files changed, 12 insertions(+)
-diff --git a/include/hw/arm/fsl-imx7.h b/include/hw/arm/fsl-imx7.h
+diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/arm/fsl-imx7.h
+--- a/target/arm/tcg/translate-sve.c
-+++ b/include/hw/arm/fsl-imx7.h
++++ b/target/arm/tcg/translate-sve.c
-@@ -XXX,XX +XXX,XX @@ enum FslIMX7MemoryMap {
+@@ -XXX,XX +XXX,XX @@ static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
-     FSL_IMX7_UART6_ADDR           = 0x30A80000,
+     TCGv_ptr t_pg;
-     FSL_IMX7_UART7_ADDR           = 0x30A90000,
+     int desc = 0;
-+    FSL_IMX7_SAI1_ADDR            = 0x308A0000,
+-    /*
-+    FSL_IMX7_SAI2_ADDR            = 0x308B0000,
+-     * For e.g. LD4, there are not enough arguments to pass all 4
-+    FSL_IMX7_SAI3_ADDR            = 0x308C0000,
+-     * registers as pointers, so encode the regno into the data field.
-+    FSL_IMX7_SAIn_SIZE            = 0x10000,
+-     * For consistency, do this even for LD1.
-+
+-     */
-     FSL_IMX7_ENET1_ADDR           = 0x30BE0000,
++    assert(mte_n >= 1 && mte_n <= 4);
-     FSL_IMX7_ENET2_ADDR           = 0x30BF0000,
+     if (s->mte_active[0]) {
+         int msz = dtype_msz(dtype);
-diff --git a/hw/arm/fsl-imx7.c b/hw/arm/fsl-imx7.c
-index XXXXXXX..XXXXXXX 100644
+@@ -XXX,XX +XXX,XX @@ static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
---- a/hw/arm/fsl-imx7.c
+         addr = clean_data_tbi(s, addr);
-+++ b/hw/arm/fsl-imx7.c
+     }
@@ -XXX,XX +XXX,XX @@ static void fsl_imx7_realize(DeviceState *dev, Error **errp)
      create_unimplemented_device("can1", FSL_IMX7_CAN1_ADDR, FSL_IMX7_CANn_SIZE);
      create_unimplemented_device("can2", FSL_IMX7_CAN2_ADDR, FSL_IMX7_CANn_SIZE);
 +    /*
-+     * SAI (Audio SSI (Synchronous Serial Interface))
++     * For e.g. LD4, there are not enough arguments to pass all 4
 +     * registers as pointers, so encode the regno into the data field.
 +     * For consistency, do this even for LD1.
 +     */
-+    create_unimplemented_device("sai1", FSL_IMX7_SAI1_ADDR, FSL_IMX7_SAIn_SIZE);
+     desc = simd_desc(vsz, vsz, zt | desc);
-+    create_unimplemented_device("sai2", FSL_IMX7_SAI2_ADDR, FSL_IMX7_SAIn_SIZE);
+     t_pg = tcg_temp_new_ptr();
-+    create_unimplemented_device("sai3", FSL_IMX7_SAI3_ADDR, FSL_IMX7_SAIn_SIZE);
-+
+@@ -XXX,XX +XXX,XX @@ static void do_ld_zpa(DisasContext *s, int zt, int pg,
-     /*
+      * accessible via the instruction encoding.
       * OCOTP
       */
+     assert(fn != NULL);
+-    do_mem_zpa(s, zt, pg, addr, dtype, nreg, false, fn);
++    do_mem_zpa(s, zt, pg, addr, dtype, nreg + 1, false, fn);
+ }
+ static bool trans_LD_zprr(DisasContext *s, arg_rprr_load *a)
+@@ -XXX,XX +XXX,XX @@ static void do_st_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
+     if (nreg == 0) {
+         /* ST1 */
+         fn = fn_single[s->mte_active[0]][be][msz][esz];
+-        nreg = 1;
+     } else {
+         /* ST2, ST3, ST4 -- msz == esz, enforced by encoding */
+         assert(msz == esz);
+         fn = fn_multiple[s->mte_active[0]][be][nreg - 1][msz];
+     }
+     assert(fn != NULL);
+-    do_mem_zpa(s, zt, pg, addr, msz_dtype(s, msz), nreg, true, fn);
++    do_mem_zpa(s, zt, pg, addr, msz_dtype(s, msz), nreg + 1, true, fn);
+ }
+ static bool trans_ST_zprr(DisasContext *s, arg_rprr_store *a)
 --
-.20.1
+.34.1
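
Note on the nregs fix above: the commit message says the instruction field holds [0-3] while the true register count is [1-4]. A minimal sketch of that adjustment follows. The function names nregs_from_field() and access_bytes() are hypothetical and do not appear in the patch; the size computation mirrors the (mte_n << msz) expression visible in the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: convert the encoded register-count field to the real count. */
static inline uint32_t nregs_from_field(uint32_t field)
{
    assert(field <= 3);   /* the field is encoded as [0-3] */
    return field + 1;     /* the true value is [1-4] */
}

/*
 * Hypothetical: bytes covered by one access, i.e. nregs elements
 * of (1 << msz) bytes each.
 */
static inline uint32_t access_bytes(uint32_t field, uint32_t msz)
{
    return nregs_from_field(field) << msz;
}
```

If the raw field were passed through unadjusted, an encoded value of 0 would describe an access of zero bytes, which is why the callers now pass nreg + 1.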

-[PULL 42/44] sbsa-ref: Rename SBSA_GWDT enum value
+[PULL 04/35] target/arm: Adjust and validate mtedesc sizem1
-From: Eduardo Habkost <ehabkost@redhat.com>
+From: Richard Henderson <richard.henderson@linaro.org>
-The SBSA_GWDT enum value conflicts with the SBSA_GWDT() QOM type
+When we added SVE_MTEDESC_SHIFT, we effectively limited the
-checking helper, preventing us from using a OBJECT_DEFINE* or
+maximum size of MTEDESC.  Adjust SIZEM1 to consume the remaining
-DEFINE_INSTANCE_CHECKER macro for the SBSA_GWDT() wrapper.
+bits (32 - 10 - 5 - 12 == 5).  Assert that the data to be stored
 fits within the field (expecting 8 * 4 - 1 == 31, exact fit).
-If I understand the SBSA 6.0 specification correctly, the signal
+Cc: qemu-stable@nongnu.org
 being connected to IRQ 16 is the WS0 output signal from the
 Generic Watchdog.  Rename the enum value to SBSA_GWDT_WS0 to be
 more explicit and avoid the name conflict.
 Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
 Message-id: 20210806023119.431680-1-ehabkost@redhat.com
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Tested-by: Gustavo Romero <gustavo.romero@linaro.org>
+Message-id: 20240207025210.8837-4-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/arm/sbsa-ref.c | 6 +++---
+ target/arm/internals.h         | 2 +-
-file changed, 3 insertions(+), 3 deletions(-)
+ target/arm/tcg/translate-sve.c | 7 ++++---
 files changed, 5 insertions(+), 4 deletions(-)
-diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
+diff --git a/target/arm/internals.h b/target/arm/internals.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/sbsa-ref.c
+--- a/target/arm/internals.h
-+++ b/hw/arm/sbsa-ref.c
++++ b/target/arm/internals.h
-@@ -XXX,XX +XXX,XX @@ enum {
+@@ -XXX,XX +XXX,XX @@ FIELD(MTEDESC, TBI,   4, 2)
-     SBSA_GIC_DIST,
+ FIELD(MTEDESC, TCMA,  6, 2)
-     SBSA_GIC_REDIST,
+ FIELD(MTEDESC, WRITE, 8, 1)
-     SBSA_SECURE_EC,
+ FIELD(MTEDESC, ALIGN, 9, 3)
--    SBSA_GWDT,
+-FIELD(MTEDESC, SIZEM1, 12, SIMD_DATA_BITS - 12)  /* size - 1 */
-+    SBSA_GWDT_WS0,
++FIELD(MTEDESC, SIZEM1, 12, SIMD_DATA_BITS - SVE_MTEDESC_SHIFT - 12)  /* size - 1 */
-     SBSA_GWDT_REFRESH,
-     SBSA_GWDT_CONTROL,
+ bool mte_probe(CPUARMState *env, uint32_t desc, uint64_t ptr);
-     SBSA_SMMU,
+ uint64_t mte_check(CPUARMState *env, uint32_t desc, uint64_t ptr, uintptr_t ra);
-@@ -XXX,XX +XXX,XX @@ static const int sbsa_ref_irqmap[] = {
+diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
-     [SBSA_AHCI] = 10,
+index XXXXXXX..XXXXXXX 100644
-     [SBSA_EHCI] = 11,
+--- a/target/arm/tcg/translate-sve.c
-     [SBSA_SMMU] = 12, /* ... to 15 */
++++ b/target/arm/tcg/translate-sve.c
--    [SBSA_GWDT] = 16,
+@@ -XXX,XX +XXX,XX @@ static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
-+    [SBSA_GWDT_WS0] = 16,
+ {
- };
+     unsigned vsz = vec_full_reg_size(s);
+     TCGv_ptr t_pg;
- static const char * const valid_cpus[] = {
++    uint32_t sizem1;
-@@ -XXX,XX +XXX,XX @@ static void create_wdt(const SBSAMachineState *sms)
+     int desc = 0;
-     hwaddr cbase = sbsa_ref_memmap[SBSA_GWDT_CONTROL].base;
-     DeviceState *dev = qdev_new(TYPE_WDT_SBSA);
+     assert(mte_n >= 1 && mte_n <= 4);
-     SysBusDevice *s = SYS_BUS_DEVICE(dev);
++    sizem1 = (mte_n << dtype_msz(dtype)) - 1;
--    int irq = sbsa_ref_irqmap[SBSA_GWDT];
++    assert(sizem1 <= R_MTEDESC_SIZEM1_MASK >> R_MTEDESC_SIZEM1_SHIFT);
-+    int irq = sbsa_ref_irqmap[SBSA_GWDT_WS0];
+     if (s->mte_active[0]) {
+-        int msz = dtype_msz(dtype);
-     sysbus_realize_and_unref(s, &error_fatal);
+-
-     sysbus_mmio_map(s, 0, rbase);
+         desc = FIELD_DP32(desc, MTEDESC, MIDX, get_mem_index(s));
          desc = FIELD_DP32(desc, MTEDESC, TBI, s->tbid);
          desc = FIELD_DP32(desc, MTEDESC, TCMA, s->tcma);
          desc = FIELD_DP32(desc, MTEDESC, WRITE, is_write);
 -        desc = FIELD_DP32(desc, MTEDESC, SIZEM1, (mte_n << msz) - 1);
 +        desc = FIELD_DP32(desc, MTEDESC, SIZEM1, sizem1);
          desc <<= SVE_MTEDESC_SHIFT;
      } else {
          addr = clean_data_tbi(s, addr);
 --
-.20.1
+.34.1
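
Note on the SIZEM1 adjustment above: the commit message gives the field width as 32 - 10 - 5 - 12 == 5 bits and the largest stored value as 8 * 4 - 1 == 31. The sketch below reproduces that arithmetic. The constant names are illustrative assumptions that only label the terms of the commit message; they are not the identifiers used in the QEMU headers.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants labelling the terms in the commit message. */
enum {
    DESC_BITS     = 32,  /* whole descriptor word */
    BELOW_DATA    = 10,  /* assumed: bits below the simd data field */
    MTEDESC_SHIFT = 5,   /* the SVE_MTEDESC_SHIFT term */
    SIZEM1_START  = 12,  /* SIZEM1 starts at bit 12 of MTEDESC */
    SIZEM1_BITS   = DESC_BITS - BELOW_DATA - MTEDESC_SHIFT - SIZEM1_START,
    SIZEM1_MAX    = (1 << SIZEM1_BITS) - 1
};

/* Compute size - 1 for nregs elements of (1 << msz) bytes and check the fit. */
static inline uint32_t sizem1_for(uint32_t nregs, uint32_t msz)
{
    uint32_t sizem1 = (nregs << msz) - 1;

    assert(sizem1 <= SIZEM1_MAX);
    return sizem1;
}
```

The largest case, four registers of 8-byte elements, yields 31 and exactly fills the 5-bit field.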

-[PULL 31/44] target/arm: Implement MVE VPNOT
+[PULL 05/35] target/arm: Split out make_svemte_desc
-Implement the MVE VPNOT insn, which inverts the bits in VPR.P0
+From: Richard Henderson <richard.henderson@linaro.org>
 (subject to both predication and to beatwise execution).
+Share code that creates mtedesc and embeds within simd_desc.
+Cc: qemu-stable@nongnu.org
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Tested-by: Gustavo Romero <gustavo.romero@linaro.org>
+Message-id: 20240207025210.8837-5-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/helper-mve.h    |  1 +
+ target/arm/tcg/translate-a64.h |  2 ++
- target/arm/mve.decode      |  1 +
+ target/arm/tcg/translate-sme.c | 15 +++--------
- target/arm/mve_helper.c    | 17 +++++++++++++++++
+ target/arm/tcg/translate-sve.c | 47 ++++++++++++++++++----------------
- target/arm/translate-mve.c | 19 +++++++++++++++++++
+files changed, 31 insertions(+), 33 deletions(-)
 files changed, 38 insertions(+)
-diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
+diff --git a/target/arm/tcg/translate-a64.h b/target/arm/tcg/translate-a64.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper-mve.h
+--- a/target/arm/tcg/translate-a64.h
-+++ b/target/arm/helper-mve.h
++++ b/target/arm/tcg/translate-a64.h
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vorn, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+@@ -XXX,XX +XXX,XX @@ bool logic_imm_decode_wmask(uint64_t *result, unsigned int immn,
- DEF_HELPER_FLAGS_4(mve_veor, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+ bool sve_access_check(DisasContext *s);
+ bool sme_enabled_check(DisasContext *s);
- DEF_HELPER_FLAGS_4(mve_vpsel, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+ bool sme_enabled_check_with_svcr(DisasContext *s, unsigned);
-+DEF_HELPER_FLAGS_1(mve_vpnot, TCG_CALL_NO_WG, void, env)
++uint32_t make_svemte_desc(DisasContext *s, unsigned vsz, uint32_t nregs,
++                          uint32_t msz, bool is_write, uint32_t data);
- DEF_HELPER_FLAGS_4(mve_vaddb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
- DEF_HELPER_FLAGS_4(mve_vaddh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+ /* This function corresponds to CheckStreamingSVEEnabled. */
-diff --git a/target/arm/mve.decode b/target/arm/mve.decode
+ static inline bool sme_sm_enabled_check(DisasContext *s)
 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/mve.decode
+--- a/target/arm/tcg/translate-sme.c
-+++ b/target/arm/mve.decode
++++ b/target/arm/tcg/translate-sme.c
-@@ -XXX,XX +XXX,XX @@ VCMPGT            1111 1110 0 . .. ... 1 ... 1 1111 0 0 . 0 ... 1 @vcmp
+@@ -XXX,XX +XXX,XX @@ static bool trans_LDST1(DisasContext *s, arg_LDST1 *a)
- VCMPLE            1111 1110 0 . .. ... 1 ... 1 1111 1 0 . 0 ... 1 @vcmp
+     TCGv_ptr t_za, t_pg;
      TCGv_i64 addr;
 -    int svl, desc = 0;
 +    uint32_t desc;
      bool be = s->be_data == MO_BE;
      bool mte = s->mte_active[0];
@@ -XXX,XX +XXX,XX @@ static bool trans_LDST1(DisasContext *s, arg_LDST1 *a)
      tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), a->esz);
      tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, a->rn));
 -    if (mte) {
 -        desc = FIELD_DP32(desc, MTEDESC, MIDX, get_mem_index(s));
 -        desc = FIELD_DP32(desc, MTEDESC, TBI, s->tbid);
 -        desc = FIELD_DP32(desc, MTEDESC, TCMA, s->tcma);
 -        desc = FIELD_DP32(desc, MTEDESC, WRITE, a->st);
 -        desc = FIELD_DP32(desc, MTEDESC, SIZEM1, (1 << a->esz) - 1);
 -        desc <<= SVE_MTEDESC_SHIFT;
 -    } else {
 +    if (!mte) {
          addr = clean_data_tbi(s, addr);
      }
 -    svl = streaming_vec_reg_size(s);
 -    desc = simd_desc(svl, svl, desc);
 +
 +    desc = make_svemte_desc(s, streaming_vec_reg_size(s), 1, a->esz, a->st, 0);
      fns[a->esz][be][a->v][mte][a->st](tcg_env, t_za, t_pg, addr,
                                        tcg_constant_i32(desc));
 diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/translate-sve.c
 +++ b/target/arm/tcg/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static const uint8_t dtype_esz[16] = {
 , 2, 1, 3
  };
 -static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
 -                       int dtype, uint32_t mte_n, bool is_write,
 -                       gen_helper_gvec_mem *fn)
 +uint32_t make_svemte_desc(DisasContext *s, unsigned vsz, uint32_t nregs,
 +                          uint32_t msz, bool is_write, uint32_t data)
  {
-+  VPNOT           1111 1110 0 0 11 000 1 000 0 1111 0100 1101
+-    unsigned vsz = vec_full_reg_size(s);
-   VPST            1111 1110 0 . 11 000 1 ... 0 1111 0100 1101 mask=%mask_22_13
+-    TCGv_ptr t_pg;
-   VCMPEQ_scalar   1111 1110 0 . .. ... 1 ... 0 1111 0 1 0 0 .... @vcmp_scalar
+     uint32_t sizem1;
- }
+-    int desc = 0;
-diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
++    uint32_t desc = 0;
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/mve_helper.c
+-    assert(mte_n >= 1 && mte_n <= 4);
-+++ b/target/arm/mve_helper.c
+-    sizem1 = (mte_n << dtype_msz(dtype)) - 1;
-@@ -XXX,XX +XXX,XX @@ void HELPER(mve_vpsel)(CPUARMState *env, void *vd, void *vn, void *vm)
++    /* Assert all of the data fits, with or without MTE enabled. */
-     mve_advance_vpt(env);
++    assert(nregs >= 1 && nregs <= 4);
- }
++    sizem1 = (nregs << msz) - 1;
+     assert(sizem1 <= R_MTEDESC_SIZEM1_MASK >> R_MTEDESC_SIZEM1_SHIFT);
-+void HELPER(mve_vpnot)(CPUARMState *env)
++    assert(data < 1u << SVE_MTEDESC_SHIFT);
-+{
++
-+    /*
+     if (s->mte_active[0]) {
-+     * P0 bits for unexecuted beats (where eci_mask is 0) are unchanged.
+         desc = FIELD_DP32(desc, MTEDESC, MIDX, get_mem_index(s));
-+     * P0 bits for predicated lanes in executed bits (where mask is 0) are 0.
+         desc = FIELD_DP32(desc, MTEDESC, TBI, s->tbid);
-+     * P0 bits otherwise are inverted.
+@@ -XXX,XX +XXX,XX @@ static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
-+     * (This is the same logic as VCMP.)
+         desc = FIELD_DP32(desc, MTEDESC, WRITE, is_write);
-+     * This insn is itself subject to predication and to beat-wise execution,
+         desc = FIELD_DP32(desc, MTEDESC, SIZEM1, sizem1);
-+     * and after it executes VPT state advances in the usual way.
+         desc <<= SVE_MTEDESC_SHIFT;
-+     */
+-    } else {
-+    uint16_t mask = mve_element_mask(env);
++    }
-+    uint16_t eci_mask = mve_eci_mask(env);
++    return simd_desc(vsz, vsz, desc | data);
 +    uint16_t beatpred = ~env->v7m.vpr & mask;
 +    env->v7m.vpr = (env->v7m.vpr & ~(uint32_t)eci_mask) | (beatpred & eci_mask);
 +    mve_advance_vpt(env);
 +}
 +
- #define DO_1OP_SAT(OP, ESIZE, TYPE, FN)                                 \
++static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
-     void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm)         \
++                       int dtype, uint32_t nregs, bool is_write,
-     {                                                                   \
++                       gen_helper_gvec_mem *fn)
-diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
++{
-index XXXXXXX..XXXXXXX 100644
++    TCGv_ptr t_pg;
---- a/target/arm/translate-mve.c
++    uint32_t desc;
-+++ b/target/arm/translate-mve.c
++
-@@ -XXX,XX +XXX,XX @@ static bool trans_VPST(DisasContext *s, arg_VPST *a)
++    if (!s->mte_active[0]) {
-     return true;
+         addr = clean_data_tbi(s, addr);
      }
@@ -XXX,XX +XXX,XX @@ static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
       * registers as pointers, so encode the regno into the data field.
       * For consistency, do this even for LD1.
       */
 -    desc = simd_desc(vsz, vsz, zt | desc);
 +    desc = make_svemte_desc(s, vec_full_reg_size(s), nregs,
 +                            dtype_msz(dtype), is_write, zt);
      t_pg = tcg_temp_new_ptr();
      tcg_gen_addi_ptr(t_pg, tcg_env, pred_full_reg_offset(s, pg));
@@ -XXX,XX +XXX,XX @@ static void do_mem_zpz(DisasContext *s, int zt, int pg, int zm,
                         int scale, TCGv_i64 scalar, int msz, bool is_write,
                         gen_helper_gvec_mem_scatter *fn)
  {
 -    unsigned vsz = vec_full_reg_size(s);
      TCGv_ptr t_zm = tcg_temp_new_ptr();
      TCGv_ptr t_pg = tcg_temp_new_ptr();
      TCGv_ptr t_zt = tcg_temp_new_ptr();
 -    int desc = 0;
 -
 -    if (s->mte_active[0]) {
 -        desc = FIELD_DP32(desc, MTEDESC, MIDX, get_mem_index(s));
 -        desc = FIELD_DP32(desc, MTEDESC, TBI, s->tbid);
 -        desc = FIELD_DP32(desc, MTEDESC, TCMA, s->tcma);
 -        desc = FIELD_DP32(desc, MTEDESC, WRITE, is_write);
 -        desc = FIELD_DP32(desc, MTEDESC, SIZEM1, (1 << msz) - 1);
 -        desc <<= SVE_MTEDESC_SHIFT;
 -    }
 -    desc = simd_desc(vsz, vsz, desc | scale);
 +    uint32_t desc;
      tcg_gen_addi_ptr(t_pg, tcg_env, pred_full_reg_offset(s, pg));
      tcg_gen_addi_ptr(t_zm, tcg_env, vec_full_reg_offset(s, zm));
      tcg_gen_addi_ptr(t_zt, tcg_env, vec_full_reg_offset(s, zt));
 +
 +    desc = make_svemte_desc(s, vec_full_reg_size(s), 1, msz, is_write, scale);
      fn(tcg_env, t_zt, t_pg, t_zm, scalar, tcg_constant_i32(desc));
  }
-+static bool trans_VPNOT(DisasContext *s, arg_VPNOT *a)
-+{
-+    /*
-+     * Invert the predicate in VPR.P0. We have to call out to
-+     * a helper because this insn itself is beatwise and can
-+     * be predicated.
-+     */
-+    if (!dc_isar_feature(aa32_mve, s)) {
-+        return false;
-+    }
-+    if (!mve_eci_check(s) || !vfp_access_check(s)) {
-+        return true;
-+    }
-+
-+    gen_helper_mve_vpnot(cpu_env);
-+    mve_update_eci(s);
-+    return true;
-+}
-+
- static bool trans_VADDV(DisasContext *s, arg_VADDV *a)
- {
-     /* VADDV: vector add across vector */
 --
-.20.1
+.34.1
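
Note on make_svemte_desc above: the helper builds the MTE descriptor only when MTE is active, shifts it up by SVE_MTEDESC_SHIFT, and ORs in the caller's data. The sketch below shows that packing in isolation. It is a simplified stand-in, not the QEMU function: put_field() and pack_mte_data() are hypothetical names, the value 5 for the shift and the position of MIDX at bits [3:0] are assumptions, the TBI, TCMA, WRITE and SIZEM1 positions follow the FIELD() lines quoted in the earlier patch, and the final simd_desc() wrapping is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MTEDESC_SHIFT 5   /* assumed value of SVE_MTEDESC_SHIFT */

/* Insert the low `len` bits of `val` into `word` at bit position `pos`. */
static inline uint32_t put_field(uint32_t word, unsigned pos, unsigned len,
                                 uint32_t val)
{
    uint32_t mask = ((1u << len) - 1) << pos;

    return (word & ~mask) | ((val << pos) & mask);
}

/* Hypothetical: pack the MTE fields above the low `data` bits. */
static inline uint32_t pack_mte_data(bool mte_active, uint32_t midx,
                                     uint32_t tbi, uint32_t tcma,
                                     bool is_write, uint32_t nregs,
                                     uint32_t msz, uint32_t data)
{
    uint32_t desc = 0;
    uint32_t sizem1;

    /* Check that everything fits, with or without MTE enabled. */
    assert(nregs >= 1 && nregs <= 4);
    sizem1 = (nregs << msz) - 1;
    assert(sizem1 <= 31);
    assert(data < (1u << MTEDESC_SHIFT));

    if (mte_active) {
        desc = put_field(desc, 0, 4, midx);      /* MIDX: position assumed */
        desc = put_field(desc, 4, 2, tbi);       /* TBI    at 4, 2 bits */
        desc = put_field(desc, 6, 2, tcma);      /* TCMA   at 6, 2 bits */
        desc = put_field(desc, 8, 1, is_write);  /* WRITE  at 8, 1 bit */
        desc = put_field(desc, 12, 5, sizem1);   /* SIZEM1 at 12, 5 bits */
        desc <<= MTEDESC_SHIFT;
    }
    return desc | data;
}
```

When MTE is inactive, the result is just the caller's data (for example the zt register number or the scale), which matches the behaviour of the call sites that previously passed the data straight to simd_desc().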

-[PULL 30/44] target/arm: Implement MVE VMOV to/from 2 general-purpose registers
+[PULL 06/35] target/arm: Handle mte in do_ldrq, do_ldro
-Implement the MVE VMOV forms that move data between 2 general-purpose
+From: Richard Henderson <richard.henderson@linaro.org>
 registers and 2 32-bit lanes in a vector register.
+These functions "use the standard load helpers", but
+fail to clean_data_tbi or populate mtedesc.
+Cc: qemu-stable@nongnu.org
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Tested-by: Gustavo Romero <gustavo.romero@linaro.org>
+Message-id: 20240207025210.8837-6-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/translate-a32.h |  1 +
+ target/arm/tcg/translate-sve.c | 15 +++++++++++++--
- target/arm/mve.decode      |  4 ++
+file changed, 13 insertions(+), 2 deletions(-)
  target/arm/translate-mve.c | 85 ++++++++++++++++++++++++++++++++++++++
  target/arm/translate-vfp.c |  2 +-
 files changed, 91 insertions(+), 1 deletion(-)
-diff --git a/target/arm/translate-a32.h b/target/arm/translate-a32.h
+diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-a32.h
+--- a/target/arm/tcg/translate-sve.c
-+++ b/target/arm/translate-a32.h
++++ b/target/arm/tcg/translate-sve.c
-@@ -XXX,XX +XXX,XX @@ void gen_rev16(TCGv_i32 dest, TCGv_i32 var);
+@@ -XXX,XX +XXX,XX @@ static void do_ldrq(DisasContext *s, int zt, int pg, TCGv_i64 addr, int dtype)
- void clear_eci_state(DisasContext *s);
+     unsigned vsz = vec_full_reg_size(s);
- bool mve_eci_check(DisasContext *s);
+     TCGv_ptr t_pg;
- void mve_update_and_store_eci(DisasContext *s);
+     int poff;
-+bool mve_skip_vmov(DisasContext *s, int vn, int index, int size);
++    uint32_t desc;
- static inline TCGv_i32 load_cpu_offset(int offset)
+     /* Load the first quadword using the normal predicated load helpers.  */
- {
++    if (!s->mte_active[0]) {
-diff --git a/target/arm/mve.decode b/target/arm/mve.decode
++        addr = clean_data_tbi(s, addr);
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VLDR_VSTR        1110110 1 a:1 . w:1 . .... ... 111101 .......   @vldr_vstr \
  VLDR_VSTR        1110110 1 a:1 . w:1 . .... ... 111110 .......   @vldr_vstr \
                   size=2 p=1
 +# Moves between 2 32-bit vector lanes and 2 general purpose registers
 +VMOV_to_2gp      1110 1100 0 . 00 rt2:4 ... 0 1111 000 idx:1 rt:4 qd=%qd
 +VMOV_from_2gp    1110 1100 0 . 01 rt2:4 ... 0 1111 000 idx:1 rt:4 qd=%qd
 +
  # Vector 2-op
  VAND             1110 1111 0 . 00 ... 0 ... 0 0001 . 1 . 1 ... 0 @2op_nosz
  VBIC             1110 1111 0 . 01 ... 0 ... 0 0001 . 1 . 1 ... 0 @2op_nosz
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ static bool do_vabav(DisasContext *s, arg_vabav *a, MVEGenVABAVFn *fn)
  DO_VABAV(VABAV_S, vabavs)
  DO_VABAV(VABAV_U, vabavu)
 +
 +static bool trans_VMOV_to_2gp(DisasContext *s, arg_VMOV_to_2gp *a)
 +{
 +    /*
 +     * VMOV two 32-bit vector lanes to two general-purpose registers.
 +     * This insn is not predicated but it is subject to beat-wise
 +     * execution if it is not in an IT block. For us this means
 +     * only that if PSR.ECI says we should not be executing the beat
 +     * corresponding to the lane of the vector register being accessed
 +     * then we should skip performing the move, and that we need to do
 +     * the usual check for bad ECI state and advance of ECI state.
 +     * (If PSR.ECI is non-zero then we cannot be in an IT block.)
 +     */
 +    TCGv_i32 tmp;
 +    int vd;
 +
 +    if (!dc_isar_feature(aa32_mve, s) || !mve_check_qreg_bank(s, a->qd) ||
 +        a->rt == 13 || a->rt == 15 || a->rt2 == 13 || a->rt2 == 15 ||
 +        a->rt == a->rt2) {
 +        /* Rt/Rt2 cases are UNPREDICTABLE */
 +        return false;
 +    }
 +    if (!mve_eci_check(s) || !vfp_access_check(s)) {
 +        return true;
 +    }
 +
-+    /* Convert Qreg index to Dreg for read_neon_element32() etc */
+     poff = pred_full_reg_offset(s, pg);
-+    vd = a->qd * 2;
+     if (vsz > 16) {
-+
+         /*
-+    if (!mve_skip_vmov(s, vd, a->idx, MO_32)) {
+@@ -XXX,XX +XXX,XX @@ static void do_ldrq(DisasContext *s, int zt, int pg, TCGv_i64 addr, int dtype)
-+        tmp = tcg_temp_new_i32();
-+        read_neon_element32(tmp, vd, a->idx, MO_32);
+     gen_helper_gvec_mem *fn
-+        store_reg(s, a->rt, tmp);
+         = ldr_fns[s->mte_active[0]][s->be_data == MO_BE][dtype][0];
 -    fn(tcg_env, t_pg, addr, tcg_constant_i32(simd_desc(16, 16, zt)));
 +    desc = make_svemte_desc(s, 16, 1, dtype_msz(dtype), false, zt);
 +    fn(tcg_env, t_pg, addr, tcg_constant_i32(desc));
      /* Replicate that first quadword.  */
      if (vsz > 16) {
@@ -XXX,XX +XXX,XX @@ static void do_ldro(DisasContext *s, int zt, int pg, TCGv_i64 addr, int dtype)
      unsigned vsz_r32;
      TCGv_ptr t_pg;
      int poff, doff;
 +    uint32_t desc;
      if (vsz < 32) {
          /*
@@ -XXX,XX +XXX,XX @@ static void do_ldro(DisasContext *s, int zt, int pg, TCGv_i64 addr, int dtype)
      }
      /* Load the first octaword using the normal predicated load helpers.  */
 +    if (!s->mte_active[0]) {
 +        addr = clean_data_tbi(s, addr);
 +    }
-+    if (!mve_skip_vmov(s, vd + 1, a->idx, MO_32)) {
-+        tmp = tcg_temp_new_i32();
+     poff = pred_full_reg_offset(s, pg);
-+        read_neon_element32(tmp, vd + 1, a->idx, MO_32);
+     if (vsz > 32) {
-+        store_reg(s, a->rt2, tmp);
+@@ -XXX,XX +XXX,XX @@ static void do_ldro(DisasContext *s, int zt, int pg, TCGv_i64 addr, int dtype)
-+    }
-+
+     gen_helper_gvec_mem *fn
-+    mve_update_and_store_eci(s);
+         = ldr_fns[s->mte_active[0]][s->be_data == MO_BE][dtype][0];
-+    return true;
+-    fn(tcg_env, t_pg, addr, tcg_constant_i32(simd_desc(32, 32, zt)));
-+}
++    desc = make_svemte_desc(s, 32, 1, dtype_msz(dtype), false, zt);
-+
++    fn(tcg_env, t_pg, addr, tcg_constant_i32(desc));
-+static bool trans_VMOV_from_2gp(DisasContext *s, arg_VMOV_to_2gp *a)
 +{
 +    /*
 +     * VMOV two general-purpose registers to two 32-bit vector lanes.
 +     * This insn is not predicated but it is subject to beat-wise
 +     * execution if it is not in an IT block. For us this means
 +     * only that if PSR.ECI says we should not be executing the beat
 +     * corresponding to the lane of the vector register being accessed
 +     * then we should skip performing the move, and that we need to do
 +     * the usual check for bad ECI state and advance of ECI state.
 +     * (If PSR.ECI is non-zero then we cannot be in an IT block.)
 +     */
 +    TCGv_i32 tmp;
 +    int vd;
 +
 +    if (!dc_isar_feature(aa32_mve, s) || !mve_check_qreg_bank(s, a->qd) ||
 +        a->rt == 13 || a->rt == 15 || a->rt2 == 13 || a->rt2 == 15) {
 +        /* Rt/Rt2 cases are UNPREDICTABLE */
 +        return false;
 +    }
 +    if (!mve_eci_check(s) || !vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    /* Convert Qreg idx to Dreg for read_neon_element32() etc */
 +    vd = a->qd * 2;
 +
 +    if (!mve_skip_vmov(s, vd, a->idx, MO_32)) {
 +        tmp = load_reg(s, a->rt);
 +        write_neon_element32(tmp, vd, a->idx, MO_32);
 +        tcg_temp_free_i32(tmp);
 +    }
 +    if (!mve_skip_vmov(s, vd + 1, a->idx, MO_32)) {
 +        tmp = load_reg(s, a->rt2);
 +        write_neon_element32(tmp, vd + 1, a->idx, MO_32);
 +        tcg_temp_free_i32(tmp);
 +    }
 +
 +    mve_update_and_store_eci(s);
 +    return true;
 +}
 diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-vfp.c
 +++ b/target/arm/translate-vfp.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
      return true;
  }
 -static bool mve_skip_vmov(DisasContext *s, int vn, int index, int size)
 +bool mve_skip_vmov(DisasContext *s, int vn, int index, int size)
  {
      /*
-      * In a CPU with MVE, the VMOV (vector lane to general-purpose register)
+      * Replicate that first octaword.
 --
-.20.1
+.34.1

-[PULL 16/44] target/arm: Implement MVE integer vector-vs-scalar comparisons
+[PULL 07/35] target/arm: Fix SVE/SME gross MTE suppression checks
-Implement the MVE integer vector comparison instructions that compare
+From: Richard Henderson <richard.henderson@linaro.org>
 each element against a scalar from a general purpose register.  These
 are "VCMP (vector)" encodings T4, T5 and T6 and "VPT (vector)"
 encodings T4, T5 and T6.
-We have to move the decodetree pattern for VPST, because it
+The TBI and TCMA bits are located within mtedesc, not desc.
 overlaps with VCMP T4 with size = 0b11.
+Cc: qemu-stable@nongnu.org
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Tested-by: Gustavo Romero <gustavo.romero@linaro.org>
+Message-id: 20240207025210.8837-7-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
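Reader's note on the MTE suppression fix: the helpers receive one 32-bit
descriptor in which the MTE descriptor sits above the SIMD fields, and the
helper splits it in two before doing any checks. The sketch below models
that split. The SKETCH_* constants, the bit positions and the sketch_*
function names are assumptions made up for illustration, not QEMU's real
definitions, and only the TBI half of the check is modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: low 15 bits are SIMD fields, the rest is mtedesc. */
#define SKETCH_SIMD_BITS 15
/* Hypothetical position of the two-bit TBI field inside mtedesc. */
#define SKETCH_TBI_SHIFT 4

/* Extract len bits (1..32) of value, starting at bit start. */
static uint32_t sketch_extract32(uint32_t value, int start, int len)
{
    return (value >> start) & (~0u >> (32 - len));
}

/* TBI is a two-bit field; bit 55 of the address selects which bit applies. */
static bool sketch_tbi_check(uint32_t mtedesc, int bit55)
{
    return (mtedesc >> (SKETCH_TBI_SHIFT + bit55)) & 1;
}

/* Return the mtedesc to use, or 0 when MTE checking is suppressed. */
static uint32_t sketch_gross_suppress(uint32_t combined, int bit55)
{
    uint32_t mtedesc = combined >> SKETCH_SIMD_BITS;
    uint32_t desc = sketch_extract32(combined, 0, SKETCH_SIMD_BITS);

    (void)desc; /* SIMD fields only: there are no TBI/TCMA bits in here. */
    if (!sketch_tbi_check(mtedesc, bit55)) {
        mtedesc = 0;
    }
    return mtedesc;
}
```

In this model, passing the truncated desc to sketch_tbi_check() tests
whatever SIMD field bits happen to sit at that position, which is the kind
of error the commit message describes.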
- target/arm/helper-mve.h    | 32 +++++++++++++++++++++++++++
+ target/arm/tcg/sme_helper.c |  8 ++++----
- target/arm/mve.decode      | 18 +++++++++++++---
+ target/arm/tcg/sve_helper.c | 12 ++++++------
- target/arm/mve_helper.c    | 44 +++++++++++++++++++++++++++++++-------
+files changed, 10 insertions(+), 10 deletions(-)
  target/arm/translate-mve.c | 43 +++++++++++++++++++++++++++++++++++++
 files changed, 126 insertions(+), 11 deletions(-)
-diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
+diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper-mve.h
+--- a/target/arm/tcg/sme_helper.c
-+++ b/target/arm/helper-mve.h
++++ b/target/arm/tcg/sme_helper.c
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vcmpgtw, TCG_CALL_NO_WG, void, env, ptr, ptr)
+@@ -XXX,XX +XXX,XX @@ void sme_ld1_mte(CPUARMState *env, void *za, uint64_t *vg,
- DEF_HELPER_FLAGS_3(mve_vcmpleb, TCG_CALL_NO_WG, void, env, ptr, ptr)
+     desc = extract32(desc, 0, SIMD_DATA_SHIFT + SVE_MTEDESC_SHIFT);
- DEF_HELPER_FLAGS_3(mve_vcmpleh, TCG_CALL_NO_WG, void, env, ptr, ptr)
- DEF_HELPER_FLAGS_3(mve_vcmplew, TCG_CALL_NO_WG, void, env, ptr, ptr)
+     /* Perform gross MTE suppression early. */
-+
+-    if (!tbi_check(desc, bit55) ||
-+DEF_HELPER_FLAGS_3(mve_vcmpeq_scalarb, TCG_CALL_NO_WG, void, env, ptr, i32)
+-        tcma_check(desc, bit55, allocation_tag_from_addr(addr))) {
-+DEF_HELPER_FLAGS_3(mve_vcmpeq_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
++    if (!tbi_check(mtedesc, bit55) ||
-+DEF_HELPER_FLAGS_3(mve_vcmpeq_scalarw, TCG_CALL_NO_WG, void, env, ptr, i32)
++        tcma_check(mtedesc, bit55, allocation_tag_from_addr(addr))) {
-+
+         mtedesc = 0;
-+DEF_HELPER_FLAGS_3(mve_vcmpne_scalarb, TCG_CALL_NO_WG, void, env, ptr, i32)
+     }
-+DEF_HELPER_FLAGS_3(mve_vcmpne_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
-+DEF_HELPER_FLAGS_3(mve_vcmpne_scalarw, TCG_CALL_NO_WG, void, env, ptr, i32)
+@@ -XXX,XX +XXX,XX @@ void sme_st1_mte(CPUARMState *env, void *za, uint64_t *vg, target_ulong addr,
-+
+     desc = extract32(desc, 0, SIMD_DATA_SHIFT + SVE_MTEDESC_SHIFT);
-+DEF_HELPER_FLAGS_3(mve_vcmpcs_scalarb, TCG_CALL_NO_WG, void, env, ptr, i32)
-+DEF_HELPER_FLAGS_3(mve_vcmpcs_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
+     /* Perform gross MTE suppression early. */
-+DEF_HELPER_FLAGS_3(mve_vcmpcs_scalarw, TCG_CALL_NO_WG, void, env, ptr, i32)
+-    if (!tbi_check(desc, bit55) ||
-+
+-        tcma_check(desc, bit55, allocation_tag_from_addr(addr))) {
-+DEF_HELPER_FLAGS_3(mve_vcmphi_scalarb, TCG_CALL_NO_WG, void, env, ptr, i32)
++    if (!tbi_check(mtedesc, bit55) ||
-+DEF_HELPER_FLAGS_3(mve_vcmphi_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
++        tcma_check(mtedesc, bit55, allocation_tag_from_addr(addr))) {
-+DEF_HELPER_FLAGS_3(mve_vcmphi_scalarw, TCG_CALL_NO_WG, void, env, ptr, i32)
+         mtedesc = 0;
-+
+     }
-+DEF_HELPER_FLAGS_3(mve_vcmpge_scalarb, TCG_CALL_NO_WG, void, env, ptr, i32)
-+DEF_HELPER_FLAGS_3(mve_vcmpge_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
+diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
 +DEF_HELPER_FLAGS_3(mve_vcmpge_scalarw, TCG_CALL_NO_WG, void, env, ptr, i32)
 +
 +DEF_HELPER_FLAGS_3(mve_vcmplt_scalarb, TCG_CALL_NO_WG, void, env, ptr, i32)
 +DEF_HELPER_FLAGS_3(mve_vcmplt_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
 +DEF_HELPER_FLAGS_3(mve_vcmplt_scalarw, TCG_CALL_NO_WG, void, env, ptr, i32)
 +
 +DEF_HELPER_FLAGS_3(mve_vcmpgt_scalarb, TCG_CALL_NO_WG, void, env, ptr, i32)
 +DEF_HELPER_FLAGS_3(mve_vcmpgt_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
 +DEF_HELPER_FLAGS_3(mve_vcmpgt_scalarw, TCG_CALL_NO_WG, void, env, ptr, i32)
 +
 +DEF_HELPER_FLAGS_3(mve_vcmple_scalarb, TCG_CALL_NO_WG, void, env, ptr, i32)
 +DEF_HELPER_FLAGS_3(mve_vcmple_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
 +DEF_HELPER_FLAGS_3(mve_vcmple_scalarw, TCG_CALL_NO_WG, void, env, ptr, i32)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/mve.decode
+--- a/target/arm/tcg/sve_helper.c
-+++ b/target/arm/mve.decode
++++ b/target/arm/tcg/sve_helper.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ void sve_ldN_r_mte(CPUARMState *env, uint64_t *vg, target_ulong addr,
- &vidup qd rn size imm
+     desc = extract32(desc, 0, SIMD_DATA_SHIFT + SVE_MTEDESC_SHIFT);
- &viwdup qd rn rm size imm
- &vcmp qm qn size mask
+     /* Perform gross MTE suppression early. */
-+&vcmp_scalar qn rm size mask
+-    if (!tbi_check(desc, bit55) ||
+-        tcma_check(desc, bit55, allocation_tag_from_addr(addr))) {
- @vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
++    if (!tbi_check(mtedesc, bit55) ||
- # Note that both Rn and Qd are 3 bits only (no D bit)
++        tcma_check(mtedesc, bit55, allocation_tag_from_addr(addr))) {
-@@ -XXX,XX +XXX,XX @@
+         mtedesc = 0;
  # Vector comparison; 4-bit Qm but 3-bit Qn
  %mask_22_13      22:1 13:3
  @vcmp    .... .... .. size:2 qn:3 . .... .... .... .... &vcmp qm=%qm mask=%mask_22_13
 +@vcmp_scalar .... .... .. size:2 qn:3 . .... .... .... rm:4 &vcmp_scalar \
 +             mask=%mask_22_13
  # Vector loads and stores
@@ -XXX,XX +XXX,XX @@ VQRDMULH_scalar  1111 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
                   rdahi=%rdahi rdalo=%rdalo
  }
 -# Predicate operations
 -VPST             1111 1110 0 . 11 000 1 ... 0 1111 0100 1101 mask=%mask_22_13
 -
  # Logical immediate operations (1 reg and modified-immediate)
  # The cmode/op bits here decode VORR/VBIC/VMOV/VMVN, but
@@ -XXX,XX +XXX,XX @@ VCMPGE            1111 1110 0 . .. ... 1 ... 1 1111 0 0 . 0 ... 0 @vcmp
  VCMPLT            1111 1110 0 . .. ... 1 ... 1 1111 1 0 . 0 ... 0 @vcmp
  VCMPGT            1111 1110 0 . .. ... 1 ... 1 1111 0 0 . 0 ... 1 @vcmp
  VCMPLE            1111 1110 0 . .. ... 1 ... 1 1111 1 0 . 0 ... 1 @vcmp
 +
 +{
 +  VPST            1111 1110 0 . 11 000 1 ... 0 1111 0100 1101 mask=%mask_22_13
 +  VCMPEQ_scalar   1111 1110 0 . .. ... 1 ... 0 1111 0 1 0 0 .... @vcmp_scalar
 +}
 +VCMPNE_scalar     1111 1110 0 . .. ... 1 ... 0 1111 1 1 0 0 .... @vcmp_scalar
 +VCMPCS_scalar     1111 1110 0 . .. ... 1 ... 0 1111 0 1 1 0 .... @vcmp_scalar
 +VCMPHI_scalar     1111 1110 0 . .. ... 1 ... 0 1111 1 1 1 0 .... @vcmp_scalar
 +VCMPGE_scalar     1111 1110 0 . .. ... 1 ... 1 1111 0 1 0 0 .... @vcmp_scalar
 +VCMPLT_scalar     1111 1110 0 . .. ... 1 ... 1 1111 1 1 0 0 .... @vcmp_scalar
 +VCMPGT_scalar     1111 1110 0 . .. ... 1 ... 1 1111 0 1 1 0 .... @vcmp_scalar
 +VCMPLE_scalar     1111 1110 0 . .. ... 1 ... 1 1111 1 1 1 0 .... @vcmp_scalar
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_VIWDUP_ALL(vdwdup, do_sub_wrap)
          mve_advance_vpt(env);                                           \
      }
--#define DO_VCMP_S(OP, FN)                       \
+@@ -XXX,XX +XXX,XX @@ void sve_ldnfff1_r_mte(CPUARMState *env, void *vg, target_ulong addr,
--    DO_VCMP(OP##b, 1, int8_t, FN)               \
+     desc = extract32(desc, 0, SIMD_DATA_SHIFT + SVE_MTEDESC_SHIFT);
--    DO_VCMP(OP##h, 2, int16_t, FN)              \
--    DO_VCMP(OP##w, 4, int32_t, FN)
+     /* Perform gross MTE suppression early. */
-+#define DO_VCMP_SCALAR(OP, ESIZE, TYPE, FN)                             \
+-    if (!tbi_check(desc, bit55) ||
-+    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vn,             \
+-        tcma_check(desc, bit55, allocation_tag_from_addr(addr))) {
-+                                uint32_t rm)                            \
++    if (!tbi_check(mtedesc, bit55) ||
-+    {                                                                   \
++        tcma_check(mtedesc, bit55, allocation_tag_from_addr(addr))) {
-+        TYPE *n = vn;                                                   \
+         mtedesc = 0;
 +        uint16_t mask = mve_element_mask(env);                          \
 +        uint16_t eci_mask = mve_eci_mask(env);                          \
 +        uint16_t beatpred = 0;                                          \
 +        uint16_t emask = MAKE_64BIT_MASK(0, ESIZE);                     \
 +        unsigned e;                                                     \
 +        for (e = 0; e < 16 / ESIZE; e++) {                              \
 +            bool r = FN(n[H##ESIZE(e)], (TYPE)rm);                      \
 +            /* Comparison sets 0/1 bits for each byte in the element */ \
 +            beatpred |= r * emask;                                      \
 +            emask <<= ESIZE;                                            \
 +        }                                                               \
 +        beatpred &= mask;                                               \
 +        env->v7m.vpr = (env->v7m.vpr & ~(uint32_t)eci_mask) |           \
 +            (beatpred & eci_mask);                                      \
 +        mve_advance_vpt(env);                                           \
 +    }
 -#define DO_VCMP_U(OP, FN)                       \
 -    DO_VCMP(OP##b, 1, uint8_t, FN)              \
 -    DO_VCMP(OP##h, 2, uint16_t, FN)             \
 -    DO_VCMP(OP##w, 4, uint32_t, FN)
 +#define DO_VCMP_S(OP, FN)                               \
 +    DO_VCMP(OP##b, 1, int8_t, FN)                       \
 +    DO_VCMP(OP##h, 2, int16_t, FN)                      \
 +    DO_VCMP(OP##w, 4, int32_t, FN)                      \
 +    DO_VCMP_SCALAR(OP##_scalarb, 1, int8_t, FN)         \
 +    DO_VCMP_SCALAR(OP##_scalarh, 2, int16_t, FN)        \
 +    DO_VCMP_SCALAR(OP##_scalarw, 4, int32_t, FN)
 +
 +#define DO_VCMP_U(OP, FN)                               \
 +    DO_VCMP(OP##b, 1, uint8_t, FN)                      \
 +    DO_VCMP(OP##h, 2, uint16_t, FN)                     \
 +    DO_VCMP(OP##w, 4, uint32_t, FN)                     \
 +    DO_VCMP_SCALAR(OP##_scalarb, 1, uint8_t, FN)        \
 +    DO_VCMP_SCALAR(OP##_scalarh, 2, uint16_t, FN)       \
 +    DO_VCMP_SCALAR(OP##_scalarw, 4, uint32_t, FN)
  #define DO_EQ(N, M) ((N) == (M))
  #define DO_NE(N, M) ((N) != (M))
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ typedef void MVEGenOneOpImmFn(TCGv_ptr, TCGv_ptr, TCGv_i64);
  typedef void MVEGenVIDUPFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32);
  typedef void MVEGenVIWDUPFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32);
  typedef void MVEGenCmpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
 +typedef void MVEGenScalarCmpFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
  /* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
  static inline long mve_qreg_offset(unsigned reg)
@@ -XXX,XX +XXX,XX @@ static bool do_vcmp(DisasContext *s, arg_vcmp *a, MVEGenCmpFn *fn)
      return true;
  }
 +static bool do_vcmp_scalar(DisasContext *s, arg_vcmp_scalar *a,
 +                           MVEGenScalarCmpFn *fn)
 +{
 +    TCGv_ptr qn;
 +    TCGv_i32 rm;
 +
 +    if (!dc_isar_feature(aa32_mve, s) || !fn || a->rm == 13) {
 +        return false;
 +    }
 +    if (!mve_eci_check(s) || !vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    qn = mve_qreg_ptr(a->qn);
 +    if (a->rm == 15) {
 +        /* Encoding Rm=0b1111 means "constant zero" */
 +        rm = tcg_constant_i32(0);
 +    } else {
 +        rm = load_reg(s, a->rm);
 +    }
 +    fn(cpu_env, qn, rm);
 +    tcg_temp_free_ptr(qn);
 +    tcg_temp_free_i32(rm);
 +    if (a->mask) {
 +        /* VPT */
 +        gen_vpst(s, a->mask);
 +    }
 +    mve_update_eci(s);
 +    return true;
 +}
 +
  #define DO_VCMP(INSN, FN)                                       \
      static bool trans_##INSN(DisasContext *s, arg_vcmp *a)      \
      {                                                           \
@@ -XXX,XX +XXX,XX @@ static bool do_vcmp(DisasContext *s, arg_vcmp *a, MVEGenCmpFn *fn)
              NULL,                                               \
          };                                                      \
          return do_vcmp(s, a, fns[a->size]);                     \
 +    }                                                           \
 +    static bool trans_##INSN##_scalar(DisasContext *s,          \
 +                                      arg_vcmp_scalar *a)       \
 +    {                                                           \
 +        static MVEGenScalarCmpFn * const fns[] = {              \
 +            gen_helper_mve_##FN##_scalarb,                      \
 +            gen_helper_mve_##FN##_scalarh,                      \
 +            gen_helper_mve_##FN##_scalarw,                      \
 +            NULL,                                               \
 +        };                                                      \
 +        return do_vcmp_scalar(s, a, fns[a->size]);              \
      }
- DO_VCMP(VCMPEQ, vcmpeq)
+@@ -XXX,XX +XXX,XX @@ void sve_stN_r_mte(CPUARMState *env, uint64_t *vg, target_ulong addr,
      desc = extract32(desc, 0, SIMD_DATA_SHIFT + SVE_MTEDESC_SHIFT);
      /* Perform gross MTE suppression early. */
 -    if (!tbi_check(desc, bit55) ||
 -        tcma_check(desc, bit55, allocation_tag_from_addr(addr))) {
 +    if (!tbi_check(mtedesc, bit55) ||
 +        tcma_check(mtedesc, bit55, allocation_tag_from_addr(addr))) {
          mtedesc = 0;
      }
 --
-.20.1
+.34.1

-[PULL 20/44] target/arm: Move 'x' and 'a' bit definitions into vmlaldav formats
+[PULL 08/35] hw/pci-host/raven.c: Mark raven_io_ops as implementing unaligned accesses
-All the users of the vmlaldav formats have an 'x' bit in bit 12 and an
+The raven_io_ops MemoryRegionOps is the only one in the source tree
-'a' bit in bit 5; move these to the format rather than specifying them
+which sets .valid.unaligned to indicate that it should support
-in each insn pattern.
+unaligned accesses and which does not also set .impl.unaligned to
 indicate that its read and write functions can do the unaligned
 handling themselves.  This is a problem, because at the moment the
 core memory system does not implement the support for handling
 unaligned accesses by doing a series of aligned accesses and
 combining them (system/memory.c:access_with_adjusted_size() has a
 TODO comment noting this).
+Fortunately raven_io_read() and raven_io_write() will correctly deal
+with the case of being passed an unaligned address, so we can fix the
+missing unaligned access support by setting .impl.unaligned in the
+MemoryRegionOps struct.
+Fixes: 9a1839164c9c8f06 ("raven: Implement non-contiguous I/O region")
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Tested-by: Cédric Le Goater <clg@redhat.com>
 Reviewed-by: Cédric Le Goater <clg@redhat.com>
 Message-id: 20240112134640.1775041-1-peter.maydell@linaro.org
 ---
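Reader's note on .valid.unaligned versus .impl.unaligned: the first says
the guest may issue unaligned accesses, the second says the device
callbacks can be handed an unaligned address directly. The sketch below is
a simplified stand-in, not the raven code and not the real memory core;
struct sketch_ops, sketch_io and the sketch_* functions are hypothetical
names. It shows a byte-wise read callback that copes with any address, and
a dispatch test that only lets unaligned accesses through when both flags
are set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the two flags discussed above. */
struct sketch_ops {
    bool valid_unaligned; /* guest may issue unaligned accesses */
    bool impl_unaligned;  /* callbacks accept unaligned addresses */
};

/* Hypothetical 16-byte little-endian I/O window. */
static const uint8_t sketch_io[16] = {
    0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77,
    0x88, 0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff,
};

/*
 * Byte-wise read, so alignment of addr does not matter.
 * The caller must ensure size <= 4 and addr + size <= 16.
 */
static uint32_t sketch_io_read(unsigned addr, unsigned size)
{
    uint32_t val = 0;

    for (unsigned i = 0; i < size; i++) {
        val |= (uint32_t)sketch_io[addr + i] << (8 * i);
    }
    return val;
}

/* True if an access can be passed straight to the callback. */
static bool sketch_can_dispatch(const struct sketch_ops *ops,
                                unsigned addr, unsigned size)
{
    bool aligned = (addr % size) == 0;

    return aligned || (ops->valid_unaligned && ops->impl_unaligned);
}
```

With only valid_unaligned set, a 4-byte access at offset 1 is not
dispatchable in this model; setting impl_unaligned as well lets it reach
sketch_io_read(), which assembles the value byte by byte.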
- target/arm/mve.decode | 16 ++++++++--------
+ hw/pci-host/raven.c | 1 +
-file changed, 8 insertions(+), 8 deletions(-)
+file changed, 1 insertion(+)
-diff --git a/target/arm/mve.decode b/target/arm/mve.decode
+diff --git a/hw/pci-host/raven.c b/hw/pci-host/raven.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/mve.decode
+--- a/hw/pci-host/raven.c
-+++ b/target/arm/mve.decode
++++ b/hw/pci-host/raven.c
-@@ -XXX,XX +XXX,XX @@ VDUP             1110 1110 1 0 10 ... 0 .... 1011 . 0 0 1 0000 @vdup size=2
+@@ -XXX,XX +XXX,XX @@ static const MemoryRegionOps raven_io_ops = {
+     .write = raven_io_write,
- &vmlaldav rdahi rdalo size qn qm x a
+     .endianness = DEVICE_LITTLE_ENDIAN,
+     .impl.max_access_size = 4,
--@vmlaldav        .... .... . ... ... . ... . .... .... qm:3 . \
++    .impl.unaligned = true,
-+@vmlaldav        .... .... . ... ... . ... x:1 .... .. a:1 . qm:3 . \
+     .valid.unaligned = true,
-                  qn=%qn rdahi=%rdahi rdalo=%rdalo size=%size_16 &vmlaldav
+ };
 -@vmlaldav_nosz   .... .... . ... ... . ... . .... .... qm:3 . \
 +@vmlaldav_nosz   .... .... . ... ... . ... x:1 .... .. a:1 . qm:3 . \
                   qn=%qn rdahi=%rdahi rdalo=%rdalo size=0 &vmlaldav
 -VMLALDAV_S       1110 1110 1 ... ... . ... x:1 1110 . 0 a:1 0 ... 0 @vmlaldav
 -VMLALDAV_U       1111 1110 1 ... ... . ... x:1 1110 . 0 a:1 0 ... 0 @vmlaldav
 +VMLALDAV_S       1110 1110 1 ... ... . ... . 1110 . 0 . 0 ... 0 @vmlaldav
 +VMLALDAV_U       1111 1110 1 ... ... . ... . 1110 . 0 . 0 ... 0 @vmlaldav
 -VMLSLDAV         1110 1110 1 ... ... . ... x:1 1110 . 0 a:1 0 ... 1 @vmlaldav
 +VMLSLDAV         1110 1110 1 ... ... . ... . 1110 . 0 . 0 ... 1 @vmlaldav
 -VRMLALDAVH_S     1110 1110 1 ... ... 0 ... x:1 1111 . 0 a:1 0 ... 0 @vmlaldav_nosz
 -VRMLALDAVH_U     1111 1110 1 ... ... 0 ... x:1 1111 . 0 a:1 0 ... 0 @vmlaldav_nosz
 +VRMLALDAVH_S     1110 1110 1 ... ... 0 ... . 1111 . 0 . 0 ... 0 @vmlaldav_nosz
 +VRMLALDAVH_U     1111 1110 1 ... ... 0 ... . 1111 . 0 . 0 ... 0 @vmlaldav_nosz
 -VRMLSLDAVH       1111 1110 1 ... ... 0 ... x:1 1110 . 0 a:1 0 ... 1 @vmlaldav_nosz
 +VRMLSLDAVH       1111 1110 1 ... ... 0 ... . 1110 . 0 . 0 ... 1 @vmlaldav_nosz
  # Scalar operations
 --
-.20.1
+.34.1

-[PULL 05/44] target/arm: Fix mask handling for MVE narrowing operations
+[PULL 09/35] hw/block/tc58128: Don't emit deprecation warning under qtest
-In the MVE helpers for the narrowing operations (DO_VSHRN and
+Suppress the deprecation warning when we're running under qtest,
-DO_VSHRN_SAT) we were using the wrong bits of the predicate mask for
+to avoid "make check" including warning messages in its output.
 the 'top' versions of the insn.  This is because the loop works over
 the double-sized input elements and shifts the predicate mask by that
 many bits each time, but when we write out the half-sized output we
 must look at the mask bits for whichever half of the element we are
 writing to.
 Correct this by shifting the whole mask right by ESIZE bits for the
 'top' insns.  This allows us also to simplify the saturation bit
 checking (where we had noticed that we needed to look at a different
 mask bit for the 'top' insn.)
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Message-id: 20240206154151.155620-1-peter.maydell@linaro.org
 ---
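Reader's note on the narrowing-operation mask handling described above: a
minimal model for the 16-bit to 8-bit case, assuming one predicate bit per
output byte and ignoring the host-endian H macros. sketch_narrow() and its
"fixed" switch are hypothetical, added only to compare the two behaviours
side by side.

```c
#include <stdint.h>

/*
 * Narrow eight 16-bit inputs into the bottom (top=0) or top (top=1)
 * byte of each 16-bit lane of d, honouring a per-byte predicate mask.
 */
static void sketch_narrow(uint8_t d[16], const uint16_t m[8],
                          uint16_t mask, int top, int fixed)
{
    const int esize = 1;  /* output element size in bytes */
    const int lesize = 2; /* input element size in bytes */

    if (fixed) {
        mask >>= esize * top; /* the correction described above */
    }
    for (int le = 0; le < 16 / lesize; le++, mask >>= lesize) {
        if (mask & 1) { /* a byte-sized merge looks only at bit 0 */
            d[le * 2 + top] = (uint8_t)m[le];
        }
    }
}
```

With mask 0x0002 (only byte 1, the top half of lane 0, is active) and
top=1, the unfixed loop tests bit 0 and skips the write, while the fixed
loop shifts the mask first and writes d[1].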
- target/arm/mve_helper.c | 4 +++-
+ hw/block/tc58128.c | 4 +++-
 file changed, 3 insertions(+), 1 deletion(-)
-diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
+diff --git a/hw/block/tc58128.c b/hw/block/tc58128.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/mve_helper.c
+--- a/hw/block/tc58128.c
-+++ b/target/arm/mve_helper.c
++++ b/hw/block/tc58128.c
-@@ -XXX,XX +XXX,XX @@ DO_VSHLL_ALL(vshllt, true)
+@@ -XXX,XX +XXX,XX @@ static sh7750_io_device tc58128 = {
-         TYPE *d = vd;                                           \
-         uint16_t mask = mve_element_mask(env);                  \
+ int tc58128_init(struct SH7750State *s, const char *zone1, const char *zone2)
-         unsigned le;                                            \
+ {
-+        mask >>= ESIZE * TOP;                                   \
+-    warn_report_once("The TC58128 flash device is deprecated");
-         for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) { \
++    if (!qtest_enabled()) {
-             TYPE r = FN(m[H##LESIZE(le)], shift);               \
++        warn_report_once("The TC58128 flash device is deprecated");
-             mergemask(&d[H##ESIZE(le * 2 + TOP)], r, mask);     \
++    }
-@@ -XXX,XX +XXX,XX @@ static inline int32_t do_sat_bhs(int64_t val, int64_t min, int64_t max,
+     init_dev(&tc58128_devs[0], zone1);
-         uint16_t mask = mve_element_mask(env);                  \
+     init_dev(&tc58128_devs[1], zone2);
-         bool qc = false;                                        \
+     return sh7750_register_io_device(s, &tc58128);
          unsigned le;                                            \
 +        mask >>= ESIZE * TOP;                                   \
          for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) { \
              bool sat = false;                                   \
              TYPE r = FN(m[H##LESIZE(le)], shift, &sat);         \
              mergemask(&d[H##ESIZE(le * 2 + TOP)], r, mask);     \
 -            qc |= sat && (mask & 1 << (TOP * ESIZE));           \
 +            qc |= sat & mask & 1;                               \
          }                                                       \
          if (qc) {                                               \
              env->vfp.qc[0] = qc;                                \
 --
-.20.1
+.34.1

-[PULL 29/44] target/arm: Implement MVE VMAXA, VMINA
+[PULL 10/35] tests/qtest/meson.build: Don't include qtests_npcm7xx in qtests_aarch64
-Implement the MVE VMAXA and VMINA insns, which take the absolute
+We deliberately don't include qtests_npcm7xx in qtests_aarch64,
-value of the signed elements in the input vector and then accumulate
+because we already get the coverage of those tests via qtests_arm,
-the unsigned max or min into the destination vector.
+and we don't want to use extra CI minutes testing them twice.
+In commit 327b680877b79c4b we added it to qtests_aarch64; revert
+that change.
+Fixes: 327b680877b79c4b ("tests/qtest: Creating qtest for GMAC Module")
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Message-id: 20240206163043.315535-1-peter.maydell@linaro.org
 ---
- target/arm/helper-mve.h    |  8 ++++++++
+ tests/qtest/meson.build | 1 -
- target/arm/mve.decode      |  4 ++++
+file changed, 1 deletion(-)
  target/arm/mve_helper.c    | 26 ++++++++++++++++++++++++++
  target/arm/translate-mve.c |  2 ++
 files changed, 40 insertions(+)
-diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
+diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper-mve.h
+--- a/tests/qtest/meson.build
-+++ b/target/arm/helper-mve.h
++++ b/tests/qtest/meson.build
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vqnegb, TCG_CALL_NO_WG, void, env, ptr, ptr)
+@@ -XXX,XX +XXX,XX @@ qtests_aarch64 = \
- DEF_HELPER_FLAGS_3(mve_vqnegh, TCG_CALL_NO_WG, void, env, ptr, ptr)
+   (config_all_devices.has_key('CONFIG_RASPI') ? ['bcm2835-dma-test'] : []) +  \
- DEF_HELPER_FLAGS_3(mve_vqnegw, TCG_CALL_NO_WG, void, env, ptr, ptr)
+   (config_all_accel.has_key('CONFIG_TCG') and                                            \
+    config_all_devices.has_key('CONFIG_TPM_TIS_I2C') ? ['tpm-tis-i2c-test'] : []) + \
-+DEF_HELPER_FLAGS_3(mve_vmaxab, TCG_CALL_NO_WG, void, env, ptr, ptr)
+-  (config_all_devices.has_key('CONFIG_NPCM7XX') ? qtests_npcm7xx : []) + \
-+DEF_HELPER_FLAGS_3(mve_vmaxah, TCG_CALL_NO_WG, void, env, ptr, ptr)
+   ['arm-cpu-features',
-+DEF_HELPER_FLAGS_3(mve_vmaxaw, TCG_CALL_NO_WG, void, env, ptr, ptr)
+    'numa-test',
-+
+    'boot-serial-test',
 +DEF_HELPER_FLAGS_3(mve_vminab, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +DEF_HELPER_FLAGS_3(mve_vminah, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +DEF_HELPER_FLAGS_3(mve_vminaw, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +
  DEF_HELPER_FLAGS_3(mve_vmovnbb, TCG_CALL_NO_WG, void, env, ptr, ptr)
  DEF_HELPER_FLAGS_3(mve_vmovnbh, TCG_CALL_NO_WG, void, env, ptr, ptr)
  DEF_HELPER_FLAGS_3(mve_vmovntb, TCG_CALL_NO_WG, void, env, ptr, ptr)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VMUL             1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
    VQMOVUNB       111 0 1110 0 . 11 .. 01 ... 0 1110 1 0 . 0 ... 1 @1op
    VQMOVN_BS      111 0 1110 0 . 11 .. 11 ... 0 1110 0 0 . 0 ... 1 @1op
 +  VMAXA          111 0 1110 0 . 11 .. 11 ... 0 1110 1 0 . 0 ... 1 @1op
 +
    VMULH_S        111 0 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
  }
@@ -XXX,XX +XXX,XX @@ VMUL             1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
    VQMOVUNT       111 0 1110 0 . 11 .. 01 ... 1 1110 1 0 . 0 ... 1 @1op
    VQMOVN_TS      111 0 1110 0 . 11 .. 11 ... 1 1110 0 0 . 0 ... 1 @1op
 +  VMINA          111 0 1110 0 . 11 .. 11 ... 1 1110 1 0 . 0 ... 1 @1op
 +
    VRMULH_S       111 0 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
  }
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_1OP_SAT(vqabsw, 4, int32_t, DO_VQABS_W)
  DO_1OP_SAT(vqnegb, 1, int8_t, DO_VQNEG_B)
  DO_1OP_SAT(vqnegh, 2, int16_t, DO_VQNEG_H)
  DO_1OP_SAT(vqnegw, 4, int32_t, DO_VQNEG_W)
 +
 +/*
 + * VMAXA, VMINA: vd is unsigned; vm is signed, and we take its
 + * absolute value; we then do an unsigned comparison.
 + */
 +#define DO_VMAXMINA(OP, ESIZE, STYPE, UTYPE, FN)                        \
 +    void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm)         \
 +    {                                                                   \
 +        UTYPE *d = vd;                                                  \
 +        STYPE *m = vm;                                                  \
 +        uint16_t mask = mve_element_mask(env);                          \
 +        unsigned e;                                                     \
 +        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
 +            UTYPE r = DO_ABS(m[H##ESIZE(e)]);                           \
 +            r = FN(d[H##ESIZE(e)], r);                                  \
 +            mergemask(&d[H##ESIZE(e)], r, mask);                        \
 +        }                                                               \
 +        mve_advance_vpt(env);                                           \
 +    }
 +
 +DO_VMAXMINA(vmaxab, 1, int8_t, uint8_t, DO_MAX)
 +DO_VMAXMINA(vmaxah, 2, int16_t, uint16_t, DO_MAX)
 +DO_VMAXMINA(vmaxaw, 4, int32_t, uint32_t, DO_MAX)
 +DO_VMAXMINA(vminab, 1, int8_t, uint8_t, DO_MIN)
 +DO_VMAXMINA(vminah, 2, int16_t, uint16_t, DO_MIN)
 +DO_VMAXMINA(vminaw, 4, int32_t, uint32_t, DO_MIN)
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_1OP(VABS, vabs)
  DO_1OP(VNEG, vneg)
  DO_1OP(VQABS, vqabs)
  DO_1OP(VQNEG, vqneg)
 +DO_1OP(VMAXA, vmaxa)
 +DO_1OP(VMINA, vmina)
  /* Narrowing moves: only size 0 and 1 are valid */
  #define DO_VMOVN(INSN, FN) \
 --
-.20.1
+.34.1

-[PULL 01/44] target/arm: Note that we handle VMOVL as a special case of VSHLL
+[PULL 11/35] tests/qtest/bios-tables-test: Allow changes to virt GTDT
-Although the architecture doesn't define it as an alias, VMOVL
+Allow changes to the virt GTDT -- we are going to add the IRQ
-(vector move long) is encoded as a VSHLL with a zero shift.
+entry for a new timer to it.
 Add a comment in the decode file noting that we handle VMOVL
 as part of VSHLL.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
 Message-id: 20240122143537.233498-2-peter.maydell@linaro.org
 ---
- target/arm/mve.decode | 2 ++
+ tests/qtest/bios-tables-test-allowed-diff.h | 2 ++
 file changed, 2 insertions(+)
-diff --git a/target/arm/mve.decode b/target/arm/mve.decode
+diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/mve.decode
+--- a/tests/qtest/bios-tables-test-allowed-diff.h
-+++ b/target/arm/mve.decode
++++ b/tests/qtest/bios-tables-test-allowed-diff.h
-@@ -XXX,XX +XXX,XX @@ VRSHRI_U          111 1 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_h
+@@ -1 +1,3 @@
- VRSHRI_U          111 1 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_w
+ /* List of comma-separated changed AML files to ignore */
++"tests/data/acpi/virt/FACP",
- # VSHLL T1 encoding; the T2 VSHLL encoding is elsewhere in this file
++"tests/data/acpi/virt/GTDT",
 +# Note that VMOVL is encoded as "VSHLL with a zero shift count"; we
 +# implement it that way rather than special-casing it in the decode.
  VSHLL_BS          111 0 1110 1 . 1 .. ... ... 0 1111 0 1 . 0 ... 0 @2_shll_b
  VSHLL_BS          111 0 1110 1 . 1 .. ... ... 0 1111 0 1 . 0 ... 0 @2_shll_h
 --
-.20.1
+.34.1

-[PULL 21/44] target/arm: Implement MVE integer min/max across vector
+[PULL 12/35] hw/arm/virt: Wire up non-secure EL2 virtual timer IRQ
-Implement the MVE integer min/max across vector insns
+Armv8.1+ CPUs have the Virtual Host Extension (VHE) which adds a
-VMAXV, VMINV, VMAXAV and VMINAV, which find the maximum
+non-secure EL2 virtual timer.  We implemented the timer itself in the
-from the vector elements and a general purpose register,
+CPU model, but never wired up its IRQ line to the GIC.
-and store the maximum back into the general purpose
-register.
+Wire up the IRQ line (this is always safe whether the CPU has the
+interrupt or not, since it always creates the outbound IRQ line).
-These insns overlap with VRMLALDAVH (they use what would
+Report it to the guest via dtb and ACPI if the CPU has the feature.
-be RdaHi=0b110).
 The DTB binding is documented in the kernel's
 Documentation/devicetree/bindings/timer/arm\,arch_timer.yaml
 and the ACPI table entries are documented in the ACPI specification
 version 6.3 or later.
 Because the IRQ line ACPI binding is new in 6.3, we need to bump the
 FADT table rev to show that we might be using 6.3 features.
 Note that exposing this IRQ in the DTB will trigger a bug in EDK2
 versions prior to edk2-stable202311, for users who use the virt board
 with 'virtualization=on' to enable EL2 emulation and are booting an
 EDK2 guest BIOS, if that EDK2 has assertions enabled.  The effect is
 that EDK2 will assert on bootup:
  ASSERT [ArmTimerDxe] /home/kraxel/projects/qemu/roms/edk2/ArmVirtPkg/Library/ArmVirtTimerFdtClientLib/ArmVirtTimerFdtClientLib.c(72): PropSize == 36 || PropSize == 48
 If you see that assertion you should do one of:
  * update your EDK2 binaries to edk2-stable202311 or newer
  * use the 'virt-8.2' versioned machine type
  * not use 'virtualization=on'
 (The versions shipped with QEMU itself have the fix.)
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
 Message-id: 20240122143537.233498-3-peter.maydell@linaro.org
 ---
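The two sizes in the EDK2 assertion quoted in the commit message, 36 and 48, correspond to three or four timer interrupts, given that each interrupt in the "interrupts" property is three 32-bit cells (type, number, flags), as in the qemu_fdt_setprop_cells() calls in this patch. A fifth interrupt makes the property 60 bytes, which the older check does not accept. A minimal C sketch of that arithmetic follows; the helper names are hypothetical and are not part of QEMU or EDK2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Each interrupt specifier in the /timer node is three cells:
 * interrupt type, interrupt number, flags.
 */
enum { CELLS_PER_IRQ = 3 };

/* Size in bytes of an "interrupts" property describing n_irqs interrupts. */
static size_t timer_interrupts_prop_size(unsigned n_irqs)
{
    assert(n_irqs >= 1 && n_irqs <= 5);   /* illustrative bound only */
    return (size_t)n_irqs * CELLS_PER_IRQ * sizeof(uint32_t);
}

/* Mirrors the condition in the quoted assertion text (hypothetical helper). */
static int old_check_accepts(size_t prop_size)
{
    return prop_size == 36 || prop_size == 48;
}
```

With the four interrupts emitted for virt-8.2 and earlier machine types, the property is 48 bytes; with ARCH_TIMER_NS_EL2_VIRT_IRQ added, it is 60 bytes.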
- target/arm/helper-mve.h    | 20 ++++++++++++
+ include/hw/arm/virt.h    |  2 ++
- target/arm/mve.decode      | 18 +++++++++--
+ hw/arm/virt-acpi-build.c | 20 ++++++++++----
- target/arm/mve_helper.c    | 66 ++++++++++++++++++++++++++++++++++++++
+ hw/arm/virt.c            | 60 ++++++++++++++++++++++++++++++++++------
- target/arm/translate-mve.c | 48 +++++++++++++++++++++++++++
+ 3 files changed, 67 insertions(+), 15 deletions(-)
- 4 files changed, 150 insertions(+), 2 deletions(-)
+diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
 diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper-mve.h
+--- a/include/hw/arm/virt.h
-+++ b/target/arm/helper-mve.h
++++ b/include/hw/arm/virt.h
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vaddvuh, TCG_CALL_NO_WG, i32, env, ptr, i32)
+@@ -XXX,XX +XXX,XX @@ struct VirtMachineClass {
- DEF_HELPER_FLAGS_3(mve_vaddvsw, TCG_CALL_NO_WG, i32, env, ptr, i32)
+     /* Machines < 6.2 have no support for describing cpu topology to guest */
- DEF_HELPER_FLAGS_3(mve_vaddvuw, TCG_CALL_NO_WG, i32, env, ptr, i32)
+     bool no_cpu_topology;
+     bool no_tcg_lpa2;
-+DEF_HELPER_FLAGS_3(mve_vmaxvsb, TCG_CALL_NO_WG, i32, env, ptr, i32)
++    bool no_ns_el2_virt_timer_irq;
-+DEF_HELPER_FLAGS_3(mve_vmaxvsh, TCG_CALL_NO_WG, i32, env, ptr, i32)
+ };
-+DEF_HELPER_FLAGS_3(mve_vmaxvsw, TCG_CALL_NO_WG, i32, env, ptr, i32)
-+DEF_HELPER_FLAGS_3(mve_vmaxvub, TCG_CALL_NO_WG, i32, env, ptr, i32)
+ struct VirtMachineState {
-+DEF_HELPER_FLAGS_3(mve_vmaxvuh, TCG_CALL_NO_WG, i32, env, ptr, i32)
+@@ -XXX,XX +XXX,XX @@ struct VirtMachineState {
-+DEF_HELPER_FLAGS_3(mve_vmaxvuw, TCG_CALL_NO_WG, i32, env, ptr, i32)
+     PCIBus *bus;
-+DEF_HELPER_FLAGS_3(mve_vmaxavb, TCG_CALL_NO_WG, i32, env, ptr, i32)
+     char *oem_id;
-+DEF_HELPER_FLAGS_3(mve_vmaxavh, TCG_CALL_NO_WG, i32, env, ptr, i32)
+     char *oem_table_id;
-+DEF_HELPER_FLAGS_3(mve_vmaxavw, TCG_CALL_NO_WG, i32, env, ptr, i32)
++    bool ns_el2_virt_timer_irq;
-+
+ };
-+DEF_HELPER_FLAGS_3(mve_vminvsb, TCG_CALL_NO_WG, i32, env, ptr, i32)
-+DEF_HELPER_FLAGS_3(mve_vminvsh, TCG_CALL_NO_WG, i32, env, ptr, i32)
+ #define VIRT_ECAM_ID(high) (high ? VIRT_HIGH_PCIE_ECAM : VIRT_PCIE_ECAM)
-+DEF_HELPER_FLAGS_3(mve_vminvsw, TCG_CALL_NO_WG, i32, env, ptr, i32)
+diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
 +DEF_HELPER_FLAGS_3(mve_vminvub, TCG_CALL_NO_WG, i32, env, ptr, i32)
 +DEF_HELPER_FLAGS_3(mve_vminvuh, TCG_CALL_NO_WG, i32, env, ptr, i32)
 +DEF_HELPER_FLAGS_3(mve_vminvuw, TCG_CALL_NO_WG, i32, env, ptr, i32)
 +DEF_HELPER_FLAGS_3(mve_vminavb, TCG_CALL_NO_WG, i32, env, ptr, i32)
 +DEF_HELPER_FLAGS_3(mve_vminavh, TCG_CALL_NO_WG, i32, env, ptr, i32)
 +DEF_HELPER_FLAGS_3(mve_vminavw, TCG_CALL_NO_WG, i32, env, ptr, i32)
 +
  DEF_HELPER_FLAGS_3(mve_vaddlv_s, TCG_CALL_NO_WG, i64, env, ptr, i64)
  DEF_HELPER_FLAGS_3(mve_vaddlv_u, TCG_CALL_NO_WG, i64, env, ptr, i64)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/mve.decode
+--- a/hw/arm/virt-acpi-build.c
-+++ b/target/arm/mve.decode
++++ b/hw/arm/virt-acpi-build.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
- &vcmp qm qn size mask
+ }
- &vcmp_scalar qn rm size mask
- &shl_scalar qda rm size
+ /*
-+&vmaxv qm rda size
+- * ACPI spec, Revision 5.1
+- * 5.2.24 Generic Timer Description Table (GTDT)
- @vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
++ * ACPI spec, Revision 6.5
- # Note that both Rn and Qd are 3 bits only (no D bit)
++ * 5.2.25 Generic Timer Description Table (GTDT)
-@@ -XXX,XX +XXX,XX @@
+  */
- @vcmp_scalar .... .... .. size:2 qn:3 . .... .... .... rm:4 &vcmp_scalar \
+ static void
-              mask=%mask_22_13
+ build_gtdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
+@@ -XXX,XX +XXX,XX @@ build_gtdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
-+@vmaxv .... .... .... size:2 .. rda:4 .... .... .... &vmaxv qm=%qm
+     uint32_t irqflags = vmc->claim_edge_triggered_timers ?
-+
+         1 : /* Interrupt is Edge triggered */
- # Vector loads and stores
+         0;  /* Interrupt is Level triggered  */
+-    AcpiTable table = { .sig = "GTDT", .rev = 2, .oem_id = vms->oem_id,
- # Widening loads and narrowing stores:
++    AcpiTable table = { .sig = "GTDT", .rev = 3, .oem_id = vms->oem_id,
-@@ -XXX,XX +XXX,XX @@ VMLALDAV_U       1111 1110 1 ... ... . ... . 1110 . 0 . 0 ... 0 @vmlaldav
+                         .oem_table_id = vms->oem_table_id };
- VMLSLDAV         1110 1110 1 ... ... . ... . 1110 . 0 . 0 ... 1 @vmlaldav
+     acpi_table_begin(&table, table_data);
+@@ -XXX,XX +XXX,XX @@ build_gtdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
--VRMLALDAVH_S     1110 1110 1 ... ... 0 ... . 1111 . 0 . 0 ... 0 @vmlaldav_nosz
+     build_append_int_noprefix(table_data, 0, 4);
--VRMLALDAVH_U     1111 1110 1 ... ... 0 ... . 1111 . 0 . 0 ... 0 @vmlaldav_nosz
+     /* Platform Timer Offset */
      build_append_int_noprefix(table_data, 0, 4);
 -
 +    if (vms->ns_el2_virt_timer_irq) {
 +        /* Virtual EL2 Timer GSIV */
 +        build_append_int_noprefix(table_data, ARCH_TIMER_NS_EL2_VIRT_IRQ, 4);
 +        /* Virtual EL2 Timer Flags */
 +        build_append_int_noprefix(table_data, irqflags, 4);
 +    } else {
 +        build_append_int_noprefix(table_data, 0, 4);
 +        build_append_int_noprefix(table_data, 0, 4);
 +    }
      acpi_table_end(linker, &table);
  }
@@ -XXX,XX +XXX,XX @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
  static void build_fadt_rev6(GArray *table_data, BIOSLinker *linker,
                              VirtMachineState *vms, unsigned dsdt_tbl_offset)
  {
 -    /* ACPI v6.0 */
 +    /* ACPI v6.3 */
      AcpiFadtData fadt = {
          .rev = 6,
 -        .minor_ver = 0,
 +        .minor_ver = 3,
          .flags = 1 << ACPI_FADT_F_HW_REDUCED_ACPI,
          .xdsdt_tbl_offset = &dsdt_tbl_offset,
      };
 diff --git a/hw/arm/virt.c b/hw/arm/virt.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/virt.c
 +++ b/hw/arm/virt.c
@@ -XXX,XX +XXX,XX @@ static void create_randomness(MachineState *ms, const char *node)
      qemu_fdt_setprop(ms->fdt, node, "rng-seed", seed.rng, sizeof(seed.rng));
  }
 +/*
 + * The CPU object always exposes the NS EL2 virt timer IRQ line,
 + * but we don't want to advertise it to the guest in the dtb or ACPI
 + * table unless it's really going to do something.
 + */
 +static bool ns_el2_virt_timer_present(void)
 +{
-+  VMAXV_S        1110 1110 1110  .. 10 ....  1111 0 0 . 0 ... 0 @vmaxv
++    ARMCPU *cpu = ARM_CPU(qemu_get_cpu(0));
-+  VMINV_S        1110 1110 1110  .. 10 ....  1111 1 0 . 0 ... 0 @vmaxv
++    CPUARMState *env = &cpu->env;
-+  VMAXAV         1110 1110 1110  .. 00 ....  1111 0 0 . 0 ... 0 @vmaxv
++
-+  VMINAV         1110 1110 1110  .. 00 ....  1111 1 0 . 0 ... 0 @vmaxv
++    return arm_feature(env, ARM_FEATURE_AARCH64) &&
-+  VRMLALDAVH_S   1110 1110 1 ... ... 0 ... . 1111 . 0 . 0 ... 0 @vmlaldav_nosz
++        arm_feature(env, ARM_FEATURE_EL2) && cpu_isar_feature(aa64_vh, cpu);
 +}
 +
-+{
+ static void create_fdt(VirtMachineState *vms)
-+  VMAXV_U        1111 1110 1110  .. 10 ....  1111 0 0 . 0 ... 0 @vmaxv
+ {
-+  VMINV_U        1111 1110 1110  .. 10 ....  1111 1 0 . 0 ... 0 @vmaxv
+     MachineState *ms = MACHINE(vms);
-+  VRMLALDAVH_U   1111 1110 1 ... ... 0 ... . 1111 . 0 . 0 ... 0 @vmlaldav_nosz
+@@ -XXX,XX +XXX,XX @@ static void fdt_add_timer_nodes(const VirtMachineState *vms)
-+}
+                                 "arm,armv7-timer");
+     }
- VRMLSLDAVH       1111 1110 1 ... ... 0 ... . 1110 . 0 . 0 ... 1 @vmlaldav_nosz
+     qemu_fdt_setprop(ms->fdt, "/timer", "always-on", NULL, 0);
+-    qemu_fdt_setprop_cells(ms->fdt, "/timer", "interrupts",
-diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
+-                           GIC_FDT_IRQ_TYPE_PPI,
-index XXXXXXX..XXXXXXX 100644
+-                           INTID_TO_PPI(ARCH_TIMER_S_EL1_IRQ), irqflags,
---- a/target/arm/mve_helper.c
+-                           GIC_FDT_IRQ_TYPE_PPI,
-+++ b/target/arm/mve_helper.c
+-                           INTID_TO_PPI(ARCH_TIMER_NS_EL1_IRQ), irqflags,
-@@ -XXX,XX +XXX,XX @@ DO_VADDV(vaddvub, 1, uint8_t)
+-                           GIC_FDT_IRQ_TYPE_PPI,
- DO_VADDV(vaddvuh, 2, uint16_t)
+-                           INTID_TO_PPI(ARCH_TIMER_VIRT_IRQ), irqflags,
- DO_VADDV(vaddvuw, 4, uint32_t)
+-                           GIC_FDT_IRQ_TYPE_PPI,
+-                           INTID_TO_PPI(ARCH_TIMER_NS_EL2_IRQ), irqflags);
-+/*
++    if (vms->ns_el2_virt_timer_irq) {
-+ * Vector max/min across vector. Unlike VADDV, we must
++        qemu_fdt_setprop_cells(ms->fdt, "/timer", "interrupts",
-+ * read ra as the element size, not its full width.
++                               GIC_FDT_IRQ_TYPE_PPI,
-+ * We work with int64_t internally for simplicity.
++                               INTID_TO_PPI(ARCH_TIMER_S_EL1_IRQ), irqflags,
-+ */
++                               GIC_FDT_IRQ_TYPE_PPI,
-+#define DO_VMAXMINV(OP, ESIZE, TYPE, RATYPE, FN)                \
++                               INTID_TO_PPI(ARCH_TIMER_NS_EL1_IRQ), irqflags,
-+    uint32_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vm, \
++                               GIC_FDT_IRQ_TYPE_PPI,
-+                                    uint32_t ra_in)             \
++                               INTID_TO_PPI(ARCH_TIMER_VIRT_IRQ), irqflags,
-+    {                                                           \
++                               GIC_FDT_IRQ_TYPE_PPI,
-+        uint16_t mask = mve_element_mask(env);                  \
++                               INTID_TO_PPI(ARCH_TIMER_NS_EL2_IRQ), irqflags,
-+        unsigned e;                                             \
++                               GIC_FDT_IRQ_TYPE_PPI,
-+        TYPE *m = vm;                                           \
++                               INTID_TO_PPI(ARCH_TIMER_NS_EL2_VIRT_IRQ), irqflags);
-+        int64_t ra = (RATYPE)ra_in;                             \
++    } else {
-+        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {      \
++        qemu_fdt_setprop_cells(ms->fdt, "/timer", "interrupts",
-+            if (mask & 1) {                                     \
++                               GIC_FDT_IRQ_TYPE_PPI,
-+                ra = FN(ra, m[H##ESIZE(e)]);                    \
++                               INTID_TO_PPI(ARCH_TIMER_S_EL1_IRQ), irqflags,
-+            }                                                   \
++                               GIC_FDT_IRQ_TYPE_PPI,
-+        }                                                       \
++                               INTID_TO_PPI(ARCH_TIMER_NS_EL1_IRQ), irqflags,
-+        mve_advance_vpt(env);                                   \
++                               GIC_FDT_IRQ_TYPE_PPI,
-+        return ra;                                              \
++                               INTID_TO_PPI(ARCH_TIMER_VIRT_IRQ), irqflags,
-+    }                                                           \
++                               GIC_FDT_IRQ_TYPE_PPI,
-+
++                               INTID_TO_PPI(ARCH_TIMER_NS_EL2_IRQ), irqflags);
 +#define DO_VMAXMINV_U(INSN, FN)                         \
 +    DO_VMAXMINV(INSN##b, 1, uint8_t, uint8_t, FN)       \
 +    DO_VMAXMINV(INSN##h, 2, uint16_t, uint16_t, FN)     \
 +    DO_VMAXMINV(INSN##w, 4, uint32_t, uint32_t, FN)
 +#define DO_VMAXMINV_S(INSN, FN)                         \
 +    DO_VMAXMINV(INSN##b, 1, int8_t, int8_t, FN)         \
 +    DO_VMAXMINV(INSN##h, 2, int16_t, int16_t, FN)       \
 +    DO_VMAXMINV(INSN##w, 4, int32_t, int32_t, FN)
 +
 +/*
 + * Helpers for max and min of absolute values across vector:
 + * note that we only take the absolute value of 'm', not 'n'
 + */
 +static int64_t do_maxa(int64_t n, int64_t m)
 +{
 +    if (m < 0) {
 +        m = -m;
 +    }
-+    return MAX(n, m);
+ }
-+}
-+
+ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
-+static int64_t do_mina(int64_t n, int64_t m)
+@@ -XXX,XX +XXX,XX @@ static void create_gic(VirtMachineState *vms, MemoryRegion *mem)
-+{
+             [GTIMER_VIRT] = ARCH_TIMER_VIRT_IRQ,
-+    if (m < 0) {
+             [GTIMER_HYP]  = ARCH_TIMER_NS_EL2_IRQ,
-+        m = -m;
+             [GTIMER_SEC]  = ARCH_TIMER_S_EL1_IRQ,
-+    }
++            [GTIMER_HYPVIRT] = ARCH_TIMER_NS_EL2_VIRT_IRQ,
-+    return MIN(n, m);
+         };
-+}
-+
+         for (unsigned irq = 0; irq < ARRAY_SIZE(timer_irq); irq++) {
-+DO_VMAXMINV_S(vmaxvs, DO_MAX)
+@@ -XXX,XX +XXX,XX @@ static void machvirt_init(MachineState *machine)
-+DO_VMAXMINV_U(vmaxvu, DO_MAX)
+         qdev_realize(DEVICE(cpuobj), NULL, &error_fatal);
-+DO_VMAXMINV_S(vminvs, DO_MIN)
+         object_unref(cpuobj);
-+DO_VMAXMINV_U(vminvu, DO_MIN)
+     }
-+/*
++
-+ * VMAXAV, VMINAV treat the general purpose input as unsigned
++    /* Now we've created the CPUs we can see if they have the hypvirt timer */
-+ * and the vector elements as signed.
++    vms->ns_el2_virt_timer_irq = ns_el2_virt_timer_present() &&
-+ */
++        !vmc->no_ns_el2_virt_timer_irq;
-+DO_VMAXMINV(vmaxavb, 1, int8_t, uint8_t, do_maxa)
++
-+DO_VMAXMINV(vmaxavh, 2, int16_t, uint16_t, do_maxa)
+     fdt_add_timer_nodes(vms);
-+DO_VMAXMINV(vmaxavw, 4, int32_t, uint32_t, do_maxa)
+     fdt_add_cpu_nodes(vms);
-+DO_VMAXMINV(vminavb, 1, int8_t, uint8_t, do_mina)
-+DO_VMAXMINV(vminavh, 2, int16_t, uint16_t, do_mina)
+@@ -XXX,XX +XXX,XX @@ DEFINE_VIRT_MACHINE_AS_LATEST(9, 0)
-+DO_VMAXMINV(vminavw, 4, int32_t, uint32_t, do_mina)
-+
+ static void virt_machine_8_2_options(MachineClass *mc)
- #define DO_VADDLV(OP, TYPE, LTYPE)                              \
+ {
-     uint64_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vm, \
++    VirtMachineClass *vmc = VIRT_MACHINE_CLASS(OBJECT_CLASS(mc));
-                                     uint64_t ra)                \
++
-diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
+     virt_machine_9_0_options(mc);
-index XXXXXXX..XXXXXXX 100644
+     compat_props_add(mc->compat_props, hw_compat_8_2, hw_compat_8_2_len);
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_VCMP(VCMPGE, vcmpge)
  DO_VCMP(VCMPLT, vcmplt)
  DO_VCMP(VCMPGT, vcmpgt)
  DO_VCMP(VCMPLE, vcmple)
 +
 +static bool do_vmaxv(DisasContext *s, arg_vmaxv *a, MVEGenVADDVFn fn)
 +{
 +    /*
-+     * MIN/MAX operations across a vector: compute the min or
++     * Don't expose NS_EL2_VIRT timer IRQ in DTB or ACPI on 8.2 and
-+     * max of the initial value in a general purpose register
++     * earlier machines. (Exposing it tickles a bug in older EDK2
-+     * and all the elements in the vector, and store it back
++     * guest BIOS binaries.)
 +     * into the general purpose register.
 +     */
-+    TCGv_ptr qm;
++    vmc->no_ns_el2_virt_timer_irq = true;
-+    TCGv_i32 rda;
+ }
-+
+ DEFINE_VIRT_MACHINE(8, 2)
-+    if (!dc_isar_feature(aa32_mve, s) || !mve_check_qreg_bank(s, a->qm) ||
 +        !fn || a->rda == 13 || a->rda == 15) {
 +        /* Rda cases are UNPREDICTABLE */
 +        return false;
 +    }
 +    if (!mve_eci_check(s) || !vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    qm = mve_qreg_ptr(a->qm);
 +    rda = load_reg(s, a->rda);
 +    fn(rda, cpu_env, qm, rda);
 +    store_reg(s, a->rda, rda);
 +    tcg_temp_free_ptr(qm);
 +    mve_update_eci(s);
 +    return true;
 +}
 +
 +#define DO_VMAXV(INSN, FN)                                      \
 +    static bool trans_##INSN(DisasContext *s, arg_vmaxv *a)     \
 +    {                                                           \
 +        static MVEGenVADDVFn * const fns[] = {                  \
 +            gen_helper_mve_##FN##b,                             \
 +            gen_helper_mve_##FN##h,                             \
 +            gen_helper_mve_##FN##w,                             \
 +            NULL,                                               \
 +        };                                                      \
 +        return do_vmaxv(s, a, fns[a->size]);                    \
 +    }
 +
 +DO_VMAXV(VMAXV_S, vmaxvs)
 +DO_VMAXV(VMAXV_U, vmaxvu)
 +DO_VMAXV(VMAXAV, vmaxav)
 +DO_VMAXV(VMINV_S, vminvs)
 +DO_VMAXV(VMINV_U, vminvu)
 +DO_VMAXV(VMINAV, vminav)
 --
-.20.1
+.34.1

-[PULL 28/44] target/arm: Implement MVE VQABS, VQNEG
+[PULL 13/35] tests/qtest/bios-tables-tests: Update virt golden reference
-Implement the MVE 1-operand saturating operations VQABS and VQNEG.
+Update the virt golden reference files to say that the FACP is ACPI
 v6.3, and the GTDT table is a revision 3 table with space for the
 virtual EL2 timer.
 Diffs from iasl:
@@ -XXX,XX +XXX,XX @@
  /*
   * Intel ACPI Component Architecture
   * AML/ASL+ Disassembler version 20200925 (64-bit version)
   * Copyright (c) 2000 - 2020 Intel Corporation
   *
 - * Disassembly of tests/data/acpi/virt/FACP, Mon Jan 22 13:48:40 2024
 + * Disassembly of /tmp/aml-W8RZH2, Mon Jan 22 13:48:40 2024
   *
   * ACPI Data Table [FACP]
   *
   * Format: [HexOffset DecimalOffset ByteLength]  FieldName : FieldValue
   */
  [000h 0000   4]                    Signature : "FACP"    [Fixed ACPI Description Table (FADT)]
  [004h 0004   4]                 Table Length : 00000114
  [008h 0008   1]                     Revision : 06
 -[009h 0009   1]                     Checksum : 15
 +[009h 0009   1]                     Checksum : 12
  [00Ah 0010   6]                       Oem ID : "BOCHS "
  [010h 0016   8]                 Oem Table ID : "BXPC    "
  [018h 0024   4]                 Oem Revision : 00000001
  [01Ch 0028   4]              Asl Compiler ID : "BXPC"
  [020h 0032   4]        Asl Compiler Revision : 00000001
  [024h 0036   4]                 FACS Address : 00000000
  [028h 0040   4]                 DSDT Address : 00000000
  [02Ch 0044   1]                        Model : 00
  [02Dh 0045   1]                   PM Profile : 00 [Unspecified]
  [02Eh 0046   2]                SCI Interrupt : 0000
  [030h 0048   4]             SMI Command Port : 00000000
  [034h 0052   1]            ACPI Enable Value : 00
  [035h 0053   1]           ACPI Disable Value : 00
  [036h 0054   1]               S4BIOS Command : 00
  [037h 0055   1]              P-State Control : 00
@@ -XXX,XX +XXX,XX @@
       Use APIC Physical Destination Mode (V4) : 0
                         Hardware Reduced (V5) : 1
                        Low Power S0 Idle (V5) : 0
  [074h 0116  12]               Reset Register : [Generic Address Structure]
  [074h 0116   1]                     Space ID : 00 [SystemMemory]
  [075h 0117   1]                    Bit Width : 00
  [076h 0118   1]                   Bit Offset : 00
  [077h 0119   1]         Encoded Access Width : 00 [Undefined/Legacy]
  [078h 0120   8]                      Address : 0000000000000000
  [080h 0128   1]         Value to cause reset : 00
  [081h 0129   2]    ARM Flags (decoded below) : 0003
                                PSCI Compliant : 1
                         Must use HVC for PSCI : 1
 -[083h 0131   1]          FADT Minor Revision : 00
 +[083h 0131   1]          FADT Minor Revision : 03
  [084h 0132   8]                 FACS Address : 0000000000000000
  [08Ch 0140   8]                 DSDT Address : 0000000000000000
  [094h 0148  12]             PM1A Event Block : [Generic Address Structure]
  [094h 0148   1]                     Space ID : 00 [SystemMemory]
  [095h 0149   1]                    Bit Width : 00
  [096h 0150   1]                   Bit Offset : 00
  [097h 0151   1]         Encoded Access Width : 00 [Undefined/Legacy]
  [098h 0152   8]                      Address : 0000000000000000
  [0A0h 0160  12]             PM1B Event Block : [Generic Address Structure]
  [0A0h 0160   1]                     Space ID : 00 [SystemMemory]
  [0A1h 0161   1]                    Bit Width : 00
  [0A2h 0162   1]                   Bit Offset : 00
  [0A3h 0163   1]         Encoded Access Width : 00 [Undefined/Legacy]
  [0A4h 0164   8]                      Address : 0000000000000000
@@ -XXX,XX +XXX,XX @@
  [0F5h 0245   1]                    Bit Width : 00
  [0F6h 0246   1]                   Bit Offset : 00
  [0F7h 0247   1]         Encoded Access Width : 00 [Undefined/Legacy]
  [0F8h 0248   8]                      Address : 0000000000000000
  [100h 0256  12]        Sleep Status Register : [Generic Address Structure]
  [100h 0256   1]                     Space ID : 00 [SystemMemory]
  [101h 0257   1]                    Bit Width : 00
  [102h 0258   1]                   Bit Offset : 00
  [103h 0259   1]         Encoded Access Width : 00 [Undefined/Legacy]
  [104h 0260   8]                      Address : 0000000000000000
  [10Ch 0268   8]                Hypervisor ID : 00000000554D4551
  Raw Table Data: Length 276 (0x114)
 -    0000: 46 41 43 50 14 01 00 00 06 15 42 4F 43 48 53 20  // FACP......BOCHS
 +    0000: 46 41 43 50 14 01 00 00 06 12 42 4F 43 48 53 20  // FACP......BOCHS
      0010: 42 58 50 43 20 20 20 20 01 00 00 00 42 58 50 43  // BXPC    ....BXPC
      0020: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // ................
      0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // ................
      0040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // ................
      0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // ................
      0060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // ................
      0070: 00 00 10 00 00 00 00 00 00 00 00 00 00 00 00 00  // ................
 -    0080: 00 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // ................
 +    0080: 00 03 00 03 00 00 00 00 00 00 00 00 00 00 00 00  // ................
      0090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // ................
      00A0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // ................
      00B0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // ................
      00C0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // ................
      00D0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // ................
      00E0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // ................
      00F0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // ................
      0100: 00 00 00 00 00 00 00 00 00 00 00 00 51 45 4D 55  // ............QEMU
      0110: 00 00 00 00                                      // ....
@@ -XXX,XX +XXX,XX @@
  /*
   * Intel ACPI Component Architecture
   * AML/ASL+ Disassembler version 20200925 (64-bit version)
   * Copyright (c) 2000 - 2020 Intel Corporation
   *
 - * Disassembly of tests/data/acpi/virt/GTDT, Mon Jan 22 13:48:40 2024
 + * Disassembly of /tmp/aml-XDSZH2, Mon Jan 22 13:48:40 2024
   *
   * ACPI Data Table [GTDT]
   *
   * Format: [HexOffset DecimalOffset ByteLength]  FieldName : FieldValue
   */
  [000h 0000   4]                    Signature : "GTDT"    [Generic Timer Description Table]
 -[004h 0004   4]                 Table Length : 00000060
 -[008h 0008   1]                     Revision : 02
 -[009h 0009   1]                     Checksum : 9C
 +[004h 0004   4]                 Table Length : 00000068
 +[008h 0008   1]                     Revision : 03
 +[009h 0009   1]                     Checksum : 93
  [00Ah 0010   6]                       Oem ID : "BOCHS "
  [010h 0016   8]                 Oem Table ID : "BXPC    "
  [018h 0024   4]                 Oem Revision : 00000001
  [01Ch 0028   4]              Asl Compiler ID : "BXPC"
  [020h 0032   4]        Asl Compiler Revision : 00000001
  [024h 0036   8]        Counter Block Address : FFFFFFFFFFFFFFFF
  [02Ch 0044   4]                     Reserved : 00000000
  [030h 0048   4]         Secure EL1 Interrupt : 0000001D
  [034h 0052   4]    EL1 Flags (decoded below) : 00000000
                                  Trigger Mode : 0
                                      Polarity : 0
                                     Always On : 0
  [038h 0056   4]     Non-Secure EL1 Interrupt : 0000001E
@@ -XXX,XX +XXX,XX @@
  [040h 0064   4]      Virtual Timer Interrupt : 0000001B
  [044h 0068   4]     VT Flags (decoded below) : 00000000
                                  Trigger Mode : 0
                                      Polarity : 0
                                     Always On : 0
  [048h 0072   4]     Non-Secure EL2 Interrupt : 0000001A
  [04Ch 0076   4]   NEL2 Flags (decoded below) : 00000000
                                  Trigger Mode : 0
                                      Polarity : 0
                                     Always On : 0
  [050h 0080   8]   Counter Read Block Address : FFFFFFFFFFFFFFFF
  [058h 0088   4]         Platform Timer Count : 00000000
  [05Ch 0092   4]        Platform Timer Offset : 00000000
 +[060h 0096   4]       Virtual EL2 Timer GSIV : 00000000
 +[064h 0100   4]      Virtual EL2 Timer Flags : 00000000
 -Raw Table Data: Length 96 (0x60)
 +Raw Table Data: Length 104 (0x68)
 -    0000: 47 54 44 54 60 00 00 00 02 9C 42 4F 43 48 53 20  // GTDT`.....BOCHS
 +    0000: 47 54 44 54 68 00 00 00 03 93 42 4F 43 48 53 20  // GTDTh.....BOCHS
      0010: 42 58 50 43 20 20 20 20 01 00 00 00 42 58 50 43  // BXPC    ....BXPC
      0020: 01 00 00 00 FF FF FF FF FF FF FF FF 00 00 00 00  // ................
      0030: 1D 00 00 00 00 00 00 00 1E 00 00 00 04 00 00 00  // ................
      0040: 1B 00 00 00 00 00 00 00 1A 00 00 00 00 00 00 00  // ................
      0050: FF FF FF FF FF FF FF FF 00 00 00 00 00 00 00 00  // ................
 +    0060: 00 00 00 00 00 00 00 00                          // ........
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
 Message-id: 20240122143537.233498-4-peter.maydell@linaro.org
 ---
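The checksum changes in the iasl diffs follow from the ACPI rule that all bytes of a table sum to zero modulo 256: when the other bytes grow by some total, the checksum byte drops by the same amount. A minimal C sketch of that arithmetic follows, with the table bytes copied from the GTDT raw data in the commit message; the function and array names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum of all bytes modulo 256; a table with a valid checksum sums to 0. */
static uint8_t acpi_byte_sum(const uint8_t *table, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum = (uint8_t)(sum + table[i]);
    }
    return sum;
}

/* Checksum byte needed after the other bytes grow by byte_delta in total. */
static uint8_t checksum_after_change(uint8_t old_checksum, unsigned byte_delta)
{
    return (uint8_t)(old_checksum - byte_delta);
}

/* The updated 104-byte GTDT, copied from the raw table data above. */
static const uint8_t gtdt_rev3[104] = {
    0x47, 0x54, 0x44, 0x54, 0x68, 0x00, 0x00, 0x00,
    0x03, 0x93, 0x42, 0x4F, 0x43, 0x48, 0x53, 0x20,
    0x42, 0x58, 0x50, 0x43, 0x20, 0x20, 0x20, 0x20,
    0x01, 0x00, 0x00, 0x00, 0x42, 0x58, 0x50, 0x43,
    0x01, 0x00, 0x00, 0x00, 0xFF, 0xFF, 0xFF, 0xFF,
    0xFF, 0xFF, 0xFF, 0xFF, 0x00, 0x00, 0x00, 0x00,
    0x1D, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x1E, 0x00, 0x00, 0x00, 0x04, 0x00, 0x00, 0x00,
    0x1B, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x1A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
};
```

For the FACP, the minor revision goes from 00 to 03, so the checksum goes from 0x15 to 0x12. For the GTDT, the length goes from 0x60 to 0x68 and the revision from 02 to 03, a total increase of 9, so the checksum goes from 0x9C to 0x93.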
- target/arm/helper-mve.h    |  8 ++++++++
+ tests/qtest/bios-tables-test-allowed-diff.h |   2 --
- target/arm/mve.decode      |  3 +++
+ tests/data/acpi/virt/FACP                   | Bin 276 -> 276 bytes
- target/arm/mve_helper.c    | 37 +++++++++++++++++++++++++++++++++++++
+ tests/data/acpi/virt/GTDT                   | Bin 96 -> 104 bytes
- target/arm/translate-mve.c |  2 ++
+ 3 files changed, 2 deletions(-)
- 4 files changed, 50 insertions(+)
+diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
 diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper-mve.h
+--- a/tests/qtest/bios-tables-test-allowed-diff.h
-+++ b/target/arm/helper-mve.h
++++ b/tests/qtest/bios-tables-test-allowed-diff.h
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vnegw, TCG_CALL_NO_WG, void, env, ptr, ptr)
+@@ -1,3 +1 @@
- DEF_HELPER_FLAGS_3(mve_vfnegh, TCG_CALL_NO_WG, void, env, ptr, ptr)
+ /* List of comma-separated changed AML files to ignore */
- DEF_HELPER_FLAGS_3(mve_vfnegs, TCG_CALL_NO_WG, void, env, ptr, ptr)
+-"tests/data/acpi/virt/FACP",
+-"tests/data/acpi/virt/GTDT",
-+DEF_HELPER_FLAGS_3(mve_vqabsb, TCG_CALL_NO_WG, void, env, ptr, ptr)
+diff --git a/tests/data/acpi/virt/FACP b/tests/data/acpi/virt/FACP
 +DEF_HELPER_FLAGS_3(mve_vqabsh, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +DEF_HELPER_FLAGS_3(mve_vqabsw, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +
 +DEF_HELPER_FLAGS_3(mve_vqnegb, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +DEF_HELPER_FLAGS_3(mve_vqnegh, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +DEF_HELPER_FLAGS_3(mve_vqnegw, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +
  DEF_HELPER_FLAGS_3(mve_vmovnbb, TCG_CALL_NO_WG, void, env, ptr, ptr)
  DEF_HELPER_FLAGS_3(mve_vmovnbh, TCG_CALL_NO_WG, void, env, ptr, ptr)
  DEF_HELPER_FLAGS_3(mve_vmovntb, TCG_CALL_NO_WG, void, env, ptr, ptr)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/mve.decode
+GIT binary patch
-+++ b/target/arm/mve.decode
+delta 25
-@@ -XXX,XX +XXX,XX @@ VABS_fp          1111 1111 1 . 11 .. 01 ... 0 0111 01 . 0 ... 0 @1op
+gcmbQjG=+)F&CxkPgpq-PO=u!l<;2F$$vli407<0<)c^nh
- VNEG             1111 1111 1 . 11 .. 01 ... 0 0011 11 . 0 ... 0 @1op
- VNEG_fp          1111 1111 1 . 11 .. 01 ... 0 0111 11 . 0 ... 0 @1op
+delta 28
+kcmbQjG=+)F&CxkPgpq-PO>`nx<-|!<6Akz$^DuG%0AAS!ssI20
-+VQABS            1111 1111 1 . 11 .. 00 ... 0 0111 01 . 0 ... 0 @1op
-+VQNEG            1111 1111 1 . 11 .. 00 ... 0 0111 11 . 0 ... 0 @1op
+diff --git a/tests/data/acpi/virt/GTDT b/tests/data/acpi/virt/GTDT
 +
  &vdup qd rt size
  # Qd is in the fields usually named Qn
  @vdup            .... .... . . .. ... . rt:4 .... . . . . .... qd=%qn &vdup
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/mve_helper.c
+GIT binary patch
-+++ b/target/arm/mve_helper.c
+delta 25
-@@ -XXX,XX +XXX,XX @@ void HELPER(mve_vpsel)(CPUARMState *env, void *vd, void *vn, void *vm)
+bcmYeu;BpUf3CUn!U|^m+kt>V?$N&QXMtB4L
-     }
-     mve_advance_vpt(env);
+delta 16
- }
+Xcmc~u;BpUf2}xjJU|^avkt+-UB60)u
-+
 +#define DO_1OP_SAT(OP, ESIZE, TYPE, FN)                                 \
 +    void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm)         \
 +    {                                                                   \
 +        TYPE *d = vd, *m = vm;                                          \
 +        uint16_t mask = mve_element_mask(env);                          \
 +        unsigned e;                                                     \
 +        bool qc = false;                                                \
 +        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
 +            bool sat = false;                                           \
 +            mergemask(&d[H##ESIZE(e)], FN(m[H##ESIZE(e)], &sat), mask); \
 +            qc |= sat & mask & 1;                                       \
 +        }                                                               \
 +        if (qc) {                                                       \
 +            env->vfp.qc[0] = qc;                                        \
 +        }                                                               \
 +        mve_advance_vpt(env);                                           \
 +    }
 +
 +#define DO_VQABS_B(N, SATP) \
 +    do_sat_bhs(DO_ABS((int64_t)N), INT8_MIN, INT8_MAX, SATP)
 +#define DO_VQABS_H(N, SATP) \
 +    do_sat_bhs(DO_ABS((int64_t)N), INT16_MIN, INT16_MAX, SATP)
 +#define DO_VQABS_W(N, SATP) \
 +    do_sat_bhs(DO_ABS((int64_t)N), INT32_MIN, INT32_MAX, SATP)
 +
 +#define DO_VQNEG_B(N, SATP) do_sat_bhs(-(int64_t)N, INT8_MIN, INT8_MAX, SATP)
 +#define DO_VQNEG_H(N, SATP) do_sat_bhs(-(int64_t)N, INT16_MIN, INT16_MAX, SATP)
 +#define DO_VQNEG_W(N, SATP) do_sat_bhs(-(int64_t)N, INT32_MIN, INT32_MAX, SATP)
 +
 +DO_1OP_SAT(vqabsb, 1, int8_t, DO_VQABS_B)
 +DO_1OP_SAT(vqabsh, 2, int16_t, DO_VQABS_H)
 +DO_1OP_SAT(vqabsw, 4, int32_t, DO_VQABS_W)
 +
 +DO_1OP_SAT(vqnegb, 1, int8_t, DO_VQNEG_B)
 +DO_1OP_SAT(vqnegh, 2, int16_t, DO_VQNEG_H)
 +DO_1OP_SAT(vqnegw, 4, int32_t, DO_VQNEG_W)
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_1OP(VCLZ, vclz)
  DO_1OP(VCLS, vcls)
  DO_1OP(VABS, vabs)
  DO_1OP(VNEG, vneg)
 +DO_1OP(VQABS, vqabs)
 +DO_1OP(VQNEG, vqneg)
  /* Narrowing moves: only size 0 and 1 are valid */
  #define DO_VMOVN(INSN, FN) \
 --
-.20.1
+.34.1
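Reviewer's sketch, not part of either series: the DO_VQABS_* and DO_VQNEG_* macros in the MVE patch above widen the element to int64_t, apply the operation, then clamp to the element range and report saturation. A minimal standalone model of the 8-bit case might look like the following. The names qabs8 and qneg8 are hypothetical, the clamp is written inline rather than calling do_sat_bhs, and the predication and QC handling done by DO_1OP_SAT is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of one 8-bit VQABS element: widen, take the
 * absolute value, clamp to INT8_MAX. *sat is set only when clamping
 * happened; it is never cleared here, mirroring a sticky flag.
 */
static int8_t qabs8(int8_t n, bool *sat)
{
    int64_t r = n < 0 ? -(int64_t)n : (int64_t)n;

    if (r > INT8_MAX) {
        *sat = true;
        return INT8_MAX;
    }
    return (int8_t)r;
}

/* Hypothetical model of one 8-bit VQNEG element: widen, negate, clamp. */
static int8_t qneg8(int8_t n, bool *sat)
{
    int64_t r = -(int64_t)n;

    if (r > INT8_MAX) {
        *sat = true;
        return INT8_MAX;
    }
    return (int8_t)r;
}
```

The only input that saturates in either function is INT8_MIN, whose magnitude does not fit in int8_t; that is why the helpers widen before operating.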

-[PULL 27/44] target/arm: Implement MVE saturating doubling multiply accumulates
+[PULL 14/35] hw/arm/npcm7xx: Call qemu_configure_nic_device() for GMAC modules
-Implement the MVE saturating doubling multiply accumulate insns
+The patchset adding the GMAC ethernet to this SoC crossed in the
-VQDMLAH, VQRDMLAH, VQDMLASH and VQRDMLASH.  These perform a multiply,
+mail with the patchset cleaning up the NIC handling. When we
-double, add the accumulator shifted by the element size, possibly
+create the GMAC modules we must call qemu_configure_nic_device()
-round, saturate to twice the element size, then take the high half of
+so that the user has the opportunity to use the -nic commandline
-the result.  The *MLAH insns do vector * scalar + vector, and the
+option to create a network backend and connect it to the GMACs.
 *MLASH insns do vector * vector + scalar.
+Add the missing call.
+Fixes: 21e5326a7c ("hw/arm: Add GMAC devices to NPCM7XX SoC")
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
 Message-id: 20240206171231.396392-2-peter.maydell@linaro.org
 ---
- target/arm/helper-mve.h    | 16 +++++++
+ hw/arm/npcm7xx.c | 1 +
- target/arm/mve.decode      |  5 ++
+file changed, 1 insertion(+)
  target/arm/mve_helper.c    | 95 ++++++++++++++++++++++++++++++++++++++
  target/arm/translate-mve.c |  4 ++
 files changed, 120 insertions(+)
-diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
+diff --git a/hw/arm/npcm7xx.c b/hw/arm/npcm7xx.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper-mve.h
+--- a/hw/arm/npcm7xx.c
-+++ b/target/arm/helper-mve.h
++++ b/hw/arm/npcm7xx.c
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vmlasb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+@@ -XXX,XX +XXX,XX @@ static void npcm7xx_realize(DeviceState *dev, Error **errp)
- DEF_HELPER_FLAGS_4(mve_vmlash, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+     for (i = 0; i < ARRAY_SIZE(s->gmac); i++) {
- DEF_HELPER_FLAGS_4(mve_vmlasw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+         SysBusDevice *sbd = SYS_BUS_DEVICE(&s->gmac[i]);
-+DEF_HELPER_FLAGS_4(mve_vqdmlahb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
++        qemu_configure_nic_device(DEVICE(sbd), false, NULL);
-+DEF_HELPER_FLAGS_4(mve_vqdmlahh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+         /*
-+DEF_HELPER_FLAGS_4(mve_vqdmlahw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+          * The device exists regardless of whether it's connected to a QEMU
-+
+          * netdev backend. So always instantiate it even if there is no
 +DEF_HELPER_FLAGS_4(mve_vqrdmlahb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqrdmlahh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqrdmlahw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +
 +DEF_HELPER_FLAGS_4(mve_vqdmlashb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqdmlashh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqdmlashw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +
 +DEF_HELPER_FLAGS_4(mve_vqrdmlashb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqrdmlashh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqrdmlashw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +
  DEF_HELPER_FLAGS_4(mve_vmlaldavsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
  DEF_HELPER_FLAGS_4(mve_vmlaldavsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
  DEF_HELPER_FLAGS_4(mve_vmlaldavxsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VQRDMULH_scalar  1111 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
  VMLA             111- 1110 0 . .. ... 1 ... 0 1110 . 100 .... @2scalar
  VMLAS            111- 1110 0 . .. ... 1 ... 1 1110 . 100 .... @2scalar
 +VQRDMLAH         1110 1110 0 . .. ... 0 ... 0 1110 . 100 .... @2scalar
 +VQRDMLASH        1110 1110 0 . .. ... 0 ... 1 1110 . 100 .... @2scalar
 +VQDMLAH          1110 1110 0 . .. ... 0 ... 0 1110 . 110 .... @2scalar
 +VQDMLASH         1110 1110 0 . .. ... 0 ... 1 1110 . 110 .... @2scalar
 +
  # Vector add across vector
  {
    VADDV          111 u:1 1110 1111 size:2 01 ... 0 1111 0 0 a:1 0 qm:3 0 rda=%rdalo
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_VQDMLADH_OP(vqrdmlsdhxw, 4, int32_t, 1, 1, do_vqdmlsdh_w)
          mve_advance_vpt(env);                                           \
      }
 +#define DO_2OP_SAT_ACC_SCALAR(OP, ESIZE, TYPE, FN)                      \
 +    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn,   \
 +                                uint32_t rm)                            \
 +    {                                                                   \
 +        TYPE *d = vd, *n = vn;                                          \
 +        TYPE m = rm;                                                    \
 +        uint16_t mask = mve_element_mask(env);                          \
 +        unsigned e;                                                     \
 +        bool qc = false;                                                \
 +        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
 +            bool sat = false;                                           \
 +            mergemask(&d[H##ESIZE(e)],                                  \
 +                      FN(d[H##ESIZE(e)], n[H##ESIZE(e)], m, &sat),      \
 +                      mask);                                            \
 +            qc |= sat & mask & 1;                                       \
 +        }                                                               \
 +        if (qc) {                                                       \
 +            env->vfp.qc[0] = qc;                                        \
 +        }                                                               \
 +        mve_advance_vpt(env);                                           \
 +    }
 +
  /* provide unsigned 2-op scalar helpers for all sizes */
  #define DO_2OP_SCALAR_U(OP, FN)                 \
      DO_2OP_SCALAR(OP##b, 1, uint8_t, FN)        \
@@ -XXX,XX +XXX,XX @@ DO_2OP_SAT_SCALAR(vqrdmulh_scalarb, 1, int8_t, DO_QRDMULH_B)
  DO_2OP_SAT_SCALAR(vqrdmulh_scalarh, 2, int16_t, DO_QRDMULH_H)
  DO_2OP_SAT_SCALAR(vqrdmulh_scalarw, 4, int32_t, DO_QRDMULH_W)
 +static int8_t do_vqdmlah_b(int8_t a, int8_t b, int8_t c, int round, bool *sat)
 +{
 +    int64_t r = (int64_t)a * b * 2 + ((int64_t)c << 8) + (round << 7);
 +    return do_sat_bhw(r, INT16_MIN, INT16_MAX, sat) >> 8;
 +}
 +
 +static int16_t do_vqdmlah_h(int16_t a, int16_t b, int16_t c,
 +                           int round, bool *sat)
 +{
 +    int64_t r = (int64_t)a * b * 2 + ((int64_t)c << 16) + (round << 15);
 +    return do_sat_bhw(r, INT32_MIN, INT32_MAX, sat) >> 16;
 +}
 +
 +static int32_t do_vqdmlah_w(int32_t a, int32_t b, int32_t c,
 +                            int round, bool *sat)
 +{
 +    /*
 +     * Architecturally we should do the entire add, double, round
 +     * and then check for saturation. We do three saturating adds,
 +     * but we need to be careful about the order. If the first
 +     * m1 + m2 saturates then it's impossible for the *2+rc to
 +     * bring it back into the non-saturated range. However, if
 +     * m1 + m2 is negative then it's possible that doing the doubling
 +     * would take the intermediate result below INT64_MIN and the
 +     * addition of the rounding constant then brings it back in range.
 +     * So we add half the rounding constant and half the "c << esize"
 +     * before doubling rather than adding the rounding constant after
 +     * the doubling.
 +     */
 +    int64_t m1 = (int64_t)a * b;
 +    int64_t m2 = (int64_t)c << 31;
 +    int64_t r;
 +    if (sadd64_overflow(m1, m2, &r) ||
 +        sadd64_overflow(r, (round << 30), &r) ||
 +        sadd64_overflow(r, r, &r)) {
 +        *sat = true;
 +        return r < 0 ? INT32_MAX : INT32_MIN;
 +    }
 +    return r >> 32;
 +}
 +
 +/*
 + * The *MLAH insns are vector * scalar + vector;
 + * the *MLASH insns are vector * vector + scalar
 + */
 +#define DO_VQDMLAH_B(D, N, M, S) do_vqdmlah_b(N, M, D, 0, S)
 +#define DO_VQDMLAH_H(D, N, M, S) do_vqdmlah_h(N, M, D, 0, S)
 +#define DO_VQDMLAH_W(D, N, M, S) do_vqdmlah_w(N, M, D, 0, S)
 +#define DO_VQRDMLAH_B(D, N, M, S) do_vqdmlah_b(N, M, D, 1, S)
 +#define DO_VQRDMLAH_H(D, N, M, S) do_vqdmlah_h(N, M, D, 1, S)
 +#define DO_VQRDMLAH_W(D, N, M, S) do_vqdmlah_w(N, M, D, 1, S)
 +
 +#define DO_VQDMLASH_B(D, N, M, S) do_vqdmlah_b(N, D, M, 0, S)
 +#define DO_VQDMLASH_H(D, N, M, S) do_vqdmlah_h(N, D, M, 0, S)
 +#define DO_VQDMLASH_W(D, N, M, S) do_vqdmlah_w(N, D, M, 0, S)
 +#define DO_VQRDMLASH_B(D, N, M, S) do_vqdmlah_b(N, D, M, 1, S)
 +#define DO_VQRDMLASH_H(D, N, M, S) do_vqdmlah_h(N, D, M, 1, S)
 +#define DO_VQRDMLASH_W(D, N, M, S) do_vqdmlah_w(N, D, M, 1, S)
 +
 +DO_2OP_SAT_ACC_SCALAR(vqdmlahb, 1, int8_t, DO_VQDMLAH_B)
 +DO_2OP_SAT_ACC_SCALAR(vqdmlahh, 2, int16_t, DO_VQDMLAH_H)
 +DO_2OP_SAT_ACC_SCALAR(vqdmlahw, 4, int32_t, DO_VQDMLAH_W)
 +DO_2OP_SAT_ACC_SCALAR(vqrdmlahb, 1, int8_t, DO_VQRDMLAH_B)
 +DO_2OP_SAT_ACC_SCALAR(vqrdmlahh, 2, int16_t, DO_VQRDMLAH_H)
 +DO_2OP_SAT_ACC_SCALAR(vqrdmlahw, 4, int32_t, DO_VQRDMLAH_W)
 +
 +DO_2OP_SAT_ACC_SCALAR(vqdmlashb, 1, int8_t, DO_VQDMLASH_B)
 +DO_2OP_SAT_ACC_SCALAR(vqdmlashh, 2, int16_t, DO_VQDMLASH_H)
 +DO_2OP_SAT_ACC_SCALAR(vqdmlashw, 4, int32_t, DO_VQDMLASH_W)
 +DO_2OP_SAT_ACC_SCALAR(vqrdmlashb, 1, int8_t, DO_VQRDMLASH_B)
 +DO_2OP_SAT_ACC_SCALAR(vqrdmlashh, 2, int16_t, DO_VQRDMLASH_H)
 +DO_2OP_SAT_ACC_SCALAR(vqrdmlashw, 4, int32_t, DO_VQRDMLASH_W)
 +
  /* Vector by scalar plus vector */
  #define DO_VMLA(D, N, M) ((N) * (M) + (D))
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_2OP_SCALAR(VQRDMULH_scalar, vqrdmulh_scalar)
  DO_2OP_SCALAR(VBRSR, vbrsr)
  DO_2OP_SCALAR(VMLA, vmla)
  DO_2OP_SCALAR(VMLAS, vmlas)
 +DO_2OP_SCALAR(VQDMLAH, vqdmlah)
 +DO_2OP_SCALAR(VQRDMLAH, vqrdmlah)
 +DO_2OP_SCALAR(VQDMLASH, vqdmlash)
 +DO_2OP_SCALAR(VQRDMLASH, vqrdmlash)
  static bool trans_VQDMULLB_scalar(DisasContext *s, arg_2scalar *a)
  {
 --
-.20.1
+.34.1
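Reviewer's sketch, not part of either series: the commit message for the saturating doubling multiply accumulates describes the sequence multiply, double, add the accumulator shifted by the element size, optionally round, saturate to twice the element size, take the high half. A minimal standalone model of the 8-bit case, following the shape of do_vqdmlah_b above, could be written as below. The names sat_clamp and qdmlah8 are hypothetical (sat_clamp stands in for do_sat_bhw), and the code assumes that >> on a negative value is an arithmetic shift, as it is on the usual GCC and Clang targets.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for do_sat_bhw: clamp and record saturation. */
static int64_t sat_clamp(int64_t val, int64_t min, int64_t max, bool *sat)
{
    if (val > max) {
        *sat = true;
        return max;
    }
    if (val < min) {
        *sat = true;
        return min;
    }
    return val;
}

/*
 * One 8-bit element of VQDMLAH (round == 0) or VQRDMLAH (round == 1):
 * high byte of sat16(2 * a * b + c * 2^8 + round * 2^7).
 * The accumulator is scaled by multiplication to avoid left-shifting
 * a negative value.
 */
static int8_t qdmlah8(int8_t a, int8_t b, int8_t c, int round, bool *sat)
{
    int64_t r = (int64_t)a * b * 2 + (int64_t)c * 256 + (round ? 128 : 0);

    /* Assumes arithmetic right shift for negative values. */
    return (int8_t)(sat_clamp(r, INT16_MIN, INT16_MAX, sat) >> 8);
}
```

The 32-bit case cannot use this direct form because the intermediate would not fit in int64_t, which is why do_vqdmlah_w in the patch uses a sequence of overflow-checked additions instead.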

-[PULL 26/44] target/arm: Implement MVE VMLA
+[PULL 15/35] tests/qtest/npcm7xx_emc-test: Connect all NICs to a backend
-Implement the MVE VMLA insn, which multiplies a vector by a scalar
+Currently QEMU will warn if there is a NIC on the board that
-and accumulates into another vector.
+is not connected to a backend. By default the '-nic user' will
 get used for all NICs, but if you manually connect a specific
 NIC to a specific backend, then the other NICs on the board
 have no backend and will be warned about:
 qemu-system-arm: warning: nic npcm7xx-emc.1 has no peer
 qemu-system-arm: warning: nic npcm-gmac.0 has no peer
 qemu-system-arm: warning: nic npcm-gmac.1 has no peer
 So suppress those warnings by manually connecting every NIC
 on the board to some backend.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
 Reviewed-by: Thomas Huth <thuth@redhat.com>
 Message-id: 20240206171231.396392-3-peter.maydell@linaro.org
 ---
- target/arm/helper-mve.h    | 4 ++++
+ tests/qtest/npcm7xx_emc-test.c | 5 ++++-
- target/arm/mve.decode      | 1 +
+file changed, 4 insertions(+), 1 deletion(-)
  target/arm/mve_helper.c    | 5 +++++
  target/arm/translate-mve.c | 1 +
 files changed, 11 insertions(+)
-diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
+diff --git a/tests/qtest/npcm7xx_emc-test.c b/tests/qtest/npcm7xx_emc-test.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper-mve.h
+--- a/tests/qtest/npcm7xx_emc-test.c
-+++ b/target/arm/helper-mve.h
++++ b/tests/qtest/npcm7xx_emc-test.c
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vqdmullb_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i3
+@@ -XXX,XX +XXX,XX @@ static int *packet_test_init(int module_num, GString *cmd_line)
- DEF_HELPER_FLAGS_4(mve_vqdmullt_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+      * KISS and use -nic. The driver accepts 'emc0' and 'emc1' as aliases
- DEF_HELPER_FLAGS_4(mve_vqdmullt_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+      * in the 'model' field to specify the device to match.
+      */
-+DEF_HELPER_FLAGS_4(mve_vmlab, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+-    g_string_append_printf(cmd_line, " -nic socket,fd=%d,model=emc%d ",
-+DEF_HELPER_FLAGS_4(mve_vmlah, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
++    g_string_append_printf(cmd_line, " -nic socket,fd=%d,model=emc%d "
-+DEF_HELPER_FLAGS_4(mve_vmlaw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
++                           "-nic user,model=npcm7xx-emc "
-+
++                           "-nic user,model=npcm-gmac "
- DEF_HELPER_FLAGS_4(mve_vmlasb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
++                           "-nic user,model=npcm-gmac",
- DEF_HELPER_FLAGS_4(mve_vmlash, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+                            test_sockets[1], module_num);
- DEF_HELPER_FLAGS_4(mve_vmlasw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-diff --git a/target/arm/mve.decode b/target/arm/mve.decode
+     g_test_queue_destroy(packet_test_clear, test_sockets);
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VQDMULH_scalar   1110 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
  VQRDMULH_scalar  1111 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
  # The U bit (28) is don't-care because it does not affect the result
 +VMLA             111- 1110 0 . .. ... 1 ... 0 1110 . 100 .... @2scalar
  VMLAS            111- 1110 0 . .. ... 1 ... 1 1110 . 100 .... @2scalar
  # Vector add across vector
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2OP_SAT_SCALAR(vqrdmulh_scalarb, 1, int8_t, DO_QRDMULH_B)
  DO_2OP_SAT_SCALAR(vqrdmulh_scalarh, 2, int16_t, DO_QRDMULH_H)
  DO_2OP_SAT_SCALAR(vqrdmulh_scalarw, 4, int32_t, DO_QRDMULH_W)
 +/* Vector by scalar plus vector */
 +#define DO_VMLA(D, N, M) ((N) * (M) + (D))
 +
 +DO_2OP_ACC_SCALAR_U(vmla, DO_VMLA)
 +
  /* Vector by vector plus scalar */
  #define DO_VMLAS(D, N, M) ((N) * (D) + (M))
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_2OP_SCALAR(VQSUB_U_scalar, vqsubu_scalar)
  DO_2OP_SCALAR(VQDMULH_scalar, vqdmulh_scalar)
  DO_2OP_SCALAR(VQRDMULH_scalar, vqrdmulh_scalar)
  DO_2OP_SCALAR(VBRSR, vbrsr)
 +DO_2OP_SCALAR(VMLA, vmla)
  DO_2OP_SCALAR(VMLAS, vmlas)
  static bool trans_VQDMULLB_scalar(DisasContext *s, arg_2scalar *a)
 --
-.20.1
+.34.1
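Reviewer's sketch, not part of either series: the only difference between DO_VMLA and DO_VMLAS in the hunk above is which operand plays the accumulator. A minimal standalone model over four 32-bit lanes makes the two forms easy to compare. The names model_vmla, model_vmlas and LANES are hypothetical, and predication (the element mask applied by the real helpers) is ignored.

```c
#include <stdint.h>

#define LANES 4 /* four 32-bit lanes in a 128-bit vector */

/* Vector by scalar plus vector: d[i] = n[i] * m + d[i] */
static void model_vmla(uint32_t d[LANES], const uint32_t n[LANES], uint32_t m)
{
    for (int i = 0; i < LANES; i++) {
        d[i] = n[i] * m + d[i];
    }
}

/* Vector by vector plus scalar: d[i] = n[i] * d[i] + m */
static void model_vmlas(uint32_t d[LANES], const uint32_t n[LANES], uint32_t m)
{
    for (int i = 0; i < LANES; i++) {
        d[i] = n[i] * d[i] + m;
    }
}
```

Because the arithmetic wraps modulo 2^32, signed and unsigned inputs give the same bit pattern, which is consistent with the decode comment saying the U bit does not affect the result.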

-[PULL 37/44] target/arm: Implement M-profile trapping on division by zero
+[PULL 16/35] target/arm: Don't get MDCR_EL2 in pmu_counter_enabled() before checking ARM_FEATURE_PMU
-Unlike A-profile, for M-profile the UDIV and SDIV insns can be
+It doesn't make sense to read the value of MDCR_EL2 on a non-A-profile
-configured to raise an exception on division by zero, using the CCR
+CPU, and in fact if you try to do it we will assert:
 DIV_0_TRP bit.
-Implement support for setting this bit by making the helper functions
+#6  0x00007ffff4b95e96 in __GI___assert_fail
-raise the appropriate exception.
+    (assertion=0x5555565a8c70 "!arm_feature(env, ARM_FEATURE_M)", file=0x5555565a6e5c "../../target/arm/helper.c", line=12600, function=0x5555565a9560 <__PRETTY_FUNCTION__.0> "arm_security_space_below_el3") at ./assert/assert.c:101
 #7  0x0000555555ebf412 in arm_security_space_below_el3 (env=0x555557bc8190) at ../../target/arm/helper.c:12600
 #8  0x0000555555ea6f89 in arm_is_el2_enabled (env=0x555557bc8190) at ../../target/arm/cpu.h:2595
 #9  0x0000555555ea942f in arm_mdcr_el2_eff (env=0x555557bc8190) at ../../target/arm/internals.h:1512
+We might call pmu_counter_enabled() on an M-profile CPU (for example
+from the migration pre/post hooks in machine.c); this should always
+return false because these CPUs don't set ARM_FEATURE_PMU.
+Avoid the assertion by not calling arm_mdcr_el2_eff() before we
+have done the early return for "PMU not present".
+This fixes an assertion failure if you try to do a loadvm or
+savevm for an M-profile board.
+Cc: qemu-stable@nongnu.org
+Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2155
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210730151636.17254-3-peter.maydell@linaro.org
+Message-id: 20240208153346.970021-1-peter.maydell@linaro.org
 ---
- target/arm/cpu.h       |  1 +
+ target/arm/helper.c | 12 ++++++++++--
- target/arm/helper.h    |  4 ++--
+file changed, 10 insertions(+), 2 deletions(-)
  target/arm/helper.c    | 19 +++++++++++++++++--
  target/arm/m_helper.c  |  4 ++++
  target/arm/translate.c |  4 ++--
 files changed, 26 insertions(+), 6 deletions(-)
-diff --git a/target/arm/cpu.h b/target/arm/cpu.h
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.h
-+++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@
- #define EXCP_LAZYFP         20   /* v7M fault during lazy FP stacking */
- #define EXCP_LSERR          21   /* v8M LSERR SecureFault */
- #define EXCP_UNALIGNED      22   /* v7M UNALIGNED UsageFault */
-+#define EXCP_DIVBYZERO      23   /* v7M DIVBYZERO UsageFault */
- /* NB: add new EXCP_ defines to the array in arm_log_exception() too */
- #define ARMV7M_EXCP_RESET   1
-diff --git a/target/arm/helper.h b/target/arm/helper.h
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.h
-+++ b/target/arm/helper.h
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(add_saturate, i32, env, i32, i32)
- DEF_HELPER_3(sub_saturate, i32, env, i32, i32)
- DEF_HELPER_3(add_usaturate, i32, env, i32, i32)
- DEF_HELPER_3(sub_usaturate, i32, env, i32, i32)
--DEF_HELPER_FLAGS_2(sdiv, TCG_CALL_NO_RWG_SE, s32, s32, s32)
--DEF_HELPER_FLAGS_2(udiv, TCG_CALL_NO_RWG_SE, i32, i32, i32)
-+DEF_HELPER_FLAGS_3(sdiv, TCG_CALL_NO_RWG, s32, env, s32, s32)
-+DEF_HELPER_FLAGS_3(udiv, TCG_CALL_NO_RWG, i32, env, i32, i32)
- DEF_HELPER_FLAGS_1(rbit, TCG_CALL_NO_RWG_SE, i32, i32)
- #define PAS_OP(pfx)  \
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sxtb16)(uint32_t x)
+@@ -XXX,XX +XXX,XX @@ static bool pmu_counter_enabled(CPUARMState *env, uint8_t counter)
-     return res;
+     bool enabled, prohibited = false, filtered;
- }
+     bool secure = arm_is_secure(env);
+     int el = arm_current_el(env);
-+static void handle_possible_div0_trap(CPUARMState *env, uintptr_t ra)
+-    uint64_t mdcr_el2 = arm_mdcr_el2_eff(env);
-+{
+-    uint8_t hpmn = mdcr_el2 & MDCR_HPMN;
 +    uint64_t mdcr_el2;
 +    uint8_t hpmn;
 +    /*
-+     * Take a division-by-zero exception if necessary; otherwise return
++     * We might be called for M-profile cores where MDCR_EL2 doesn't
-+     * to get the usual non-trapping division behaviour (result of 0)
++     * exist and arm_mdcr_el2_eff() will assert, so this early-exit check
 +     * must be before we read that value.
 +     */
-+    if (arm_feature(env, ARM_FEATURE_M)
+     if (!arm_feature(env, ARM_FEATURE_PMU)) {
-+        && (env->v7m.ccr[env->v7m.secure] & R_V7M_CCR_DIV_0_TRP_MASK)) {
+         return false;
-+        raise_exception_ra(env, EXCP_DIVBYZERO, 0, 1, ra);
+     }
-+    }
-+}
++    mdcr_el2 = arm_mdcr_el2_eff(env);
 +    hpmn = mdcr_el2 & MDCR_HPMN;
 +
- uint32_t HELPER(uxtb16)(uint32_t x)
+     if (!arm_feature(env, ARM_FEATURE_EL2) ||
- {
+             (counter < hpmn || counter == 31)) {
-     uint32_t res;
+         e = env->cp15.c9_pmcr & PMCRE;
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(uxtb16)(uint32_t x)
      return res;
  }
 -int32_t HELPER(sdiv)(int32_t num, int32_t den)
 +int32_t HELPER(sdiv)(CPUARMState *env, int32_t num, int32_t den)
  {
      if (den == 0) {
 +        handle_possible_div0_trap(env, GETPC());
          return 0;
      }
      if (num == INT_MIN && den == -1) {
@@ -XXX,XX +XXX,XX @@ int32_t HELPER(sdiv)(int32_t num, int32_t den)
      return num / den;
  }
 -uint32_t HELPER(udiv)(uint32_t num, uint32_t den)
 +uint32_t HELPER(udiv)(CPUARMState *env, uint32_t num, uint32_t den)
  {
      if (den == 0) {
 +        handle_possible_div0_trap(env, GETPC());
          return 0;
      }
      return num / den;
@@ -XXX,XX +XXX,XX @@ void arm_log_exception(int idx)
              [EXCP_LAZYFP] = "v7M exception during lazy FP stacking",
              [EXCP_LSERR] = "v8M LSERR UsageFault",
              [EXCP_UNALIGNED] = "v7M UNALIGNED UsageFault",
 +            [EXCP_DIVBYZERO] = "v7M DIVBYZERO UsageFault",
          };
          if (idx >= 0 && idx < ARRAY_SIZE(excnames)) {
 diff --git a/target/arm/m_helper.c b/target/arm/m_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/m_helper.c
 +++ b/target/arm/m_helper.c
@@ -XXX,XX +XXX,XX @@ void arm_v7m_cpu_do_interrupt(CPUState *cs)
          armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_USAGE, env->v7m.secure);
          env->v7m.cfsr[env->v7m.secure] |= R_V7M_CFSR_UNALIGNED_MASK;
          break;
 +    case EXCP_DIVBYZERO:
 +        armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_USAGE, env->v7m.secure);
 +        env->v7m.cfsr[env->v7m.secure] |= R_V7M_CFSR_DIVBYZERO_MASK;
 +        break;
      case EXCP_SWI:
          /* The PC already points to the next instruction.  */
          armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_SVC, env->v7m.secure);
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static bool op_div(DisasContext *s, arg_rrr *a, bool u)
      t1 = load_reg(s, a->rn);
      t2 = load_reg(s, a->rm);
      if (u) {
 -        gen_helper_udiv(t1, t1, t2);
 +        gen_helper_udiv(t1, cpu_env, t1, t2);
      } else {
 -        gen_helper_sdiv(t1, t1, t2);
 +        gen_helper_sdiv(t1, cpu_env, t1, t2);
      }
      tcg_temp_free_i32(t2);
      store_reg(s, a->rd, t1);
 --
-.20.1
+.34.1
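Reviewer's sketch, not part of either series: the division-by-zero patch makes the SDIV and UDIV helpers consult a trap-enable bit before falling back to the non-trapping result of 0. A minimal standalone model of that control flow is shown below. ModelCpu, its fields and model_sdiv are hypothetical; the real helper calls raise_exception_ra(), which does not return, whereas this model only records that a fault would be raised. The INT_MIN result for INT_MIN / -1 is an assumption about the part of the helper that the hunk does not show.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, much-reduced CPU state for illustration only. */
typedef struct {
    bool div0_trap_enabled; /* models the CCR DIV_0_TRP bit */
    bool fault_pending;     /* models "EXCP_DIVBYZERO was raised" */
} ModelCpu;

static int32_t model_sdiv(ModelCpu *cpu, int32_t num, int32_t den)
{
    if (den == 0) {
        if (cpu->div0_trap_enabled) {
            /* The real helper would longjmp out here. */
            cpu->fault_pending = true;
        }
        return 0;
    }
    if (num == INT_MIN && den == -1) {
        /* Avoid the C undefined behaviour of INT_MIN / -1. */
        return INT_MIN;
    }
    return num / den;
}
```

Passing the CPU state into the helper is what forces the signature change from DEF_HELPER_FLAGS_2 to DEF_HELPER_FLAGS_3 seen in helper.h.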

-[PULL 38/44] target/arm: kvm: use RCU_READ_LOCK_GUARD() in kvm_arch_fixup_msi_route()
+[PULL 17/35] tests/qtest: Fix GMAC test to run on a machine in upstream QEMU
-From: Hamza Mahfooz <someguy@effective-light.com>
+From: Nabih Estefan <nabihestefan@google.com>
-As per commit 5626f8c6d468 ("rcu: Add automatically released rcu_read_lock
+Fix the npcm_gmac-test.c file to run on a Nuvoton 7xx machine instead
-variants"), RCU_READ_LOCK_GUARD() should be used instead of
+of 8xx. Also fix comments referencing this and values expecting 8xx.
 rcu_read_{un}lock().
-Signed-off-by: Hamza Mahfooz <someguy@effective-light.com>
+Change-Id: Iabd0fba14910c3f1e883c4a9521350f3db9ffab8
-Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
+Signed-off-by: Nabih Estefan <nabihestefan@google.com>
-Message-id: 20210727235201.11491-1-someguy@effective-light.com
+Reviewed-by: Tyrone Ting <kfting@nuvoton.com>
 Message-id: 20240208194759.2858582-2-nabihestefan@google.com
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 [PMM: commit message tweaks]
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/kvm.c | 17 ++++++++---------
+ tests/qtest/npcm_gmac-test.c | 84 +-----------------------------------
-file changed, 8 insertions(+), 9 deletions(-)
+ tests/qtest/meson.build      |  3 +-
 files changed, 4 insertions(+), 83 deletions(-)
-diff --git a/target/arm/kvm.c b/target/arm/kvm.c
+diff --git a/tests/qtest/npcm_gmac-test.c b/tests/qtest/npcm_gmac-test.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/kvm.c
+--- a/tests/qtest/npcm_gmac-test.c
-+++ b/target/arm/kvm.c
++++ b/tests/qtest/npcm_gmac-test.c
-@@ -XXX,XX +XXX,XX @@ int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
+@@ -XXX,XX +XXX,XX @@ typedef struct TestData {
-     hwaddr xlat, len, doorbell_gpa;
+     const GMACModule *module;
-     MemoryRegionSection mrs;
+ } TestData;
-     MemoryRegion *mr;
--    int ret = 1;
+-/* Values extracted from hw/arm/npcm8xx.c */
++/* Values extracted from hw/arm/npcm7xx.c */
-     if (as == &address_space_memory) {
+ static const GMACModule gmac_module_list[] = {
-         return 0;
+     {
-@@ -XXX,XX +XXX,XX @@ int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
+         .irq        = 14,
+@@ -XXX,XX +XXX,XX @@ static const GMACModule gmac_module_list[] = {
-     /* MSI doorbell address is translated by an IOMMU */
+         .irq        = 15,
+         .base_addr  = 0xf0804000
--    rcu_read_lock();
+     },
-+    RCU_READ_LOCK_GUARD();
+-    {
-+
+-        .irq        = 16,
-     mr = address_space_translate(as, address, &xlat, &len, true,
+-        .base_addr  = 0xf0806000
-                                  MEMTXATTRS_UNSPECIFIED);
+-    },
-+
+-    {
-     if (!mr) {
+-        .irq        = 17,
--        goto unlock;
+-        .base_addr  = 0xf0808000
-+        return 1;
+-    }
-     }
+ };
-+
-     mrs = memory_region_find(mr, xlat, 1);
+ /* Returns the index of the GMAC module. */
-+
+@@ -XXX,XX +XXX,XX @@ static uint32_t gmac_read(QTestState *qts, const GMACModule *mod,
-     if (!mrs.mr) {
+     return qtest_readl(qts, mod->base_addr + regno);
--        goto unlock;
+ }
-+        return 1;
-     }
+-static uint16_t pcs_read(QTestState *qts, const GMACModule *mod,
+-                          NPCMRegister regno)
-     doorbell_gpa = mrs.offset_within_address_space;
+-{
-@@ -XXX,XX +XXX,XX @@ int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
+-    uint32_t write_value = (regno & 0x3ffe00) >> 9;
+-    qtest_writel(qts, PCS_BASE_ADDRESS + NPCM_PCS_IND_AC_BA, write_value);
-     trace_kvm_arm_fixup_msi_route(address, doorbell_gpa);
+-    uint32_t read_offset = regno & 0x1ff;
+-    return qtest_readl(qts, PCS_BASE_ADDRESS + read_offset);
--    ret = 0;
+-}
 -
--unlock:
+ /* Check that GMAC registers are reset to default value */
--    rcu_read_unlock();
+ static void test_init(gconstpointer test_data)
--    return ret;
+ {
-+    return 0;
+     const TestData *td = test_data;
      const GMACModule *mod = td->module;
 -    QTestState *qts = qtest_init("-machine npcm845-evb");
 +    QTestState *qts = qtest_init("-machine npcm750-evb");
  #define CHECK_REG32(regno, value) \
      do { \
          g_assert_cmphex(gmac_read(qts, mod, (regno)), ==, (value)); \
      } while (0)
 -#define CHECK_REG_PCS(regno, value) \
 -    do { \
 -        g_assert_cmphex(pcs_read(qts, mod, (regno)), ==, (value)); \
 -    } while (0)
 -
      CHECK_REG32(NPCM_DMA_BUS_MODE, 0x00020100);
      CHECK_REG32(NPCM_DMA_XMT_POLL_DEMAND, 0);
      CHECK_REG32(NPCM_DMA_RCV_POLL_DEMAND, 0);
@@ -XXX,XX +XXX,XX @@ static void test_init(gconstpointer test_data)
      CHECK_REG32(NPCM_GMAC_PTP_TAR, 0);
      CHECK_REG32(NPCM_GMAC_PTP_TTSR, 0);
 -    /* TODO Add registers PCS */
 -    if (mod->base_addr == 0xf0802000) {
 -        CHECK_REG_PCS(NPCM_PCS_SR_CTL_ID1, 0x699e);
 -        CHECK_REG_PCS(NPCM_PCS_SR_CTL_ID2, 0);
 -        CHECK_REG_PCS(NPCM_PCS_SR_CTL_STS, 0x8000);
 -
 -        CHECK_REG_PCS(NPCM_PCS_SR_MII_CTRL, 0x1140);
 -        CHECK_REG_PCS(NPCM_PCS_SR_MII_STS, 0x0109);
 -        CHECK_REG_PCS(NPCM_PCS_SR_MII_DEV_ID1, 0x699e);
 -        CHECK_REG_PCS(NPCM_PCS_SR_MII_DEV_ID2, 0x0ced0);
 -        CHECK_REG_PCS(NPCM_PCS_SR_MII_AN_ADV, 0x0020);
 -        CHECK_REG_PCS(NPCM_PCS_SR_MII_LP_BABL, 0);
 -        CHECK_REG_PCS(NPCM_PCS_SR_MII_AN_EXPN, 0);
 -        CHECK_REG_PCS(NPCM_PCS_SR_MII_EXT_STS, 0xc000);
 -
 -        CHECK_REG_PCS(NPCM_PCS_SR_TIM_SYNC_ABL, 0x0003);
 -        CHECK_REG_PCS(NPCM_PCS_SR_TIM_SYNC_TX_MAX_DLY_LWR, 0x0038);
 -        CHECK_REG_PCS(NPCM_PCS_SR_TIM_SYNC_TX_MAX_DLY_UPR, 0);
 -        CHECK_REG_PCS(NPCM_PCS_SR_TIM_SYNC_TX_MIN_DLY_LWR, 0x0038);
 -        CHECK_REG_PCS(NPCM_PCS_SR_TIM_SYNC_TX_MIN_DLY_UPR, 0);
 -        CHECK_REG_PCS(NPCM_PCS_SR_TIM_SYNC_RX_MAX_DLY_LWR, 0x0058);
 -        CHECK_REG_PCS(NPCM_PCS_SR_TIM_SYNC_RX_MAX_DLY_UPR, 0);
 -        CHECK_REG_PCS(NPCM_PCS_SR_TIM_SYNC_RX_MIN_DLY_LWR, 0x0048);
 -        CHECK_REG_PCS(NPCM_PCS_SR_TIM_SYNC_RX_MIN_DLY_UPR, 0);
 -
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_MMD_DIG_CTRL1, 0x2400);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_AN_CTRL, 0);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_AN_INTR_STS, 0x000a);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_TC, 0);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_DBG_CTRL, 0);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_EEE_MCTRL0, 0x899c);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_EEE_TXTIMER, 0);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_EEE_RXTIMER, 0);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_LINK_TIMER_CTRL, 0);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_EEE_MCTRL1, 0);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_DIG_STS, 0x0010);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_ICG_ERRCNT1, 0);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_MISC_STS, 0);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_RX_LSTS, 0);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_TX_BSTCTRL0, 0x00a);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_TX_LVLCTRL0, 0x007f);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_TX_GENCTRL0, 0x0001);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_TX_GENCTRL1, 0);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_TX_STS, 0);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_RX_GENCTRL0, 0x0100);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_RX_GENCTRL1, 0x1100);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_RX_LOS_CTRL0, 0x000e);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_MPLL_CTRL0, 0x0100);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_MPLL_CTRL1, 0x0032);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_MPLL_STS, 0x0001);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_MISC_CTRL2, 0);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_LVL_CTRL, 0x0019);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_MISC_CTRL0, 0);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_MISC_CTRL1, 0);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_DIG_CTRL2, 0);
 -        CHECK_REG_PCS(NPCM_PCS_VR_MII_DIG_ERRCNT_SEL, 0);
 -    }
 -
      qtest_quit(qts);
  }
- int kvm_arch_add_msi_route_post(struct kvm_irq_routing_entry *route,
+diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
 index XXXXXXX..XXXXXXX 100644
 --- a/tests/qtest/meson.build
 +++ b/tests/qtest/meson.build
@@ -XXX,XX +XXX,XX @@ qtests_npcm7xx = \
     'npcm7xx_sdhci-test',
     'npcm7xx_smbus-test',
     'npcm7xx_timer-test',
 -   'npcm7xx_watchdog_timer-test'] + \
 +   'npcm7xx_watchdog_timer-test',
 +   'npcm_gmac-test'] + \
     (slirp.found() ? ['npcm7xx_emc-test'] : [])
  qtests_aspeed = \
    ['aspeed_hace-test',
 --
-.20.1
+.34.1
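Reviewer's sketch, not part of either series: the RCU_READ_LOCK_GUARD() conversion above replaces the "goto unlock" pattern with plain early returns, because the guard releases the lock when the enclosing scope exits. The general technique can be illustrated with the GCC/Clang cleanup attribute. This is only an analogy under stated assumptions: MODEL_LOCK_GUARD, model_lock, model_unlock and lock_depth are hypothetical names, the "lock" is just a counter, and this is not a copy of QEMU's macro.

```c
static int lock_depth; /* hypothetical lock state: 0 means unlocked */

static void model_lock(void)
{
    lock_depth++;
}

/* Cleanup handlers receive a pointer to the guarded variable. */
static void model_unlock(int *guard)
{
    (void)guard;
    lock_depth--;
}

/*
 * Take the lock now and release it automatically when the current
 * scope is left, by whatever path. Relies on the GCC/Clang
 * __attribute__((cleanup)) extension.
 */
#define MODEL_LOCK_GUARD() \
    int model_guard_ __attribute__((cleanup(model_unlock), unused)) = \
        (model_lock(), 0)

static int model_lookup(int key)
{
    MODEL_LOCK_GUARD();

    if (key < 0) {
        return 1; /* early return: model_unlock still runs */
    }
    return 0;
}
```

With the guard in place there is a single rule for every exit path, so the ret variable and the unlock label from the old code are no longer needed.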

-[PULL 44/44] docs: Document how to use gdb with unix sockets
+[PULL 18/35] hw/arm/smmuv3: add support for stage 1 access fault
-From: Sebastian Meyer <meyer@absint.com>
+From: Luc Michel <luc.michel@amd.com>
-With gdb 9.0 and better it is possible to connect to a gdbstub
+An access fault is raised when the Access Flag is not set in the
-over unix sockets, which is better than a TCP socket connection
+looked-up PTE and the AFFD field is not set in the corresponding context
-in some situations. The QEMU command line to set this up is
+descriptor. This was already implemented for stage 2. Implement it for
-non-obvious; document it.
+stage 1 as well.
-Signed-off-by: Sebastian Meyer <meyer@absint.com>
+Signed-off-by: Luc Michel <luc.michel@amd.com>
-Message-id: 162867284829.27377.4784930719350564918-0@git.sr.ht
+Reviewed-by: Mostafa Saleh <smostafa@google.com>
-[PMM: Tweaked commit message; adjusted wording in a couple of
+Reviewed-by: Eric Auger <eric.auger@redhat.com>
-places; fixed rST formatting issue; moved section up out of
+Tested-by: Mostafa Saleh <smostafa@google.com>
-the 'advanced debugging options' subsection]
+Message-id: 20240213082211.3330400-1-luc.michel@amd.com
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+[PMM: tweaked comment text]
 Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- docs/system/gdb.rst | 26 +++++++++++++++++++++++++-
+ hw/arm/smmuv3-internal.h     |  1 +
-file changed, 25 insertions(+), 1 deletion(-)
+ include/hw/arm/smmu-common.h |  1 +
  hw/arm/smmu-common.c         | 11 +++++++++++
  hw/arm/smmuv3.c              |  1 +
 files changed, 14 insertions(+)
-diff --git a/docs/system/gdb.rst b/docs/system/gdb.rst
+diff --git a/hw/arm/smmuv3-internal.h b/hw/arm/smmuv3-internal.h
 index XXXXXXX..XXXXXXX 100644
---- a/docs/system/gdb.rst
+--- a/hw/arm/smmuv3-internal.h
-+++ b/docs/system/gdb.rst
++++ b/hw/arm/smmuv3-internal.h
-@@ -XXX,XX +XXX,XX @@ The ``-s`` option will make QEMU listen for an incoming connection
+@@ -XXX,XX +XXX,XX @@ static inline int pa_range(STE *ste)
- from gdb on TCP port 1234, and ``-S`` will make QEMU not start the
+ #define CD_EPD(x, sel)   extract32((x)->word[0], (16 * (sel)) + 14, 1)
- guest until you tell it to from gdb. (If you want to specify which
+ #define CD_ENDI(x)       extract32((x)->word[0], 15, 1)
- TCP port to use or to use something other than TCP for the gdbstub
+ #define CD_IPS(x)        extract32((x)->word[1], 0 , 3)
--connection, use the ``-gdb dev`` option instead of ``-s``.)
++#define CD_AFFD(x)       extract32((x)->word[1], 3 , 1)
-+connection, use the ``-gdb dev`` option instead of ``-s``. See
+ #define CD_TBI(x)        extract32((x)->word[1], 6 , 2)
-+`Using unix sockets`_ for an example.)
+ #define CD_HD(x)         extract32((x)->word[1], 10 , 1)
+ #define CD_HA(x)         extract32((x)->word[1], 11 , 1)
- .. parsed-literal::
+diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
+index XXXXXXX..XXXXXXX 100644
-@@ -XXX,XX +XXX,XX @@ not just those in the cluster you are currently working on::
+--- a/include/hw/arm/smmu-common.h
++++ b/include/hw/arm/smmu-common.h
-   (gdb) set schedule-multiple on
+@@ -XXX,XX +XXX,XX @@ typedef struct SMMUTransCfg {
+     bool disabled;             /* smmu is disabled */
-+Using unix sockets
+     bool bypassed;             /* translation is bypassed */
-+==================
+     bool aborted;              /* translation is aborted */
 +    bool affd;                 /* AF fault disable */
      uint32_t iotlb_hits;       /* counts IOTLB hits */
      uint32_t iotlb_misses;     /* counts IOTLB misses*/
      /* Used by stage-1 only. */
 diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/smmu-common.c
 +++ b/hw/arm/smmu-common.c
@@ -XXX,XX +XXX,XX @@ static int smmu_ptw_64_s1(SMMUTransCfg *cfg,
                                       pte_addr, pte, iova, gpa,
                                       block_size >> 20);
          }
 +
-+An alternate method for connecting gdb to the QEMU gdbstub is to use
++        /*
-+a unix socket (if supported by your operating system). This is useful when
++         * QEMU does not currently implement HTTU, so if AFFD and PTE.AF
-+running several tests in parallel, or if you do not have a known free TCP
++         * are 0 we take an Access flag fault. (5.4. Context Descriptor)
-+port (e.g. when running automated tests).
++         * An Access flag fault takes priority over a Permission fault.
 +         */
 +        if (!PTE_AF(pte) && !cfg->affd) {
 +            info->type = SMMU_PTW_ERR_ACCESS;
 +            goto error;
 +        }
 +
-+First create a chardev with the appropriate options, then
+         ap = PTE_AP(pte);
-+instruct the gdbserver to use that device:
+         if (is_permission_fault(ap, perm)) {
-+
+             info->type = SMMU_PTW_ERR_PERMISSION;
-+.. parsed-literal::
+diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
-+
+index XXXXXXX..XXXXXXX 100644
-+   |qemu_system| -chardev socket,path=/tmp/gdb-socket,server=on,wait=off,id=gdb0 -gdb chardev:gdb0 -S ...
+--- a/hw/arm/smmuv3.c
-+
++++ b/hw/arm/smmuv3.c
-+Start gdb as before, but this time connect using the path to
+@@ -XXX,XX +XXX,XX @@ static int decode_cd(SMMUTransCfg *cfg, CD *cd, SMMUEventInfo *event)
-+the socket::
+     cfg->oas = MIN(oas2bits(SMMU_IDR5_OAS), cfg->oas);
-+
+     cfg->tbi = CD_TBI(cd);
-+   (gdb) target remote /tmp/gdb-socket
+     cfg->asid = CD_ASID(cd);
-+
++    cfg->affd = CD_AFFD(cd);
-+Note that to use a unix socket for the connection you will need
-+gdb version 9.0 or newer.
+     trace_smmuv3_decode_cd(cfg->oas);
 +
  Advanced debugging options
  ==========================
 --
-.20.1
+.34.1
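
Note on the stage 1 access fault check above: the following is a minimal,
standalone sketch of the ordering the patch describes, not the QEMU code.
The names WalkResult, pte_af() and check_leaf() are hypothetical, and the
permission check is reduced to a precomputed boolean. It assumes the
Access Flag sits at bit 10 of the descriptor, as in the VMSAv8-64 format.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the page table walk outcomes. */
typedef enum {
    WALK_OK,
    WALK_ERR_ACCESS,
    WALK_ERR_PERMISSION,
} WalkResult;

/* Access Flag: bit 10 of a VMSAv8-64 block or page descriptor. */
static inline bool pte_af(uint64_t pte)
{
    return (pte >> 10) & 1;
}

/*
 * Classify a leaf PTE. 'affd' mirrors the AFFD bit of the context
 * descriptor; 'perm_ok' is the already-computed permission result.
 * The access flag test comes first, so an Access flag fault takes
 * priority over a Permission fault.
 */
static WalkResult check_leaf(uint64_t pte, bool affd, bool perm_ok)
{
    if (!pte_af(pte) && !affd) {
        return WALK_ERR_ACCESS;
    }
    if (!perm_ok) {
        return WALK_ERR_PERMISSION;
    }
    return WALK_OK;
}
```

With AFFD set, a PTE whose AF bit is clear is treated like any other and
only the permission check can fail.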

-[PULL 23/44] target/arm: Implement MVE narrowing moves
+[PULL 19/35] hw/arm/stellaris: Convert ADC controller to Resettable interface
-Implement the MVE narrowing move insns VMOVN, VQMOVN and VQMOVUN.
+From: Philippe Mathieu-Daudé <philmd@linaro.org>
 These take a double-width input, narrow it (possibly saturating) and
 store the result to either the top or bottom half of the output
 element.
+Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Message-id: 20240213155214.13619-2-philmd@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/helper-mve.h    | 20 ++++++++++
+ hw/arm/stellaris.c | 6 ++++--
- target/arm/mve.decode      | 12 ++++++
+file changed, 4 insertions(+), 2 deletions(-)
  target/arm/mve_helper.c    | 78 ++++++++++++++++++++++++++++++++++++++
  target/arm/translate-mve.c | 22 +++++++++++
 files changed, 132 insertions(+)
-diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
+diff --git a/hw/arm/stellaris.c b/hw/arm/stellaris.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper-mve.h
+--- a/hw/arm/stellaris.c
-+++ b/target/arm/helper-mve.h
++++ b/hw/arm/stellaris.c
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vnegw, TCG_CALL_NO_WG, void, env, ptr, ptr)
+@@ -XXX,XX +XXX,XX @@ static void stellaris_adc_trigger(void *opaque, int irq, int level)
- DEF_HELPER_FLAGS_3(mve_vfnegh, TCG_CALL_NO_WG, void, env, ptr, ptr)
+     }
  DEF_HELPER_FLAGS_3(mve_vfnegs, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +DEF_HELPER_FLAGS_3(mve_vmovnbb, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +DEF_HELPER_FLAGS_3(mve_vmovnbh, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +DEF_HELPER_FLAGS_3(mve_vmovntb, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +DEF_HELPER_FLAGS_3(mve_vmovnth, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +
 +DEF_HELPER_FLAGS_3(mve_vqmovunbb, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +DEF_HELPER_FLAGS_3(mve_vqmovunbh, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +DEF_HELPER_FLAGS_3(mve_vqmovuntb, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +DEF_HELPER_FLAGS_3(mve_vqmovunth, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +
 +DEF_HELPER_FLAGS_3(mve_vqmovnbsb, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +DEF_HELPER_FLAGS_3(mve_vqmovnbsh, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +DEF_HELPER_FLAGS_3(mve_vqmovntsb, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +DEF_HELPER_FLAGS_3(mve_vqmovntsh, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +
 +DEF_HELPER_FLAGS_3(mve_vqmovnbub, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +DEF_HELPER_FLAGS_3(mve_vqmovnbuh, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +DEF_HELPER_FLAGS_3(mve_vqmovntub, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +DEF_HELPER_FLAGS_3(mve_vqmovntuh, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +
  DEF_HELPER_FLAGS_4(mve_vand, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
  DEF_HELPER_FLAGS_4(mve_vbic, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
  DEF_HELPER_FLAGS_4(mve_vorr, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VMUL             1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
    VSHLL_BS       111 0 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_b
    VSHLL_BS       111 0 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_h
 +  VQMOVUNB       111 0 1110 0 . 11 .. 01 ... 0 1110 1 0 . 0 ... 1 @1op
 +  VQMOVN_BS      111 0 1110 0 . 11 .. 11 ... 0 1110 0 0 . 0 ... 1 @1op
 +
    VMULH_S        111 0 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
  }
-@@ -XXX,XX +XXX,XX @@ VMUL             1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
+-static void stellaris_adc_reset(StellarisADCState *s)
-   VSHLL_BU       111 1 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_b
++static void stellaris_adc_reset_hold(Object *obj)
-   VSHLL_BU       111 1 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_h
+ {
++    StellarisADCState *s = STELLARIS_ADC(obj);
-+  VMOVNB         111 1 1110 0 . 11 .. 01 ... 0 1110 1 0 . 0 ... 1 @1op
+     int n;
-+  VQMOVN_BU      111 1 1110 0 . 11 .. 11 ... 0 1110 0 0 . 0 ... 1 @1op
-+
+     for (n = 0; n < 4; n++) {
-   VMULH_U        111 1 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
+@@ -XXX,XX +XXX,XX @@ static void stellaris_adc_init(Object *obj)
      memory_region_init_io(&s->iomem, obj, &stellaris_adc_ops, s,
                            "adc", 0x1000);
      sysbus_init_mmio(sbd, &s->iomem);
 -    stellaris_adc_reset(s);
      qdev_init_gpio_in(dev, stellaris_adc_trigger, 1);
  }
-@@ -XXX,XX +XXX,XX @@ VMUL             1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
+@@ -XXX,XX +XXX,XX @@ static const TypeInfo stellaris_i2c_info = {
-   VSHLL_TS       111 0 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_b
+ static void stellaris_adc_class_init(ObjectClass *klass, void *data)
-   VSHLL_TS       111 0 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_h
+ {
+     DeviceClass *dc = DEVICE_CLASS(klass);
-+  VQMOVUNT       111 0 1110 0 . 11 .. 01 ... 1 1110 1 0 . 0 ... 1 @1op
++    ResettableClass *rc = RESETTABLE_CLASS(klass);
-+  VQMOVN_TS      111 0 1110 0 . 11 .. 11 ... 1 1110 0 0 . 0 ... 1 @1op
-+
++    rc->phases.hold = stellaris_adc_reset_hold;
-   VRMULH_S       111 0 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
+     dc->vmsd = &vmstate_stellaris_adc;
  }
-@@ -XXX,XX +XXX,XX @@ VMUL             1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
-   VSHLL_TU       111 1 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_b
-   VSHLL_TU       111 1 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_h
-+  VMOVNT         111 1 1110 0 . 11 .. 01 ... 1 1110 1 0 . 0 ... 1 @1op
-+  VQMOVN_TU      111 1 1110 0 . 11 .. 11 ... 1 1110 0 0 . 0 ... 1 @1op
-+
-   VRMULH_U       111 1 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
- }
-diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/mve_helper.c
-+++ b/target/arm/mve_helper.c
-@@ -XXX,XX +XXX,XX @@ DO_VSHRN_SAT_UH(vqrshrnb_uh, vqrshrnt_uh, DO_RSHRN_UH)
- DO_VSHRN_SAT_SB(vqrshrunbb, vqrshruntb, DO_RSHRUN_B)
- DO_VSHRN_SAT_SH(vqrshrunbh, vqrshrunth, DO_RSHRUN_H)
-+#define DO_VMOVN(OP, TOP, ESIZE, TYPE, LESIZE, LTYPE)                   \
-+    void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm)         \
-+    {                                                                   \
-+        LTYPE *m = vm;                                                  \
-+        TYPE *d = vd;                                                   \
-+        uint16_t mask = mve_element_mask(env);                          \
-+        unsigned le;                                                    \
-+        mask >>= ESIZE * TOP;                                           \
-+        for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) {         \
-+            mergemask(&d[H##ESIZE(le * 2 + TOP)],                       \
-+                      m[H##LESIZE(le)], mask);                          \
-+        }                                                               \
-+        mve_advance_vpt(env);                                           \
-+    }
-+
-+DO_VMOVN(vmovnbb, false, 1, uint8_t, 2, uint16_t)
-+DO_VMOVN(vmovnbh, false, 2, uint16_t, 4, uint32_t)
-+DO_VMOVN(vmovntb, true, 1, uint8_t, 2, uint16_t)
-+DO_VMOVN(vmovnth, true, 2, uint16_t, 4, uint32_t)
-+
-+#define DO_VMOVN_SAT(OP, TOP, ESIZE, TYPE, LESIZE, LTYPE, FN)           \
-+    void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm)         \
-+    {                                                                   \
-+        LTYPE *m = vm;                                                  \
-+        TYPE *d = vd;                                                   \
-+        uint16_t mask = mve_element_mask(env);                          \
-+        bool qc = false;                                                \
-+        unsigned le;                                                    \
-+        mask >>= ESIZE * TOP;                                           \
-+        for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) {         \
-+            bool sat = false;                                           \
-+            TYPE r = FN(m[H##LESIZE(le)], &sat);                        \
-+            mergemask(&d[H##ESIZE(le * 2 + TOP)], r, mask);             \
-+            qc |= sat & mask & 1;                                       \
-+        }                                                               \
-+        if (qc) {                                                       \
-+            env->vfp.qc[0] = qc;                                        \
-+        }                                                               \
-+        mve_advance_vpt(env);                                           \
-+    }
-+
-+#define DO_VMOVN_SAT_UB(BOP, TOP, FN)                           \
-+    DO_VMOVN_SAT(BOP, false, 1, uint8_t, 2, uint16_t, FN)       \
-+    DO_VMOVN_SAT(TOP, true, 1, uint8_t, 2, uint16_t, FN)
-+
-+#define DO_VMOVN_SAT_UH(BOP, TOP, FN)                           \
-+    DO_VMOVN_SAT(BOP, false, 2, uint16_t, 4, uint32_t, FN)      \
-+    DO_VMOVN_SAT(TOP, true, 2, uint16_t, 4, uint32_t, FN)
-+
-+#define DO_VMOVN_SAT_SB(BOP, TOP, FN)                           \
-+    DO_VMOVN_SAT(BOP, false, 1, int8_t, 2, int16_t, FN)         \
-+    DO_VMOVN_SAT(TOP, true, 1, int8_t, 2, int16_t, FN)
-+
-+#define DO_VMOVN_SAT_SH(BOP, TOP, FN)                           \
-+    DO_VMOVN_SAT(BOP, false, 2, int16_t, 4, int32_t, FN)        \
-+    DO_VMOVN_SAT(TOP, true, 2, int16_t, 4, int32_t, FN)
-+
-+#define DO_VQMOVN_SB(N, SATP)                           \
-+    do_sat_bhs((int64_t)(N), INT8_MIN, INT8_MAX, SATP)
-+#define DO_VQMOVN_UB(N, SATP)                           \
-+    do_sat_bhs((uint64_t)(N), 0, UINT8_MAX, SATP)
-+#define DO_VQMOVUN_B(N, SATP)                           \
-+    do_sat_bhs((int64_t)(N), 0, UINT8_MAX, SATP)
-+
-+#define DO_VQMOVN_SH(N, SATP)                           \
-+    do_sat_bhs((int64_t)(N), INT16_MIN, INT16_MAX, SATP)
-+#define DO_VQMOVN_UH(N, SATP)                           \
-+    do_sat_bhs((uint64_t)(N), 0, UINT16_MAX, SATP)
-+#define DO_VQMOVUN_H(N, SATP)                           \
-+    do_sat_bhs((int64_t)(N), 0, UINT16_MAX, SATP)
-+
-+DO_VMOVN_SAT_SB(vqmovnbsb, vqmovntsb, DO_VQMOVN_SB)
-+DO_VMOVN_SAT_SH(vqmovnbsh, vqmovntsh, DO_VQMOVN_SH)
-+DO_VMOVN_SAT_UB(vqmovnbub, vqmovntub, DO_VQMOVN_UB)
-+DO_VMOVN_SAT_UH(vqmovnbuh, vqmovntuh, DO_VQMOVN_UH)
-+DO_VMOVN_SAT_SB(vqmovunbb, vqmovuntb, DO_VQMOVUN_B)
-+DO_VMOVN_SAT_SH(vqmovunbh, vqmovunth, DO_VQMOVUN_H)
-+
- uint32_t HELPER(mve_vshlc)(CPUARMState *env, void *vd, uint32_t rdm,
-                            uint32_t shift)
- {
-diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-mve.c
-+++ b/target/arm/translate-mve.c
-@@ -XXX,XX +XXX,XX @@ DO_1OP(VCLS, vcls)
- DO_1OP(VABS, vabs)
- DO_1OP(VNEG, vneg)
-+/* Narrowing moves: only size 0 and 1 are valid */
-+#define DO_VMOVN(INSN, FN) \
-+    static bool trans_##INSN(DisasContext *s, arg_1op *a)       \
-+    {                                                           \
-+        static MVEGenOneOpFn * const fns[] = {                  \
-+            gen_helper_mve_##FN##b,                             \
-+            gen_helper_mve_##FN##h,                             \
-+            NULL,                                               \
-+            NULL,                                               \
-+        };                                                      \
-+        return do_1op(s, a, fns[a->size]);                      \
-+    }
-+
-+DO_VMOVN(VMOVNB, vmovnb)
-+DO_VMOVN(VMOVNT, vmovnt)
-+DO_VMOVN(VQMOVUNB, vqmovunb)
-+DO_VMOVN(VQMOVUNT, vqmovunt)
-+DO_VMOVN(VQMOVN_BS, vqmovnbs)
-+DO_VMOVN(VQMOVN_TS, vqmovnts)
-+DO_VMOVN(VQMOVN_BU, vqmovnbu)
-+DO_VMOVN(VQMOVN_TU, vqmovntu)
-+
- static bool trans_VREV16(DisasContext *s, arg_1op *a)
- {
-     static MVEGenOneOpFn * const fns[] = {
 --
-.20.1
+.34.1

-[PULL 25/44] target/arm: Implement MVE VMLADAV and VMLSLDAV
+[PULL 20/35] hw/arm/stellaris: Convert I2C controller to Resettable interface
-Implement the MVE VMLADAV and VMLSLDAV insns.  Like the VMLALDAV and
+From: Philippe Mathieu-Daudé <philmd@linaro.org>
 VMLSLDAV insns already implemented, these accumulate multiplied
 vector elements; but they accumulate a 32-bit result rather than a
 -bit one.
-Note that these encodings overlap with what would be RdaHi=0b111 for
+Suggested-by: Peter Maydell <peter.maydell@linaro.org>
-VMLALDAV, VMLSLDAV, VRMLALDAVH and VRMLSLDAVH.
+Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Message-id: 20240213155214.13619-3-philmd@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  hw/arm/stellaris.c | 26 ++++++++++++++++++++++----
 file changed, 22 insertions(+), 4 deletions(-)
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+diff --git a/hw/arm/stellaris.c b/hw/arm/stellaris.c
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
  target/arm/helper-mve.h    | 17 ++++++++++
  target/arm/mve.decode      | 33 +++++++++++++++++---
  target/arm/mve_helper.c    | 41 ++++++++++++++++++++++++
  target/arm/translate-mve.c | 64 ++++++++++++++++++++++++++++++++++++++
 files changed, 150 insertions(+), 5 deletions(-)
 diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper-mve.h
+--- a/hw/arm/stellaris.c
-+++ b/target/arm/helper-mve.h
++++ b/hw/arm/stellaris.c
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vrmlaldavhuw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
+@@ -XXX,XX +XXX,XX @@ static void stellaris_sys_instance_init(Object *obj)
- DEF_HELPER_FLAGS_4(mve_vrmlsldavhsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
+     s->sysclk = qdev_init_clock_out(DEVICE(s), "SYSCLK");
- DEF_HELPER_FLAGS_4(mve_vrmlsldavhxsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
+ }
-+DEF_HELPER_FLAGS_4(mve_vmladavsb, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
+-/* I2C controller.  */
-+DEF_HELPER_FLAGS_4(mve_vmladavsh, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
++/*
-+DEF_HELPER_FLAGS_4(mve_vmladavsw, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
++ * I2C controller.
-+DEF_HELPER_FLAGS_4(mve_vmladavub, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
++ * ??? For now we only implement the master interface.
-+DEF_HELPER_FLAGS_4(mve_vmladavuh, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
++ */
-+DEF_HELPER_FLAGS_4(mve_vmladavuw, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
-+DEF_HELPER_FLAGS_4(mve_vmlsdavb, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
+ #define TYPE_STELLARIS_I2C "stellaris-i2c"
-+DEF_HELPER_FLAGS_4(mve_vmlsdavh, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
+ OBJECT_DECLARE_SIMPLE_TYPE(stellaris_i2c_state, STELLARIS_I2C)
-+DEF_HELPER_FLAGS_4(mve_vmlsdavw, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
+@@ -XXX,XX +XXX,XX @@ static void stellaris_i2c_write(void *opaque, hwaddr offset,
      stellaris_i2c_update(s);
  }
 -static void stellaris_i2c_reset(stellaris_i2c_state *s)
 +static void stellaris_i2c_reset_enter(Object *obj, ResetType type)
  {
 +    stellaris_i2c_state *s = STELLARIS_I2C(obj);
 +
-+DEF_HELPER_FLAGS_4(mve_vmladavsxb, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
+     if (s->mcs & STELLARIS_I2C_MCS_BUSBSY)
-+DEF_HELPER_FLAGS_4(mve_vmladavsxh, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
+         i2c_end_transfer(s->bus);
 +DEF_HELPER_FLAGS_4(mve_vmladavsxw, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vmlsdavxb, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vmlsdavxh, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vmlsdavxw, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
 +
  DEF_HELPER_FLAGS_3(mve_vaddvsb, TCG_CALL_NO_WG, i32, env, ptr, i32)
  DEF_HELPER_FLAGS_3(mve_vaddvub, TCG_CALL_NO_WG, i32, env, ptr, i32)
  DEF_HELPER_FLAGS_3(mve_vaddvsh, TCG_CALL_NO_WG, i32, env, ptr, i32)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VDUP             1110 1110 1 0 10 ... 0 .... 1011 . 0 0 1 0000 @vdup size=2
  %size_16 16:1 !function=plus_1
  &vmlaldav rdahi rdalo size qn qm x a
 +&vmladav rda size qn qm x a
  @vmlaldav        .... .... . ... ... . ... x:1 .... .. a:1 . qm:3 . \
                   qn=%qn rdahi=%rdahi rdalo=%rdalo size=%size_16 &vmlaldav
  @vmlaldav_nosz   .... .... . ... ... . ... x:1 .... .. a:1 . qm:3 . \
                   qn=%qn rdahi=%rdahi rdalo=%rdalo size=0 &vmlaldav
 -VMLALDAV_S       1110 1110 1 ... ... . ... . 1110 . 0 . 0 ... 0 @vmlaldav
 -VMLALDAV_U       1111 1110 1 ... ... . ... . 1110 . 0 . 0 ... 0 @vmlaldav
 +@vmladav         .... .... .... ... . ... x:1 .... . . a:1 . qm:3 . \
 +                 qn=%qn rda=%rdalo size=%size_16 &vmladav
 +@vmladav_nosz    .... .... .... ... . ... x:1 .... . . a:1 . qm:3 . \
 +                 qn=%qn rda=%rdalo size=0 &vmladav
 -VMLSLDAV         1110 1110 1 ... ... . ... . 1110 . 0 . 0 ... 1 @vmlaldav
 +{
 +  VMLADAV_S      1110 1110 1111  ... . ... . 1110 . 0 . 0 ... 0 @vmladav
 +  VMLALDAV_S     1110 1110 1 ... ... . ... . 1110 . 0 . 0 ... 0 @vmlaldav
 +}
 +{
 +  VMLADAV_U      1111 1110 1111  ... . ... . 1110 . 0 . 0 ... 0 @vmladav
 +  VMLALDAV_U     1111 1110 1 ... ... . ... . 1110 . 0 . 0 ... 0 @vmlaldav
 +}
 +
++static void stellaris_i2c_reset_hold(Object *obj)
 +{
-+  VMLSDAV        1110 1110 1111  ... . ... . 1110 . 0 . 0 ... 1 @vmladav
++    stellaris_i2c_state *s = STELLARIS_I2C(obj);
-+  VMLSLDAV       1110 1110 1 ... ... . ... . 1110 . 0 . 0 ... 1 @vmlaldav
      s->msa = 0;
      s->mcs = 0;
@@ -XXX,XX +XXX,XX @@ static void stellaris_i2c_reset(stellaris_i2c_state *s)
      s->mimr = 0;
      s->mris = 0;
      s->mcr = 0;
 +}
 +
++static void stellaris_i2c_reset_exit(Object *obj)
 +{
-+  VMLSDAV        1111 1110 1111  ... 0 ... . 1110 . 0 . 0 ... 1 @vmladav_nosz
++    stellaris_i2c_state *s = STELLARIS_I2C(obj);
 +  VRMLSLDAVH     1111 1110 1 ... ... 0 ... . 1110 . 0 . 0 ... 1 @vmlaldav_nosz
 +}
 +
-+VMLADAV_S        1110 1110 1111  ... 0 ... . 1111 . 0 . 0 ... 1 @vmladav_nosz
+     stellaris_i2c_update(s);
-+VMLADAV_U        1111 1110 1111  ... 0 ... . 1111 . 0 . 0 ... 1 @vmladav_nosz
+ }
@@ -XXX,XX +XXX,XX @@ static void stellaris_i2c_init(Object *obj)
      memory_region_init_io(&s->iomem, obj, &stellaris_i2c_ops, s,
                            "i2c", 0x1000);
      sysbus_init_mmio(sbd, &s->iomem);
 -    /* ??? For now we only implement the master interface.  */
 -    stellaris_i2c_reset(s);
  }
  /* Analogue to Digital Converter.  This is only partially implemented,
@@ -XXX,XX +XXX,XX @@ type_init(stellaris_machine_init)
  static void stellaris_i2c_class_init(ObjectClass *klass, void *data)
  {
-   VMAXV_S        1110 1110 1110  .. 10 ....  1111 0 0 . 0 ... 0 @vmaxv
+     DeviceClass *dc = DEVICE_CLASS(klass);
-   VMINV_S        1110 1110 1110  .. 10 ....  1111 1 0 . 0 ... 0 @vmaxv
++    ResettableClass *rc = RESETTABLE_CLASS(klass);
-   VMAXAV         1110 1110 1110  .. 00 ....  1111 0 0 . 0 ... 0 @vmaxv
-   VMINAV         1110 1110 1110  .. 00 ....  1111 1 0 . 0 ... 0 @vmaxv
++    rc->phases.enter = stellaris_i2c_reset_enter;
-+  VMLADAV_S      1110 1110 1111  ... 0 ... . 1111 . 0 . 0 ... 0 @vmladav_nosz
++    rc->phases.hold = stellaris_i2c_reset_hold;
-   VRMLALDAVH_S   1110 1110 1 ... ... 0 ... . 1111 . 0 . 0 ... 0 @vmlaldav_nosz
++    rc->phases.exit = stellaris_i2c_reset_exit;
      dc->vmsd = &vmstate_stellaris_i2c;
  }
- {
-   VMAXV_U        1111 1110 1110  .. 10 ....  1111 0 0 . 0 ... 0 @vmaxv
-   VMINV_U        1111 1110 1110  .. 10 ....  1111 1 0 . 0 ... 0 @vmaxv
-+  VMLADAV_U      1111 1110 1111  ... 0 ... . 1111 . 0 . 0 ... 0 @vmladav_nosz
-   VRMLALDAVH_U   1111 1110 1 ... ... 0 ... . 1111 . 0 . 0 ... 0 @vmlaldav_nosz
- }
--VRMLSLDAVH       1111 1110 1 ... ... 0 ... . 1110 . 0 . 0 ... 1 @vmlaldav_nosz
--
- # Scalar operations
- VADD_scalar      1110 1110 0 . .. ... 1 ... 0 1111 . 100 .... @2scalar
-diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/mve_helper.c
-+++ b/target/arm/mve_helper.c
-@@ -XXX,XX +XXX,XX @@ DO_LDAV(vmlsldavxsh, 2, int16_t, true, +=, -=)
- DO_LDAV(vmlsldavsw, 4, int32_t, false, +=, -=)
- DO_LDAV(vmlsldavxsw, 4, int32_t, true, +=, -=)
-+/*
-+ * Multiply add dual accumulate ops
-+ */
-+#define DO_DAV(OP, ESIZE, TYPE, XCHG, EVENACC, ODDACC) \
-+    uint32_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vn,         \
-+                                    void *vm, uint32_t a)               \
-+    {                                                                   \
-+        uint16_t mask = mve_element_mask(env);                          \
-+        unsigned e;                                                     \
-+        TYPE *n = vn, *m = vm;                                          \
-+        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
-+            if (mask & 1) {                                             \
-+                if (e & 1) {                                            \
-+                    a ODDACC                                            \
-+                        n[H##ESIZE(e - 1 * XCHG)] * m[H##ESIZE(e)];     \
-+                } else {                                                \
-+                    a EVENACC                                           \
-+                        n[H##ESIZE(e + 1 * XCHG)] * m[H##ESIZE(e)];     \
-+                }                                                       \
-+            }                                                           \
-+        }                                                               \
-+        mve_advance_vpt(env);                                           \
-+        return a;                                                       \
-+    }
-+
-+#define DO_DAV_S(INSN, XCHG, EVENACC, ODDACC)           \
-+    DO_DAV(INSN##b, 1, int8_t, XCHG, EVENACC, ODDACC)   \
-+    DO_DAV(INSN##h, 2, int16_t, XCHG, EVENACC, ODDACC)  \
-+    DO_DAV(INSN##w, 4, int32_t, XCHG, EVENACC, ODDACC)
-+
-+#define DO_DAV_U(INSN, XCHG, EVENACC, ODDACC)           \
-+    DO_DAV(INSN##b, 1, uint8_t, XCHG, EVENACC, ODDACC)  \
-+    DO_DAV(INSN##h, 2, uint16_t, XCHG, EVENACC, ODDACC) \
-+    DO_DAV(INSN##w, 4, uint32_t, XCHG, EVENACC, ODDACC)
-+
-+DO_DAV_S(vmladavs, false, +=, +=)
-+DO_DAV_U(vmladavu, false, +=, +=)
-+DO_DAV_S(vmlsdav, false, +=, -=)
-+DO_DAV_S(vmladavsx, true, +=, +=)
-+DO_DAV_S(vmlsdavx, true, +=, -=)
-+
- /*
-  * Rounding multiply add long dual accumulate high. In the pseudocode
-  * this is implemented with a 72-bit internal accumulator value of which
-diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-mve.c
-+++ b/target/arm/translate-mve.c
-@@ -XXX,XX +XXX,XX @@ typedef void MVEGenVIWDUPFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32, TC
- typedef void MVEGenCmpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
- typedef void MVEGenScalarCmpFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
- typedef void MVEGenVABAVFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
-+typedef void MVEGenDualAccOpFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
- /* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
- static inline long mve_qreg_offset(unsigned reg)
-@@ -XXX,XX +XXX,XX @@ static bool trans_VRMLSLDAVH(DisasContext *s, arg_vmlaldav *a)
-     return do_long_dual_acc(s, a, fns[a->x]);
- }
-+static bool do_dual_acc(DisasContext *s, arg_vmladav *a, MVEGenDualAccOpFn *fn)
-+{
-+    TCGv_ptr qn, qm;
-+    TCGv_i32 rda;
-+
-+    if (!dc_isar_feature(aa32_mve, s) ||
-+        !mve_check_qreg_bank(s, a->qn) ||
-+        !fn) {
-+        return false;
-+    }
-+    if (!mve_eci_check(s) || !vfp_access_check(s)) {
-+        return true;
-+    }
-+
-+    qn = mve_qreg_ptr(a->qn);
-+    qm = mve_qreg_ptr(a->qm);
-+
-+    /*
-+     * This insn is subject to beat-wise execution. Partial execution
-+     * of an A=0 (no-accumulate) insn which does not execute the first
-+     * beat must start with the current rda value, not 0.
-+     */
-+    if (a->a || mve_skip_first_beat(s)) {
-+        rda = load_reg(s, a->rda);
-+    } else {
-+        rda = tcg_const_i32(0);
-+    }
-+
-+    fn(rda, cpu_env, qn, qm, rda);
-+    store_reg(s, a->rda, rda);
-+    tcg_temp_free_ptr(qn);
-+    tcg_temp_free_ptr(qm);
-+
-+    mve_update_eci(s);
-+    return true;
-+}
-+
-+#define DO_DUAL_ACC(INSN, FN)                                           \
-+    static bool trans_##INSN(DisasContext *s, arg_vmladav *a)           \
-+    {                                                                   \
-+        static MVEGenDualAccOpFn * const fns[4][2] = {                  \
-+            { gen_helper_mve_##FN##b, gen_helper_mve_##FN##xb },        \
-+            { gen_helper_mve_##FN##h, gen_helper_mve_##FN##xh },        \
-+            { gen_helper_mve_##FN##w, gen_helper_mve_##FN##xw },        \
-+            { NULL, NULL },                                             \
-+        };                                                              \
-+        return do_dual_acc(s, a, fns[a->size][a->x]);                   \
-+    }
-+
-+DO_DUAL_ACC(VMLADAV_S, vmladavs)
-+DO_DUAL_ACC(VMLSDAV, vmlsdav)
-+
-+static bool trans_VMLADAV_U(DisasContext *s, arg_vmladav *a)
-+{
-+    static MVEGenDualAccOpFn * const fns[4][2] = {
-+        { gen_helper_mve_vmladavub, NULL },
-+        { gen_helper_mve_vmladavuh, NULL },
-+        { gen_helper_mve_vmladavuw, NULL },
-+        { NULL, NULL },
-+    };
-+    return do_dual_acc(s, a, fns[a->size][a->x]);
-+}
-+
- static void gen_vpst(DisasContext *s, uint32_t mask)
- {
-     /*
 --
-.20.1
+.34.1
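
Note on the Resettable conversion above: the patch splits the old
stellaris_i2c_reset() into enter, hold and exit phases. The sketch below
is a toy model of that split, assuming nothing about the real QEMU types.
ToyI2C, ToyResetPhases, toy_reset() and the TOY_MCS_BUSBSY bit value are
all hypothetical, and only a subset of the registers is modelled.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MCS_BUSBSY 0x40u  /* illustrative bit, not the real value */

typedef struct ToyI2C {
    uint32_t msa, mcs, mimr, mris, mcr;
    int irq_level;        /* output line, recomputed in the exit phase */
    int transfers_ended;  /* counts transfers ended by the enter phase */
} ToyI2C;

typedef struct ToyResetPhases {
    void (*enter)(ToyI2C *s);
    void (*hold)(ToyI2C *s);
    void (*exit)(ToyI2C *s);
} ToyResetPhases;

/* enter: finish any transfer still in flight, using pre-reset state */
static void toy_i2c_reset_enter(ToyI2C *s)
{
    if (s->mcs & TOY_MCS_BUSBSY) {
        s->transfers_ended++;
    }
}

/* hold: put the registers back to their reset values */
static void toy_i2c_reset_hold(ToyI2C *s)
{
    s->msa = 0;
    s->mcs = 0;
    s->mimr = 0;
    s->mris = 0;
    s->mcr = 0;
}

/* exit: recompute the interrupt output from the freshly reset state */
static void toy_i2c_reset_exit(ToyI2C *s)
{
    s->irq_level = (s->mris & s->mimr) != 0;
}

static const ToyResetPhases toy_i2c_phases = {
    .enter = toy_i2c_reset_enter,
    .hold = toy_i2c_reset_hold,
    .exit = toy_i2c_reset_exit,
};

/* Run the phases in order; a NULL phase is skipped. */
static void toy_reset(ToyI2C *s, const ToyResetPhases *p)
{
    if (p->enter) {
        p->enter(s);
    }
    if (p->hold) {
        p->hold(s);
    }
    if (p->exit) {
        p->exit(s);
    }
}
```

Keeping the phases as separate callbacks means the enter phase still sees
the pre-reset register contents, while the exit phase only ever sees
reset values.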

-[PULL 40/44] fsl-imx6ul: Instantiate SAI1/2/3 and ASRC as unimplemented devices
+[PULL 21/35] hw/arm/stellaris: Add missing QOM 'machine' parent
-From: Guenter Roeck <linux@roeck-us.net>
+From: Philippe Mathieu-Daudé <philmd@linaro.org>
-Instantiate SAI1/2/3 and ASRC as unimplemented devices to avoid random
+QDev objects created with qdev_new() need to manually add
-Linux kernel crashes, such as
+their parent relationship with object_property_add_child().
-Unhandled fault: external abort on non-linefetch (0x808) at 0xd1580010
+This commit plugs the devices which aren't part of the SoC;
-pgd = (ptrval)
+they will be plugged into a SoC container in the next one.
 [d1580010] *pgd=8231b811, *pte=02034653, *ppte=02034453
 Internal error: : 808 [#1] SMP ARM
 ...
 [<c095e974>] (regmap_mmio_write32le) from [<c095eb48>] (regmap_mmio_write+0x3c/0x54)
 [<c095eb48>] (regmap_mmio_write) from [<c09580f4>] (_regmap_write+0x4c/0x1f0)
 [<c09580f4>] (_regmap_write) from [<c095837c>] (_regmap_update_bits+0xe4/0xec)
 [<c095837c>] (_regmap_update_bits) from [<c09599b4>] (regmap_update_bits_base+0x50/0x74)
 [<c09599b4>] (regmap_update_bits_base) from [<c0d3e9e4>] (fsl_asrc_runtime_resume+0x1e4/0x21c)
 [<c0d3e9e4>] (fsl_asrc_runtime_resume) from [<c0942464>] (__rpm_callback+0x3c/0x108)
 [<c0942464>] (__rpm_callback) from [<c0942590>] (rpm_callback+0x60/0x64)
 [<c0942590>] (rpm_callback) from [<c0942b60>] (rpm_resume+0x5cc/0x808)
 [<c0942b60>] (rpm_resume) from [<c0942dfc>] (__pm_runtime_resume+0x60/0xa0)
 [<c0942dfc>] (__pm_runtime_resume) from [<c0d3ecc4>] (fsl_asrc_probe+0x2a8/0x708)
 [<c0d3ecc4>] (fsl_asrc_probe) from [<c0935b08>] (platform_probe+0x58/0xb8)
 [<c0935b08>] (platform_probe) from [<c0933264>] (really_probe.part.0+0x9c/0x334)
 [<c0933264>] (really_probe.part.0) from [<c093359c>] (__driver_probe_device+0xa0/0x138)
 [<c093359c>] (__driver_probe_device) from [<c0933664>] (driver_probe_device+0x30/0xc8)
 [<c0933664>] (driver_probe_device) from [<c0933c88>] (__driver_attach+0x90/0x130)
 [<c0933c88>] (__driver_attach) from [<c0931060>] (bus_for_each_dev+0x78/0xb8)
 [<c0931060>] (bus_for_each_dev) from [<c093254c>] (bus_add_driver+0xf0/0x1d8)
 [<c093254c>] (bus_add_driver) from [<c0934a30>] (driver_register+0x88/0x118)
 [<c0934a30>] (driver_register) from [<c01022c0>] (do_one_initcall+0x7c/0x3a4)
 [<c01022c0>] (do_one_initcall) from [<c1601204>] (kernel_init_freeable+0x198/0x22c)
 [<c1601204>] (kernel_init_freeable) from [<c0f5ff2c>] (kernel_init+0x10/0x128)
 [<c0f5ff2c>] (kernel_init) from [<c010013c>] (ret_from_fork+0x14/0x38)
-or
+Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Unhandled fault: external abort on non-linefetch (0x808) at 0xd19b0000
+Message-id: 20240213155214.13619-4-philmd@linaro.org
 pgd = (ptrval)
 [d19b0000] *pgd=82711811, *pte=308a0653, *ppte=308a0453
 Internal error: : 808 [#1] SMP ARM
 ...
 [<c095e974>] (regmap_mmio_write32le) from [<c095eb48>] (regmap_mmio_write+0x3c/0x54)
 [<c095eb48>] (regmap_mmio_write) from [<c09580f4>] (_regmap_write+0x4c/0x1f0)
 [<c09580f4>] (_regmap_write) from [<c0959b28>] (regmap_write+0x3c/0x60)
 [<c0959b28>] (regmap_write) from [<c0d41130>] (fsl_sai_runtime_resume+0x9c/0x1ec)
 [<c0d41130>] (fsl_sai_runtime_resume) from [<c0942464>] (__rpm_callback+0x3c/0x108)
 [<c0942464>] (__rpm_callback) from [<c0942590>] (rpm_callback+0x60/0x64)
 [<c0942590>] (rpm_callback) from [<c0942b60>] (rpm_resume+0x5cc/0x808)
 [<c0942b60>] (rpm_resume) from [<c0942dfc>] (__pm_runtime_resume+0x60/0xa0)
 [<c0942dfc>] (__pm_runtime_resume) from [<c0d4231c>] (fsl_sai_probe+0x2b8/0x65c)
 [<c0d4231c>] (fsl_sai_probe) from [<c0935b08>] (platform_probe+0x58/0xb8)
 [<c0935b08>] (platform_probe) from [<c0933264>] (really_probe.part.0+0x9c/0x334)
 [<c0933264>] (really_probe.part.0) from [<c093359c>] (__driver_probe_device+0xa0/0x138)
 [<c093359c>] (__driver_probe_device) from [<c0933664>] (driver_probe_device+0x30/0xc8)
 [<c0933664>] (driver_probe_device) from [<c0933c88>] (__driver_attach+0x90/0x130)
 [<c0933c88>] (__driver_attach) from [<c0931060>] (bus_for_each_dev+0x78/0xb8)
 [<c0931060>] (bus_for_each_dev) from [<c093254c>] (bus_add_driver+0xf0/0x1d8)
 [<c093254c>] (bus_add_driver) from [<c0934a30>] (driver_register+0x88/0x118)
 [<c0934a30>] (driver_register) from [<c01022c0>] (do_one_initcall+0x7c/0x3a4)
 [<c01022c0>] (do_one_initcall) from [<c1601204>] (kernel_init_freeable+0x198/0x22c)
 [<c1601204>] (kernel_init_freeable) from [<c0f5ff2c>] (kernel_init+0x10/0x128)
 [<c0f5ff2c>] (kernel_init) from [<c010013c>] (ret_from_fork+0x14/0x38)
 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Signed-off-by: Guenter Roeck <linux@roeck-us.net>
 Message-id: 20210810160318.87376-1-linux@roeck-us.net
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/arm/fsl-imx6ul.c | 12 ++++++++++++
+ hw/arm/stellaris.c | 4 ++++
-file changed, 12 insertions(+)
+file changed, 4 insertions(+)
-diff --git a/hw/arm/fsl-imx6ul.c b/hw/arm/fsl-imx6ul.c
+diff --git a/hw/arm/stellaris.c b/hw/arm/stellaris.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/fsl-imx6ul.c
+--- a/hw/arm/stellaris.c
-+++ b/hw/arm/fsl-imx6ul.c
++++ b/hw/arm/stellaris.c
-@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_realize(DeviceState *dev, Error **errp)
+@@ -XXX,XX +XXX,XX @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
-      */
+                                    &error_fatal);
-     create_unimplemented_device("sdma", FSL_IMX6UL_SDMA_ADDR, 0x4000);
+             ssddev = qdev_new("ssd0323");
-+    /*
++            object_property_add_child(OBJECT(ms), "oled", OBJECT(ssddev));
-+     * SAI (Audio SSI (Synchronous Serial Interface))
+             qdev_prop_set_uint8(ssddev, "cs", 1);
-+     */
+             qdev_realize_and_unref(ssddev, bus, &error_fatal);
-+    create_unimplemented_device("sai1", FSL_IMX6UL_SAI1_ADDR, 0x4000);
-+    create_unimplemented_device("sai2", FSL_IMX6UL_SAI2_ADDR, 0x4000);
+             gpio_d_splitter = qdev_new(TYPE_SPLIT_IRQ);
-+    create_unimplemented_device("sai3", FSL_IMX6UL_SAI3_ADDR, 0x4000);
++            object_property_add_child(OBJECT(ms), "splitter",
-+
++                                      OBJECT(gpio_d_splitter));
-     /*
+             qdev_prop_set_uint32(gpio_d_splitter, "num-lines", 2);
-      * PWM
+             qdev_realize_and_unref(gpio_d_splitter, NULL, &error_fatal);
-      */
+             qdev_connect_gpio_out(
-@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_realize(DeviceState *dev, Error **errp)
+@@ -XXX,XX +XXX,XX @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
-     create_unimplemented_device("pwm3", FSL_IMX6UL_PWM3_ADDR, 0x4000);
+         DeviceState *gpad;
-     create_unimplemented_device("pwm4", FSL_IMX6UL_PWM4_ADDR, 0x4000);
+         gpad = qdev_new(TYPE_STELLARIS_GAMEPAD);
-+    /*
++        object_property_add_child(OBJECT(ms), "gamepad", OBJECT(gpad));
-+     * Audio ASRC (asynchronous sample rate converter)
+         for (i = 0; i < ARRAY_SIZE(gpad_keycode); i++) {
-+     */
+             qlist_append_int(gpad_keycode_list, gpad_keycode[i]);
-+    create_unimplemented_device("asrc", FSL_IMX6UL_ASRC_ADDR, 0x4000);
+         }
 +
      /*
       * CAN
       */
 --
-.20.1
+.34.1

-[PULL 39/44] hw/char/pl011: add support for sending break
+[PULL 22/35] hw/arm/stellaris: Add missing QOM 'SoC' parent
-From: Jan Luebbe <jlu@pengutronix.de>
+From: Philippe Mathieu-Daudé <philmd@linaro.org>
-Break events are currently only handled by chardev/char-serial.c, so we
+QDev objects created with qdev_new() need to manually add
-just ignore errors, which results in no behaviour change for other
+their parent relationship with object_property_add_child().
 chardevs.
-Signed-off-by: Jan Luebbe <jlu@pengutronix.de>
+Since we don't model the SoC, just use a QOM container.
-Message-id: 20210806144700.3751979-1-jlu@pengutronix.de
 Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Message-id: 20240213155214.13619-5-philmd@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/char/pl011.c | 6 ++++++
+ hw/arm/stellaris.c | 11 ++++++++++-
-file changed, 6 insertions(+)
+file changed, 10 insertions(+), 1 deletion(-)
-diff --git a/hw/char/pl011.c b/hw/char/pl011.c
+diff --git a/hw/arm/stellaris.c b/hw/arm/stellaris.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/char/pl011.c
+--- a/hw/arm/stellaris.c
-+++ b/hw/char/pl011.c
++++ b/hw/arm/stellaris.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
- #include "hw/qdev-properties-system.h"
+      * 400fe000 system control
- #include "migration/vmstate.h"
+      */
- #include "chardev/char-fe.h"
-+#include "chardev/char-serial.h"
++    Object *soc_container;
- #include "qemu/log.h"
+     DeviceState *gpio_dev[7], *nvic;
- #include "qemu/module.h"
+     qemu_irq gpio_in[7][8];
- #include "trace.h"
+     qemu_irq gpio_out[7][8];
-@@ -XXX,XX +XXX,XX @@ static void pl011_write(void *opaque, hwaddr offset,
+@@ -XXX,XX +XXX,XX @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
-             s->read_count = 0;
+     flash_size = (((board->dc0 & 0xffff) + 1) << 1) * 1024;
-             s->read_pos = 0;
+     sram_size = ((board->dc0 >> 18) + 1) * 1024;
-         }
-+        if ((s->lcr ^ value) & 0x1) {
++    soc_container = object_new("container");
-+            int break_enable = value & 0x1;
++    object_property_add_child(OBJECT(ms), "soc", soc_container);
-+            qemu_chr_fe_ioctl(&s->chr, CHR_IOCTL_SERIAL_SET_BREAK,
++
-+                              &break_enable);
+     /* Flash programming is done via the SCU, so pretend it is ROM.  */
-+        }
+     memory_region_init_rom(flash, NULL, "stellaris.flash", flash_size,
-         s->lcr = value;
+                            &error_fatal);
-         pl011_set_read_trigger(s);
+@@ -XXX,XX +XXX,XX @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
-         break;
+      * need its sysclk output.
       */
      ssys_dev = qdev_new(TYPE_STELLARIS_SYS);
 +    object_property_add_child(soc_container, "sys", OBJECT(ssys_dev));
      /*
       * Most devices come preprogrammed with a MAC address in the user data.
@@ -XXX,XX +XXX,XX @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
      sysbus_realize_and_unref(SYS_BUS_DEVICE(ssys_dev), &error_fatal);
      nvic = qdev_new(TYPE_ARMV7M);
 +    object_property_add_child(soc_container, "v7m", OBJECT(nvic));
      qdev_prop_set_uint32(nvic, "num-irq", NUM_IRQ_LINES);
      qdev_prop_set_uint8(nvic, "num-prio-bits", NUM_PRIO_BITS);
      qdev_prop_set_string(nvic, "cpu-type", ms->cpu_type);
@@ -XXX,XX +XXX,XX @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
              dev = qdev_new(TYPE_STELLARIS_GPTM);
              sbd = SYS_BUS_DEVICE(dev);
 +            object_property_add_child(soc_container, "gptm[*]", OBJECT(dev));
              qdev_connect_clock_in(dev, "clk",
                                    qdev_get_clock_out(ssys_dev, "SYSCLK"));
              sysbus_realize_and_unref(sbd, &error_fatal);
@@ -XXX,XX +XXX,XX @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
      if (board->dc1 & (1 << 3)) { /* watchdog present */
          dev = qdev_new(TYPE_LUMINARY_WATCHDOG);
 -
 +        object_property_add_child(soc_container, "wdg", OBJECT(dev));
          qdev_connect_clock_in(dev, "WDOGCLK",
                                qdev_get_clock_out(ssys_dev, "SYSCLK"));
@@ -XXX,XX +XXX,XX @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
              SysBusDevice *sbd;
              dev = qdev_new("pl011_luminary");
 +            object_property_add_child(soc_container, "uart[*]", OBJECT(dev));
              sbd = SYS_BUS_DEVICE(dev);
              qdev_prop_set_chr(dev, "chardev", serial_hd(i));
              sysbus_realize_and_unref(sbd, &error_fatal);
@@ -XXX,XX +XXX,XX @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
          DeviceState *enet;
          enet = qdev_new("stellaris_enet");
 +        object_property_add_child(soc_container, "enet", OBJECT(enet));
          if (nd) {
              qdev_set_nic_properties(enet, nd);
          } else {
 --
-.20.1
+.34.1

-[PULL 36/44] target/arm: Re-indent sdiv and udiv helpers
+[PULL 23/35] target/arm: Use new CBAR encoding for all v8 CPUs, not all aarch64 CPUs
-We're about to make a code change to the sdiv and udiv helper
+We support two different encodings for the AArch32 IMPDEF
-functions, so first fix their indentation and coding style.
+CBAR register -- older cores like the Cortex A9, A7, A15
 have this at 4, c15, c0, 0; newer cores like the
 Cortex A35, A53, A57 and A72 have it at 1 c15 c0 0.
 When we implemented this we picked which encoding to
 use based on whether the CPU set ARM_FEATURE_AARCH64.
 However this isn't right for three cases:
  * the qemu-system-arm 'max' CPU, which is supposed to be
    a variant on a Cortex-A57; it ought to use the same
    encoding the A57 does and which the AArch64 'max'
    exposes to AArch32 guest code
  * the Cortex-R52, which is AArch32-only but has the CBAR
    at the newer encoding (and where we incorrectly are
    not yet setting ARM_FEATURE_CBAR_RO anyway)
  * any possible future support for other v8 AArch32
    only CPUs, or for supporting "boot the CPU into
    AArch32 mode" on our existing cores like the A57 etc
 Make the decision of the encoding be based on whether
 the CPU implements the ARM_FEATURE_V8 flag instead.
 This changes the behaviour only for the qemu-system-arm
 '-cpu max'. We don't expect anybody to be relying on the
 old behaviour because:
  * it's not what the real hardware Cortex-A57 does
    (and that's what our ID register claims we are)
  * we don't implement the memory-mapped GICv3 support
    which is the only thing that exists at the peripheral
    base address pointed to by the register
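For illustration only, the 32-bit CBAR view that the patched branch builds
("[31:18] 0...0 [43:32]") can be sketched outside QEMU as below. The names
extract_bits64() and cbar32_view() are hypothetical stand-ins introduced here;
extract_bits64() only mimics the behaviour of QEMU's extract64() for this
example and is not the QEMU implementation.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for QEMU's extract64(): return 'length' bits of
 * 'value' starting at bit 'start'. Assumes 1 <= length <= 64.
 */
static inline uint64_t extract_bits64(uint64_t value, unsigned start,
                                      unsigned length)
{
    return (value >> start) & (~0ULL >> (64 - length));
}

/*
 * Build the 32-bit view of a 64-bit reset CBAR value:
 * bits [31:18] keep address bits [31:18], bits [17:12] are zero,
 * and bits [11:0] carry address bits [43:32].
 */
static inline uint32_t cbar32_view(uint64_t reset_cbar)
{
    return (uint32_t)((extract_bits64(reset_cbar, 18, 14) << 18)
                      | extract_bits64(reset_cbar, 32, 12));
}
```

With this sketch, a peripheral base below 4GB keeps its upper address bits
unchanged, while any address bits [43:32] appear in the low 12 bits of the
32-bit value.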
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210730151636.17254-2-peter.maydell@linaro.org
+Message-id: 20240206132931.38376-2-peter.maydell@linaro.org
 ---
- target/arm/helper.c | 15 +++++++++------
+ target/arm/helper.c | 2 +-
-file changed, 9 insertions(+), 6 deletions(-)
+file changed, 1 insertion(+), 1 deletion(-)
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(uxtb16)(uint32_t x)
+@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
+          * AArch64 cores we might need to add a specific feature flag
- int32_t HELPER(sdiv)(int32_t num, int32_t den)
+          * to indicate cores with "flavour 2" CBAR.
- {
+          */
--    if (den == 0)
+-        if (arm_feature(env, ARM_FEATURE_AARCH64)) {
--      return 0;
++        if (arm_feature(env, ARM_FEATURE_V8)) {
--    if (num == INT_MIN && den == -1)
+             /* 32 bit view is [31:18] 0...0 [43:32]. */
--      return INT_MIN;
+             uint32_t cbar32 = (extract64(cpu->reset_cbar, 18, 14) << 18)
-+    if (den == 0) {
+                 | extract64(cpu->reset_cbar, 32, 12);
 +        return 0;
 +    }
 +    if (num == INT_MIN && den == -1) {
 +        return INT_MIN;
 +    }
      return num / den;
  }
  uint32_t HELPER(udiv)(uint32_t num, uint32_t den)
  {
 -    if (den == 0)
 -      return 0;
 +    if (den == 0) {
 +        return 0;
 +    }
      return num / den;
  }
 --
-.20.1
+.34.1

-[PULL 24/44] target/arm: Rename MVEGenDualAccOpFn to MVEGenLongDualAccOpFn
+[PULL 24/35] target/arm: The Cortex-R52 has a read-only CBAR
-The MVEGenDualAccOpFn is a bit misnamed, since it is used for
+The Cortex-R52 implements the Configuration Base Address Register
-the "long dual accumulate" operations that use a 64-bit
+(CBAR), as a read-only register.  Add ARM_FEATURE_CBAR_RO to this CPU
-accumulator. Rename it to MVEGenLongDualAccOpFn so we can
+type, so that our implementation provides the register and the
-use the former name for the 32-bit accumulator insns.
+associated qdev property.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20240206132931.38376-3-peter.maydell@linaro.org
 ---
- target/arm/translate-mve.c | 16 ++++++++--------
+ target/arm/tcg/cpu32.c | 1 +
-file changed, 8 insertions(+), 8 deletions(-)
+file changed, 1 insertion(+)
-diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
+diff --git a/target/arm/tcg/cpu32.c b/target/arm/tcg/cpu32.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-mve.c
+--- a/target/arm/tcg/cpu32.c
-+++ b/target/arm/translate-mve.c
++++ b/target/arm/tcg/cpu32.c
-@@ -XXX,XX +XXX,XX @@ typedef void MVEGenOneOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
+@@ -XXX,XX +XXX,XX @@ static void cortex_r52_initfn(Object *obj)
- typedef void MVEGenTwoOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr);
+     set_feature(&cpu->env, ARM_FEATURE_PMSA);
- typedef void MVEGenTwoOpScalarFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
+     set_feature(&cpu->env, ARM_FEATURE_NEON);
- typedef void MVEGenTwoOpShiftFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
+     set_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER);
--typedef void MVEGenDualAccOpFn(TCGv_i64, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i64);
++    set_feature(&cpu->env, ARM_FEATURE_CBAR_RO);
-+typedef void MVEGenLongDualAccOpFn(TCGv_i64, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i64);
+     cpu->midr = 0x411fd133; /* r1p3 */
- typedef void MVEGenVADDVFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32);
+     cpu->revidr = 0x00000000;
- typedef void MVEGenOneOpImmFn(TCGv_ptr, TCGv_ptr, TCGv_i64);
+     cpu->reset_fpsid = 0x41034023;
  typedef void MVEGenVIDUPFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32);
@@ -XXX,XX +XXX,XX @@ static bool trans_VQDMULLT_scalar(DisasContext *s, arg_2scalar *a)
  }
  static bool do_long_dual_acc(DisasContext *s, arg_vmlaldav *a,
 -                             MVEGenDualAccOpFn *fn)
 +                             MVEGenLongDualAccOpFn *fn)
  {
      TCGv_ptr qn, qm;
      TCGv_i64 rda;
@@ -XXX,XX +XXX,XX @@ static bool do_long_dual_acc(DisasContext *s, arg_vmlaldav *a,
  static bool trans_VMLALDAV_S(DisasContext *s, arg_vmlaldav *a)
  {
 -    static MVEGenDualAccOpFn * const fns[4][2] = {
 +    static MVEGenLongDualAccOpFn * const fns[4][2] = {
          { NULL, NULL },
          { gen_helper_mve_vmlaldavsh, gen_helper_mve_vmlaldavxsh },
          { gen_helper_mve_vmlaldavsw, gen_helper_mve_vmlaldavxsw },
@@ -XXX,XX +XXX,XX @@ static bool trans_VMLALDAV_S(DisasContext *s, arg_vmlaldav *a)
  static bool trans_VMLALDAV_U(DisasContext *s, arg_vmlaldav *a)
  {
 -    static MVEGenDualAccOpFn * const fns[4][2] = {
 +    static MVEGenLongDualAccOpFn * const fns[4][2] = {
          { NULL, NULL },
          { gen_helper_mve_vmlaldavuh, NULL },
          { gen_helper_mve_vmlaldavuw, NULL },
@@ -XXX,XX +XXX,XX @@ static bool trans_VMLALDAV_U(DisasContext *s, arg_vmlaldav *a)
  static bool trans_VMLSLDAV(DisasContext *s, arg_vmlaldav *a)
  {
 -    static MVEGenDualAccOpFn * const fns[4][2] = {
 +    static MVEGenLongDualAccOpFn * const fns[4][2] = {
          { NULL, NULL },
          { gen_helper_mve_vmlsldavsh, gen_helper_mve_vmlsldavxsh },
          { gen_helper_mve_vmlsldavsw, gen_helper_mve_vmlsldavxsw },
@@ -XXX,XX +XXX,XX @@ static bool trans_VMLSLDAV(DisasContext *s, arg_vmlaldav *a)
  static bool trans_VRMLALDAVH_S(DisasContext *s, arg_vmlaldav *a)
  {
 -    static MVEGenDualAccOpFn * const fns[] = {
 +    static MVEGenLongDualAccOpFn * const fns[] = {
          gen_helper_mve_vrmlaldavhsw, gen_helper_mve_vrmlaldavhxsw,
      };
      return do_long_dual_acc(s, a, fns[a->x]);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRMLALDAVH_S(DisasContext *s, arg_vmlaldav *a)
  static bool trans_VRMLALDAVH_U(DisasContext *s, arg_vmlaldav *a)
  {
 -    static MVEGenDualAccOpFn * const fns[] = {
 +    static MVEGenLongDualAccOpFn * const fns[] = {
          gen_helper_mve_vrmlaldavhuw, NULL,
      };
      return do_long_dual_acc(s, a, fns[a->x]);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRMLALDAVH_U(DisasContext *s, arg_vmlaldav *a)
  static bool trans_VRMLSLDAVH(DisasContext *s, arg_vmlaldav *a)
  {
 -    static MVEGenDualAccOpFn * const fns[] = {
 +    static MVEGenLongDualAccOpFn * const fns[] = {
          gen_helper_mve_vrmlsldavhsw, gen_helper_mve_vrmlsldavhxsw,
      };
      return do_long_dual_acc(s, a, fns[a->x]);
 --
-.20.1
+.34.1

-[PULL 22/44] target/arm: Implement MVE VABAV
+[PULL 25/35] target/arm: Add Cortex-R52 IMPDEF sysregs
-Implement the MVE VABAV insn, which computes absolute differences
+Add the Cortex-R52 IMPDEF sysregs, by defining them here and
-between elements of two vectors and accumulates the result into
+also by enabling the AUXCR feature which defines the ACTLR
-a general purpose register.
+and HACTLR registers. As is our usual practice, we make these
 simple reads-as-zero stubs for now.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20240206132931.38376-4-peter.maydell@linaro.org
 ---
- target/arm/helper-mve.h    |  7 +++++++
+ target/arm/tcg/cpu32.c | 108 +++++++++++++++++++++++++++++++++++++++++
- target/arm/mve.decode      |  6 ++++++
+file changed, 108 insertions(+)
  target/arm/mve_helper.c    | 26 +++++++++++++++++++++++
  target/arm/translate-mve.c | 43 ++++++++++++++++++++++++++++++++++++++
 files changed, 82 insertions(+)
-diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
+diff --git a/target/arm/tcg/cpu32.c b/target/arm/tcg/cpu32.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper-mve.h
+--- a/target/arm/tcg/cpu32.c
-+++ b/target/arm/helper-mve.h
++++ b/target/arm/tcg/cpu32.c
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vminavw, TCG_CALL_NO_WG, i32, env, ptr, i32)
+@@ -XXX,XX +XXX,XX @@ static void cortex_r5_initfn(Object *obj)
- DEF_HELPER_FLAGS_3(mve_vaddlv_s, TCG_CALL_NO_WG, i64, env, ptr, i64)
+     define_arm_cp_regs(cpu, cortexr5_cp_reginfo);
- DEF_HELPER_FLAGS_3(mve_vaddlv_u, TCG_CALL_NO_WG, i64, env, ptr, i64)
+ }
-+DEF_HELPER_FLAGS_4(mve_vabavsb, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
++static const ARMCPRegInfo cortex_r52_cp_reginfo[] = {
-+DEF_HELPER_FLAGS_4(mve_vabavsh, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
++    { .name = "CPUACTLR", .cp = 15, .opc1 = 0, .crm = 15,
-+DEF_HELPER_FLAGS_4(mve_vabavsw, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
++      .access = PL1_RW, .type = ARM_CP_CONST | ARM_CP_64BIT, .resetvalue = 0 },
-+DEF_HELPER_FLAGS_4(mve_vabavub, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
++    { .name = "IMP_ATCMREGIONR",
-+DEF_HELPER_FLAGS_4(mve_vabavuh, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
++      .cp = 15, .opc1 = 0, .crn = 9, .crm = 1, .opc2 = 0,
-+DEF_HELPER_FLAGS_4(mve_vabavuw, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
++      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_BTCMREGIONR",
 +      .cp = 15, .opc1 = 0, .crn = 9, .crm = 1, .opc2 = 1,
 +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_CTCMREGIONR",
 +      .cp = 15, .opc1 = 0, .crn = 9, .crm = 1, .opc2 = 2,
 +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_CSCTLR",
 +      .cp = 15, .opc1 = 1, .crn = 9, .crm = 1, .opc2 = 0,
 +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_BPCTLR",
 +      .cp = 15, .opc1 = 1, .crn = 9, .crm = 1, .opc2 = 1,
 +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_MEMPROTCLR",
 +      .cp = 15, .opc1 = 1, .crn = 9, .crm = 1, .opc2 = 2,
 +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_SLAVEPCTLR",
 +      .cp = 15, .opc1 = 0, .crn = 11, .crm = 0, .opc2 = 0,
 +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_PERIPHREGIONR",
 +      .cp = 15, .opc1 = 0, .crn = 15, .crm = 0, .opc2 = 0,
 +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_FLASHIFREGIONR",
 +      .cp = 15, .opc1 = 0, .crn = 15, .crm = 0, .opc2 = 1,
 +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_BUILDOPTR",
 +      .cp = 15, .opc1 = 0, .crn = 15, .crm = 2, .opc2 = 0,
 +      .access = PL1_R, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_PINOPTR",
 +      .cp = 15, .opc1 = 0, .crn = 15, .crm = 2, .opc2 = 7,
 +      .access = PL1_R, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_QOSR",
 +      .cp = 15, .opc1 = 1, .crn = 15, .crm = 3, .opc2 = 1,
 +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_BUSTIMEOUTR",
 +      .cp = 15, .opc1 = 1, .crn = 15, .crm = 3, .opc2 = 2,
 +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_INTMONR",
 +      .cp = 15, .opc1 = 1, .crn = 15, .crm = 3, .opc2 = 4,
 +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_ICERR0",
 +      .cp = 15, .opc1 = 2, .crn = 15, .crm = 0, .opc2 = 0,
 +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_ICERR1",
 +      .cp = 15, .opc1 = 2, .crn = 15, .crm = 0, .opc2 = 1,
 +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_DCERR0",
 +      .cp = 15, .opc1 = 2, .crn = 15, .crm = 1, .opc2 = 0,
 +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_DCERR1",
 +      .cp = 15, .opc1 = 2, .crn = 15, .crm = 1, .opc2 = 1,
 +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_TCMERR0",
 +      .cp = 15, .opc1 = 2, .crn = 15, .crm = 2, .opc2 = 0,
 +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_TCMERR1",
 +      .cp = 15, .opc1 = 2, .crn = 15, .crm = 2, .opc2 = 1,
 +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_TCMSYNDR0",
 +      .cp = 15, .opc1 = 2, .crn = 15, .crm = 2, .opc2 = 2,
 +      .access = PL1_R, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_TCMSYNDR1",
 +      .cp = 15, .opc1 = 2, .crn = 15, .crm = 2, .opc2 = 3,
 +      .access = PL1_R, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_FLASHERR0",
 +      .cp = 15, .opc1 = 2, .crn = 15, .crm = 3, .opc2 = 0,
 +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_FLASHERR1",
 +      .cp = 15, .opc1 = 2, .crn = 15, .crm = 3, .opc2 = 1,
 +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_CDBGDR0",
 +      .cp = 15, .opc1 = 3, .crn = 15, .crm = 0, .opc2 = 0,
 +      .access = PL1_R, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_CBDGBR1",
 +      .cp = 15, .opc1 = 3, .crn = 15, .crm = 0, .opc2 = 1,
 +      .access = PL1_R, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_TESTR0",
 +      .cp = 15, .opc1 = 4, .crn = 15, .crm = 0, .opc2 = 0,
 +      .access = PL1_R, .type = ARM_CP_CONST, .resetvalue = 0 },
 +    { .name = "IMP_TESTR1",
 +      .cp = 15, .opc1 = 4, .crn = 15, .crm = 0, .opc2 = 1,
 +      .access = PL1_W, .type = ARM_CP_NOP, .resetvalue = 0 },
 +    { .name = "IMP_CDBGDCI",
 +      .cp = 15, .opc1 = 0, .crn = 15, .crm = 15, .opc2 = 0,
 +      .access = PL1_W, .type = ARM_CP_NOP, .resetvalue = 0 },
 +    { .name = "IMP_CDBGDCT",
 +      .cp = 15, .opc1 = 3, .crn = 15, .crm = 2, .opc2 = 0,
 +      .access = PL1_W, .type = ARM_CP_NOP, .resetvalue = 0 },
 +    { .name = "IMP_CDBGICT",
 +      .cp = 15, .opc1 = 3, .crn = 15, .crm = 2, .opc2 = 1,
 +      .access = PL1_W, .type = ARM_CP_NOP, .resetvalue = 0 },
 +    { .name = "IMP_CDBGDCD",
 +      .cp = 15, .opc1 = 3, .crn = 15, .crm = 4, .opc2 = 0,
 +      .access = PL1_W, .type = ARM_CP_NOP, .resetvalue = 0 },
 +    { .name = "IMP_CDBGICD",
 +      .cp = 15, .opc1 = 3, .crn = 15, .crm = 4, .opc2 = 1,
 +      .access = PL1_W, .type = ARM_CP_NOP, .resetvalue = 0 },
 +};
 +
- DEF_HELPER_FLAGS_3(mve_vmovi, TCG_CALL_NO_WG, void, env, ptr, i64)
++
- DEF_HELPER_FLAGS_3(mve_vandi, TCG_CALL_NO_WG, void, env, ptr, i64)
+ static void cortex_r52_initfn(Object *obj)
- DEF_HELPER_FLAGS_3(mve_vorri, TCG_CALL_NO_WG, void, env, ptr, i64)
+ {
-diff --git a/target/arm/mve.decode b/target/arm/mve.decode
+     ARMCPU *cpu = ARM_CPU(obj);
-index XXXXXXX..XXXXXXX 100644
+@@ -XXX,XX +XXX,XX @@ static void cortex_r52_initfn(Object *obj)
---- a/target/arm/mve.decode
+     set_feature(&cpu->env, ARM_FEATURE_NEON);
-+++ b/target/arm/mve.decode
+     set_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER);
-@@ -XXX,XX +XXX,XX @@
+     set_feature(&cpu->env, ARM_FEATURE_CBAR_RO);
- &vcmp_scalar qn rm size mask
++    set_feature(&cpu->env, ARM_FEATURE_AUXCR);
- &shl_scalar qda rm size
+     cpu->midr = 0x411fd133; /* r1p3 */
- &vmaxv qm rda size
+     cpu->revidr = 0x00000000;
-+&vabav qn qm rda size
+     cpu->reset_fpsid = 0x41034023;
+@@ -XXX,XX +XXX,XX @@ static void cortex_r52_initfn(Object *obj)
- @vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
- # Note that both Rn and Qd are 3 bits only (no D bit)
+     cpu->pmsav7_dregion = 16;
-@@ -XXX,XX +XXX,XX @@ VMLAS            111- 1110 0 . .. ... 1 ... 1 1110 . 100 .... @2scalar
+     cpu->pmsav8r_hdregion = 16;
-                  rdahi=%rdahi rdalo=%rdalo
++
 +    define_arm_cp_regs(cpu, cortex_r52_cp_reginfo);
  }
-+@vabav           .... .... .. size:2 .... rda:4 .... .... .... &vabav qn=%qn qm=%qm
+ static void cortex_r5f_initfn(Object *obj)
 +
 +VABAV_S          111 0 1110 10 .. ... 0 .... 1111 . 0 . 0 ... 1 @vabav
 +VABAV_U          111 1 1110 10 .. ... 0 .... 1111 . 0 . 0 ... 1 @vabav
 +
  # Logical immediate operations (1 reg and modified-immediate)
  # The cmode/op bits here decode VORR/VBIC/VMOV/VMVN, but
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_VMAXMINV(vminavb, 1, int8_t, uint8_t, do_mina)
  DO_VMAXMINV(vminavh, 2, int16_t, uint16_t, do_mina)
  DO_VMAXMINV(vminavw, 4, int32_t, uint32_t, do_mina)
 +#define DO_VABAV(OP, ESIZE, TYPE)                               \
 +    uint32_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vn, \
 +                                    void *vm, uint32_t ra)      \
 +    {                                                           \
 +        uint16_t mask = mve_element_mask(env);                  \
 +        unsigned e;                                             \
 +        TYPE *m = vm, *n = vn;                                  \
 +        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {      \
 +            if (mask & 1) {                                     \
 +                int64_t n0 = n[H##ESIZE(e)];                    \
 +                int64_t m0 = m[H##ESIZE(e)];                    \
 +                uint32_t r = n0 >= m0 ? (n0 - m0) : (m0 - n0);  \
 +                ra += r;                                        \
 +            }                                                   \
 +        }                                                       \
 +        mve_advance_vpt(env);                                   \
 +        return ra;                                              \
 +    }
 +
 +DO_VABAV(vabavsb, 1, int8_t)
 +DO_VABAV(vabavsh, 2, int16_t)
 +DO_VABAV(vabavsw, 4, int32_t)
 +DO_VABAV(vabavub, 1, uint8_t)
 +DO_VABAV(vabavuh, 2, uint16_t)
 +DO_VABAV(vabavuw, 4, uint32_t)
 +
  #define DO_VADDLV(OP, TYPE, LTYPE)                              \
      uint64_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vm, \
                                      uint64_t ra)                \
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ typedef void MVEGenVIDUPFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32);
  typedef void MVEGenVIWDUPFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32);
  typedef void MVEGenCmpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
  typedef void MVEGenScalarCmpFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
 +typedef void MVEGenVABAVFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
  /* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
  static inline long mve_qreg_offset(unsigned reg)
@@ -XXX,XX +XXX,XX @@ DO_VMAXV(VMAXAV, vmaxav)
  DO_VMAXV(VMINV_S, vminvs)
  DO_VMAXV(VMINV_U, vminvu)
  DO_VMAXV(VMINAV, vminav)
 +
 +static bool do_vabav(DisasContext *s, arg_vabav *a, MVEGenVABAVFn *fn)
 +{
 +    /* Absolute difference accumulated across vector */
 +    TCGv_ptr qn, qm;
 +    TCGv_i32 rda;
 +
 +    if (!dc_isar_feature(aa32_mve, s) ||
 +        !mve_check_qreg_bank(s, a->qm | a->qn) ||
 +        !fn || a->rda == 13 || a->rda == 15) {
 +        /* Rda cases are UNPREDICTABLE */
 +        return false;
 +    }
 +    if (!mve_eci_check(s) || !vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    qm = mve_qreg_ptr(a->qm);
 +    qn = mve_qreg_ptr(a->qn);
 +    rda = load_reg(s, a->rda);
 +    fn(rda, cpu_env, qn, qm, rda);
 +    store_reg(s, a->rda, rda);
 +    tcg_temp_free_ptr(qm);
 +    tcg_temp_free_ptr(qn);
 +    mve_update_eci(s);
 +    return true;
 +}
 +
 +#define DO_VABAV(INSN, FN)                                      \
 +    static bool trans_##INSN(DisasContext *s, arg_vabav *a)     \
 +    {                                                           \
 +        static MVEGenVABAVFn * const fns[] = {                  \
 +            gen_helper_mve_##FN##b,                             \
 +            gen_helper_mve_##FN##h,                             \
 +            gen_helper_mve_##FN##w,                             \
 +            NULL,                                               \
 +        };                                                      \
 +        return do_vabav(s, a, fns[a->size]);                    \
 +    }
 +
 +DO_VABAV(VABAV_S, vabavs)
 +DO_VABAV(VABAV_U, vabavu)
 --
-.20.1
+.34.1
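
As a rough illustration of what the DO_VABAV helper above computes, here
is a standalone sketch of the signed-byte case. It is not QEMU code: the
name vabav_s8 is made up for this example, the predicate mask is passed in
directly rather than read via mve_element_mask(), and a little-endian host
is assumed so that the H1() index swizzle can be dropped.

```c
#include <stdint.h>

/*
 * Hypothetical standalone model of VABAV.S8: accumulate the absolute
 * difference of each active lane into ra. The mask has one bit per
 * byte of the 16-byte vector, as in the helper above.
 */
static uint32_t vabav_s8(const int8_t n[16], const int8_t m[16],
                         uint16_t mask, uint32_t ra)
{
    for (unsigned e = 0; e < 16; e++, mask >>= 1) {
        if (mask & 1) {
            /* Widen first so that e.g. 127 - (-128) cannot overflow */
            int64_t n0 = n[e];
            int64_t m0 = m[e];
            ra += (uint32_t)(n0 >= m0 ? n0 - m0 : m0 - n0);
        }
    }
    return ra;
}
```

Widening to int64_t before subtracting mirrors the helper: the difference
of two int8_t values can be as large as 255, which does not fit in the
element type.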

-[PULL 09/44] target/arm: Factor out mve_eci_mask()
+[PULL 26/35] target/arm: Allow access to SPSR_hyp from hyp mode
-In some situations we need a mask telling us which parts of the
+Architecturally, the AArch32 MSR/MRS to/from banked register
-vector correspond to beats that are not being executed because of
+instructions are UNPREDICTABLE for attempts to access a banked
-ECI, separately from the combined "which bytes are predicated away"
+register that the guest could access in a more direct way (e.g.
-mask.  Factor this mask calculation out of mve_element_mask() into
+using this insn to access r8_fiq when already in FIQ mode).  QEMU has
-its own function.
+chosen to UNDEF on all of these.
 However, for the case of accessing SPSR_hyp from hyp mode, it turns
 out that real hardware permits this, with the same effect as if the
 guest had directly written to SPSR. Further, there is some
 guest code out there that assumes it can do this, because it
 happens to work on hardware: an example Cortex-R52 startup code
 fragment uses this, and it got copied into various other places,
 including Zephyr. Zephyr was fixed to not use this:
  https://github.com/zephyrproject-rtos/zephyr/issues/47330
 but other examples are still out there, like the selftest
 binary for the MPS3-AN536.
 For convenience of being able to run guest code, permit
 this UNPREDICTABLE access instead of UNDEFing it.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20240206132931.38376-5-peter.maydell@linaro.org
 ---
- target/arm/mve_helper.c | 58 ++++++++++++++++++++++++-----------------
+ target/arm/tcg/op_helper.c | 43 ++++++++++++++++++++++++++------------
-file changed, 34 insertions(+), 24 deletions(-)
+ target/arm/tcg/translate.c | 19 +++++++++++------
 files changed, 43 insertions(+), 19 deletions(-)
-diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
+diff --git a/target/arm/tcg/op_helper.c b/target/arm/tcg/op_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/mve_helper.c
+--- a/target/arm/tcg/op_helper.c
-+++ b/target/arm/mve_helper.c
++++ b/target/arm/tcg/op_helper.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static void msr_mrs_banked_exc_checks(CPUARMState *env, uint32_t tgtmode,
- #include "exec/exec-all.h"
+      */
- #include "tcg/tcg.h"
+     int curmode = env->uncached_cpsr & CPSR_M;
-+static uint16_t mve_eci_mask(CPUARMState *env)
+-    if (regno == 17) {
-+{
+-        /* ELR_Hyp: a special case because access from tgtmode is OK */
-+    /*
+-        if (curmode != ARM_CPU_MODE_HYP && curmode != ARM_CPU_MODE_MON) {
-+     * Return the mask of which elements in the MVE vector correspond
+-            goto undef;
-+     * to beats being executed. The mask has 1 bits for executed lanes
++    if (tgtmode == ARM_CPU_MODE_HYP) {
-+     * and 0 bits where ECI says this beat was already executed.
++        /*
-+     */
++         * Handle Hyp target regs first because some are special cases
-+    int eci;
++         * which don't want the usual "not accessible from tgtmode" check.
-+
++         */
-+    if ((env->condexec_bits & 0xf) != 0) {
++        switch (regno) {
-+        return 0xffff;
++        case 16 ... 17: /* ELR_Hyp, SPSR_Hyp */
-+    }
++            if (curmode != ARM_CPU_MODE_HYP && curmode != ARM_CPU_MODE_MON) {
-+
++                goto undef;
-+    eci = env->condexec_bits >> 4;
++            }
-+    switch (eci) {
++            break;
-+    case ECI_NONE:
++        case 13:
-+        return 0xffff;
++            if (curmode != ARM_CPU_MODE_MON) {
-+    case ECI_A0:
++                goto undef;
-+        return 0xfff0;
++            }
-+    case ECI_A0A1:
++            break;
-+        return 0xff00;
++        default:
-+    case ECI_A0A1A2:
++            g_assert_not_reached();
-+    case ECI_A0A1A2B0:
+         }
-+        return 0xf000;
+         return;
 +    default:
 +        g_assert_not_reached();
 +    }
 +}
 +
  static uint16_t mve_element_mask(CPUARMState *env)
  {
      /*
@@ -XXX,XX +XXX,XX @@ static uint16_t mve_element_mask(CPUARMState *env)
          mask &= ltpmask;
      }
+@@ -XXX,XX +XXX,XX @@ static void msr_mrs_banked_exc_checks(CPUARMState *env, uint32_t tgtmode,
--    if ((env->condexec_bits & 0xf) == 0) {
+         }
--        /*
+     }
--         * ECI bits indicate which beats are already executed;
--         * we handle this by effectively predicating them out.
+-    if (tgtmode == ARM_CPU_MODE_HYP) {
--         */
+-        /* SPSR_Hyp, r13_hyp: accessible from Monitor mode only */
--        int eci = env->condexec_bits >> 4;
+-        if (curmode != ARM_CPU_MODE_MON) {
--        switch (eci) {
+-            goto undef;
 -        case ECI_NONE:
 -            break;
 -        case ECI_A0:
 -            mask &= 0xfff0;
 -            break;
 -        case ECI_A0A1:
 -            mask &= 0xff00;
 -            break;
 -        case ECI_A0A1A2:
 -        case ECI_A0A1A2B0:
 -            mask &= 0xf000;
 -            break;
 -        default:
 -            g_assert_not_reached();
 -        }
 -    }
 -
-+    /*
+     return;
-+     * ECI bits indicate which beats are already executed;
-+     * we handle this by effectively predicating them out.
+ undef:
-+     */
+@@ -XXX,XX +XXX,XX @@ void HELPER(msr_banked)(CPUARMState *env, uint32_t value, uint32_t tgtmode,
-+    mask &= mve_eci_mask(env);
-     return mask;
+     switch (regno) {
- }
+     case 16: /* SPSRs */
+-        env->banked_spsr[bank_number(tgtmode)] = value;
 +        if (tgtmode == (env->uncached_cpsr & CPSR_M)) {
 +            /* Only happens for SPSR_Hyp access in Hyp mode */
 +            env->spsr = value;
 +        } else {
 +            env->banked_spsr[bank_number(tgtmode)] = value;
 +        }
          break;
      case 17: /* ELR_Hyp */
          env->elr_el[2] = value;
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(mrs_banked)(CPUARMState *env, uint32_t tgtmode, uint32_t regno)
      switch (regno) {
      case 16: /* SPSRs */
 -        return env->banked_spsr[bank_number(tgtmode)];
 +        if (tgtmode == (env->uncached_cpsr & CPSR_M)) {
 +            /* Only happens for SPSR_Hyp access in Hyp mode */
 +            return env->spsr;
 +        } else {
 +            return env->banked_spsr[bank_number(tgtmode)];
 +        }
      case 17: /* ELR_Hyp */
          return env->elr_el[2];
      case 13:
 diff --git a/target/arm/tcg/translate.c b/target/arm/tcg/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/translate.c
 +++ b/target/arm/tcg/translate.c
@@ -XXX,XX +XXX,XX @@ static bool msr_banked_access_decode(DisasContext *s, int r, int sysm, int rn,
          break;
      case ARM_CPU_MODE_HYP:
          /*
 -         * SPSR_hyp and r13_hyp can only be accessed from Monitor mode
 -         * (and so we can forbid accesses from EL2 or below). elr_hyp
 -         * can be accessed also from Hyp mode, so forbid accesses from
 -         * EL0 or EL1.
 +         * r13_hyp can only be accessed from Monitor mode, and so we
 +         * can forbid accesses from EL2 or below.
 +         * elr_hyp can be accessed also from Hyp mode, so forbid
 +         * accesses from EL0 or EL1.
 +         * SPSR_hyp is supposed to be in the same category as r13_hyp
 +         * and UNPREDICTABLE if accessed from anything except Monitor
 +         * mode. However there is some real-world code that will do
 +         * it because at least some hardware happens to permit the
 +         * access. (Notably a standard Cortex-R52 startup code fragment
 +         * does this.) So we permit SPSR_hyp from Hyp mode also, to allow
 +         * this (incorrect) guest code to run.
           */
 -        if (!arm_dc_feature(s, ARM_FEATURE_EL2) || s->current_el < 2 ||
 -            (s->current_el < 3 && *regno != 17)) {
 +        if (!arm_dc_feature(s, ARM_FEATURE_EL2) || s->current_el < 2
 +            || (s->current_el < 3 && *regno != 16 && *regno != 17)) {
              goto undef;
          }
          break;
 --
-.20.1
+.34.1
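
The new Hyp-target branch of msr_mrs_banked_exc_checks() in the SPSR_hyp
patch above can be summarised with a small standalone sketch. This only
models the runtime check in the helper, not the translate-time check in
msr_banked_access_decode(). The function name and the MODE_* constants
are hypothetical stand-ins for the ARM_CPU_MODE_* values (they use the
AArch32 CPSR.M encodings), and "false" stands for the "goto undef" path.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for ARM_CPU_MODE_SVC / _MON / _HYP (CPSR.M encodings) */
enum { MODE_SVC = 0x13, MODE_MON = 0x16, MODE_HYP = 0x1a };

/*
 * Return true if a banked MRS/MSR access to a Hyp-mode target register
 * is permitted from curmode, false if it should UNDEF.
 */
static bool hyp_banked_access_ok(int curmode, int regno)
{
    switch (regno) {
    case 16: /* SPSR_hyp: now also allowed from Hyp mode */
    case 17: /* ELR_hyp */
        return curmode == MODE_HYP || curmode == MODE_MON;
    case 13: /* r13_hyp: Monitor mode only */
        return curmode == MODE_MON;
    default:
        /* The decoder should never pass any other regno here */
        assert(!"unexpected regno for Hyp target mode");
        return false;
    }
}
```

With this model, an SPSR_hyp access from Hyp mode is accepted while an
r13_hyp access from Hyp mode still takes the UNDEF path, which matches
the behaviour described in the commit message.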

-[PULL 19/44] target/arm: Implement MVE shift-by-scalar
+[PULL 27/35] hw/misc/mps2-scc: Fix condition for CFG3 register
-Implement the MVE instructions which perform shifts by a scalar.
+We currently guard the CFG3 register read with
-These are VSHL T2, VRSHL T2, VQSHL T1 and VQRSHL T2.  They take the
+ (scc_partno(s) == 0x524 && scc_partno(s) == 0x547)
-shift amount in a general purpose register and shift every element in
+which is clearly wrong as it is never true.
 the vector by that amount.
-Mostly we can reuse the helper functions for shift-by-immediate; we
+This register is present on all board types except AN524
-do need two new helpers for VQRSHL.
+and AN547; correct the condition.
+Fixes: 6ac80818941829c0 ("hw/misc/mps2-scc: Implement changes for AN547")
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20240206132931.38376-6-peter.maydell@linaro.org
 ---
- target/arm/helper-mve.h    |  8 +++++++
+ hw/misc/mps2-scc.c | 2 +-
- target/arm/mve.decode      | 23 ++++++++++++++++---
+file changed, 1 insertion(+), 1 deletion(-)
  target/arm/mve_helper.c    |  2 ++
  target/arm/translate-mve.c | 46 ++++++++++++++++++++++++++++++++++++++
 files changed, 76 insertions(+), 3 deletions(-)
-diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
+diff --git a/hw/misc/mps2-scc.c b/hw/misc/mps2-scc.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper-mve.h
+--- a/hw/misc/mps2-scc.c
-+++ b/target/arm/helper-mve.h
++++ b/hw/misc/mps2-scc.c
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vrshli_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+@@ -XXX,XX +XXX,XX @@ static uint64_t mps2_scc_read(void *opaque, hwaddr offset, unsigned size)
- DEF_HELPER_FLAGS_4(mve_vrshli_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+         r = s->cfg2;
- DEF_HELPER_FLAGS_4(mve_vrshli_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+         break;
+     case A_CFG3:
-+DEF_HELPER_FLAGS_4(mve_vqrshli_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+-        if (scc_partno(s) == 0x524 && scc_partno(s) == 0x547) {
-+DEF_HELPER_FLAGS_4(mve_vqrshli_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
++        if (scc_partno(s) == 0x524 || scc_partno(s) == 0x547) {
-+DEF_HELPER_FLAGS_4(mve_vqrshli_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+             /* CFG3 reserved on AN524 */
-+
+             goto bad_offset;
-+DEF_HELPER_FLAGS_4(mve_vqrshli_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+         }
 +DEF_HELPER_FLAGS_4(mve_vqrshli_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqrshli_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +
  DEF_HELPER_FLAGS_4(mve_vshllbsb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
  DEF_HELPER_FLAGS_4(mve_vshllbsh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
  DEF_HELPER_FLAGS_4(mve_vshllbub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@
  &viwdup qd rn rm size imm
  &vcmp qm qn size mask
  &vcmp_scalar qn rm size mask
 +&shl_scalar qda rm size
  @vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
  # Note that both Rn and Qd are 3 bits only (no D bit)
@@ -XXX,XX +XXX,XX @@
  @2_shr_w .... .... .. 1 ..... .... .... .... .... &2shift qd=%qd qm=%qm \
           size=2 shift=%rshift_i5
 +@shl_scalar .... .... .... size:2 .. .... .... .... rm:4 &shl_scalar qda=%qd
 +
  # Vector comparison; 4-bit Qm but 3-bit Qn
  %mask_22_13      22:1 13:3
  @vcmp    .... .... .. size:2 qn:3 . .... .... .... .... &vcmp qm=%qm mask=%mask_22_13
@@ -XXX,XX +XXX,XX @@ VRMLSLDAVH       1111 1110 1 ... ... 0 ... x:1 1110 . 0 a:1 0 ... 1 @vmlaldav_no
  VADD_scalar      1110 1110 0 . .. ... 1 ... 0 1111 . 100 .... @2scalar
  VSUB_scalar      1110 1110 0 . .. ... 1 ... 1 1111 . 100 .... @2scalar
 -VMUL_scalar      1110 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
 +
 +{
 +  VSHL_S_scalar   1110 1110 0 . 11 .. 01 ... 1 1110 0110 .... @shl_scalar
 +  VRSHL_S_scalar  1110 1110 0 . 11 .. 11 ... 1 1110 0110 .... @shl_scalar
 +  VQSHL_S_scalar  1110 1110 0 . 11 .. 01 ... 1 1110 1110 .... @shl_scalar
 +  VQRSHL_S_scalar 1110 1110 0 . 11 .. 11 ... 1 1110 1110 .... @shl_scalar
 +  VMUL_scalar     1110 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
 +}
 +
 +{
 +  VSHL_U_scalar   1111 1110 0 . 11 .. 01 ... 1 1110 0110 .... @shl_scalar
 +  VRSHL_U_scalar  1111 1110 0 . 11 .. 11 ... 1 1110 0110 .... @shl_scalar
 +  VQSHL_U_scalar  1111 1110 0 . 11 .. 01 ... 1 1110 1110 .... @shl_scalar
 +  VQRSHL_U_scalar 1111 1110 0 . 11 .. 11 ... 1 1110 1110 .... @shl_scalar
 +  VBRSR           1111 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
 +}
 +
  VHADD_S_scalar   1110 1110 0 . .. ... 0 ... 0 1111 . 100 .... @2scalar
  VHADD_U_scalar   1111 1110 0 . .. ... 0 ... 0 1111 . 100 .... @2scalar
  VHSUB_S_scalar   1110 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
@@ -XXX,XX +XXX,XX @@ VHSUB_U_scalar   1111 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
                    size=%size_28
  }
 -VBRSR            1111 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
 -
  VQDMULH_scalar   1110 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
  VQRDMULH_scalar  1111 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2SHIFT_SAT_S(vqshli_s, DO_SQSHL_OP)
  DO_2SHIFT_SAT_S(vqshlui_s, DO_SUQSHL_OP)
  DO_2SHIFT_U(vrshli_u, DO_VRSHLU)
  DO_2SHIFT_S(vrshli_s, DO_VRSHLS)
 +DO_2SHIFT_SAT_U(vqrshli_u, DO_UQRSHL_OP)
 +DO_2SHIFT_SAT_S(vqrshli_s, DO_SQRSHL_OP)
  /* Shift-and-insert; we always work with 64 bits at a time */
  #define DO_2SHIFT_INSERT(OP, ESIZE, SHIFTFN, MASKFN)                    \
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_2SHIFT(VRSHRI_U, vrshli_u, true)
  DO_2SHIFT(VSRI, vsri, false)
  DO_2SHIFT(VSLI, vsli, false)
 +static bool do_2shift_scalar(DisasContext *s, arg_shl_scalar *a,
 +                             MVEGenTwoOpShiftFn *fn)
 +{
 +    TCGv_ptr qda;
 +    TCGv_i32 rm;
 +
 +    if (!dc_isar_feature(aa32_mve, s) ||
 +        !mve_check_qreg_bank(s, a->qda) ||
 +        a->rm == 13 || a->rm == 15 || !fn) {
 +        /* Rm cases are UNPREDICTABLE */
 +        return false;
 +    }
 +    if (!mve_eci_check(s) || !vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    qda = mve_qreg_ptr(a->qda);
 +    rm = load_reg(s, a->rm);
 +    fn(cpu_env, qda, qda, rm);
 +    tcg_temp_free_ptr(qda);
 +    tcg_temp_free_i32(rm);
 +    mve_update_eci(s);
 +    return true;
 +}
 +
 +#define DO_2SHIFT_SCALAR(INSN, FN)                                      \
 +    static bool trans_##INSN(DisasContext *s, arg_shl_scalar *a)        \
 +    {                                                                   \
 +        static MVEGenTwoOpShiftFn * const fns[] = {                     \
 +            gen_helper_mve_##FN##b,                                     \
 +            gen_helper_mve_##FN##h,                                     \
 +            gen_helper_mve_##FN##w,                                     \
 +            NULL,                                                       \
 +        };                                                              \
 +        return do_2shift_scalar(s, a, fns[a->size]);                    \
 +    }
 +
 +DO_2SHIFT_SCALAR(VSHL_S_scalar, vshli_s)
 +DO_2SHIFT_SCALAR(VSHL_U_scalar, vshli_u)
 +DO_2SHIFT_SCALAR(VRSHL_S_scalar, vrshli_s)
 +DO_2SHIFT_SCALAR(VRSHL_U_scalar, vrshli_u)
 +DO_2SHIFT_SCALAR(VQSHL_S_scalar, vqshli_s)
 +DO_2SHIFT_SCALAR(VQSHL_U_scalar, vqshli_u)
 +DO_2SHIFT_SCALAR(VQRSHL_S_scalar, vqrshli_s)
 +DO_2SHIFT_SCALAR(VQRSHL_U_scalar, vqrshli_u)
 +
  #define DO_VSHLL(INSN, FN)                                      \
      static bool trans_##INSN(DisasContext *s, arg_2shift *a)    \
      {                                                           \
 --
-.20.1
+.34.1
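
To make the CFG3 condition fix in the mps2-scc patch above concrete, the
two guards can be compared in isolation. This is only a sketch: the
function names are invented for the example, and the part number is
passed in directly instead of being read via scc_partno().

```c
#include <stdbool.h>

/*
 * Old guard: a single value can never equal both constants at once,
 * so this is false for every part number.
 */
static bool cfg3_reserved_old(int partno)
{
    return partno == 0x524 && partno == 0x547;
}

/* Fixed guard: CFG3 is treated as reserved on either board type. */
static bool cfg3_reserved_new(int partno)
{
    return partno == 0x524 || partno == 0x547;
}
```

With the old guard the "goto bad_offset" path was unreachable, so reads
of CFG3 were allowed on every board type.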

-[PULL 35/44] target/arm: Implement MVE interleaving loads/stores
+[PULL 28/35] hw/misc/mps2-scc: Factor out which-board conditionals
-Implement the MVE interleaving load/store functions VLD2, VLD4, VST2
+The MPS SCC device has a lot of different flavours for the various
-and VST4.  VLD2 loads 16 bytes of data from memory and writes to 2
+different MPS FPGA images, which look mostly similar but have
-consecutive Qregs; VLD4 loads 16 bytes of data from memory and writes
+differences in how particular registers are handled.  Currently we
-to 4 consecutive Qregs.  The 'pattern' field in the encoding
+deal with this with a lot of open-coded checks on scc_partno(), but
-determines the offset into memory which is accessed and also which
+as we add more board types this is getting a bit hard to read.
-elements in the Qregs are written to.  (The intention is that a
-sequence of four consecutive VLD4 with different pattern values
+Factor out the conditions into some functions which we can
-performs a complete de-interleaving load of 64 bytes into all
+give more descriptive names to.
 elements of the 4 Qregs.) VST2 and VST4 do the same, but for stores.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20240206132931.38376-7-peter.maydell@linaro.org
 ---
- target/arm/helper-mve.h    |  48 ++++++
+ hw/misc/mps2-scc.c | 45 +++++++++++++++++++++++++++++++--------------
- target/arm/mve.decode      |  11 ++
+file changed, 31 insertions(+), 14 deletions(-)
  target/arm/mve_helper.c    | 342 +++++++++++++++++++++++++++++++++++++
  target/arm/translate-mve.c |  94 ++++++++++
 files changed, 495 insertions(+)
-diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
+diff --git a/hw/misc/mps2-scc.c b/hw/misc/mps2-scc.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper-mve.h
+--- a/hw/misc/mps2-scc.c
-+++ b/target/arm/helper-mve.h
++++ b/hw/misc/mps2-scc.c
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vldrd_sg_wb_ud, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+@@ -XXX,XX +XXX,XX @@ static int scc_partno(MPS2SCC *s)
- DEF_HELPER_FLAGS_4(mve_vstrw_sg_wb_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+     return extract32(s->id, 4, 8);
  DEF_HELPER_FLAGS_4(mve_vstrd_sg_wb_ud, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_3(mve_vld20b, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vld20h, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vld20w, TCG_CALL_NO_WG, void, env, i32, i32)
 +
 +DEF_HELPER_FLAGS_3(mve_vld21b, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vld21h, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vld21w, TCG_CALL_NO_WG, void, env, i32, i32)
 +
 +DEF_HELPER_FLAGS_3(mve_vld40b, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vld40h, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vld40w, TCG_CALL_NO_WG, void, env, i32, i32)
 +
 +DEF_HELPER_FLAGS_3(mve_vld41b, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vld41h, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vld41w, TCG_CALL_NO_WG, void, env, i32, i32)
 +
 +DEF_HELPER_FLAGS_3(mve_vld42b, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vld42h, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vld42w, TCG_CALL_NO_WG, void, env, i32, i32)
 +
 +DEF_HELPER_FLAGS_3(mve_vld43b, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vld43h, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vld43w, TCG_CALL_NO_WG, void, env, i32, i32)
 +
 +DEF_HELPER_FLAGS_3(mve_vst20b, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vst20h, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vst20w, TCG_CALL_NO_WG, void, env, i32, i32)
 +
 +DEF_HELPER_FLAGS_3(mve_vst21b, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vst21h, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vst21w, TCG_CALL_NO_WG, void, env, i32, i32)
 +
 +DEF_HELPER_FLAGS_3(mve_vst40b, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vst40h, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vst40w, TCG_CALL_NO_WG, void, env, i32, i32)
 +
 +DEF_HELPER_FLAGS_3(mve_vst41b, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vst41h, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vst41w, TCG_CALL_NO_WG, void, env, i32, i32)
 +
 +DEF_HELPER_FLAGS_3(mve_vst42b, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vst42h, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vst42w, TCG_CALL_NO_WG, void, env, i32, i32)
 +
 +DEF_HELPER_FLAGS_3(mve_vst43b, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vst43h, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vst43w, TCG_CALL_NO_WG, void, env, i32, i32)
 +
  DEF_HELPER_FLAGS_3(mve_vdup, TCG_CALL_NO_WG, void, env, ptr, i32)
  DEF_HELPER_FLAGS_4(mve_vidupb, TCG_CALL_NO_WG, i32, env, ptr, i32, i32)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@
  &vabav qn qm rda size
  &vldst_sg qd qm rn size msize os
  &vldst_sg_imm qd qm a w imm
 +&vldst_il qd rn size pat w
  # scatter-gather memory size is in bits 6:4
  %sg_msize 6:1 4:1
@@ -XXX,XX +XXX,XX @@
  @vldst_sg_imm .... .... a:1 . w:1 . .... .... .... . imm:7 &vldst_sg_imm \
                qd=%qd qm=%qn
 +# Deinterleaving load/interleaving store
 +@vldst_il .... .... .. w:1 . rn:4 .... ... size:2 pat:2 ..... &vldst_il \
 +          qd=%qd
 +
  @1op .... .... .... size:2 .. .... .... .... .... &1op qd=%qd qm=%qm
  @1op_nosz .... .... .... .... .... .... .... .... &1op qd=%qd qm=%qm size=0
  @2op .... .... .. size:2 .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn
@@ -XXX,XX +XXX,XX @@ VLDRD_sg_imm     111 1 1101 ... 1 ... 0 ... 1 1111 .... .... @vldst_sg_imm
  VSTRW_sg_imm     111 1 1101 ... 0 ... 0 ... 1 1110 .... .... @vldst_sg_imm
  VSTRD_sg_imm     111 1 1101 ... 0 ... 0 ... 1 1111 .... .... @vldst_sg_imm
 +# deinterleaving loads/interleaving stores
 +VLD2             1111 1100 1 .. 1 .... ... 1 111 .. .. 00000 @vldst_il
 +VLD4             1111 1100 1 .. 1 .... ... 1 111 .. .. 00001 @vldst_il
 +VST2             1111 1100 1 .. 0 .... ... 1 111 .. .. 00000 @vldst_il
 +VST4             1111 1100 1 .. 0 .... ... 1 111 .. .. 00001 @vldst_il
 +
  # Moves between 2 32-bit vector lanes and 2 general purpose registers
  VMOV_to_2gp      1110 1100 0 . 00 rt2:4 ... 0 1111 000 idx:1 rt:4 qd=%qd
  VMOV_from_2gp    1110 1100 0 . 01 rt2:4 ... 0 1111 000 idx:1 rt:4 qd=%qd
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_VLDR64_SG(vldrd_sg_wb_ud, ADDR_ADD, true)
  DO_VSTR_SG(vstrw_sg_wb_uw, stl, 4, uint32_t, ADDR_ADD, true)
  DO_VSTR64_SG(vstrd_sg_wb_ud, ADDR_ADD, true)
 +/*
 + * Deinterleaving loads/interleaving stores.
 + *
 + * For these helpers we are passed the index of the first Qreg
 + * (VLD2/VST2 will also access Qn+1, VLD4/VST4 access Qn .. Qn+3)
 + * and the value of the base address register Rn.
 + * The helpers are specialized for pattern and element size, so
 + * for instance vld42h is VLD4 with pattern 2, element size MO_16.
 + *
 + * These insns are beatwise but not predicated, so we must honour ECI,
 + * but need not look at mve_element_mask().
 + *
 + * The pseudocode implements these insns with multiple memory accesses
 + * of the element size, but rules R_VVVG and R_FXDM permit us to make
 + * one 32-bit memory access per beat.
 + */
 +#define DO_VLD4B(OP, O1, O2, O3, O4)                                    \
 +    void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx,             \
 +                          uint32_t base)                                \
 +    {                                                                   \
 +        int beat, e;                                                    \
 +        uint16_t mask = mve_eci_mask(env);                              \
 +        static const uint8_t off[4] = { O1, O2, O3, O4 };               \
 +        uint32_t addr, data;                                            \
 +        for (beat = 0; beat < 4; beat++, mask >>= 4) {                  \
 +            if ((mask & 1) == 0) {                                      \
 +                /* ECI says skip this beat */                           \
 +                continue;                                               \
 +            }                                                           \
 +            addr = base + off[beat] * 4;                                \
 +            data = cpu_ldl_le_data_ra(env, addr, GETPC());              \
 +            for (e = 0; e < 4; e++, data >>= 8) {                       \
 +                uint8_t *qd = (uint8_t *)aa32_vfp_qreg(env, qnidx + e); \
 +                qd[H1(off[beat])] = data;                               \
 +            }                                                           \
 +        }                                                               \
 +    }
 +
 +#define DO_VLD4H(OP, O1, O2)                                            \
 +    void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx,             \
 +                          uint32_t base)                                \
 +    {                                                                   \
 +        int beat;                                                       \
 +        uint16_t mask = mve_eci_mask(env);                              \
 +        static const uint8_t off[4] = { O1, O1, O2, O2 };               \
 +        uint32_t addr, data;                                            \
 +        int y; /* y counts 0 2 0 2 */                                   \
 +        uint16_t *qd;                                                   \
 +        for (beat = 0, y = 0; beat < 4; beat++, mask >>= 4, y ^= 2) {   \
 +            if ((mask & 1) == 0) {                                      \
 +                /* ECI says skip this beat */                           \
 +                continue;                                               \
 +            }                                                           \
 +            addr = base + off[beat] * 8 + (beat & 1) * 4;               \
 +            data = cpu_ldl_le_data_ra(env, addr, GETPC());              \
 +            qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + y);             \
 +            qd[H2(off[beat])] = data;                                   \
 +            data >>= 16;                                                \
 +            qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + y + 1);         \
 +            qd[H2(off[beat])] = data;                                   \
 +        }                                                               \
 +    }
 +
 +#define DO_VLD4W(OP, O1, O2, O3, O4)                                    \
 +    void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx,             \
 +                          uint32_t base)                                \
 +    {                                                                   \
 +        int beat;                                                       \
 +        uint16_t mask = mve_eci_mask(env);                              \
 +        static const uint8_t off[4] = { O1, O2, O3, O4 };               \
 +        uint32_t addr, data;                                            \
 +        uint32_t *qd;                                                   \
 +        int y;                                                          \
 +        for (beat = 0; beat < 4; beat++, mask >>= 4) {                  \
 +            if ((mask & 1) == 0) {                                      \
 +                /* ECI says skip this beat */                           \
 +                continue;                                               \
 +            }                                                           \
 +            addr = base + off[beat] * 4;                                \
 +            data = cpu_ldl_le_data_ra(env, addr, GETPC());              \
 +            y = (beat + (O1 & 2)) & 3;                                  \
 +            qd = (uint32_t *)aa32_vfp_qreg(env, qnidx + y);             \
 +            qd[H4(off[beat] >> 2)] = data;                              \
 +        }                                                               \
 +    }
 +
 +DO_VLD4B(vld40b, 0, 1, 10, 11)
 +DO_VLD4B(vld41b, 2, 3, 12, 13)
 +DO_VLD4B(vld42b, 4, 5, 14, 15)
 +DO_VLD4B(vld43b, 6, 7, 8, 9)
 +
 +DO_VLD4H(vld40h, 0, 5)
 +DO_VLD4H(vld41h, 1, 6)
 +DO_VLD4H(vld42h, 2, 7)
 +DO_VLD4H(vld43h, 3, 4)
 +
 +DO_VLD4W(vld40w, 0, 1, 10, 11)
 +DO_VLD4W(vld41w, 2, 3, 12, 13)
 +DO_VLD4W(vld42w, 4, 5, 14, 15)
 +DO_VLD4W(vld43w, 6, 7, 8, 9)
 +
 +#define DO_VLD2B(OP, O1, O2, O3, O4)                                    \
 +    void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx,             \
 +                          uint32_t base)                                \
 +    {                                                                   \
 +        int beat, e;                                                    \
 +        uint16_t mask = mve_eci_mask(env);                              \
 +        static const uint8_t off[4] = { O1, O2, O3, O4 };               \
 +        uint32_t addr, data;                                            \
 +        uint8_t *qd;                                                    \
 +        for (beat = 0; beat < 4; beat++, mask >>= 4) {                  \
 +            if ((mask & 1) == 0) {                                      \
 +                /* ECI says skip this beat */                           \
 +                continue;                                               \
 +            }                                                           \
 +            addr = base + off[beat] * 2;                                \
 +            data = cpu_ldl_le_data_ra(env, addr, GETPC());              \
 +            for (e = 0; e < 4; e++, data >>= 8) {                       \
 +                qd = (uint8_t *)aa32_vfp_qreg(env, qnidx + (e & 1));    \
 +                qd[H1(off[beat] + (e >> 1))] = data;                    \
 +            }                                                           \
 +        }                                                               \
 +    }
 +
 +#define DO_VLD2H(OP, O1, O2, O3, O4)                                    \
 +    void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx,             \
 +                          uint32_t base)                                \
 +    {                                                                   \
 +        int beat;                                                       \
 +        uint16_t mask = mve_eci_mask(env);                              \
 +        static const uint8_t off[4] = { O1, O2, O3, O4 };               \
 +        uint32_t addr, data;                                            \
 +        int e;                                                          \
 +        uint16_t *qd;                                                   \
 +        for (beat = 0; beat < 4; beat++, mask >>= 4) {                  \
 +            if ((mask & 1) == 0) {                                      \
 +                /* ECI says skip this beat */                           \
 +                continue;                                               \
 +            }                                                           \
 +            addr = base + off[beat] * 4;                                \
 +            data = cpu_ldl_le_data_ra(env, addr, GETPC());              \
 +            for (e = 0; e < 2; e++, data >>= 16) {                      \
 +                qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + e);         \
 +                qd[H2(off[beat])] = data;                               \
 +            }                                                           \
 +        }                                                               \
 +    }
 +
 +#define DO_VLD2W(OP, O1, O2, O3, O4)                                    \
 +    void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx,             \
 +                          uint32_t base)                                \
 +    {                                                                   \
 +        int beat;                                                       \
 +        uint16_t mask = mve_eci_mask(env);                              \
 +        static const uint8_t off[4] = { O1, O2, O3, O4 };               \
 +        uint32_t addr, data;                                            \
 +        uint32_t *qd;                                                   \
 +        for (beat = 0; beat < 4; beat++, mask >>= 4) {                  \
 +            if ((mask & 1) == 0) {                                      \
 +                /* ECI says skip this beat */                           \
 +                continue;                                               \
 +            }                                                           \
 +            addr = base + off[beat];                                    \
 +            data = cpu_ldl_le_data_ra(env, addr, GETPC());              \
 +            qd = (uint32_t *)aa32_vfp_qreg(env, qnidx + (beat & 1));    \
 +            qd[H4(off[beat] >> 3)] = data;                              \
 +        }                                                               \
 +    }
 +
 +DO_VLD2B(vld20b, 0, 2, 12, 14)
 +DO_VLD2B(vld21b, 4, 6, 8, 10)
 +
 +DO_VLD2H(vld20h, 0, 1, 6, 7)
 +DO_VLD2H(vld21h, 2, 3, 4, 5)
 +
 +DO_VLD2W(vld20w, 0, 4, 24, 28)
 +DO_VLD2W(vld21w, 8, 12, 16, 20)
 +
 +#define DO_VST4B(OP, O1, O2, O3, O4)                                    \
 +    void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx,             \
 +                          uint32_t base)                                \
 +    {                                                                   \
 +        int beat, e;                                                    \
 +        uint16_t mask = mve_eci_mask(env);                              \
 +        static const uint8_t off[4] = { O1, O2, O3, O4 };               \
 +        uint32_t addr, data;                                            \
 +        for (beat = 0; beat < 4; beat++, mask >>= 4) {                  \
 +            if ((mask & 1) == 0) {                                      \
 +                /* ECI says skip this beat */                           \
 +                continue;                                               \
 +            }                                                           \
 +            addr = base + off[beat] * 4;                                \
 +            data = 0;                                                   \
 +            for (e = 3; e >= 0; e--) {                                  \
 +                uint8_t *qd = (uint8_t *)aa32_vfp_qreg(env, qnidx + e); \
 +                data = (data << 8) | qd[H1(off[beat])];                 \
 +            }                                                           \
 +            cpu_stl_le_data_ra(env, addr, data, GETPC());               \
 +        }                                                               \
 +    }
 +
 +#define DO_VST4H(OP, O1, O2)                                            \
 +    void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx,             \
 +                          uint32_t base)                                \
 +    {                                                                   \
 +        int beat;                                                       \
 +        uint16_t mask = mve_eci_mask(env);                              \
 +        static const uint8_t off[4] = { O1, O1, O2, O2 };               \
 +        uint32_t addr, data;                                            \
 +        int y; /* y counts 0 2 0 2 */                                   \
 +        uint16_t *qd;                                                   \
 +        for (beat = 0, y = 0; beat < 4; beat++, mask >>= 4, y ^= 2) {   \
 +            if ((mask & 1) == 0) {                                      \
 +                /* ECI says skip this beat */                           \
 +                continue;                                               \
 +            }                                                           \
 +            addr = base + off[beat] * 8 + (beat & 1) * 4;               \
 +            qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + y);             \
 +            data = qd[H2(off[beat])];                                   \
 +            qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + y + 1);         \
 +            data |= qd[H2(off[beat])] << 16;                            \
 +            cpu_stl_le_data_ra(env, addr, data, GETPC());               \
 +        }                                                               \
 +    }
 +
 +#define DO_VST4W(OP, O1, O2, O3, O4)                                    \
 +    void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx,             \
 +                          uint32_t base)                                \
 +    {                                                                   \
 +        int beat;                                                       \
 +        uint16_t mask = mve_eci_mask(env);                              \
 +        static const uint8_t off[4] = { O1, O2, O3, O4 };               \
 +        uint32_t addr, data;                                            \
 +        uint32_t *qd;                                                   \
 +        int y;                                                          \
 +        for (beat = 0; beat < 4; beat++, mask >>= 4) {                  \
 +            if ((mask & 1) == 0) {                                      \
 +                /* ECI says skip this beat */                           \
 +                continue;                                               \
 +            }                                                           \
 +            addr = base + off[beat] * 4;                                \
 +            y = (beat + (O1 & 2)) & 3;                                  \
 +            qd = (uint32_t *)aa32_vfp_qreg(env, qnidx + y);             \
 +            data = qd[H4(off[beat] >> 2)];                              \
 +            cpu_stl_le_data_ra(env, addr, data, GETPC());               \
 +        }                                                               \
 +    }
 +
 +DO_VST4B(vst40b, 0, 1, 10, 11)
 +DO_VST4B(vst41b, 2, 3, 12, 13)
 +DO_VST4B(vst42b, 4, 5, 14, 15)
 +DO_VST4B(vst43b, 6, 7, 8, 9)
 +
 +DO_VST4H(vst40h, 0, 5)
 +DO_VST4H(vst41h, 1, 6)
 +DO_VST4H(vst42h, 2, 7)
 +DO_VST4H(vst43h, 3, 4)
 +
 +DO_VST4W(vst40w, 0, 1, 10, 11)
 +DO_VST4W(vst41w, 2, 3, 12, 13)
 +DO_VST4W(vst42w, 4, 5, 14, 15)
 +DO_VST4W(vst43w, 6, 7, 8, 9)
 +
 +#define DO_VST2B(OP, O1, O2, O3, O4)                                    \
 +    void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx,             \
 +                          uint32_t base)                                \
 +    {                                                                   \
 +        int beat, e;                                                    \
 +        uint16_t mask = mve_eci_mask(env);                              \
 +        static const uint8_t off[4] = { O1, O2, O3, O4 };               \
 +        uint32_t addr, data;                                            \
 +        uint8_t *qd;                                                    \
 +        for (beat = 0; beat < 4; beat++, mask >>= 4) {                  \
 +            if ((mask & 1) == 0) {                                      \
 +                /* ECI says skip this beat */                           \
 +                continue;                                               \
 +            }                                                           \
 +            addr = base + off[beat] * 2;                                \
 +            data = 0;                                                   \
 +            for (e = 3; e >= 0; e--) {                                  \
 +                qd = (uint8_t *)aa32_vfp_qreg(env, qnidx + (e & 1));    \
 +                data = (data << 8) | qd[H1(off[beat] + (e >> 1))];      \
 +            }                                                           \
 +            cpu_stl_le_data_ra(env, addr, data, GETPC());               \
 +        }                                                               \
 +    }
 +
 +#define DO_VST2H(OP, O1, O2, O3, O4)                                    \
 +    void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx,             \
 +                          uint32_t base)                                \
 +    {                                                                   \
 +        int beat;                                                       \
 +        uint16_t mask = mve_eci_mask(env);                              \
 +        static const uint8_t off[4] = { O1, O2, O3, O4 };               \
 +        uint32_t addr, data;                                            \
 +        int e;                                                          \
 +        uint16_t *qd;                                                   \
 +        for (beat = 0; beat < 4; beat++, mask >>= 4) {                  \
 +            if ((mask & 1) == 0) {                                      \
 +                /* ECI says skip this beat */                           \
 +                continue;                                               \
 +            }                                                           \
 +            addr = base + off[beat] * 4;                                \
 +            data = 0;                                                   \
 +            for (e = 1; e >= 0; e--) {                                  \
 +                qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + e);         \
 +                data = (data << 16) | qd[H2(off[beat])];                \
 +            }                                                           \
 +            cpu_stl_le_data_ra(env, addr, data, GETPC());               \
 +        }                                                               \
 +    }
 +
 +#define DO_VST2W(OP, O1, O2, O3, O4)                                    \
 +    void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx,             \
 +                          uint32_t base)                                \
 +    {                                                                   \
 +        int beat;                                                       \
 +        uint16_t mask = mve_eci_mask(env);                              \
 +        static const uint8_t off[4] = { O1, O2, O3, O4 };               \
 +        uint32_t addr, data;                                            \
 +        uint32_t *qd;                                                   \
 +        for (beat = 0; beat < 4; beat++, mask >>= 4) {                  \
 +            if ((mask & 1) == 0) {                                      \
 +                /* ECI says skip this beat */                           \
 +                continue;                                               \
 +            }                                                           \
 +            addr = base + off[beat];                                    \
 +            qd = (uint32_t *)aa32_vfp_qreg(env, qnidx + (beat & 1));    \
 +            data = qd[H4(off[beat] >> 3)];                              \
 +            cpu_stl_le_data_ra(env, addr, data, GETPC());               \
 +        }                                                               \
 +    }
 +
 +DO_VST2B(vst20b, 0, 2, 12, 14)
 +DO_VST2B(vst21b, 4, 6, 8, 10)
 +
 +DO_VST2H(vst20h, 0, 1, 6, 7)
 +DO_VST2H(vst21h, 2, 3, 4, 5)
 +
 +DO_VST2W(vst20w, 0, 4, 24, 28)
 +DO_VST2W(vst21w, 8, 12, 16, 20)
 +
  /*
   * The mergemask(D, R, M) macro performs the operation "*D = R" but
   * storing only the bytes which correspond to 1 bits in M,
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ static inline int vidup_imm(DisasContext *s, int x)
  typedef void MVEGenLdStFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
  typedef void MVEGenLdStSGFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
 +typedef void MVEGenLdStIlFn(TCGv_ptr, TCGv_i32, TCGv_i32);
  typedef void MVEGenOneOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
  typedef void MVEGenTwoOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr);
  typedef void MVEGenTwoOpScalarFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
@@ -XXX,XX +XXX,XX @@ static bool trans_VSTRD_sg_imm(DisasContext *s, arg_vldst_sg_imm *a)
      return do_ldst_sg_imm(s, a, fns[a->w], MO_64);
  }
-+static bool do_vldst_il(DisasContext *s, arg_vldst_il *a, MVEGenLdStIlFn *fn,
++/* Is CFG_REG2 present? */
-+                        int addrinc)
++static bool have_cfg2(MPS2SCC *s)
 +{
-+    TCGv_i32 rn;
++    return scc_partno(s) == 0x524 || scc_partno(s) == 0x547;
 +
 +    if (!dc_isar_feature(aa32_mve, s) ||
 +        !mve_check_qreg_bank(s, a->qd) ||
 +        !fn || (a->rn == 13 && a->w) || a->rn == 15) {
 +        /* Variously UNPREDICTABLE or UNDEF or related-encoding */
 +        return false;
 +    }
 +    if (!mve_eci_check(s) || !vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    rn = load_reg(s, a->rn);
 +    /*
 +     * We pass the index of Qd, not a pointer, because the helper must
 +     * access multiple Q registers starting at Qd and working up.
 +     */
 +    fn(cpu_env, tcg_constant_i32(a->qd), rn);
 +
 +    if (a->w) {
 +        tcg_gen_addi_i32(rn, rn, addrinc);
 +        store_reg(s, a->rn, rn);
 +    } else {
 +        tcg_temp_free_i32(rn);
 +    }
 +    mve_update_and_store_eci(s);
 +    return true;
 +}
 +
-+/* This macro is just to make the arrays more compact in these functions */
++/* Is CFG_REG3 present? */
-+#define F(N) gen_helper_mve_##N
++static bool have_cfg3(MPS2SCC *s)
 +
 +static bool trans_VLD2(DisasContext *s, arg_vldst_il *a)
 +{
-+    static MVEGenLdStIlFn * const fns[4][4] = {
++    return scc_partno(s) != 0x524 && scc_partno(s) != 0x547;
 +        { F(vld20b), F(vld20h), F(vld20w), NULL, },
 +        { F(vld21b), F(vld21h), F(vld21w), NULL, },
 +        { NULL, NULL, NULL, NULL },
 +        { NULL, NULL, NULL, NULL },
 +    };
 +    if (a->qd > 6) {
 +        return false;
 +    }
 +    return do_vldst_il(s, a, fns[a->pat][a->size], 32);
 +}
 +
-+static bool trans_VLD4(DisasContext *s, arg_vldst_il *a)
++/* Is CFG_REG5 present? */
 +static bool have_cfg5(MPS2SCC *s)
 +{
-+    static MVEGenLdStIlFn * const fns[4][4] = {
++    return scc_partno(s) == 0x524 || scc_partno(s) == 0x547;
 +        { F(vld40b), F(vld40h), F(vld40w), NULL, },
 +        { F(vld41b), F(vld41h), F(vld41w), NULL, },
 +        { F(vld42b), F(vld42h), F(vld42w), NULL, },
 +        { F(vld43b), F(vld43h), F(vld43w), NULL, },
 +    };
 +    if (a->qd > 4) {
 +        return false;
 +    }
 +    return do_vldst_il(s, a, fns[a->pat][a->size], 64);
 +}
 +
-+static bool trans_VST2(DisasContext *s, arg_vldst_il *a)
++/* Is CFG_REG6 present? */
 +static bool have_cfg6(MPS2SCC *s)
 +{
-+    static MVEGenLdStIlFn * const fns[4][4] = {
++    return scc_partno(s) == 0x524;
 +        { F(vst20b), F(vst20h), F(vst20w), NULL, },
 +        { F(vst21b), F(vst21h), F(vst21w), NULL, },
 +        { NULL, NULL, NULL, NULL },
 +        { NULL, NULL, NULL, NULL },
 +    };
 +    if (a->qd > 6) {
 +        return false;
 +    }
 +    return do_vldst_il(s, a, fns[a->pat][a->size], 32);
 +}
 +
-+static bool trans_VST4(DisasContext *s, arg_vldst_il *a)
+ /* Handle a write via the SYS_CFG channel to the specified function/device.
-+{
+  * Return false on error (reported to guest via SYS_CFGCTRL ERROR bit).
-+    static MVEGenLdStIlFn * const fns[4][4] = {
+  */
-+        { F(vst40b), F(vst40h), F(vst40w), NULL, },
+@@ -XXX,XX +XXX,XX @@ static uint64_t mps2_scc_read(void *opaque, hwaddr offset, unsigned size)
-+        { F(vst41b), F(vst41h), F(vst41w), NULL, },
+         r = s->cfg1;
-+        { F(vst42b), F(vst42h), F(vst42w), NULL, },
+         break;
-+        { F(vst43b), F(vst43h), F(vst43w), NULL, },
+     case A_CFG2:
-+    };
+-        if (scc_partno(s) != 0x524 && scc_partno(s) != 0x547) {
-+    if (a->qd > 4) {
+-            /* CFG2 reserved on other boards */
-+        return false;
++        if (!have_cfg2(s)) {
-+    }
+             goto bad_offset;
-+    return do_vldst_il(s, a, fns[a->pat][a->size], 64);
+         }
-+}
+         r = s->cfg2;
-+
+         break;
-+#undef F
+     case A_CFG3:
-+
+-        if (scc_partno(s) == 0x524 || scc_partno(s) == 0x547) {
- static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
+-            /* CFG3 reserved on AN524 */
- {
++        if (!have_cfg3(s)) {
-     TCGv_ptr qd;
+             goto bad_offset;
          }
          /* These are user-settable DIP switches on the board. We don't
@@ -XXX,XX +XXX,XX @@ static uint64_t mps2_scc_read(void *opaque, hwaddr offset, unsigned size)
          r = s->cfg4;
          break;
      case A_CFG5:
 -        if (scc_partno(s) != 0x524 && scc_partno(s) != 0x547) {
 -            /* CFG5 reserved on other boards */
 +        if (!have_cfg5(s)) {
              goto bad_offset;
          }
          r = s->cfg5;
          break;
      case A_CFG6:
 -        if (scc_partno(s) != 0x524) {
 -            /* CFG6 reserved on other boards */
 +        if (!have_cfg6(s)) {
              goto bad_offset;
          }
          r = s->cfg6;
@@ -XXX,XX +XXX,XX @@ static void mps2_scc_write(void *opaque, hwaddr offset, uint64_t value,
          }
          break;
      case A_CFG2:
 -        if (scc_partno(s) != 0x524 && scc_partno(s) != 0x547) {
 -            /* CFG2 reserved on other boards */
 +        if (!have_cfg2(s)) {
              goto bad_offset;
          }
          /* AN524: QSPI Select signal */
          s->cfg2 = value;
          break;
      case A_CFG5:
 -        if (scc_partno(s) != 0x524 && scc_partno(s) != 0x547) {
 -            /* CFG5 reserved on other boards */
 +        if (!have_cfg5(s)) {
              goto bad_offset;
          }
          /* AN524: ACLK frequency in Hz */
          s->cfg5 = value;
          break;
      case A_CFG6:
 -        if (scc_partno(s) != 0x524) {
 -            /* CFG6 reserved on other boards */
 +        if (!have_cfg6(s)) {
              goto bad_offset;
          }
          /* AN524: Clock divider for BRAM */
 --
-2.20.1
+2.34.1

-[PULL 02/44] target/arm: Print MVE VPR in CPU dumps
+[PULL 29/35] hw/misc/mps2-scc: Make changes needed for AN536 FPGA image
-Include the MVE VPR register value in the CPU dumps produced by
+The MPS2 SCC device is broadly the same for all FPGA images, but has
-arm_cpu_dump_state() if we are printing FPU information. This
+minor differences in the behaviour of the CFG registers depending on
-makes it easier to interpret debug logs when predication is
+the image. In many cases we don't really care about the functionality
-active.
+controlled by these registers and a reads-as-written or similar
 behaviour is sufficient for the moment.
 For the AN536 the required behaviour is:
  * A_CFG0 has CPU reset and halt bits
     - implement as reads-as-written for the moment
  * A_CFG1 has flash or ATCM address 0 remap handling
     - QEMU doesn't model this; implement as reads-as-written
  * A_CFG2 has QSPI select (like AN524)
     - implemented (no behaviour, as with AN524)
  * A_CFG3 is MCC_MSB_ADDR "additional MCC addressing bits"
     - QEMU doesn't care about these, so use the existing
       RAZ behaviour for convenience
  * A_CFG4 is board rev (like all other images)
     - no change needed
  * A_CFG5 is ACLK frequency in Hz (like AN524)
     - implemented as reads-as-written, as for other boards
  * A_CFG6 is core 0 vector table base address
     - implemented as reads-as-written for the moment
  * A_CFG7 is core 1 vector table base address
     - implemented as reads-as-written for the moment
 Make the changes necessary for this; leave TODO comments where
 appropriate to indicate where we might want to come back and
 implement things like CPU reset.
 The other aspects of the device specific to this FPGA image (like the
 values of the board ID and similar registers) will be set via the
 device's qdev properties.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
+Message-id: 20240206132931.38376-8-peter.maydell@linaro.org
 ---
- target/arm/cpu.c | 3 +++
+ include/hw/misc/mps2-scc.h |   1 +
- 1 file changed, 3 insertions(+)
+ hw/misc/mps2-scc.c         | 101 +++++++++++++++++++++++++++++++++----
+ 2 files changed, 92 insertions(+), 10 deletions(-)
-diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 diff --git a/include/hw/misc/mps2-scc.h b/include/hw/misc/mps2-scc.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.c
+--- a/include/hw/misc/mps2-scc.h
-+++ b/target/arm/cpu.c
++++ b/include/hw/misc/mps2-scc.h
-@@ -XXX,XX +XXX,XX @@ static void arm_cpu_dump_state(CPUState *cs, FILE *f, int flags)
+@@ -XXX,XX +XXX,XX @@ struct MPS2SCC {
-                          i, v);
+     uint32_t cfg4;
-         }
+     uint32_t cfg5;
-         qemu_fprintf(f, "FPSCR: %08x\n", vfp_get_fpscr(env));
+     uint32_t cfg6;
-+        if (cpu_isar_feature(aa32_mve, cpu)) {
++    uint32_t cfg7;
-+            qemu_fprintf(f, "VPR: %08x\n", env->v7m.vpr);
+     uint32_t cfgdata_rtn;
      uint32_t cfgdata_out;
      uint32_t cfgctrl;
 diff --git a/hw/misc/mps2-scc.c b/hw/misc/mps2-scc.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/misc/mps2-scc.c
 +++ b/hw/misc/mps2-scc.c
@@ -XXX,XX +XXX,XX @@ REG32(CFG3, 0xc)
  REG32(CFG4, 0x10)
  REG32(CFG5, 0x14)
  REG32(CFG6, 0x18)
 +REG32(CFG7, 0x1c)
  REG32(CFGDATA_RTN, 0xa0)
  REG32(CFGDATA_OUT, 0xa4)
  REG32(CFGCTRL, 0xa8)
@@ -XXX,XX +XXX,XX @@ static int scc_partno(MPS2SCC *s)
  /* Is CFG_REG2 present? */
  static bool have_cfg2(MPS2SCC *s)
  {
 -    return scc_partno(s) == 0x524 || scc_partno(s) == 0x547;
 +    return scc_partno(s) == 0x524 || scc_partno(s) == 0x547 ||
 +        scc_partno(s) == 0x536;
  }
  /* Is CFG_REG3 present? */
  static bool have_cfg3(MPS2SCC *s)
  {
 -    return scc_partno(s) != 0x524 && scc_partno(s) != 0x547;
 +    return scc_partno(s) != 0x524 && scc_partno(s) != 0x547 &&
 +        scc_partno(s) != 0x536;
  }
  /* Is CFG_REG5 present? */
  static bool have_cfg5(MPS2SCC *s)
  {
 -    return scc_partno(s) == 0x524 || scc_partno(s) == 0x547;
 +    return scc_partno(s) == 0x524 || scc_partno(s) == 0x547 ||
 +        scc_partno(s) == 0x536;
  }
  /* Is CFG_REG6 present? */
  static bool have_cfg6(MPS2SCC *s)
  {
 -    return scc_partno(s) == 0x524;
 +    return scc_partno(s) == 0x524 || scc_partno(s) == 0x536;
 +}
 +
 +/* Is CFG_REG7 present? */
 +static bool have_cfg7(MPS2SCC *s)
 +{
 +    return scc_partno(s) == 0x536;
 +}
 +
 +/* Does CFG_REG0 drive the 'remap' GPIO output? */
 +static bool cfg0_is_remap(MPS2SCC *s)
 +{
 +    return scc_partno(s) != 0x536;
 +}
 +
 +/* Is CFG_REG1 driving a set of LEDs? */
 +static bool cfg1_is_leds(MPS2SCC *s)
 +{
 +    return scc_partno(s) != 0x536;
  }
  /* Handle a write via the SYS_CFG channel to the specified function/device.
@@ -XXX,XX +XXX,XX @@ static uint64_t mps2_scc_read(void *opaque, hwaddr offset, unsigned size)
          if (!have_cfg3(s)) {
              goto bad_offset;
          }
 -        /* These are user-settable DIP switches on the board. We don't
 +        /*
 +         * These are user-settable DIP switches on the board. We don't
           * model that, so just return zeroes.
 +         *
 +         * TODO: for AN536 this is MCC_MSB_ADDR "additional MCC addressing
 +         * bits". These change which part of the DDR4 the motherboard
 +         * configuration controller can see in its memory map (see the
 +         * appnote section 2.4). QEMU doesn't model the MCC at all, so these
 +         * bits are not interesting to us; read-as-zero is as good as anything
 +         * else.
           */
          r = 0;
          break;
@@ -XXX,XX +XXX,XX @@ static uint64_t mps2_scc_read(void *opaque, hwaddr offset, unsigned size)
          }
          r = s->cfg6;
          break;
 +    case A_CFG7:
 +        if (!have_cfg7(s)) {
 +            goto bad_offset;
 +        }
++        r = s->cfg7;
++        break;
+     case A_CFGDATA_RTN:
+         r = s->cfgdata_rtn;
+         break;
+@@ -XXX,XX +XXX,XX @@ static void mps2_scc_write(void *opaque, hwaddr offset, uint64_t value,
+          * we always reflect bit 0 in the 'remap' GPIO output line,
+          * and let the board wire it up or not as it chooses.
+          * TODO on some boards bit 1 is CPU_WAIT.
++         *
++         * TODO: on the AN536 this register controls reset and halt
++         * for both CPUs. For the moment we don't implement this, so the
++         * register just reads as written.
+          */
+         s->cfg0 = value;
+-        qemu_set_irq(s->remap, s->cfg0 & 1);
++        if (cfg0_is_remap(s)) {
++            qemu_set_irq(s->remap, s->cfg0 & 1);
++        }
+         break;
+     case A_CFG1:
+         s->cfg1 = value;
+-        for (size_t i = 0; i < ARRAY_SIZE(s->led); i++) {
+-            led_set_state(s->led[i], extract32(value, i, 1));
++        /*
++         * On most boards this register drives LEDs.
++         *
++         * TODO: for AN536 this controls whether flash and ATCM are
++         * enabled or disabled on reset. QEMU doesn't model this, and
++         * always wires up RAM in the ATCM area and ROM in the flash area.
++         */
++        if (cfg1_is_leds(s)) {
++            for (size_t i = 0; i < ARRAY_SIZE(s->led); i++) {
++                led_set_state(s->led[i], extract32(value, i, 1));
++            }
+         }
+         break;
+     case A_CFG2:
+         if (!have_cfg2(s)) {
+             goto bad_offset;
+         }
+-        /* AN524: QSPI Select signal */
++        /* AN524, AN536: QSPI Select signal */
+         s->cfg2 = value;
+         break;
+     case A_CFG5:
+         if (!have_cfg5(s)) {
+             goto bad_offset;
+         }
+-        /* AN524: ACLK frequency in Hz */
++        /* AN524, AN536: ACLK frequency in Hz */
+         s->cfg5 = value;
+         break;
+     case A_CFG6:
+@@ -XXX,XX +XXX,XX @@ static void mps2_scc_write(void *opaque, hwaddr offset, uint64_t value,
+             goto bad_offset;
+         }
+         /* AN524: Clock divider for BRAM */
++        /* AN536: Core 0 vector table base address */
++        s->cfg6 = value;
++        break;
++    case A_CFG7:
++        if (!have_cfg7(s)) {
++            goto bad_offset;
++        }
++        /* AN536: Core 1 vector table base address */
+-        s->cfg6 = value;
++        s->cfg7 = value;
+         break;
+     case A_CFGDATA_OUT:
+@@ -XXX,XX +XXX,XX @@ static void mps2_scc_finalize(Object *obj)
+     g_free(s->oscclk_reset);
+ }
++static bool cfg7_needed(void *opaque)
++{
++    MPS2SCC *s = opaque;
++
++    return have_cfg7(s);
++}
++
++static const VMStateDescription vmstate_cfg7 = {
++    .name = "mps2-scc/cfg7",
++    .version_id = 1,
++    .minimum_version_id = 1,
++    .needed = cfg7_needed,
++    .fields = (const VMStateField[]) {
++        VMSTATE_UINT32(cfg7, MPS2SCC),
++        VMSTATE_END_OF_LIST()
++    }
++};
++
+ static const VMStateDescription mps2_scc_vmstate = {
+     .name = "mps2-scc",
+     .version_id = 3,
+@@ -XXX,XX +XXX,XX @@ static const VMStateDescription mps2_scc_vmstate = {
+         VMSTATE_VARRAY_UINT32(oscclk, MPS2SCC, num_oscclk,
+                               0, vmstate_info_uint32, uint32_t),
+         VMSTATE_END_OF_LIST()
++    },
++    .subsections = (const VMStateDescription * const []) {
++        &vmstate_cfg7,
++        NULL
      }
- }
+ };
 --
-2.20.1
+2.34.1
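
The per-image register presence handled by the have_cfgN() helpers in the patch above can be summarized as one bitmask per part number. This is a minimal standalone sketch, not QEMU code: CFG_BIT, cfg_present_mask and cfg_is_present are hypothetical names, and the table mirrors the predicates in the diff (where have_cfg3() returns false for 0x536) rather than the register list in the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: bit n stands for CFG register n. */
#define CFG_BIT(n) (1u << (n))

/* Bitmask of CFG registers treated as present for an SCC part number. */
static unsigned cfg_present_mask(int partno)
{
    /* CFG0, CFG1 and CFG4 are assumed common to every image. */
    unsigned mask = CFG_BIT(0) | CFG_BIT(1) | CFG_BIT(4);

    switch (partno) {
    case 0x536:
        mask |= CFG_BIT(2) | CFG_BIT(5) | CFG_BIT(6) | CFG_BIT(7);
        break;
    case 0x524:
        mask |= CFG_BIT(2) | CFG_BIT(5) | CFG_BIT(6);
        break;
    case 0x547:
        mask |= CFG_BIT(2) | CFG_BIT(5);
        break;
    default:
        /* Every other image: only CFG3 from the optional set. */
        mask |= CFG_BIT(3);
        break;
    }
    return mask;
}

/* True if CFG register n (0..7) is present for this part number. */
static bool cfg_is_present(int partno, unsigned n)
{
    assert(n < 8);
    return (cfg_present_mask(partno) & CFG_BIT(n)) != 0;
}
```

A table like this keeps the read and write paths from drifting apart, at the cost of hiding the per-register comments that the individual helper functions carry.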

-[PULL 03/44] target/arm: Fix MVE VSLI by 0 and VSRI by <dt>
+Deleted patch
-In the MVE shift-and-insert insns, we special case VSLI by 0
-and VSRI by <dt>. VSRI by <dt> means "don't update the destination",
-which is what we've implemented. However VSLI by 0 is "set
-destination to the input", so we don't want to use the same
-special-casing that we do for VSRI by <dt>.
-Since the generic logic gives the right answer for a shift
-by 0, just use that.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
----
- target/arm/mve_helper.c | 9 +++++----
- 1 file changed, 5 insertions(+), 4 deletions(-)
-diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/mve_helper.c
-+++ b/target/arm/mve_helper.c
-@@ -XXX,XX +XXX,XX @@ DO_2SHIFT_S(vrshli_s, DO_VRSHLS)
-         uint16_t mask;                                                  \
-         uint64_t shiftmask;                                             \
-         unsigned e;                                                     \
--        if (shift == 0 || shift == ESIZE * 8) {                         \
-+        if (shift == ESIZE * 8) {                                       \
-             /*                                                          \
--             * Only VSLI can shift by 0; only VSRI can shift by <dt>.   \
--             * The generic logic would give the right answer for 0 but  \
--             * fails for <dt>.                                          \
-+             * Only VSRI can shift by <dt>; it should mean "don't       \
-+             * update the destination". The generic logic can't handle  \
-+             * this because it would try to shift by an out-of-range    \
-+             * amount, so special case it here.                         \
-              */                                                         \
-             goto done;                                                  \
-         }                                                               \
---
-2.20.1

-[PULL 04/44] target/arm: Fix signed VADDV
+Deleted patch
-A cut-and-paste error meant we handled signed VADDV like
-unsigned VADDV; fix the type used.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
----
- target/arm/mve_helper.c | 6 +++---
- 1 file changed, 3 insertions(+), 3 deletions(-)
-diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/mve_helper.c
-+++ b/target/arm/mve_helper.c
-@@ -XXX,XX +XXX,XX @@ DO_LDAVH(vrmlsldavhxsw, int32_t, int64_t, true, true)
-         return ra;                                              \
-     }                                                           \
--DO_VADDV(vaddvsb, 1, uint8_t)
--DO_VADDV(vaddvsh, 2, uint16_t)
--DO_VADDV(vaddvsw, 4, uint32_t)
-+DO_VADDV(vaddvsb, 1, int8_t)
-+DO_VADDV(vaddvsh, 2, int16_t)
-+DO_VADDV(vaddvsw, 4, int32_t)
- DO_VADDV(vaddvub, 1, uint8_t)
- DO_VADDV(vaddvuh, 2, uint16_t)
- DO_VADDV(vaddvuw, 4, uint32_t)
---
-2.20.1
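
Editor's illustration (not part of either series): the effect of the element type in the VADDV fix above can be seen in a small standalone sketch. The function names addv_s8() and addv_u8() are hypothetical stand-ins for what the DO_VADDV macro expands to for byte elements; predication is left out for brevity.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the signed byte variant: each lane is
 * reinterpreted as int8_t, so 0xff contributes -1 to the 32-bit sum.
 * (The uint8_t -> int8_t conversion is assumed to wrap, as it does on
 * the two's-complement hosts QEMU supports.)
 */
static uint32_t addv_s8(const uint8_t *lanes, unsigned n, uint32_t ra)
{
    for (unsigned i = 0; i < n; i++) {
        ra += (uint32_t)(int32_t)(int8_t)lanes[i];
    }
    return ra;
}

/* Hypothetical unsigned variant: 0xff contributes 255 to the sum. */
static uint32_t addv_u8(const uint8_t *lanes, unsigned n, uint32_t ra)
{
    for (unsigned i = 0; i < n; i++) {
        ra += lanes[i];
    }
    return ra;
}
```

With lanes { 0xff, 0x01 } and ra = 0 the signed sum is 0 and the unsigned sum is 256, which is the difference the cut-and-paste error hid.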

-[PULL 06/44] target/arm: Fix 48-bit saturating shifts
+Deleted patch
-In do_sqrshl48_d() and do_uqrshl48_d() we got some of the edge
-cases wrong and failed to saturate correctly:
-(1) In do_sqrshl48_d() we used the same code that do_sqrshl_bhs()
-does to obtain the saturated most-negative and most-positive 48-bit
-signed values for the large-shift-left case.  This gives (1 << 47)
-for saturate-to-most-negative, but we weren't sign-extending this
-value to the 64-bit output as the pseudocode requires.
-(2) For left shifts by less than 48, we copied the "8/16 bit" code
-from do_sqrshl_bhs() and do_uqrshl_bhs().  This doesn't do the right
-thing because it assumes the C type we're working with is at least
-twice the number of bits we're saturating to (so that a shift left by
-bits-1 can't shift anything off the top of the value).  This isn't
-true for bits == 48, so we would incorrectly return 0 rather than the
-most-positive value for situations like "shift (1 << 44) right by
-".  Instead check for saturation by doing the shift and signextend
-and then testing whether shifting back left again gives the original
-value.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
----
- target/arm/mve_helper.c | 12 +++++-------
- 1 file changed, 5 insertions(+), 7 deletions(-)
-diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/mve_helper.c
-+++ b/target/arm/mve_helper.c
-@@ -XXX,XX +XXX,XX @@ static inline int64_t do_sqrshl48_d(int64_t src, int64_t shift,
-         }
-         return src >> -shift;
-     } else if (shift < 48) {
--        int64_t val = src << shift;
--        int64_t extval = sextract64(val, 0, 48);
--        if (!sat || val == extval) {
-+        int64_t extval = sextract64(src << shift, 0, 48);
-+        if (!sat || src == (extval >> shift)) {
-             return extval;
-         }
-     } else if (!sat || src == 0) {
-@@ -XXX,XX +XXX,XX @@ static inline int64_t do_sqrshl48_d(int64_t src, int64_t shift,
-     }
-     *sat = 1;
--    return (1ULL << 47) - (src >= 0);
-+    return src >= 0 ? MAKE_64BIT_MASK(0, 47) : MAKE_64BIT_MASK(47, 17);
- }
- /* Operate on 64-bit values, but saturate at 48 bits */
-@@ -XXX,XX +XXX,XX @@ static inline uint64_t do_uqrshl48_d(uint64_t src, int64_t shift,
-             return extval;
-         }
-     } else if (shift < 48) {
--        uint64_t val = src << shift;
--        uint64_t extval = extract64(val, 0, 48);
--        if (!sat || val == extval) {
-+        uint64_t extval = extract64(src << shift, 0, 48);
-+        if (!sat || src == (extval >> shift)) {
-             return extval;
-         }
-     } else if (!sat || src == 0) {
---
-2.20.1
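
Editor's illustration (not part of either series): a minimal sketch of the "shift, sign-extend, shift back and compare" saturation check described in the commit message above, for left shifts in the range 0..47 only. The names sext48() and sat_shl48() are hypothetical; sext48() stands in for QEMU's sextract64(v, 0, 48). The sketch assumes that right-shifting a negative int64_t is an arithmetic shift, which C leaves implementation-defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend the low 48 bits of v to 64 bits (assumes arithmetic >>). */
static int64_t sext48(uint64_t v)
{
    return (int64_t)(v << 16) >> 16;
}

/*
 * Hypothetical helper: shift src left by 'shift' bits and saturate to the
 * signed 48-bit range. Saturation is detected by checking that shifting
 * the sign-extended result back right recovers the input. This still
 * works when bits fall off the top of the 64-bit value, which comparing
 * the shifted value with its own sign-extension does not.
 */
static int64_t sat_shl48(int64_t src, unsigned shift, bool *saturated)
{
    assert(shift < 48);
    int64_t extval = sext48((uint64_t)src << shift);
    if (src == (extval >> shift)) {
        return extval;
    }
    *saturated = true;
    /* Most-positive and (sign-extended) most-negative 48-bit values. */
    return src >= 0 ? INT64_C(0x00007fffffffffff)
                    : -INT64_C(0x0000800000000000);
}
```

The shift counts used to exercise this sketch (for example shifting 1 << 44 left by 20, which pushes the set bit past bit 63) are chosen for illustration and are not taken from the commit message.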

-[PULL 07/44] target/arm: Fix MVE 48-bit SQRSHRL for small right shifts
+Deleted patch
-We got an edge case wrong in the 48-bit SQRSHRL implementation: if
-the shift is to the right, although it always makes the result
-smaller than the input value it might not be within the 48-bit range
-the result is supposed to be if the input had some bits in [63..48]
-set and the shift didn't bring all of those within the [47..0] range.
-Handle this similarly to the way we already do for this case in
-do_uqrshl48_d(): extend the calculated result from 48 bits,
-and return that if not saturating or if it doesn't change the
-result; otherwise fall through to return a saturated value.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
----
- target/arm/mve_helper.c | 11 +++++++++--
- 1 file changed, 9 insertions(+), 2 deletions(-)
-diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/mve_helper.c
-+++ b/target/arm/mve_helper.c
-@@ -XXX,XX +XXX,XX @@ uint64_t HELPER(mve_uqrshll)(CPUARMState *env, uint64_t n, uint32_t shift)
- static inline int64_t do_sqrshl48_d(int64_t src, int64_t shift,
-                                     bool round, uint32_t *sat)
- {
-+    int64_t val, extval;
-+
-     if (shift <= -48) {
-         /* Rounding the sign bit always produces 0. */
-         if (round) {
-@@ -XXX,XX +XXX,XX @@ static inline int64_t do_sqrshl48_d(int64_t src, int64_t shift,
-     } else if (shift < 0) {
-         if (round) {
-             src >>= -shift - 1;
--            return (src >> 1) + (src & 1);
-+            val = (src >> 1) + (src & 1);
-+        } else {
-+            val = src >> -shift;
-+        }
-+        extval = sextract64(val, 0, 48);
-+        if (!sat || val == extval) {
-+            return extval;
-         }
--        return src >> -shift;
-     } else if (shift < 48) {
-         int64_t extval = sextract64(src << shift, 0, 48);
-         if (!sat || src == (extval >> shift)) {
---
-2.20.1

-[PULL 08/44] target/arm: Fix calculation of LTP mask when LR is 0
+Deleted patch
-In mve_element_mask(), we calculate a mask for tail predication which
-should have a number of 1 bits based on the value of LR.  However,
-our MAKE_64BIT_MASK() macro has undefined behaviour when passed a
-zero length.  Special case this to give the all-zeroes mask we
-require.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
----
- target/arm/mve_helper.c | 3 ++-
- 1 file changed, 2 insertions(+), 1 deletion(-)
-diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/mve_helper.c
-+++ b/target/arm/mve_helper.c
-@@ -XXX,XX +XXX,XX @@ static uint16_t mve_element_mask(CPUARMState *env)
-          */
-         int masklen = env->regs[14] << env->v7m.ltpsize;
-         assert(masklen <= 16);
--        mask &= MAKE_64BIT_MASK(0, masklen);
-+        uint16_t ltpmask = masklen ? MAKE_64BIT_MASK(0, masklen) : 0;
-+        mask &= ltpmask;
-     }
-     if ((env->condexec_bits & 0xf) == 0) {
---
-2.20.1
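
Editor's illustration (not part of either series): why a zero length needs special-casing. The macro MASK64() below is a local stand-in written in the same shape as QEMU's MAKE_64BIT_MASK(); that shape is an assumption here. For length == 0 it would shift a 64-bit value right by 64, which is undefined behaviour in C, so the hypothetical helper ltp_mask() returns the all-zeroes mask explicitly.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in mask macro: 'length' one-bits starting at bit 'shift'.
 * Undefined for length == 0, because that shifts right by 64. */
#define MASK64(shift, length) (((~0ULL) >> (64 - (length))) << (shift))

/* Hypothetical helper mirroring the fixed tail-predication mask logic. */
static uint16_t ltp_mask(unsigned masklen)
{
    assert(masklen <= 16);
    return masklen ? (uint16_t)MASK64(0, masklen) : 0;
}
```

ltp_mask(0) yields 0 without ever evaluating the macro, while ltp_mask(4) yields 0xf and ltp_mask(16) yields 0xffff.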

-[PULL 10/44] target/arm: Fix VPT advance when ECI is non-zero
+Deleted patch
-We were not paying attention to the ECI state when advancing the VPT
-state.  Architecturally, VPT state advance happens for every beat
-(see the pseudocode VPTAdvance()), so on every beat the 4 bits of
-VPR.P0 corresponding to the current beat are inverted if required,
-and at the end of beats 1 and 3 the VPR MASK fields are updated.
-This means that if the ECI state says we should not be executing all
-beats then we need to skip some of the updating of the VPR that we
-currently do in mve_advance_vpt().
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
----
- target/arm/mve_helper.c | 24 +++++++++++++++++-------
- 1 file changed, 17 insertions(+), 7 deletions(-)
-diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/mve_helper.c
-+++ b/target/arm/mve_helper.c
-@@ -XXX,XX +XXX,XX @@ static void mve_advance_vpt(CPUARMState *env)
-     /* Advance the VPT and ECI state if necessary */
-     uint32_t vpr = env->v7m.vpr;
-     unsigned mask01, mask23;
-+    uint16_t inv_mask;
-+    uint16_t eci_mask = mve_eci_mask(env);
-     if ((env->condexec_bits & 0xf) == 0) {
-         env->condexec_bits = (env->condexec_bits == (ECI_A0A1A2B0 << 4)) ?
-@@ -XXX,XX +XXX,XX @@ static void mve_advance_vpt(CPUARMState *env)
-         return;
-     }
-+    /* Invert P0 bits if needed, but only for beats we actually executed */
-     mask01 = FIELD_EX32(vpr, V7M_VPR, MASK01);
-     mask23 = FIELD_EX32(vpr, V7M_VPR, MASK23);
--    if (mask01 > 8) {
--        /* high bit set, but not 0b1000: invert the relevant half of P0 */
--        vpr ^= 0xff;
-+    /* Start by assuming we invert all bits corresponding to executed beats */
-+    inv_mask = eci_mask;
-+    if (mask01 <= 8) {
-+        /* MASK01 says don't invert low half of P0 */
-+        inv_mask &= ~0xff;
-     }
--    if (mask23 > 8) {
--        /* high bit set, but not 0b1000: invert the relevant half of P0 */
--        vpr ^= 0xff00;
-+    if (mask23 <= 8) {
-+        /* MASK23 says don't invert high half of P0 */
-+        inv_mask &= ~0xff00;
-     }
--    vpr = FIELD_DP32(vpr, V7M_VPR, MASK01, mask01 << 1);
-+    vpr ^= inv_mask;
-+    /* Only update MASK01 if beat 1 executed */
-+    if (eci_mask & 0xf0) {
-+        vpr = FIELD_DP32(vpr, V7M_VPR, MASK01, mask01 << 1);
-+    }
-+    /* Beat 3 always executes, so update MASK23 */
-     vpr = FIELD_DP32(vpr, V7M_VPR, MASK23, mask23 << 1);
-     env->v7m.vpr = vpr;
- }
---
-2.20.1

-[PULL 11/44] target/arm: Fix VLDRB/H/W for predicated elements
+Deleted patch
-For vector loads, predicated elements are zeroed, instead of
-retaining their previous values (as happens for most data
-processing operations). This means we need to distinguish
-"beat not executed due to ECI" (don't touch destination
-element) from "beat executed but predicated out" (zero
-destination element).
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
----
- target/arm/mve_helper.c | 8 +++++---
- 1 file changed, 5 insertions(+), 3 deletions(-)
-diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/mve_helper.c
-+++ b/target/arm/mve_helper.c
-@@ -XXX,XX +XXX,XX @@ static void mve_advance_vpt(CPUARMState *env)
-     env->v7m.vpr = vpr;
- }
--
-+/* For loads, predicated lanes are zeroed instead of keeping their old values */
- #define DO_VLDR(OP, MSIZE, LDTYPE, ESIZE, TYPE)                         \
-     void HELPER(mve_##OP)(CPUARMState *env, void *vd, uint32_t addr)    \
-     {                                                                   \
-         TYPE *d = vd;                                                   \
-         uint16_t mask = mve_element_mask(env);                          \
-+        uint16_t eci_mask = mve_eci_mask(env);                          \
-         unsigned b, e;                                                  \
-         /*                                                              \
-          * R_SXTM allows the dest reg to become UNKNOWN for abandoned   \
-@@ -XXX,XX +XXX,XX @@ static void mve_advance_vpt(CPUARMState *env)
-          * then take an exception.                                      \
-          */                                                             \
-         for (b = 0, e = 0; b < 16; b += ESIZE, e++) {                   \
--            if (mask & (1 << b)) {                                      \
--                d[H##ESIZE(e)] = cpu_##LDTYPE##_data_ra(env, addr, GETPC()); \
-+            if (eci_mask & (1 << b)) {                                  \
-+                d[H##ESIZE(e)] = (mask & (1 << b)) ?                    \
-+                    cpu_##LDTYPE##_data_ra(env, addr, GETPC()) : 0;     \
-             }                                                           \
-             addr += MSIZE;                                              \
-         }                                                               \
---
-2.20.1
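
Editor's illustration (not part of either series): the three per-lane outcomes the VLDR fix above distinguishes, reduced to byte elements and a plain source array instead of guest memory. The function name load_bytes_predicated() is hypothetical; 'mask' plays the role of the combined element mask and 'eci_mask' the role of the beats-executed mask.

```c
#include <stdint.h>

/*
 * Per byte lane:
 *  - beat not executed (eci_mask bit clear): destination untouched
 *  - beat executed but predicated out (mask bit clear): destination zeroed
 *  - beat executed and predicated in: destination loaded from src
 */
static void load_bytes_predicated(uint8_t d[16], const uint8_t src[16],
                                  uint16_t mask, uint16_t eci_mask)
{
    for (unsigned b = 0; b < 16; b++) {
        if (eci_mask & (1u << b)) {
            d[b] = (mask & (1u << b)) ? src[b] : 0;
        }
    }
}
```

With mask = 0x00ff and eci_mask = 0xfff0, lanes 0..3 keep their old contents, lanes 4..7 are loaded, and lanes 8..15 are zeroed.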

-[PULL 18/44] target/arm: Implement MVE VMLAS
+[PULL 30/35] hw/arm/mps3r: Initial skeleton for mps3-an536 board
-Implement the MVE VMLAS insn, which multiplies a vector by a vector
+The AN536 is another FPGA image for the MPS3 development board. Unlike
-and adds a scalar.
+the existing FPGA images we already model, this board uses a Cortex-R
 family CPU, and it does not use any equivalent to the M-profile
 "Subsystem for Embedded" SoC-equivalent that we model in hw/arm/armsse.c.
 It's therefore more convenient for us to model it as a completely
 separate C file.
 This commit adds the basic skeleton of the board model, and the
 code to create all the RAM and ROM. We assume that we're probably
 going to want to add more images in future, so use the same
 base class/subclass setup that mps2-tz.c uses, even though at
 the moment there's only a single subclass.
 Following commits will add the CPUs and the peripherals.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Message-id: 20240206132931.38376-9-peter.maydell@linaro.org
 ---
- target/arm/helper-mve.h    |  4 ++++
+ MAINTAINERS                             |   3 +-
- target/arm/mve.decode      |  3 +++
+ configs/devices/arm-softmmu/default.mak |   1 +
- target/arm/mve_helper.c    | 26 ++++++++++++++++++++++++++
+ hw/arm/mps3r.c                          | 239 ++++++++++++++++++++++++
- target/arm/translate-mve.c |  1 +
+ hw/arm/Kconfig                          |   5 +
- 4 files changed, 34 insertions(+)
+ hw/arm/meson.build                      |   1 +
+ 5 files changed, 248 insertions(+), 1 deletion(-)
-diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
+ create mode 100644 hw/arm/mps3r.c
 diff --git a/MAINTAINERS b/MAINTAINERS
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper-mve.h
+--- a/MAINTAINERS
-+++ b/target/arm/helper-mve.h
++++ b/MAINTAINERS
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vqdmullb_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i3
+@@ -XXX,XX +XXX,XX @@ F: include/hw/misc/imx7_*.h
- DEF_HELPER_FLAGS_4(mve_vqdmullt_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+ F: hw/pci-host/designware.c
- DEF_HELPER_FLAGS_4(mve_vqdmullt_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+ F: include/hw/pci-host/designware.h
-+DEF_HELPER_FLAGS_4(mve_vmlasb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+-MPS2
-+DEF_HELPER_FLAGS_4(mve_vmlash, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
++MPS2 / MPS3
-+DEF_HELPER_FLAGS_4(mve_vmlasw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+ M: Peter Maydell <peter.maydell@linaro.org>
-+
+ L: qemu-arm@nongnu.org
- DEF_HELPER_FLAGS_4(mve_vmlaldavsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
+ S: Maintained
- DEF_HELPER_FLAGS_4(mve_vmlaldavsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
+ F: hw/arm/mps2.c
- DEF_HELPER_FLAGS_4(mve_vmlaldavxsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
+ F: hw/arm/mps2-tz.c
-diff --git a/target/arm/mve.decode b/target/arm/mve.decode
++F: hw/arm/mps3r.c
  F: hw/misc/mps2-*.c
  F: include/hw/misc/mps2-*.h
  F: hw/arm/armsse.c
 diff --git a/configs/devices/arm-softmmu/default.mak b/configs/devices/arm-softmmu/default.mak
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/mve.decode
+--- a/configs/devices/arm-softmmu/default.mak
-+++ b/target/arm/mve.decode
++++ b/configs/devices/arm-softmmu/default.mak
-@@ -XXX,XX +XXX,XX @@ VBRSR            1111 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
+@@ -XXX,XX +XXX,XX @@ CONFIG_ARM_VIRT=y
- VQDMULH_scalar   1110 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
+ # CONFIG_INTEGRATOR=n
- VQRDMULH_scalar  1111 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
+ # CONFIG_FSL_IMX31=n
+ # CONFIG_MUSICPAL=n
-+# The U bit (28) is don't-care because it does not affect the result
++# CONFIG_MPS3R=n
-+VMLAS            111- 1110 0 . .. ... 1 ... 1 1110 . 100 .... @2scalar
+ # CONFIG_MUSCA=n
-+
+ # CONFIG_CHEETAH=n
- # Vector add across vector
+ # CONFIG_SX1=n
- {
+diff --git a/hw/arm/mps3r.c b/hw/arm/mps3r.c
-   VADDV          111 u:1 1110 1111 size:2 01 ... 0 1111 0 0 a:1 0 qm:3 0 rda=%rdalo
+new file mode 100644
-diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
+index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/hw/arm/mps3r.c
@@ -XXX,XX +XXX,XX @@
 +/*
 + * Arm MPS3 board emulation for Cortex-R-based FPGA images.
 + * (For M-profile images see mps2.c and mps2-tz.c.)
 + *
 + * Copyright (c) 2017 Linaro Limited
 + * Written by Peter Maydell
 + *
 + *  This program is free software; you can redistribute it and/or modify
 + *  it under the terms of the GNU General Public License version 2 or
 + *  (at your option) any later version.
 + */
 +
 +/*
 + * The MPS3 is an FPGA based dev board. This file handles FPGA images
 + * which use the Cortex-R CPUs. We model these separately from the
 + * M-profile images, because on M-profile the FPGA image is based on
 + * a "Subsystem for Embedded" which is similar to an SoC, whereas
 + * the R-profile FPGA images don't have that abstraction layer.
 + *
 + * We model the following FPGA images here:
 + *  "mps3-an536" -- dual Cortex-R52 as documented in Arm Application Note AN536
 + *
 + * Application Note AN536:
 + * https://developer.arm.com/documentation/dai0536/latest/
 + */
 +
 +#include "qemu/osdep.h"
 +#include "qemu/units.h"
 +#include "qapi/error.h"
 +#include "exec/address-spaces.h"
 +#include "cpu.h"
 +#include "hw/boards.h"
 +#include "hw/arm/boot.h"
 +
 +/* Define the layout of RAM and ROM in a board */
 +typedef struct RAMInfo {
 +    const char *name;
 +    hwaddr base;
 +    hwaddr size;
 +    int mrindex; /* index into rams[]; -1 for the system RAM block */
 +    int flags;
 +} RAMInfo;
 +
 +/*
 + * The MPS3 DDR is 3GiB, but on a 32-bit host QEMU doesn't permit
 + * emulation of that much guest RAM, so artificially make it smaller.
 + */
 +#if HOST_LONG_BITS == 32
 +#define MPS3_DDR_SIZE (1 * GiB)
 +#else
 +#define MPS3_DDR_SIZE (3 * GiB)
 +#endif
 +
 +/*
 + * Flag values:
 + * IS_MAIN: this is the main machine RAM
 + * IS_ROM: this area is read-only
 + */
 +#define IS_MAIN 1
 +#define IS_ROM 2
 +
 +#define MPS3R_RAM_MAX 9
 +
 +typedef enum MPS3RFPGAType {
 +    FPGA_AN536,
 +} MPS3RFPGAType;
 +
 +struct MPS3RMachineClass {
 +    MachineClass parent;
 +    MPS3RFPGAType fpga_type;
 +    const RAMInfo *raminfo;
 +};
 +
 +struct MPS3RMachineState {
 +    MachineState parent;
 +    MemoryRegion ram[MPS3R_RAM_MAX];
 +};
 +
 +#define TYPE_MPS3R_MACHINE "mps3r"
 +#define TYPE_MPS3R_AN536_MACHINE MACHINE_TYPE_NAME("mps3-an536")
 +
 +OBJECT_DECLARE_TYPE(MPS3RMachineState, MPS3RMachineClass, MPS3R_MACHINE)
 +
 +static const RAMInfo an536_raminfo[] = {
 +    {
 +        .name = "ATCM",
 +        .base = 0x00000000,
 +        .size = 0x00008000,
 +        .mrindex = 0,
 +    }, {
 +        /* We model the QSPI flash as simple ROM for now */
 +        .name = "QSPI",
 +        .base = 0x08000000,
 +        .size = 0x00800000,
 +        .flags = IS_ROM,
 +        .mrindex = 1,
 +    }, {
 +        .name = "BRAM",
 +        .base = 0x10000000,
 +        .size = 0x00080000,
 +        .mrindex = 2,
 +    }, {
 +        .name = "DDR",
 +        .base = 0x20000000,
 +        .size = MPS3_DDR_SIZE,
 +        .mrindex = -1,
 +    }, {
 +        .name = "ATCM0",
 +        .base = 0xee000000,
 +        .size = 0x00008000,
 +        .mrindex = 3,
 +    }, {
 +        .name = "BTCM0",
 +        .base = 0xee100000,
 +        .size = 0x00008000,
 +        .mrindex = 4,
 +    }, {
 +        .name = "CTCM0",
 +        .base = 0xee200000,
 +        .size = 0x00008000,
 +        .mrindex = 5,
 +    }, {
 +        .name = "ATCM1",
 +        .base = 0xee400000,
 +        .size = 0x00008000,
 +        .mrindex = 6,
 +    }, {
 +        .name = "BTCM1",
 +        .base = 0xee500000,
 +        .size = 0x00008000,
 +        .mrindex = 7,
 +    }, {
 +        .name = "CTCM1",
 +        .base = 0xee600000,
 +        .size = 0x00008000,
 +        .mrindex = 8,
 +    }, {
 +        .name = NULL,
 +    }
 +};
 +
 +static MemoryRegion *mr_for_raminfo(MPS3RMachineState *mms,
 +                                    const RAMInfo *raminfo)
 +{
 +    /* Return an initialized MemoryRegion for the RAMInfo. */
 +    MemoryRegion *ram;
 +
 +    if (raminfo->mrindex < 0) {
 +        /* Means this RAMInfo is for QEMU's "system memory" */
 +        MachineState *machine = MACHINE(mms);
 +        assert(!(raminfo->flags & IS_ROM));
 +        return machine->ram;
 +    }
 +
 +    assert(raminfo->mrindex < MPS3R_RAM_MAX);
 +    ram = &mms->ram[raminfo->mrindex];
 +
 +    memory_region_init_ram(ram, NULL, raminfo->name,
 +                           raminfo->size, &error_fatal);
 +    if (raminfo->flags & IS_ROM) {
 +        memory_region_set_readonly(ram, true);
 +    }
 +    return ram;
 +}
 +
 +static void mps3r_common_init(MachineState *machine)
 +{
 +    MPS3RMachineState *mms = MPS3R_MACHINE(machine);
 +    MPS3RMachineClass *mmc = MPS3R_MACHINE_GET_CLASS(mms);
 +    MemoryRegion *sysmem = get_system_memory();
 +
 +    for (const RAMInfo *ri = mmc->raminfo; ri->name; ri++) {
 +        MemoryRegion *mr = mr_for_raminfo(mms, ri);
 +        memory_region_add_subregion(sysmem, ri->base, mr);
 +    }
 +}
 +
 +static void mps3r_set_default_ram_info(MPS3RMachineClass *mmc)
 +{
 +    /*
 +     * Set mc->default_ram_size and default_ram_id from the
 +     * information in mmc->raminfo.
 +     */
 +    MachineClass *mc = MACHINE_CLASS(mmc);
 +    const RAMInfo *p;
 +
 +    for (p = mmc->raminfo; p->name; p++) {
 +        if (p->mrindex < 0) {
 +            /* Found the entry for "system memory" */
 +            mc->default_ram_size = p->size;
 +            mc->default_ram_id = p->name;
 +            return;
 +        }
 +    }
 +    g_assert_not_reached();
 +}
 +
 +static void mps3r_class_init(ObjectClass *oc, void *data)
 +{
 +    MachineClass *mc = MACHINE_CLASS(oc);
 +
 +    mc->init = mps3r_common_init;
 +}
 +
 +static void mps3r_an536_class_init(ObjectClass *oc, void *data)
 +{
 +    MachineClass *mc = MACHINE_CLASS(oc);
 +    MPS3RMachineClass *mmc = MPS3R_MACHINE_CLASS(oc);
 +    static const char * const valid_cpu_types[] = {
 +        ARM_CPU_TYPE_NAME("cortex-r52"),
 +        NULL
 +    };
 +
 +    mc->desc = "ARM MPS3 with AN536 FPGA image for Cortex-R52";
 +    mc->default_cpus = 2;
 +    mc->min_cpus = mc->default_cpus;
 +    mc->max_cpus = mc->default_cpus;
 +    mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-r52");
 +    mc->valid_cpu_types = valid_cpu_types;
 +    mmc->raminfo = an536_raminfo;
 +    mps3r_set_default_ram_info(mmc);
 +}
 +
 +static const TypeInfo mps3r_machine_types[] = {
 +    {
 +        .name = TYPE_MPS3R_MACHINE,
 +        .parent = TYPE_MACHINE,
 +        .abstract = true,
 +        .instance_size = sizeof(MPS3RMachineState),
 +        .class_size = sizeof(MPS3RMachineClass),
 +        .class_init = mps3r_class_init,
 +    }, {
 +        .name = TYPE_MPS3R_AN536_MACHINE,
 +        .parent = TYPE_MPS3R_MACHINE,
 +        .class_init = mps3r_an536_class_init,
 +    },
 +};
 +
 +DEFINE_TYPES(mps3r_machine_types);
 diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/mve_helper.c
+--- a/hw/arm/Kconfig
-+++ b/target/arm/mve_helper.c
++++ b/hw/arm/Kconfig
-@@ -XXX,XX +XXX,XX @@ DO_VQDMLADH_OP(vqrdmlsdhxw, 4, int32_t, 1, 1, do_vqdmlsdh_w)
+@@ -XXX,XX +XXX,XX @@ config MAINSTONE
-         mve_advance_vpt(env);                                           \
+     select PFLASH_CFI01
-     }
+     select SMC91C111
-+/* "accumulating" version where FN takes d as well as n and m */
++config MPS3R
-+#define DO_2OP_ACC_SCALAR(OP, ESIZE, TYPE, FN)                          \
++    bool
-+    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn,   \
++    default y
-+                                uint32_t rm)                            \
++    depends on TCG && ARM
-+    {                                                                   \
++
-+        TYPE *d = vd, *n = vn;                                          \
+ config MUSCA
-+        TYPE m = rm;                                                    \
+     bool
-+        uint16_t mask = mve_element_mask(env);                          \
+     default y
-+        unsigned e;                                                     \
+diff --git a/hw/arm/meson.build b/hw/arm/meson.build
 +        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
 +            mergemask(&d[H##ESIZE(e)],                                  \
 +                      FN(d[H##ESIZE(e)], n[H##ESIZE(e)], m), mask);     \
 +        }                                                               \
 +        mve_advance_vpt(env);                                           \
 +    }
 +
  /* provide unsigned 2-op scalar helpers for all sizes */
  #define DO_2OP_SCALAR_U(OP, FN)                 \
      DO_2OP_SCALAR(OP##b, 1, uint8_t, FN)        \
@@ -XXX,XX +XXX,XX @@ DO_VQDMLADH_OP(vqrdmlsdhxw, 4, int32_t, 1, 1, do_vqdmlsdh_w)
      DO_2OP_SCALAR(OP##h, 2, int16_t, FN)        \
      DO_2OP_SCALAR(OP##w, 4, int32_t, FN)
 +#define DO_2OP_ACC_SCALAR_U(OP, FN)             \
 +    DO_2OP_ACC_SCALAR(OP##b, 1, uint8_t, FN)    \
 +    DO_2OP_ACC_SCALAR(OP##h, 2, uint16_t, FN)   \
 +    DO_2OP_ACC_SCALAR(OP##w, 4, uint32_t, FN)
 +
  DO_2OP_SCALAR_U(vadd_scalar, DO_ADD)
  DO_2OP_SCALAR_U(vsub_scalar, DO_SUB)
  DO_2OP_SCALAR_U(vmul_scalar, DO_MUL)
@@ -XXX,XX +XXX,XX @@ DO_2OP_SAT_SCALAR(vqrdmulh_scalarb, 1, int8_t, DO_QRDMULH_B)
  DO_2OP_SAT_SCALAR(vqrdmulh_scalarh, 2, int16_t, DO_QRDMULH_H)
  DO_2OP_SAT_SCALAR(vqrdmulh_scalarw, 4, int32_t, DO_QRDMULH_W)
 +/* Vector by vector plus scalar */
 +#define DO_VMLAS(D, N, M) ((N) * (D) + (M))
 +
 +DO_2OP_ACC_SCALAR_U(vmlas, DO_VMLAS)
 +
  /*
   * Long saturating scalar ops. As with DO_2OP_L, TYPE and H are for the
   * input (smaller) type and LESIZE, LTYPE, LH for the output (long) type.
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-mve.c
+--- a/hw/arm/meson.build
-+++ b/target/arm/translate-mve.c
++++ b/hw/arm/meson.build
-@@ -XXX,XX +XXX,XX @@ DO_2OP_SCALAR(VQSUB_U_scalar, vqsubu_scalar)
+@@ -XXX,XX +XXX,XX @@ arm_ss.add(when: 'CONFIG_HIGHBANK', if_true: files('highbank.c'))
- DO_2OP_SCALAR(VQDMULH_scalar, vqdmulh_scalar)
+ arm_ss.add(when: 'CONFIG_INTEGRATOR', if_true: files('integratorcp.c'))
- DO_2OP_SCALAR(VQRDMULH_scalar, vqrdmulh_scalar)
+ arm_ss.add(when: 'CONFIG_MAINSTONE', if_true: files('mainstone.c'))
- DO_2OP_SCALAR(VBRSR, vbrsr)
+ arm_ss.add(when: 'CONFIG_MICROBIT', if_true: files('microbit.c'))
-+DO_2OP_SCALAR(VMLAS, vmlas)
++arm_ss.add(when: 'CONFIG_MPS3R', if_true: files('mps3r.c'))
+ arm_ss.add(when: 'CONFIG_MUSICPAL', if_true: files('musicpal.c'))
- static bool trans_VQDMULLB_scalar(DisasContext *s, arg_2scalar *a)
+ arm_ss.add(when: 'CONFIG_NETDUINOPLUS2', if_true: files('netduinoplus2.c'))
- {
+ arm_ss.add(when: 'CONFIG_OLIMEX_STM32_H405', if_true: files('olimex-stm32-h405.c'))
 --
-2.20.1
+2.34.1

-[PULL 12/44] target/arm: Implement MVE VMULL (polynomial)
+[PULL 31/35] hw/arm/mps3r: Add CPUs, GIC, and per-CPU RAM
-Implement the MVE VMULL (polynomial) insn.  Unlike Neon, this comes
+Create the CPUs, the GIC, and the per-CPU RAM block for
-in two flavours: 8x8->16 and a 16x16->32.  Also unlike Neon, the
+the mps3-an536 board.
 inputs are in either the low or the high half of each double-width
 element.
 The assembler for this insn indicates the size with "P8" or "P16",
 encoded into bit 28 as size = 0 or 1. We choose to follow the
 same encoding as VQDMULL and decode this into a->size as MO_16
 or MO_32 indicating the size of the result elements. This then
 carries through to the helper function names where it then
 matches up with the existing pmull_h() which does an 8x8->16
 operation and a new pmull_w() which does the 16x16->32.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20240206132931.38376-10-peter.maydell@linaro.org
 ---
- target/arm/helper-mve.h    |  5 +++++
+ hw/arm/mps3r.c | 180 ++++++++++++++++++++++++++++++++++++++++++++++++-
- target/arm/vec_internal.h  | 11 +++++++++++
+ 1 file changed, 177 insertions(+), 3 deletions(-)
  target/arm/mve.decode      | 14 ++++++++++----
  target/arm/mve_helper.c    | 16 ++++++++++++++++
  target/arm/translate-mve.c | 28 ++++++++++++++++++++++++++++
  target/arm/vec_helper.c    | 14 +++++++++++++-
  6 files changed, 83 insertions(+), 5 deletions(-)
-diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
+diff --git a/hw/arm/mps3r.c b/hw/arm/mps3r.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper-mve.h
+--- a/hw/arm/mps3r.c
-+++ b/target/arm/helper-mve.h
++++ b/hw/arm/mps3r.c
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vmulltub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+@@ -XXX,XX +XXX,XX @@
- DEF_HELPER_FLAGS_4(mve_vmulltuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+ #include "qemu/osdep.h"
- DEF_HELPER_FLAGS_4(mve_vmulltuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+ #include "qemu/units.h"
+ #include "qapi/error.h"
-+DEF_HELPER_FLAGS_4(mve_vmullpbh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
++#include "qapi/qmp/qlist.h"
-+DEF_HELPER_FLAGS_4(mve_vmullpth, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+ #include "exec/address-spaces.h"
-+DEF_HELPER_FLAGS_4(mve_vmullpbw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+ #include "cpu.h"
-+DEF_HELPER_FLAGS_4(mve_vmullptw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+ #include "hw/boards.h"
-+
++#include "hw/qdev-properties.h"
- DEF_HELPER_FLAGS_4(mve_vqdmulhb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+ #include "hw/arm/boot.h"
- DEF_HELPER_FLAGS_4(mve_vqdmulhh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
++#include "hw/arm/bsa.h"
- DEF_HELPER_FLAGS_4(mve_vqdmulhw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
++#include "hw/intc/arm_gicv3.h"
-diff --git a/target/arm/vec_internal.h b/target/arm/vec_internal.h
-index XXXXXXX..XXXXXXX 100644
+ /* Define the layout of RAM and ROM in a board */
---- a/target/arm/vec_internal.h
+ typedef struct RAMInfo {
-+++ b/target/arm/vec_internal.h
+@@ -XXX,XX +XXX,XX @@ typedef struct RAMInfo {
-@@ -XXX,XX +XXX,XX @@ int16_t do_sqrdmlah_h(int16_t, int16_t, int16_t, bool, bool, uint32_t *);
+ #define IS_ROM 2
- int32_t do_sqrdmlah_s(int32_t, int32_t, int32_t, bool, bool, uint32_t *);
- int64_t do_sqrdmlah_d(int64_t, int64_t, int64_t, bool, bool);
+ #define MPS3R_RAM_MAX 9
 +#define MPS3R_CPU_MAX 2
 +
 +#define PERIPHBASE 0xf0000000
 +#define NUM_SPIS 96
  typedef enum MPS3RFPGAType {
      FPGA_AN536,
@@ -XXX,XX +XXX,XX @@ struct MPS3RMachineClass {
      MachineClass parent;
      MPS3RFPGAType fpga_type;
      const RAMInfo *raminfo;
 +    hwaddr loader_start;
  };
  struct MPS3RMachineState {
      MachineState parent;
 +    struct arm_boot_info bootinfo;
      MemoryRegion ram[MPS3R_RAM_MAX];
 +    Object *cpu[MPS3R_CPU_MAX];
 +    MemoryRegion cpu_sysmem[MPS3R_CPU_MAX];
 +    MemoryRegion sysmem_alias[MPS3R_CPU_MAX];
 +    MemoryRegion cpu_ram[MPS3R_CPU_MAX];
 +    GICv3State gic;
  };
  #define TYPE_MPS3R_MACHINE "mps3r"
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *mr_for_raminfo(MPS3RMachineState *mms,
      return ram;
  }
 +/*
-+ * 8 x 8 -> 16 vector polynomial multiply where the inputs are
++ * There is no defined secondary boot protocol for Linux for the AN536,
-+ * in the low 8 bits of each 16-bit element
++ * because real hardware has a restriction that atomic operations between
-+*/
++ * the two CPUs do not function correctly, and so true SMP is not
-+uint64_t pmull_h(uint64_t op1, uint64_t op2);
++ * possible. Therefore for cases where the user is directly booting
-+/*
++ * a kernel, we treat the system as essentially uniprocessor, and
-+ * 16 x 16 -> 32 vector polynomial multiply where the inputs are
++ * put the secondary CPU into power-off state (as if the user on the
-+ * in the low 16 bits of each 32-bit element
++ * real hardware had configured the secondary to be halted via the
 + * SCC config registers).
 + *
 + * Note that the default secondary boot code would not work here anyway
 + * as it assumes a GICv2, and we have a GICv3.
 + */
-+uint64_t pmull_w(uint64_t op1, uint64_t op2);
++static void mps3r_write_secondary_boot(ARMCPU *cpu,
-+
++                                       const struct arm_boot_info *info)
  #endif /* TARGET_ARM_VEC_INTERNALS_H */
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VHADD_U          111 1 1111 0 . .. ... 0 ... 0 0000 . 1 . 0 ... 0 @2op
  VHSUB_S          111 0 1111 0 . .. ... 0 ... 0 0010 . 1 . 0 ... 0 @2op
  VHSUB_U          111 1 1111 0 . .. ... 0 ... 0 0010 . 1 . 0 ... 0 @2op
 -VMULL_BS         111 0 1110 0 . .. ... 1 ... 0 1110 . 0 . 0 ... 0 @2op
 -VMULL_BU         111 1 1110 0 . .. ... 1 ... 0 1110 . 0 . 0 ... 0 @2op
 -VMULL_TS         111 0 1110 0 . .. ... 1 ... 1 1110 . 0 . 0 ... 0 @2op
 -VMULL_TU         111 1 1110 0 . .. ... 1 ... 1 1110 . 0 . 0 ... 0 @2op
 +{
 +  VMULLP_B       111 . 1110 0 . 11 ... 1 ... 0 1110 . 0 . 0 ... 0 @2op_sz28
 +  VMULL_BS       111 0 1110 0 . .. ... 1 ... 0 1110 . 0 . 0 ... 0 @2op
 +  VMULL_BU       111 1 1110 0 . .. ... 1 ... 0 1110 . 0 . 0 ... 0 @2op
 +}
 +{
 +  VMULLP_T       111 . 1110 0 . 11 ... 1 ... 1 1110 . 0 . 0 ... 0 @2op_sz28
 +  VMULL_TS       111 0 1110 0 . .. ... 1 ... 1 1110 . 0 . 0 ... 0 @2op
 +  VMULL_TU       111 1 1110 0 . .. ... 1 ... 1 1110 . 0 . 0 ... 0 @2op
 +}
  VQDMULH          1110 1111 0 . .. ... 0 ... 0 1011 . 1 . 0 ... 0 @2op
  VQRDMULH         1111 1111 0 . .. ... 0 ... 0 1011 . 1 . 0 ... 0 @2op
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2OP_L(vmulltub, 1, 1, uint8_t, 2, uint16_t, DO_MUL)
  DO_2OP_L(vmulltuh, 1, 2, uint16_t, 4, uint32_t, DO_MUL)
  DO_2OP_L(vmulltuw, 1, 4, uint32_t, 8, uint64_t, DO_MUL)
 +/*
 + * Polynomial multiply. We can always do this generating 64 bits
 + * of the result at a time, so we don't need to use DO_2OP_L.
 + */
 +#define VMULLPH_MASK 0x00ff00ff00ff00ffULL
 +#define VMULLPW_MASK 0x0000ffff0000ffffULL
 +#define DO_VMULLPBH(N, M) pmull_h((N) & VMULLPH_MASK, (M) & VMULLPH_MASK)
 +#define DO_VMULLPTH(N, M) DO_VMULLPBH((N) >> 8, (M) >> 8)
 +#define DO_VMULLPBW(N, M) pmull_w((N) & VMULLPW_MASK, (M) & VMULLPW_MASK)
 +#define DO_VMULLPTW(N, M) DO_VMULLPBW((N) >> 16, (M) >> 16)
 +
 +DO_2OP(vmullpbh, 8, uint64_t, DO_VMULLPBH)
 +DO_2OP(vmullpth, 8, uint64_t, DO_VMULLPTH)
 +DO_2OP(vmullpbw, 8, uint64_t, DO_VMULLPBW)
 +DO_2OP(vmullptw, 8, uint64_t, DO_VMULLPTW)
 +
  /*
   * Because the computation type is at least twice as large as required,
   * these work for both signed and unsigned source types.
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VQDMULLT(DisasContext *s, arg_2op *a)
      return do_2op(s, a, fns[a->size]);
  }
 +static bool trans_VMULLP_B(DisasContext *s, arg_2op *a)
 +{
 +    /*
-+     * Note that a->size indicates the output size, ie VMULL.P8
++     * Power the secondary CPU off. This means we don't need to write any
-+     * is the 8x8->16 operation and a->size is MO_16; VMULL.P16
++     * boot code into guest memory. Note that the 'cpu' argument to this
-+     * is the 16x16->32 operation and a->size is MO_32.
++     * function is the primary CPU we passed to arm_load_kernel(), not
 +     * the secondary. Loop around all the other CPUs, as the boot.c
 +     * code does for the "disable secondaries if PSCI is enabled" case.
 +     */
-+    static MVEGenTwoOpFn * const fns[] = {
++    for (CPUState *cs = first_cpu; cs; cs = CPU_NEXT(cs)) {
-+        NULL,
++        if (cs != first_cpu) {
-+        gen_helper_mve_vmullpbh,
++            object_property_set_bool(OBJECT(cs), "start-powered-off", true,
-+        gen_helper_mve_vmullpbw,
++                                     &error_abort);
-+        NULL,
++        }
-+    };
++    }
 +    return do_2op(s, a, fns[a->size]);
 +}
 +
-+static bool trans_VMULLP_T(DisasContext *s, arg_2op *a)
++static void mps3r_secondary_cpu_reset(ARMCPU *cpu,
 +                                      const struct arm_boot_info *info)
 +{
-+    /* a->size is as for trans_VMULLP_B */
++    /* We don't need to do anything here because the CPU will be off */
 +    static MVEGenTwoOpFn * const fns[] = {
 +        NULL,
 +        gen_helper_mve_vmullpth,
 +        gen_helper_mve_vmullptw,
 +        NULL,
 +    };
 +    return do_2op(s, a, fns[a->size]);
 +}
 +
- /*
++static void create_gic(MPS3RMachineState *mms, MemoryRegion *sysmem)
-  * VADC and VSBC: these perform an add-with-carry or subtract-with-carry
++{
-  * of the 32-bit elements in each lane of the input vectors, where the
++    MachineState *machine = MACHINE(mms);
-diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
++    DeviceState *gicdev;
-index XXXXXXX..XXXXXXX 100644
++    QList *redist_region_count;
---- a/target/arm/vec_helper.c
++
-+++ b/target/arm/vec_helper.c
++    object_initialize_child(OBJECT(mms), "gic", &mms->gic, TYPE_ARM_GICV3);
-@@ -XXX,XX +XXX,XX @@ static uint64_t expand_byte_to_half(uint64_t x)
++    gicdev = DEVICE(&mms->gic);
-          | ((x & 0xff000000) << 24);
++    qdev_prop_set_uint32(gicdev, "num-cpu", machine->smp.cpus);
 +    qdev_prop_set_uint32(gicdev, "num-irq", NUM_SPIS + GIC_INTERNAL);
 +    redist_region_count = qlist_new();
 +    qlist_append_int(redist_region_count, machine->smp.cpus);
 +    qdev_prop_set_array(gicdev, "redist-region-count", redist_region_count);
 +    object_property_set_link(OBJECT(&mms->gic), "sysmem",
 +                             OBJECT(sysmem), &error_fatal);
 +    sysbus_realize(SYS_BUS_DEVICE(&mms->gic), &error_fatal);
 +    sysbus_mmio_map(SYS_BUS_DEVICE(&mms->gic), 0, PERIPHBASE);
 +    sysbus_mmio_map(SYS_BUS_DEVICE(&mms->gic), 1, PERIPHBASE + 0x100000);
 +    /*
 +     * Wire the outputs from each CPU's generic timer and the GICv3
 +     * maintenance interrupt signal to the appropriate GIC PPI inputs,
 +     * and the GIC's IRQ/FIQ/VIRQ/VFIQ interrupt outputs to the CPU's inputs.
 +     */
 +    for (int i = 0; i < machine->smp.cpus; i++) {
 +        DeviceState *cpudev = DEVICE(mms->cpu[i]);
 +        SysBusDevice *gicsbd = SYS_BUS_DEVICE(&mms->gic);
 +        int intidbase = NUM_SPIS + i * GIC_INTERNAL;
 +        int irq;
 +        /*
 +         * Mapping from the output timer irq lines from the CPU to the
 +         * GIC PPI inputs used for this board. This isn't a BSA board,
 +         * but it uses the standard convention for the PPI numbers.
 +         */
 +        const int timer_irq[] = {
 +            [GTIMER_PHYS] = ARCH_TIMER_NS_EL1_IRQ,
 +            [GTIMER_VIRT] = ARCH_TIMER_VIRT_IRQ,
 +            [GTIMER_HYP]  = ARCH_TIMER_NS_EL2_IRQ,
 +        };
 +
 +        for (irq = 0; irq < ARRAY_SIZE(timer_irq); irq++) {
 +            qdev_connect_gpio_out(cpudev, irq,
 +                                  qdev_get_gpio_in(gicdev,
 +                                                   intidbase + timer_irq[irq]));
 +        }
 +
 +        qdev_connect_gpio_out_named(cpudev, "gicv3-maintenance-interrupt", 0,
 +                                    qdev_get_gpio_in(gicdev,
 +                                                     intidbase + ARCH_GIC_MAINT_IRQ));
 +
 +        qdev_connect_gpio_out_named(cpudev, "pmu-interrupt", 0,
 +                                    qdev_get_gpio_in(gicdev,
 +                                                     intidbase + VIRTUAL_PMU_IRQ));
 +
 +        sysbus_connect_irq(gicsbd, i,
 +                           qdev_get_gpio_in(cpudev, ARM_CPU_IRQ));
 +        sysbus_connect_irq(gicsbd, i + machine->smp.cpus,
 +                           qdev_get_gpio_in(cpudev, ARM_CPU_FIQ));
 +        sysbus_connect_irq(gicsbd, i + 2 * machine->smp.cpus,
 +                           qdev_get_gpio_in(cpudev, ARM_CPU_VIRQ));
 +        sysbus_connect_irq(gicsbd, i + 3 * machine->smp.cpus,
 +                           qdev_get_gpio_in(cpudev, ARM_CPU_VFIQ));
 +    }
 +}
 +
  static void mps3r_common_init(MachineState *machine)
  {
      MPS3RMachineState *mms = MPS3R_MACHINE(machine);
@@ -XXX,XX +XXX,XX @@ static void mps3r_common_init(MachineState *machine)
          MemoryRegion *mr = mr_for_raminfo(mms, ri);
          memory_region_add_subregion(sysmem, ri->base, mr);
      }
 +
 +    assert(machine->smp.cpus <= MPS3R_CPU_MAX);
 +    for (int i = 0; i < machine->smp.cpus; i++) {
 +        g_autofree char *sysmem_name = g_strdup_printf("cpu-%d-memory", i);
 +        g_autofree char *ramname = g_strdup_printf("cpu-%d-memory", i);
 +        g_autofree char *alias_name = g_strdup_printf("sysmem-alias-%d", i);
 +
 +        /*
 +         * Each CPU has some private RAM/peripherals, so create the container
 +         * which will house those, with the whole-machine system memory being
 +         * used where there's no CPU-specific device. Note that we need the
 +         * sysmem_alias aliases because we can't put one MR (the original
 +         * 'sysmem') into more than one other MR.
 +         */
 +        memory_region_init(&mms->cpu_sysmem[i], OBJECT(machine),
 +                           sysmem_name, UINT64_MAX);
 +        memory_region_init_alias(&mms->sysmem_alias[i], OBJECT(machine),
 +                                 alias_name, sysmem, 0, UINT64_MAX);
 +        memory_region_add_subregion_overlap(&mms->cpu_sysmem[i], 0,
 +                                            &mms->sysmem_alias[i], -1);
 +
 +        mms->cpu[i] = object_new(machine->cpu_type);
 +        object_property_set_link(mms->cpu[i], "memory",
 +                                 OBJECT(&mms->cpu_sysmem[i]), &error_abort);
 +        object_property_set_int(mms->cpu[i], "reset-cbar",
 +                                PERIPHBASE, &error_abort);
 +        qdev_realize(DEVICE(mms->cpu[i]), NULL, &error_fatal);
 +        object_unref(mms->cpu[i]);
 +
 +        /* Per-CPU RAM */
 +        memory_region_init_ram(&mms->cpu_ram[i], NULL, ramname,
 +                               0x1000, &error_fatal);
 +        memory_region_add_subregion(&mms->cpu_sysmem[i], 0xe7c01000,
 +                                    &mms->cpu_ram[i]);
 +    }
 +
 +    create_gic(mms, sysmem);
 +
 +    mms->bootinfo.ram_size = machine->ram_size;
 +    mms->bootinfo.board_id = -1;
 +    mms->bootinfo.loader_start = mmc->loader_start;
 +    mms->bootinfo.write_secondary_boot = mps3r_write_secondary_boot;
 +    mms->bootinfo.secondary_cpu_reset_hook = mps3r_secondary_cpu_reset;
 +    arm_load_kernel(ARM_CPU(mms->cpu[0]), machine, &mms->bootinfo);
  }
--static uint64_t pmull_h(uint64_t op1, uint64_t op2)
+ static void mps3r_set_default_ram_info(MPS3RMachineClass *mmc)
-+uint64_t pmull_w(uint64_t op1, uint64_t op2)
+@@ -XXX,XX +XXX,XX @@ static void mps3r_set_default_ram_info(MPS3RMachineClass *mmc)
- {
+             /* Found the entry for "system memory" */
-     uint64_t result = 0;
+             mc->default_ram_size = p->size;
-     int i;
+             mc->default_ram_id = p->name;
-+    for (i = 0; i < 16; ++i) {
++            mmc->loader_start = p->base;
-+        uint64_t mask = (op1 & 0x0000000100000001ull) * 0xffffffff;
+             return;
-+        result ^= op2 & mask;
+         }
-+        op1 >>= 1;
+     }
-+        op2 <<= 1;
+@@ -XXX,XX +XXX,XX @@ static void mps3r_an536_class_init(ObjectClass *oc, void *data)
-+    }
+     };
-+    return result;
-+}
+     mc->desc = "ARM MPS3 with AN536 FPGA image for Cortex-R52";
+-    mc->default_cpus = 2;
-+uint64_t pmull_h(uint64_t op1, uint64_t op2)
+-    mc->min_cpus = mc->default_cpus;
-+{
+-    mc->max_cpus = mc->default_cpus;
-+    uint64_t result = 0;
++    /*
-+    int i;
++     * In the real FPGA image there are always two cores, but the standard
-     for (i = 0; i < 8; ++i) {
++     * initial setting for the SCC SYSCON 0x000 register is 0x21, meaning
-         uint64_t mask = (op1 & 0x0001000100010001ull) * 0xffff;
++     * that the second core is held in reset and halted. Many images built for
-         result ^= op2 & mask;
++     * the board do not expect the second core to run at startup (especially
 +     * since on the real FPGA image it is not possible to use LDREX/STREX
 +     * in RAM between the two cores, so a true SMP setup isn't supported).
 +     *
 +     * As QEMU's equivalent of this, we support both -smp 1 and -smp 2,
 +     * with the default being -smp 1. This seems a more intuitive UI for
 +     * QEMU users than, for instance, having a machine property to allow
 +     * the user to set the initial value of the SYSCON 0x000 register.
 +     */
 +    mc->default_cpus = 1;
 +    mc->min_cpus = 1;
 +    mc->max_cpus = 2;
      mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-r52");
      mc->valid_cpu_types = valid_cpu_types;
      mmc->raminfo = an536_raminfo;
 --
-.20.1
+.34.1
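
For reference, the 8 x 8 -> 16 polynomial multiply that the older series' VMULLP patch exports as pmull_h() can be tried outside QEMU. The following is only a minimal standalone sketch built around the loop and the VMULLPH_MASK value shown in that patch. The wrapper names vmullpbh_ref() and vmullpth_ref() and the input assertion are assumptions added for illustration and appear in neither series.

```c
#include <assert.h>
#include <stdint.h>

/* Selects the low 8 bits of each 16-bit element (value taken from the patch). */
#define VMULLPH_MASK 0x00ff00ff00ff00ffULL

/*
 * 8 x 8 -> 16 carry-less (polynomial) multiply on four lanes at once.
 * The inputs must sit in the low 8 bits of each 16-bit element; the
 * assertion is an addition for this sketch, not part of the patch.
 */
static uint64_t pmull_h(uint64_t op1, uint64_t op2)
{
    uint64_t result = 0;

    assert((op1 & ~VMULLPH_MASK) == 0 && (op2 & ~VMULLPH_MASK) == 0);
    for (int i = 0; i < 8; ++i) {
        /* Expand bit 0 of every lane of op1 into a full 16-bit lane mask. */
        uint64_t mask = (op1 & 0x0001000100010001ULL) * 0xffff;
        result ^= op2 & mask;
        op1 >>= 1;
        op2 <<= 1;
    }
    return result;
}

/* Hypothetical wrapper: "bottom" form, multiplying the even-numbered bytes. */
static uint64_t vmullpbh_ref(uint64_t n, uint64_t m)
{
    return pmull_h(n & VMULLPH_MASK, m & VMULLPH_MASK);
}

/* Hypothetical wrapper: "top" form, multiplying the odd-numbered bytes. */
static uint64_t vmullpth_ref(uint64_t n, uint64_t m)
{
    return vmullpbh_ref(n >> 8, m >> 8);
}
```

Because the multiply is carry-less, partial products are combined with XOR rather than addition, so (x + 1) * (x + 1) gives x^2 + 1, that is 0x03 times 0x03 yields 0x05 and not 0x09.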

-[PULL 32/44] target/arm: Implement MVE VCTP
+[PULL 32/35] hw/arm/mps3r: Add UARTs
-Implement the MVE VCTP insn, which sets the VPR.P0 predicate bits so
+This board has a lot of UARTs: there is one UART per CPU in the
-as to predicate any element at index Rn or greater.  As
+per-CPU peripheral part of the address map, whose interrupts are
-with VPNOT, this insn itself is predicable and subject to beatwise
+connected as per-CPU interrupt lines.  Then there are 4 UARTs in the
-execution.
+normal part of the peripheral space, whose interrupts are shared
 peripheral interrupts.
-The calculation of the mask is the same as is used to determine
+Connect and wire them all up; this involves some OR gates where
-ltpmask in mve_element_mask(), but we precalculate masklen in
+multiple overflow interrupts are wired into one GIC input.
 generated code to avoid having to have 4 helpers specialized by size.
 We put the decode line in with the low-overhead-loop insns in
 t32.decode because it's logically part of that collection of insn
 patterns, even though it is an MVE-only insn.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Message-id: 20240206132931.38376-11-peter.maydell@linaro.org
 ---
- target/arm/helper-mve.h    |  2 ++
+ hw/arm/mps3r.c | 94 ++++++++++++++++++++++++++++++++++++++++++++++++++
- target/arm/translate-a32.h |  1 +
+file changed, 94 insertions(+)
  target/arm/t32.decode      |  1 +
  target/arm/mve_helper.c    | 20 ++++++++++++++++++++
  target/arm/translate-mve.c |  2 +-
  target/arm/translate.c     | 33 +++++++++++++++++++++++++++++++++
 files changed, 58 insertions(+), 1 deletion(-)
-diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
+diff --git a/hw/arm/mps3r.c b/hw/arm/mps3r.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper-mve.h
+--- a/hw/arm/mps3r.c
-+++ b/target/arm/helper-mve.h
++++ b/hw/arm/mps3r.c
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_veor, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+@@ -XXX,XX +XXX,XX @@
- DEF_HELPER_FLAGS_4(mve_vpsel, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+ #include "qapi/qmp/qlist.h"
- DEF_HELPER_FLAGS_1(mve_vpnot, TCG_CALL_NO_WG, void, env)
+ #include "exec/address-spaces.h"
+ #include "cpu.h"
-+DEF_HELPER_FLAGS_2(mve_vctp, TCG_CALL_NO_WG, void, env, i32)
++#include "sysemu/sysemu.h"
  #include "hw/boards.h"
 +#include "hw/or-irq.h"
  #include "hw/qdev-properties.h"
  #include "hw/arm/boot.h"
  #include "hw/arm/bsa.h"
 +#include "hw/char/cmsdk-apb-uart.h"
  #include "hw/intc/arm_gicv3.h"
  /* Define the layout of RAM and ROM in a board */
@@ -XXX,XX +XXX,XX @@ typedef struct RAMInfo {
  #define MPS3R_RAM_MAX 9
  #define MPS3R_CPU_MAX 2
 +#define MPS3R_UART_MAX 4 /* shared UART count */
  #define PERIPHBASE 0xf0000000
  #define NUM_SPIS 96
@@ -XXX,XX +XXX,XX @@ struct MPS3RMachineState {
      MemoryRegion sysmem_alias[MPS3R_CPU_MAX];
      MemoryRegion cpu_ram[MPS3R_CPU_MAX];
      GICv3State gic;
 +    /* per-CPU UARTs followed by the shared UARTs */
 +    CMSDKAPBUART uart[MPS3R_CPU_MAX + MPS3R_UART_MAX];
 +    OrIRQState cpu_uart_oflow[MPS3R_CPU_MAX];
 +    OrIRQState uart_oflow;
  };
  #define TYPE_MPS3R_MACHINE "mps3r"
@@ -XXX,XX +XXX,XX @@ struct MPS3RMachineState {
  OBJECT_DECLARE_TYPE(MPS3RMachineState, MPS3RMachineClass, MPS3R_MACHINE)
 +/*
 + * Main clock frequency CLK in Hz (50MHz). In the image there are also
 + * ACLK, MCLK, GPUCLK and PERIPHCLK at the same frequency; for our
 + * model we just roll them all into one.
 + */
 +#define CLK_FRQ 50000000
 +
- DEF_HELPER_FLAGS_4(mve_vaddb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+ static const RAMInfo an536_raminfo[] = {
- DEF_HELPER_FLAGS_4(mve_vaddh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+     {
- DEF_HELPER_FLAGS_4(mve_vaddw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+         .name = "ATCM",
-diff --git a/target/arm/translate-a32.h b/target/arm/translate-a32.h
+@@ -XXX,XX +XXX,XX @@ static void create_gic(MPS3RMachineState *mms, MemoryRegion *sysmem)
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-a32.h
 +++ b/target/arm/translate-a32.h
@@ -XXX,XX +XXX,XX @@ long neon_element_offset(int reg, int element, MemOp memop);
  void gen_rev16(TCGv_i32 dest, TCGv_i32 var);
  void clear_eci_state(DisasContext *s);
  bool mve_eci_check(DisasContext *s);
 +void mve_update_eci(DisasContext *s);
  void mve_update_and_store_eci(DisasContext *s);
  bool mve_skip_vmov(DisasContext *s, int vn, int index, int size);
 diff --git a/target/arm/t32.decode b/target/arm/t32.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/t32.decode
 +++ b/target/arm/t32.decode
@@ -XXX,XX +XXX,XX @@ BL               1111 0. .......... 11.1 ............         @branch24
        # This is DLSTP
        DLS        1111 0 0000 0 size:2 rn:4 1110 0000 0000 0001
      }
-+    VCTP         1111 0 0000 0 size:2 rn:4 1110 1000 0000 0001
-   ]
  }
-diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/mve_helper.c
-+++ b/target/arm/mve_helper.c
-@@ -XXX,XX +XXX,XX @@ void HELPER(mve_vpnot)(CPUARMState *env)
-     mve_advance_vpt(env);
- }
 +/*
-+ * VCTP: P0 unexecuted bits unchanged, predicated bits zeroed,
++ * Create UART uartno, and map it into the MemoryRegion mem at address baseaddr.
-+ * otherwise set according to value of Rn. The calculation of
++ * The qemu_irq arguments are where we connect the various IRQs from the UART.
 + * newmask here works in the same way as the calculation of the
 + * ltpmask in mve_element_mask(), but we have pre-calculated
 + * the masklen in the generated code.
 + */
-+void HELPER(mve_vctp)(CPUARMState *env, uint32_t masklen)
++static void create_uart(MPS3RMachineState *mms, int uartno, MemoryRegion *mem,
 +                        hwaddr baseaddr, qemu_irq txirq, qemu_irq rxirq,
 +                        qemu_irq txoverirq, qemu_irq rxoverirq,
 +                        qemu_irq combirq)
 +{
-+    uint16_t mask = mve_element_mask(env);
++    g_autofree char *s = g_strdup_printf("uart%d", uartno);
-+    uint16_t eci_mask = mve_eci_mask(env);
++    SysBusDevice *sbd;
 +    uint16_t newmask;
 +
-+    assert(masklen <= 16);
++    assert(uartno < ARRAY_SIZE(mms->uart));
-+    newmask = masklen ? MAKE_64BIT_MASK(0, masklen) : 0;
++    object_initialize_child(OBJECT(mms), s, &mms->uart[uartno],
-+    newmask &= mask;
++                            TYPE_CMSDK_APB_UART);
-+    env->v7m.vpr = (env->v7m.vpr & ~(uint32_t)eci_mask) | (newmask & eci_mask);
++    qdev_prop_set_uint32(DEVICE(&mms->uart[uartno]), "pclk-frq", CLK_FRQ);
-+    mve_advance_vpt(env);
++    qdev_prop_set_chr(DEVICE(&mms->uart[uartno]), "chardev", serial_hd(uartno));
 +    sbd = SYS_BUS_DEVICE(&mms->uart[uartno]);
 +    sysbus_realize(sbd, &error_fatal);
 +    memory_region_add_subregion(mem, baseaddr,
 +                                sysbus_mmio_get_region(sbd, 0));
 +    sysbus_connect_irq(sbd, 0, txirq);
 +    sysbus_connect_irq(sbd, 1, rxirq);
 +    sysbus_connect_irq(sbd, 2, txoverirq);
 +    sysbus_connect_irq(sbd, 3, rxoverirq);
 +    sysbus_connect_irq(sbd, 4, combirq);
 +}
 +
- #define DO_1OP_SAT(OP, ESIZE, TYPE, FN)                                 \
+ static void mps3r_common_init(MachineState *machine)
-     void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm)         \
+ {
-     {                                                                   \
+     MPS3RMachineState *mms = MPS3R_MACHINE(machine);
-diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
+     MPS3RMachineClass *mmc = MPS3R_MACHINE_GET_CLASS(mms);
-index XXXXXXX..XXXXXXX 100644
+     MemoryRegion *sysmem = get_system_memory();
---- a/target/arm/translate-mve.c
++    DeviceState *gicdev;
-+++ b/target/arm/translate-mve.c
-@@ -XXX,XX +XXX,XX @@ bool mve_eci_check(DisasContext *s)
+     for (const RAMInfo *ri = mmc->raminfo; ri->name; ri++) {
          MemoryRegion *mr = mr_for_raminfo(mms, ri);
@@ -XXX,XX +XXX,XX @@ static void mps3r_common_init(MachineState *machine)
      }
- }
+     create_gic(mms, sysmem);
--static void mve_update_eci(DisasContext *s)
++    gicdev = DEVICE(&mms->gic);
 +void mve_update_eci(DisasContext *s)
  {
      /*
       * The helper function will always update the CPUState field,
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static bool trans_LCTP(DisasContext *s, arg_LCTP *a)
      return true;
  }
 +static bool trans_VCTP(DisasContext *s, arg_VCTP *a)
 +{
 +    /*
 +     * M-profile Create Vector Tail Predicate. This insn is itself
 +     * predicated and is subject to beatwise execution.
 +     */
 +    TCGv_i32 rn_shifted, masklen;
 +
 +    if (!dc_isar_feature(aa32_mve, s) || a->rn == 13 || a->rn == 15) {
 +        return false;
 +    }
 +
 +    if (!mve_eci_check(s) || !vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    /*
-+     * We pre-calculate the mask length here to avoid having
++     * UARTs 0 and 1 are per-CPU; their interrupts are wired to
-+     * to have multiple helpers specialized for size.
++     * the relevant CPU's PPI 0..3, aka INTID 16..19
 +     * We pass the helper "rn <= (1 << (4 - size)) ? (rn << size) : 16".
 +     */
-+    rn_shifted = tcg_temp_new_i32();
++    for (int i = 0; i < machine->smp.cpus; i++) {
-+    masklen = load_reg(s, a->rn);
++        int intidbase = NUM_SPIS + i * GIC_INTERNAL;
-+    tcg_gen_shli_i32(rn_shifted, masklen, a->size);
++        g_autofree char *s = g_strdup_printf("cpu-uart-oflow-orgate%d", i);
-+    tcg_gen_movcond_i32(TCG_COND_LEU, masklen,
++        DeviceState *orgate;
-+                        masklen, tcg_constant_i32(1 << (4 - a->size)),
++
-+                        rn_shifted, tcg_constant_i32(16));
++        /* The two overflow IRQs from the UART are ORed together into PPI 3 */
-+    gen_helper_mve_vctp(cpu_env, masklen);
++        object_initialize_child(OBJECT(mms), s, &mms->cpu_uart_oflow[i],
-+    tcg_temp_free_i32(masklen);
++                                TYPE_OR_IRQ);
-+    tcg_temp_free_i32(rn_shifted);
++        orgate = DEVICE(&mms->cpu_uart_oflow[i]);
-+    mve_update_eci(s);
++        qdev_prop_set_uint32(orgate, "num-lines", 2);
-+    return true;
++        qdev_realize(orgate, NULL, &error_fatal);
-+}
++        qdev_connect_gpio_out(orgate, 0,
++                              qdev_get_gpio_in(gicdev, intidbase + 19));
- static bool op_tbranch(DisasContext *s, arg_tbranch *a, bool half)
++
- {
++        create_uart(mms, i, &mms->cpu_sysmem[i], 0xe7c00000,
 +                    qdev_get_gpio_in(gicdev, intidbase + 17), /* tx */
 +                    qdev_get_gpio_in(gicdev, intidbase + 16), /* rx */
 +                    qdev_get_gpio_in(orgate, 0), /* txover */
 +                    qdev_get_gpio_in(orgate, 1), /* rxover */
 +                    qdev_get_gpio_in(gicdev, intidbase + 18) /* combined */);
 +    }
 +    /*
 +     * UARTs 2 to 5 are whole-system; all overflow IRQs are ORed
 +     * together into IRQ 17
 +     */
 +    object_initialize_child(OBJECT(mms), "uart-oflow-orgate",
 +                            &mms->uart_oflow, TYPE_OR_IRQ);
 +    qdev_prop_set_uint32(DEVICE(&mms->uart_oflow), "num-lines",
 +                         MPS3R_UART_MAX * 2);
 +    qdev_realize(DEVICE(&mms->uart_oflow), NULL, &error_fatal);
 +    qdev_connect_gpio_out(DEVICE(&mms->uart_oflow), 0,
 +                          qdev_get_gpio_in(gicdev, 17));
 +
 +    for (int i = 0; i < MPS3R_UART_MAX; i++) {
 +        hwaddr baseaddr = 0xe0205000 + i * 0x1000;
 +        int rxirq = 5 + i * 2, txirq = 6 + i * 2, combirq = 13 + i;
 +
 +        create_uart(mms, i + MPS3R_CPU_MAX, sysmem, baseaddr,
 +                    qdev_get_gpio_in(gicdev, txirq),
 +                    qdev_get_gpio_in(gicdev, rxirq),
 +                    qdev_get_gpio_in(DEVICE(&mms->uart_oflow), i * 2),
 +                    qdev_get_gpio_in(DEVICE(&mms->uart_oflow), i * 2 + 1),
 +                    qdev_get_gpio_in(gicdev, combirq));
 +    }
      mms->bootinfo.ram_size = machine->ram_size;
      mms->bootinfo.board_id = -1;
 --
-.20.1
+.34.1
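
The interrupt numbering used by the UART patch above can be restated as plain arithmetic. This is only a sketch: NUM_SPIS, the UART counts, the base address and the IRQ formulas are copied from the patch, while GIC_INTERNAL is assumed here to be 32 (the value is not shown in this series), and the helper and struct names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SPIS       96   /* from the patch */
#define GIC_INTERNAL   32   /* assumption: private interrupts per CPU */
#define MPS3R_CPU_MAX  2    /* from the patch */
#define MPS3R_UART_MAX 4    /* from the patch: shared UART count */

/*
 * Hypothetical helper: GIC input index for a per-CPU interrupt, computed
 * the same way as "intidbase + N" in the patch.
 */
static int ppi_gpio_index(int cpu, int intid)
{
    assert(cpu >= 0 && cpu < MPS3R_CPU_MAX);
    assert(intid >= 16 && intid < GIC_INTERNAL);
    return NUM_SPIS + cpu * GIC_INTERNAL + intid;
}

/* Hypothetical summary of how shared UART i is wired up in the patch. */
typedef struct SharedUartWiring {
    uint32_t baseaddr;              /* MMIO base in system memory */
    int rxirq, txirq, combirq;      /* GIC input indexes */
    int txover_line, rxover_line;   /* inputs of the shared OR gate */
} SharedUartWiring;

static SharedUartWiring shared_uart_wiring(int i)
{
    assert(i >= 0 && i < MPS3R_UART_MAX);
    SharedUartWiring w = {
        .baseaddr = 0xe0205000u + (uint32_t)i * 0x1000u,
        .rxirq = 5 + i * 2,
        .txirq = 6 + i * 2,
        .combirq = 13 + i,
        .txover_line = i * 2,
        .rxover_line = i * 2 + 1,
    };
    return w;
}
```

With these values, the per-CPU UART of CPU 1 raises its ORed overflow interrupt on input 96 + 32 + 19 = 147, and the eight overflow lines of the four shared UARTs occupy OR-gate inputs 0 to 7.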

-[PULL 34/44] target/arm: Implement MVE scatter-gather immediate forms
+[PULL 33/35] hw/arm/mps3r: Add GPIO, watchdog, dual-timer, I2C devices
-Implement the MVE VLDR/VSTR insns which do scatter-gather using base
+Add the GPIO, watchdog, dual-timer and I2C devices to the mps3-an536
-addresses from Qm plus or minus an immediate offset (possibly with
+board.  These are all simple devices that just need to be created and
-writeback). Note that writeback is not predicated but it does have
+wired up.
 to honour ECI state, so we have to add an eci_mask check to the
 VSTR_SG macros (the VLDR_SG macros already needed this to be able
 to distinguish "skip beat" from "set predicated element to 0").
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Message-id: 20240206132931.38376-12-peter.maydell@linaro.org
 ---
- target/arm/helper-mve.h    |  5 +++
+ hw/arm/mps3r.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++
- target/arm/mve.decode      | 10 +++++
+file changed, 59 insertions(+)
  target/arm/mve_helper.c    | 91 ++++++++++++++++++++++++--------------
  target/arm/translate-mve.c | 72 ++++++++++++++++++++++++++++++
 files changed, 146 insertions(+), 32 deletions(-)
-diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
+diff --git a/hw/arm/mps3r.c b/hw/arm/mps3r.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper-mve.h
+--- a/hw/arm/mps3r.c
-+++ b/target/arm/helper-mve.h
++++ b/hw/arm/mps3r.c
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vstrh_sg_os_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+@@ -XXX,XX +XXX,XX @@
- DEF_HELPER_FLAGS_4(mve_vstrw_sg_os_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+ #include "sysemu/sysemu.h"
- DEF_HELPER_FLAGS_4(mve_vstrd_sg_os_ud, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+ #include "hw/boards.h"
+ #include "hw/or-irq.h"
-+DEF_HELPER_FLAGS_4(mve_vldrw_sg_wb_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
++#include "hw/qdev-clock.h"
-+DEF_HELPER_FLAGS_4(mve_vldrd_sg_wb_ud, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+ #include "hw/qdev-properties.h"
-+DEF_HELPER_FLAGS_4(mve_vstrw_sg_wb_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+ #include "hw/arm/boot.h"
-+DEF_HELPER_FLAGS_4(mve_vstrd_sg_wb_ud, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+ #include "hw/arm/bsa.h"
  #include "hw/char/cmsdk-apb-uart.h"
 +#include "hw/i2c/arm_sbcon_i2c.h"
  #include "hw/intc/arm_gicv3.h"
 +#include "hw/misc/unimp.h"
 +#include "hw/timer/cmsdk-apb-dualtimer.h"
 +#include "hw/watchdog/cmsdk-apb-watchdog.h"
  /* Define the layout of RAM and ROM in a board */
  typedef struct RAMInfo {
@@ -XXX,XX +XXX,XX @@ struct MPS3RMachineState {
      CMSDKAPBUART uart[MPS3R_CPU_MAX + MPS3R_UART_MAX];
      OrIRQState cpu_uart_oflow[MPS3R_CPU_MAX];
      OrIRQState uart_oflow;
 +    CMSDKAPBWatchdog watchdog;
 +    CMSDKAPBDualTimer dualtimer;
 +    ArmSbconI2CState i2c[5];
 +    Clock *clk;
  };
  #define TYPE_MPS3R_MACHINE "mps3r"
@@ -XXX,XX +XXX,XX @@ static void mps3r_common_init(MachineState *machine)
      MemoryRegion *sysmem = get_system_memory();
      DeviceState *gicdev;
 +    mms->clk = clock_new(OBJECT(machine), "CLK");
 +    clock_set_hz(mms->clk, CLK_FRQ);
 +
- DEF_HELPER_FLAGS_3(mve_vdup, TCG_CALL_NO_WG, void, env, ptr, i32)
+     for (const RAMInfo *ri = mmc->raminfo; ri->name; ri++) {
+         MemoryRegion *mr = mr_for_raminfo(mms, ri);
- DEF_HELPER_FLAGS_4(mve_vidupb, TCG_CALL_NO_WG, i32, env, ptr, i32, i32)
+         memory_region_add_subregion(sysmem, ri->base, mr);
-diff --git a/target/arm/mve.decode b/target/arm/mve.decode
+@@ -XXX,XX +XXX,XX @@ static void mps3r_common_init(MachineState *machine)
-index XXXXXXX..XXXXXXX 100644
+                     qdev_get_gpio_in(gicdev, combirq));
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@
  &vmaxv qm rda size
  &vabav qn qm rda size
  &vldst_sg qd qm rn size msize os
 +&vldst_sg_imm qd qm a w imm
  # scatter-gather memory size is in bits 6:4
  %sg_msize 6:1 4:1
@@ -XXX,XX +XXX,XX @@
  @vldst_sg .... .... .... rn:4 .... ... size:2 ... ... os:1 &vldst_sg \
            qd=%qd qm=%qm msize=%sg_msize
 +# Qm is in the fields usually labeled Qn
 +@vldst_sg_imm .... .... a:1 . w:1 . .... .... .... . imm:7 &vldst_sg_imm \
 +              qd=%qd qm=%qn
 +
  @1op .... .... .... size:2 .. .... .... .... .... &1op qd=%qd qm=%qm
  @1op_nosz .... .... .... .... .... .... .... .... &1op qd=%qd qm=%qm size=0
  @2op .... .... .. size:2 .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn
@@ -XXX,XX +XXX,XX @@ VLDR_S_sg        111 0 1100 1 . 01 .... ... 0 111 . .... .... @vldst_sg
  VLDR_U_sg        111 1 1100 1 . 01 .... ... 0 111 . .... .... @vldst_sg
  VSTR_sg          111 0 1100 1 . 00 .... ... 0 111 . .... .... @vldst_sg
 +VLDRW_sg_imm     111 1 1101 ... 1 ... 0 ... 1 1110 .... .... @vldst_sg_imm
 +VLDRD_sg_imm     111 1 1101 ... 1 ... 0 ... 1 1111 .... .... @vldst_sg_imm
 +VSTRW_sg_imm     111 1 1101 ... 0 ... 0 ... 1 1110 .... .... @vldst_sg_imm
 +VSTRD_sg_imm     111 1 1101 ... 0 ... 0 ... 1 1111 .... .... @vldst_sg_imm
 +
  # Moves between 2 32-bit vector lanes and 2 general purpose registers
  VMOV_to_2gp      1110 1100 0 . 00 rt2:4 ... 0 1111 000 idx:1 rt:4 qd=%qd
  VMOV_from_2gp    1110 1100 0 . 01 rt2:4 ... 0 1111 000 idx:1 rt:4 qd=%qd
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, 2, stw, 4, int32_t)
   * For loads, predicated lanes are zeroed instead of retaining
   * their previous values.
   */
 -#define DO_VLDR_SG(OP, LDTYPE, ESIZE, TYPE, OFFTYPE, ADDRFN)            \
 +#define DO_VLDR_SG(OP, LDTYPE, ESIZE, TYPE, OFFTYPE, ADDRFN, WB)        \
      void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm,         \
                            uint32_t base)                                \
      {                                                                   \
@@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, 2, stw, 4, int32_t)
              addr = ADDRFN(base, m[H##ESIZE(e)]);                        \
              d[H##ESIZE(e)] = (mask & 1) ?                               \
                  cpu_##LDTYPE##_data_ra(env, addr, GETPC()) : 0;         \
 +            if (WB) {                                                   \
 +                m[H##ESIZE(e)] = addr;                                  \
 +            }                                                           \
          }                                                               \
          mve_advance_vpt(env);                                           \
      }
- /* We know here TYPE is unsigned so always the same as the offset type */
++    for (int i = 0; i < 4; i++) {
--#define DO_VSTR_SG(OP, STTYPE, ESIZE, TYPE, ADDRFN)                     \
++        /* CMSDK GPIO controllers */
-+#define DO_VSTR_SG(OP, STTYPE, ESIZE, TYPE, ADDRFN, WB)                 \
++        g_autofree char *s = g_strdup_printf("gpio%d", i);
-     void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm,         \
++        create_unimplemented_device(s, 0xe0000000 + i * 0x1000, 0x1000);
                            uint32_t base)                                \
      {                                                                   \
          TYPE *d = vd;                                                   \
          TYPE *m = vm;                                                   \
          uint16_t mask = mve_element_mask(env);                          \
 +        uint16_t eci_mask = mve_eci_mask(env);                          \
          unsigned e;                                                     \
          uint32_t addr;                                                  \
 -        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
 +        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE, eci_mask >>= ESIZE) { \
 +            if (!(eci_mask & 1)) {                                      \
 +                continue;                                               \
 +            }                                                           \
              addr = ADDRFN(base, m[H##ESIZE(e)]);                        \
              if (mask & 1) {                                             \
                  cpu_##STTYPE##_data_ra(env, addr, d[H##ESIZE(e)], GETPC()); \
              }                                                           \
 +            if (WB) {                                                   \
 +                m[H##ESIZE(e)] = addr;                                  \
 +            }                                                           \
          }                                                               \
          mve_advance_vpt(env);                                           \
      }
@@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, 2, stw, 4, int32_t)
   * accesses, controlled by the predicate mask for the relevant beat,
   * and with a single 32-bit offset in the first of the two Qm elements.
   * Note that for QEMU our IMPDEF AIRCR.ENDIANNESS is always 0 (little).
 + * Address writeback happens on the odd beats and updates the address
 + * stored in the even-beat element.
   */
 -#define DO_VLDR64_SG(OP, ADDRFN)                                        \
 +#define DO_VLDR64_SG(OP, ADDRFN, WB)                                    \
      void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm,         \
                            uint32_t base)                                \
      {                                                                   \
@@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, 2, stw, 4, int32_t)
              addr = ADDRFN(base, m[H4(e & ~1)]);                         \
              addr += 4 * (e & 1);                                        \
              d[H4(e)] = (mask & 1) ? cpu_ldl_data_ra(env, addr, GETPC()) : 0; \
 +            if (WB && (e & 1)) {                                        \
 +                m[H4(e & ~1)] = addr - 4;                               \
 +            }                                                           \
          }                                                               \
          mve_advance_vpt(env);                                           \
      }
 -#define DO_VSTR64_SG(OP, ADDRFN)                                        \
 +#define DO_VSTR64_SG(OP, ADDRFN, WB)                                    \
      void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm,         \
                            uint32_t base)                                \
      {                                                                   \
          uint32_t *d = vd;                                               \
          uint32_t *m = vm;                                               \
          uint16_t mask = mve_element_mask(env);                          \
 +        uint16_t eci_mask = mve_eci_mask(env);                          \
          unsigned e;                                                     \
          uint32_t addr;                                                  \
 -        for (e = 0; e < 16 / 4; e++, mask >>= 4) {                      \
 +        for (e = 0; e < 16 / 4; e++, mask >>= 4, eci_mask >>= 4) {      \
 +            if (!(eci_mask & 1)) {                                      \
 +                continue;                                               \
 +            }                                                           \
              addr = ADDRFN(base, m[H4(e & ~1)]);                         \
              addr += 4 * (e & 1);                                        \
              if (mask & 1) {                                             \
                  cpu_stl_data_ra(env, addr, d[H4(e)], GETPC());          \
              }                                                           \
 +            if (WB && (e & 1)) {                                        \
 +                m[H4(e & ~1)] = addr - 4;                               \
 +            }                                                           \
          }                                                               \
          mve_advance_vpt(env);                                           \
      }
@@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, 2, stw, 4, int32_t)
  #define ADDR_ADD_OSW(BASE, OFFSET) ((BASE) + ((OFFSET) << 2))
  #define ADDR_ADD_OSD(BASE, OFFSET) ((BASE) + ((OFFSET) << 3))
 -DO_VLDR_SG(vldrb_sg_sh, ldsb, 2, int16_t, uint16_t, ADDR_ADD)
 -DO_VLDR_SG(vldrb_sg_sw, ldsb, 4, int32_t, uint32_t, ADDR_ADD)
 -DO_VLDR_SG(vldrh_sg_sw, ldsw, 4, int32_t, uint32_t, ADDR_ADD)
 +DO_VLDR_SG(vldrb_sg_sh, ldsb, 2, int16_t, uint16_t, ADDR_ADD, false)
 +DO_VLDR_SG(vldrb_sg_sw, ldsb, 4, int32_t, uint32_t, ADDR_ADD, false)
 +DO_VLDR_SG(vldrh_sg_sw, ldsw, 4, int32_t, uint32_t, ADDR_ADD, false)
 -DO_VLDR_SG(vldrb_sg_ub, ldub, 1, uint8_t, uint8_t, ADDR_ADD)
 -DO_VLDR_SG(vldrb_sg_uh, ldub, 2, uint16_t, uint16_t, ADDR_ADD)
 -DO_VLDR_SG(vldrb_sg_uw, ldub, 4, uint32_t, uint32_t, ADDR_ADD)
 -DO_VLDR_SG(vldrh_sg_uh, lduw, 2, uint16_t, uint16_t, ADDR_ADD)
 -DO_VLDR_SG(vldrh_sg_uw, lduw, 4, uint32_t, uint32_t, ADDR_ADD)
 -DO_VLDR_SG(vldrw_sg_uw, ldl, 4, uint32_t, uint32_t, ADDR_ADD)
 -DO_VLDR64_SG(vldrd_sg_ud, ADDR_ADD)
 +DO_VLDR_SG(vldrb_sg_ub, ldub, 1, uint8_t, uint8_t, ADDR_ADD, false)
 +DO_VLDR_SG(vldrb_sg_uh, ldub, 2, uint16_t, uint16_t, ADDR_ADD, false)
 +DO_VLDR_SG(vldrb_sg_uw, ldub, 4, uint32_t, uint32_t, ADDR_ADD, false)
 +DO_VLDR_SG(vldrh_sg_uh, lduw, 2, uint16_t, uint16_t, ADDR_ADD, false)
 +DO_VLDR_SG(vldrh_sg_uw, lduw, 4, uint32_t, uint32_t, ADDR_ADD, false)
 +DO_VLDR_SG(vldrw_sg_uw, ldl, 4, uint32_t, uint32_t, ADDR_ADD, false)
 +DO_VLDR64_SG(vldrd_sg_ud, ADDR_ADD, false)
 -DO_VLDR_SG(vldrh_sg_os_sw, ldsw, 4, int32_t, uint32_t, ADDR_ADD_OSH)
 -DO_VLDR_SG(vldrh_sg_os_uh, lduw, 2, uint16_t, uint16_t, ADDR_ADD_OSH)
 -DO_VLDR_SG(vldrh_sg_os_uw, lduw, 4, uint32_t, uint32_t, ADDR_ADD_OSH)
 -DO_VLDR_SG(vldrw_sg_os_uw, ldl, 4, uint32_t, uint32_t, ADDR_ADD_OSW)
 -DO_VLDR64_SG(vldrd_sg_os_ud, ADDR_ADD_OSD)
 +DO_VLDR_SG(vldrh_sg_os_sw, ldsw, 4, int32_t, uint32_t, ADDR_ADD_OSH, false)
 +DO_VLDR_SG(vldrh_sg_os_uh, lduw, 2, uint16_t, uint16_t, ADDR_ADD_OSH, false)
 +DO_VLDR_SG(vldrh_sg_os_uw, lduw, 4, uint32_t, uint32_t, ADDR_ADD_OSH, false)
 +DO_VLDR_SG(vldrw_sg_os_uw, ldl, 4, uint32_t, uint32_t, ADDR_ADD_OSW, false)
 +DO_VLDR64_SG(vldrd_sg_os_ud, ADDR_ADD_OSD, false)
 -DO_VSTR_SG(vstrb_sg_ub, stb, 1, uint8_t, ADDR_ADD)
 -DO_VSTR_SG(vstrb_sg_uh, stb, 2, uint16_t, ADDR_ADD)
 -DO_VSTR_SG(vstrb_sg_uw, stb, 4, uint32_t, ADDR_ADD)
 -DO_VSTR_SG(vstrh_sg_uh, stw, 2, uint16_t, ADDR_ADD)
 -DO_VSTR_SG(vstrh_sg_uw, stw, 4, uint32_t, ADDR_ADD)
 -DO_VSTR_SG(vstrw_sg_uw, stl, 4, uint32_t, ADDR_ADD)
 -DO_VSTR64_SG(vstrd_sg_ud, ADDR_ADD)
 +DO_VSTR_SG(vstrb_sg_ub, stb, 1, uint8_t, ADDR_ADD, false)
 +DO_VSTR_SG(vstrb_sg_uh, stb, 2, uint16_t, ADDR_ADD, false)
 +DO_VSTR_SG(vstrb_sg_uw, stb, 4, uint32_t, ADDR_ADD, false)
 +DO_VSTR_SG(vstrh_sg_uh, stw, 2, uint16_t, ADDR_ADD, false)
 +DO_VSTR_SG(vstrh_sg_uw, stw, 4, uint32_t, ADDR_ADD, false)
 +DO_VSTR_SG(vstrw_sg_uw, stl, 4, uint32_t, ADDR_ADD, false)
 +DO_VSTR64_SG(vstrd_sg_ud, ADDR_ADD, false)
 -DO_VSTR_SG(vstrh_sg_os_uh, stw, 2, uint16_t, ADDR_ADD_OSH)
 -DO_VSTR_SG(vstrh_sg_os_uw, stw, 4, uint32_t, ADDR_ADD_OSH)
 -DO_VSTR_SG(vstrw_sg_os_uw, stl, 4, uint32_t, ADDR_ADD_OSW)
 -DO_VSTR64_SG(vstrd_sg_os_ud, ADDR_ADD_OSD)
 +DO_VSTR_SG(vstrh_sg_os_uh, stw, 2, uint16_t, ADDR_ADD_OSH, false)
 +DO_VSTR_SG(vstrh_sg_os_uw, stw, 4, uint32_t, ADDR_ADD_OSH, false)
 +DO_VSTR_SG(vstrw_sg_os_uw, stl, 4, uint32_t, ADDR_ADD_OSW, false)
 +DO_VSTR64_SG(vstrd_sg_os_ud, ADDR_ADD_OSD, false)
 +
 +DO_VLDR_SG(vldrw_sg_wb_uw, ldl, 4, uint32_t, uint32_t, ADDR_ADD, true)
 +DO_VLDR64_SG(vldrd_sg_wb_ud, ADDR_ADD, true)
 +DO_VSTR_SG(vstrw_sg_wb_uw, stl, 4, uint32_t, ADDR_ADD, true)
 +DO_VSTR64_SG(vstrd_sg_wb_ud, ADDR_ADD, true)
  /*
   * The mergemask(D, R, M) macro performs the operation "*D = R" but
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VSTR_sg(DisasContext *s, arg_vldst_sg *a)
  #undef F
 +static bool do_ldst_sg_imm(DisasContext *s, arg_vldst_sg_imm *a,
 +                           MVEGenLdStSGFn *fn, unsigned msize)
 +{
 +    uint32_t offset;
 +    TCGv_ptr qd, qm;
 +
 +    if (!dc_isar_feature(aa32_mve, s) ||
 +        !mve_check_qreg_bank(s, a->qd | a->qm) ||
 +        !fn) {
 +        return false;
 +    }
 +
-+    if (!mve_eci_check(s) || !vfp_access_check(s)) {
++    object_initialize_child(OBJECT(mms), "watchdog", &mms->watchdog,
-+        return true;
++                            TYPE_CMSDK_APB_WATCHDOG);
 +    qdev_connect_clock_in(DEVICE(&mms->watchdog), "WDOGCLK", mms->clk);
 +    sysbus_realize(SYS_BUS_DEVICE(&mms->watchdog), &error_fatal);
 +    sysbus_connect_irq(SYS_BUS_DEVICE(&mms->watchdog), 0,
 +                       qdev_get_gpio_in(gicdev, 0));
 +    sysbus_mmio_map(SYS_BUS_DEVICE(&mms->watchdog), 0, 0xe0100000);
 +
 +    object_initialize_child(OBJECT(mms), "dualtimer", &mms->dualtimer,
 +                            TYPE_CMSDK_APB_DUALTIMER);
 +    qdev_connect_clock_in(DEVICE(&mms->dualtimer), "TIMCLK", mms->clk);
 +    sysbus_realize(SYS_BUS_DEVICE(&mms->dualtimer), &error_fatal);
 +    sysbus_connect_irq(SYS_BUS_DEVICE(&mms->dualtimer), 0,
 +                       qdev_get_gpio_in(gicdev, 3));
 +    sysbus_connect_irq(SYS_BUS_DEVICE(&mms->dualtimer), 1,
 +                       qdev_get_gpio_in(gicdev, 1));
 +    sysbus_connect_irq(SYS_BUS_DEVICE(&mms->dualtimer), 2,
 +                       qdev_get_gpio_in(gicdev, 2));
 +    sysbus_mmio_map(SYS_BUS_DEVICE(&mms->dualtimer), 0, 0xe0101000);
 +
 +    for (int i = 0; i < ARRAY_SIZE(mms->i2c); i++) {
 +        static const hwaddr i2cbase[] = {0xe0102000,    /* Touch */
 +                                         0xe0103000,    /* Audio */
 +                                         0xe0107000,    /* Shield0 */
 +                                         0xe0108000,    /* Shield1 */
 +                                         0xe0109000};   /* DDR4 EEPROM */
 +        g_autofree char *s = g_strdup_printf("i2c%d", i);
 +
 +        object_initialize_child(OBJECT(mms), s, &mms->i2c[i],
 +                                TYPE_ARM_SBCON_I2C);
 +        sysbus_realize(SYS_BUS_DEVICE(&mms->i2c[i]), &error_fatal);
 +        sysbus_mmio_map(SYS_BUS_DEVICE(&mms->i2c[i]), 0, i2cbase[i]);
 +        if (i != 2 && i != 3) {
 +            /*
 +             * internal-only bus: mark it full to avoid user-created
 +             * i2c devices being plugged into it.
 +             */
 +            qbus_mark_full(qdev_get_child_bus(DEVICE(&mms->i2c[i]), "i2c"));
 +        }
 +    }
 +
-+    offset = a->imm << msize;
+     mms->bootinfo.ram_size = machine->ram_size;
-+    if (!a->a) {
+     mms->bootinfo.board_id = -1;
-+        offset = -offset;
+     mms->bootinfo.loader_start = mmc->loader_start;
 +    }
 +
 +    qd = mve_qreg_ptr(a->qd);
 +    qm = mve_qreg_ptr(a->qm);
 +    fn(cpu_env, qd, qm, tcg_constant_i32(offset));
 +    tcg_temp_free_ptr(qd);
 +    tcg_temp_free_ptr(qm);
 +    mve_update_eci(s);
 +    return true;
 +}
 +
 +static bool trans_VLDRW_sg_imm(DisasContext *s, arg_vldst_sg_imm *a)
 +{
 +    static MVEGenLdStSGFn * const fns[] = {
 +        gen_helper_mve_vldrw_sg_uw,
 +        gen_helper_mve_vldrw_sg_wb_uw,
 +    };
 +    if (a->qd == a->qm) {
 +        return false; /* UNPREDICTABLE */
 +    }
 +    return do_ldst_sg_imm(s, a, fns[a->w], MO_32);
 +}
 +
 +static bool trans_VLDRD_sg_imm(DisasContext *s, arg_vldst_sg_imm *a)
 +{
 +    static MVEGenLdStSGFn * const fns[] = {
 +        gen_helper_mve_vldrd_sg_ud,
 +        gen_helper_mve_vldrd_sg_wb_ud,
 +    };
 +    if (a->qd == a->qm) {
 +        return false; /* UNPREDICTABLE */
 +    }
 +    return do_ldst_sg_imm(s, a, fns[a->w], MO_64);
 +}
 +
 +static bool trans_VSTRW_sg_imm(DisasContext *s, arg_vldst_sg_imm *a)
 +{
 +    static MVEGenLdStSGFn * const fns[] = {
 +        gen_helper_mve_vstrw_sg_uw,
 +        gen_helper_mve_vstrw_sg_wb_uw,
 +    };
 +    return do_ldst_sg_imm(s, a, fns[a->w], MO_32);
 +}
 +
 +static bool trans_VSTRD_sg_imm(DisasContext *s, arg_vldst_sg_imm *a)
 +{
 +    static MVEGenLdStSGFn * const fns[] = {
 +        gen_helper_mve_vstrd_sg_ud,
 +        gen_helper_mve_vstrd_sg_wb_ud,
 +    };
 +    return do_ldst_sg_imm(s, a, fns[a->w], MO_64);
 +}
 +
  static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
  {
      TCGv_ptr qd;
 --
-.20.1
+.34.1
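
The writeback behaviour added to the scatter-gather helpers in the old series can be sketched in plain C. This is a simplified model, not the QEMU helper: the flat `model_mem` array and the names `model_sg_offset` and `model_vldrw_sg_imm` are hypothetical, ECI beat skipping and host-endian lane swizzling (the `H4()` macro) are omitted, and only 32-bit elements are covered. It assumes, as the diff shows, that the immediate is scaled by the element size, negated when the A bit is clear, and that writeback updates every lane of Qm, including lanes whose load is predicated off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flat memory, indexed by (byte address / 4). */
#define MODEL_MEM_WORDS 64
static uint32_t model_mem[MODEL_MEM_WORDS];

/*
 * Offset computation modelled on do_ldst_sg_imm(): scale the 7-bit
 * immediate by the element size, then negate it if the A bit is clear.
 */
static uint32_t model_sg_offset(uint32_t imm7, unsigned msize_log2, bool add)
{
    uint32_t offset = imm7 << msize_log2;
    return add ? offset : 0u - offset;
}

/*
 * Simplified VLDRW (vector plus immediate): each lane of m holds a base
 * address. Bit 0 of each 4-bit group in mask is that lane's predicate.
 */
static void model_vldrw_sg_imm(uint32_t d[4], uint32_t m[4],
                               uint32_t offset, uint16_t mask, bool wb)
{
    for (unsigned e = 0; e < 4; e++, mask >>= 4) {
        uint32_t addr = m[e] + offset;
        if (mask & 1) {
            /* The model only supports aligned, in-range accesses. */
            assert(addr % 4 == 0 && addr / 4 < MODEL_MEM_WORDS);
            d[e] = model_mem[addr / 4];
        } else {
            d[e] = 0; /* predicated loads zero the lane */
        }
        if (wb) {
            m[e] = addr; /* writeback happens regardless of predication */
        }
    }
}
```

With the writeback flag false, the same function models the non-writeback form, which is why the helper macros in the diff take WB as a compile-time parameter rather than duplicating the loop.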

-[PULL 13/44] target/arm: Implement MVE incrementing/decrementing dup insns
+[PULL 34/35] hw/arm/mps3r: Add remaining devices
-Implement the MVE incrementing/decrementing dup insns VIDUP, VDDUP,
+Add the remaining devices (or unimplemented-device stubs) for
-VIWDUP and VDWDUP.  These fill the elements of a vector with
+this board: SPI controllers, SCC, FPGAIO, I2S, RTC, the
-successively incrementing values, starting at the offset specified in
+QSPI write-config block, and ethernet.
 a general purpose register.  The final value of the offset is written
 back to this register.  The wrapping variants take a second general
 purpose register which specifies the point where the count should
 wrap back to 0.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Message-id: 20240206132931.38376-13-peter.maydell@linaro.org
 ---
- target/arm/helper-mve.h    |  12 ++++
+ hw/arm/mps3r.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++
- target/arm/mve.decode      |  25 ++++++++
+file changed, 74 insertions(+)
  target/arm/mve_helper.c    |  63 +++++++++++++++++++
  target/arm/translate-mve.c | 120 +++++++++++++++++++++++++++++++++++++
 files changed, 220 insertions(+)
-diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
+diff --git a/hw/arm/mps3r.c b/hw/arm/mps3r.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper-mve.h
+--- a/hw/arm/mps3r.c
-+++ b/target/arm/helper-mve.h
++++ b/hw/arm/mps3r.c
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vstrh_w, TCG_CALL_NO_WG, void, env, ptr, i32)
+@@ -XXX,XX +XXX,XX @@
+ #include "hw/char/cmsdk-apb-uart.h"
- DEF_HELPER_FLAGS_3(mve_vdup, TCG_CALL_NO_WG, void, env, ptr, i32)
+ #include "hw/i2c/arm_sbcon_i2c.h"
+ #include "hw/intc/arm_gicv3.h"
-+DEF_HELPER_FLAGS_4(mve_vidupb, TCG_CALL_NO_WG, i32, env, ptr, i32, i32)
++#include "hw/misc/mps2-scc.h"
-+DEF_HELPER_FLAGS_4(mve_viduph, TCG_CALL_NO_WG, i32, env, ptr, i32, i32)
++#include "hw/misc/mps2-fpgaio.h"
-+DEF_HELPER_FLAGS_4(mve_vidupw, TCG_CALL_NO_WG, i32, env, ptr, i32, i32)
+ #include "hw/misc/unimp.h"
 +#include "hw/net/lan9118.h"
 +#include "hw/rtc/pl031.h"
 +#include "hw/ssi/pl022.h"
  #include "hw/timer/cmsdk-apb-dualtimer.h"
  #include "hw/watchdog/cmsdk-apb-watchdog.h"
@@ -XXX,XX +XXX,XX @@ struct MPS3RMachineState {
      CMSDKAPBWatchdog watchdog;
      CMSDKAPBDualTimer dualtimer;
      ArmSbconI2CState i2c[5];
 +    PL022State spi[3];
 +    MPS2SCC scc;
 +    MPS2FPGAIO fpgaio;
 +    UnimplementedDeviceState i2s_audio;
 +    PL031State rtc;
      Clock *clk;
  };
@@ -XXX,XX +XXX,XX @@ static const RAMInfo an536_raminfo[] = {
      }
  };
 +static const int an536_oscclk[] = {
 +    24000000, /* 24MHz reference for RTC and timers */
 +    50000000, /* 50MHz ACLK */
 +    50000000, /* 50MHz MCLK */
 +    50000000, /* 50MHz GPUCLK */
 +    24576000, /* 24.576MHz AUDCLK */
 +    23750000, /* 23.75MHz HDLCDCLK */
 +    100000000, /* 100MHz DDR4_REF_CLK */
 +};
 +
-+DEF_HELPER_FLAGS_5(mve_viwdupb, TCG_CALL_NO_WG, i32, env, ptr, i32, i32, i32)
+ static MemoryRegion *mr_for_raminfo(MPS3RMachineState *mms,
-+DEF_HELPER_FLAGS_5(mve_viwduph, TCG_CALL_NO_WG, i32, env, ptr, i32, i32, i32)
+                                     const RAMInfo *raminfo)
-+DEF_HELPER_FLAGS_5(mve_viwdupw, TCG_CALL_NO_WG, i32, env, ptr, i32, i32, i32)
+ {
@@ -XXX,XX +XXX,XX @@ static void mps3r_common_init(MachineState *machine)
      MPS3RMachineClass *mmc = MPS3R_MACHINE_GET_CLASS(mms);
      MemoryRegion *sysmem = get_system_memory();
      DeviceState *gicdev;
 +    QList *oscclk;
      mms->clk = clock_new(OBJECT(machine), "CLK");
      clock_set_hz(mms->clk, CLK_FRQ);
@@ -XXX,XX +XXX,XX @@ static void mps3r_common_init(MachineState *machine)
          }
      }
 +    for (int i = 0; i < ARRAY_SIZE(mms->spi); i++) {
 +        g_autofree char *s = g_strdup_printf("spi%d", i);
 +        hwaddr baseaddr = 0xe0104000 + i * 0x1000;
 +
-+DEF_HELPER_FLAGS_5(mve_vdwdupb, TCG_CALL_NO_WG, i32, env, ptr, i32, i32, i32)
++        object_initialize_child(OBJECT(mms), s, &mms->spi[i], TYPE_PL022);
-+DEF_HELPER_FLAGS_5(mve_vdwduph, TCG_CALL_NO_WG, i32, env, ptr, i32, i32, i32)
++        sysbus_realize(SYS_BUS_DEVICE(&mms->spi[i]), &error_fatal);
-+DEF_HELPER_FLAGS_5(mve_vdwdupw, TCG_CALL_NO_WG, i32, env, ptr, i32, i32, i32)
++        sysbus_mmio_map(SYS_BUS_DEVICE(&mms->spi[i]), 0, baseaddr);
-+
++        sysbus_connect_irq(SYS_BUS_DEVICE(&mms->spi[i]), 0,
- DEF_HELPER_FLAGS_3(mve_vclsb, TCG_CALL_NO_WG, void, env, ptr, ptr)
++                           qdev_get_gpio_in(gicdev, 22 + i));
  DEF_HELPER_FLAGS_3(mve_vclsh, TCG_CALL_NO_WG, void, env, ptr, ptr)
  DEF_HELPER_FLAGS_3(mve_vclsw, TCG_CALL_NO_WG, void, env, ptr, ptr)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@
  &2scalar qd qn rm size
  &1imm qd imm cmode op
  &2shift qd qm shift size
 +&vidup qd rn size imm
 +&viwdup qd rn rm size imm
  @vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
  # Note that both Rn and Qd are 3 bits only (no D bit)
@@ -XXX,XX +XXX,XX @@ VDUP             1110 1110 1 1 10 ... 0 .... 1011 . 0 0 1 0000 @vdup size=0
  VDUP             1110 1110 1 0 10 ... 0 .... 1011 . 0 1 1 0000 @vdup size=1
  VDUP             1110 1110 1 0 10 ... 0 .... 1011 . 0 0 1 0000 @vdup size=2
 +# Incrementing and decrementing dup
 +
 +# VIDUP, VDDUP format immediate: 1 << (immh:imml)
 +%imm_vidup 7:1 0:1 !function=vidup_imm
 +
 +# VIDUP, VDDUP registers: Rm bits [3:1] from insn, bit 0 is 1;
 +# Rn bits [3:1] from insn, bit 0 is 0
 +%vidup_rm 1:3 !function=times_2_plus_1
 +%vidup_rn 17:3 !function=times_2
 +
 +@vidup           .... .... . . size:2 .... .... .... .... .... \
 +                 qd=%qd imm=%imm_vidup rn=%vidup_rn &vidup
 +@viwdup          .... .... . . size:2 .... .... .... .... .... \
 +                 qd=%qd imm=%imm_vidup rm=%vidup_rm rn=%vidup_rn &viwdup
 +{
 +  VIDUP          1110 1110 0 . .. ... 1 ... 0 1111 . 110 111 . @vidup
 +  VIWDUP         1110 1110 0 . .. ... 1 ... 0 1111 . 110 ... . @viwdup
 +}
 +{
 +  VDDUP          1110 1110 0 . .. ... 1 ... 1 1111 . 110 111 . @vidup
 +  VDWDUP         1110 1110 0 . .. ... 1 ... 1 1111 . 110 ... . @viwdup
 +}
 +
  # multiply-add long dual accumulate
  # rdahi: bits [3:1] from insn, bit 0 is 1
  # rdalo: bits [3:1] from insn, bit 0 is 0
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(mve_sqrshr)(CPUARMState *env, uint32_t n, uint32_t shift)
  {
      return do_sqrshl_bhs(n, -(int8_t)shift, 32, true, &env->QF);
  }
 +
 +#define DO_VIDUP(OP, ESIZE, TYPE, FN)                           \
 +    uint32_t HELPER(mve_##OP)(CPUARMState *env, void *vd,       \
 +                           uint32_t offset, uint32_t imm)       \
 +    {                                                           \
 +        TYPE *d = vd;                                           \
 +        uint16_t mask = mve_element_mask(env);                  \
 +        unsigned e;                                             \
 +        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {      \
 +            mergemask(&d[H##ESIZE(e)], offset, mask);           \
 +            offset = FN(offset, imm);                           \
 +        }                                                       \
 +        mve_advance_vpt(env);                                   \
 +        return offset;                                          \
 +    }
 +
-+#define DO_VIWDUP(OP, ESIZE, TYPE, FN)                          \
++    object_initialize_child(OBJECT(mms), "scc", &mms->scc, TYPE_MPS2_SCC);
-+    uint32_t HELPER(mve_##OP)(CPUARMState *env, void *vd,       \
++    qdev_prop_set_uint32(DEVICE(&mms->scc), "scc-cfg0", 0);
-+                              uint32_t offset, uint32_t wrap,   \
++    qdev_prop_set_uint32(DEVICE(&mms->scc), "scc-cfg4", 0x2);
-+                              uint32_t imm)                     \
++    qdev_prop_set_uint32(DEVICE(&mms->scc), "scc-aid", 0x00200008);
-+    {                                                           \
++    qdev_prop_set_uint32(DEVICE(&mms->scc), "scc-id", 0x41055360);
-+        TYPE *d = vd;                                           \
++    oscclk = qlist_new();
-+        uint16_t mask = mve_element_mask(env);                  \
++    for (int i = 0; i < ARRAY_SIZE(an536_oscclk); i++) {
-+        unsigned e;                                             \
++        qlist_append_int(oscclk, an536_oscclk[i]);
 +        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {      \
 +            mergemask(&d[H##ESIZE(e)], offset, mask);           \
 +            offset = FN(offset, wrap, imm);                     \
 +        }                                                       \
 +        mve_advance_vpt(env);                                   \
 +        return offset;                                          \
 +    }
++    qdev_prop_set_array(DEVICE(&mms->scc), "oscclk", oscclk);
++    sysbus_realize(SYS_BUS_DEVICE(&mms->scc), &error_fatal);
++    sysbus_mmio_map(SYS_BUS_DEVICE(&mms->scc), 0, 0xe0200000);
 +
-+#define DO_VIDUP_ALL(OP, FN)                    \
++    create_unimplemented_device("i2s-audio", 0xe0201000, 0x1000);
 +    DO_VIDUP(OP##b, 1, int8_t, FN)              \
 +    DO_VIDUP(OP##h, 2, int16_t, FN)             \
 +    DO_VIDUP(OP##w, 4, int32_t, FN)
 +
-+#define DO_VIWDUP_ALL(OP, FN)                   \
++    object_initialize_child(OBJECT(mms), "fpgaio", &mms->fpgaio,
-+    DO_VIWDUP(OP##b, 1, int8_t, FN)             \
++                            TYPE_MPS2_FPGAIO);
-+    DO_VIWDUP(OP##h, 2, int16_t, FN)            \
++    qdev_prop_set_uint32(DEVICE(&mms->fpgaio), "prescale-clk", an536_oscclk[1]);
-+    DO_VIWDUP(OP##w, 4, int32_t, FN)
++    qdev_prop_set_uint32(DEVICE(&mms->fpgaio), "num-leds", 10);
 +    qdev_prop_set_bit(DEVICE(&mms->fpgaio), "has-switches", true);
 +    qdev_prop_set_bit(DEVICE(&mms->fpgaio), "has-dbgctrl", false);
 +    sysbus_realize(SYS_BUS_DEVICE(&mms->fpgaio), &error_fatal);
 +    sysbus_mmio_map(SYS_BUS_DEVICE(&mms->fpgaio), 0, 0xe0202000);
 +
-+static uint32_t do_add_wrap(uint32_t offset, uint32_t wrap, uint32_t imm)
++    create_unimplemented_device("clcd", 0xe0209000, 0x1000);
 +{
 +    offset += imm;
 +    if (offset == wrap) {
 +        offset = 0;
 +    }
 +    return offset;
 +}
 +
-+static uint32_t do_sub_wrap(uint32_t offset, uint32_t wrap, uint32_t imm)
++    object_initialize_child(OBJECT(mms), "rtc", &mms->rtc, TYPE_PL031);
-+{
++    sysbus_realize(SYS_BUS_DEVICE(&mms->rtc), &error_fatal);
-+    if (offset == 0) {
++    sysbus_mmio_map(SYS_BUS_DEVICE(&mms->rtc), 0, 0xe020a000);
-+        offset = wrap;
++    sysbus_connect_irq(SYS_BUS_DEVICE(&mms->rtc), 0,
-+    }
++                       qdev_get_gpio_in(gicdev, 4));
 +    offset -= imm;
 +    return offset;
 +}
 +
 +DO_VIDUP_ALL(vidup, DO_ADD)
 +DO_VIWDUP_ALL(viwdup, do_add_wrap)
 +DO_VIWDUP_ALL(vdwdup, do_sub_wrap)
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@
  #include "translate.h"
  #include "translate-a32.h"
 +static inline int vidup_imm(DisasContext *s, int x)
 +{
 +    return 1 << x;
 +}
 +
  /* Include the generated decoder */
  #include "decode-mve.c.inc"
@@ -XXX,XX +XXX,XX @@ typedef void MVEGenTwoOpShiftFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
  typedef void MVEGenDualAccOpFn(TCGv_i64, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i64);
  typedef void MVEGenVADDVFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32);
  typedef void MVEGenOneOpImmFn(TCGv_ptr, TCGv_ptr, TCGv_i64);
 +typedef void MVEGenVIDUPFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32);
 +typedef void MVEGenVIWDUPFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32);
  /* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
  static inline long mve_qreg_offset(unsigned reg)
@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLC(DisasContext *s, arg_VSHLC *a)
      mve_update_eci(s);
      return true;
  }
 +
 +static bool do_vidup(DisasContext *s, arg_vidup *a, MVEGenVIDUPFn *fn)
 +{
 +    TCGv_ptr qd;
 +    TCGv_i32 rn;
 +
 +    /*
-+     * Vector increment/decrement and duplicate (VIDUP, VDDUP).
++     * In hardware this is a LAN9220; the LAN9118 is software compatible
-+     * This fills the vector with elements of successively increasing
++     * except that it doesn't support the checksum-offload feature.
 +     * or decreasing values, starting from Rn.
 +     */
-+    if (!dc_isar_feature(aa32_mve, s) || !mve_check_qreg_bank(s, a->qd)) {
++    lan9118_init(0xe0300000,
-+        return false;
++                 qdev_get_gpio_in(gicdev, 18));
 +    }
 +    if (a->size == MO_64) {
 +        /* size 0b11 is another encoding */
 +        return false;
 +    }
 +    if (!mve_eci_check(s) || !vfp_access_check(s)) {
 +        return true;
 +    }
 +
-+    qd = mve_qreg_ptr(a->qd);
++    create_unimplemented_device("usb", 0xe0301000, 0x1000);
-+    rn = load_reg(s, a->rn);
++    create_unimplemented_device("qspi-write-config", 0xe0600000, 0x1000);
 +    fn(rn, cpu_env, qd, rn, tcg_constant_i32(a->imm));
 +    store_reg(s, a->rn, rn);
 +    tcg_temp_free_ptr(qd);
 +    mve_update_eci(s);
 +    return true;
 +}
 +
-+static bool do_viwdup(DisasContext *s, arg_viwdup *a, MVEGenVIWDUPFn *fn)
+     mms->bootinfo.ram_size = machine->ram_size;
-+{
+     mms->bootinfo.board_id = -1;
-+    TCGv_ptr qd;
+     mms->bootinfo.loader_start = mmc->loader_start;
 +    TCGv_i32 rn, rm;
 +
 +    /*
 +     * Vector increment/decrement with wrap and duplicate (VIWDUP, VDWDUP)
 +     * This fills the vector with elements of successively increasing
 +     * or decreasing values, starting from Rn. Rm specifies a point where
 +     * the count wraps back around to 0. The updated offset is written back
 +     * to Rn.
 +     */
 +    if (!dc_isar_feature(aa32_mve, s) || !mve_check_qreg_bank(s, a->qd)) {
 +        return false;
 +    }
 +    if (!fn || a->rm == 13 || a->rm == 15) {
 +        /*
 +         * size 0b11 is another encoding; Rm == 13 is UNPREDICTABLE;
 +         * Rm == 15 is VIDUP, VDDUP.
 +         */
 +        return false;
 +    }
 +    if (!mve_eci_check(s) || !vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    qd = mve_qreg_ptr(a->qd);
 +    rn = load_reg(s, a->rn);
 +    rm = load_reg(s, a->rm);
 +    fn(rn, cpu_env, qd, rn, rm, tcg_constant_i32(a->imm));
 +    store_reg(s, a->rn, rn);
 +    tcg_temp_free_ptr(qd);
 +    tcg_temp_free_i32(rm);
 +    mve_update_eci(s);
 +    return true;
 +}
 +
 +static bool trans_VIDUP(DisasContext *s, arg_vidup *a)
 +{
 +    static MVEGenVIDUPFn * const fns[] = {
 +        gen_helper_mve_vidupb,
 +        gen_helper_mve_viduph,
 +        gen_helper_mve_vidupw,
 +        NULL,
 +    };
 +    return do_vidup(s, a, fns[a->size]);
 +}
 +
 +static bool trans_VDDUP(DisasContext *s, arg_vidup *a)
 +{
 +    static MVEGenVIDUPFn * const fns[] = {
 +        gen_helper_mve_vidupb,
 +        gen_helper_mve_viduph,
 +        gen_helper_mve_vidupw,
 +        NULL,
 +    };
 +    /* VDDUP is just like VIDUP but with a negative immediate */
 +    a->imm = -a->imm;
 +    return do_vidup(s, a, fns[a->size]);
 +}
 +
 +static bool trans_VIWDUP(DisasContext *s, arg_viwdup *a)
 +{
 +    static MVEGenVIWDUPFn * const fns[] = {
 +        gen_helper_mve_viwdupb,
 +        gen_helper_mve_viwduph,
 +        gen_helper_mve_viwdupw,
 +        NULL,
 +    };
 +    return do_viwdup(s, a, fns[a->size]);
 +}
 +
 +static bool trans_VDWDUP(DisasContext *s, arg_viwdup *a)
 +{
 +    static MVEGenVIWDUPFn * const fns[] = {
 +        gen_helper_mve_vdwdupb,
 +        gen_helper_mve_vdwduph,
 +        gen_helper_mve_vdwdupw,
 +        NULL,
 +    };
 +    return do_viwdup(s, a, fns[a->size]);
 +}
 --
-.20.1
+.34.1
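
The fill-and-write-back behaviour described in the old series' commit message for VIDUP, VIWDUP and VDWDUP can be sketched as a standalone C model. This is an illustration under stated assumptions, not the QEMU helpers: the `model_*` names are hypothetical, lanes are fixed at 32 bits, and predication (the `mergemask()` step) is left out. The wrap logic follows `do_add_wrap()` and `do_sub_wrap()` from the diff, and the return value stands for the final offset that is written back to Rn.

```c
#include <assert.h>
#include <stdint.h>

/* VIDUP: fill lanes with offset, offset + imm, ...; return the final offset. */
static uint32_t model_vidup(uint32_t *lanes, unsigned n,
                            uint32_t offset, uint32_t imm)
{
    /* The decoder produces imm = 1 << (immh:imml), so 1, 2, 4 or 8. */
    assert(imm == 1 || imm == 2 || imm == 4 || imm == 8);
    for (unsigned e = 0; e < n; e++) {
        lanes[e] = offset;
        offset += imm;
    }
    return offset;
}

/* VIWDUP: as VIDUP, but the count wraps to 0 on reaching wrap. */
static uint32_t model_viwdup(uint32_t *lanes, unsigned n,
                             uint32_t offset, uint32_t wrap, uint32_t imm)
{
    for (unsigned e = 0; e < n; e++) {
        lanes[e] = offset;
        offset += imm;
        if (offset == wrap) {
            offset = 0;
        }
    }
    return offset;
}

/* VDWDUP: decrementing variant; the count reloads from wrap at 0. */
static uint32_t model_vdwdup(uint32_t *lanes, unsigned n,
                             uint32_t offset, uint32_t wrap, uint32_t imm)
{
    for (unsigned e = 0; e < n; e++) {
        lanes[e] = offset;
        if (offset == 0) {
            offset = wrap;
        }
        offset -= imm;
    }
    return offset;
}
```

VDDUP needs no separate model here: as trans_VDDUP() in the diff notes, it reuses the VIDUP helpers with a negated immediate, which works because the offset arithmetic is modulo 2^32.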

-[PULL 14/44] target/arm: Factor out gen_vpst()
+Deleted patch
-Factor out the "generate code to update VPR.MASK01/MASK23" part of
-trans_VPST(); we are going to want to reuse it for the VPT insns.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
----
- target/arm/translate-mve.c | 31 +++++++++++++++++--------------
-file changed, 17 insertions(+), 14 deletions(-)
-diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-mve.c
-+++ b/target/arm/translate-mve.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VRMLSLDAVH(DisasContext *s, arg_vmlaldav *a)
-     return do_long_dual_acc(s, a, fns[a->x]);
- }
--static bool trans_VPST(DisasContext *s, arg_VPST *a)
-+static void gen_vpst(DisasContext *s, uint32_t mask)
- {
--    TCGv_i32 vpr;
--
--    /* mask == 0 is a "related encoding" */
--    if (!dc_isar_feature(aa32_mve, s) || !a->mask) {
--        return false;
--    }
--    if (!mve_eci_check(s) || !vfp_access_check(s)) {
--        return true;
--    }
-     /*
-      * Set the VPR mask fields. We take advantage of MASK01 and MASK23
-      * being adjacent fields in the register.
-      *
--     * This insn is not predicated, but it is subject to beat-wise
-+     * Updating the masks is not predicated, but it is subject to beat-wise
-      * execution, and the mask is updated on the odd-numbered beats.
-      * So if PSR.ECI says we should skip beat 1, we mustn't update the
-      * 01 mask field.
-      */
--    vpr = load_cpu_field(v7m.vpr);
-+    TCGv_i32 vpr = load_cpu_field(v7m.vpr);
-     switch (s->eci) {
-     case ECI_NONE:
-     case ECI_A0:
-         /* Update both 01 and 23 fields */
-         tcg_gen_deposit_i32(vpr, vpr,
--                            tcg_constant_i32(a->mask | (a->mask << 4)),
-+                            tcg_constant_i32(mask | (mask << 4)),
-                             R_V7M_VPR_MASK01_SHIFT,
-                             R_V7M_VPR_MASK01_LENGTH + R_V7M_VPR_MASK23_LENGTH);
-         break;
-@@ -XXX,XX +XXX,XX @@ static bool trans_VPST(DisasContext *s, arg_VPST *a)
-     case ECI_A0A1A2B0:
-         /* Update only the 23 mask field */
-         tcg_gen_deposit_i32(vpr, vpr,
--                            tcg_constant_i32(a->mask),
-+                            tcg_constant_i32(mask),
-                             R_V7M_VPR_MASK23_SHIFT, R_V7M_VPR_MASK23_LENGTH);
-         break;
-     default:
-         g_assert_not_reached();
-     }
-     store_cpu_field(vpr, v7m.vpr);
-+}
-+
-+static bool trans_VPST(DisasContext *s, arg_VPST *a)
-+{
-+    /* mask == 0 is a "related encoding" */
-+    if (!dc_isar_feature(aa32_mve, s) || !a->mask) {
-+        return false;
-+    }
-+    if (!mve_eci_check(s) || !vfp_access_check(s)) {
-+        return true;
-+    }
-+    gen_vpst(s, a->mask);
-     mve_update_and_store_eci(s);
-     return true;
- }
---
-.20.1

-[PULL 15/44] target/arm: Implement MVE integer vector comparisons
+Deleted patch
-Implement the MVE integer vector comparison instructions.  These are
-"VCMP (vector)" encodings T1, T2 and T3, and "VPT (vector)" encodings
-T1, T2 and T3.
-These insns compare corresponding elements in each vector, and update
-the VPR.P0 predicate bits with the results of the comparison.  VPT
-also sets the VPR.MASK01 and VPR.MASK23 fields -- it is effectively
-"VCMP then VPST".
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
----
- target/arm/helper-mve.h    | 32 ++++++++++++++++++++++
- target/arm/mve.decode      | 18 +++++++++++-
- target/arm/mve_helper.c    | 56 ++++++++++++++++++++++++++++++++++++++
- target/arm/translate-mve.c | 47 ++++++++++++++++++++++++++++++++
-files changed, 152 insertions(+), 1 deletion(-)
-diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper-mve.h
-+++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_uqshl, TCG_CALL_NO_RWG, i32, env, i32, i32)
- DEF_HELPER_FLAGS_3(mve_sqshl, TCG_CALL_NO_RWG, i32, env, i32, i32)
- DEF_HELPER_FLAGS_3(mve_uqrshl, TCG_CALL_NO_RWG, i32, env, i32, i32)
- DEF_HELPER_FLAGS_3(mve_sqrshr, TCG_CALL_NO_RWG, i32, env, i32, i32)
-+
-+DEF_HELPER_FLAGS_3(mve_vcmpeqb, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+DEF_HELPER_FLAGS_3(mve_vcmpeqh, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+DEF_HELPER_FLAGS_3(mve_vcmpeqw, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+
-+DEF_HELPER_FLAGS_3(mve_vcmpneb, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+DEF_HELPER_FLAGS_3(mve_vcmpneh, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+DEF_HELPER_FLAGS_3(mve_vcmpnew, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+
-+DEF_HELPER_FLAGS_3(mve_vcmpcsb, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+DEF_HELPER_FLAGS_3(mve_vcmpcsh, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+DEF_HELPER_FLAGS_3(mve_vcmpcsw, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+
-+DEF_HELPER_FLAGS_3(mve_vcmphib, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+DEF_HELPER_FLAGS_3(mve_vcmphih, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+DEF_HELPER_FLAGS_3(mve_vcmphiw, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+
-+DEF_HELPER_FLAGS_3(mve_vcmpgeb, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+DEF_HELPER_FLAGS_3(mve_vcmpgeh, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+DEF_HELPER_FLAGS_3(mve_vcmpgew, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+
-+DEF_HELPER_FLAGS_3(mve_vcmpltb, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+DEF_HELPER_FLAGS_3(mve_vcmplth, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+DEF_HELPER_FLAGS_3(mve_vcmpltw, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+
-+DEF_HELPER_FLAGS_3(mve_vcmpgtb, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+DEF_HELPER_FLAGS_3(mve_vcmpgth, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+DEF_HELPER_FLAGS_3(mve_vcmpgtw, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+
-+DEF_HELPER_FLAGS_3(mve_vcmpleb, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+DEF_HELPER_FLAGS_3(mve_vcmpleh, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+DEF_HELPER_FLAGS_3(mve_vcmplew, TCG_CALL_NO_WG, void, env, ptr, ptr)
-diff --git a/target/arm/mve.decode b/target/arm/mve.decode
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/mve.decode
-+++ b/target/arm/mve.decode
-@@ -XXX,XX +XXX,XX @@
- &2shift qd qm shift size
- &vidup qd rn size imm
- &viwdup qd rn rm size imm
-+&vcmp qm qn size mask
- @vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
- # Note that both Rn and Qd are 3 bits only (no D bit)
-@@ -XXX,XX +XXX,XX @@
- @2_shr_w .... .... .. 1 ..... .... .... .... .... &2shift qd=%qd qm=%qm \
-          size=2 shift=%rshift_i5
-+# Vector comparison; 4-bit Qm but 3-bit Qn
-+%mask_22_13      22:1 13:3
-+@vcmp    .... .... .. size:2 qn:3 . .... .... .... .... &vcmp qm=%qm mask=%mask_22_13
-+
- # Vector loads and stores
- # Widening loads and narrowing stores:
-@@ -XXX,XX +XXX,XX @@ VQRDMULH_scalar  1111 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
- }
- # Predicate operations
--%mask_22_13      22:1 13:3
- VPST             1111 1110 0 . 11 000 1 ... 0 1111 0100 1101 mask=%mask_22_13
- # Logical immediate operations (1 reg and modified-immediate)
-@@ -XXX,XX +XXX,XX @@ VQRSHRUNT         111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 0 @2_shr_b
- VQRSHRUNT         111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 0 @2_shr_h
- VSHLC             111 0 1110 1 . 1 imm:5 ... 0 1111 1100 rdm:4 qd=%qd
-+
-+# Comparisons. We expand out the conditions which are split across
-+# encodings T1, T2, T3 and the fc bits. These include VPT, which is
-+# effectively "VCMP then VPST". A plain "VCMP" has a mask field of zero.
-+VCMPEQ            1111 1110 0 . .. ... 1 ... 0 1111 0 0 . 0 ... 0 @vcmp
-+VCMPNE            1111 1110 0 . .. ... 1 ... 0 1111 1 0 . 0 ... 0 @vcmp
-+VCMPCS            1111 1110 0 . .. ... 1 ... 0 1111 0 0 . 0 ... 1 @vcmp
-+VCMPHI            1111 1110 0 . .. ... 1 ... 0 1111 1 0 . 0 ... 1 @vcmp
-+VCMPGE            1111 1110 0 . .. ... 1 ... 1 1111 0 0 . 0 ... 0 @vcmp
-+VCMPLT            1111 1110 0 . .. ... 1 ... 1 1111 1 0 . 0 ... 0 @vcmp
-+VCMPGT            1111 1110 0 . .. ... 1 ... 1 1111 0 0 . 0 ... 1 @vcmp
-+VCMPLE            1111 1110 0 . .. ... 1 ... 1 1111 1 0 . 0 ... 1 @vcmp
-diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/mve_helper.c
-+++ b/target/arm/mve_helper.c
-@@ -XXX,XX +XXX,XX @@ static uint32_t do_sub_wrap(uint32_t offset, uint32_t wrap, uint32_t imm)
- DO_VIDUP_ALL(vidup, DO_ADD)
- DO_VIWDUP_ALL(viwdup, do_add_wrap)
- DO_VIWDUP_ALL(vdwdup, do_sub_wrap)
-+
-+/*
-+ * Vector comparison.
-+ * P0 bits for non-executed beats (where eci_mask is 0) are unchanged.
-+ * P0 bits for predicated lanes in executed beats (where mask is 0) are 0.
-+ * P0 bits otherwise are updated with the results of the comparisons.
-+ * We must also keep unchanged the MASK fields at the top of v7m.vpr.
-+ */
-+#define DO_VCMP(OP, ESIZE, TYPE, FN)                                    \
-+    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vn, void *vm)   \
-+    {                                                                   \
-+        TYPE *n = vn, *m = vm;                                          \
-+        uint16_t mask = mve_element_mask(env);                          \
-+        uint16_t eci_mask = mve_eci_mask(env);                          \
-+        uint16_t beatpred = 0;                                          \
-+        uint16_t emask = MAKE_64BIT_MASK(0, ESIZE);                     \
-+        unsigned e;                                                     \
-+        for (e = 0; e < 16 / ESIZE; e++) {                              \
-+            bool r = FN(n[H##ESIZE(e)], m[H##ESIZE(e)]);                \
-+            /* Comparison sets 0/1 bits for each byte in the element */ \
-+            beatpred |= r * emask;                                      \
-+            emask <<= ESIZE;                                            \
-+        }                                                               \
-+        beatpred &= mask;                                               \
-+        env->v7m.vpr = (env->v7m.vpr & ~(uint32_t)eci_mask) |           \
-+            (beatpred & eci_mask);                                      \
-+        mve_advance_vpt(env);                                           \
-+    }
-+
-+#define DO_VCMP_S(OP, FN)                       \
-+    DO_VCMP(OP##b, 1, int8_t, FN)               \
-+    DO_VCMP(OP##h, 2, int16_t, FN)              \
-+    DO_VCMP(OP##w, 4, int32_t, FN)
-+
-+#define DO_VCMP_U(OP, FN)                       \
-+    DO_VCMP(OP##b, 1, uint8_t, FN)              \
-+    DO_VCMP(OP##h, 2, uint16_t, FN)             \
-+    DO_VCMP(OP##w, 4, uint32_t, FN)
-+
-+#define DO_EQ(N, M) ((N) == (M))
-+#define DO_NE(N, M) ((N) != (M))
-+#define DO_EQ(N, M) ((N) == (M))
-+#define DO_EQ(N, M) ((N) == (M))
-+#define DO_GE(N, M) ((N) >= (M))
-+#define DO_LT(N, M) ((N) < (M))
-+#define DO_GT(N, M) ((N) > (M))
-+#define DO_LE(N, M) ((N) <= (M))
-+
-+DO_VCMP_U(vcmpeq, DO_EQ)
-+DO_VCMP_U(vcmpne, DO_NE)
-+DO_VCMP_U(vcmpcs, DO_GE)
-+DO_VCMP_U(vcmphi, DO_GT)
-+DO_VCMP_S(vcmpge, DO_GE)
-+DO_VCMP_S(vcmplt, DO_LT)
-+DO_VCMP_S(vcmpgt, DO_GT)
-+DO_VCMP_S(vcmple, DO_LE)
-diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-mve.c
-+++ b/target/arm/translate-mve.c
-@@ -XXX,XX +XXX,XX @@ typedef void MVEGenVADDVFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32);
- typedef void MVEGenOneOpImmFn(TCGv_ptr, TCGv_ptr, TCGv_i64);
- typedef void MVEGenVIDUPFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32);
- typedef void MVEGenVIWDUPFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32);
-+typedef void MVEGenCmpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
- /* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
- static inline long mve_qreg_offset(unsigned reg)
-@@ -XXX,XX +XXX,XX @@ static bool trans_VDWDUP(DisasContext *s, arg_viwdup *a)
-     };
-     return do_viwdup(s, a, fns[a->size]);
- }
-+
-+static bool do_vcmp(DisasContext *s, arg_vcmp *a, MVEGenCmpFn *fn)
-+{
-+    TCGv_ptr qn, qm;
-+
-+    if (!dc_isar_feature(aa32_mve, s) || !mve_check_qreg_bank(s, a->qm) ||
-+        !fn) {
-+        return false;
-+    }
-+    if (!mve_eci_check(s) || !vfp_access_check(s)) {
-+        return true;
-+    }
-+
-+    qn = mve_qreg_ptr(a->qn);
-+    qm = mve_qreg_ptr(a->qm);
-+    fn(cpu_env, qn, qm);
-+    tcg_temp_free_ptr(qn);
-+    tcg_temp_free_ptr(qm);
-+    if (a->mask) {
-+        /* VPT */
-+        gen_vpst(s, a->mask);
-+    }
-+    mve_update_eci(s);
-+    return true;
-+}
-+
-+#define DO_VCMP(INSN, FN)                                       \
-+    static bool trans_##INSN(DisasContext *s, arg_vcmp *a)      \
-+    {                                                           \
-+        static MVEGenCmpFn * const fns[] = {                    \
-+            gen_helper_mve_##FN##b,                             \
-+            gen_helper_mve_##FN##h,                             \
-+            gen_helper_mve_##FN##w,                             \
-+            NULL,                                               \
-+        };                                                      \
-+        return do_vcmp(s, a, fns[a->size]);                     \
-+    }
-+
-+DO_VCMP(VCMPEQ, vcmpeq)
-+DO_VCMP(VCMPNE, vcmpne)
-+DO_VCMP(VCMPCS, vcmpcs)
-+DO_VCMP(VCMPHI, vcmphi)
-+DO_VCMP(VCMPGE, vcmpge)
-+DO_VCMP(VCMPLT, vcmplt)
-+DO_VCMP(VCMPGT, vcmpgt)
-+DO_VCMP(VCMPLE, vcmple)
---
-.20.1

-[PULL 17/44] target/arm: Implement MVE VPSEL
+[PULL 35/35] docs: Add documentation for the mps3-an536 board
-Implement the MVE VPSEL insn, which sets each byte of the destination
+Add documentation for the mps3-an536 board type.
 vector Qd to the byte from either Qn or Qm depending on the value of
 the corresponding bit in VPR.P0.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Message-id: 20240206132931.38376-14-peter.maydell@linaro.org
 ---
- target/arm/helper-mve.h    |  2 ++
+ docs/system/arm/mps2.rst | 37 ++++++++++++++++++++++++++++++++++---
- target/arm/mve.decode      |  7 +++++--
+file changed, 34 insertions(+), 3 deletions(-)
  target/arm/mve_helper.c    | 19 +++++++++++++++++++
  target/arm/translate-mve.c |  2 ++
 files changed, 28 insertions(+), 2 deletions(-)
-diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
+diff --git a/docs/system/arm/mps2.rst b/docs/system/arm/mps2.rst
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper-mve.h
+--- a/docs/system/arm/mps2.rst
-+++ b/target/arm/helper-mve.h
++++ b/docs/system/arm/mps2.rst
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vorr, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+@@ -XXX,XX +XXX,XX @@
- DEF_HELPER_FLAGS_4(mve_vorn, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+-Arm MPS2 and MPS3 boards (``mps2-an385``, ``mps2-an386``, ``mps2-an500``, ``mps2-an505``, ``mps2-an511``, ``mps2-an521``, ``mps3-an524``, ``mps3-an547``)
- DEF_HELPER_FLAGS_4(mve_veor, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+-=========================================================================================================================================================
++Arm MPS2 and MPS3 boards (``mps2-an385``, ``mps2-an386``, ``mps2-an500``, ``mps2-an505``, ``mps2-an511``, ``mps2-an521``, ``mps3-an524``, ``mps3-an536``, ``mps3-an547``)
-+DEF_HELPER_FLAGS_4(mve_vpsel, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
++=========================================================================================================================================================================
 -These board models all use Arm M-profile CPUs.
 +These board models use Arm M-profile or R-profile CPUs.
  The Arm MPS2, MPS2+ and MPS3 dev boards are FPGA based (the 2+ has a
  bigger FPGA but is otherwise the same as the 2; the 3 has a bigger
@@ -XXX,XX +XXX,XX @@ FPGA image.
  QEMU models the following FPGA images:
 +FPGA images using M-profile CPUs:
 +
- DEF_HELPER_FLAGS_4(mve_vaddb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+ ``mps2-an385``
- DEF_HELPER_FLAGS_4(mve_vaddh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+   Cortex-M3 as documented in Arm Application Note AN385
- DEF_HELPER_FLAGS_4(mve_vaddw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+ ``mps2-an386``
-diff --git a/target/arm/mve.decode b/target/arm/mve.decode
+@@ -XXX,XX +XXX,XX @@ QEMU models the following FPGA images:
-index XXXXXXX..XXXXXXX 100644
+ ``mps3-an547``
---- a/target/arm/mve.decode
+   Cortex-M55 on an MPS3, as documented in Arm Application Note AN547
-+++ b/target/arm/mve.decode
-@@ -XXX,XX +XXX,XX @@ VSHLC             111 0 1110 1 . 1 imm:5 ... 0 1111 1100 rdm:4 qd=%qd
++FPGA images using R-profile CPUs:
  # effectively "VCMP then VPST". A plain "VCMP" has a mask field of zero.
  VCMPEQ            1111 1110 0 . .. ... 1 ... 0 1111 0 0 . 0 ... 0 @vcmp
  VCMPNE            1111 1110 0 . .. ... 1 ... 0 1111 1 0 . 0 ... 0 @vcmp
 -VCMPCS            1111 1110 0 . .. ... 1 ... 0 1111 0 0 . 0 ... 1 @vcmp
 -VCMPHI            1111 1110 0 . .. ... 1 ... 0 1111 1 0 . 0 ... 1 @vcmp
 +{
 +  VPSEL           1111 1110 0 . 11 ... 1 ... 0 1111 . 0 . 0 ... 1 @2op_nosz
 +  VCMPCS          1111 1110 0 . .. ... 1 ... 0 1111 0 0 . 0 ... 1 @vcmp
 +  VCMPHI          1111 1110 0 . .. ... 1 ... 0 1111 1 0 . 0 ... 1 @vcmp
 +}
  VCMPGE            1111 1110 0 . .. ... 1 ... 1 1111 0 0 . 0 ... 0 @vcmp
  VCMPLT            1111 1110 0 . .. ... 1 ... 1 1111 1 0 . 0 ... 0 @vcmp
  VCMPGT            1111 1110 0 . .. ... 1 ... 1 1111 0 0 . 0 ... 1 @vcmp
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_VCMP_S(vcmpge, DO_GE)
  DO_VCMP_S(vcmplt, DO_LT)
  DO_VCMP_S(vcmpgt, DO_GT)
  DO_VCMP_S(vcmple, DO_LE)
 +
-+void HELPER(mve_vpsel)(CPUARMState *env, void *vd, void *vn, void *vm)
++``mps3-an536``
-+{
++  Dual Cortex-R52 on an MPS3, as documented in Arm Application Note AN536
 +    /*
 +     * Qd[n] = VPR.P0[n] ? Qn[n] : Qm[n]
 +     * but note that whether bytes are written to Qd is still subject
 +     * to (all forms of) predication in the usual way.
 +     */
 +    uint64_t *d = vd, *n = vn, *m = vm;
 +    uint16_t mask = mve_element_mask(env);
 +    uint16_t p0 = FIELD_EX32(env->v7m.vpr, V7M_VPR, P0);
 +    unsigned e;
 +    for (e = 0; e < 16 / 8; e++, mask >>= 8, p0 >>= 8) {
 +        uint64_t r = m[H8(e)];
 +        mergemask(&r, n[H8(e)], p0);
 +        mergemask(&d[H8(e)], r, mask);
 +    }
 +    mve_advance_vpt(env);
 +}
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_LOGIC(VORR, gen_helper_mve_vorr)
  DO_LOGIC(VORN, gen_helper_mve_vorn)
  DO_LOGIC(VEOR, gen_helper_mve_veor)
 +DO_LOGIC(VPSEL, gen_helper_mve_vpsel)
 +
- #define DO_2OP(INSN, FN) \
+ Differences between QEMU and real hardware:
-     static bool trans_##INSN(DisasContext *s, arg_2op *a)       \
-     {                                                           \
+ - AN385/AN386 remapping of low 16K of memory to either ZBT SSRAM1 or to
@@ -XXX,XX +XXX,XX @@ Differences between QEMU and real hardware:
    flash, but only as simple ROM, so attempting to rewrite the flash
    from the guest will fail
  - QEMU does not model the USB controller in MPS3 boards
 +- AN536 does not support runtime control of CPU reset and halt via
 +  the SCC CFG_REG0 register.
 +- AN536 does not support enabling or disabling the flash and ATCM
 +  interfaces via the SCC CFG_REG1 register.
 +- AN536 does not support setting of the initial vector table
 +  base address via the SCC CFG_REG6 and CFG_REG7 register config,
 +  and does not provide a mechanism for specifying these values at
 +  startup, so all guest images must be built to start from TCM
 +  (i.e. to expect the interrupt vector base at 0 from reset).
 +- AN536 defaults to only creating a single CPU; this is the equivalent
 +  of the way the real FPGA image usually runs with the second Cortex-R52
 +  held in halt via the initial SCC CFG_REG0 register setting. You can
 +  create the second CPU with ``-smp 2``; both CPUs will then start
 +  execution immediately on startup.
 +
 +Note that for the AN536 the first UART is accessible only by
 +CPU0, and the second UART is accessible only by CPU1. The
 +first UART accessible shared between both CPUs is the third
 +UART. Guest software might therefore be built to use either
 +the first UART or the third UART; if you don't see any output
 +from the UART you are looking at, try one of the others.
 +(Even if the AN536 machine is started with a single CPU and so
 +no "CPU1-only UART", the UART numbering remains the same,
 +with the third UART being the first of the shared ones.)
  Machine-specific options
  """"""""""""""""""""""""
 --
-.20.1
+.34.1

First set of arm patches for 6.2. I have a lot more in my
to-review queue still...

-- PMM

The following changes since commit d42685765653ec155fdf60910662f8830bdb2cef:

Open 6.2 development tree (2021-08-25 10:25:12 +0100)

are available in the Git repository at:

https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20210825

for you to fetch changes up to 24b1a6aa43615be22c7ee66bd68ec5675f6a6a9a:

docs: Document how to use gdb with unix sockets (2021-08-25 10:48:51 +0100)

----------------------------------------------------------------
target-arm queue:
 * More MVE emulation work
 * Implement M-profile trapping on division by zero
 * kvm: use RCU_READ_LOCK_GUARD() in kvm_arch_fixup_msi_route()
 * hw/char/pl011: add support for sending break
 * fsl-imx6ul: Instantiate SAI1/2/3 and ASRC as unimplemented devices
 * hw/dma/pl330: Add memory region to replace default
 * sbsa-ref: Rename SBSA_GWDT enum value
 * fsl-imx7: Instantiate SAI1/2/3 as unimplemented devices
 * docs: Document how to use gdb with unix sockets

----------------------------------------------------------------
Eduardo Habkost (1):
      sbsa-ref: Rename SBSA_GWDT enum value

Guenter Roeck (2):
      fsl-imx6ul: Instantiate SAI1/2/3 and ASRC as unimplemented devices
      fsl-imx7: Instantiate SAI1/2/3 as unimplemented devices

Hamza Mahfooz (1):
      target/arm: kvm: use RCU_READ_LOCK_GUARD() in kvm_arch_fixup_msi_route()

Jan Luebbe (1):
      hw/char/pl011: add support for sending break

Peter Maydell (37):
      target/arm: Note that we handle VMOVL as a special case of VSHLL
      target/arm: Print MVE VPR in CPU dumps
      target/arm: Fix MVE VSLI by 0 and VSRI by <dt>
      target/arm: Fix signed VADDV
      target/arm: Fix mask handling for MVE narrowing operations
      target/arm: Fix 48-bit saturating shifts
      target/arm: Fix MVE 48-bit SQRSHRL for small right shifts
      target/arm: Fix calculation of LTP mask when LR is 0
      target/arm: Factor out mve_eci_mask()
      target/arm: Fix VPT advance when ECI is non-zero
      target/arm: Fix VLDRB/H/W for predicated elements
      target/arm: Implement MVE VMULL (polynomial)
      target/arm: Implement MVE incrementing/decrementing dup insns
      target/arm: Factor out gen_vpst()
      target/arm: Implement MVE integer vector comparisons
      target/arm: Implement MVE integer vector-vs-scalar comparisons
      target/arm: Implement MVE VPSEL
      target/arm: Implement MVE VMLAS
      target/arm: Implement MVE shift-by-scalar
      target/arm: Move 'x' and 'a' bit definitions into vmlaldav formats
      target/arm: Implement MVE integer min/max across vector
      target/arm: Implement MVE VABAV
      target/arm: Implement MVE narrowing moves
      target/arm: Rename MVEGenDualAccOpFn to MVEGenLongDualAccOpFn
      target/arm: Implement MVE VMLADAV and VMLSLDAV
      target/arm: Implement MVE VMLA
      target/arm: Implement MVE saturating doubling multiply accumulates
      target/arm: Implement MVE VQABS, VQNEG
      target/arm: Implement MVE VMAXA, VMINA
      target/arm: Implement MVE VMOV to/from 2 general-purpose registers
      target/arm: Implement MVE VPNOT
      target/arm: Implement MVE VCTP
      target/arm: Implement MVE scatter-gather insns
      target/arm: Implement MVE scatter-gather immediate forms
      target/arm: Implement MVE interleaving loads/stores
      target/arm: Re-indent sdiv and udiv helpers
      target/arm: Implement M-profile trapping on division by zero

Sebastian Meyer (1):
      docs: Document how to use gdb with unix sockets

Wen, Jianxian (1):
      hw/dma/pl330: Add memory region to replace default

Although the architecture doesn't define it as an alias, VMOVL
(vector move long) is encoded as a VSHLL with a zero shift.
Add a comment in the decode file noting that we handle VMOVL
as part of VSHLL.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/mve.decode | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VRSHRI_U          111 1 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_h
 VRSHRI_U          111 1 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_w
 
 # VSHLL T1 encoding; the T2 VSHLL encoding is elsewhere in this file
+# Note that VMOVL is encoded as "VSHLL with a zero shift count"; we
+# implement it that way rather than special-casing it in the decode.
 VSHLL_BS          111 0 1110 1 . 1 .. ... ... 0 1111 0 1 . 0 ... 0 @2_shll_b
 VSHLL_BS          111 0 1110 1 . 1 .. ... ... 0 1111 0 1 . 0 ... 0 @2_shll_h
 
-- 
2.20.1

In the MVE shift-and-insert insns, we special case VSLI by 0
and VSRI by <dt>. VSRI by <dt> means "don't update the destination",
which is what we've implemented. However VSLI by 0 is "set
destination to the input", so we don't want to use the same
special-casing that we do for VSRI by <dt>.

Since the generic logic gives the right answer for a shift
by 0, just use that.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/mve_helper.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2SHIFT_S(vrshli_s, DO_VRSHLS)
         uint16_t mask;                                                  \
         uint64_t shiftmask;                                             \
         unsigned e;                                                     \
-        if (shift == 0 || shift == ESIZE * 8) {                         \
+        if (shift == ESIZE * 8) {                                       \
             /*                                                          \
-             * Only VSLI can shift by 0; only VSRI can shift by <dt>.   \
-             * The generic logic would give the right answer for 0 but  \
-             * fails for <dt>.                                          \
+             * Only VSRI can shift by <dt>; it should mean "don't       \
+             * update the destination". The generic logic can't handle  \
+             * this because it would try to shift by an out-of-range    \
+             * amount, so special case it here.                         \
              */                                                         \
             goto done;                                                  \
         }                                                               \
-- 
2.20.1
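
For readers following the VSLI/VSRI reasoning in the patch above, here is a
minimal single-element sketch of the two boundary cases. It is not the QEMU
helper: `sli8()`, `sri8()` and `sli_sri_selfcheck()` are hypothetical names,
and the model handles one 8-bit element with no predication.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical single-element models of shift-and-insert (not QEMU code). */

/* VSLI on one byte: shift is 0..7; the low 'shift' bits of d are kept. */
static uint8_t sli8(uint8_t d, uint8_t m, unsigned shift)
{
    uint8_t keep = (uint8_t)((1u << shift) - 1);
    return (uint8_t)((d & keep) | (uint8_t)(m << shift));
}

/* VSRI on one byte: shift is 1..8; the high 'shift' bits of d are kept. */
static uint8_t sri8(uint8_t d, uint8_t m, unsigned shift)
{
    if (shift == 8) {
        /* Nothing is inserted, so the destination is left alone. */
        return d;
    }
    uint8_t ins = (uint8_t)(0xffu >> shift);
    return (uint8_t)((d & (uint8_t)~ins) | (m >> shift));
}

static void sli_sri_selfcheck(void)
{
    assert(sli8(0xaa, 0x55, 0) == 0x55);  /* shift by 0 copies the input */
    assert(sri8(0xaa, 0x55, 8) == 0xaa);  /* shift by <dt> is a no-op */
}
```

The generic VSLI path already yields "destination = input" for a shift of 0,
which is why only the VSRI-by-element-size case needs an early exit.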

In the MVE helpers for the narrowing operations (DO_VSHRN and
DO_VSHRN_SAT) we were using the wrong bits of the predicate mask for
the 'top' versions of the insn.  This is because the loop works over
the double-sized input elements and shifts the predicate mask by that
many bits each time, but when we write out the half-sized output we
must look at the mask bits for whichever half of the element we are
writing to.

Correct this by shifting the whole mask right by ESIZE bits for the
'top' insns.  This allows us also to simplify the saturation bit
checking (where we had noticed that we needed to look at a different
mask bit for the 'top' insn.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/mve_helper.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_VSHLL_ALL(vshllt, true)
         TYPE *d = vd;                                           \
         uint16_t mask = mve_element_mask(env);                  \
         unsigned le;                                            \
+        mask >>= ESIZE * TOP;                                   \
         for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) { \
             TYPE r = FN(m[H##LESIZE(le)], shift);               \
             mergemask(&d[H##ESIZE(le * 2 + TOP)], r, mask);     \
@@ -XXX,XX +XXX,XX @@ static inline int32_t do_sat_bhs(int64_t val, int64_t min, int64_t max,
         uint16_t mask = mve_element_mask(env);                  \
         bool qc = false;                                        \
         unsigned le;                                            \
+        mask >>= ESIZE * TOP;                                   \
         for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) { \
             bool sat = false;                                   \
             TYPE r = FN(m[H##LESIZE(le)], shift, &sat);         \
             mergemask(&d[H##ESIZE(le * 2 + TOP)], r, mask);     \
-            qc |= sat && (mask & 1 << (TOP * ESIZE));           \
+            qc |= sat & mask & 1;                               \
         }                                                       \
         if (qc) {                                               \
             env->vfp.qc[0] = qc;                                \
-- 
2.20.1
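
The mask fix in the patch above can be illustrated with a small standalone
model. This is only a sketch under stated assumptions: `narrow16to8()` and
`narrow_selfcheck()` are hypothetical names, the element sizes are fixed at
16-bit input and 8-bit output (so ESIZE is 1 and LESIZE is 2), and the
predicate mask has one bit per destination byte.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of a narrowing write (not QEMU code).
 * Each 16-bit input lane is truncated to 8 bits and written to the
 * bottom (top == 0) or top (top == 1) byte of the matching 16-bit
 * destination lane, subject to the per-byte predicate mask.
 */
static void narrow16to8(uint8_t d[16], const uint16_t m[8],
                        uint16_t mask, int top)
{
    unsigned le;

    /* Line the mask up with the byte we actually write (ESIZE * TOP). */
    mask >>= top;
    for (le = 0; le < 8; le++, mask >>= 2) {
        if (mask & 1) {
            d[le * 2 + top] = (uint8_t)m[le];
        }
    }
}

static void narrow_selfcheck(void)
{
    uint8_t d[16] = { 0 };
    uint16_t m[8] = { 0x1234, 0x1234, 0x1234, 0x1234,
                      0x1234, 0x1234, 0x1234, 0x1234 };

    /* Only destination byte 1 is enabled; the 'top' form must write it. */
    narrow16to8(d, m, 0x0002, 1);
    assert(d[1] == 0x34);
    assert(d[0] == 0 && d[3] == 0);
}
```

Without the initial `mask >>= top`, the loop would test the predicate bit
for byte 0 of each lane while writing byte 1, which is the bug described.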

In do_sqrshl48_d() and do_uqrshl48_d() we got some of the edge
cases wrong and failed to saturate correctly:

(1) In do_sqrshl48_d() we used the same code that do_sqrshl_bhs()
does to obtain the saturated most-negative and most-positive 48-bit
signed values for the large-shift-left case.  This gives (1 << 47)
for saturate-to-most-negative, but we weren't sign-extending this
value to the 64-bit output as the pseudocode requires.

(2) For left shifts by less than 48, we copied the "8/16 bit" code
from do_sqrshl_bhs() and do_uqrshl_bhs().  This doesn't do the right
thing because it assumes the C type we're working with is at least
twice the number of bits we're saturating to (so that a shift left by
bits-1 can't shift anything off the top of the value).  This isn't
true for bits == 48, so we would incorrectly return 0 rather than the
most-positive value for situations like "shift (1 << 44) left by
20".  Instead check for saturation by doing the shift and sign-extend
and then testing whether shifting back right again gives the original
value.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/mve_helper.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ static inline int64_t do_sqrshl48_d(int64_t src, int64_t shift,
         }
         return src >> -shift;
     } else if (shift < 48) {
-        int64_t val = src << shift;
-        int64_t extval = sextract64(val, 0, 48);
-        if (!sat || val == extval) {
+        int64_t extval = sextract64(src << shift, 0, 48);
+        if (!sat || src == (extval >> shift)) {
             return extval;
         }
     } else if (!sat || src == 0) {
@@ -XXX,XX +XXX,XX @@ static inline int64_t do_sqrshl48_d(int64_t src, int64_t shift,
     }
 
     *sat = 1;
-    return (1ULL << 47) - (src >= 0);
+    return src >= 0 ? MAKE_64BIT_MASK(0, 47) : MAKE_64BIT_MASK(47, 17);
 }
 
 /* Operate on 64-bit values, but saturate at 48 bits */
@@ -XXX,XX +XXX,XX @@ static inline uint64_t do_uqrshl48_d(uint64_t src, int64_t shift,
             return extval;
         }
     } else if (shift < 48) {
-        uint64_t val = src << shift;
-        uint64_t extval = extract64(val, 0, 48);
-        if (!sat || val == extval) {
+        uint64_t extval = extract64(src << shift, 0, 48);
+        if (!sat || src == (extval >> shift)) {
             return extval;
         }
     } else if (!sat || src == 0) {
-- 
2.20.1
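
As an illustration of the shift-back check described in the patch above, here is a minimal standalone sketch. It is not the QEMU helper: sext48() is a hypothetical stand-in for sextract64(v, 0, 48), sat_shl48() and the SAT48_* names are invented for this example, the saturation flag is omitted, and an arithmetic right shift of negative values is assumed.

```c
#include <assert.h>
#include <stdint.h>

#define SAT48_MAX INT64_C(0x00007fffffffffff)
#define SAT48_MIN (-INT64_C(0x0000800000000000))

/* Hypothetical stand-in for sextract64(v, 0, 48): sign-extend bits [47..0]. */
static int64_t sext48(int64_t v)
{
    uint64_t u = (uint64_t)v & ((UINT64_C(1) << 48) - 1);

    if (u & (UINT64_C(1) << 47)) {
        u |= ~((UINT64_C(1) << 48) - 1);
    }
    return (int64_t)u;
}

/* Left shift by 0..47, saturating to the signed 48-bit range. */
static int64_t sat_shl48(int64_t src, unsigned shift)
{
    assert(shift < 48);
    /* Shift as unsigned so bits falling off the top are well defined. */
    int64_t extval = sext48((int64_t)((uint64_t)src << shift));

    /* No saturation only if shifting back recovers the original value. */
    if (src == (extval >> shift)) {
        return extval;
    }
    return src >= 0 ? SAT48_MAX : SAT48_MIN;
}
```

With this check, shifting (1 << 44) left by 20 loses the set bit entirely, the shift back gives 0 rather than the input, and the most-positive value is returned.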

We got an edge case wrong in the 48-bit SQRSHRL implementation: a
shift to the right always makes the result smaller than the input
value, but the result might still be outside the 48-bit range it is
supposed to be in, if the input had some bits in [63..48] set and the
shift didn't bring all of those within the [47..0] range.

Handle this similarly to the way we already do for this case in
do_uqrshl48_d(): extend the calculated result from 48 bits,
and return that if not saturating or if it doesn't change the
result; otherwise fall through to return a saturated value.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/mve_helper.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ uint64_t HELPER(mve_uqrshll)(CPUARMState *env, uint64_t n, uint32_t shift)
 static inline int64_t do_sqrshl48_d(int64_t src, int64_t shift,
                                     bool round, uint32_t *sat)
 {
+    int64_t val, extval;
+
     if (shift <= -48) {
         /* Rounding the sign bit always produces 0. */
         if (round) {
@@ -XXX,XX +XXX,XX @@ static inline int64_t do_sqrshl48_d(int64_t src, int64_t shift,
     } else if (shift < 0) {
         if (round) {
             src >>= -shift - 1;
-            return (src >> 1) + (src & 1);
+            val = (src >> 1) + (src & 1);
+        } else {
+            val = src >> -shift;
+        }
+        extval = sextract64(val, 0, 48);
+        if (!sat || val == extval) {
+            return extval;
         }
-        return src >> -shift;
     } else if (shift < 48) {
         int64_t extval = sextract64(src << shift, 0, 48);
         if (!sat || src == (extval >> shift)) {
-- 
2.20.1
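
The right-shift case fixed in the patch above can be sketched in isolation as follows. This is an illustration only: sext48() and sat_shr48() are hypothetical names, the saturation flag is left out, and an arithmetic right shift of negative values is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for sextract64(v, 0, 48): sign-extend bits [47..0]. */
static int64_t sext48(int64_t v)
{
    uint64_t u = (uint64_t)v & ((UINT64_C(1) << 48) - 1);

    if (u & (UINT64_C(1) << 47)) {
        u |= ~((UINT64_C(1) << 48) - 1);
    }
    return (int64_t)u;
}

/* Right shift by 1..47 with optional rounding, saturating to 48 bits. */
static int64_t sat_shr48(int64_t src, unsigned shift, bool round)
{
    int64_t val;

    assert(shift >= 1 && shift < 48);
    if (round) {
        src >>= shift - 1;
        val = (src >> 1) + (src & 1);
    } else {
        val = src >> shift;
    }
    /* The shifted value may still have bits set above bit 47. */
    if (val == sext48(val)) {
        return val;
    }
    return src >= 0 ? INT64_C(0x00007fffffffffff)
                    : -INT64_C(0x0000800000000000);
}
```

For example, (1 << 60) shifted right by 4 is (1 << 56), which is smaller than the input but still does not fit in 48 bits, so the sketch returns the most-positive value.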

In mve_element_mask(), we calculate a mask for tail predication which
should have a number of 1 bits based on the value of LR.  However,
our MAKE_64BIT_MASK() macro has undefined behaviour when passed a
zero length.  Special case this to give the all-zeroes mask we
require.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/mve_helper.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ static uint16_t mve_element_mask(CPUARMState *env)
          */
         int masklen = env->regs[14] << env->v7m.ltpsize;
         assert(masklen <= 16);
-        mask &= MAKE_64BIT_MASK(0, masklen);
+        uint16_t ltpmask = masklen ? MAKE_64BIT_MASK(0, masklen) : 0;
+        mask &= ltpmask;
     }
 
     if ((env->condexec_bits & 0xf) == 0) {
-- 
2.20.1
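
To show why the zero-length case in the patch above needs special handling, here is a small sketch. MASK64() is a local macro written for this example; it is assumed to have the same shape as MAKE_64BIT_MASK(), where a length of 0 would mean shifting a 64-bit value by 64, which is undefined behaviour in C. tail_mask() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed shape of the mask macro: 'length' one bits starting at 'shift'. */
#define MASK64(shift, length) \
    ((~UINT64_C(0) >> (64 - (length))) << (shift))

/* Mask with 'masklen' low bits set; masklen == 0 gives the empty mask. */
static uint16_t tail_mask(unsigned masklen)
{
    assert(masklen <= 16);
    /* Never evaluate MASK64(0, 0): that would shift by 64. */
    return masklen ? (uint16_t)MASK64(0, masklen) : 0;
}
```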

In some situations we need a mask telling us which parts of the
vector correspond to beats that are not being executed because of
ECI, separately from the combined "which bytes are predicated away"
mask.  Factor this mask calculation out of mve_element_mask() into
its own function.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/mve_helper.c | 58 ++++++++++++++++++++++++-----------------
 1 file changed, 34 insertions(+), 24 deletions(-)

diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@
 #include "exec/exec-all.h"
 #include "tcg/tcg.h"
 
+static uint16_t mve_eci_mask(CPUARMState *env)
+{
+    /*
+     * Return the mask of which elements in the MVE vector correspond
+     * to beats being executed. The mask has 1 bits for executed lanes
+     * and 0 bits where ECI says this beat was already executed.
+     */
+    int eci;
+
+    if ((env->condexec_bits & 0xf) != 0) {
+        return 0xffff;
+    }
+
+    eci = env->condexec_bits >> 4;
+    switch (eci) {
+    case ECI_NONE:
+        return 0xffff;
+    case ECI_A0:
+        return 0xfff0;
+    case ECI_A0A1:
+        return 0xff00;
+    case ECI_A0A1A2:
+    case ECI_A0A1A2B0:
+        return 0xf000;
+    default:
+        g_assert_not_reached();
+    }
+}
+
 static uint16_t mve_element_mask(CPUARMState *env)
 {
     /*
@@ -XXX,XX +XXX,XX @@ static uint16_t mve_element_mask(CPUARMState *env)
         mask &= ltpmask;
     }
 
-    if ((env->condexec_bits & 0xf) == 0) {
-        /*
-         * ECI bits indicate which beats are already executed;
-         * we handle this by effectively predicating them out.
-         */
-        int eci = env->condexec_bits >> 4;
-        switch (eci) {
-        case ECI_NONE:
-            break;
-        case ECI_A0:
-            mask &= 0xfff0;
-            break;
-        case ECI_A0A1:
-            mask &= 0xff00;
-            break;
-        case ECI_A0A1A2:
-        case ECI_A0A1A2B0:
-            mask &= 0xf000;
-            break;
-        default:
-            g_assert_not_reached();
-        }
-    }
-
+    /*
+     * ECI bits indicate which beats are already executed;
+     * we handle this by effectively predicating them out.
+     */
+    mask &= mve_eci_mask(env);
     return mask;
 }
 
-- 
2.20.1

We were not paying attention to the ECI state when advancing the VPT
state.  Architecturally, VPT state advance happens for every beat
(see the pseudocode VPTAdvance()), so on every beat the 4 bits of
VPR.P0 corresponding to the current beat are inverted if required,
and at the end of beats 1 and 3 the VPR MASK fields are updated.
This means that if the ECI state says we should not be executing all
4 beats then we need to skip some of the updating of the VPR that we
currently do in mve_advance_vpt().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/mve_helper.c | 24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ static void mve_advance_vpt(CPUARMState *env)
     /* Advance the VPT and ECI state if necessary */
     uint32_t vpr = env->v7m.vpr;
     unsigned mask01, mask23;
+    uint16_t inv_mask;
+    uint16_t eci_mask = mve_eci_mask(env);
 
     if ((env->condexec_bits & 0xf) == 0) {
         env->condexec_bits = (env->condexec_bits == (ECI_A0A1A2B0 << 4)) ?
@@ -XXX,XX +XXX,XX @@ static void mve_advance_vpt(CPUARMState *env)
         return;
     }
 
+    /* Invert P0 bits if needed, but only for beats we actually executed */
     mask01 = FIELD_EX32(vpr, V7M_VPR, MASK01);
     mask23 = FIELD_EX32(vpr, V7M_VPR, MASK23);
-    if (mask01 > 8) {
-        /* high bit set, but not 0b1000: invert the relevant half of P0 */
-        vpr ^= 0xff;
+    /* Start by assuming we invert all bits corresponding to executed beats */
+    inv_mask = eci_mask;
+    if (mask01 <= 8) {
+        /* MASK01 says don't invert low half of P0 */
+        inv_mask &= ~0xff;
     }
-    if (mask23 > 8) {
-        /* high bit set, but not 0b1000: invert the relevant half of P0 */
-        vpr ^= 0xff00;
+    if (mask23 <= 8) {
+        /* MASK23 says don't invert high half of P0 */
+        inv_mask &= ~0xff00;
     }
-    vpr = FIELD_DP32(vpr, V7M_VPR, MASK01, mask01 << 1);
+    vpr ^= inv_mask;
+    /* Only update MASK01 if beat 1 executed */
+    if (eci_mask & 0xf0) {
+        vpr = FIELD_DP32(vpr, V7M_VPR, MASK01, mask01 << 1);
+    }
+    /* Beat 3 always executes, so update MASK23 */
     vpr = FIELD_DP32(vpr, V7M_VPR, MASK23, mask23 << 1);
     env->v7m.vpr = vpr;
 }
-- 
2.20.1

For vector loads, predicated elements are zeroed, instead of
retaining their previous values (as happens for most data
processing operations). This means we need to distinguish
"beat not executed due to ECI" (don't touch destination
element) from "beat executed but predicated out" (zero
destination element).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/mve_helper.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ static void mve_advance_vpt(CPUARMState *env)
     env->v7m.vpr = vpr;
 }
 
-
+/* For loads, predicated lanes are zeroed instead of keeping their old values */
 #define DO_VLDR(OP, MSIZE, LDTYPE, ESIZE, TYPE)                         \
     void HELPER(mve_##OP)(CPUARMState *env, void *vd, uint32_t addr)    \
     {                                                                   \
         TYPE *d = vd;                                                   \
         uint16_t mask = mve_element_mask(env);                          \
+        uint16_t eci_mask = mve_eci_mask(env);                          \
         unsigned b, e;                                                  \
         /*                                                              \
          * R_SXTM allows the dest reg to become UNKNOWN for abandoned   \
@@ -XXX,XX +XXX,XX @@ static void mve_advance_vpt(CPUARMState *env)
          * then take an exception.                                      \
          */                                                             \
         for (b = 0, e = 0; b < 16; b += ESIZE, e++) {                   \
-            if (mask & (1 << b)) {                                      \
-                d[H##ESIZE(e)] = cpu_##LDTYPE##_data_ra(env, addr, GETPC()); \
+            if (eci_mask & (1 << b)) {                                  \
+                d[H##ESIZE(e)] = (mask & (1 << b)) ?                    \
+                    cpu_##LDTYPE##_data_ra(env, addr, GETPC()) : 0;     \
             }                                                           \
             addr += MSIZE;                                              \
         }                                                               \
-- 
2.20.1
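
The three-way distinction made by the patch above can be written out per lane. This is a sketch with invented names (load_lane() and its parameters), not the DO_VLDR macro itself:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide the new value of one destination lane of a vector load.
 *   executed:      the ECI state says this beat is being executed
 *   predicated_in: the predicate mask enables this lane
 */
static uint32_t load_lane(uint32_t old, uint32_t loaded,
                          bool executed, bool predicated_in)
{
    if (!executed) {
        /* Beat not executed due to ECI: don't touch the destination. */
        return old;
    }
    /* Beat executed: predicated-out lanes are zeroed, not preserved. */
    return predicated_in ? loaded : 0;
}
```

In the real helper the combined predicate mask already has the ECI bits folded in, which is why the separate ECI mask has to be consulted first.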

Implement the MVE VMULL (polynomial) insn.  Unlike Neon, this comes
in two flavours: 8x8->16 and 16x16->32.  Also unlike Neon, the
inputs are in either the low or the high half of each double-width
element.

The assembler for this insn indicates the size with "P8" or "P16",
encoded into bit 28 as size = 0 or 1. We choose to follow the
same encoding as VQDMULL and decode this into a->size as MO_16
or MO_32 indicating the size of the result elements. This then
carries through to the helper function names where it then
matches up with the existing pmull_h() which does an 8x8->16
operation and a new pmull_w() which does the 16x16->32.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-mve.h    |  5 +++++
 target/arm/vec_internal.h  | 11 +++++++++++
 target/arm/mve.decode      | 14 ++++++++++----
 target/arm/mve_helper.c    | 16 ++++++++++++++++
 target/arm/translate-mve.c | 28 ++++++++++++++++++++++++++++
 target/arm/vec_helper.c    | 14 +++++++++++++-
 6 files changed, 83 insertions(+), 5 deletions(-)
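
For readers unfamiliar with polynomial multiplication, here is a minimal sketch of the 8x8->16 flavour on a single double-width element. pmul8() and vmull_p8_elem() are names made up for this example and are not the pmull_h() helper; the sketch only shows the carry-less multiply and the choice of input half.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Carry-less (polynomial over GF(2)) multiply of two 8-bit values. */
static uint16_t pmul8(uint8_t a, uint8_t b)
{
    uint16_t r = 0;

    for (int i = 0; i < 8; i++) {
        if (b & (1u << i)) {
            /* XOR instead of add: partial products do not carry. */
            r ^= (uint16_t)((uint16_t)a << i);
        }
    }
    return r;
}

/*
 * One 16-bit result element: the 8-bit inputs come from either the
 * high (top) or the low half of the corresponding 16-bit input elements.
 */
static uint16_t vmull_p8_elem(uint16_t n, uint16_t m, bool top)
{
    unsigned shift = top ? 8 : 0;

    return pmul8((uint8_t)(n >> shift), (uint8_t)(m >> shift));
}
```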

Implement the MVE incrementing/decrementing dup insns VIDUP, VDDUP,
VIWDUP and VDWDUP.  These fill the elements of a vector with
successively incrementing values, starting at the offset specified in
a general purpose register.  The final value of the offset is written
back to this register.  The wrapping variants take a second general
purpose register which specifies the point where the count should
wrap back to 0.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-mve.h    |  12 ++++
 target/arm/mve.decode      |  25 ++++++++
 target/arm/mve_helper.c    |  63 +++++++++++++++++++
 target/arm/translate-mve.c | 120 +++++++++++++++++++++++++++++++++++++
 4 files changed, 220 insertions(+)
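
A rough sketch of the incrementing, wrapping behaviour described above follows. The names iwdup_next() and fill_iwdup() are invented for this example, and comparing the offset for equality with the wrap point is an assumption that holds when both are multiples of the increment; predication and ECI handling are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Advance the offset by imm, wrapping back to 0 at the wrap point. */
static uint32_t iwdup_next(uint32_t offset, uint32_t wrap, uint32_t imm)
{
    offset += imm;
    return offset == wrap ? 0 : offset;
}

/*
 * Fill n elements with successive offsets and return the final offset,
 * which the insn would write back to the general purpose register.
 */
static uint32_t fill_iwdup(uint32_t *d, unsigned n, uint32_t offset,
                           uint32_t wrap, uint32_t imm)
{
    for (unsigned i = 0; i < n; i++) {
        d[i] = offset;
        offset = iwdup_next(offset, wrap, imm);
    }
    return offset;
}
```

Starting at offset 2 with wrap point 6 and increment 2, four elements are filled with 2, 4, 0, 2 and the offset written back is 4.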

Factor out the "generate code to update VPR.MASK01/MASK23" part of
trans_VPST(); we are going to want to reuse it for the VPT insns.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-mve.c | 31 +++++++++++++++++--------------
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VRMLSLDAVH(DisasContext *s, arg_vmlaldav *a)
     return do_long_dual_acc(s, a, fns[a->x]);
 }
 
-static bool trans_VPST(DisasContext *s, arg_VPST *a)
+static void gen_vpst(DisasContext *s, uint32_t mask)
 {
-    TCGv_i32 vpr;
-
-    /* mask == 0 is a "related encoding" */
-    if (!dc_isar_feature(aa32_mve, s) || !a->mask) {
-        return false;
-    }
-    if (!mve_eci_check(s) || !vfp_access_check(s)) {
-        return true;
-    }
     /*
      * Set the VPR mask fields. We take advantage of MASK01 and MASK23
      * being adjacent fields in the register.
      *
-     * This insn is not predicated, but it is subject to beat-wise
+     * Updating the masks is not predicated, but it is subject to beat-wise
      * execution, and the mask is updated on the odd-numbered beats.
      * So if PSR.ECI says we should skip beat 1, we mustn't update the
      * 01 mask field.
      */
-    vpr = load_cpu_field(v7m.vpr);
+    TCGv_i32 vpr = load_cpu_field(v7m.vpr);
     switch (s->eci) {
     case ECI_NONE:
     case ECI_A0:
         /* Update both 01 and 23 fields */
         tcg_gen_deposit_i32(vpr, vpr,
-                            tcg_constant_i32(a->mask | (a->mask << 4)),
+                            tcg_constant_i32(mask | (mask << 4)),
                             R_V7M_VPR_MASK01_SHIFT,
                             R_V7M_VPR_MASK01_LENGTH + R_V7M_VPR_MASK23_LENGTH);
         break;
@@ -XXX,XX +XXX,XX @@ static bool trans_VPST(DisasContext *s, arg_VPST *a)
     case ECI_A0A1A2B0:
         /* Update only the 23 mask field */
         tcg_gen_deposit_i32(vpr, vpr,
-                            tcg_constant_i32(a->mask),
+                            tcg_constant_i32(mask),
                             R_V7M_VPR_MASK23_SHIFT, R_V7M_VPR_MASK23_LENGTH);
         break;
     default:
         g_assert_not_reached();
     }
     store_cpu_field(vpr, v7m.vpr);
+}
+
+static bool trans_VPST(DisasContext *s, arg_VPST *a)
+{
+    /* mask == 0 is a "related encoding" */
+    if (!dc_isar_feature(aa32_mve, s) || !a->mask) {
+        return false;
+    }
+    if (!mve_eci_check(s) || !vfp_access_check(s)) {
+        return true;
+    }
+    gen_vpst(s, a->mask);
     mve_update_and_store_eci(s);
     return true;
 }
-- 
2.20.1

Implement the MVE integer vector comparison instructions.  These are
"VCMP (vector)" encodings T1, T2 and T3, and "VPT (vector)" encodings
T1, T2 and T3.

These insns compare corresponding elements in each vector, and update
the VPR.P0 predicate bits with the results of the comparison.  VPT
also sets the VPR.MASK01 and VPR.MASK23 fields -- it is effectively
"VCMP then VPST".

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-mve.h    | 32 ++++++++++++++++++++++
 target/arm/mve.decode      | 18 +++++++++++-
 target/arm/mve_helper.c    | 56 ++++++++++++++++++++++++++++++++++++++
 target/arm/translate-mve.c | 47 ++++++++++++++++++++++++++++++++
 4 files changed, 152 insertions(+), 1 deletion(-)
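
As a sketch of how comparison results might map onto predicate bits, the following assumes one P0 bit per byte of the vector, so that an element of esize bytes owns esize adjacent bits. elem_pred_bits() and cmp_eq_h() are hypothetical names, and predication, ECI and the VPT mask update are all left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Predicate bits owned by element e of size esize bytes, if cond holds. */
static uint16_t elem_pred_bits(unsigned e, unsigned esize, bool cond)
{
    if (!cond) {
        return 0;
    }
    return (uint16_t)(((1u << esize) - 1) << (e * esize));
}

/* Compare eight 16-bit elements for equality and build the predicate. */
static uint16_t cmp_eq_h(const int16_t n[8], const int16_t m[8])
{
    uint16_t p0 = 0;

    for (unsigned e = 0; e < 8; e++) {
        p0 |= elem_pred_bits(e, 2, n[e] == m[e]);
    }
    return p0;
}
```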

Implement the MVE integer vector comparison instructions that compare
each element against a scalar from a general purpose register.  These
are "VCMP (vector)" encodings T4, T5 and T6 and "VPT (vector)"
encodings T4, T5 and T6.

We have to move the decodetree pattern for VPST, because it
overlaps with VCMP T4 with size = 0b11.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-mve.h    | 32 +++++++++++++++++++++++++++
 target/arm/mve.decode      | 18 +++++++++++++---
 target/arm/mve_helper.c    | 44 +++++++++++++++++++++++++++++++-------
 target/arm/translate-mve.c | 43 +++++++++++++++++++++++++++++++++++++
 4 files changed, 126 insertions(+), 11 deletions(-)

Implement the MVE VPSEL insn, which sets each byte of the destination
vector Qd to the byte from either Qn or Qm depending on the value of
the corresponding bit in VPR.P0.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-mve.h    |  2 ++
 target/arm/mve.decode      |  7 +++++--
 target/arm/mve_helper.c    | 19 +++++++++++++++++++
 target/arm/translate-mve.c |  2 ++
 4 files changed, 28 insertions(+), 2 deletions(-)
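
The byte selection can be sketched on one 64-bit half of a vector with 8 predicate bits. psel64() is a name invented here, and the choice that a set predicate bit selects the byte from Qn (and a clear bit the byte from Qm) is an assumption of this sketch, since the description above does not say which way round it is.

```c
#include <assert.h>
#include <stdint.h>

/* Select each byte from n where the predicate bit is set, else from m. */
static uint64_t psel64(uint64_t n, uint64_t m, uint8_t p)
{
    uint64_t bytemask = 0;

    /* Expand each predicate bit into a full byte of ones. */
    for (unsigned i = 0; i < 8; i++) {
        if (p & (1u << i)) {
            bytemask |= UINT64_C(0xff) << (i * 8);
        }
    }
    return (n & bytemask) | (m & ~bytemask);
}
```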

Implement the MVE VMLAS insn, which multiplies a vector by a vector
and adds a scalar.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-mve.h    |  4 ++++
 target/arm/mve.decode      |  3 +++
 target/arm/mve_helper.c    | 26 ++++++++++++++++++++++++++
 target/arm/translate-mve.c |  1 +
 4 files changed, 34 insertions(+)

Implement the MVE instructions which perform shifts by a scalar.
These are VSHL T2, VRSHL T2, VQSHL T1 and VQRSHL T2.  They take the
shift amount in a general purpose register and shift every element in
the vector by that amount.

Mostly we can reuse the helper functions for shift-by-immediate; we
do need two new helpers for VQRSHL.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-mve.h    |  8 +++++++
 target/arm/mve.decode      | 23 ++++++++++++++++---
 target/arm/mve_helper.c    |  2 ++
 target/arm/translate-mve.c | 46 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 76 insertions(+), 3 deletions(-)

All the users of the vmlaldav formats have an 'x' bit in bit 12 and an
'a' bit in bit 5; move these to the format rather than specifying them
in each insn pattern.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/mve.decode | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VDUP             1110 1110 1 0 10 ... 0 .... 1011 . 0 0 1 0000 @vdup size=2
 
 &vmlaldav rdahi rdalo size qn qm x a
 
-@vmlaldav        .... .... . ... ... . ... . .... .... qm:3 . \
+@vmlaldav        .... .... . ... ... . ... x:1 .... .. a:1 . qm:3 . \
                  qn=%qn rdahi=%rdahi rdalo=%rdalo size=%size_16 &vmlaldav
-@vmlaldav_nosz   .... .... . ... ... . ... . .... .... qm:3 . \
+@vmlaldav_nosz   .... .... . ... ... . ... x:1 .... .. a:1 . qm:3 . \
                  qn=%qn rdahi=%rdahi rdalo=%rdalo size=0 &vmlaldav
-VMLALDAV_S       1110 1110 1 ... ... . ... x:1 1110 . 0 a:1 0 ... 0 @vmlaldav
-VMLALDAV_U       1111 1110 1 ... ... . ... x:1 1110 . 0 a:1 0 ... 0 @vmlaldav
+VMLALDAV_S       1110 1110 1 ... ... . ... . 1110 . 0 . 0 ... 0 @vmlaldav
+VMLALDAV_U       1111 1110 1 ... ... . ... . 1110 . 0 . 0 ... 0 @vmlaldav
 
-VMLSLDAV         1110 1110 1 ... ... . ... x:1 1110 . 0 a:1 0 ... 1 @vmlaldav
+VMLSLDAV         1110 1110 1 ... ... . ... . 1110 . 0 . 0 ... 1 @vmlaldav
 
-VRMLALDAVH_S     1110 1110 1 ... ... 0 ... x:1 1111 . 0 a:1 0 ... 0 @vmlaldav_nosz
-VRMLALDAVH_U     1111 1110 1 ... ... 0 ... x:1 1111 . 0 a:1 0 ... 0 @vmlaldav_nosz
+VRMLALDAVH_S     1110 1110 1 ... ... 0 ... . 1111 . 0 . 0 ... 0 @vmlaldav_nosz
+VRMLALDAVH_U     1111 1110 1 ... ... 0 ... . 1111 . 0 . 0 ... 0 @vmlaldav_nosz
 
-VRMLSLDAVH       1111 1110 1 ... ... 0 ... x:1 1110 . 0 a:1 0 ... 1 @vmlaldav_nosz
+VRMLSLDAVH       1111 1110 1 ... ... 0 ... . 1110 . 0 . 0 ... 1 @vmlaldav_nosz
 
 # Scalar operations
 
-- 
2.20.1

Implement the MVE integer min/max across vector insns
VMAXV, VMINV, VMAXAV and VMINAV, which find the maximum
or minimum of the vector elements and a general purpose
register, and store the result back into the general
purpose register.

These insns overlap with VRMLALDAVH (they use what would
be RdaHi=0b110).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-mve.h    | 20 ++++++++++++
 target/arm/mve.decode      | 18 +++++++++--
 target/arm/mve_helper.c    | 66 ++++++++++++++++++++++++++++++++++++++
 target/arm/translate-mve.c | 48 +++++++++++++++++++++++++++
 4 files changed, 150 insertions(+), 2 deletions(-)

Implement the MVE VABAV insn, which computes absolute differences
between elements of two vectors and accumulates the result into
a general purpose register.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-mve.h    |  7 +++++++
 target/arm/mve.decode      |  6 ++++++
 target/arm/mve_helper.c    | 26 +++++++++++++++++++++++
 target/arm/translate-mve.c | 43 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 82 insertions(+)
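
A minimal sketch of the accumulation for unsigned byte elements follows. vabav_u8() is a hypothetical name, and predication and ECI handling are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add |n[i] - m[i]| for every element to a 32-bit accumulator. */
static uint32_t vabav_u8(uint32_t acc, const uint8_t *n, const uint8_t *m,
                         size_t len)
{
    for (size_t i = 0; i < len; i++) {
        /* Subtract the smaller from the larger to avoid going negative. */
        acc += n[i] > m[i] ? (uint32_t)(n[i] - m[i])
                           : (uint32_t)(m[i] - n[i]);
    }
    return acc;
}
```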

Implement the MVE narrowing move insns VMOVN, VQMOVN and VQMOVUN.
These take a double-width input, narrow it (possibly saturating) and
store the result to either the top or bottom half of the output
element.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-mve.h    | 20 ++++++++++
 target/arm/mve.decode      | 12 ++++++
 target/arm/mve_helper.c    | 78 ++++++++++++++++++++++++++++++++++++++
 target/arm/translate-mve.c | 22 +++++++++++
 4 files changed, 132 insertions(+)
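
For one output element, the saturating narrow might look like the sketch below, shown for a signed 32-bit input narrowed to 16 bits. qmovn_s32() is an invented name, the saturation flag is omitted, and leaving the other half of the output element unchanged is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Narrow a signed 32-bit value to 16 bits with saturation and place it
 * in the top or bottom half of a 32-bit output element.
 */
static uint32_t qmovn_s32(uint32_t dest, int32_t src, bool top)
{
    int32_t sat = src > INT16_MAX ? INT16_MAX
                : src < INT16_MIN ? INT16_MIN
                : src;
    uint32_t half = (uint16_t)sat;

    if (top) {
        return (dest & 0x0000ffffu) | (half << 16);
    }
    return (dest & 0xffff0000u) | half;
}
```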

The MVEGenDualAccOpFn is a bit misnamed, since it is used for
the "long dual accumulate" operations that use a 64-bit
accumulator. Rename it to MVEGenLongDualAccOpFn so we can
use the former name for the 32-bit accumulator insns.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-mve.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ typedef void MVEGenOneOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
 typedef void MVEGenTwoOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr);
 typedef void MVEGenTwoOpScalarFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
 typedef void MVEGenTwoOpShiftFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
-typedef void MVEGenDualAccOpFn(TCGv_i64, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i64);
+typedef void MVEGenLongDualAccOpFn(TCGv_i64, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i64);
 typedef void MVEGenVADDVFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32);
 typedef void MVEGenOneOpImmFn(TCGv_ptr, TCGv_ptr, TCGv_i64);
 typedef void MVEGenVIDUPFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32);
@@ -XXX,XX +XXX,XX @@ static bool trans_VQDMULLT_scalar(DisasContext *s, arg_2scalar *a)
 }
 
 static bool do_long_dual_acc(DisasContext *s, arg_vmlaldav *a,
-                             MVEGenDualAccOpFn *fn)
+                             MVEGenLongDualAccOpFn *fn)
 {
     TCGv_ptr qn, qm;
     TCGv_i64 rda;
@@ -XXX,XX +XXX,XX @@ static bool do_long_dual_acc(DisasContext *s, arg_vmlaldav *a,
 
 static bool trans_VMLALDAV_S(DisasContext *s, arg_vmlaldav *a)
 {
-    static MVEGenDualAccOpFn * const fns[4][2] = {
+    static MVEGenLongDualAccOpFn * const fns[4][2] = {
         { NULL, NULL },
         { gen_helper_mve_vmlaldavsh, gen_helper_mve_vmlaldavxsh },
         { gen_helper_mve_vmlaldavsw, gen_helper_mve_vmlaldavxsw },
@@ -XXX,XX +XXX,XX @@ static bool trans_VMLALDAV_S(DisasContext *s, arg_vmlaldav *a)
 
 static bool trans_VMLALDAV_U(DisasContext *s, arg_vmlaldav *a)
 {
-    static MVEGenDualAccOpFn * const fns[4][2] = {
+    static MVEGenLongDualAccOpFn * const fns[4][2] = {
         { NULL, NULL },
         { gen_helper_mve_vmlaldavuh, NULL },
         { gen_helper_mve_vmlaldavuw, NULL },
@@ -XXX,XX +XXX,XX @@ static bool trans_VMLALDAV_U(DisasContext *s, arg_vmlaldav *a)
 
 static bool trans_VMLSLDAV(DisasContext *s, arg_vmlaldav *a)
 {
-    static MVEGenDualAccOpFn * const fns[4][2] = {
+    static MVEGenLongDualAccOpFn * const fns[4][2] = {
         { NULL, NULL },
         { gen_helper_mve_vmlsldavsh, gen_helper_mve_vmlsldavxsh },
         { gen_helper_mve_vmlsldavsw, gen_helper_mve_vmlsldavxsw },
@@ -XXX,XX +XXX,XX @@ static bool trans_VMLSLDAV(DisasContext *s, arg_vmlaldav *a)
 
 static bool trans_VRMLALDAVH_S(DisasContext *s, arg_vmlaldav *a)
 {
-    static MVEGenDualAccOpFn * const fns[] = {
+    static MVEGenLongDualAccOpFn * const fns[] = {
         gen_helper_mve_vrmlaldavhsw, gen_helper_mve_vrmlaldavhxsw,
     };
     return do_long_dual_acc(s, a, fns[a->x]);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRMLALDAVH_S(DisasContext *s, arg_vmlaldav *a)
 
 static bool trans_VRMLALDAVH_U(DisasContext *s, arg_vmlaldav *a)
 {
-    static MVEGenDualAccOpFn * const fns[] = {
+    static MVEGenLongDualAccOpFn * const fns[] = {
         gen_helper_mve_vrmlaldavhuw, NULL,
     };
     return do_long_dual_acc(s, a, fns[a->x]);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRMLALDAVH_U(DisasContext *s, arg_vmlaldav *a)
 
 static bool trans_VRMLSLDAVH(DisasContext *s, arg_vmlaldav *a)
 {
-    static MVEGenDualAccOpFn * const fns[] = {
+    static MVEGenLongDualAccOpFn * const fns[] = {
         gen_helper_mve_vrmlsldavhsw, gen_helper_mve_vrmlsldavhxsw,
     };
     return do_long_dual_acc(s, a, fns[a->x]);
-- 
2.20.1

Implement the MVE VMLADAV and VMLSDAV insns.  Like the VMLALDAV and
VMLSLDAV insns already implemented, these accumulate multiplied
vector elements; but they accumulate a 32-bit result rather than a
64-bit one.

Note that these encodings overlap with what would be RdaHi=0b111 for
VMLALDAV, VMLSLDAV, VRMLALDAVH and VRMLSLDAVH.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-mve.h    | 17 ++++++++++
 target/arm/mve.decode      | 33 +++++++++++++++++---
 target/arm/mve_helper.c    | 41 ++++++++++++++++++++++++
 target/arm/translate-mve.c | 64 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 150 insertions(+), 5 deletions(-)

Implement the MVE VMLA insn, which multiplies a vector by a scalar
and accumulates into another vector.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-mve.h    | 4 ++++
 target/arm/mve.decode      | 1 +
 target/arm/mve_helper.c    | 5 +++++
 target/arm/translate-mve.c | 1 +
 4 files changed, 11 insertions(+)

Implement the MVE saturating doubling multiply accumulate insns
VQDMLAH, VQRDMLAH, VQDMLASH and VQRDMLASH.  These perform a multiply,
double, add the accumulator shifted by the element size, possibly
round, saturate to twice the element size, then take the high half of
the result.  The *MLAH insns do vector * scalar + vector, and the
*MLASH insns do vector * vector + scalar.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-mve.h    | 16 +++++++
 target/arm/mve.decode      |  5 ++
 target/arm/mve_helper.c    | 95 ++++++++++++++++++++++++++++++++++++++
 target/arm/translate-mve.c |  4 ++
 4 files changed, 120 insertions(+)
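
The sequence of steps described above can be sketched for 8-bit elements as follows. qdmlah_s8() is a hypothetical name and not one of the helpers added by the patch; the saturation flag is omitted, the parameter roles (x and y multiplied, acc added) are generic, and an arithmetic right shift of negative values is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Saturating doubling multiply-accumulate, returning the high half. */
static int8_t qdmlah_s8(int8_t x, int8_t y, int8_t acc, bool round)
{
    /* Multiply, double, add the accumulator shifted up by 8 bits. */
    int32_t r = (int32_t)x * y * 2 + (int32_t)acc * 256;

    /* Optional rounding constant: half of the discarded low part. */
    if (round) {
        r += 1 << 7;
    }
    /* Saturate to twice the element size (16 bits). */
    if (r > INT16_MAX) {
        r = INT16_MAX;
    } else if (r < INT16_MIN) {
        r = INT16_MIN;
    }
    /* Take the high half of the 16-bit intermediate. */
    return (int8_t)(r >> 8);
}
```

The only product that saturates without an accumulator is (-128) * (-128), where doubling gives 32768, one more than the 16-bit maximum.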

Implement the MVE 1-operand saturating operations VQABS and VQNEG.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-mve.h    |  8 ++++++++
 target/arm/mve.decode      |  3 +++
 target/arm/mve_helper.c    | 37 +++++++++++++++++++++++++++++++++++++
 target/arm/translate-mve.c |  2 ++
 4 files changed, 50 insertions(+)

Implement the MVE VMAXA and VMINA insns, which take the absolute
value of the signed elements in the input vector and then accumulate
the unsigned max or min into the destination vector.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-mve.h    |  8 ++++++++
 target/arm/mve.decode      |  4 ++++
 target/arm/mve_helper.c    | 26 ++++++++++++++++++++++++++
 target/arm/translate-mve.c |  2 ++
 4 files changed, 40 insertions(+)

Implement the MVE VMOV forms that move data between 2 general-purpose
registers and 2 32-bit lanes in a vector register.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-a32.h |  1 +
 target/arm/mve.decode      |  4 ++
 target/arm/translate-mve.c | 85 ++++++++++++++++++++++++++++++++++++++
 target/arm/translate-vfp.c |  2 +-
 4 files changed, 91 insertions(+), 1 deletion(-)

diff --git a/target/arm/translate-a32.h b/target/arm/translate-a32.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a32.h
+++ b/target/arm/translate-a32.h
@@ -XXX,XX +XXX,XX @@ void gen_rev16(TCGv_i32 dest, TCGv_i32 var);
 void clear_eci_state(DisasContext *s);
 bool mve_eci_check(DisasContext *s);
 void mve_update_and_store_eci(DisasContext *s);
+bool mve_skip_vmov(DisasContext *s, int vn, int index, int size);
 
 static inline TCGv_i32 load_cpu_offset(int offset)
 {
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VLDR_VSTR        1110110 1 a:1 . w:1 . .... ... 111101 .......   @vldr_vstr \
 VLDR_VSTR        1110110 1 a:1 . w:1 . .... ... 111110 .......   @vldr_vstr \
                  size=2 p=1
 
+# Moves between 2 32-bit vector lanes and 2 general purpose registers
+VMOV_to_2gp      1110 1100 0 . 00 rt2:4 ... 0 1111 000 idx:1 rt:4 qd=%qd
+VMOV_from_2gp    1110 1100 0 . 01 rt2:4 ... 0 1111 000 idx:1 rt:4 qd=%qd
+
 # Vector 2-op
 VAND             1110 1111 0 . 00 ... 0 ... 0 0001 . 1 . 1 ... 0 @2op_nosz
 VBIC             1110 1111 0 . 01 ... 0 ... 0 0001 . 1 . 1 ... 0 @2op_nosz
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ static bool do_vabav(DisasContext *s, arg_vabav *a, MVEGenVABAVFn *fn)
 
 DO_VABAV(VABAV_S, vabavs)
 DO_VABAV(VABAV_U, vabavu)
+
+static bool trans_VMOV_to_2gp(DisasContext *s, arg_VMOV_to_2gp *a)
+{
+    /*
+     * VMOV two 32-bit vector lanes to two general-purpose registers.
+     * This insn is not predicated but it is subject to beat-wise
+     * execution if it is not in an IT block. For us this means
+     * only that if PSR.ECI says we should not be executing the beat
+     * corresponding to the lane of the vector register being accessed
+     * then we should skip performing the move, and that we need to do
+     * the usual check for bad ECI state and advance of ECI state.
+     * (If PSR.ECI is non-zero then we cannot be in an IT block.)
+     */
+    TCGv_i32 tmp;
+    int vd;
+
+    if (!dc_isar_feature(aa32_mve, s) || !mve_check_qreg_bank(s, a->qd) ||
+        a->rt == 13 || a->rt == 15 || a->rt2 == 13 || a->rt2 == 15 ||
+        a->rt == a->rt2) {
+        /* Rt/Rt2 cases are UNPREDICTABLE */
+        return false;
+    }
+    if (!mve_eci_check(s) || !vfp_access_check(s)) {
+        return true;
+    }
+
+    /* Convert Qreg index to Dreg for read_neon_element32() etc */
+    vd = a->qd * 2;
+
+    if (!mve_skip_vmov(s, vd, a->idx, MO_32)) {
+        tmp = tcg_temp_new_i32();
+        read_neon_element32(tmp, vd, a->idx, MO_32);
+        store_reg(s, a->rt, tmp);
+    }
+    if (!mve_skip_vmov(s, vd + 1, a->idx, MO_32)) {
+        tmp = tcg_temp_new_i32();
+        read_neon_element32(tmp, vd + 1, a->idx, MO_32);
+        store_reg(s, a->rt2, tmp);
+    }
+
+    mve_update_and_store_eci(s);
+    return true;
+}
+
+static bool trans_VMOV_from_2gp(DisasContext *s, arg_VMOV_to_2gp *a)
+{
+    /*
+     * VMOV two general-purpose registers to two 32-bit vector lanes.
+     * This insn is not predicated but it is subject to beat-wise
+     * execution if it is not in an IT block. For us this means
+     * only that if PSR.ECI says we should not be executing the beat
+     * corresponding to the lane of the vector register being accessed
+     * then we should skip performing the move, and that we need to do
+     * the usual check for bad ECI state and advance of ECI state.
+     * (If PSR.ECI is non-zero then we cannot be in an IT block.)
+     */
+    TCGv_i32 tmp;
+    int vd;
+
+    if (!dc_isar_feature(aa32_mve, s) || !mve_check_qreg_bank(s, a->qd) ||
+        a->rt == 13 || a->rt == 15 || a->rt2 == 13 || a->rt2 == 15) {
+        /* Rt/Rt2 cases are UNPREDICTABLE */
+        return false;
+    }
+    if (!mve_eci_check(s) || !vfp_access_check(s)) {
+        return true;
+    }
+
+    /* Convert Qreg idx to Dreg for read_neon_element32() etc */
+    vd = a->qd * 2;
+
+    if (!mve_skip_vmov(s, vd, a->idx, MO_32)) {
+        tmp = load_reg(s, a->rt);
+        write_neon_element32(tmp, vd, a->idx, MO_32);
+        tcg_temp_free_i32(tmp);
+    }
+    if (!mve_skip_vmov(s, vd + 1, a->idx, MO_32)) {
+        tmp = load_reg(s, a->rt2);
+        write_neon_element32(tmp, vd + 1, a->idx, MO_32);
+        tcg_temp_free_i32(tmp);
+    }
+
+    mve_update_and_store_eci(s);
+    return true;
+}
diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c
+++ b/target/arm/translate-vfp.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
     return true;
 }
 
-static bool mve_skip_vmov(DisasContext *s, int vn, int index, int size)
+bool mve_skip_vmov(DisasContext *s, int vn, int index, int size)
 {
     /*
      * In a CPU with MVE, the VMOV (vector lane to general-purpose register)
-- 
2.20.1

Implement the MVE VPNOT insn, which inverts the bits in VPR.P0
(subject to both predication and to beatwise execution).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-mve.h    |  1 +
 target/arm/mve.decode      |  1 +
 target/arm/mve_helper.c    | 17 +++++++++++++++++
 target/arm/translate-mve.c | 19 +++++++++++++++++++
 4 files changed, 38 insertions(+)
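
As a rough illustration of how predication and beatwise execution combine
for VPNOT, the sketch below computes a new P0 value from the old one. It is
a minimal standalone model, not the helper added by this patch: the name
vpnot_p0 and its parameters are hypothetical, pred_mask stands in for the
result of mve_element_mask() and eci_mask for mve_eci_mask(), and clearing
the P0 bits of predicated lanes to 0 is an assumption about the intended
semantics.

```c
#include <stdint.h>

/*
 * Hypothetical model of VPNOT on the 16-bit P0 field.
 *  - Bits in beats that ECI says are not executed (eci_mask bit 0)
 *    keep their old value.
 *  - Bits in executed beats whose lane is predicated off
 *    (pred_mask bit 0) become 0 (assumed behaviour).
 *  - All other bits are inverted.
 */
static uint16_t vpnot_p0(uint16_t p0, uint16_t pred_mask, uint16_t eci_mask)
{
    uint16_t inverted = (uint16_t)(~p0 & pred_mask);

    return (uint16_t)((p0 & (uint16_t)~eci_mask) | (inverted & eci_mask));
}
```

With every lane active and every beat executed, vpnot_p0(0x00ff, 0xffff,
0xffff) simply flips all sixteen bits.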

Implement the MVE VCTP insn, which sets the VPR.P0 predicate bits so
that any element at index Rn or greater is predicated.  As
with VPNOT, this insn itself is predicable and subject to beatwise
execution.

The calculation of the mask is the same as is used to determine
ltpmask in mve_element_mask(), but we precalculate masklen in
generated code to avoid having to have 4 helpers specialized by size.

We put the decode line in with the low-overhead-loop insns in
t32.decode because it's logically part of that collection of insn
patterns, even though it is an MVE only insn.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-mve.h    |  2 ++
 target/arm/translate-a32.h |  1 +
 target/arm/t32.decode      |  1 +
 target/arm/mve_helper.c    | 20 ++++++++++++++++++++
 target/arm/translate-mve.c |  2 +-
 target/arm/translate.c     | 33 +++++++++++++++++++++++++++++++++
 6 files changed, 58 insertions(+), 1 deletion(-)
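
The mask calculation described above can be sketched in isolation. This is
a minimal standalone model under stated assumptions, not the code in this
patch: the names vctp_masklen and vctp_mask are hypothetical, size is taken
to be log2 of the element size in bytes, and the sketch ignores the
interaction with existing predication and ECI state.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of low P0 bits to set: one bit per byte of each element with
 * index below rn, saturating at the 16 bytes of a Q register.
 * size is log2(bytes per element): 0 = byte ... 3 = doubleword.
 */
static unsigned vctp_masklen(uint32_t rn, unsigned size)
{
    assert(size <= 3);
    uint32_t nelem = 16u >> size;   /* elements per 128-bit vector */

    return rn >= nelem ? 16u : (unsigned)(rn << size);
}

/* P0 value with elements 0 .. rn-1 active and the rest predicated. */
static uint16_t vctp_mask(uint32_t rn, unsigned size)
{
    unsigned masklen = vctp_masklen(rn, size);

    return (uint16_t)((1u << masklen) - 1u);
}
```

Computing masklen once up front is what lets a single size-independent
helper do the rest of the work: for example three active 32-bit elements
give masklen 12 and a mask of 0x0fff.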

Implement the MVE gather-loads and scatter-stores which
form the address by adding a base value from a scalar
register to an offset in each element of a vector.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-mve.h    |  32 +++++++++
 target/arm/mve.decode      |  12 ++++
 target/arm/mve_helper.c    | 129 +++++++++++++++++++++++++++++++++++++
 target/arm/translate-mve.c |  97 ++++++++++++++++++++++++++++
 4 files changed, 270 insertions(+)
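
The addressing scheme can be illustrated with a small standalone model of a
gather load of four 32-bit elements. This is only a sketch: gather_load_w
and ld32_le are hypothetical names, a plain byte array stands in for guest
memory, and predication, ECI handling and any scaling of the offsets are
left out.

```c
#include <stdint.h>

/* Little-endian 32-bit load from a byte buffer modelling guest memory. */
static uint32_t ld32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Gather load: element e of the destination is read from the address
 * formed by adding the scalar base to element e of the offset vector.
 */
static void gather_load_w(uint32_t qd[4], const uint8_t *mem,
                          uint32_t base, const uint32_t qm[4])
{
    for (int e = 0; e < 4; e++) {
        uint32_t addr = base + qm[e];

        qd[e] = ld32_le(mem + addr);
    }
}
```

A scatter store is the mirror image: the same per-element address is
computed, and the element is written to memory instead of read from it.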

Implement the MVE VLDR/VSTR insns which do scatter-gather using base
addresses from Qm plus or minus an immediate offset (possibly with
writeback). Note that writeback is not predicated but it does have
to honour ECI state, so we have to add an eci_mask check to the
VSTR_SG macros (the VLDR_SG macros already needed this to be able
to distinguish "skip beat" from "set predicated element to 0").

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-mve.h    |  5 +++
 target/arm/mve.decode      | 10 +++++
 target/arm/mve_helper.c    | 91 ++++++++++++++++++++++++--------------
 target/arm/translate-mve.c | 72 ++++++++++++++++++++++++++++++
 4 files changed, 146 insertions(+), 32 deletions(-)
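
The distinction between predication and ECI for writeback can be shown with
a simplified standalone model of a 32-bit scatter store. This is a sketch
under stated assumptions, not the VSTR_SG macro itself: scatter_store_w_imm
and st32_le are hypothetical names, a byte array stands in for guest memory,
and each element is treated as active or inactive according to the lowest
of its four predicate bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Little-endian 32-bit store into a byte buffer modelling guest memory. */
static void st32_le(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/*
 * Each element of qm supplies a base address; imm is added to it.
 * A beat that ECI says to skip does nothing at all. In an executed
 * beat the store happens only if the element is not predicated, but
 * the writeback to qm happens regardless of predication.
 */
static void scatter_store_w_imm(uint8_t *mem, const uint32_t qd[4],
                                uint32_t qm[4], int32_t imm, bool wb,
                                uint16_t pred_mask, uint16_t eci_mask)
{
    for (int e = 0; e < 4; e++, pred_mask >>= 4, eci_mask >>= 4) {
        if (!(eci_mask & 1)) {
            continue;               /* ECI says skip this beat */
        }
        uint32_t addr = qm[e] + (uint32_t)imm;

        if (pred_mask & 1) {
            st32_le(mem + addr, qd[e]);
        }
        if (wb) {
            qm[e] = addr;
        }
    }
}
```

In this model an element that is predicated off still has its base address
updated, while an element in a skipped beat is left completely untouched.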

Implement the MVE interleaving load/store functions VLD2, VLD4, VST2
and VST4.  VLD2 loads 16 bytes of data from memory and writes to 2
consecutive Qregs; VLD4 loads 16 bytes of data from memory and writes
to 4 consecutive Qregs.  The 'pattern' field in the encoding
determines the offset into memory which is accessed and also which
elements in the Qregs are written to.  (The intention is that a
sequence of four consecutive VLD4 with different pattern values
performs a complete de-interleaving load of 64 bytes into all
elements of the 4 Qregs.) VST2 and VST4 do the same, but for stores.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-mve.h    |  48 ++++++
 target/arm/mve.decode      |  11 ++
 target/arm/mve_helper.c    | 342 +++++++++++++++++++++++++++++++++++++
 target/arm/translate-mve.c |  94 ++++++++++
 4 files changed, 495 insertions(+)
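
To see how the four pattern values cooperate, the byte-sized VLD4 case can
be modelled in a few lines of standalone C. This is a sketch rather than
the helper code: the names vld4b_off, vld4b_pattern and vld4b_all are
hypothetical, the per-beat element indices are copied from the DO_VLD4B()
instantiations in this patch, and ECI handling and host-endian adjustment
(the H1() macro) are omitted.

```c
#include <stdint.h>

/* Per-pattern, per-beat element indices, as passed to DO_VLD4B(). */
static const uint8_t vld4b_off[4][4] = {
    { 0, 1, 10, 11 },
    { 2, 3, 12, 13 },
    { 4, 5, 14, 15 },
    { 6, 7, 8, 9 },
};

/*
 * One VLD4 with the given pattern: each beat reads 4 consecutive
 * bytes at offset elt * 4 and writes one byte to element elt of each
 * of the 4 Q registers.
 */
static void vld4b_pattern(uint8_t q[4][16], const uint8_t *mem, int pat)
{
    for (int beat = 0; beat < 4; beat++) {
        unsigned elt = vld4b_off[pat][beat];
        const uint8_t *p = mem + elt * 4;

        for (int e = 0; e < 4; e++) {
            q[e][elt] = p[e];
        }
    }
}

/* All four patterns together de-interleave 64 bytes into 4 Q registers. */
static void vld4b_all(uint8_t q[4][16], const uint8_t *mem)
{
    for (int pat = 0; pat < 4; pat++) {
        vld4b_pattern(q, mem, pat);
    }
}
```

After vld4b_all(), byte i of register e holds the memory byte at offset
4 * i + e, which is the complete de-interleave described above; any single
pattern fills only four of the sixteen byte positions in each register.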

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vldrd_sg_wb_ud, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vstrw_sg_wb_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vstrd_sg_wb_ud, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_3(mve_vld20b, TCG_CALL_NO_WG, void, env, i32, i32)
+DEF_HELPER_FLAGS_3(mve_vld20h, TCG_CALL_NO_WG, void, env, i32, i32)
+DEF_HELPER_FLAGS_3(mve_vld20w, TCG_CALL_NO_WG, void, env, i32, i32)
+
+DEF_HELPER_FLAGS_3(mve_vld21b, TCG_CALL_NO_WG, void, env, i32, i32)
+DEF_HELPER_FLAGS_3(mve_vld21h, TCG_CALL_NO_WG, void, env, i32, i32)
+DEF_HELPER_FLAGS_3(mve_vld21w, TCG_CALL_NO_WG, void, env, i32, i32)
+
+DEF_HELPER_FLAGS_3(mve_vld40b, TCG_CALL_NO_WG, void, env, i32, i32)
+DEF_HELPER_FLAGS_3(mve_vld40h, TCG_CALL_NO_WG, void, env, i32, i32)
+DEF_HELPER_FLAGS_3(mve_vld40w, TCG_CALL_NO_WG, void, env, i32, i32)
+
+DEF_HELPER_FLAGS_3(mve_vld41b, TCG_CALL_NO_WG, void, env, i32, i32)
+DEF_HELPER_FLAGS_3(mve_vld41h, TCG_CALL_NO_WG, void, env, i32, i32)
+DEF_HELPER_FLAGS_3(mve_vld41w, TCG_CALL_NO_WG, void, env, i32, i32)
+
+DEF_HELPER_FLAGS_3(mve_vld42b, TCG_CALL_NO_WG, void, env, i32, i32)
+DEF_HELPER_FLAGS_3(mve_vld42h, TCG_CALL_NO_WG, void, env, i32, i32)
+DEF_HELPER_FLAGS_3(mve_vld42w, TCG_CALL_NO_WG, void, env, i32, i32)
+
+DEF_HELPER_FLAGS_3(mve_vld43b, TCG_CALL_NO_WG, void, env, i32, i32)
+DEF_HELPER_FLAGS_3(mve_vld43h, TCG_CALL_NO_WG, void, env, i32, i32)
+DEF_HELPER_FLAGS_3(mve_vld43w, TCG_CALL_NO_WG, void, env, i32, i32)
+
+DEF_HELPER_FLAGS_3(mve_vst20b, TCG_CALL_NO_WG, void, env, i32, i32)
+DEF_HELPER_FLAGS_3(mve_vst20h, TCG_CALL_NO_WG, void, env, i32, i32)
+DEF_HELPER_FLAGS_3(mve_vst20w, TCG_CALL_NO_WG, void, env, i32, i32)
+
+DEF_HELPER_FLAGS_3(mve_vst21b, TCG_CALL_NO_WG, void, env, i32, i32)
+DEF_HELPER_FLAGS_3(mve_vst21h, TCG_CALL_NO_WG, void, env, i32, i32)
+DEF_HELPER_FLAGS_3(mve_vst21w, TCG_CALL_NO_WG, void, env, i32, i32)
+
+DEF_HELPER_FLAGS_3(mve_vst40b, TCG_CALL_NO_WG, void, env, i32, i32)
+DEF_HELPER_FLAGS_3(mve_vst40h, TCG_CALL_NO_WG, void, env, i32, i32)
+DEF_HELPER_FLAGS_3(mve_vst40w, TCG_CALL_NO_WG, void, env, i32, i32)
+
+DEF_HELPER_FLAGS_3(mve_vst41b, TCG_CALL_NO_WG, void, env, i32, i32)
+DEF_HELPER_FLAGS_3(mve_vst41h, TCG_CALL_NO_WG, void, env, i32, i32)
+DEF_HELPER_FLAGS_3(mve_vst41w, TCG_CALL_NO_WG, void, env, i32, i32)
+
+DEF_HELPER_FLAGS_3(mve_vst42b, TCG_CALL_NO_WG, void, env, i32, i32)
+DEF_HELPER_FLAGS_3(mve_vst42h, TCG_CALL_NO_WG, void, env, i32, i32)
+DEF_HELPER_FLAGS_3(mve_vst42w, TCG_CALL_NO_WG, void, env, i32, i32)
+
+DEF_HELPER_FLAGS_3(mve_vst43b, TCG_CALL_NO_WG, void, env, i32, i32)
+DEF_HELPER_FLAGS_3(mve_vst43h, TCG_CALL_NO_WG, void, env, i32, i32)
+DEF_HELPER_FLAGS_3(mve_vst43w, TCG_CALL_NO_WG, void, env, i32, i32)
+
 DEF_HELPER_FLAGS_3(mve_vdup, TCG_CALL_NO_WG, void, env, ptr, i32)
 
 DEF_HELPER_FLAGS_4(mve_vidupb, TCG_CALL_NO_WG, i32, env, ptr, i32, i32)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@
 &vabav qn qm rda size
 &vldst_sg qd qm rn size msize os
 &vldst_sg_imm qd qm a w imm
+&vldst_il qd rn size pat w
 
 # scatter-gather memory size is in bits 6:4
 %sg_msize 6:1 4:1
@@ -XXX,XX +XXX,XX @@
 @vldst_sg_imm .... .... a:1 . w:1 . .... .... .... . imm:7 &vldst_sg_imm \
               qd=%qd qm=%qn
 
+# Deinterleaving load/interleaving store
+@vldst_il .... .... .. w:1 . rn:4 .... ... size:2 pat:2 ..... &vldst_il \
+          qd=%qd
+
 @1op .... .... .... size:2 .. .... .... .... .... &1op qd=%qd qm=%qm
 @1op_nosz .... .... .... .... .... .... .... .... &1op qd=%qd qm=%qm size=0
 @2op .... .... .. size:2 .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn
@@ -XXX,XX +XXX,XX @@ VLDRD_sg_imm     111 1 1101 ... 1 ... 0 ... 1 1111 .... .... @vldst_sg_imm
 VSTRW_sg_imm     111 1 1101 ... 0 ... 0 ... 1 1110 .... .... @vldst_sg_imm
 VSTRD_sg_imm     111 1 1101 ... 0 ... 0 ... 1 1111 .... .... @vldst_sg_imm
 
+# deinterleaving loads/interleaving stores
+VLD2             1111 1100 1 .. 1 .... ... 1 111 .. .. 00000 @vldst_il
+VLD4             1111 1100 1 .. 1 .... ... 1 111 .. .. 00001 @vldst_il
+VST2             1111 1100 1 .. 0 .... ... 1 111 .. .. 00000 @vldst_il
+VST4             1111 1100 1 .. 0 .... ... 1 111 .. .. 00001 @vldst_il
+
 # Moves between 2 32-bit vector lanes and 2 general purpose registers
 VMOV_to_2gp      1110 1100 0 . 00 rt2:4 ... 0 1111 000 idx:1 rt:4 qd=%qd
 VMOV_from_2gp    1110 1100 0 . 01 rt2:4 ... 0 1111 000 idx:1 rt:4 qd=%qd
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_VLDR64_SG(vldrd_sg_wb_ud, ADDR_ADD, true)
 DO_VSTR_SG(vstrw_sg_wb_uw, stl, 4, uint32_t, ADDR_ADD, true)
 DO_VSTR64_SG(vstrd_sg_wb_ud, ADDR_ADD, true)
 
+/*
+ * Deinterleaving loads/interleaving stores.
+ *
+ * For these helpers we are passed the index of the first Qreg
+ * (VLD2/VST2 will also access Qn+1, VLD4/VST4 access Qn .. Qn+3)
+ * and the value of the base address register Rn.
+ * The helpers are specialized for pattern and element size, so
+ * for instance vld42h is VLD4 with pattern 2, element size MO_16.
+ *
+ * These insns are beatwise but not predicated, so we must honour ECI,
+ * but need not look at mve_element_mask().
+ *
+ * The pseudocode implements these insns with multiple memory accesses
+ * of the element size, but rules R_VVVG and R_FXDM permit us to make
+ * one 32-bit memory access per beat.
+ */
+#define DO_VLD4B(OP, O1, O2, O3, O4)                                    \
+    void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx,             \
+                          uint32_t base)                                \
+    {                                                                   \
+        int beat, e;                                                    \
+        uint16_t mask = mve_eci_mask(env);                              \
+        static const uint8_t off[4] = { O1, O2, O3, O4 };               \
+        uint32_t addr, data;                                            \
+        for (beat = 0; beat < 4; beat++, mask >>= 4) {                  \
+            if ((mask & 1) == 0) {                                      \
+                /* ECI says skip this beat */                           \
+                continue;                                               \
+            }                                                           \
+            addr = base + off[beat] * 4;                                \
+            data = cpu_ldl_le_data_ra(env, addr, GETPC());              \
+            for (e = 0; e < 4; e++, data >>= 8) {                       \
+                uint8_t *qd = (uint8_t *)aa32_vfp_qreg(env, qnidx + e); \
+                qd[H1(off[beat])] = data;                               \
+            }                                                           \
+        }                                                               \
+    }
+
+#define DO_VLD4H(OP, O1, O2)                                            \
+    void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx,             \
+                          uint32_t base)                                \
+    {                                                                   \
+        int beat;                                                       \
+        uint16_t mask = mve_eci_mask(env);                              \
+        static const uint8_t off[4] = { O1, O1, O2, O2 };               \
+        uint32_t addr, data;                                            \
+        int y; /* y counts 0 2 0 2 */                                   \
+        uint16_t *qd;                                                   \
+        for (beat = 0, y = 0; beat < 4; beat++, mask >>= 4, y ^= 2) {   \
+            if ((mask & 1) == 0) {                                      \
+                /* ECI says skip this beat */                           \
+                continue;                                               \
+            }                                                           \
+            addr = base + off[beat] * 8 + (beat & 1) * 4;               \
+            data = cpu_ldl_le_data_ra(env, addr, GETPC());              \
+            qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + y);             \
+            qd[H2(off[beat])] = data;                                   \
+            data >>= 16;                                                \
+            qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + y + 1);         \
+            qd[H2(off[beat])] = data;                                   \
+        }                                                               \
+    }
+
+#define DO_VLD4W(OP, O1, O2, O3, O4)                                    \
+    void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx,             \
+                          uint32_t base)                                \
+    {                                                                   \
+        int beat;                                                       \
+        uint16_t mask = mve_eci_mask(env);                              \
+        static const uint8_t off[4] = { O1, O2, O3, O4 };               \
+        uint32_t addr, data;                                            \
+        uint32_t *qd;                                                   \
+        int y;                                                          \
+        for (beat = 0; beat < 4; beat++, mask >>= 4) {                  \
+            if ((mask & 1) == 0) {                                      \
+                /* ECI says skip this beat */                           \
+                continue;                                               \
+            }                                                           \
+            addr = base + off[beat] * 4;                                \
+            data = cpu_ldl_le_data_ra(env, addr, GETPC());              \
+            y = (beat + (O1 & 2)) & 3;                                  \
+            qd = (uint32_t *)aa32_vfp_qreg(env, qnidx + y);             \
+            qd[H4(off[beat] >> 2)] = data;                              \
+        }                                                               \
+    }
+
+DO_VLD4B(vld40b, 0, 1, 10, 11)
+DO_VLD4B(vld41b, 2, 3, 12, 13)
+DO_VLD4B(vld42b, 4, 5, 14, 15)
+DO_VLD4B(vld43b, 6, 7, 8, 9)
+
+DO_VLD4H(vld40h, 0, 5)
+DO_VLD4H(vld41h, 1, 6)
+DO_VLD4H(vld42h, 2, 7)
+DO_VLD4H(vld43h, 3, 4)
+
+DO_VLD4W(vld40w, 0, 1, 10, 11)
+DO_VLD4W(vld41w, 2, 3, 12, 13)
+DO_VLD4W(vld42w, 4, 5, 14, 15)
+DO_VLD4W(vld43w, 6, 7, 8, 9)
+
+#define DO_VLD2B(OP, O1, O2, O3, O4)                                    \
+    void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx,             \
+                          uint32_t base)                                \
+    {                                                                   \
+        int beat, e;                                                    \
+        uint16_t mask = mve_eci_mask(env);                              \
+        static const uint8_t off[4] = { O1, O2, O3, O4 };               \
+        uint32_t addr, data;                                            \
+        uint8_t *qd;                                                    \
+        for (beat = 0; beat < 4; beat++, mask >>= 4) {                  \
+            if ((mask & 1) == 0) {                                      \
+                /* ECI says skip this beat */                           \
+                continue;                                               \
+            }                                                           \
+            addr = base + off[beat] * 2;                                \
+            data = cpu_ldl_le_data_ra(env, addr, GETPC());              \
+            for (e = 0; e < 4; e++, data >>= 8) {                       \
+                qd = (uint8_t *)aa32_vfp_qreg(env, qnidx + (e & 1));    \
+                qd[H1(off[beat] + (e >> 1))] = data;                    \
+            }                                                           \
+        }                                                               \
+    }
+
+#define DO_VLD2H(OP, O1, O2, O3, O4)                                    \
+    void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx,             \
+                          uint32_t base)                                \
+    {                                                                   \
+        int beat;                                                       \
+        uint16_t mask = mve_eci_mask(env);                              \
+        static const uint8_t off[4] = { O1, O2, O3, O4 };               \
+        uint32_t addr, data;                                            \
+        int e;                                                          \
+        uint16_t *qd;                                                   \
+        for (beat = 0; beat < 4; beat++, mask >>= 4) {                  \
+            if ((mask & 1) == 0) {                                      \
+                /* ECI says skip this beat */                           \
+                continue;                                               \
+            }                                                           \
+            addr = base + off[beat] * 4;                                \
+            data = cpu_ldl_le_data_ra(env, addr, GETPC());              \
+            for (e = 0; e < 2; e++, data >>= 16) {                      \
+                qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + e);         \
+                qd[H2(off[beat])] = data;                               \
+            }                                                           \
+        }                                                               \
+    }
+
+#define DO_VLD2W(OP, O1, O2, O3, O4)                                    \
+    void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx,             \
+                          uint32_t base)                                \
+    {                                                                   \
+        int beat;                                                       \
+        uint16_t mask = mve_eci_mask(env);                              \
+        static const uint8_t off[4] = { O1, O2, O3, O4 };               \
+        uint32_t addr, data;                                            \
+        uint32_t *qd;                                                   \
+        for (beat = 0; beat < 4; beat++, mask >>= 4) {                  \
+            if ((mask & 1) == 0) {                                      \
+                /* ECI says skip this beat */                           \
+                continue;                                               \
+            }                                                           \
+            addr = base + off[beat];                                    \
+            data = cpu_ldl_le_data_ra(env, addr, GETPC());              \
+            qd = (uint32_t *)aa32_vfp_qreg(env, qnidx + (beat & 1));    \
+            qd[H4(off[beat] >> 3)] = data;                              \
+        }                                                               \
+    }
+
+DO_VLD2B(vld20b, 0, 2, 12, 14)
+DO_VLD2B(vld21b, 4, 6, 8, 10)
+
+DO_VLD2H(vld20h, 0, 1, 6, 7)
+DO_VLD2H(vld21h, 2, 3, 4, 5)
+
+DO_VLD2W(vld20w, 0, 4, 24, 28)
+DO_VLD2W(vld21w, 8, 12, 16, 20)
+
+#define DO_VST4B(OP, O1, O2, O3, O4)                                    \
+    void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx,             \
+                          uint32_t base)                                \
+    {                                                                   \
+        int beat, e;                                                    \
+        uint16_t mask = mve_eci_mask(env);                              \
+        static const uint8_t off[4] = { O1, O2, O3, O4 };               \
+        uint32_t addr, data;                                            \
+        for (beat = 0; beat < 4; beat++, mask >>= 4) {                  \
+            if ((mask & 1) == 0) {                                      \
+                /* ECI says skip this beat */                           \
+                continue;                                               \
+            }                                                           \
+            addr = base + off[beat] * 4;                                \
+            data = 0;                                                   \
+            for (e = 3; e >= 0; e--) {                                  \
+                uint8_t *qd = (uint8_t *)aa32_vfp_qreg(env, qnidx + e); \
+                data = (data << 8) | qd[H1(off[beat])];                 \
+            }                                                           \
+            cpu_stl_le_data_ra(env, addr, data, GETPC());               \
+        }                                                               \
+    }
+
+#define DO_VST4H(OP, O1, O2)                                            \
+    void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx,             \
+                          uint32_t base)                                \
+    {                                                                   \
+        int beat;                                                       \
+        uint16_t mask = mve_eci_mask(env);                              \
+        static const uint8_t off[4] = { O1, O1, O2, O2 };               \
+        uint32_t addr, data;                                            \
+        int y; /* y counts 0 2 0 2 */                                   \
+        uint16_t *qd;                                                   \
+        for (beat = 0, y = 0; beat < 4; beat++, mask >>= 4, y ^= 2) {   \
+            if ((mask & 1) == 0) {                                      \
+                /* ECI says skip this beat */                           \
+                continue;                                               \
+            }                                                           \
+            addr = base + off[beat] * 8 + (beat & 1) * 4;               \
+            qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + y);             \
+            data = qd[H2(off[beat])];                                   \
+            qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + y + 1);         \
+            data |= qd[H2(off[beat])] << 16;                            \
+            cpu_stl_le_data_ra(env, addr, data, GETPC());               \
+        }                                                               \
+    }
+
+#define DO_VST4W(OP, O1, O2, O3, O4)                                    \
+    void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx,             \
+                          uint32_t base)                                \
+    {                                                                   \
+        int beat;                                                       \
+        uint16_t mask = mve_eci_mask(env);                              \
+        static const uint8_t off[4] = { O1, O2, O3, O4 };               \
+        uint32_t addr, data;                                            \
+        uint32_t *qd;                                                   \
+        int y;                                                          \
+        for (beat = 0; beat < 4; beat++, mask >>= 4) {                  \
+            if ((mask & 1) == 0) {                                      \
+                /* ECI says skip this beat */                           \
+                continue;                                               \
+            }                                                           \
+            addr = base + off[beat] * 4;                                \
+            y = (beat + (O1 & 2)) & 3;                                  \
+            qd = (uint32_t *)aa32_vfp_qreg(env, qnidx + y);             \
+            data = qd[H4(off[beat] >> 2)];                              \
+            cpu_stl_le_data_ra(env, addr, data, GETPC());               \
+        }                                                               \
+    }
+
+DO_VST4B(vst40b, 0, 1, 10, 11)
+DO_VST4B(vst41b, 2, 3, 12, 13)
+DO_VST4B(vst42b, 4, 5, 14, 15)
+DO_VST4B(vst43b, 6, 7, 8, 9)
+
+DO_VST4H(vst40h, 0, 5)
+DO_VST4H(vst41h, 1, 6)
+DO_VST4H(vst42h, 2, 7)
+DO_VST4H(vst43h, 3, 4)
+
+DO_VST4W(vst40w, 0, 1, 10, 11)
+DO_VST4W(vst41w, 2, 3, 12, 13)
+DO_VST4W(vst42w, 4, 5, 14, 15)
+DO_VST4W(vst43w, 6, 7, 8, 9)
+
+#define DO_VST2B(OP, O1, O2, O3, O4)                                    \
+    void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx,             \
+                          uint32_t base)                                \
+    {                                                                   \
+        int beat, e;                                                    \
+        uint16_t mask = mve_eci_mask(env);                              \
+        static const uint8_t off[4] = { O1, O2, O3, O4 };               \
+        uint32_t addr, data;                                            \
+        uint8_t *qd;                                                    \
+        for (beat = 0; beat < 4; beat++, mask >>= 4) {                  \
+            if ((mask & 1) == 0) {                                      \
+                /* ECI says skip this beat */                           \
+                continue;                                               \
+            }                                                           \
+            addr = base + off[beat] * 2;                                \
+            data = 0;                                                   \
+            for (e = 3; e >= 0; e--) {                                  \
+                qd = (uint8_t *)aa32_vfp_qreg(env, qnidx + (e & 1));    \
+                data = (data << 8) | qd[H1(off[beat] + (e >> 1))];      \
+            }                                                           \
+            cpu_stl_le_data_ra(env, addr, data, GETPC());               \
+        }                                                               \
+    }
+
+#define DO_VST2H(OP, O1, O2, O3, O4)                                    \
+    void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx,             \
+                          uint32_t base)                                \
+    {                                                                   \
+        int beat;                                                       \
+        uint16_t mask = mve_eci_mask(env);                              \
+        static const uint8_t off[4] = { O1, O2, O3, O4 };               \
+        uint32_t addr, data;                                            \
+        int e;                                                          \
+        uint16_t *qd;                                                   \
+        for (beat = 0; beat < 4; beat++, mask >>= 4) {                  \
+            if ((mask & 1) == 0) {                                      \
+                /* ECI says skip this beat */                           \
+                continue;                                               \
+            }                                                           \
+            addr = base + off[beat] * 4;                                \
+            data = 0;                                                   \
+            for (e = 1; e >= 0; e--) {                                  \
+                qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + e);         \
+                data = (data << 16) | qd[H2(off[beat])];                \
+            }                                                           \
+            cpu_stl_le_data_ra(env, addr, data, GETPC());               \
+        }                                                               \
+    }
+
+#define DO_VST2W(OP, O1, O2, O3, O4)                                    \
+    void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx,             \
+                          uint32_t base)                                \
+    {                                                                   \
+        int beat;                                                       \
+        uint16_t mask = mve_eci_mask(env);                              \
+        static const uint8_t off[4] = { O1, O2, O3, O4 };               \
+        uint32_t addr, data;                                            \
+        uint32_t *qd;                                                   \
+        for (beat = 0; beat < 4; beat++, mask >>= 4) {                  \
+            if ((mask & 1) == 0) {                                      \
+                /* ECI says skip this beat */                           \
+                continue;                                               \
+            }                                                           \
+            addr = base + off[beat];                                    \
+            qd = (uint32_t *)aa32_vfp_qreg(env, qnidx + (beat & 1));    \
+            data = qd[H4(off[beat] >> 3)];                              \
+            cpu_stl_le_data_ra(env, addr, data, GETPC());               \
+        }                                                               \
+    }
+
+DO_VST2B(vst20b, 0, 2, 12, 14)
+DO_VST2B(vst21b, 4, 6, 8, 10)
+
+DO_VST2H(vst20h, 0, 1, 6, 7)
+DO_VST2H(vst21h, 2, 3, 4, 5)
+
+DO_VST2W(vst20w, 0, 4, 24, 28)
+DO_VST2W(vst21w, 8, 12, 16, 20)
+
 /*
  * The mergemask(D, R, M) macro performs the operation "*D = R" but
  * storing only the bytes which correspond to 1 bits in M,
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ static inline int vidup_imm(DisasContext *s, int x)
 
 typedef void MVEGenLdStFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
 typedef void MVEGenLdStSGFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
+typedef void MVEGenLdStIlFn(TCGv_ptr, TCGv_i32, TCGv_i32);
 typedef void MVEGenOneOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
 typedef void MVEGenTwoOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr);
 typedef void MVEGenTwoOpScalarFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
@@ -XXX,XX +XXX,XX @@ static bool trans_VSTRD_sg_imm(DisasContext *s, arg_vldst_sg_imm *a)
     return do_ldst_sg_imm(s, a, fns[a->w], MO_64);
 }
 
+static bool do_vldst_il(DisasContext *s, arg_vldst_il *a, MVEGenLdStIlFn *fn,
+                        int addrinc)
+{
+    TCGv_i32 rn;
+
+    if (!dc_isar_feature(aa32_mve, s) ||
+        !mve_check_qreg_bank(s, a->qd) ||
+        !fn || (a->rn == 13 && a->w) || a->rn == 15) {
+        /* Variously UNPREDICTABLE or UNDEF or related-encoding */
+        return false;
+    }
+    if (!mve_eci_check(s) || !vfp_access_check(s)) {
+        return true;
+    }
+
+    rn = load_reg(s, a->rn);
+    /*
+     * We pass the index of Qd, not a pointer, because the helper must
+     * access multiple Q registers starting at Qd and working up.
+     */
+    fn(cpu_env, tcg_constant_i32(a->qd), rn);
+
+    if (a->w) {
+        tcg_gen_addi_i32(rn, rn, addrinc);
+        store_reg(s, a->rn, rn);
+    } else {
+        tcg_temp_free_i32(rn);
+    }
+    mve_update_and_store_eci(s);
+    return true;
+}
+
+/* This macro is just to make the arrays more compact in these functions */
+#define F(N) gen_helper_mve_##N
+
+static bool trans_VLD2(DisasContext *s, arg_vldst_il *a)
+{
+    static MVEGenLdStIlFn * const fns[4][4] = {
+        { F(vld20b), F(vld20h), F(vld20w), NULL, },
+        { F(vld21b), F(vld21h), F(vld21w), NULL, },
+        { NULL, NULL, NULL, NULL },
+        { NULL, NULL, NULL, NULL },
+    };
+    if (a->qd > 6) {
+        return false;
+    }
+    return do_vldst_il(s, a, fns[a->pat][a->size], 32);
+}
+
+static bool trans_VLD4(DisasContext *s, arg_vldst_il *a)
+{
+    static MVEGenLdStIlFn * const fns[4][4] = {
+        { F(vld40b), F(vld40h), F(vld40w), NULL, },
+        { F(vld41b), F(vld41h), F(vld41w), NULL, },
+        { F(vld42b), F(vld42h), F(vld42w), NULL, },
+        { F(vld43b), F(vld43h), F(vld43w), NULL, },
+    };
+    if (a->qd > 4) {
+        return false;
+    }
+    return do_vldst_il(s, a, fns[a->pat][a->size], 64);
+}
+
+static bool trans_VST2(DisasContext *s, arg_vldst_il *a)
+{
+    static MVEGenLdStIlFn * const fns[4][4] = {
+        { F(vst20b), F(vst20h), F(vst20w), NULL, },
+        { F(vst21b), F(vst21h), F(vst21w), NULL, },
+        { NULL, NULL, NULL, NULL },
+        { NULL, NULL, NULL, NULL },
+    };
+    if (a->qd > 6) {
+        return false;
+    }
+    return do_vldst_il(s, a, fns[a->pat][a->size], 32);
+}
+
+static bool trans_VST4(DisasContext *s, arg_vldst_il *a)
+{
+    static MVEGenLdStIlFn * const fns[4][4] = {
+        { F(vst40b), F(vst40h), F(vst40w), NULL, },
+        { F(vst41b), F(vst41h), F(vst41w), NULL, },
+        { F(vst42b), F(vst42h), F(vst42w), NULL, },
+        { F(vst43b), F(vst43h), F(vst43w), NULL, },
+    };
+    if (a->qd > 4) {
+        return false;
+    }
+    return do_vldst_il(s, a, fns[a->pat][a->size], 64);
+}
+
+#undef F
+
 static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
 {
     TCGv_ptr qd;
-- 
2.20.1

We're about to make a code change to the sdiv and udiv helper
functions, so first fix their indentation and coding style.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210730151636.17254-2-peter.maydell@linaro.org
---
 target/arm/helper.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(uxtb16)(uint32_t x)
 
 int32_t HELPER(sdiv)(int32_t num, int32_t den)
 {
-    if (den == 0)
-      return 0;
-    if (num == INT_MIN && den == -1)
-      return INT_MIN;
+    if (den == 0) {
+        return 0;
+    }
+    if (num == INT_MIN && den == -1) {
+        return INT_MIN;
+    }
     return num / den;
 }
 
 uint32_t HELPER(udiv)(uint32_t num, uint32_t den)
 {
-    if (den == 0)
-      return 0;
+    if (den == 0) {
+        return 0;
+    }
     return num / den;
 }
 
-- 
2.20.1

Unlike A-profile, for M-profile the UDIV and SDIV insns can be
configured to raise an exception on division by zero, using the CCR
DIV_0_TRP bit.

Implement support for setting this bit by making the helper functions
raise the appropriate exception.
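
As a rough illustration of the control flow only, the following
standalone sketch models the trap decision with a toy CPU structure.
ToyCpu, toy_div0_traps(), toy_sdiv() and toy_udiv() are hypothetical
names, not QEMU APIs. The sketch also simplifies one thing: it records
the fault in a flag and returns 0, whereas the real helper raises the
exception and does not return to the caller.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the CPU state; not QEMU's CPUARMState. */
typedef struct {
    bool is_m_profile;  /* models arm_feature(env, ARM_FEATURE_M) */
    bool div_0_trp;     /* models the CCR DIV_0_TRP bit */
    bool usage_fault;   /* set where a DIVBYZERO UsageFault would be raised */
} ToyCpu;

/* True if a division by zero must trap rather than produce 0. */
bool toy_div0_traps(const ToyCpu *cpu)
{
    return cpu->is_m_profile && cpu->div_0_trp;
}

int32_t toy_sdiv(ToyCpu *cpu, int32_t num, int32_t den)
{
    assert(cpu);
    if (den == 0) {
        if (toy_div0_traps(cpu)) {
            /* Simplification: the real helper would not return here. */
            cpu->usage_fault = true;
        }
        return 0;
    }
    if (num == INT_MIN && den == -1) {
        /* Avoid signed overflow, which is undefined behaviour in C. */
        return INT_MIN;
    }
    return num / den;
}

uint32_t toy_udiv(ToyCpu *cpu, uint32_t num, uint32_t den)
{
    assert(cpu);
    if (den == 0) {
        if (toy_div0_traps(cpu)) {
            cpu->usage_fault = true;
        }
        return 0;
    }
    return num / den;
}
```

With the trap bit clear, or on a non-M-profile CPU, the sketch keeps
the non-trapping behaviour (result of 0) described in the patch.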

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210730151636.17254-3-peter.maydell@linaro.org
---
 target/arm/cpu.h       |  1 +
 target/arm/helper.h    |  4 ++--
 target/arm/helper.c    | 19 +++++++++++++++++--
 target/arm/m_helper.c  |  4 ++++
 target/arm/translate.c |  4 ++--
 5 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@
 #define EXCP_LAZYFP         20   /* v7M fault during lazy FP stacking */
 #define EXCP_LSERR          21   /* v8M LSERR SecureFault */
 #define EXCP_UNALIGNED      22   /* v7M UNALIGNED UsageFault */
+#define EXCP_DIVBYZERO      23   /* v7M DIVBYZERO UsageFault */
 /* NB: add new EXCP_ defines to the array in arm_log_exception() too */
 
 #define ARMV7M_EXCP_RESET   1
diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(add_saturate, i32, env, i32, i32)
 DEF_HELPER_3(sub_saturate, i32, env, i32, i32)
 DEF_HELPER_3(add_usaturate, i32, env, i32, i32)
 DEF_HELPER_3(sub_usaturate, i32, env, i32, i32)
-DEF_HELPER_FLAGS_2(sdiv, TCG_CALL_NO_RWG_SE, s32, s32, s32)
-DEF_HELPER_FLAGS_2(udiv, TCG_CALL_NO_RWG_SE, i32, i32, i32)
+DEF_HELPER_FLAGS_3(sdiv, TCG_CALL_NO_RWG, s32, env, s32, s32)
+DEF_HELPER_FLAGS_3(udiv, TCG_CALL_NO_RWG, i32, env, i32, i32)
 DEF_HELPER_FLAGS_1(rbit, TCG_CALL_NO_RWG_SE, i32, i32)
 
 #define PAS_OP(pfx)  \
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sxtb16)(uint32_t x)
     return res;
 }
 
+static void handle_possible_div0_trap(CPUARMState *env, uintptr_t ra)
+{
+    /*
+     * Take a division-by-zero exception if necessary; otherwise return
+     * to get the usual non-trapping division behaviour (result of 0)
+     */
+    if (arm_feature(env, ARM_FEATURE_M)
+        && (env->v7m.ccr[env->v7m.secure] & R_V7M_CCR_DIV_0_TRP_MASK)) {
+        raise_exception_ra(env, EXCP_DIVBYZERO, 0, 1, ra);
+    }
+}
+
 uint32_t HELPER(uxtb16)(uint32_t x)
 {
     uint32_t res;
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(uxtb16)(uint32_t x)
     return res;
 }
 
-int32_t HELPER(sdiv)(int32_t num, int32_t den)
+int32_t HELPER(sdiv)(CPUARMState *env, int32_t num, int32_t den)
 {
     if (den == 0) {
+        handle_possible_div0_trap(env, GETPC());
         return 0;
     }
     if (num == INT_MIN && den == -1) {
@@ -XXX,XX +XXX,XX @@ int32_t HELPER(sdiv)(int32_t num, int32_t den)
     return num / den;
 }
 
-uint32_t HELPER(udiv)(uint32_t num, uint32_t den)
+uint32_t HELPER(udiv)(CPUARMState *env, uint32_t num, uint32_t den)
 {
     if (den == 0) {
+        handle_possible_div0_trap(env, GETPC());
         return 0;
     }
     return num / den;
@@ -XXX,XX +XXX,XX @@ void arm_log_exception(int idx)
             [EXCP_LAZYFP] = "v7M exception during lazy FP stacking",
             [EXCP_LSERR] = "v8M LSERR UsageFault",
             [EXCP_UNALIGNED] = "v7M UNALIGNED UsageFault",
+            [EXCP_DIVBYZERO] = "v7M DIVBYZERO UsageFault",
         };
 
         if (idx >= 0 && idx < ARRAY_SIZE(excnames)) {
diff --git a/target/arm/m_helper.c b/target/arm/m_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/m_helper.c
+++ b/target/arm/m_helper.c
@@ -XXX,XX +XXX,XX @@ void arm_v7m_cpu_do_interrupt(CPUState *cs)
         armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_USAGE, env->v7m.secure);
         env->v7m.cfsr[env->v7m.secure] |= R_V7M_CFSR_UNALIGNED_MASK;
         break;
+    case EXCP_DIVBYZERO:
+        armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_USAGE, env->v7m.secure);
+        env->v7m.cfsr[env->v7m.secure] |= R_V7M_CFSR_DIVBYZERO_MASK;
+        break;
     case EXCP_SWI:
         /* The PC already points to the next instruction.  */
         armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_SVC, env->v7m.secure);
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static bool op_div(DisasContext *s, arg_rrr *a, bool u)
     t1 = load_reg(s, a->rn);
     t2 = load_reg(s, a->rm);
     if (u) {
-        gen_helper_udiv(t1, t1, t2);
+        gen_helper_udiv(t1, cpu_env, t1, t2);
     } else {
-        gen_helper_sdiv(t1, t1, t2);
+        gen_helper_sdiv(t1, cpu_env, t1, t2);
     }
     tcg_temp_free_i32(t2);
     store_reg(s, a->rd, t1);
-- 
2.20.1

From: Hamza Mahfooz <someguy@effective-light.com>

As per commit 5626f8c6d468 ("rcu: Add automatically released rcu_read_lock
variants"), RCU_READ_LOCK_GUARD() should be used instead of
rcu_read_{un}lock().
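
The benefit of a scope guard is that every return path releases the
lock without a shared "unlock" label. The sketch below shows the
general pattern using the GCC/Clang cleanup attribute. It is a toy:
TOY_READ_LOCK_GUARD(), toy_read_lock() and the depth counter are
hypothetical and are not QEMU's RCU implementation.

```c
#include <assert.h>

/* Toy nesting counter standing in for a read-side lock. */
int toy_lock_depth;

void toy_read_lock(void)
{
    toy_lock_depth++;
}

void toy_read_unlock(void)
{
    toy_lock_depth--;
}

/* Cleanup handler: runs automatically when the guard leaves scope. */
void toy_guard_release(int *guard)
{
    (void)guard;
    toy_read_unlock();
}

/* Hypothetical guard macro; assumes the GCC/Clang cleanup attribute. */
#define TOY_READ_LOCK_GUARD() \
    int toy_guard_ __attribute__((cleanup(toy_guard_release), unused)) = \
        (toy_read_lock(), 0)

/* Returns 0 if key is found, 1 otherwise; no explicit unlock needed. */
int toy_lookup(const int *table, int len, int key)
{
    TOY_READ_LOCK_GUARD();

    if (!table) {
        return 1;
    }
    for (int i = 0; i < len; i++) {
        if (table[i] == key) {
            assert(toy_lock_depth == 1);  /* lock is held here */
            return 0;
        }
    }
    return 1;
}
```

This mirrors the shape of the change below, where the early "goto
unlock" exits become plain return statements.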

Signed-off-by: Hamza Mahfooz <someguy@effective-light.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20210727235201.11491-1-someguy@effective-light.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/kvm.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -XXX,XX +XXX,XX @@ int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
     hwaddr xlat, len, doorbell_gpa;
     MemoryRegionSection mrs;
     MemoryRegion *mr;
-    int ret = 1;
 
     if (as == &address_space_memory) {
         return 0;
@@ -XXX,XX +XXX,XX @@ int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
 
     /* MSI doorbell address is translated by an IOMMU */
 
-    rcu_read_lock();
+    RCU_READ_LOCK_GUARD();
+
     mr = address_space_translate(as, address, &xlat, &len, true,
                                  MEMTXATTRS_UNSPECIFIED);
+
     if (!mr) {
-        goto unlock;
+        return 1;
     }
+
     mrs = memory_region_find(mr, xlat, 1);
+
     if (!mrs.mr) {
-        goto unlock;
+        return 1;
     }
 
     doorbell_gpa = mrs.offset_within_address_space;
@@ -XXX,XX +XXX,XX @@ int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
 
     trace_kvm_arm_fixup_msi_route(address, doorbell_gpa);
 
-    ret = 0;
-
-unlock:
-    rcu_read_unlock();
-    return ret;
+    return 0;
 }
 
 int kvm_arch_add_msi_route_post(struct kvm_irq_routing_entry *route,
-- 
2.20.1

From: Jan Luebbe <jlu@pengutronix.de>

Break events are currently only handled by chardev/char-serial.c, so we
just ignore errors, which results in no behaviour change for other
chardevs.
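
The break state is only forwarded when bit 0 of the written value
differs from the stored one. A minimal standalone sketch of that
edge detection follows; ToyUart and toy_write_lcr() are hypothetical
names, and a counter stands in for the ioctl sent to the chardev.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_LCR_BRK 0x1u  /* bit 0, the bit the patch tests */

/* Hypothetical UART state; not the QEMU PL011State. */
typedef struct {
    uint32_t lcr;     /* last value written to the line control register */
    int break_calls;  /* how many times a break change was forwarded */
    int last_break;   /* last forwarded break state (0 or 1) */
} ToyUart;

void toy_write_lcr(ToyUart *u, uint32_t value)
{
    assert(u);
    /* XOR isolates the bits that changed; act only if BRK toggled. */
    if ((u->lcr ^ value) & TOY_LCR_BRK) {
        u->last_break = (int)(value & TOY_LCR_BRK);
        u->break_calls++;  /* stands in for the SET_BREAK ioctl */
    }
    u->lcr = value;
}
```

Writes that change other bits but leave bit 0 alone do not forward
anything, so the backend only sees actual break transitions.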

Signed-off-by: Jan Luebbe <jlu@pengutronix.de>
Message-id: 20210806144700.3751979-1-jlu@pengutronix.de
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/char/pl011.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/hw/char/pl011.c b/hw/char/pl011.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/char/pl011.c
+++ b/hw/char/pl011.c
@@ -XXX,XX +XXX,XX @@
 #include "hw/qdev-properties-system.h"
 #include "migration/vmstate.h"
 #include "chardev/char-fe.h"
+#include "chardev/char-serial.h"
 #include "qemu/log.h"
 #include "qemu/module.h"
 #include "trace.h"
@@ -XXX,XX +XXX,XX @@ static void pl011_write(void *opaque, hwaddr offset,
             s->read_count = 0;
             s->read_pos = 0;
         }
+        if ((s->lcr ^ value) & 0x1) {
+            int break_enable = value & 0x1;
+            qemu_chr_fe_ioctl(&s->chr, CHR_IOCTL_SERIAL_SET_BREAK,
+                              &break_enable);
+        }
         s->lcr = value;
         pl011_set_read_trigger(s);
         break;
-- 
2.20.1

From: Guenter Roeck <linux@roeck-us.net>

Instantiate SAI1/2/3 and ASRC as unimplemented devices to avoid random
Linux kernel crashes, such as

Unhandled fault: external abort on non-linefetch (0x808) at 0xd1580010
pgd = (ptrval)
[d1580010] *pgd=8231b811, *pte=02034653, *ppte=02034453
Internal error: : 808 [#1] SMP ARM
...
[<c095e974>] (regmap_mmio_write32le) from [<c095eb48>] (regmap_mmio_write+0x3c/0x54)
[<c095eb48>] (regmap_mmio_write) from [<c09580f4>] (_regmap_write+0x4c/0x1f0)
[<c09580f4>] (_regmap_write) from [<c095837c>] (_regmap_update_bits+0xe4/0xec)
[<c095837c>] (_regmap_update_bits) from [<c09599b4>] (regmap_update_bits_base+0x50/0x74)
[<c09599b4>] (regmap_update_bits_base) from [<c0d3e9e4>] (fsl_asrc_runtime_resume+0x1e4/0x21c)
[<c0d3e9e4>] (fsl_asrc_runtime_resume) from [<c0942464>] (__rpm_callback+0x3c/0x108)
[<c0942464>] (__rpm_callback) from [<c0942590>] (rpm_callback+0x60/0x64)
[<c0942590>] (rpm_callback) from [<c0942b60>] (rpm_resume+0x5cc/0x808)
[<c0942b60>] (rpm_resume) from [<c0942dfc>] (__pm_runtime_resume+0x60/0xa0)
[<c0942dfc>] (__pm_runtime_resume) from [<c0d3ecc4>] (fsl_asrc_probe+0x2a8/0x708)
[<c0d3ecc4>] (fsl_asrc_probe) from [<c0935b08>] (platform_probe+0x58/0xb8)
[<c0935b08>] (platform_probe) from [<c0933264>] (really_probe.part.0+0x9c/0x334)
[<c0933264>] (really_probe.part.0) from [<c093359c>] (__driver_probe_device+0xa0/0x138)
[<c093359c>] (__driver_probe_device) from [<c0933664>] (driver_probe_device+0x30/0xc8)
[<c0933664>] (driver_probe_device) from [<c0933c88>] (__driver_attach+0x90/0x130)
[<c0933c88>] (__driver_attach) from [<c0931060>] (bus_for_each_dev+0x78/0xb8)
[<c0931060>] (bus_for_each_dev) from [<c093254c>] (bus_add_driver+0xf0/0x1d8)
[<c093254c>] (bus_add_driver) from [<c0934a30>] (driver_register+0x88/0x118)
[<c0934a30>] (driver_register) from [<c01022c0>] (do_one_initcall+0x7c/0x3a4)
[<c01022c0>] (do_one_initcall) from [<c1601204>] (kernel_init_freeable+0x198/0x22c)
[<c1601204>] (kernel_init_freeable) from [<c0f5ff2c>] (kernel_init+0x10/0x128)
[<c0f5ff2c>] (kernel_init) from [<c010013c>] (ret_from_fork+0x14/0x38)

Unhandled fault: external abort on non-linefetch (0x808) at 0xd19b0000
pgd = (ptrval)
[d19b0000] *pgd=82711811, *pte=308a0653, *ppte=308a0453
Internal error: : 808 [#1] SMP ARM
...
[<c095e974>] (regmap_mmio_write32le) from [<c095eb48>] (regmap_mmio_write+0x3c/0x54)
[<c095eb48>] (regmap_mmio_write) from [<c09580f4>] (_regmap_write+0x4c/0x1f0)
[<c09580f4>] (_regmap_write) from [<c0959b28>] (regmap_write+0x3c/0x60)
[<c0959b28>] (regmap_write) from [<c0d41130>] (fsl_sai_runtime_resume+0x9c/0x1ec)
[<c0d41130>] (fsl_sai_runtime_resume) from [<c0942464>] (__rpm_callback+0x3c/0x108)
[<c0942464>] (__rpm_callback) from [<c0942590>] (rpm_callback+0x60/0x64)
[<c0942590>] (rpm_callback) from [<c0942b60>] (rpm_resume+0x5cc/0x808)
[<c0942b60>] (rpm_resume) from [<c0942dfc>] (__pm_runtime_resume+0x60/0xa0)
[<c0942dfc>] (__pm_runtime_resume) from [<c0d4231c>] (fsl_sai_probe+0x2b8/0x65c)
[<c0d4231c>] (fsl_sai_probe) from [<c0935b08>] (platform_probe+0x58/0xb8)
[<c0935b08>] (platform_probe) from [<c0933264>] (really_probe.part.0+0x9c/0x334)
[<c0933264>] (really_probe.part.0) from [<c093359c>] (__driver_probe_device+0xa0/0x138)
[<c093359c>] (__driver_probe_device) from [<c0933664>] (driver_probe_device+0x30/0xc8)
[<c0933664>] (driver_probe_device) from [<c0933c88>] (__driver_attach+0x90/0x130)
[<c0933c88>] (__driver_attach) from [<c0931060>] (bus_for_each_dev+0x78/0xb8)
[<c0931060>] (bus_for_each_dev) from [<c093254c>] (bus_add_driver+0xf0/0x1d8)
[<c093254c>] (bus_add_driver) from [<c0934a30>] (driver_register+0x88/0x118)
[<c0934a30>] (driver_register) from [<c01022c0>] (do_one_initcall+0x7c/0x3a4)
[<c01022c0>] (do_one_initcall) from [<c1601204>] (kernel_init_freeable+0x198/0x22c)
[<c1601204>] (kernel_init_freeable) from [<c0f5ff2c>] (kernel_init+0x10/0x128)
[<c0f5ff2c>] (kernel_init) from [<c010013c>] (ret_from_fork+0x14/0x38)

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Message-id: 20210810160318.87376-1-linux@roeck-us.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/fsl-imx6ul.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/hw/arm/fsl-imx6ul.c b/hw/arm/fsl-imx6ul.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/fsl-imx6ul.c
+++ b/hw/arm/fsl-imx6ul.c
@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_realize(DeviceState *dev, Error **errp)
      */
     create_unimplemented_device("sdma", FSL_IMX6UL_SDMA_ADDR, 0x4000);
 
+    /*
+     * SAI (Audio SSI (Synchronous Serial Interface))
+     */
+    create_unimplemented_device("sai1", FSL_IMX6UL_SAI1_ADDR, 0x4000);
+    create_unimplemented_device("sai2", FSL_IMX6UL_SAI2_ADDR, 0x4000);
+    create_unimplemented_device("sai3", FSL_IMX6UL_SAI3_ADDR, 0x4000);
+
     /*
      * PWM
      */
@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_realize(DeviceState *dev, Error **errp)
     create_unimplemented_device("pwm3", FSL_IMX6UL_PWM3_ADDR, 0x4000);
     create_unimplemented_device("pwm4", FSL_IMX6UL_PWM4_ADDR, 0x4000);
 
+    /*
+     * Audio ASRC (asynchronous sample rate converter)
+     */
+    create_unimplemented_device("asrc", FSL_IMX6UL_ASRC_ADDR, 0x4000);
+
     /*
      * CAN
      */
-- 
2.20.1

From: "Wen, Jianxian" <Jianxian.Wen@verisilicon.com>

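Add a 'memory' link property so that the PL330 can be connected to an
IOMMU memory region, which allows its DMA accesses to go through SMMU
translation instead of always using the default system address space.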
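
The realize-time decision in this patch has three outcomes: fail if
the link is unset, reuse the global address space for system memory,
or create a private address space otherwise. The standalone sketch
below models only that decision. ToyRegion, ToySpace, ToyDma and
toy_dma_realize() are hypothetical names, not QEMU types or functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a memory region and an address space. */
typedef struct { const char *name; } ToyRegion;
typedef struct { const ToyRegion *root; } ToySpace;

ToyRegion toy_system_memory = { "system" };
ToySpace toy_system_space = { &toy_system_memory };

typedef struct {
    const ToyRegion *mem_mr;  /* set by the board through the link */
    ToySpace *mem_as;         /* chosen at realize time */
    ToySpace private_as;      /* storage for a device-private space */
} ToyDma;

/* Returns 0 on success, -1 if the link was never set. */
int toy_dma_realize(ToyDma *d)
{
    assert(d);
    if (!d->mem_mr) {
        return -1;
    }
    if (d->mem_mr == &toy_system_memory) {
        /* Share the existing space rather than creating a duplicate. */
        d->mem_as = &toy_system_space;
    } else {
        d->private_as.root = d->mem_mr;
        d->mem_as = &d->private_as;
    }
    return 0;
}
```

All later DMA reads and writes then go through the chosen space, which
is why the boards below must set the link before realizing the device.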

Signed-off-by: Jianxian Wen <jianxian.wen@verisilicon.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 4C23C17B8E87E74E906A25A3254A03F4FA1FEC31@SHASXM03.verisilicon.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/exynos4210.c  |  3 +++
 hw/arm/xilinx_zynq.c |  3 +++
 hw/dma/pl330.c       | 26 ++++++++++++++++++++++----
 3 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/exynos4210.c
+++ b/hw/arm/exynos4210.c
@@ -XXX,XX +XXX,XX @@ static DeviceState *pl330_create(uint32_t base, qemu_or_irq *orgate,
     int i;
 
     dev = qdev_new("pl330");
+    object_property_set_link(OBJECT(dev), "memory",
+                             OBJECT(get_system_memory()),
+                             &error_fatal);
     qdev_prop_set_uint8(dev, "num_events", nevents);
     qdev_prop_set_uint8(dev, "num_chnls",  8);
     qdev_prop_set_uint8(dev, "num_periph_req",  nreq);
diff --git a/hw/arm/xilinx_zynq.c b/hw/arm/xilinx_zynq.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/xilinx_zynq.c
+++ b/hw/arm/xilinx_zynq.c
@@ -XXX,XX +XXX,XX @@ static void zynq_init(MachineState *machine)
     sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0, pic[39-IRQ_OFFSET]);
 
     dev = qdev_new("pl330");
+    object_property_set_link(OBJECT(dev), "memory",
+                             OBJECT(address_space_mem),
+                             &error_fatal);
     qdev_prop_set_uint8(dev, "num_chnls",  8);
     qdev_prop_set_uint8(dev, "num_periph_req",  4);
     qdev_prop_set_uint8(dev, "num_events",  16);
diff --git a/hw/dma/pl330.c b/hw/dma/pl330.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/dma/pl330.c
+++ b/hw/dma/pl330.c
@@ -XXX,XX +XXX,XX @@ struct PL330State {
     uint8_t num_faulting;
     uint8_t periph_busy[PL330_PERIPH_NUM];
 
+    /* Memory region that DMA operations access */
+    MemoryRegion *mem_mr;
+    AddressSpace *mem_as;
 };
 
 #define TYPE_PL330 "pl330"
@@ -XXX,XX +XXX,XX @@ static inline const PL330InsnDesc *pl330_fetch_insn(PL330Chan *ch)
     uint8_t opcode;
     int i;
 
-    dma_memory_read(&address_space_memory, ch->pc, &opcode, 1);
+    dma_memory_read(ch->parent->mem_as, ch->pc, &opcode, 1);
     for (i = 0; insn_desc[i].size; i++) {
         if ((opcode & insn_desc[i].opmask) == insn_desc[i].opcode) {
             return &insn_desc[i];
@@ -XXX,XX +XXX,XX @@ static inline void pl330_exec_insn(PL330Chan *ch, const PL330InsnDesc *insn)
     uint8_t buf[PL330_INSN_MAXSIZE];
 
     assert(insn->size <= PL330_INSN_MAXSIZE);
-    dma_memory_read(&address_space_memory, ch->pc, buf, insn->size);
+    dma_memory_read(ch->parent->mem_as, ch->pc, buf, insn->size);
     insn->exec(ch, buf[0], &buf[1], insn->size - 1);
 }
 
@@ -XXX,XX +XXX,XX @@ static int pl330_exec_cycle(PL330Chan *channel)
     if (q != NULL && q->len <= pl330_fifo_num_free(&s->fifo)) {
         int len = q->len - (q->addr & (q->len - 1));
 
-        dma_memory_read(&address_space_memory, q->addr, buf, len);
+        dma_memory_read(s->mem_as, q->addr, buf, len);
         trace_pl330_exec_cycle(q->addr, len);
         if (trace_event_get_state_backends(TRACE_PL330_HEXDUMP)) {
             pl330_hexdump(buf, len);
@@ -XXX,XX +XXX,XX @@ static int pl330_exec_cycle(PL330Chan *channel)
             fifo_res = pl330_fifo_get(&s->fifo, buf, len, q->tag);
         }
         if (fifo_res == PL330_FIFO_OK || q->z) {
-            dma_memory_write(&address_space_memory, q->addr, buf, len);
+            dma_memory_write(s->mem_as, q->addr, buf, len);
             trace_pl330_exec_cycle(q->addr, len);
             if (trace_event_get_state_backends(TRACE_PL330_HEXDUMP)) {
                 pl330_hexdump(buf, len);
@@ -XXX,XX +XXX,XX @@ static void pl330_realize(DeviceState *dev, Error **errp)
                           "dma", PL330_IOMEM_SIZE);
     sysbus_init_mmio(SYS_BUS_DEVICE(dev), &s->iomem);
 
+    if (!s->mem_mr) {
+        error_setg(errp, "'memory' link is not set");
+        return;
+    } else if (s->mem_mr == get_system_memory()) {
+        /* Avoid creating new AS for system memory. */
+        s->mem_as = &address_space_memory;
+    } else {
+        s->mem_as = g_new0(AddressSpace, 1);
+        address_space_init(s->mem_as, s->mem_mr,
+                           memory_region_name(s->mem_mr));
+    }
+
     s->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, pl330_exec_cycle_timer, s);
 
     s->cfg[0] = (s->mgr_ns_at_rst ? 0x4 : 0) |
@@ -XXX,XX +XXX,XX @@ static Property pl330_properties[] = {
     DEFINE_PROP_UINT8("rd_q_dep", PL330State, rd_q_dep, 16),
     DEFINE_PROP_UINT16("data_buffer_dep", PL330State, data_buffer_dep, 256),
 
+    DEFINE_PROP_LINK("memory", PL330State, mem_mr,
+                     TYPE_MEMORY_REGION, MemoryRegion *),
+
     DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.20.1

From: Eduardo Habkost <ehabkost@redhat.com>

The SBSA_GWDT enum value conflicts with the SBSA_GWDT() QOM type
checking helper, preventing us from using an OBJECT_DEFINE* or
DEFINE_INSTANCE_CHECKER macro for the SBSA_GWDT() wrapper.

If I understand the SBSA 6.0 specification correctly, the signal
being connected to IRQ 16 is the WS0 output signal from the
Generic Watchdog.  Rename the enum value to SBSA_GWDT_WS0 to be
more explicit and avoid the name conflict.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-id: 20210806023119.431680-1-ehabkost@redhat.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/sbsa-ref.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/sbsa-ref.c
+++ b/hw/arm/sbsa-ref.c
@@ -XXX,XX +XXX,XX @@ enum {
     SBSA_GIC_DIST,
     SBSA_GIC_REDIST,
     SBSA_SECURE_EC,
-    SBSA_GWDT,
+    SBSA_GWDT_WS0,
     SBSA_GWDT_REFRESH,
     SBSA_GWDT_CONTROL,
     SBSA_SMMU,
@@ -XXX,XX +XXX,XX @@ static const int sbsa_ref_irqmap[] = {
     [SBSA_AHCI] = 10,
     [SBSA_EHCI] = 11,
     [SBSA_SMMU] = 12, /* ... to 15 */
-    [SBSA_GWDT] = 16,
+    [SBSA_GWDT_WS0] = 16,
 };
 
 static const char * const valid_cpus[] = {
@@ -XXX,XX +XXX,XX @@ static void create_wdt(const SBSAMachineState *sms)
     hwaddr cbase = sbsa_ref_memmap[SBSA_GWDT_CONTROL].base;
     DeviceState *dev = qdev_new(TYPE_WDT_SBSA);
     SysBusDevice *s = SYS_BUS_DEVICE(dev);
-    int irq = sbsa_ref_irqmap[SBSA_GWDT];
+    int irq = sbsa_ref_irqmap[SBSA_GWDT_WS0];
 
     sysbus_realize_and_unref(s, &error_fatal);
     sysbus_mmio_map(s, 0, rbase);
-- 
2.20.1

From: Guenter Roeck <linux@roeck-us.net>

Instantiate SAI1/2/3 as unimplemented devices to avoid Linux kernel crashes
such as the following.

Unhandled fault: external abort on non-linefetch (0x808) at 0xd19b0000
pgd = (ptrval)
[d19b0000] *pgd=82711811, *pte=308a0653, *ppte=308a0453
Internal error: : 808 [#1] SMP ARM
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0-rc5 #1
...
[<c095e974>] (regmap_mmio_write32le) from [<c095eb48>] (regmap_mmio_write+0x3c/0x54)
[<c095eb48>] (regmap_mmio_write) from [<c09580f4>] (_regmap_write+0x4c/0x1f0)
[<c09580f4>] (_regmap_write) from [<c0959b28>] (regmap_write+0x3c/0x60)
[<c0959b28>] (regmap_write) from [<c0d41130>] (fsl_sai_runtime_resume+0x9c/0x1ec)
[<c0d41130>] (fsl_sai_runtime_resume) from [<c0942464>] (__rpm_callback+0x3c/0x108)
[<c0942464>] (__rpm_callback) from [<c0942590>] (rpm_callback+0x60/0x64)
[<c0942590>] (rpm_callback) from [<c0942b60>] (rpm_resume+0x5cc/0x808)
[<c0942b60>] (rpm_resume) from [<c0942dfc>] (__pm_runtime_resume+0x60/0xa0)
[<c0942dfc>] (__pm_runtime_resume) from [<c0d4231c>] (fsl_sai_probe+0x2b8/0x65c)
[<c0d4231c>] (fsl_sai_probe) from [<c0935b08>] (platform_probe+0x58/0xb8)
[<c0935b08>] (platform_probe) from [<c0933264>] (really_probe.part.0+0x9c/0x334)
[<c0933264>] (really_probe.part.0) from [<c093359c>] (__driver_probe_device+0xa0/0x138)
[<c093359c>] (__driver_probe_device) from [<c0933664>] (driver_probe_device+0x30/0xc8)
[<c0933664>] (driver_probe_device) from [<c0933c88>] (__driver_attach+0x90/0x130)
[<c0933c88>] (__driver_attach) from [<c0931060>] (bus_for_each_dev+0x78/0xb8)
[<c0931060>] (bus_for_each_dev) from [<c093254c>] (bus_add_driver+0xf0/0x1d8)
[<c093254c>] (bus_add_driver) from [<c0934a30>] (driver_register+0x88/0x118)
[<c0934a30>] (driver_register) from [<c01022c0>] (do_one_initcall+0x7c/0x3a4)
[<c01022c0>] (do_one_initcall) from [<c1601204>] (kernel_init_freeable+0x198/0x22c)
[<c1601204>] (kernel_init_freeable) from [<c0f5ff2c>] (kernel_init+0x10/0x128)
[<c0f5ff2c>] (kernel_init) from [<c010013c>] (ret_from_fork+0x14/0x38)

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Message-id: 20210810175607.538090-1-linux@roeck-us.net
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/arm/fsl-imx7.h | 5 +++++
 hw/arm/fsl-imx7.c         | 7 +++++++
 2 files changed, 12 insertions(+)

diff --git a/include/hw/arm/fsl-imx7.h b/include/hw/arm/fsl-imx7.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/fsl-imx7.h
+++ b/include/hw/arm/fsl-imx7.h
@@ -XXX,XX +XXX,XX @@ enum FslIMX7MemoryMap {
     FSL_IMX7_UART6_ADDR           = 0x30A80000,
     FSL_IMX7_UART7_ADDR           = 0x30A90000,
 
+    FSL_IMX7_SAI1_ADDR            = 0x308A0000,
+    FSL_IMX7_SAI2_ADDR            = 0x308B0000,
+    FSL_IMX7_SAI3_ADDR            = 0x308C0000,
+    FSL_IMX7_SAIn_SIZE            = 0x10000,
+
     FSL_IMX7_ENET1_ADDR           = 0x30BE0000,
     FSL_IMX7_ENET2_ADDR           = 0x30BF0000,
 
diff --git a/hw/arm/fsl-imx7.c b/hw/arm/fsl-imx7.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/fsl-imx7.c
+++ b/hw/arm/fsl-imx7.c
@@ -XXX,XX +XXX,XX @@ static void fsl_imx7_realize(DeviceState *dev, Error **errp)
     create_unimplemented_device("can1", FSL_IMX7_CAN1_ADDR, FSL_IMX7_CANn_SIZE);
     create_unimplemented_device("can2", FSL_IMX7_CAN2_ADDR, FSL_IMX7_CANn_SIZE);
 
+    /*
+     * SAI (Audio SSI (Synchronous Serial Interface))
+     */
+    create_unimplemented_device("sai1", FSL_IMX7_SAI1_ADDR, FSL_IMX7_SAIn_SIZE);
+    create_unimplemented_device("sai2", FSL_IMX7_SAI2_ADDR, FSL_IMX7_SAIn_SIZE);
+    create_unimplemented_device("sai3", FSL_IMX7_SAI3_ADDR, FSL_IMX7_SAIn_SIZE);
+
     /*
      * OCOTP
      */
-- 
2.20.1

From: Sebastian Meyer <meyer@absint.com>

With gdb 9.0 and newer it is possible to connect to a gdbstub
over unix sockets, which is better than a TCP socket connection
in some situations. The QEMU command line to set this up is
non-obvious; document it.

Signed-off-by: Sebastian Meyer <meyer@absint.com>
Message-id: 162867284829.27377.4784930719350564918-0@git.sr.ht
[PMM: Tweaked commit message; adjusted wording in a couple of
places; fixed rST formatting issue; moved section up out of
the 'advanced debugging options' subsection]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 docs/system/gdb.rst | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/docs/system/gdb.rst b/docs/system/gdb.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/gdb.rst
+++ b/docs/system/gdb.rst
@@ -XXX,XX +XXX,XX @@ The ``-s`` option will make QEMU listen for an incoming connection
 from gdb on TCP port 1234, and ``-S`` will make QEMU not start the
 guest until you tell it to from gdb. (If you want to specify which
 TCP port to use or to use something other than TCP for the gdbstub
-connection, use the ``-gdb dev`` option instead of ``-s``.)
+connection, use the ``-gdb dev`` option instead of ``-s``. See
+`Using unix sockets`_ for an example.)
 
 .. parsed-literal::
 
@@ -XXX,XX +XXX,XX @@ not just those in the cluster you are currently working on::
 
   (gdb) set schedule-multiple on
 
+Using unix sockets
+==================
+
+An alternate method for connecting gdb to the QEMU gdbstub is to use
+a unix socket (if supported by your operating system). This is useful when
+running several tests in parallel, or if you do not have a known free TCP
+port (e.g. when running automated tests).
+
+First create a chardev with the appropriate options, then
+instruct the gdbserver to use that device:
+
+.. parsed-literal::
+
+   |qemu_system| -chardev socket,path=/tmp/gdb-socket,server=on,wait=off,id=gdb0 -gdb chardev:gdb0 -S ...
+
+Start gdb as before, but this time connect using the path to
+the socket::
+
+   (gdb) target remote /tmp/gdb-socket
+
+Note that to use a unix socket for the connection you will need
+gdb version 9.0 or newer.
+
 Advanced debugging options
 ==========================
 
-- 
2.20.1

The following changes since commit 5767815218efd3cbfd409505ed824d5f356044ae:

Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging (2024-02-14 15:45:52 +0000)

are available in the Git repository at:

https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20240215

for you to fetch changes up to f780e63fe731b058fe52d43653600d8729a1b5f2:

docs: Add documentation for the mps3-an536 board (2024-02-15 14:32:39 +0000)

----------------------------------------------------------------
target-arm queue:
 * hw/arm/xilinx_zynq: Wire FIQ between CPU <> GIC
 * linux-user/aarch64: Choose SYNC as the preferred MTE mode
 * Fix some errors in SVE/SME handling of MTE tags
 * hw/pci-host/raven.c: Mark raven_io_ops as implementing unaligned accesses
 * hw/block/tc58128: Don't emit deprecation warning under qtest
 * tests/qtest: Fix handling of npcm7xx and GMAC tests
 * hw/arm/virt: Wire up non-secure EL2 virtual timer IRQ
 * tests/qtest/npcm7xx_emc-test: Connect all NICs to a backend
 * Don't assert on vmload/vmsave of M-profile CPUs
 * hw/arm/smmuv3: add support for stage 1 access fault
 * hw/arm/stellaris: QOM cleanups
 * Use new CBAR encoding for all v8 CPUs, not all aarch64 CPUs
 * Improve Cortex-R52 IMPDEF sysreg modelling
 * Allow access to SPSR_hyp from hyp mode
 * New board model mps3-an536 (Cortex-R52)

----------------------------------------------------------------
Luc Michel (1):
      hw/arm/smmuv3: add support for stage 1 access fault

Nabih Estefan (1):
      tests/qtest: Fix GMAC test to run on a machine in upstream QEMU

Peter Maydell (22):
      hw/pci-host/raven.c: Mark raven_io_ops as implementing unaligned accesses
      hw/block/tc58128: Don't emit deprecation warning under qtest
      tests/qtest/meson.build: Don't include qtests_npcm7xx in qtests_aarch64
      tests/qtest/bios-tables-test: Allow changes to virt GTDT
      hw/arm/virt: Wire up non-secure EL2 virtual timer IRQ
      tests/qtest/bios-tables-tests: Update virt golden reference
      hw/arm/npcm7xx: Call qemu_configure_nic_device() for GMAC modules
      tests/qtest/npcm7xx_emc-test: Connect all NICs to a backend
      target/arm: Don't get MDCR_EL2 in pmu_counter_enabled() before checking ARM_FEATURE_PMU
      target/arm: Use new CBAR encoding for all v8 CPUs, not all aarch64 CPUs
      target/arm: The Cortex-R52 has a read-only CBAR
      target/arm: Add Cortex-R52 IMPDEF sysregs
      target/arm: Allow access to SPSR_hyp from hyp mode
      hw/misc/mps2-scc: Fix condition for CFG3 register
      hw/misc/mps2-scc: Factor out which-board conditionals
      hw/misc/mps2-scc: Make changes needed for AN536 FPGA image
      hw/arm/mps3r: Initial skeleton for mps3-an536 board
      hw/arm/mps3r: Add CPUs, GIC, and per-CPU RAM
      hw/arm/mps3r: Add UARTs
      hw/arm/mps3r: Add GPIO, watchdog, dual-timer, I2C devices
      hw/arm/mps3r: Add remaining devices
      docs: Add documentation for the mps3-an536 board

Philippe Mathieu-Daudé (5):
      hw/arm/xilinx_zynq: Wire FIQ between CPU <> GIC
      hw/arm/stellaris: Convert ADC controller to Resettable interface
      hw/arm/stellaris: Convert I2C controller to Resettable interface
      hw/arm/stellaris: Add missing QOM 'machine' parent
      hw/arm/stellaris: Add missing QOM 'SoC' parent

Richard Henderson (6):
      linux-user/aarch64: Choose SYNC as the preferred MTE mode
      target/arm: Fix nregs computation in do_{ld,st}_zpa
      target/arm: Adjust and validate mtedesc sizem1
      target/arm: Split out make_svemte_desc
      target/arm: Handle mte in do_ldrq, do_ldro
      target/arm: Fix SVE/SME gross MTE suppression checks

From: Philippe Mathieu-Daudé <philmd@linaro.org>

Similarly to commits dadbb58f59..5ae79fe825 for other ARM boards,
connect FIQ output of the GIC CPU interfaces to the CPU.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20240130152548.17855-1-philmd@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/xilinx_zynq.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/arm/xilinx_zynq.c b/hw/arm/xilinx_zynq.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/xilinx_zynq.c
+++ b/hw/arm/xilinx_zynq.c
@@ -XXX,XX +XXX,XX @@ static void zynq_init(MachineState *machine)
     sysbus_mmio_map(busdev, 0, MPCORE_PERIPHBASE);
     sysbus_connect_irq(busdev, 0,
                        qdev_get_gpio_in(DEVICE(cpu), ARM_CPU_IRQ));
+    sysbus_connect_irq(busdev, 1,
+                       qdev_get_gpio_in(DEVICE(cpu), ARM_CPU_FIQ));
 
     for (n = 0; n < 64; n++) {
         pic[n] = qdev_get_gpio_in(dev, n);
-- 
2.34.1