The following changes since commit 5767815218efd3cbfd409505ed824d5f356044ae:

  Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging (2024-02-14 15:45:52 +0000)

are available in the Git repository at:

  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20240215

for you to fetch changes up to f780e63fe731b058fe52d43653600d8729a1b5f2:

  docs: Add documentation for the mps3-an536 board (2024-02-15 14:32:39 +0000)

----------------------------------------------------------------
target-arm queue:
 * hw/arm/xilinx_zynq: Wire FIQ between CPU <> GIC
 * linux-user/aarch64: Choose SYNC as the preferred MTE mode
 * Fix some errors in SVE/SME handling of MTE tags
 * hw/pci-host/raven.c: Mark raven_io_ops as implementing unaligned accesses
 * hw/block/tc58128: Don't emit deprecation warning under qtest
 * tests/qtest: Fix handling of npcm7xx and GMAC tests
 * hw/arm/virt: Wire up non-secure EL2 virtual timer IRQ
 * tests/qtest/npcm7xx_emc-test: Connect all NICs to a backend
 * Don't get MDCR_EL2 in pmu_counter_enabled() before checking ARM_FEATURE_PMU
 * hw/arm/smmuv3: add support for stage 1 access fault
 * hw/arm/stellaris: QOM cleanups
 * Use new CBAR encoding for all v8 CPUs, not all aarch64 CPUs
 * Improve Cortex-R52 IMPDEF sysreg modelling
 * Allow access to SPSR_hyp from hyp mode
 * New board model mps3-an536 (Cortex-R52)

----------------------------------------------------------------
Luc Michel (1):
      hw/arm/smmuv3: add support for stage 1 access fault

Nabih Estefan (1):
      tests/qtest: Fix GMAC test to run on a machine in upstream QEMU

Peter Maydell (22):
      hw/pci-host/raven.c: Mark raven_io_ops as implementing unaligned accesses
      hw/block/tc58128: Don't emit deprecation warning under qtest
      tests/qtest/meson.build: Don't include qtests_npcm7xx in qtests_aarch64
      tests/qtest/bios-tables-test: Allow changes to virt GTDT
      hw/arm/virt: Wire up non-secure EL2 virtual timer IRQ
      tests/qtest/bios-tables-tests: Update virt golden reference
      hw/arm/npcm7xx: Call qemu_configure_nic_device() for GMAC modules
      tests/qtest/npcm7xx_emc-test: Connect all NICs to a backend
      target/arm: Don't get MDCR_EL2 in pmu_counter_enabled() before checking ARM_FEATURE_PMU
      target/arm: Use new CBAR encoding for all v8 CPUs, not all aarch64 CPUs
      target/arm: The Cortex-R52 has a read-only CBAR
      target/arm: Add Cortex-R52 IMPDEF sysregs
      target/arm: Allow access to SPSR_hyp from hyp mode
      hw/misc/mps2-scc: Fix condition for CFG3 register
      hw/misc/mps2-scc: Factor out which-board conditionals
      hw/misc/mps2-scc: Make changes needed for AN536 FPGA image
      hw/arm/mps3r: Initial skeleton for mps3-an536 board
      hw/arm/mps3r: Add CPUs, GIC, and per-CPU RAM
      hw/arm/mps3r: Add UARTs
      hw/arm/mps3r: Add GPIO, watchdog, dual-timer, I2C devices
      hw/arm/mps3r: Add remaining devices
      docs: Add documentation for the mps3-an536 board

Philippe Mathieu-Daudé (5):
      hw/arm/xilinx_zynq: Wire FIQ between CPU <> GIC
      hw/arm/stellaris: Convert ADC controller to Resettable interface
      hw/arm/stellaris: Convert I2C controller to Resettable interface
      hw/arm/stellaris: Add missing QOM 'machine' parent
      hw/arm/stellaris: Add missing QOM 'SoC' parent

Richard Henderson (6):
      linux-user/aarch64: Choose SYNC as the preferred MTE mode
      target/arm: Fix nregs computation in do_{ld,st}_zpa
      target/arm: Adjust and validate mtedesc sizem1
      target/arm: Split out make_svemte_desc
      target/arm: Handle mte in do_ldrq, do_ldro
      target/arm: Fix SVE/SME gross MTE suppression checks

 MAINTAINERS | 3 +-
 docs/system/arm/mps2.rst | 37 +-
 configs/devices/arm-softmmu/default.mak | 1 +
 hw/arm/smmuv3-internal.h | 1 +
 include/hw/arm/smmu-common.h | 1 +
 include/hw/arm/virt.h | 2 +
 include/hw/misc/mps2-scc.h | 1 +
 linux-user/aarch64/target_prctl.h | 29 +-
 target/arm/internals.h | 2 +-
 target/arm/tcg/translate-a64.h | 2 +
 hw/arm/mps3r.c | 640 ++++++++++++++++++++++++++++++++
 hw/arm/npcm7xx.c | 1 +
 hw/arm/smmu-common.c | 11 +
 hw/arm/smmuv3.c | 1 +
 hw/arm/stellaris.c | 47 ++-
 hw/arm/virt-acpi-build.c | 20 +-
 hw/arm/virt.c | 60 ++-
 hw/arm/xilinx_zynq.c | 2 +
 hw/block/tc58128.c | 4 +-
 hw/misc/mps2-scc.c | 138 ++++++-
 hw/pci-host/raven.c | 1 +
 target/arm/helper.c | 14 +-
 target/arm/tcg/cpu32.c | 109 ++++++
 target/arm/tcg/op_helper.c | 43 ++-
 target/arm/tcg/sme_helper.c | 8 +-
 target/arm/tcg/sve_helper.c | 12 +-
 target/arm/tcg/translate-sme.c | 15 +-
 target/arm/tcg/translate-sve.c | 83 +++--
 target/arm/tcg/translate.c | 19 +-
 tests/qtest/npcm7xx_emc-test.c | 5 +-
 tests/qtest/npcm_gmac-test.c | 84 +----
 hw/arm/Kconfig | 5 +
 hw/arm/meson.build | 1 +
 tests/data/acpi/virt/FACP | Bin 276 -> 276 bytes
 tests/data/acpi/virt/GTDT | Bin 96 -> 104 bytes
 tests/qtest/meson.build | 4 +-
 36 files changed, 1184 insertions(+), 222 deletions(-)
 create mode 100644 hw/arm/mps3r.c
Deleted patch
Generic code in target/arm wants to call acpi_ghes_record_errors();
provide a stub version so that we don't fail to link when
CONFIG_ACPI_APEI is not set. This requires us to add a new
ghes-stub.c file to contain it and the meson.build mechanics
to use it when appropriate.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Dongjiu Geng <gengdongjiu1@gmail.com>
Message-id: 20210603171259.27962-2-peter.maydell@linaro.org
---
 hw/acpi/ghes-stub.c | 17 +++++++++++++++++
 hw/acpi/meson.build | 6 +++---
 2 files changed, 20 insertions(+), 3 deletions(-)
 create mode 100644 hw/acpi/ghes-stub.c

diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/hw/acpi/ghes-stub.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * Support for generating APEI tables and recording CPER for Guests:
+ * stub functions.
+ *
+ * Copyright (c) 2021 Linaro, Ltd
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/acpi/ghes.h"
+
+int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
+{
+    return -1;
+}
diff --git a/hw/acpi/meson.build b/hw/acpi/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/hw/acpi/meson.build
+++ b/hw/acpi/meson.build
@@ -XXX,XX +XXX,XX @@ acpi_ss.add(when: 'CONFIG_ACPI_PCI', if_true: files('pci.c'))
 acpi_ss.add(when: 'CONFIG_ACPI_VMGENID', if_true: files('vmgenid.c'))
 acpi_ss.add(when: 'CONFIG_ACPI_HW_REDUCED', if_true: files('generic_event_device.c'))
 acpi_ss.add(when: 'CONFIG_ACPI_HMAT', if_true: files('hmat.c'))
-acpi_ss.add(when: 'CONFIG_ACPI_APEI', if_true: files('ghes.c'))
+acpi_ss.add(when: 'CONFIG_ACPI_APEI', if_true: files('ghes.c'), if_false: files('ghes-stub.c'))
 acpi_ss.add(when: 'CONFIG_ACPI_X86', if_true: files('core.c', 'piix4.c', 'pcihp.c'), if_false: files('acpi-stub.c'))
 acpi_ss.add(when: 'CONFIG_ACPI_X86_ICH', if_true: files('ich9.c', 'tco.c'))
 acpi_ss.add(when: 'CONFIG_IPMI', if_true: files('ipmi.c'), if_false: files('ipmi-stub.c'))
 acpi_ss.add(when: 'CONFIG_PC', if_false: files('acpi-x86-stub.c'))
 acpi_ss.add(when: 'CONFIG_TPM', if_true: files('tpm.c'))
-softmmu_ss.add(when: 'CONFIG_ACPI', if_false: files('acpi-stub.c', 'aml-build-stub.c'))
+softmmu_ss.add(when: 'CONFIG_ACPI', if_false: files('acpi-stub.c', 'aml-build-stub.c', 'ghes-stub.c'))
 softmmu_ss.add_all(when: 'CONFIG_ACPI', if_true: acpi_ss)
 softmmu_ss.add(when: 'CONFIG_ALL', if_true: files('acpi-stub.c', 'aml-build-stub.c',
-                                                  'acpi-x86-stub.c', 'ipmi-stub.c'))
+                                                  'acpi-x86-stub.c', 'ipmi-stub.c', 'ghes-stub.c'))
--
2.20.1
Deleted patch
Allow code elsewhere in the system to check whether the ACPI GHES
table is present, so it can determine whether it is OK to try to
record an error by calling acpi_ghes_record_errors().

(We don't need to migrate the new 'present' field in AcpiGhesState,
because it is set once at system initialization and doesn't change.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Dongjiu Geng <gengdongjiu1@gmail.com>
Message-id: 20210603171259.27962-3-peter.maydell@linaro.org
---
 include/hw/acpi/ghes.h | 9 +++++++++
 hw/acpi/ghes-stub.c | 5 +++++
 hw/acpi/ghes.c | 17 +++++++++++++++++
 3 files changed, 31 insertions(+)

diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -XXX,XX +XXX,XX @@ enum {
 
 typedef struct AcpiGhesState {
     uint64_t ghes_addr_le;
+    bool present; /* True if GHES is present at all on this board */
 } AcpiGhesState;
 
 void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker);
@@ -XXX,XX +XXX,XX @@ void acpi_build_hest(GArray *table_data, BIOSLinker *linker,
 void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
                           GArray *hardware_errors);
 int acpi_ghes_record_errors(uint8_t notify, uint64_t error_physical_addr);
+
+/**
+ * acpi_ghes_present: Report whether ACPI GHES table is present
+ *
+ * Returns: true if the system has an ACPI GHES table and it is
+ * safe to call acpi_ghes_record_errors() to record a memory error.
+ */
+bool acpi_ghes_present(void);
 #endif
diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/acpi/ghes-stub.c
+++ b/hw/acpi/ghes-stub.c
@@ -XXX,XX +XXX,XX @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
 {
     return -1;
 }
+
+bool acpi_ghes_present(void)
+{
+    return false;
+}
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -XXX,XX +XXX,XX @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
     /* Create a read-write fw_cfg file for Address */
     fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
         NULL, &(ags->ghes_addr_le), sizeof(ags->ghes_addr_le), false);
+
+    ags->present = true;
 }
 
 int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
@@ -XXX,XX +XXX,XX @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
 
     return ret;
 }
+
+bool acpi_ghes_present(void)
+{
+    AcpiGedState *acpi_ged_state;
+    AcpiGhesState *ags;
+
+    acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
+                                                       NULL));
+
+    if (!acpi_ged_state) {
+        return false;
+    }
+    ags = &acpi_ged_state->ghes_state;
+    return ags->present;
+}
--
2.20.1
Deleted patch
The virt_is_acpi_enabled() function is specific to the virt board, as
is the check for its 'ras' property. Use the new acpi_ghes_present()
function to check whether we should report memory errors via
acpi_ghes_record_errors().

This avoids a link error if QEMU was built without support for the
virt board, and provides a mechanism that can be used by any future
board models that want to add ACPI memory error reporting support
(they only need to call acpi_ghes_add_fw_cfg()).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Dongjiu Geng <gengdongjiu1@gmail.com>
Message-id: 20210603171259.27962-4-peter.maydell@linaro.org
---
 target/arm/kvm64.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -XXX,XX +XXX,XX @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
 {
     ram_addr_t ram_addr;
     hwaddr paddr;
-    Object *obj = qdev_get_machine();
-    VirtMachineState *vms = VIRT_MACHINE(obj);
-    bool acpi_enabled = virt_is_acpi_enabled(vms);
 
     assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
 
-    if (acpi_enabled && addr &&
-        object_property_get_bool(obj, "ras", NULL)) {
+    if (acpi_ghes_present() && addr) {
         ram_addr = qemu_ram_addr_from_host(addr);
         if (ram_addr != RAM_ADDR_INVALID &&
             kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
--
2.20.1
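Taken together, the three patches above replace a virt-only check with a
generic query. As a usage illustration only, here is a hedged sketch of the
calling pattern they enable; the wrapper function, its name and its
source_id parameter are hypothetical, while acpi_ghes_present() and
acpi_ghes_record_errors() are the real functions from these patches.

/*
 * Hedged sketch, not QEMU code: record a guest memory error at 'paddr'
 * only if the board actually created a GHES table.  With the stub in
 * place this also links when CONFIG_ACPI_APEI is not built in.
 */
#include "qemu/osdep.h"
#include "hw/acpi/ghes.h"

static bool try_report_memory_error(uint8_t source_id, uint64_t paddr)
{
    if (!acpi_ghes_present()) {
        /* No GHES table on this board; nothing to record. */
        return false;
    }
    /* Returns 0 on success, negative otherwise (including from the stub). */
    return acpi_ghes_record_errors(source_id, paddr) == 0;
}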
From: Philippe Mathieu-Daudé <philmd@linaro.org>

Similarly to commits dadbb58f59..5ae79fe825 for other ARM boards,
connect FIQ output of the GIC CPU interfaces to the CPU.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20240130152548.17855-1-philmd@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/xilinx_zynq.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/arm/xilinx_zynq.c b/hw/arm/xilinx_zynq.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/xilinx_zynq.c
+++ b/hw/arm/xilinx_zynq.c
@@ -XXX,XX +XXX,XX @@ static void zynq_init(MachineState *machine)
     sysbus_mmio_map(busdev, 0, MPCORE_PERIPHBASE);
     sysbus_connect_irq(busdev, 0,
                        qdev_get_gpio_in(DEVICE(cpu), ARM_CPU_IRQ));
+    sysbus_connect_irq(busdev, 1,
+                       qdev_get_gpio_in(DEVICE(cpu), ARM_CPU_FIQ));
 
     for (n = 0; n < 64; n++) {
         pic[n] = qdev_get_gpio_in(dev, n);
--
2.34.1
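For context, the wiring pattern referenced in the commit message generalises
to multi-CPU boards roughly as sketched below. This is an illustrative
sketch, not code from this patch: the function, its arguments and the
output-numbering assumption (IRQ outputs for all CPUs first, then FIQ
outputs, as in hw/arm/virt.c) are assumptions, while sysbus_connect_irq(),
qdev_get_gpio_in(), ARM_CPU_IRQ and ARM_CPU_FIQ are the QEMU APIs used above.

/*
 * Illustrative sketch: connect a GIC's per-CPU IRQ and FIQ outputs to
 * each CPU's input lines on a hypothetical board.  Assumes the common
 * arm_gic sysbus output layout: outputs [0, num_cpus) are IRQ, then
 * [num_cpus, 2 * num_cpus) are FIQ.
 */
#include "qemu/osdep.h"
#include "hw/sysbus.h"
#include "hw/core/cpu.h"
#include "cpu.h"            /* for ARM_CPU_IRQ / ARM_CPU_FIQ */

static void wire_gic_to_cpus(SysBusDevice *gicbusdev, int num_cpus)
{
    for (int i = 0; i < num_cpus; i++) {
        DeviceState *cpudev = DEVICE(qemu_get_cpu(i));

        sysbus_connect_irq(gicbusdev, i,
                           qdev_get_gpio_in(cpudev, ARM_CPU_IRQ));
        sysbus_connect_irq(gicbusdev, i + num_cpus,
                           qdev_get_gpio_in(cpudev, ARM_CPU_FIQ));
    }
}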
From: Richard Henderson <richard.henderson@linaro.org>

The API does not generate an error for setting ASYNC | SYNC; that merely
constrains the selection vs the per-cpu default. For qemu linux-user,
choose SYNC as the default.

Cc: qemu-stable@nongnu.org
Reported-by: Gustavo Romero <gustavo.romero@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Gustavo Romero <gustavo.romero@linaro.org>
Message-id: 20240207025210.8837-2-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 linux-user/aarch64/target_prctl.h | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/linux-user/aarch64/target_prctl.h b/linux-user/aarch64/target_prctl.h
index XXXXXXX..XXXXXXX 100644
--- a/linux-user/aarch64/target_prctl.h
+++ b/linux-user/aarch64/target_prctl.h
@@ -XXX,XX +XXX,XX @@ static abi_long do_prctl_set_tagged_addr_ctrl(CPUArchState *env, abi_long arg2)
     env->tagged_addr_enable = arg2 & PR_TAGGED_ADDR_ENABLE;
 
     if (cpu_isar_feature(aa64_mte, cpu)) {
-        switch (arg2 & PR_MTE_TCF_MASK) {
-        case PR_MTE_TCF_NONE:
-        case PR_MTE_TCF_SYNC:
-        case PR_MTE_TCF_ASYNC:
-            break;
-        default:
-            return -EINVAL;
-        }
-
         /*
          * Write PR_MTE_TCF to SCTLR_EL1[TCF0].
-         * Note that the syscall values are consistent with hw.
+         *
+         * The kernel has a per-cpu configuration for the sysadmin,
+         * /sys/devices/system/cpu/cpu<N>/mte_tcf_preferred,
+         * which qemu does not implement.
+         *
+         * Because there is no performance difference between the modes, and
+         * because SYNC is most useful for debugging MTE errors, choose SYNC
+         * as the preferred mode.  With this preference, and the way the API
+         * uses only two bits, there is no way for the program to select
+         * ASYMM mode.
          */
-        env->cp15.sctlr_el[1] =
-            deposit64(env->cp15.sctlr_el[1], 38, 2, arg2 >> PR_MTE_TCF_SHIFT);
+        unsigned tcf = 0;
+        if (arg2 & PR_MTE_TCF_SYNC) {
+            tcf = 1;
+        } else if (arg2 & PR_MTE_TCF_ASYNC) {
+            tcf = 2;
+        }
+        env->cp15.sctlr_el[1] = deposit64(env->cp15.sctlr_el[1], 38, 2, tcf);
 
         /*
          * Write PR_MTE_TAG to GCR_EL1[Exclude].
--
2.34.1
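Viewed from the guest side, nothing about the prctl API changes; only QEMU's
choice when a program permits more than one mode does. The following is a
hedged, self-contained userspace example (not part of this patch) that
requests "SYNC or ASYNC" tag checking and, under qemu-aarch64 linux-user,
now ends up with SYNC; the PR_* constants are the standard Linux ones,
restated as fallbacks for older headers.

/*
 * Hedged userspace example: enable tagged addresses and request
 * "SYNC or ASYNC" MTE tag-check faulting via the Linux prctl API.
 */
#include <stdio.h>
#include <sys/prctl.h>

#ifndef PR_SET_TAGGED_ADDR_CTRL
#define PR_SET_TAGGED_ADDR_CTRL 55
#define PR_GET_TAGGED_ADDR_CTRL 56
#define PR_TAGGED_ADDR_ENABLE   (1UL << 0)
#endif
#ifndef PR_MTE_TCF_SHIFT
#define PR_MTE_TCF_SHIFT 1
#define PR_MTE_TCF_SYNC  (1UL << PR_MTE_TCF_SHIFT)
#define PR_MTE_TCF_ASYNC (2UL << PR_MTE_TCF_SHIFT)
#endif

int main(void)
{
    unsigned long ctrl = PR_TAGGED_ADDR_ENABLE
                       | PR_MTE_TCF_SYNC | PR_MTE_TCF_ASYNC;

    if (prctl(PR_SET_TAGGED_ADDR_CTRL, ctrl, 0, 0, 0) != 0) {
        perror("PR_SET_TAGGED_ADDR_CTRL");
        return 1;
    }
    /* The kernel (or QEMU linux-user) picks the preferred mode from the set. */
    printf("tagged-addr ctrl now: 0x%lx\n",
           (unsigned long)prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0));
    return 0;
}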
From: Richard Henderson <richard.henderson@linaro.org>

The field is encoded as [0-3], which is convenient for
indexing our array of function pointers, but the true
value is [1-4]. Adjust before calling do_mem_zpa.

Add an assert, and move the comment re passing ZT to
the helper back next to the relevant code.

Cc: qemu-stable@nongnu.org
Fixes: 206adacfb8d ("target/arm: Add mte helpers for sve scalar + int loads")
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Gustavo Romero <gustavo.romero@linaro.org>
Message-id: 20240207025210.8837-3-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/translate-sve.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
     TCGv_ptr t_pg;
     int desc = 0;
 
-    /*
-     * For e.g. LD4, there are not enough arguments to pass all 4
-     * registers as pointers, so encode the regno into the data field.
-     * For consistency, do this even for LD1.
-     */
+    assert(mte_n >= 1 && mte_n <= 4);
     if (s->mte_active[0]) {
         int msz = dtype_msz(dtype);
 
@@ -XXX,XX +XXX,XX @@ static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
         addr = clean_data_tbi(s, addr);
     }
 
+    /*
+     * For e.g. LD4, there are not enough arguments to pass all 4
+     * registers as pointers, so encode the regno into the data field.
+     * For consistency, do this even for LD1.
+     */
     desc = simd_desc(vsz, vsz, zt | desc);
     t_pg = tcg_temp_new_ptr();
 
@@ -XXX,XX +XXX,XX @@ static void do_ld_zpa(DisasContext *s, int zt, int pg,
      * accessible via the instruction encoding.
      */
     assert(fn != NULL);
-    do_mem_zpa(s, zt, pg, addr, dtype, nreg, false, fn);
+    do_mem_zpa(s, zt, pg, addr, dtype, nreg + 1, false, fn);
 }
 
 static bool trans_LD_zprr(DisasContext *s, arg_rprr_load *a)
@@ -XXX,XX +XXX,XX @@ static void do_st_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
     if (nreg == 0) {
         /* ST1 */
         fn = fn_single[s->mte_active[0]][be][msz][esz];
-        nreg = 1;
     } else {
         /* ST2, ST3, ST4 -- msz == esz, enforced by encoding */
         assert(msz == esz);
         fn = fn_multiple[s->mte_active[0]][be][nreg - 1][msz];
     }
     assert(fn != NULL);
-    do_mem_zpa(s, zt, pg, addr, msz_dtype(s, msz), nreg, true, fn);
+    do_mem_zpa(s, zt, pg, addr, msz_dtype(s, msz), nreg + 1, true, fn);
 }
 
 static bool trans_ST_zprr(DisasContext *s, arg_rprr_store *a)
--
2.34.1
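The heart of the fix is an off-by-one convention; here is a hedged
restatement of it as standalone C. The names are illustrative, only the
0..3 versus 1..4 relationship and the assert come from the commit.

/*
 * Illustrative only: the SVE LD/ST "nreg" decode field is 0..3 so it can
 * index per-insn helper arrays (LD1..LD4), but the memory code now wants
 * the true register count 1..4, mirroring the new assert above.
 */
#include <assert.h>

static inline unsigned true_register_count(unsigned nreg_field)
{
    assert(nreg_field <= 3);   /* LD1/ST1 .. LD4/ST4 */
    return nreg_field + 1;     /* value passed on to the memory helper */
}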
From: Richard Henderson <richard.henderson@linaro.org>

When we added SVE_MTEDESC_SHIFT, we effectively limited the
maximum size of MTEDESC. Adjust SIZEM1 to consume the remaining
bits (32 - 10 - 5 - 12 == 5). Assert that the data to be stored
fits within the field (expecting 8 * 4 - 1 == 31, exact fit).

Cc: qemu-stable@nongnu.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Gustavo Romero <gustavo.romero@linaro.org>
Message-id: 20240207025210.8837-4-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/internals.h | 2 +-
 target/arm/tcg/translate-sve.c | 7 ++++---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ FIELD(MTEDESC, TBI, 4, 2)
 FIELD(MTEDESC, TCMA, 6, 2)
 FIELD(MTEDESC, WRITE, 8, 1)
 FIELD(MTEDESC, ALIGN, 9, 3)
-FIELD(MTEDESC, SIZEM1, 12, SIMD_DATA_BITS - 12)  /* size - 1 */
+FIELD(MTEDESC, SIZEM1, 12, SIMD_DATA_BITS - SVE_MTEDESC_SHIFT - 12)  /* size - 1 */
 
 bool mte_probe(CPUARMState *env, uint32_t desc, uint64_t ptr);
 uint64_t mte_check(CPUARMState *env, uint32_t desc, uint64_t ptr, uintptr_t ra);
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
 {
     unsigned vsz = vec_full_reg_size(s);
     TCGv_ptr t_pg;
+    uint32_t sizem1;
     int desc = 0;
 
     assert(mte_n >= 1 && mte_n <= 4);
+    sizem1 = (mte_n << dtype_msz(dtype)) - 1;
+    assert(sizem1 <= R_MTEDESC_SIZEM1_MASK >> R_MTEDESC_SIZEM1_SHIFT);
     if (s->mte_active[0]) {
-        int msz = dtype_msz(dtype);
-
         desc = FIELD_DP32(desc, MTEDESC, MIDX, get_mem_index(s));
         desc = FIELD_DP32(desc, MTEDESC, TBI, s->tbid);
         desc = FIELD_DP32(desc, MTEDESC, TCMA, s->tcma);
         desc = FIELD_DP32(desc, MTEDESC, WRITE, is_write);
-        desc = FIELD_DP32(desc, MTEDESC, SIZEM1, (mte_n << msz) - 1);
+        desc = FIELD_DP32(desc, MTEDESC, SIZEM1, sizem1);
         desc <<= SVE_MTEDESC_SHIFT;
     } else {
         addr = clean_data_tbi(s, addr);
--
2.34.1
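The bit budget quoted in the commit message can be sanity-checked in
isolation. The snippet below restates the numbers as local constants purely
for illustration (they are copied from the commit text, not taken from the
QEMU headers) and verifies that the largest value ever stored, 8 * 4 - 1,
exactly fills the remaining 5-bit field.

/*
 * Illustrative arithmetic check of the SIZEM1 bit budget described above.
 * All constants here are local restatements of the commit message.
 */
#include <assert.h>

enum {
    DESC_BITS        = 32,  /* total descriptor width */
    MTEDESC_SHIFT    = 10,  /* SVE_MTEDESC_SHIFT: low bits left for simd data */
    SIMD_SIZE_BITS   = 5,   /* oprsz/maxsz encoding of simd_desc */
    MTEDESC_LOW_BITS = 12,  /* MIDX + TBI + TCMA + WRITE + ALIGN */
    SIZEM1_WIDTH     = DESC_BITS - MTEDESC_SHIFT
                       - SIMD_SIZE_BITS - MTEDESC_LOW_BITS,
};

static_assert(SIZEM1_WIDTH == 5, "32 - 10 - 5 - 12 == 5 bits remain for SIZEM1");
static_assert((8 * 4) - 1 == 31 && 31 == (1 << SIZEM1_WIDTH) - 1,
              "4 registers of 8-byte elements fit the field exactly");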
1
Implement the MVE VADC and VSBC insns. These perform an
1
From: Richard Henderson <richard.henderson@linaro.org>
2
add-with-carry or subtract-with-carry of the 32-bit elements in each
3
lane of the input vectors, where the carry-out of each add is the
4
carry-in of the next. The initial carry input is either 1 or is from
5
FPSCR.C; the carry out at the end is written back to FPSCR.C.
6
2
3
Share code that creates mtedesc and embeds within simd_desc.
4
5
Cc: qemu-stable@nongnu.org
6
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
7
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
8
Tested-by: Gustavo Romero <gustavo.romero@linaro.org>
9
Message-id: 20240207025210.8837-5-richard.henderson@linaro.org
7
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
10
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
9
Message-id: 20210617121628.20116-41-peter.maydell@linaro.org
10
---
11
---
11
target/arm/helper-mve.h | 5 ++++
12
target/arm/tcg/translate-a64.h | 2 ++
12
target/arm/mve.decode | 5 ++++
13
target/arm/tcg/translate-sme.c | 15 +++--------
13
target/arm/mve_helper.c | 52 ++++++++++++++++++++++++++++++++++++++
14
target/arm/tcg/translate-sve.c | 47 ++++++++++++++++++----------------
14
target/arm/translate-mve.c | 37 +++++++++++++++++++++++++++
15
3 files changed, 31 insertions(+), 33 deletions(-)
15
4 files changed, 99 insertions(+)
16
16
17
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
17
diff --git a/target/arm/tcg/translate-a64.h b/target/arm/tcg/translate-a64.h
18
index XXXXXXX..XXXXXXX 100644
18
index XXXXXXX..XXXXXXX 100644
19
--- a/target/arm/helper-mve.h
19
--- a/target/arm/tcg/translate-a64.h
20
+++ b/target/arm/helper-mve.h
20
+++ b/target/arm/tcg/translate-a64.h
21
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vrhaddub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
21
@@ -XXX,XX +XXX,XX @@ bool logic_imm_decode_wmask(uint64_t *result, unsigned int immn,
22
DEF_HELPER_FLAGS_4(mve_vrhadduh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
22
bool sve_access_check(DisasContext *s);
23
DEF_HELPER_FLAGS_4(mve_vrhadduw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
23
bool sme_enabled_check(DisasContext *s);
24
24
bool sme_enabled_check_with_svcr(DisasContext *s, unsigned);
25
+DEF_HELPER_FLAGS_4(mve_vadc, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
25
+uint32_t make_svemte_desc(DisasContext *s, unsigned vsz, uint32_t nregs,
26
+DEF_HELPER_FLAGS_4(mve_vadci, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
26
+ uint32_t msz, bool is_write, uint32_t data);
27
+DEF_HELPER_FLAGS_4(mve_vsbc, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
27
28
+DEF_HELPER_FLAGS_4(mve_vsbci, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
28
/* This function corresponds to CheckStreamingSVEEnabled. */
29
static inline bool sme_sm_enabled_check(DisasContext *s)
30
diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c
31
index XXXXXXX..XXXXXXX 100644
32
--- a/target/arm/tcg/translate-sme.c
33
+++ b/target/arm/tcg/translate-sme.c
34
@@ -XXX,XX +XXX,XX @@ static bool trans_LDST1(DisasContext *s, arg_LDST1 *a)
35
36
TCGv_ptr t_za, t_pg;
37
TCGv_i64 addr;
38
- int svl, desc = 0;
39
+ uint32_t desc;
40
bool be = s->be_data == MO_BE;
41
bool mte = s->mte_active[0];
42
43
@@ -XXX,XX +XXX,XX @@ static bool trans_LDST1(DisasContext *s, arg_LDST1 *a)
44
tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), a->esz);
45
tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, a->rn));
46
47
- if (mte) {
48
- desc = FIELD_DP32(desc, MTEDESC, MIDX, get_mem_index(s));
49
- desc = FIELD_DP32(desc, MTEDESC, TBI, s->tbid);
50
- desc = FIELD_DP32(desc, MTEDESC, TCMA, s->tcma);
51
- desc = FIELD_DP32(desc, MTEDESC, WRITE, a->st);
52
- desc = FIELD_DP32(desc, MTEDESC, SIZEM1, (1 << a->esz) - 1);
53
- desc <<= SVE_MTEDESC_SHIFT;
54
- } else {
55
+ if (!mte) {
56
addr = clean_data_tbi(s, addr);
57
}
58
- svl = streaming_vec_reg_size(s);
59
- desc = simd_desc(svl, svl, desc);
29
+
60
+
30
DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
61
+ desc = make_svemte_desc(s, streaming_vec_reg_size(s), 1, a->esz, a->st, 0);
31
DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
62
32
DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
63
fns[a->esz][be][a->v][mte][a->st](tcg_env, t_za, t_pg, addr,
33
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
64
tcg_constant_i32(desc));
65
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
34
index XXXXXXX..XXXXXXX 100644
66
index XXXXXXX..XXXXXXX 100644
35
--- a/target/arm/mve.decode
67
--- a/target/arm/tcg/translate-sve.c
36
+++ b/target/arm/mve.decode
68
+++ b/target/arm/tcg/translate-sve.c
37
@@ -XXX,XX +XXX,XX @@ VQDMULLT 111 . 1110 0 . 11 ... 0 ... 1 1111 . 0 . 0 ... 1 @2op_sz28
69
@@ -XXX,XX +XXX,XX @@ static const uint8_t dtype_esz[16] = {
38
VRHADD_S 111 0 1111 0 . .. ... 0 ... 0 0001 . 1 . 0 ... 0 @2op
70
3, 2, 1, 3
39
VRHADD_U 111 1 1111 0 . .. ... 0 ... 0 0001 . 1 . 0 ... 0 @2op
71
};
40
72
41
+VADC 1110 1110 0 . 11 ... 0 ... 0 1111 . 0 . 0 ... 0 @2op_nosz
73
-static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
42
+VSBC 1111 1110 0 . 11 ... 0 ... 0 1111 . 0 . 0 ... 0 @2op_nosz
74
- int dtype, uint32_t mte_n, bool is_write,
43
+VADCI 1110 1110 0 . 11 ... 0 ... 1 1111 . 0 . 0 ... 0 @2op_nosz
75
- gen_helper_gvec_mem *fn)
44
+VSBCI 1111 1110 0 . 11 ... 0 ... 1 1111 . 0 . 0 ... 0 @2op_nosz
76
+uint32_t make_svemte_desc(DisasContext *s, unsigned vsz, uint32_t nregs,
77
+ uint32_t msz, bool is_write, uint32_t data)
78
{
79
- unsigned vsz = vec_full_reg_size(s);
80
- TCGv_ptr t_pg;
81
uint32_t sizem1;
82
- int desc = 0;
83
+ uint32_t desc = 0;
84
85
- assert(mte_n >= 1 && mte_n <= 4);
86
- sizem1 = (mte_n << dtype_msz(dtype)) - 1;
87
+ /* Assert all of the data fits, with or without MTE enabled. */
88
+ assert(nregs >= 1 && nregs <= 4);
89
+ sizem1 = (nregs << msz) - 1;
90
assert(sizem1 <= R_MTEDESC_SIZEM1_MASK >> R_MTEDESC_SIZEM1_SHIFT);
91
+ assert(data < 1u << SVE_MTEDESC_SHIFT);
45
+
92
+
46
# Vector miscellaneous
93
if (s->mte_active[0]) {
47
94
desc = FIELD_DP32(desc, MTEDESC, MIDX, get_mem_index(s));
48
VCLS 1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
95
desc = FIELD_DP32(desc, MTEDESC, TBI, s->tbid);
49
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
96
@@ -XXX,XX +XXX,XX @@ static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
50
index XXXXXXX..XXXXXXX 100644
97
desc = FIELD_DP32(desc, MTEDESC, WRITE, is_write);
51
--- a/target/arm/mve_helper.c
98
desc = FIELD_DP32(desc, MTEDESC, SIZEM1, sizem1);
52
+++ b/target/arm/mve_helper.c
99
desc <<= SVE_MTEDESC_SHIFT;
53
@@ -XXX,XX +XXX,XX @@ DO_2OP_U(vrshlu, DO_VRSHLU)
100
- } else {
54
DO_2OP_S(vrhadds, DO_RHADD_S)
55
DO_2OP_U(vrhaddu, DO_RHADD_U)
56
57
+static void do_vadc(CPUARMState *env, uint32_t *d, uint32_t *n, uint32_t *m,
58
+ uint32_t inv, uint32_t carry_in, bool update_flags)
59
+{
60
+ uint16_t mask = mve_element_mask(env);
61
+ unsigned e;
62
+
63
+ /* If any additions trigger, we will update flags. */
64
+ if (mask & 0x1111) {
65
+ update_flags = true;
66
+ }
101
+ }
67
+
102
+ return simd_desc(vsz, vsz, desc | data);
68
+ for (e = 0; e < 16 / 4; e++, mask >>= 4) {
69
+ uint64_t r = carry_in;
70
+ r += n[H4(e)];
71
+ r += m[H4(e)] ^ inv;
72
+ if (mask & 1) {
73
+ carry_in = r >> 32;
74
+ }
75
+ mergemask(&d[H4(e)], r, mask);
76
+ }
77
+
78
+ if (update_flags) {
79
+ /* Store C, clear NZV. */
80
+ env->vfp.xregs[ARM_VFP_FPSCR] &= ~FPCR_NZCV_MASK;
81
+ env->vfp.xregs[ARM_VFP_FPSCR] |= carry_in * FPCR_C;
82
+ }
83
+ mve_advance_vpt(env);
84
+}
103
+}
85
+
104
+
86
+void HELPER(mve_vadc)(CPUARMState *env, void *vd, void *vn, void *vm)
105
+static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
106
+ int dtype, uint32_t nregs, bool is_write,
107
+ gen_helper_gvec_mem *fn)
87
+{
108
+{
88
+ bool carry_in = env->vfp.xregs[ARM_VFP_FPSCR] & FPCR_C;
109
+ TCGv_ptr t_pg;
89
+ do_vadc(env, vd, vn, vm, 0, carry_in, false);
110
+ uint32_t desc;
90
+}
91
+
111
+
92
+void HELPER(mve_vsbc)(CPUARMState *env, void *vd, void *vn, void *vm)
112
+ if (!s->mte_active[0]) {
93
+{
113
addr = clean_data_tbi(s, addr);
94
+ bool carry_in = env->vfp.xregs[ARM_VFP_FPSCR] & FPCR_C;
114
}
95
+ do_vadc(env, vd, vn, vm, -1, carry_in, false);
115
96
+}
116
@@ -XXX,XX +XXX,XX @@ static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
117
* registers as pointers, so encode the regno into the data field.
118
* For consistency, do this even for LD1.
119
*/
120
- desc = simd_desc(vsz, vsz, zt | desc);
121
+ desc = make_svemte_desc(s, vec_full_reg_size(s), nregs,
122
+ dtype_msz(dtype), is_write, zt);
123
t_pg = tcg_temp_new_ptr();
124
125
tcg_gen_addi_ptr(t_pg, tcg_env, pred_full_reg_offset(s, pg));
126
@@ -XXX,XX +XXX,XX @@ static void do_mem_zpz(DisasContext *s, int zt, int pg, int zm,
127
int scale, TCGv_i64 scalar, int msz, bool is_write,
128
gen_helper_gvec_mem_scatter *fn)
129
{
130
- unsigned vsz = vec_full_reg_size(s);
131
TCGv_ptr t_zm = tcg_temp_new_ptr();
132
TCGv_ptr t_pg = tcg_temp_new_ptr();
133
TCGv_ptr t_zt = tcg_temp_new_ptr();
134
- int desc = 0;
135
-
136
- if (s->mte_active[0]) {
137
- desc = FIELD_DP32(desc, MTEDESC, MIDX, get_mem_index(s));
138
- desc = FIELD_DP32(desc, MTEDESC, TBI, s->tbid);
139
- desc = FIELD_DP32(desc, MTEDESC, TCMA, s->tcma);
140
- desc = FIELD_DP32(desc, MTEDESC, WRITE, is_write);
141
- desc = FIELD_DP32(desc, MTEDESC, SIZEM1, (1 << msz) - 1);
142
- desc <<= SVE_MTEDESC_SHIFT;
143
- }
144
- desc = simd_desc(vsz, vsz, desc | scale);
145
+ uint32_t desc;
146
147
tcg_gen_addi_ptr(t_pg, tcg_env, pred_full_reg_offset(s, pg));
148
tcg_gen_addi_ptr(t_zm, tcg_env, vec_full_reg_offset(s, zm));
149
tcg_gen_addi_ptr(t_zt, tcg_env, vec_full_reg_offset(s, zt));
97
+
150
+
98
+
151
+ desc = make_svemte_desc(s, vec_full_reg_size(s), 1, msz, is_write, scale);
99
+void HELPER(mve_vadci)(CPUARMState *env, void *vd, void *vn, void *vm)
152
fn(tcg_env, t_zt, t_pg, t_zm, scalar, tcg_constant_i32(desc));
100
+{
101
+ do_vadc(env, vd, vn, vm, 0, 0, true);
102
+}
103
+
104
+void HELPER(mve_vsbci)(CPUARMState *env, void *vd, void *vn, void *vm)
105
+{
106
+ do_vadc(env, vd, vn, vm, -1, 1, true);
107
+}
108
+
109
static inline int32_t do_sat_bhw(int64_t val, int64_t min, int64_t max, bool *s)
110
{
111
if (val > max) {
112
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
113
index XXXXXXX..XXXXXXX 100644
114
--- a/target/arm/translate-mve.c
115
+++ b/target/arm/translate-mve.c
116
@@ -XXX,XX +XXX,XX @@ static bool trans_VQDMULLT(DisasContext *s, arg_2op *a)
117
return do_2op(s, a, fns[a->size]);
118
}
153
}
119
154
120
+/*
121
+ * VADC and VSBC: these perform an add-with-carry or subtract-with-carry
122
+ * of the 32-bit elements in each lane of the input vectors, where the
123
+ * carry-out of each add is the carry-in of the next. The initial carry
124
+ * input is either fixed (0 for VADCI, 1 for VSBCI) or is from FPSCR.C
125
+ * (for VADC and VSBC); the carry out at the end is written back to FPSCR.C.
126
+ * These insns are subject to beat-wise execution. Partial execution
127
+ * of an I=1 (initial carry input fixed) insn which does not
128
+ * execute the first beat must start with the current FPSCR.NZCV
129
+ * value, not the fixed constant input.
130
+ */
131
+static bool trans_VADC(DisasContext *s, arg_2op *a)
132
+{
133
+ return do_2op(s, a, gen_helper_mve_vadc);
134
+}
135
+
136
+static bool trans_VADCI(DisasContext *s, arg_2op *a)
137
+{
138
+ if (mve_skip_first_beat(s)) {
139
+ return trans_VADC(s, a);
140
+ }
141
+ return do_2op(s, a, gen_helper_mve_vadci);
142
+}
143
+
144
+static bool trans_VSBC(DisasContext *s, arg_2op *a)
145
+{
146
+ return do_2op(s, a, gen_helper_mve_vsbc);
147
+}
148
+
149
+static bool trans_VSBCI(DisasContext *s, arg_2op *a)
150
+{
151
+ if (mve_skip_first_beat(s)) {
152
+ return trans_VSBC(s, a);
153
+ }
154
+ return do_2op(s, a, gen_helper_mve_vsbci);
155
+}
156
+
157
static bool do_2op_scalar(DisasContext *s, arg_2scalar *a,
158
MVEGenTwoOpScalarFn fn)
159
{
160
--
155
--
161
2.20.1
156
2.34.1
162
163
1
Implement the MVE VCADD insn, which performs a complex add with
1
From: Richard Henderson <richard.henderson@linaro.org>
2
rotate. Note that the size=0b11 encoding is VSBC.
3
2
4
The architecture grants some leeway for the "destination and Vm
3
These functions "use the standard load helpers", but
5
source overlap" case for the size MO_32 case, but we choose not to
4
fail to clean_data_tbi or populate mtedesc.
6
make use of it, instead always calculating all 16 bytes worth of
7
results before setting the destination register.
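
As a rough illustration of the complex add with rotate described above, here is a hedged scalar sketch of one real/imaginary lane pair for the rotate-90 case (vcadd90_pair is a hypothetical name; the patch's real implementation is the DO_VCADD macro in mve_helper.c). Arithmetic wraps modulo 2^32, as for the non-saturating insn:

  #include <stdint.h>

  /* even lane: n[even] - m[odd]; odd lane: n[odd] + m[even]
   * (matches DO_VCADD_ALL(vcadd90, DO_SUB, DO_ADD)) */
  static void vcadd90_pair(int32_t d[2], const int32_t n[2], const int32_t m[2])
  {
      d[0] = (int32_t)((uint32_t)n[0] - (uint32_t)m[1]);
      d[1] = (int32_t)((uint32_t)n[1] + (uint32_t)m[0]);
  }
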
8
5
6
Cc: qemu-stable@nongnu.org
7
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
8
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
9
Tested-by: Gustavo Romero <gustavo.romero@linaro.org>
10
Message-id: 20240207025210.8837-6-richard.henderson@linaro.org
9
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
11
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
10
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
11
Message-id: 20210617121628.20116-42-peter.maydell@linaro.org
12
---
12
---
13
target/arm/helper-mve.h | 8 ++++++++
13
target/arm/tcg/translate-sve.c | 15 +++++++++++++--
14
target/arm/mve.decode | 9 +++++++--
14
1 file changed, 13 insertions(+), 2 deletions(-)
15
target/arm/mve_helper.c | 29 +++++++++++++++++++++++++++++
16
target/arm/translate-mve.c | 7 +++++++
17
4 files changed, 51 insertions(+), 2 deletions(-)
18
15
19
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
16
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
20
index XXXXXXX..XXXXXXX 100644
17
index XXXXXXX..XXXXXXX 100644
21
--- a/target/arm/helper-mve.h
18
--- a/target/arm/tcg/translate-sve.c
22
+++ b/target/arm/helper-mve.h
19
+++ b/target/arm/tcg/translate-sve.c
23
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vadci, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
20
@@ -XXX,XX +XXX,XX @@ static void do_ldrq(DisasContext *s, int zt, int pg, TCGv_i64 addr, int dtype)
24
DEF_HELPER_FLAGS_4(mve_vsbc, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
21
unsigned vsz = vec_full_reg_size(s);
25
DEF_HELPER_FLAGS_4(mve_vsbci, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
22
TCGv_ptr t_pg;
26
23
int poff;
27
+DEF_HELPER_FLAGS_4(mve_vcadd90b, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
24
+ uint32_t desc;
28
+DEF_HELPER_FLAGS_4(mve_vcadd90h, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
25
29
+DEF_HELPER_FLAGS_4(mve_vcadd90w, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
26
/* Load the first quadword using the normal predicated load helpers. */
30
+
27
+ if (!s->mte_active[0]) {
31
+DEF_HELPER_FLAGS_4(mve_vcadd270b, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
28
+ addr = clean_data_tbi(s, addr);
32
+DEF_HELPER_FLAGS_4(mve_vcadd270h, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
33
+DEF_HELPER_FLAGS_4(mve_vcadd270w, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
34
+
35
DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
36
DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
37
DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
38
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
39
index XXXXXXX..XXXXXXX 100644
40
--- a/target/arm/mve.decode
41
+++ b/target/arm/mve.decode
42
@@ -XXX,XX +XXX,XX @@ VRHADD_S 111 0 1111 0 . .. ... 0 ... 0 0001 . 1 . 0 ... 0 @2op
43
VRHADD_U 111 1 1111 0 . .. ... 0 ... 0 0001 . 1 . 0 ... 0 @2op
44
45
VADC 1110 1110 0 . 11 ... 0 ... 0 1111 . 0 . 0 ... 0 @2op_nosz
46
-VSBC 1111 1110 0 . 11 ... 0 ... 0 1111 . 0 . 0 ... 0 @2op_nosz
47
VADCI 1110 1110 0 . 11 ... 0 ... 1 1111 . 0 . 0 ... 0 @2op_nosz
48
-VSBCI 1111 1110 0 . 11 ... 0 ... 1 1111 . 0 . 0 ... 0 @2op_nosz
49
+
50
+{
51
+ VSBC 1111 1110 0 . 11 ... 0 ... 0 1111 . 0 . 0 ... 0 @2op_nosz
52
+ VSBCI 1111 1110 0 . 11 ... 0 ... 1 1111 . 0 . 0 ... 0 @2op_nosz
53
+ VCADD90 1111 1110 0 . .. ... 0 ... 0 1111 . 0 . 0 ... 0 @2op
54
+ VCADD270 1111 1110 0 . .. ... 0 ... 1 1111 . 0 . 0 ... 0 @2op
55
+}
56
57
# Vector miscellaneous
58
59
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
60
index XXXXXXX..XXXXXXX 100644
61
--- a/target/arm/mve_helper.c
62
+++ b/target/arm/mve_helper.c
63
@@ -XXX,XX +XXX,XX @@ void HELPER(mve_vsbci)(CPUARMState *env, void *vd, void *vn, void *vm)
64
do_vadc(env, vd, vn, vm, -1, 1, true);
65
}
66
67
+#define DO_VCADD(OP, ESIZE, TYPE, FN0, FN1) \
68
+ void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn, void *vm) \
69
+ { \
70
+ TYPE *d = vd, *n = vn, *m = vm; \
71
+ uint16_t mask = mve_element_mask(env); \
72
+ unsigned e; \
73
+ TYPE r[16 / ESIZE]; \
74
+ /* Calculate all results first to avoid overwriting inputs */ \
75
+ for (e = 0; e < 16 / ESIZE; e++) { \
76
+ if (!(e & 1)) { \
77
+ r[e] = FN0(n[H##ESIZE(e)], m[H##ESIZE(e + 1)]); \
78
+ } else { \
79
+ r[e] = FN1(n[H##ESIZE(e)], m[H##ESIZE(e - 1)]); \
80
+ } \
81
+ } \
82
+ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
83
+ mergemask(&d[H##ESIZE(e)], r[e], mask); \
84
+ } \
85
+ mve_advance_vpt(env); \
86
+ }
29
+ }
87
+
30
+
88
+#define DO_VCADD_ALL(OP, FN0, FN1) \
31
poff = pred_full_reg_offset(s, pg);
89
+ DO_VCADD(OP##b, 1, int8_t, FN0, FN1) \
32
if (vsz > 16) {
90
+ DO_VCADD(OP##h, 2, int16_t, FN0, FN1) \
33
/*
91
+ DO_VCADD(OP##w, 4, int32_t, FN0, FN1)
34
@@ -XXX,XX +XXX,XX @@ static void do_ldrq(DisasContext *s, int zt, int pg, TCGv_i64 addr, int dtype)
92
+
35
93
+DO_VCADD_ALL(vcadd90, DO_SUB, DO_ADD)
36
gen_helper_gvec_mem *fn
94
+DO_VCADD_ALL(vcadd270, DO_ADD, DO_SUB)
37
= ldr_fns[s->mte_active[0]][s->be_data == MO_BE][dtype][0];
95
+
38
- fn(tcg_env, t_pg, addr, tcg_constant_i32(simd_desc(16, 16, zt)));
96
static inline int32_t do_sat_bhw(int64_t val, int64_t min, int64_t max, bool *s)
39
+ desc = make_svemte_desc(s, 16, 1, dtype_msz(dtype), false, zt);
97
{
40
+ fn(tcg_env, t_pg, addr, tcg_constant_i32(desc));
98
if (val > max) {
41
99
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
42
/* Replicate that first quadword. */
100
index XXXXXXX..XXXXXXX 100644
43
if (vsz > 16) {
101
--- a/target/arm/translate-mve.c
44
@@ -XXX,XX +XXX,XX @@ static void do_ldro(DisasContext *s, int zt, int pg, TCGv_i64 addr, int dtype)
102
+++ b/target/arm/translate-mve.c
45
unsigned vsz_r32;
103
@@ -XXX,XX +XXX,XX @@ DO_2OP(VQRDMLSDH, vqrdmlsdh)
46
TCGv_ptr t_pg;
104
DO_2OP(VQRDMLSDHX, vqrdmlsdhx)
47
int poff, doff;
105
DO_2OP(VRHADD_S, vrhadds)
48
+ uint32_t desc;
106
DO_2OP(VRHADD_U, vrhaddu)
49
107
+/*
50
if (vsz < 32) {
108
+ * VCADD Qd == Qm at size MO_32 is UNPREDICTABLE; we choose not to diagnose
51
/*
109
+ * so we can reuse the DO_2OP macro. (Our implementation calculates the
52
@@ -XXX,XX +XXX,XX @@ static void do_ldro(DisasContext *s, int zt, int pg, TCGv_i64 addr, int dtype)
110
+ * "expected" results in this case.)
53
}
111
+ */
54
112
+DO_2OP(VCADD90, vcadd90)
55
/* Load the first octaword using the normal predicated load helpers. */
113
+DO_2OP(VCADD270, vcadd270)
56
+ if (!s->mte_active[0]) {
114
57
+ addr = clean_data_tbi(s, addr);
115
static bool trans_VQDMULLB(DisasContext *s, arg_2op *a)
58
+ }
116
{
59
60
poff = pred_full_reg_offset(s, pg);
61
if (vsz > 32) {
62
@@ -XXX,XX +XXX,XX @@ static void do_ldro(DisasContext *s, int zt, int pg, TCGv_i64 addr, int dtype)
63
64
gen_helper_gvec_mem *fn
65
= ldr_fns[s->mte_active[0]][s->be_data == MO_BE][dtype][0];
66
- fn(tcg_env, t_pg, addr, tcg_constant_i32(simd_desc(32, 32, zt)));
67
+ desc = make_svemte_desc(s, 32, 1, dtype_msz(dtype), false, zt);
68
+ fn(tcg_env, t_pg, addr, tcg_constant_i32(desc));
69
70
/*
71
* Replicate that first octaword.
117
--
72
--
118
2.20.1
73
2.34.1
119
120
1
vfp_access_check and its helper routine full_vfp_access_check() has
1
From: Richard Henderson <richard.henderson@linaro.org>
2
gradually grown and is now an awkward mix of A-profile only and
3
M-profile only pieces. Refactor it into an A-profile only and an
4
M-profile only version, taking advantage of the fact that now the
5
only direct call to full_vfp_access_check() is in A-profile-only
6
code.
7
2
3
The TBI and TCMA bits are located within mtedesc, not desc.
4
5
Cc: qemu-stable@nongnu.org
6
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
7
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
8
Tested-by: Gustavo Romero <gustavo.romero@linaro.org>
9
Message-id: 20240207025210.8837-7-richard.henderson@linaro.org
8
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
10
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
10
Message-id: 20210618141019.10671-7-peter.maydell@linaro.org
11
---
11
---
12
target/arm/translate-vfp.c | 79 +++++++++++++++++++++++---------------
12
target/arm/tcg/sme_helper.c | 8 ++++----
13
1 file changed, 48 insertions(+), 31 deletions(-)
13
target/arm/tcg/sve_helper.c | 12 ++++++------
14
2 files changed, 10 insertions(+), 10 deletions(-)
14
15
15
diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
16
diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c
16
index XXXXXXX..XXXXXXX 100644
17
index XXXXXXX..XXXXXXX 100644
17
--- a/target/arm/translate-vfp.c
18
--- a/target/arm/tcg/sme_helper.c
18
+++ b/target/arm/translate-vfp.c
19
+++ b/target/arm/tcg/sme_helper.c
19
@@ -XXX,XX +XXX,XX @@ static void gen_update_fp_context(DisasContext *s)
20
@@ -XXX,XX +XXX,XX @@ void sme_ld1_mte(CPUARMState *env, void *za, uint64_t *vg,
20
}
21
desc = extract32(desc, 0, SIMD_DATA_SHIFT + SVE_MTEDESC_SHIFT);
21
22
22
/*
23
/* Perform gross MTE suppression early. */
23
- * Check that VFP access is enabled. If it is, do the necessary
24
- if (!tbi_check(desc, bit55) ||
24
- * M-profile lazy-FP handling and then return true.
25
- tcma_check(desc, bit55, allocation_tag_from_addr(addr))) {
25
- * If not, emit code to generate an appropriate exception and
26
+ if (!tbi_check(mtedesc, bit55) ||
26
- * return false.
27
+ tcma_check(mtedesc, bit55, allocation_tag_from_addr(addr))) {
27
+ * Check that VFP access is enabled, A-profile specific version.
28
mtedesc = 0;
28
+ *
29
+ * If VFP is enabled, return true. If not, emit code to generate an
30
+ * appropriate exception and return false.
31
* The ignore_vfp_enabled argument specifies that we should ignore
32
- * whether VFP is enabled via FPEXC[EN]: this should be true for FMXR/FMRX
33
+ * whether VFP is enabled via FPEXC.EN: this should be true for FMXR/FMRX
34
* accesses to FPSID, FPEXC, MVFR0, MVFR1, MVFR2, and false for all other insns.
35
*/
36
-static bool full_vfp_access_check(DisasContext *s, bool ignore_vfp_enabled)
37
+static bool vfp_access_check_a(DisasContext *s, bool ignore_vfp_enabled)
38
{
39
if (s->fp_excp_el) {
40
- if (arm_dc_feature(s, ARM_FEATURE_M)) {
41
- /*
42
- * M-profile mostly catches the "FPU disabled" case early, in
43
- * disas_m_nocp(), but a few insns (eg LCTP, WLSTP, DLSTP)
44
- * which do coprocessor-checks are outside the large ranges of
45
- * the encoding space handled by the patterns in m-nocp.decode,
46
- * and for them we may need to raise NOCP here.
47
- */
48
- gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
49
- syn_uncategorized(), s->fp_excp_el);
50
- } else {
51
- gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
52
- syn_fp_access_trap(1, 0xe, false),
53
- s->fp_excp_el);
54
- }
55
+ gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
56
+ syn_fp_access_trap(1, 0xe, false), s->fp_excp_el);
57
return false;
58
}
29
}
59
30
60
@@ -XXX,XX +XXX,XX @@ static bool full_vfp_access_check(DisasContext *s, bool ignore_vfp_enabled)
31
@@ -XXX,XX +XXX,XX @@ void sme_st1_mte(CPUARMState *env, void *za, uint64_t *vg, target_ulong addr,
61
unallocated_encoding(s);
32
desc = extract32(desc, 0, SIMD_DATA_SHIFT + SVE_MTEDESC_SHIFT);
62
return false;
33
34
/* Perform gross MTE suppression early. */
35
- if (!tbi_check(desc, bit55) ||
36
- tcma_check(desc, bit55, allocation_tag_from_addr(addr))) {
37
+ if (!tbi_check(mtedesc, bit55) ||
38
+ tcma_check(mtedesc, bit55, allocation_tag_from_addr(addr))) {
39
mtedesc = 0;
63
}
40
}
64
+ return true;
41
65
+}
42
diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
66
43
index XXXXXXX..XXXXXXX 100644
67
- if (arm_dc_feature(s, ARM_FEATURE_M)) {
44
--- a/target/arm/tcg/sve_helper.c
68
- /* Handle M-profile lazy FP state mechanics */
45
+++ b/target/arm/tcg/sve_helper.c
69
-
46
@@ -XXX,XX +XXX,XX @@ void sve_ldN_r_mte(CPUARMState *env, uint64_t *vg, target_ulong addr,
70
- /* Trigger lazy-state preservation if necessary */
47
desc = extract32(desc, 0, SIMD_DATA_SHIFT + SVE_MTEDESC_SHIFT);
71
- gen_preserve_fp_state(s);
48
72
-
49
/* Perform gross MTE suppression early. */
73
- /* Update ownership of FP context and create new FP context if needed */
50
- if (!tbi_check(desc, bit55) ||
74
- gen_update_fp_context(s);
51
- tcma_check(desc, bit55, allocation_tag_from_addr(addr))) {
75
+/*
52
+ if (!tbi_check(mtedesc, bit55) ||
76
+ * Check that VFP access is enabled, M-profile specific version.
53
+ tcma_check(mtedesc, bit55, allocation_tag_from_addr(addr))) {
77
+ *
54
mtedesc = 0;
78
+ * If VFP is enabled, do the necessary M-profile lazy-FP handling and then
79
+ * return true. If not, emit code to generate an appropriate exception and
80
+ * return false.
81
+ */
82
+static bool vfp_access_check_m(DisasContext *s)
83
+{
84
+ if (s->fp_excp_el) {
85
+ /*
86
+ * M-profile mostly catches the "FPU disabled" case early, in
87
+ * disas_m_nocp(), but a few insns (eg LCTP, WLSTP, DLSTP)
88
+ * which do coprocessor-checks are outside the large ranges of
89
+ * the encoding space handled by the patterns in m-nocp.decode,
90
+ * and for them we may need to raise NOCP here.
91
+ */
92
+ gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
93
+ syn_uncategorized(), s->fp_excp_el);
94
+ return false;
95
}
55
}
96
56
97
+ /* Handle M-profile lazy FP state mechanics */
57
@@ -XXX,XX +XXX,XX @@ void sve_ldnfff1_r_mte(CPUARMState *env, void *vg, target_ulong addr,
98
+
58
desc = extract32(desc, 0, SIMD_DATA_SHIFT + SVE_MTEDESC_SHIFT);
99
+ /* Trigger lazy-state preservation if necessary */
59
100
+ gen_preserve_fp_state(s);
60
/* Perform gross MTE suppression early. */
101
+
61
- if (!tbi_check(desc, bit55) ||
102
+ /* Update ownership of FP context and create new FP context if needed */
62
- tcma_check(desc, bit55, allocation_tag_from_addr(addr))) {
103
+ gen_update_fp_context(s);
63
+ if (!tbi_check(mtedesc, bit55) ||
104
+
64
+ tcma_check(mtedesc, bit55, allocation_tag_from_addr(addr))) {
105
return true;
65
mtedesc = 0;
106
}
107
108
@@ -XXX,XX +XXX,XX @@ static bool full_vfp_access_check(DisasContext *s, bool ignore_vfp_enabled)
109
*/
110
bool vfp_access_check(DisasContext *s)
111
{
112
- return full_vfp_access_check(s, false);
113
+ if (arm_dc_feature(s, ARM_FEATURE_M)) {
114
+ return vfp_access_check_m(s);
115
+ } else {
116
+ return vfp_access_check_a(s, false);
117
+ }
118
}
119
120
static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
121
@@ -XXX,XX +XXX,XX @@ static bool trans_VMSR_VMRS(DisasContext *s, arg_VMSR_VMRS *a)
122
return false;
123
}
66
}
124
67
125
- if (!full_vfp_access_check(s, ignore_vfp_enabled)) {
68
@@ -XXX,XX +XXX,XX @@ void sve_stN_r_mte(CPUARMState *env, uint64_t *vg, target_ulong addr,
126
+ /*
69
desc = extract32(desc, 0, SIMD_DATA_SHIFT + SVE_MTEDESC_SHIFT);
127
+ * Call vfp_access_check_a() directly, because we need to tell
70
128
+ * it to ignore FPEXC.EN for some register accesses.
71
/* Perform gross MTE suppression early. */
129
+ */
72
- if (!tbi_check(desc, bit55) ||
130
+ if (!vfp_access_check_a(s, ignore_vfp_enabled)) {
73
- tcma_check(desc, bit55, allocation_tag_from_addr(addr))) {
131
return true;
74
+ if (!tbi_check(mtedesc, bit55) ||
75
+ tcma_check(mtedesc, bit55, allocation_tag_from_addr(addr))) {
76
mtedesc = 0;
132
}
77
}
133
78
134
--
79
--
135
2.20.1
80
2.34.1
136
137
1
These days the Arm architecture has a wide range of fine-grained
1
The raven_io_ops MemoryRegionOps is the only one in the source tree
2
optional extra architectural features. We implement quite a lot
2
which sets .valid.unaligned to indicate that it should support
3
of these but by no means all of them. Document what we do implement,
3
unaligned accesses and which does not also set .impl.unaligned to
4
so that users can find out without having to dig through back-issues
4
indicate that its read and write functions can do the unaligned
5
of our Changelog on the wiki.
5
handling themselves. This is a problem, because at the moment the
6
core memory system does not implement the support for handling
7
unaligned accesses by doing a series of aligned accesses and
8
combining them (system/memory.c:access_with_adjusted_size() has a
9
TODO comment noting this).
6
10
11
Fortunately raven_io_read() and raven_io_write() will correctly deal
12
with the case of being passed an unaligned address, so we can fix the
13
missing unaligned access support by setting .impl.unaligned in the
14
MemoryRegionOps struct.
15
16
Fixes: 9a1839164c9c8f06 ("raven: Implement non-contiguous I/O region")
7
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
17
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
18
Tested-by: Cédric Le Goater <clg@redhat.com>
9
Message-id: 20210617140328.28622-1-peter.maydell@linaro.org
19
Reviewed-by: Cédric Le Goater <clg@redhat.com>
10
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
20
Message-id: 20240112134640.1775041-1-peter.maydell@linaro.org
11
---
21
---
12
docs/system/arm/emulation.rst | 102 ++++++++++++++++++++++++++++++++++
22
hw/pci-host/raven.c | 1 +
13
docs/system/target-arm.rst | 6 ++
23
1 file changed, 1 insertion(+)
14
2 files changed, 108 insertions(+)
15
create mode 100644 docs/system/arm/emulation.rst
16
24
17
diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
25
diff --git a/hw/pci-host/raven.c b/hw/pci-host/raven.c
18
new file mode 100644
19
index XXXXXXX..XXXXXXX
20
--- /dev/null
21
+++ b/docs/system/arm/emulation.rst
22
@@ -XXX,XX +XXX,XX @@
23
+A-profile CPU architecture support
24
+==================================
25
+
26
+QEMU's TCG emulation includes support for the Armv5, Armv6, Armv7 and
27
+Armv8 versions of the A-profile architecture. It also has support for
28
+the following architecture extensions:
29
+
30
+- FEAT_AA32BF16 (AArch32 BFloat16 instructions)
31
+- FEAT_AA32HPD (AArch32 hierarchical permission disables)
32
+- FEAT_AA32I8MM (AArch32 Int8 matrix multiplication instructions)
33
+- FEAT_AES (AESD and AESE instructions)
34
+- FEAT_BF16 (AArch64 BFloat16 instructions)
35
+- FEAT_BTI (Branch Target Identification)
36
+- FEAT_DIT (Data Independent Timing instructions)
37
+- FEAT_DPB (DC CVAP instruction)
38
+- FEAT_DotProd (Advanced SIMD dot product instructions)
39
+- FEAT_FCMA (Floating-point complex number instructions)
40
+- FEAT_FHM (Floating-point half-precision multiplication instructions)
41
+- FEAT_FP16 (Half-precision floating-point data processing)
42
+- FEAT_FRINTTS (Floating-point to integer instructions)
43
+- FEAT_FlagM (Flag manipulation instructions v2)
44
+- FEAT_FlagM2 (Enhancements to flag manipulation instructions)
45
+- FEAT_HPDS (Hierarchical permission disables)
46
+- FEAT_I8MM (AArch64 Int8 matrix multiplication instructions)
47
+- FEAT_JSCVT (JavaScript conversion instructions)
48
+- FEAT_LOR (Limited ordering regions)
49
+- FEAT_LRCPC (Load-acquire RCpc instructions)
50
+- FEAT_LRCPC2 (Load-acquire RCpc instructions v2)
51
+- FEAT_LSE (Large System Extensions)
52
+- FEAT_MTE (Memory Tagging Extension)
53
+- FEAT_MTE2 (Memory Tagging Extension)
54
+- FEAT_PAN (Privileged access never)
55
+- FEAT_PAN2 (AT S1E1R and AT S1E1W instruction variants affected by PSTATE.PAN)
56
+- FEAT_PAuth (Pointer authentication)
57
+- FEAT_PMULL (PMULL, PMULL2 instructions)
58
+- FEAT_PMUv3p1 (PMU Extensions v3.1)
59
+- FEAT_PMUv3p4 (PMU Extensions v3.4)
60
+- FEAT_RDM (Advanced SIMD rounding double multiply accumulate instructions)
61
+- FEAT_RNG (Random number generator)
62
+- FEAT_SB (Speculation Barrier)
63
+- FEAT_SEL2 (Secure EL2)
64
+- FEAT_SHA1 (SHA1 instructions)
65
+- FEAT_SHA256 (SHA256 instructions)
66
+- FEAT_SHA3 (Advanced SIMD SHA3 instructions)
67
+- FEAT_SHA512 (Advanced SIMD SHA512 instructions)
68
+- FEAT_SM3 (Advanced SIMD SM3 instructions)
69
+- FEAT_SM4 (Advanced SIMD SM4 instructions)
70
+- FEAT_SPECRES (Speculation restriction instructions)
71
+- FEAT_SSBS (Speculative Store Bypass Safe)
72
+- FEAT_TLBIOS (TLB invalidate instructions in Outer Shareable domain)
73
+- FEAT_TLBIRANGE (TLB invalidate range instructions)
74
+- FEAT_TTCNP (Translation table Common not private translations)
75
+- FEAT_TTST (Small translation tables)
76
+- FEAT_UAO (Unprivileged Access Override control)
77
+- FEAT_VHE (Virtualization Host Extensions)
78
+- FEAT_VMID16 (16-bit VMID)
79
+- FEAT_XNX (Translation table stage 2 Unprivileged Execute-never)
80
+- SVE (The Scalable Vector Extension)
81
+- SVE2 (The Scalable Vector Extension v2)
82
+
83
+For information on the specifics of these extensions, please refer
84
+to the `Armv8-A Arm Architecture Reference Manual
85
+<https://developer.arm.com/documentation/ddi0487/latest>`_.
86
+
87
+When a specific named CPU is being emulated, only those features which
88
+are present in hardware for that CPU are emulated. (If a feature is
89
+not in the list above then it is not supported, even if the real
90
+hardware should have it.) The ``max`` CPU enables all features.
91
+
92
+R-profile CPU architecture support
93
+==================================
94
+
95
+QEMU's TCG emulation support for R-profile CPUs is currently limited.
96
+We emulate only the Cortex-R5 and Cortex-R5F CPUs.
97
+
98
+M-profile CPU architecture support
99
+==================================
100
+
101
+QEMU's TCG emulation includes support for Armv6-M, Armv7-M, Armv8-M, and
102
+Armv8.1-M versions of the M-profile architecture. It also has support
103
+for the following architecture extensions:
104
+
105
+- FP (Floating-point Extension)
106
+- FPCXT (FPCXT access instructions)
107
+- HP (Half-precision floating-point instructions)
108
+- LOB (Low Overhead loops and Branch future)
109
+- M (Main Extension)
110
+- MPU (Memory Protection Unit Extension)
111
+- PXN (Privileged Execute Never)
112
+- RAS (Reliability, Availability and Serviceability): "minimum RAS Extension" only
113
+- S (Security Extension)
114
+- ST (System Timer Extension)
115
+
116
+For information on the specifics of these extensions, please refer
117
+to the `Armv8-M Arm Architecture Reference Manual
118
+<https://developer.arm.com/documentation/ddi0553/latest>`_.
119
+
120
+When a specific named CPU is being emulated, only those features which
121
+are present in hardware for that CPU are emulated. (If a feature is
122
+not in the list above then it is not supported, even if the real
123
+hardware should have it.) There is no equivalent of the ``max`` CPU for
124
+M-profile.
125
diff --git a/docs/system/target-arm.rst b/docs/system/target-arm.rst
126
index XXXXXXX..XXXXXXX 100644
26
index XXXXXXX..XXXXXXX 100644
127
--- a/docs/system/target-arm.rst
27
--- a/hw/pci-host/raven.c
128
+++ b/docs/system/target-arm.rst
28
+++ b/hw/pci-host/raven.c
129
@@ -XXX,XX +XXX,XX @@ undocumented; you can get a complete list by running
29
@@ -XXX,XX +XXX,XX @@ static const MemoryRegionOps raven_io_ops = {
130
arm/virt
30
.write = raven_io_write,
131
arm/xlnx-versal-virt
31
.endianness = DEVICE_LITTLE_ENDIAN,
132
32
.impl.max_access_size = 4,
133
+Emulated CPU architecture support
33
+ .impl.unaligned = true,
134
+=================================
34
.valid.unaligned = true,
135
+
35
};
136
+.. toctree::
137
+ arm/emulation
138
+
139
Arm CPU features
140
================
141
36
142
--
37
--
143
2.20.1
38
2.34.1
144
39
145
40
1
Implement the MVE VRHADD insn, which performs a rounded halving
1
Suppress the deprecation warning when we're running under qtest,
2
addition.
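
For reference, a minimal sketch of the rounded halving addition, evaluated in a wider type so the intermediate sum cannot overflow; this mirrors the DO_RHADD_S/DO_RHADD_U macros the patch adds (rhadd_s32 is a hypothetical name):

  #include <stdint.h>

  /* rounded halving add: (a + b + 1) >> 1, computed in 64 bits */
  static int32_t rhadd_s32(int32_t a, int32_t b)
  {
      return (int32_t)(((int64_t)a + b + 1) >> 1);
  }
  /* e.g. rhadd_s32(7, 4) == 6, rhadd_s32(-3, 2) == 0 */
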
2
to avoid "make check" including warning messages in its output.
3
3
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
5
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
6
Message-id: 20210617121628.20116-40-peter.maydell@linaro.org
6
Message-id: 20240206154151.155620-1-peter.maydell@linaro.org
7
---
7
---
8
target/arm/helper-mve.h | 8 ++++++++
8
hw/block/tc58128.c | 4 +++-
9
target/arm/mve.decode | 3 +++
9
1 file changed, 3 insertions(+), 1 deletion(-)
10
target/arm/mve_helper.c | 6 ++++++
11
target/arm/translate-mve.c | 2 ++
12
4 files changed, 19 insertions(+)
13
10
14
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
11
diff --git a/hw/block/tc58128.c b/hw/block/tc58128.c
15
index XXXXXXX..XXXXXXX 100644
12
index XXXXXXX..XXXXXXX 100644
16
--- a/target/arm/helper-mve.h
13
--- a/hw/block/tc58128.c
17
+++ b/target/arm/helper-mve.h
14
+++ b/hw/block/tc58128.c
18
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vqdmullbw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
15
@@ -XXX,XX +XXX,XX @@ static sh7750_io_device tc58128 = {
19
DEF_HELPER_FLAGS_4(mve_vqdmullth, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
16
20
DEF_HELPER_FLAGS_4(mve_vqdmulltw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
17
int tc58128_init(struct SH7750State *s, const char *zone1, const char *zone2)
21
22
+DEF_HELPER_FLAGS_4(mve_vrhaddsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
23
+DEF_HELPER_FLAGS_4(mve_vrhaddsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
24
+DEF_HELPER_FLAGS_4(mve_vrhaddsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
25
+
26
+DEF_HELPER_FLAGS_4(mve_vrhaddub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
27
+DEF_HELPER_FLAGS_4(mve_vrhadduh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
28
+DEF_HELPER_FLAGS_4(mve_vrhadduw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
29
+
30
DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
31
DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
32
DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
33
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
34
index XXXXXXX..XXXXXXX 100644
35
--- a/target/arm/mve.decode
36
+++ b/target/arm/mve.decode
37
@@ -XXX,XX +XXX,XX @@ VQRDMLSDHX 1111 1110 0 . .. ... 0 ... 1 1110 . 0 . 0 ... 1 @2op
38
VQDMULLB 111 . 1110 0 . 11 ... 0 ... 0 1111 . 0 . 0 ... 1 @2op_sz28
39
VQDMULLT 111 . 1110 0 . 11 ... 0 ... 1 1111 . 0 . 0 ... 1 @2op_sz28
40
41
+VRHADD_S 111 0 1111 0 . .. ... 0 ... 0 0001 . 1 . 0 ... 0 @2op
42
+VRHADD_U 111 1 1111 0 . .. ... 0 ... 0 0001 . 1 . 0 ... 0 @2op
43
+
44
# Vector miscellaneous
45
46
VCLS 1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
47
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
48
index XXXXXXX..XXXXXXX 100644
49
--- a/target/arm/mve_helper.c
50
+++ b/target/arm/mve_helper.c
51
@@ -XXX,XX +XXX,XX @@ DO_2OP_U(vshlu, DO_VSHLU)
52
DO_2OP_S(vrshls, DO_VRSHLS)
53
DO_2OP_U(vrshlu, DO_VRSHLU)
54
55
+#define DO_RHADD_S(N, M) (((int64_t)(N) + (M) + 1) >> 1)
56
+#define DO_RHADD_U(N, M) (((uint64_t)(N) + (M) + 1) >> 1)
57
+
58
+DO_2OP_S(vrhadds, DO_RHADD_S)
59
+DO_2OP_U(vrhaddu, DO_RHADD_U)
60
+
61
static inline int32_t do_sat_bhw(int64_t val, int64_t min, int64_t max, bool *s)
62
{
18
{
63
if (val > max) {
19
- warn_report_once("The TC58128 flash device is deprecated");
64
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
20
+ if (!qtest_enabled()) {
65
index XXXXXXX..XXXXXXX 100644
21
+ warn_report_once("The TC58128 flash device is deprecated");
66
--- a/target/arm/translate-mve.c
22
+ }
67
+++ b/target/arm/translate-mve.c
23
init_dev(&tc58128_devs[0], zone1);
68
@@ -XXX,XX +XXX,XX @@ DO_2OP(VQDMLSDH, vqdmlsdh)
24
init_dev(&tc58128_devs[1], zone2);
69
DO_2OP(VQDMLSDHX, vqdmlsdhx)
25
return sh7750_register_io_device(s, &tc58128);
70
DO_2OP(VQRDMLSDH, vqrdmlsdh)
71
DO_2OP(VQRDMLSDHX, vqrdmlsdhx)
72
+DO_2OP(VRHADD_S, vrhadds)
73
+DO_2OP(VRHADD_U, vrhaddu)
74
75
static bool trans_VQDMULLB(DisasContext *s, arg_2op *a)
76
{
77
--
26
--
78
2.20.1
27
2.34.1
79
28
80
29
1
Implement the vector form of the MVE VQDMULL insn.
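
A hedged sketch of the scalar operation underneath (a saturating doubling multiply into a double-width result, as in the do_qdmullh/do_qdmullw helpers the patch reuses); qdmull_h16 is a hypothetical name:

  #include <stdint.h>
  #include <stdbool.h>

  /* saturating doubling multiply, 16x16 -> 32 bits */
  static int32_t qdmull_h16(int16_t a, int16_t b, bool *sat)
  {
      int64_t r = 2 * (int64_t)a * b;
      if (r > INT32_MAX) {
          *sat = true;
          return INT32_MAX;
      }
      if (r < INT32_MIN) {  /* cannot happen for 16-bit inputs, kept for symmetry */
          *sat = true;
          return INT32_MIN;
      }
      return (int32_t)r;
  }
  /* only a == b == INT16_MIN saturates: 2 * 32768 * 32768 == 2^31 > INT32_MAX */
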
1
We deliberately don't include qtests_npcm7xx in qtests_aarch64,
2
because we already get the coverage of those tests via qtests_arm,
3
and we don't want to use extra CI minutes testing them twice.
2
4
5
In commit 327b680877b79c4b we added it to qtests_aarch64; revert
6
that change.
7
8
Fixes: 327b680877b79c4b ("tests/qtest: Creating qtest for GMAC Module")
3
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
4
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
10
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
5
Message-id: 20210617121628.20116-39-peter.maydell@linaro.org
11
Message-id: 20240206163043.315535-1-peter.maydell@linaro.org
6
---
12
---
7
target/arm/helper-mve.h | 5 +++++
13
tests/qtest/meson.build | 1 -
8
target/arm/mve.decode | 5 +++++
14
1 file changed, 1 deletion(-)
9
target/arm/mve_helper.c | 30 ++++++++++++++++++++++++++++++
10
target/arm/translate-mve.c | 30 ++++++++++++++++++++++++++++++
11
4 files changed, 70 insertions(+)
12
15
13
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
16
diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
14
index XXXXXXX..XXXXXXX 100644
17
index XXXXXXX..XXXXXXX 100644
15
--- a/target/arm/helper-mve.h
18
--- a/tests/qtest/meson.build
16
+++ b/target/arm/helper-mve.h
19
+++ b/tests/qtest/meson.build
17
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vqrdmlsdhxb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
20
@@ -XXX,XX +XXX,XX @@ qtests_aarch64 = \
18
DEF_HELPER_FLAGS_4(mve_vqrdmlsdhxh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
21
(config_all_devices.has_key('CONFIG_RASPI') ? ['bcm2835-dma-test'] : []) + \
19
DEF_HELPER_FLAGS_4(mve_vqrdmlsdhxw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
22
(config_all_accel.has_key('CONFIG_TCG') and \
20
23
config_all_devices.has_key('CONFIG_TPM_TIS_I2C') ? ['tpm-tis-i2c-test'] : []) + \
21
+DEF_HELPER_FLAGS_4(mve_vqdmullbh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
24
- (config_all_devices.has_key('CONFIG_NPCM7XX') ? qtests_npcm7xx : []) + \
22
+DEF_HELPER_FLAGS_4(mve_vqdmullbw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
25
['arm-cpu-features',
23
+DEF_HELPER_FLAGS_4(mve_vqdmullth, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
26
'numa-test',
24
+DEF_HELPER_FLAGS_4(mve_vqdmulltw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
27
'boot-serial-test',
25
+
26
DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
27
DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
28
DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
29
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
30
index XXXXXXX..XXXXXXX 100644
31
--- a/target/arm/mve.decode
32
+++ b/target/arm/mve.decode
33
@@ -XXX,XX +XXX,XX @@
34
@1op_nosz .... .... .... .... .... .... .... .... &1op qd=%qd qm=%qm size=0
35
@2op .... .... .. size:2 .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn
36
@2op_nosz .... .... .... .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn size=0
37
+@2op_sz28 .... .... .... .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn \
38
+ size=%size_28
39
40
# The _rev suffix indicates that Vn and Vm are reversed. This is
41
# the case for shifts. In the Arm ARM these insns are documented
42
@@ -XXX,XX +XXX,XX @@ VQDMLSDHX 1111 1110 0 . .. ... 0 ... 1 1110 . 0 . 0 ... 0 @2op
43
VQRDMLSDH 1111 1110 0 . .. ... 0 ... 0 1110 . 0 . 0 ... 1 @2op
44
VQRDMLSDHX 1111 1110 0 . .. ... 0 ... 1 1110 . 0 . 0 ... 1 @2op
45
46
+VQDMULLB 111 . 1110 0 . 11 ... 0 ... 0 1111 . 0 . 0 ... 1 @2op_sz28
47
+VQDMULLT 111 . 1110 0 . 11 ... 0 ... 1 1111 . 0 . 0 ... 1 @2op_sz28
48
+
49
# Vector miscellaneous
50
51
VCLS 1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
52
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
53
index XXXXXXX..XXXXXXX 100644
54
--- a/target/arm/mve_helper.c
55
+++ b/target/arm/mve_helper.c
56
@@ -XXX,XX +XXX,XX @@ DO_2OP_SAT_SCALAR_L(vqdmullt_scalarh, 1, 2, int16_t, 4, int32_t, \
57
DO_2OP_SAT_SCALAR_L(vqdmullt_scalarw, 1, 4, int32_t, 8, int64_t, \
58
do_qdmullw, SATMASK32)
59
60
+/*
61
+ * Long saturating ops
62
+ */
63
+#define DO_2OP_SAT_L(OP, TOP, ESIZE, TYPE, LESIZE, LTYPE, FN, SATMASK) \
64
+ void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn, \
65
+ void *vm) \
66
+ { \
67
+ LTYPE *d = vd; \
68
+ TYPE *n = vn, *m = vm; \
69
+ uint16_t mask = mve_element_mask(env); \
70
+ unsigned le; \
71
+ bool qc = false; \
72
+ for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) { \
73
+ bool sat = false; \
74
+ LTYPE op1 = n[H##ESIZE(le * 2 + TOP)]; \
75
+ LTYPE op2 = m[H##ESIZE(le * 2 + TOP)]; \
76
+ mergemask(&d[H##LESIZE(le)], FN(op1, op2, &sat), mask); \
77
+ qc |= sat && (mask & SATMASK); \
78
+ } \
79
+ if (qc) { \
80
+ env->vfp.qc[0] = qc; \
81
+ } \
82
+ mve_advance_vpt(env); \
83
+ }
84
+
85
+DO_2OP_SAT_L(vqdmullbh, 0, 2, int16_t, 4, int32_t, do_qdmullh, SATMASK16B)
86
+DO_2OP_SAT_L(vqdmullbw, 0, 4, int32_t, 8, int64_t, do_qdmullw, SATMASK32)
87
+DO_2OP_SAT_L(vqdmullth, 1, 2, int16_t, 4, int32_t, do_qdmullh, SATMASK16T)
88
+DO_2OP_SAT_L(vqdmulltw, 1, 4, int32_t, 8, int64_t, do_qdmullw, SATMASK32)
89
+
90
static inline uint32_t do_vbrsrb(uint32_t n, uint32_t m)
91
{
92
m &= 0xff;
93
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
94
index XXXXXXX..XXXXXXX 100644
95
--- a/target/arm/translate-mve.c
96
+++ b/target/arm/translate-mve.c
97
@@ -XXX,XX +XXX,XX @@ DO_2OP(VQDMLSDHX, vqdmlsdhx)
98
DO_2OP(VQRDMLSDH, vqrdmlsdh)
99
DO_2OP(VQRDMLSDHX, vqrdmlsdhx)
100
101
+static bool trans_VQDMULLB(DisasContext *s, arg_2op *a)
102
+{
103
+ static MVEGenTwoOpFn * const fns[] = {
104
+ NULL,
105
+ gen_helper_mve_vqdmullbh,
106
+ gen_helper_mve_vqdmullbw,
107
+ NULL,
108
+ };
109
+ if (a->size == MO_32 && (a->qd == a->qm || a->qd == a->qn)) {
110
+ /* UNPREDICTABLE; we choose to undef */
111
+ return false;
112
+ }
113
+ return do_2op(s, a, fns[a->size]);
114
+}
115
+
116
+static bool trans_VQDMULLT(DisasContext *s, arg_2op *a)
117
+{
118
+ static MVEGenTwoOpFn * const fns[] = {
119
+ NULL,
120
+ gen_helper_mve_vqdmullth,
121
+ gen_helper_mve_vqdmulltw,
122
+ NULL,
123
+ };
124
+ if (a->size == MO_32 && (a->qd == a->qm || a->qd == a->qn)) {
125
+ /* UNPREDICTABLE; we choose to undef */
126
+ return false;
127
+ }
128
+ return do_2op(s, a, fns[a->size]);
129
+}
130
+
131
static bool do_2op_scalar(DisasContext *s, arg_2scalar *a,
132
MVEGenTwoOpScalarFn fn)
133
{
134
--
28
--
135
2.20.1
29
2.34.1
136
30
137
31
1
Implement the MVE VQDMLSDH and VQRDMLSDH insns, which are
1
Allow changes to the virt GTDT -- we are going to add the IRQ
2
like VQDMLADH and VQRDMLADH except that products are subtracted
2
entry for a new timer to it.
3
rather than added.
4
3
5
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
6
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
5
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
7
Message-id: 20210617121628.20116-38-peter.maydell@linaro.org
6
Message-id: 20240122143537.233498-2-peter.maydell@linaro.org
8
---
7
---
9
target/arm/helper-mve.h | 16 ++++++++++++++
8
tests/qtest/bios-tables-test-allowed-diff.h | 2 ++
10
target/arm/mve.decode | 5 +++++
9
1 file changed, 2 insertions(+)
11
target/arm/mve_helper.c | 44 ++++++++++++++++++++++++++++++++++++++
12
target/arm/translate-mve.c | 4 ++++
13
4 files changed, 69 insertions(+)
14
10
15
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
11
diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
16
index XXXXXXX..XXXXXXX 100644
12
index XXXXXXX..XXXXXXX 100644
17
--- a/target/arm/helper-mve.h
13
--- a/tests/qtest/bios-tables-test-allowed-diff.h
18
+++ b/target/arm/helper-mve.h
14
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
19
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vqrdmladhxb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
15
@@ -1 +1,3 @@
20
DEF_HELPER_FLAGS_4(mve_vqrdmladhxh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
16
/* List of comma-separated changed AML files to ignore */
21
DEF_HELPER_FLAGS_4(mve_vqrdmladhxw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
17
+"tests/data/acpi/virt/FACP",
22
18
+"tests/data/acpi/virt/GTDT",
23
+DEF_HELPER_FLAGS_4(mve_vqdmlsdhb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
24
+DEF_HELPER_FLAGS_4(mve_vqdmlsdhh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
25
+DEF_HELPER_FLAGS_4(mve_vqdmlsdhw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
26
+
27
+DEF_HELPER_FLAGS_4(mve_vqdmlsdhxb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
28
+DEF_HELPER_FLAGS_4(mve_vqdmlsdhxh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
29
+DEF_HELPER_FLAGS_4(mve_vqdmlsdhxw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
30
+
31
+DEF_HELPER_FLAGS_4(mve_vqrdmlsdhb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
32
+DEF_HELPER_FLAGS_4(mve_vqrdmlsdhh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
33
+DEF_HELPER_FLAGS_4(mve_vqrdmlsdhw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
34
+
35
+DEF_HELPER_FLAGS_4(mve_vqrdmlsdhxb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
36
+DEF_HELPER_FLAGS_4(mve_vqrdmlsdhxh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
37
+DEF_HELPER_FLAGS_4(mve_vqrdmlsdhxw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
38
+
39
DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
40
DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
41
DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
42
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
43
index XXXXXXX..XXXXXXX 100644
44
--- a/target/arm/mve.decode
45
+++ b/target/arm/mve.decode
46
@@ -XXX,XX +XXX,XX @@ VQDMLADHX 1110 1110 0 . .. ... 0 ... 1 1110 . 0 . 0 ... 0 @2op
47
VQRDMLADH 1110 1110 0 . .. ... 0 ... 0 1110 . 0 . 0 ... 1 @2op
48
VQRDMLADHX 1110 1110 0 . .. ... 0 ... 1 1110 . 0 . 0 ... 1 @2op
49
50
+VQDMLSDH 1111 1110 0 . .. ... 0 ... 0 1110 . 0 . 0 ... 0 @2op
51
+VQDMLSDHX 1111 1110 0 . .. ... 0 ... 1 1110 . 0 . 0 ... 0 @2op
52
+VQRDMLSDH 1111 1110 0 . .. ... 0 ... 0 1110 . 0 . 0 ... 1 @2op
53
+VQRDMLSDHX 1111 1110 0 . .. ... 0 ... 1 1110 . 0 . 0 ... 1 @2op
54
+
55
# Vector miscellaneous
56
57
VCLS 1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
58
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
59
index XXXXXXX..XXXXXXX 100644
60
--- a/target/arm/mve_helper.c
61
+++ b/target/arm/mve_helper.c
62
@@ -XXX,XX +XXX,XX @@ static int32_t do_vqdmladh_w(int32_t a, int32_t b, int32_t c, int32_t d,
63
return r >> 32;
64
}
65
66
+static int8_t do_vqdmlsdh_b(int8_t a, int8_t b, int8_t c, int8_t d,
67
+ int round, bool *sat)
68
+{
69
+ int64_t r = ((int64_t)a * b - (int64_t)c * d) * 2 + (round << 7);
70
+ return do_sat_bhw(r, INT16_MIN, INT16_MAX, sat) >> 8;
71
+}
72
+
73
+static int16_t do_vqdmlsdh_h(int16_t a, int16_t b, int16_t c, int16_t d,
74
+ int round, bool *sat)
75
+{
76
+ int64_t r = ((int64_t)a * b - (int64_t)c * d) * 2 + (round << 15);
77
+ return do_sat_bhw(r, INT32_MIN, INT32_MAX, sat) >> 16;
78
+}
79
+
80
+static int32_t do_vqdmlsdh_w(int32_t a, int32_t b, int32_t c, int32_t d,
81
+ int round, bool *sat)
82
+{
83
+ int64_t m1 = (int64_t)a * b;
84
+ int64_t m2 = (int64_t)c * d;
85
+ int64_t r;
86
+ /* The same ordering issue as in do_vqdmladh_w applies here too */
87
+ if (ssub64_overflow(m1, m2, &r) ||
88
+ sadd64_overflow(r, (round << 30), &r) ||
89
+ sadd64_overflow(r, r, &r)) {
90
+ *sat = true;
91
+ return r < 0 ? INT32_MAX : INT32_MIN;
92
+ }
93
+ return r >> 32;
94
+}
95
+
96
DO_VQDMLADH_OP(vqdmladhb, 1, int8_t, 0, 0, do_vqdmladh_b)
97
DO_VQDMLADH_OP(vqdmladhh, 2, int16_t, 0, 0, do_vqdmladh_h)
98
DO_VQDMLADH_OP(vqdmladhw, 4, int32_t, 0, 0, do_vqdmladh_w)
99
@@ -XXX,XX +XXX,XX @@ DO_VQDMLADH_OP(vqrdmladhxb, 1, int8_t, 1, 1, do_vqdmladh_b)
100
DO_VQDMLADH_OP(vqrdmladhxh, 2, int16_t, 1, 1, do_vqdmladh_h)
101
DO_VQDMLADH_OP(vqrdmladhxw, 4, int32_t, 1, 1, do_vqdmladh_w)
102
103
+DO_VQDMLADH_OP(vqdmlsdhb, 1, int8_t, 0, 0, do_vqdmlsdh_b)
104
+DO_VQDMLADH_OP(vqdmlsdhh, 2, int16_t, 0, 0, do_vqdmlsdh_h)
105
+DO_VQDMLADH_OP(vqdmlsdhw, 4, int32_t, 0, 0, do_vqdmlsdh_w)
106
+DO_VQDMLADH_OP(vqdmlsdhxb, 1, int8_t, 1, 0, do_vqdmlsdh_b)
107
+DO_VQDMLADH_OP(vqdmlsdhxh, 2, int16_t, 1, 0, do_vqdmlsdh_h)
108
+DO_VQDMLADH_OP(vqdmlsdhxw, 4, int32_t, 1, 0, do_vqdmlsdh_w)
109
+
110
+DO_VQDMLADH_OP(vqrdmlsdhb, 1, int8_t, 0, 1, do_vqdmlsdh_b)
111
+DO_VQDMLADH_OP(vqrdmlsdhh, 2, int16_t, 0, 1, do_vqdmlsdh_h)
112
+DO_VQDMLADH_OP(vqrdmlsdhw, 4, int32_t, 0, 1, do_vqdmlsdh_w)
113
+DO_VQDMLADH_OP(vqrdmlsdhxb, 1, int8_t, 1, 1, do_vqdmlsdh_b)
114
+DO_VQDMLADH_OP(vqrdmlsdhxh, 2, int16_t, 1, 1, do_vqdmlsdh_h)
115
+DO_VQDMLADH_OP(vqrdmlsdhxw, 4, int32_t, 1, 1, do_vqdmlsdh_w)
116
+
117
#define DO_2OP_SCALAR(OP, ESIZE, TYPE, FN) \
118
void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn, \
119
uint32_t rm) \
120
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
121
index XXXXXXX..XXXXXXX 100644
122
--- a/target/arm/translate-mve.c
123
+++ b/target/arm/translate-mve.c
124
@@ -XXX,XX +XXX,XX @@ DO_2OP(VQDMLADH, vqdmladh)
125
DO_2OP(VQDMLADHX, vqdmladhx)
126
DO_2OP(VQRDMLADH, vqrdmladh)
127
DO_2OP(VQRDMLADHX, vqrdmladhx)
128
+DO_2OP(VQDMLSDH, vqdmlsdh)
129
+DO_2OP(VQDMLSDHX, vqdmlsdhx)
130
+DO_2OP(VQRDMLSDH, vqrdmlsdh)
131
+DO_2OP(VQRDMLSDHX, vqrdmlsdhx)
132
133
static bool do_2op_scalar(DisasContext *s, arg_2scalar *a,
134
MVEGenTwoOpScalarFn fn)
135
--
19
--
136
2.20.1
20
2.34.1
137
138
1
Implement the MVE VQDMLADH and VQRDMLADH insns. These multiply
1
Armv8.1+ CPUs have the Virtual Host Extension (VHE) which adds a
2
elements, and then add pairs of products, double, possibly round,
2
non-secure EL2 virtual timer. We implemented the timer itself in the
3
saturate and return the high half of the result.
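
A hedged scalar sketch of one lane of the rounding form at halfword size, mirroring what the patch's do_vqdmladh_h computes (qrdmladh_h is a hypothetical name):

  #include <stdint.h>
  #include <stdbool.h>

  /* (a*b + c*d) * 2 + rounding constant, saturated to 32 bits, high half returned */
  static int16_t qrdmladh_h(int16_t a, int16_t b, int16_t c, int16_t d, bool *sat)
  {
      int64_t r = ((int64_t)a * b + (int64_t)c * d) * 2 + (1 << 15);
      if (r > INT32_MAX) {
          *sat = true;
          r = INT32_MAX;
      } else if (r < INT32_MIN) {
          *sat = true;
          r = INT32_MIN;
      }
      return (int16_t)(r >> 16);
  }
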
3
CPU model, but never wired up its IRQ line to the GIC.
4
5
Wire up the IRQ line (this is always safe whether the CPU has the
6
interrupt or not, since it always creates the outbound IRQ line).
7
Report it to the guest via dtb and ACPI if the CPU has the feature.
8
9
The DTB binding is documented in the kernel's
10
Documentation/devicetree/bindings/timer/arm\,arch_timer.yaml
11
and the ACPI table entries are documented in the ACPI specification
12
version 6.3 or later.
13
14
Because the IRQ line ACPI binding is new in 6.3, we need to bump the
15
FADT table rev to show that we might be using 6.3 features.
16
17
Note that exposing this IRQ in the DTB will trigger a bug in EDK2
18
versions prior to edk2-stable202311, for users who use the virt board
19
with 'virtualization=on' to enable EL2 emulation and are booting an
20
EDK2 guest BIOS, if that EDK2 has assertions enabled. The effect is
21
that EDK2 will assert on bootup:
22
23
ASSERT [ArmTimerDxe] /home/kraxel/projects/qemu/roms/edk2/ArmVirtPkg/Library/ArmVirtTimerFdtClientLib/ArmVirtTimerFdtClientLib.c(72): PropSize == 36 || PropSize == 48
24
25
If you see that assertion you should do one of:
26
* update your EDK2 binaries to edk2-stable202311 or newer
27
* use the 'virt-8.2' versioned machine type
28
* not use 'virtualization=on'
29
30
(The versions shipped with QEMU itself have the fix.)
4
31
5
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
32
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
6
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
33
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
7
Message-id: 20210617121628.20116-37-peter.maydell@linaro.org
34
Message-id: 20240122143537.233498-3-peter.maydell@linaro.org
8
---
35
---
9
target/arm/helper-mve.h | 16 +++++++
36
include/hw/arm/virt.h | 2 ++
10
target/arm/mve.decode | 5 +++
37
hw/arm/virt-acpi-build.c | 20 ++++++++++----
11
target/arm/mve_helper.c | 89 ++++++++++++++++++++++++++++++++++++++
38
hw/arm/virt.c | 60 ++++++++++++++++++++++++++++++++++------
12
target/arm/translate-mve.c | 4 ++
39
3 files changed, 67 insertions(+), 15 deletions(-)
13
4 files changed, 114 insertions(+)
40
14
41
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
15
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
16
index XXXXXXX..XXXXXXX 100644
42
index XXXXXXX..XXXXXXX 100644
17
--- a/target/arm/helper-mve.h
43
--- a/include/hw/arm/virt.h
18
+++ b/target/arm/helper-mve.h
44
+++ b/include/hw/arm/virt.h
19
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vqrshlub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
45
@@ -XXX,XX +XXX,XX @@ struct VirtMachineClass {
20
DEF_HELPER_FLAGS_4(mve_vqrshluh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
46
/* Machines < 6.2 have no support for describing cpu topology to guest */
21
DEF_HELPER_FLAGS_4(mve_vqrshluw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
47
bool no_cpu_topology;
22
48
bool no_tcg_lpa2;
23
+DEF_HELPER_FLAGS_4(mve_vqdmladhb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
49
+ bool no_ns_el2_virt_timer_irq;
24
+DEF_HELPER_FLAGS_4(mve_vqdmladhh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
50
};
25
+DEF_HELPER_FLAGS_4(mve_vqdmladhw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
51
26
+
52
struct VirtMachineState {
27
+DEF_HELPER_FLAGS_4(mve_vqdmladhxb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
53
@@ -XXX,XX +XXX,XX @@ struct VirtMachineState {
28
+DEF_HELPER_FLAGS_4(mve_vqdmladhxh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
54
PCIBus *bus;
29
+DEF_HELPER_FLAGS_4(mve_vqdmladhxw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
55
char *oem_id;
30
+
56
char *oem_table_id;
31
+DEF_HELPER_FLAGS_4(mve_vqrdmladhb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
57
+ bool ns_el2_virt_timer_irq;
32
+DEF_HELPER_FLAGS_4(mve_vqrdmladhh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
58
};
33
+DEF_HELPER_FLAGS_4(mve_vqrdmladhw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
59
34
+
60
#define VIRT_ECAM_ID(high) (high ? VIRT_HIGH_PCIE_ECAM : VIRT_PCIE_ECAM)
35
+DEF_HELPER_FLAGS_4(mve_vqrdmladhxb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
61
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
36
+DEF_HELPER_FLAGS_4(mve_vqrdmladhxh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
37
+DEF_HELPER_FLAGS_4(mve_vqrdmladhxw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
38
+
39
DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
40
DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
41
DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
42
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
43
index XXXXXXX..XXXXXXX 100644
62
index XXXXXXX..XXXXXXX 100644
44
--- a/target/arm/mve.decode
63
--- a/hw/arm/virt-acpi-build.c
45
+++ b/target/arm/mve.decode
64
+++ b/hw/arm/virt-acpi-build.c
46
@@ -XXX,XX +XXX,XX @@ VQSHL_U 111 1 1111 0 . .. ... 0 ... 0 0100 . 1 . 1 ... 0 @2op_rev
65
@@ -XXX,XX +XXX,XX @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
47
VQRSHL_S 111 0 1111 0 . .. ... 0 ... 0 0101 . 1 . 1 ... 0 @2op_rev
66
}
48
VQRSHL_U 111 1 1111 0 . .. ... 0 ... 0 0101 . 1 . 1 ... 0 @2op_rev
67
49
68
/*
50
+VQDMLADH 1110 1110 0 . .. ... 0 ... 0 1110 . 0 . 0 ... 0 @2op
69
- * ACPI spec, Revision 5.1
51
+VQDMLADHX 1110 1110 0 . .. ... 0 ... 1 1110 . 0 . 0 ... 0 @2op
70
- * 5.2.24 Generic Timer Description Table (GTDT)
52
+VQRDMLADH 1110 1110 0 . .. ... 0 ... 0 1110 . 0 . 0 ... 1 @2op
71
+ * ACPI spec, Revision 6.5
53
+VQRDMLADHX 1110 1110 0 . .. ... 0 ... 1 1110 . 0 . 0 ... 1 @2op
72
+ * 5.2.25 Generic Timer Description Table (GTDT)
54
+
73
*/
55
# Vector miscellaneous
74
static void
56
75
build_gtdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
57
VCLS 1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
76
@@ -XXX,XX +XXX,XX @@ build_gtdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
58
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
77
uint32_t irqflags = vmc->claim_edge_triggered_timers ?
78
1 : /* Interrupt is Edge triggered */
79
0; /* Interrupt is Level triggered */
80
- AcpiTable table = { .sig = "GTDT", .rev = 2, .oem_id = vms->oem_id,
81
+ AcpiTable table = { .sig = "GTDT", .rev = 3, .oem_id = vms->oem_id,
82
.oem_table_id = vms->oem_table_id };
83
84
acpi_table_begin(&table, table_data);
85
@@ -XXX,XX +XXX,XX @@ build_gtdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
86
build_append_int_noprefix(table_data, 0, 4);
87
/* Platform Timer Offset */
88
build_append_int_noprefix(table_data, 0, 4);
89
-
90
+ if (vms->ns_el2_virt_timer_irq) {
91
+ /* Virtual EL2 Timer GSIV */
92
+ build_append_int_noprefix(table_data, ARCH_TIMER_NS_EL2_VIRT_IRQ, 4);
93
+ /* Virtual EL2 Timer Flags */
94
+ build_append_int_noprefix(table_data, irqflags, 4);
95
+ } else {
96
+ build_append_int_noprefix(table_data, 0, 4);
97
+ build_append_int_noprefix(table_data, 0, 4);
98
+ }
99
acpi_table_end(linker, &table);
100
}
101
102
@@ -XXX,XX +XXX,XX @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
103
static void build_fadt_rev6(GArray *table_data, BIOSLinker *linker,
104
VirtMachineState *vms, unsigned dsdt_tbl_offset)
105
{
106
- /* ACPI v6.0 */
107
+ /* ACPI v6.3 */
108
AcpiFadtData fadt = {
109
.rev = 6,
110
- .minor_ver = 0,
111
+ .minor_ver = 3,
112
.flags = 1 << ACPI_FADT_F_HW_REDUCED_ACPI,
113
.xdsdt_tbl_offset = &dsdt_tbl_offset,
114
};
115
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
59
index XXXXXXX..XXXXXXX 100644
116
index XXXXXXX..XXXXXXX 100644
60
--- a/target/arm/mve_helper.c
117
--- a/hw/arm/virt.c
61
+++ b/target/arm/mve_helper.c
118
+++ b/hw/arm/virt.c
62
@@ -XXX,XX +XXX,XX @@ DO_2OP_SAT_U(vqshlu, DO_UQSHL_OP)
119
@@ -XXX,XX +XXX,XX @@ static void create_randomness(MachineState *ms, const char *node)
63
DO_2OP_SAT_S(vqrshls, DO_SQRSHL_OP)
120
qemu_fdt_setprop(ms->fdt, node, "rng-seed", seed.rng, sizeof(seed.rng));
64
DO_2OP_SAT_U(vqrshlu, DO_UQRSHL_OP)
121
}
65
122
66
+/*
123
+/*
67
+ * Multiply add dual returning high half
124
+ * The CPU object always exposes the NS EL2 virt timer IRQ line,
68
+ * The 'FN' here takes four inputs A, B, C, D, a 0/1 indicator of
125
+ * but we don't want to advertise it to the guest in the dtb or ACPI
69
+ * whether to add the rounding constant, and the pointer to the
126
+ * table unless it's really going to do something.
70
+ * saturation flag, and should do "(A * B + C * D) * 2 + rounding constant",
71
+ * saturate to twice the input size and return the high half; or
72
+ * (A * B - C * D) etc for VQDMLSDH.
73
+ */
127
+ */
74
+#define DO_VQDMLADH_OP(OP, ESIZE, TYPE, XCHG, ROUND, FN) \
128
+static bool ns_el2_virt_timer_present(void)
75
+ void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn, \
129
+{
76
+ void *vm) \
130
+ ARMCPU *cpu = ARM_CPU(qemu_get_cpu(0));
77
+ { \
131
+ CPUARMState *env = &cpu->env;
78
+ TYPE *d = vd, *n = vn, *m = vm; \
132
+
79
+ uint16_t mask = mve_element_mask(env); \
133
+ return arm_feature(env, ARM_FEATURE_AARCH64) &&
80
+ unsigned e; \
134
+ arm_feature(env, ARM_FEATURE_EL2) && cpu_isar_feature(aa64_vh, cpu);
81
+ bool qc = false; \
135
+}
82
+ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
136
+
83
+ bool sat = false; \
137
static void create_fdt(VirtMachineState *vms)
84
+ if ((e & 1) == XCHG) { \
138
{
85
+ TYPE r = FN(n[H##ESIZE(e)], \
139
MachineState *ms = MACHINE(vms);
86
+ m[H##ESIZE(e - XCHG)], \
140
@@ -XXX,XX +XXX,XX @@ static void fdt_add_timer_nodes(const VirtMachineState *vms)
87
+ n[H##ESIZE(e + (1 - 2 * XCHG))], \
141
"arm,armv7-timer");
88
+ m[H##ESIZE(e + (1 - XCHG))], \
142
}
89
+ ROUND, &sat); \
143
qemu_fdt_setprop(ms->fdt, "/timer", "always-on", NULL, 0);
90
+ mergemask(&d[H##ESIZE(e)], r, mask); \
144
- qemu_fdt_setprop_cells(ms->fdt, "/timer", "interrupts",
91
+ qc |= sat & mask & 1; \
145
- GIC_FDT_IRQ_TYPE_PPI,
92
+ } \
146
- INTID_TO_PPI(ARCH_TIMER_S_EL1_IRQ), irqflags,
93
+ } \
147
- GIC_FDT_IRQ_TYPE_PPI,
94
+ if (qc) { \
148
- INTID_TO_PPI(ARCH_TIMER_NS_EL1_IRQ), irqflags,
95
+ env->vfp.qc[0] = qc; \
149
- GIC_FDT_IRQ_TYPE_PPI,
96
+ } \
150
- INTID_TO_PPI(ARCH_TIMER_VIRT_IRQ), irqflags,
97
+ mve_advance_vpt(env); \
151
- GIC_FDT_IRQ_TYPE_PPI,
152
- INTID_TO_PPI(ARCH_TIMER_NS_EL2_IRQ), irqflags);
153
+ if (vms->ns_el2_virt_timer_irq) {
154
+ qemu_fdt_setprop_cells(ms->fdt, "/timer", "interrupts",
155
+ GIC_FDT_IRQ_TYPE_PPI,
156
+ INTID_TO_PPI(ARCH_TIMER_S_EL1_IRQ), irqflags,
157
+ GIC_FDT_IRQ_TYPE_PPI,
158
+ INTID_TO_PPI(ARCH_TIMER_NS_EL1_IRQ), irqflags,
159
+ GIC_FDT_IRQ_TYPE_PPI,
160
+ INTID_TO_PPI(ARCH_TIMER_VIRT_IRQ), irqflags,
161
+ GIC_FDT_IRQ_TYPE_PPI,
162
+ INTID_TO_PPI(ARCH_TIMER_NS_EL2_IRQ), irqflags,
163
+ GIC_FDT_IRQ_TYPE_PPI,
164
+ INTID_TO_PPI(ARCH_TIMER_NS_EL2_VIRT_IRQ), irqflags);
165
+ } else {
166
+ qemu_fdt_setprop_cells(ms->fdt, "/timer", "interrupts",
167
+ GIC_FDT_IRQ_TYPE_PPI,
168
+ INTID_TO_PPI(ARCH_TIMER_S_EL1_IRQ), irqflags,
169
+ GIC_FDT_IRQ_TYPE_PPI,
170
+ INTID_TO_PPI(ARCH_TIMER_NS_EL1_IRQ), irqflags,
171
+ GIC_FDT_IRQ_TYPE_PPI,
172
+ INTID_TO_PPI(ARCH_TIMER_VIRT_IRQ), irqflags,
173
+ GIC_FDT_IRQ_TYPE_PPI,
174
+ INTID_TO_PPI(ARCH_TIMER_NS_EL2_IRQ), irqflags);
98
+ }
175
+ }
99
+
176
}
100
+static int8_t do_vqdmladh_b(int8_t a, int8_t b, int8_t c, int8_t d,
177
101
+ int round, bool *sat)
178
static void fdt_add_cpu_nodes(const VirtMachineState *vms)
102
+{
179
@@ -XXX,XX +XXX,XX @@ static void create_gic(VirtMachineState *vms, MemoryRegion *mem)
103
+ int64_t r = ((int64_t)a * b + (int64_t)c * d) * 2 + (round << 7);
180
[GTIMER_VIRT] = ARCH_TIMER_VIRT_IRQ,
104
+ return do_sat_bhw(r, INT16_MIN, INT16_MAX, sat) >> 8;
181
[GTIMER_HYP] = ARCH_TIMER_NS_EL2_IRQ,
105
+}
182
[GTIMER_SEC] = ARCH_TIMER_S_EL1_IRQ,
106
+
183
+ [GTIMER_HYPVIRT] = ARCH_TIMER_NS_EL2_VIRT_IRQ,
107
+static int16_t do_vqdmladh_h(int16_t a, int16_t b, int16_t c, int16_t d,
184
};
108
+ int round, bool *sat)
185
109
+{
186
for (unsigned irq = 0; irq < ARRAY_SIZE(timer_irq); irq++) {
110
+ int64_t r = ((int64_t)a * b + (int64_t)c * d) * 2 + (round << 15);
187
@@ -XXX,XX +XXX,XX @@ static void machvirt_init(MachineState *machine)
111
+ return do_sat_bhw(r, INT32_MIN, INT32_MAX, sat) >> 16;
188
qdev_realize(DEVICE(cpuobj), NULL, &error_fatal);
112
+}
189
object_unref(cpuobj);
113
+
190
}
114
+static int32_t do_vqdmladh_w(int32_t a, int32_t b, int32_t c, int32_t d,
191
+
115
+ int round, bool *sat)
192
+ /* Now we've created the CPUs we can see if they have the hypvirt timer */
116
+{
193
+ vms->ns_el2_virt_timer_irq = ns_el2_virt_timer_present() &&
117
+ int64_t m1 = (int64_t)a * b;
194
+ !vmc->no_ns_el2_virt_timer_irq;
118
+ int64_t m2 = (int64_t)c * d;
195
+
119
+ int64_t r;
196
fdt_add_timer_nodes(vms);
197
fdt_add_cpu_nodes(vms);
198
199
@@ -XXX,XX +XXX,XX @@ DEFINE_VIRT_MACHINE_AS_LATEST(9, 0)
200
201
static void virt_machine_8_2_options(MachineClass *mc)
202
{
203
+ VirtMachineClass *vmc = VIRT_MACHINE_CLASS(OBJECT_CLASS(mc));
204
+
205
virt_machine_9_0_options(mc);
206
compat_props_add(mc->compat_props, hw_compat_8_2, hw_compat_8_2_len);
120
+ /*
207
+ /*
121
+ * Architecturally we should do the entire add, double, round
208
+ * Don't expose NS_EL2_VIRT timer IRQ in DTB or ACPI on 8.2 and
122
+ * and then check for saturation. We do three saturating adds,
209
+ * earlier machines. (Exposing it tickles a bug in older EDK2
123
+ * but we need to be careful about the order. If the first
210
+ * guest BIOS binaries.)
124
+ * m1 + m2 saturates then it's impossible for the *2+rc to
125
+ * bring it back into the non-saturated range. However, if
126
+ * m1 + m2 is negative then it's possible that doing the doubling
127
+ * would take the intermediate result below INT64_MIN and the
128
+ * addition of the rounding constant then brings it back in range.
129
+ * So we add half the rounding constant before doubling rather
130
+ * than adding the rounding constant after the doubling.
131
+ */
211
+ */
132
+ if (sadd64_overflow(m1, m2, &r) ||
212
+ vmc->no_ns_el2_virt_timer_irq = true;
133
+ sadd64_overflow(r, (round << 30), &r) ||
213
}
134
+ sadd64_overflow(r, r, &r)) {
214
DEFINE_VIRT_MACHINE(8, 2)
135
+ *sat = true;
215
136
+ return r < 0 ? INT32_MAX : INT32_MIN;
137
+ }
138
+ return r >> 32;
139
+}
140
+
141
+DO_VQDMLADH_OP(vqdmladhb, 1, int8_t, 0, 0, do_vqdmladh_b)
142
+DO_VQDMLADH_OP(vqdmladhh, 2, int16_t, 0, 0, do_vqdmladh_h)
143
+DO_VQDMLADH_OP(vqdmladhw, 4, int32_t, 0, 0, do_vqdmladh_w)
144
+DO_VQDMLADH_OP(vqdmladhxb, 1, int8_t, 1, 0, do_vqdmladh_b)
145
+DO_VQDMLADH_OP(vqdmladhxh, 2, int16_t, 1, 0, do_vqdmladh_h)
146
+DO_VQDMLADH_OP(vqdmladhxw, 4, int32_t, 1, 0, do_vqdmladh_w)
147
+
148
+DO_VQDMLADH_OP(vqrdmladhb, 1, int8_t, 0, 1, do_vqdmladh_b)
149
+DO_VQDMLADH_OP(vqrdmladhh, 2, int16_t, 0, 1, do_vqdmladh_h)
150
+DO_VQDMLADH_OP(vqrdmladhw, 4, int32_t, 0, 1, do_vqdmladh_w)
151
+DO_VQDMLADH_OP(vqrdmladhxb, 1, int8_t, 1, 1, do_vqdmladh_b)
152
+DO_VQDMLADH_OP(vqrdmladhxh, 2, int16_t, 1, 1, do_vqdmladh_h)
153
+DO_VQDMLADH_OP(vqrdmladhxw, 4, int32_t, 1, 1, do_vqdmladh_w)
154
+
155
#define DO_2OP_SCALAR(OP, ESIZE, TYPE, FN) \
156
void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn, \
157
uint32_t rm) \
158
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
159
index XXXXXXX..XXXXXXX 100644
160
--- a/target/arm/translate-mve.c
161
+++ b/target/arm/translate-mve.c
162
@@ -XXX,XX +XXX,XX @@ DO_2OP(VQSHL_S, vqshls)
163
DO_2OP(VQSHL_U, vqshlu)
164
DO_2OP(VQRSHL_S, vqrshls)
165
DO_2OP(VQRSHL_U, vqrshlu)
166
+DO_2OP(VQDMLADH, vqdmladh)
167
+DO_2OP(VQDMLADHX, vqdmladhx)
168
+DO_2OP(VQRDMLADH, vqrdmladh)
169
+DO_2OP(VQRDMLADHX, vqrdmladhx)
170
171
static bool do_2op_scalar(DisasContext *s, arg_2scalar *a,
172
MVEGenTwoOpScalarFn fn)
173
--
216
--
174
2.20.1
217
2.34.1
175
176
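Note (illustration only, not part of either patch above): the ordering concern described in the do_vqdmladh_w comment can be reproduced with a small standalone program. The operand values are made up to land just inside the problematic range, and __builtin_add_overflow stands in for QEMU's sadd64_overflow; this sketches the round == 1 (VQRDMLADH) word case, where the rounding constant is 1 << 31.

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

int main(void)
{
    int64_t m1 = -(1LL << 62);  /* stand-in for the product a * b */
    int64_t m2 = -(1LL << 20);  /* stand-in for the product c * d */
    int64_t rc = 1LL << 31;     /* rounding constant for the word case */
    int64_t r;

    /* naive order: (m1 + m2) * 2 already overflows int64_t ... */
    bool naive_sat = __builtin_add_overflow(m1, m2, &r) ||
                     __builtin_add_overflow(r, r, &r) ||
                     __builtin_add_overflow(r, rc, &r);

    /* ... but adding half the rounding constant first keeps it in range */
    bool fixed_sat = __builtin_add_overflow(m1, m2, &r) ||
                     __builtin_add_overflow(r, rc / 2, &r) ||
                     __builtin_add_overflow(r, r, &r);

    /* prints "naive saturates: 1, reordered saturates: 0" */
    printf("naive saturates: %d, reordered saturates: %d\n",
           naive_sat, fixed_sat);
    return 0;
}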
1
Implement the MVE VRSHL insn (vector form).
1
Update the virt golden reference files to say that the FACP is ACPI
2
v6.3, and the GTDT table is a revision 3 table with space for the
3
virtual EL2 timer.
4
5
Diffs from iasl:
6
7
@@ -XXX,XX +XXX,XX @@
8
/*
9
* Intel ACPI Component Architecture
10
* AML/ASL+ Disassembler version 20200925 (64-bit version)
11
* Copyright (c) 2000 - 2020 Intel Corporation
12
*
13
- * Disassembly of tests/data/acpi/virt/FACP, Mon Jan 22 13:48:40 2024
14
+ * Disassembly of /tmp/aml-W8RZH2, Mon Jan 22 13:48:40 2024
15
*
16
* ACPI Data Table [FACP]
17
*
18
* Format: [HexOffset DecimalOffset ByteLength] FieldName : FieldValue
19
*/
20
21
[000h 0000 4] Signature : "FACP" [Fixed ACPI Description Table (FADT)]
22
[004h 0004 4] Table Length : 00000114
23
[008h 0008 1] Revision : 06
24
-[009h 0009 1] Checksum : 15
25
+[009h 0009 1] Checksum : 12
26
[00Ah 0010 6] Oem ID : "BOCHS "
27
[010h 0016 8] Oem Table ID : "BXPC "
28
[018h 0024 4] Oem Revision : 00000001
29
[01Ch 0028 4] Asl Compiler ID : "BXPC"
30
[020h 0032 4] Asl Compiler Revision : 00000001
31
32
[024h 0036 4] FACS Address : 00000000
33
[028h 0040 4] DSDT Address : 00000000
34
[02Ch 0044 1] Model : 00
35
[02Dh 0045 1] PM Profile : 00 [Unspecified]
36
[02Eh 0046 2] SCI Interrupt : 0000
37
[030h 0048 4] SMI Command Port : 00000000
38
[034h 0052 1] ACPI Enable Value : 00
39
[035h 0053 1] ACPI Disable Value : 00
40
[036h 0054 1] S4BIOS Command : 00
41
[037h 0055 1] P-State Control : 00
42
@@ -XXX,XX +XXX,XX @@
43
Use APIC Physical Destination Mode (V4) : 0
44
Hardware Reduced (V5) : 1
45
Low Power S0 Idle (V5) : 0
46
47
[074h 0116 12] Reset Register : [Generic Address Structure]
48
[074h 0116 1] Space ID : 00 [SystemMemory]
49
[075h 0117 1] Bit Width : 00
50
[076h 0118 1] Bit Offset : 00
51
[077h 0119 1] Encoded Access Width : 00 [Undefined/Legacy]
52
[078h 0120 8] Address : 0000000000000000
53
54
[080h 0128 1] Value to cause reset : 00
55
[081h 0129 2] ARM Flags (decoded below) : 0003
56
PSCI Compliant : 1
57
Must use HVC for PSCI : 1
58
59
-[083h 0131 1] FADT Minor Revision : 00
60
+[083h 0131 1] FADT Minor Revision : 03
61
[084h 0132 8] FACS Address : 0000000000000000
62
[08Ch 0140 8] DSDT Address : 0000000000000000
63
[094h 0148 12] PM1A Event Block : [Generic Address Structure]
64
[094h 0148 1] Space ID : 00 [SystemMemory]
65
[095h 0149 1] Bit Width : 00
66
[096h 0150 1] Bit Offset : 00
67
[097h 0151 1] Encoded Access Width : 00 [Undefined/Legacy]
68
[098h 0152 8] Address : 0000000000000000
69
70
[0A0h 0160 12] PM1B Event Block : [Generic Address Structure]
71
[0A0h 0160 1] Space ID : 00 [SystemMemory]
72
[0A1h 0161 1] Bit Width : 00
73
[0A2h 0162 1] Bit Offset : 00
74
[0A3h 0163 1] Encoded Access Width : 00 [Undefined/Legacy]
75
[0A4h 0164 8] Address : 0000000000000000
76
77
@@ -XXX,XX +XXX,XX @@
78
[0F5h 0245 1] Bit Width : 00
79
[0F6h 0246 1] Bit Offset : 00
80
[0F7h 0247 1] Encoded Access Width : 00 [Undefined/Legacy]
81
[0F8h 0248 8] Address : 0000000000000000
82
83
[100h 0256 12] Sleep Status Register : [Generic Address Structure]
84
[100h 0256 1] Space ID : 00 [SystemMemory]
85
[101h 0257 1] Bit Width : 00
86
[102h 0258 1] Bit Offset : 00
87
[103h 0259 1] Encoded Access Width : 00 [Undefined/Legacy]
88
[104h 0260 8] Address : 0000000000000000
89
90
[10Ch 0268 8] Hypervisor ID : 00000000554D4551
91
92
Raw Table Data: Length 276 (0x114)
93
94
- 0000: 46 41 43 50 14 01 00 00 06 15 42 4F 43 48 53 20 // FACP......BOCHS
95
+ 0000: 46 41 43 50 14 01 00 00 06 12 42 4F 43 48 53 20 // FACP......BOCHS
96
0010: 42 58 50 43 20 20 20 20 01 00 00 00 42 58 50 43 // BXPC ....BXPC
97
0020: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 // ................
98
0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 // ................
99
0040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 // ................
100
0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 // ................
101
0060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 // ................
102
0070: 00 00 10 00 00 00 00 00 00 00 00 00 00 00 00 00 // ................
103
- 0080: 00 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 // ................
104
+ 0080: 00 03 00 03 00 00 00 00 00 00 00 00 00 00 00 00 // ................
105
0090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 // ................
106
00A0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 // ................
107
00B0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 // ................
108
00C0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 // ................
109
00D0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 // ................
110
00E0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 // ................
111
00F0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 // ................
112
0100: 00 00 00 00 00 00 00 00 00 00 00 00 51 45 4D 55 // ............QEMU
113
0110: 00 00 00 00 // ....
114
115
@@ -XXX,XX +XXX,XX @@
116
/*
117
* Intel ACPI Component Architecture
118
* AML/ASL+ Disassembler version 20200925 (64-bit version)
119
* Copyright (c) 2000 - 2020 Intel Corporation
120
*
121
- * Disassembly of tests/data/acpi/virt/GTDT, Mon Jan 22 13:48:40 2024
122
+ * Disassembly of /tmp/aml-XDSZH2, Mon Jan 22 13:48:40 2024
123
*
124
* ACPI Data Table [GTDT]
125
*
126
* Format: [HexOffset DecimalOffset ByteLength] FieldName : FieldValue
127
*/
128
129
[000h 0000 4] Signature : "GTDT" [Generic Timer Description Table]
130
-[004h 0004 4] Table Length : 00000060
131
-[008h 0008 1] Revision : 02
132
-[009h 0009 1] Checksum : 9C
133
+[004h 0004 4] Table Length : 00000068
134
+[008h 0008 1] Revision : 03
135
+[009h 0009 1] Checksum : 93
136
[00Ah 0010 6] Oem ID : "BOCHS "
137
[010h 0016 8] Oem Table ID : "BXPC "
138
[018h 0024 4] Oem Revision : 00000001
139
[01Ch 0028 4] Asl Compiler ID : "BXPC"
140
[020h 0032 4] Asl Compiler Revision : 00000001
141
142
[024h 0036 8] Counter Block Address : FFFFFFFFFFFFFFFF
143
[02Ch 0044 4] Reserved : 00000000
144
145
[030h 0048 4] Secure EL1 Interrupt : 0000001D
146
[034h 0052 4] EL1 Flags (decoded below) : 00000000
147
Trigger Mode : 0
148
Polarity : 0
149
Always On : 0
150
151
[038h 0056 4] Non-Secure EL1 Interrupt : 0000001E
152
@@ -XXX,XX +XXX,XX @@
153
154
[040h 0064 4] Virtual Timer Interrupt : 0000001B
155
[044h 0068 4] VT Flags (decoded below) : 00000000
156
Trigger Mode : 0
157
Polarity : 0
158
Always On : 0
159
160
[048h 0072 4] Non-Secure EL2 Interrupt : 0000001A
161
[04Ch 0076 4] NEL2 Flags (decoded below) : 00000000
162
Trigger Mode : 0
163
Polarity : 0
164
Always On : 0
165
[050h 0080 8] Counter Read Block Address : FFFFFFFFFFFFFFFF
166
167
[058h 0088 4] Platform Timer Count : 00000000
168
[05Ch 0092 4] Platform Timer Offset : 00000000
169
+[060h 0096 4] Virtual EL2 Timer GSIV : 00000000
170
+[064h 0100 4] Virtual EL2 Timer Flags : 00000000
171
172
-Raw Table Data: Length 96 (0x60)
173
+Raw Table Data: Length 104 (0x68)
174
175
- 0000: 47 54 44 54 60 00 00 00 02 9C 42 4F 43 48 53 20 // GTDT`.....BOCHS
176
+ 0000: 47 54 44 54 68 00 00 00 03 93 42 4F 43 48 53 20 // GTDTh.....BOCHS
177
0010: 42 58 50 43 20 20 20 20 01 00 00 00 42 58 50 43 // BXPC ....BXPC
178
0020: 01 00 00 00 FF FF FF FF FF FF FF FF 00 00 00 00 // ................
179
0030: 1D 00 00 00 00 00 00 00 1E 00 00 00 04 00 00 00 // ................
180
0040: 1B 00 00 00 00 00 00 00 1A 00 00 00 00 00 00 00 // ................
181
0050: FF FF FF FF FF FF FF FF 00 00 00 00 00 00 00 00 // ................
182
+ 0060: 00 00 00 00 00 00 00 00 // ........
2
183
3
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
184
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
4
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
185
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
5
Message-id: 20210617121628.20116-36-peter.maydell@linaro.org
186
Message-id: 20240122143537.233498-4-peter.maydell@linaro.org
6
---
187
---
7
target/arm/helper-mve.h | 8 ++++++++
188
tests/qtest/bios-tables-test-allowed-diff.h | 2 --
8
target/arm/mve.decode | 3 +++
189
tests/data/acpi/virt/FACP | Bin 276 -> 276 bytes
9
target/arm/mve_helper.c | 4 ++++
190
tests/data/acpi/virt/GTDT | Bin 96 -> 104 bytes
10
target/arm/translate-mve.c | 2 ++
191
3 files changed, 2 deletions(-)
11
4 files changed, 17 insertions(+)
192
12
193
diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
13
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
14
index XXXXXXX..XXXXXXX 100644
194
index XXXXXXX..XXXXXXX 100644
15
--- a/target/arm/helper-mve.h
195
--- a/tests/qtest/bios-tables-test-allowed-diff.h
16
+++ b/target/arm/helper-mve.h
196
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
17
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vshlub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
197
@@ -1,3 +1 @@
18
DEF_HELPER_FLAGS_4(mve_vshluh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
198
/* List of comma-separated changed AML files to ignore */
19
DEF_HELPER_FLAGS_4(mve_vshluw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
199
-"tests/data/acpi/virt/FACP",
20
200
-"tests/data/acpi/virt/GTDT",
21
+DEF_HELPER_FLAGS_4(mve_vrshlsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
201
diff --git a/tests/data/acpi/virt/FACP b/tests/data/acpi/virt/FACP
22
+DEF_HELPER_FLAGS_4(mve_vrshlsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
23
+DEF_HELPER_FLAGS_4(mve_vrshlsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
24
+
25
+DEF_HELPER_FLAGS_4(mve_vrshlub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
26
+DEF_HELPER_FLAGS_4(mve_vrshluh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
27
+DEF_HELPER_FLAGS_4(mve_vrshluw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
28
+
29
DEF_HELPER_FLAGS_4(mve_vqshlsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
30
DEF_HELPER_FLAGS_4(mve_vqshlsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
31
DEF_HELPER_FLAGS_4(mve_vqshlsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
32
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
33
index XXXXXXX..XXXXXXX 100644
202
index XXXXXXX..XXXXXXX 100644
34
--- a/target/arm/mve.decode
203
GIT binary patch
35
+++ b/target/arm/mve.decode
204
delta 25
36
@@ -XXX,XX +XXX,XX @@ VQSUB_U 111 1 1111 0 . .. ... 0 ... 0 0010 . 1 . 1 ... 0 @2op
205
gcmbQjG=+)F&CxkPgpq-PO=u!l<;2F$$vli407<0<)c^nh
37
VSHL_S 111 0 1111 0 . .. ... 0 ... 0 0100 . 1 . 0 ... 0 @2op_rev
206
38
VSHL_U 111 1 1111 0 . .. ... 0 ... 0 0100 . 1 . 0 ... 0 @2op_rev
207
delta 28
39
208
kcmbQjG=+)F&CxkPgpq-PO>`nx<-|!<6Akz$^DuG%0AAS!ssI20
40
+VRSHL_S 111 0 1111 0 . .. ... 0 ... 0 0101 . 1 . 0 ... 0 @2op_rev
209
41
+VRSHL_U 111 1 1111 0 . .. ... 0 ... 0 0101 . 1 . 0 ... 0 @2op_rev
210
diff --git a/tests/data/acpi/virt/GTDT b/tests/data/acpi/virt/GTDT
42
+
43
VQSHL_S 111 0 1111 0 . .. ... 0 ... 0 0100 . 1 . 1 ... 0 @2op_rev
44
VQSHL_U 111 1 1111 0 . .. ... 0 ... 0 0100 . 1 . 1 ... 0 @2op_rev
45
46
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
47
index XXXXXXX..XXXXXXX 100644
211
index XXXXXXX..XXXXXXX 100644
48
--- a/target/arm/mve_helper.c
212
GIT binary patch
49
+++ b/target/arm/mve_helper.c
213
delta 25
50
@@ -XXX,XX +XXX,XX @@ DO_2OP_U(vhsubu, do_vhsub_u)
214
bcmYeu;BpUf3CUn!U|^m+kt>V?$N&QXMtB4L
51
215
52
#define DO_VSHLS(N, M) do_sqrshl_bhs(N, (int8_t)(M), sizeof(N) * 8, false, NULL)
216
delta 16
53
#define DO_VSHLU(N, M) do_uqrshl_bhs(N, (int8_t)(M), sizeof(N) * 8, false, NULL)
217
Xcmc~u;BpUf2}xjJU|^avkt+-UB60)u
54
+#define DO_VRSHLS(N, M) do_sqrshl_bhs(N, (int8_t)(M), sizeof(N) * 8, true, NULL)
218
55
+#define DO_VRSHLU(N, M) do_uqrshl_bhs(N, (int8_t)(M), sizeof(N) * 8, true, NULL)
56
57
DO_2OP_S(vshls, DO_VSHLS)
58
DO_2OP_U(vshlu, DO_VSHLU)
59
+DO_2OP_S(vrshls, DO_VRSHLS)
60
+DO_2OP_U(vrshlu, DO_VRSHLU)
61
62
static inline int32_t do_sat_bhw(int64_t val, int64_t min, int64_t max, bool *s)
63
{
64
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
65
index XXXXXXX..XXXXXXX 100644
66
--- a/target/arm/translate-mve.c
67
+++ b/target/arm/translate-mve.c
68
@@ -XXX,XX +XXX,XX @@ DO_2OP(VQSUB_S, vqsubs)
69
DO_2OP(VQSUB_U, vqsubu)
70
DO_2OP(VSHL_S, vshls)
71
DO_2OP(VSHL_U, vshlu)
72
+DO_2OP(VRSHL_S, vrshls)
73
+DO_2OP(VRSHL_U, vrshlu)
74
DO_2OP(VQSHL_S, vqshls)
75
DO_2OP(VQSHL_U, vqshlu)
76
DO_2OP(VQRSHL_S, vqrshls)
77
--
219
--
78
2.20.1
220
2.34.1
79
80
1
Implement the MVE VSHL insn (vector form).
1
The patchset adding the GMAC ethernet to this SoC crossed in the
2
mail with the patchset cleaning up the NIC handling. When we
3
create the GMAC modules we must call qemu_configure_nic_device()
4
so that the user has the opportunity to use the -nic commandline
5
option to create a network backend and connect it to the GMACs.
2
6
7
Add the missing call.
8
9
Fixes: 21e5326a7c ("hw/arm: Add GMAC devices to NPCM7XX SoC")
3
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
10
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
4
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
11
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
5
Message-id: 20210617121628.20116-35-peter.maydell@linaro.org
12
Message-id: 20240206171231.396392-2-peter.maydell@linaro.org
6
---
13
---
7
target/arm/helper-mve.h | 8 ++++++++
14
hw/arm/npcm7xx.c | 1 +
8
target/arm/mve.decode | 3 +++
15
1 file changed, 1 insertion(+)
9
target/arm/mve_helper.c | 6 ++++++
10
target/arm/translate-mve.c | 2 ++
11
4 files changed, 19 insertions(+)
12
16
13
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
17
diff --git a/hw/arm/npcm7xx.c b/hw/arm/npcm7xx.c
14
index XXXXXXX..XXXXXXX 100644
18
index XXXXXXX..XXXXXXX 100644
15
--- a/target/arm/helper-mve.h
19
--- a/hw/arm/npcm7xx.c
16
+++ b/target/arm/helper-mve.h
20
+++ b/hw/arm/npcm7xx.c
17
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vqsubub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
21
@@ -XXX,XX +XXX,XX @@ static void npcm7xx_realize(DeviceState *dev, Error **errp)
18
DEF_HELPER_FLAGS_4(mve_vqsubuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
22
for (i = 0; i < ARRAY_SIZE(s->gmac); i++) {
19
DEF_HELPER_FLAGS_4(mve_vqsubuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
23
SysBusDevice *sbd = SYS_BUS_DEVICE(&s->gmac[i]);
20
24
21
+DEF_HELPER_FLAGS_4(mve_vshlsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
25
+ qemu_configure_nic_device(DEVICE(sbd), false, NULL);
22
+DEF_HELPER_FLAGS_4(mve_vshlsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
26
/*
23
+DEF_HELPER_FLAGS_4(mve_vshlsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
27
* The device exists regardless of whether it's connected to a QEMU
24
+
28
* netdev backend. So always instantiate it even if there is no
25
+DEF_HELPER_FLAGS_4(mve_vshlub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
26
+DEF_HELPER_FLAGS_4(mve_vshluh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
27
+DEF_HELPER_FLAGS_4(mve_vshluw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
28
+
29
DEF_HELPER_FLAGS_4(mve_vqshlsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
30
DEF_HELPER_FLAGS_4(mve_vqshlsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
31
DEF_HELPER_FLAGS_4(mve_vqshlsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
32
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
33
index XXXXXXX..XXXXXXX 100644
34
--- a/target/arm/mve.decode
35
+++ b/target/arm/mve.decode
36
@@ -XXX,XX +XXX,XX @@ VQADD_U 111 1 1111 0 . .. ... 0 ... 0 0000 . 1 . 1 ... 0 @2op
37
VQSUB_S 111 0 1111 0 . .. ... 0 ... 0 0010 . 1 . 1 ... 0 @2op
38
VQSUB_U 111 1 1111 0 . .. ... 0 ... 0 0010 . 1 . 1 ... 0 @2op
39
40
+VSHL_S 111 0 1111 0 . .. ... 0 ... 0 0100 . 1 . 0 ... 0 @2op_rev
41
+VSHL_U 111 1 1111 0 . .. ... 0 ... 0 0100 . 1 . 0 ... 0 @2op_rev
42
+
43
VQSHL_S 111 0 1111 0 . .. ... 0 ... 0 0100 . 1 . 1 ... 0 @2op_rev
44
VQSHL_U 111 1 1111 0 . .. ... 0 ... 0 0100 . 1 . 1 ... 0 @2op_rev
45
46
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
47
index XXXXXXX..XXXXXXX 100644
48
--- a/target/arm/mve_helper.c
49
+++ b/target/arm/mve_helper.c
50
@@ -XXX,XX +XXX,XX @@ DO_2OP_U(vhaddu, do_vhadd_u)
51
DO_2OP_S(vhsubs, do_vhsub_s)
52
DO_2OP_U(vhsubu, do_vhsub_u)
53
54
+#define DO_VSHLS(N, M) do_sqrshl_bhs(N, (int8_t)(M), sizeof(N) * 8, false, NULL)
55
+#define DO_VSHLU(N, M) do_uqrshl_bhs(N, (int8_t)(M), sizeof(N) * 8, false, NULL)
56
+
57
+DO_2OP_S(vshls, DO_VSHLS)
58
+DO_2OP_U(vshlu, DO_VSHLU)
59
+
60
static inline int32_t do_sat_bhw(int64_t val, int64_t min, int64_t max, bool *s)
61
{
62
if (val > max) {
63
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
64
index XXXXXXX..XXXXXXX 100644
65
--- a/target/arm/translate-mve.c
66
+++ b/target/arm/translate-mve.c
67
@@ -XXX,XX +XXX,XX @@ DO_2OP(VQADD_S, vqadds)
68
DO_2OP(VQADD_U, vqaddu)
69
DO_2OP(VQSUB_S, vqsubs)
70
DO_2OP(VQSUB_U, vqsubu)
71
+DO_2OP(VSHL_S, vshls)
72
+DO_2OP(VSHL_U, vshlu)
73
DO_2OP(VQSHL_S, vqshls)
74
DO_2OP(VQSHL_U, vqshlu)
75
DO_2OP(VQRSHL_S, vqrshls)
76
--
29
--
77
2.20.1
30
2.34.1
78
79
1
Implement the MVE VQRSHL (vector) insn. Again, the code to perform
1
Currently QEMU will warn if there is a NIC on the board that
2
the actual shifts is borrowed from neon_helper.c.
2
is not connected to a backend. By default the '-nic user' will
3
get used for all NICs, but if you manually connect a specific
4
NIC to a specific backend, then the other NICs on the board
5
have no backend and will be warned about:
6
7
qemu-system-arm: warning: nic npcm7xx-emc.1 has no peer
8
qemu-system-arm: warning: nic npcm-gmac.0 has no peer
9
qemu-system-arm: warning: nic npcm-gmac.1 has no peer
10
11
So suppress those warnings by manually connecting every NIC
12
on the board to some backend.
3
13
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
14
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
15
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
6
Message-id: 20210617121628.20116-34-peter.maydell@linaro.org
16
Reviewed-by: Thomas Huth <thuth@redhat.com>
17
Message-id: 20240206171231.396392-3-peter.maydell@linaro.org
7
---
18
---
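Note (sketch only, not the neon_helper.c code these patches reuse): the rounding, saturating shift that the VQRSHL helpers apply to each lane can be written standalone roughly as below, assuming an 8-bit lane and a shift count already limited to the range -8..7. A negative count shifts right and rounds to nearest; a positive count shifts left and saturates.

#include <stdint.h>
#include <stdbool.h>

static int8_t qrshl_s8(int8_t val, int shift, bool *sat)
{
    int64_t v = val;

    if (shift < 0) {
        int rshift = -shift;
        /* round to nearest: add half the weight of the last bit shifted out */
        v = (v + (1LL << (rshift - 1))) >> rshift;
    } else {
        v <<= shift;
    }
    if (v > INT8_MAX) {
        *sat = true;
        return INT8_MAX;
    }
    if (v < INT8_MIN) {
        *sat = true;
        return INT8_MIN;
    }
    return (int8_t)v;
}

For example, a count of -1 applied to 3 gives 2 here, where the non-rounding VQSHL form of the shift would give 1.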
8
target/arm/helper-mve.h | 8 ++++++++
19
tests/qtest/npcm7xx_emc-test.c | 5 ++++-
9
target/arm/mve.decode | 3 +++
20
1 file changed, 4 insertions(+), 1 deletion(-)
10
target/arm/mve_helper.c | 6 ++++++
11
target/arm/translate-mve.c | 2 ++
12
4 files changed, 19 insertions(+)
13
21
14
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
22
diff --git a/tests/qtest/npcm7xx_emc-test.c b/tests/qtest/npcm7xx_emc-test.c
15
index XXXXXXX..XXXXXXX 100644
23
index XXXXXXX..XXXXXXX 100644
16
--- a/target/arm/helper-mve.h
24
--- a/tests/qtest/npcm7xx_emc-test.c
17
+++ b/target/arm/helper-mve.h
25
+++ b/tests/qtest/npcm7xx_emc-test.c
18
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vqshlub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
26
@@ -XXX,XX +XXX,XX @@ static int *packet_test_init(int module_num, GString *cmd_line)
19
DEF_HELPER_FLAGS_4(mve_vqshluh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
27
* KISS and use -nic. The driver accepts 'emc0' and 'emc1' as aliases
20
DEF_HELPER_FLAGS_4(mve_vqshluw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
28
* in the 'model' field to specify the device to match.
21
29
*/
22
+DEF_HELPER_FLAGS_4(mve_vqrshlsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
30
- g_string_append_printf(cmd_line, " -nic socket,fd=%d,model=emc%d ",
23
+DEF_HELPER_FLAGS_4(mve_vqrshlsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
31
+ g_string_append_printf(cmd_line, " -nic socket,fd=%d,model=emc%d "
24
+DEF_HELPER_FLAGS_4(mve_vqrshlsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
32
+ "-nic user,model=npcm7xx-emc "
25
+
33
+ "-nic user,model=npcm-gmac "
26
+DEF_HELPER_FLAGS_4(mve_vqrshlub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
34
+ "-nic user,model=npcm-gmac",
27
+DEF_HELPER_FLAGS_4(mve_vqrshluh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
35
test_sockets[1], module_num);
28
+DEF_HELPER_FLAGS_4(mve_vqrshluw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
36
29
+
37
g_test_queue_destroy(packet_test_clear, test_sockets);
30
DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
31
DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
32
DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
33
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
34
index XXXXXXX..XXXXXXX 100644
35
--- a/target/arm/mve.decode
36
+++ b/target/arm/mve.decode
37
@@ -XXX,XX +XXX,XX @@ VQSUB_U 111 1 1111 0 . .. ... 0 ... 0 0010 . 1 . 1 ... 0 @2op
38
VQSHL_S 111 0 1111 0 . .. ... 0 ... 0 0100 . 1 . 1 ... 0 @2op_rev
39
VQSHL_U 111 1 1111 0 . .. ... 0 ... 0 0100 . 1 . 1 ... 0 @2op_rev
40
41
+VQRSHL_S 111 0 1111 0 . .. ... 0 ... 0 0101 . 1 . 1 ... 0 @2op_rev
42
+VQRSHL_U 111 1 1111 0 . .. ... 0 ... 0 0101 . 1 . 1 ... 0 @2op_rev
43
+
44
# Vector miscellaneous
45
46
VCLS 1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
47
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
48
index XXXXXXX..XXXXXXX 100644
49
--- a/target/arm/mve_helper.c
50
+++ b/target/arm/mve_helper.c
51
@@ -XXX,XX +XXX,XX @@ DO_2OP_SAT(vqsubsw, 4, int32_t, DO_SQSUB_W)
52
WRAP_QRSHL_HELPER(do_sqrshl_bhs, N, M, false, satp)
53
#define DO_UQSHL_OP(N, M, satp) \
54
WRAP_QRSHL_HELPER(do_uqrshl_bhs, N, M, false, satp)
55
+#define DO_SQRSHL_OP(N, M, satp) \
56
+ WRAP_QRSHL_HELPER(do_sqrshl_bhs, N, M, true, satp)
57
+#define DO_UQRSHL_OP(N, M, satp) \
58
+ WRAP_QRSHL_HELPER(do_uqrshl_bhs, N, M, true, satp)
59
60
DO_2OP_SAT_S(vqshls, DO_SQSHL_OP)
61
DO_2OP_SAT_U(vqshlu, DO_UQSHL_OP)
62
+DO_2OP_SAT_S(vqrshls, DO_SQRSHL_OP)
63
+DO_2OP_SAT_U(vqrshlu, DO_UQRSHL_OP)
64
65
#define DO_2OP_SCALAR(OP, ESIZE, TYPE, FN) \
66
void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn, \
67
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
68
index XXXXXXX..XXXXXXX 100644
69
--- a/target/arm/translate-mve.c
70
+++ b/target/arm/translate-mve.c
71
@@ -XXX,XX +XXX,XX @@ DO_2OP(VQSUB_S, vqsubs)
72
DO_2OP(VQSUB_U, vqsubu)
73
DO_2OP(VQSHL_S, vqshls)
74
DO_2OP(VQSHL_U, vqshlu)
75
+DO_2OP(VQRSHL_S, vqrshls)
76
+DO_2OP(VQRSHL_U, vqrshlu)
77
78
static bool do_2op_scalar(DisasContext *s, arg_2scalar *a,
79
MVEGenTwoOpScalarFn fn)
80
--
38
--
81
2.20.1
39
2.34.1
82
83
1
In a CPU with MVE, the VMOV (vector lane to general-purpose register)
1
It doesn't make sense to read the value of MDCR_EL2 on a non-A-profile
2
and VMOV (general-purpose register to vector lane) insns are not
2
CPU, and in fact if you try to do it we will assert:
3
predicated, but they are subject to beatwise execution if they
4
are not in an IT block.
5
3
6
Since our implementation always executes all 4 beats in one tick,
4
#6 0x00007ffff4b95e96 in __GI___assert_fail
7
this means only that we need to handle PSR.ECI:
5
(assertion=0x5555565a8c70 "!arm_feature(env, ARM_FEATURE_M)", file=0x5555565a6e5c "../../target/arm/helper.c", line=12600, function=0x5555565a9560 <__PRETTY_FUNCTION__.0> "arm_security_space_below_el3") at ./assert/assert.c:101
8
* we must do the usual check for bad ECI state
6
#7 0x0000555555ebf412 in arm_security_space_below_el3 (env=0x555557bc8190) at ../../target/arm/helper.c:12600
9
* we must advance ECI state if the insn succeeds
7
#8 0x0000555555ea6f89 in arm_is_el2_enabled (env=0x555557bc8190) at ../../target/arm/cpu.h:2595
10
* if ECI says we should not be executing the beat corresponding
8
#9 0x0000555555ea942f in arm_mdcr_el2_eff (env=0x555557bc8190) at ../../target/arm/internals.h:1512
11
to the lane of the vector register being accessed then we
12
should skip performing the move
13
9
14
Note that if PSR.ECI is non-zero then we cannot be in an IT block.
10
We might call pmu_counter_enabled() on an M-profile CPU (for example
11
from the migration pre/post hooks in machine.c); this should always
12
return false because these CPUs don't set ARM_FEATURE_PMU.
15
13
14
Avoid the assertion by not calling arm_mdcr_el2_eff() before we
15
have done the early return for "PMU not present".
16
17
This fixes an assertion failure if you try to do a loadvm or
18
savevm for an M-profile board.
19
20
Cc: qemu-stable@nongnu.org
21
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2155
16
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
22
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
23
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
17
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
24
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
18
Message-id: 20210617121628.20116-45-peter.maydell@linaro.org
25
Message-id: 20240208153346.970021-1-peter.maydell@linaro.org
19
---
26
---
20
target/arm/translate-a32.h | 2 +
27
target/arm/helper.c | 12 ++++++++++--
21
target/arm/translate-mve.c | 4 +-
28
1 file changed, 10 insertions(+), 2 deletions(-)
22
target/arm/translate-vfp.c | 77 +++++++++++++++++++++++++++++++++++---
23
3 files changed, 75 insertions(+), 8 deletions(-)
24
29
25
diff --git a/target/arm/translate-a32.h b/target/arm/translate-a32.h
30
diff --git a/target/arm/helper.c b/target/arm/helper.c
26
index XXXXXXX..XXXXXXX 100644
31
index XXXXXXX..XXXXXXX 100644
27
--- a/target/arm/translate-a32.h
32
--- a/target/arm/helper.c
28
+++ b/target/arm/translate-a32.h
33
+++ b/target/arm/helper.c
29
@@ -XXX,XX +XXX,XX @@ long neon_full_reg_offset(unsigned reg);
34
@@ -XXX,XX +XXX,XX @@ static bool pmu_counter_enabled(CPUARMState *env, uint8_t counter)
30
long neon_element_offset(int reg, int element, MemOp memop);
35
bool enabled, prohibited = false, filtered;
31
void gen_rev16(TCGv_i32 dest, TCGv_i32 var);
36
bool secure = arm_is_secure(env);
32
void clear_eci_state(DisasContext *s);
37
int el = arm_current_el(env);
33
+bool mve_eci_check(DisasContext *s);
38
- uint64_t mdcr_el2 = arm_mdcr_el2_eff(env);
34
+void mve_update_and_store_eci(DisasContext *s);
39
- uint8_t hpmn = mdcr_el2 & MDCR_HPMN;
35
40
+ uint64_t mdcr_el2;
36
static inline TCGv_i32 load_cpu_offset(int offset)
41
+ uint8_t hpmn;
37
{
42
38
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
39
index XXXXXXX..XXXXXXX 100644
40
--- a/target/arm/translate-mve.c
41
+++ b/target/arm/translate-mve.c
42
@@ -XXX,XX +XXX,XX @@ static bool mve_check_qreg_bank(DisasContext *s, int qmask)
43
return qmask < 8;
44
}
45
46
-static bool mve_eci_check(DisasContext *s)
47
+bool mve_eci_check(DisasContext *s)
48
{
49
/*
50
* This is a beatwise insn: check that ECI is valid (not a
51
@@ -XXX,XX +XXX,XX @@ static void mve_update_eci(DisasContext *s)
52
}
53
}
54
55
-static void mve_update_and_store_eci(DisasContext *s)
56
+void mve_update_and_store_eci(DisasContext *s)
57
{
58
/*
59
* For insns which don't call a helper function that will call
60
diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
61
index XXXXXXX..XXXXXXX 100644
62
--- a/target/arm/translate-vfp.c
63
+++ b/target/arm/translate-vfp.c
64
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
65
return true;
66
}
67
68
+static bool mve_skip_vmov(DisasContext *s, int vn, int index, int size)
69
+{
70
+ /*
43
+ /*
71
+ * In a CPU with MVE, the VMOV (vector lane to general-purpose register)
44
+ * We might be called for M-profile cores where MDCR_EL2 doesn't
72
+ * and VMOV (general-purpose register to vector lane) insns are not
45
+ * exist and arm_mdcr_el2_eff() will assert, so this early-exit check
73
+ * predicated, but they are subject to beatwise execution if they are
46
+ * must be before we read that value.
74
+ * not in an IT block.
75
+ *
76
+ * Since our implementation always executes all 4 beats in one tick,
77
+ * this means only that if PSR.ECI says we should not be executing
78
+ * the beat corresponding to the lane of the vector register being
79
+ * accessed then we should skip performing the move, and that we need
80
+ * to do the usual check for bad ECI state and advance of ECI state.
81
+ *
82
+ * Note that if PSR.ECI is non-zero then we cannot be in an IT block.
83
+ *
84
+ * Return true if this VMOV scalar <-> gpreg should be skipped because
85
+ * the MVE PSR.ECI state says we skip the beat where the store happens.
86
+ */
47
+ */
87
+
48
if (!arm_feature(env, ARM_FEATURE_PMU)) {
88
+ /* Calculate the byte offset into Qn which we're going to access */
89
+ int ofs = (index << size) + ((vn & 1) * 8);
90
+
91
+ if (!dc_isar_feature(aa32_mve, s)) {
92
+ return false;
93
+ }
94
+
95
+ switch (s->eci) {
96
+ case ECI_NONE:
97
+ return false;
98
+ case ECI_A0:
99
+ return ofs < 4;
100
+ case ECI_A0A1:
101
+ return ofs < 8;
102
+ case ECI_A0A1A2:
103
+ case ECI_A0A1A2B0:
104
+ return ofs < 12;
105
+ default:
106
+ g_assert_not_reached();
107
+ }
108
+}
109
+
110
static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
111
{
112
/* VMOV scalar to general purpose register */
113
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
114
return false;
49
return false;
115
}
50
}
116
51
117
+ if (dc_isar_feature(aa32_mve, s)) {
52
+ mdcr_el2 = arm_mdcr_el2_eff(env);
118
+ if (!mve_eci_check(s)) {
53
+ hpmn = mdcr_el2 & MDCR_HPMN;
119
+ return true;
120
+ }
121
+ }
122
+
54
+
123
if (!vfp_access_check(s)) {
55
if (!arm_feature(env, ARM_FEATURE_EL2) ||
124
return true;
56
(counter < hpmn || counter == 31)) {
125
}
57
e = env->cp15.c9_pmcr & PMCRE;
126
127
- tmp = tcg_temp_new_i32();
128
- read_neon_element32(tmp, a->vn, a->index, a->size | (a->u ? 0 : MO_SIGN));
129
- store_reg(s, a->rt, tmp);
130
+ if (!mve_skip_vmov(s, a->vn, a->index, a->size)) {
131
+ tmp = tcg_temp_new_i32();
132
+ read_neon_element32(tmp, a->vn, a->index,
133
+ a->size | (a->u ? 0 : MO_SIGN));
134
+ store_reg(s, a->rt, tmp);
135
+ }
136
137
+ if (dc_isar_feature(aa32_mve, s)) {
138
+ mve_update_and_store_eci(s);
139
+ }
140
return true;
141
}
142
143
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
144
return false;
145
}
146
147
+ if (dc_isar_feature(aa32_mve, s)) {
148
+ if (!mve_eci_check(s)) {
149
+ return true;
150
+ }
151
+ }
152
+
153
if (!vfp_access_check(s)) {
154
return true;
155
}
156
157
- tmp = load_reg(s, a->rt);
158
- write_neon_element32(tmp, a->vn, a->index, a->size);
159
- tcg_temp_free_i32(tmp);
160
+ if (!mve_skip_vmov(s, a->vn, a->index, a->size)) {
161
+ tmp = load_reg(s, a->rt);
162
+ write_neon_element32(tmp, a->vn, a->index, a->size);
163
+ tcg_temp_free_i32(tmp);
164
+ }
165
166
+ if (dc_isar_feature(aa32_mve, s)) {
167
+ mve_update_and_store_eci(s);
168
+ }
169
return true;
170
}
171
172
--
58
--
173
2.20.1
59
2.34.1
174
60
175
61
1
In the code for handling VFP system register accesses there is some
1
From: Nabih Estefan <nabihestefan@google.com>
2
stray whitespace after a unary '-' operator, and also some incorrect
3
indent in a couple of function prototypes. We're about to move this
4
code to another file, so fix the code style issues first so
5
checkpatch doesn't complain about the code-movement patch.
6
2
7
Cc: qemu-stable@nongnu.org
3
Fix the npcm_gmac-test.c file to run on a Nuvoton 7xx machine instead
4
of 8xx. Also fix comments referencing this and values expecting 8xx.
5
6
Change-Id: Iabd0fba14910c3f1e883c4a9521350f3db9ffab8
7
Signed-off-by: Nabih Estefan <nabihestefan@google.com>
8
Reviewed-by: Tyrone Ting <kfting@nuvoton.com>
9
Message-id: 20240208194759.2858582-2-nabihestefan@google.com
10
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
11
[PMM: commit message tweaks]
8
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
12
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
10
Message-id: 20210618141019.10671-2-peter.maydell@linaro.org
11
---
13
---
12
target/arm/translate-vfp.c | 11 +++++------
14
tests/qtest/npcm_gmac-test.c | 84 +-----------------------------------
13
1 file changed, 5 insertions(+), 6 deletions(-)
15
tests/qtest/meson.build | 3 +-
16
2 files changed, 4 insertions(+), 83 deletions(-)
14
17
15
diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
18
diff --git a/tests/qtest/npcm_gmac-test.c b/tests/qtest/npcm_gmac-test.c
16
index XXXXXXX..XXXXXXX 100644
19
index XXXXXXX..XXXXXXX 100644
17
--- a/target/arm/translate-vfp.c
20
--- a/tests/qtest/npcm_gmac-test.c
18
+++ b/target/arm/translate-vfp.c
21
+++ b/tests/qtest/npcm_gmac-test.c
19
@@ -XXX,XX +XXX,XX @@ static void gen_branch_fpInactive(DisasContext *s, TCGCond cond,
22
@@ -XXX,XX +XXX,XX @@ typedef struct TestData {
23
const GMACModule *module;
24
} TestData;
25
26
-/* Values extracted from hw/arm/npcm8xx.c */
27
+/* Values extracted from hw/arm/npcm7xx.c */
28
static const GMACModule gmac_module_list[] = {
29
{
30
.irq = 14,
31
@@ -XXX,XX +XXX,XX @@ static const GMACModule gmac_module_list[] = {
32
.irq = 15,
33
.base_addr = 0xf0804000
34
},
35
- {
36
- .irq = 16,
37
- .base_addr = 0xf0806000
38
- },
39
- {
40
- .irq = 17,
41
- .base_addr = 0xf0808000
42
- }
43
};
44
45
/* Returns the index of the GMAC module. */
46
@@ -XXX,XX +XXX,XX @@ static uint32_t gmac_read(QTestState *qts, const GMACModule *mod,
47
return qtest_readl(qts, mod->base_addr + regno);
20
}
48
}
21
49
22
static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
50
-static uint16_t pcs_read(QTestState *qts, const GMACModule *mod,
51
- NPCMRegister regno)
52
-{
53
- uint32_t write_value = (regno & 0x3ffe00) >> 9;
54
- qtest_writel(qts, PCS_BASE_ADDRESS + NPCM_PCS_IND_AC_BA, write_value);
55
- uint32_t read_offset = regno & 0x1ff;
56
- return qtest_readl(qts, PCS_BASE_ADDRESS + read_offset);
57
-}
23
-
58
-
24
fp_sysreg_loadfn *loadfn,
59
/* Check that GMAC registers are reset to default value */
25
- void *opaque)
60
static void test_init(gconstpointer test_data)
26
+ void *opaque)
27
{
61
{
28
/* Do a write to an M-profile floating point system register */
62
const TestData *td = test_data;
29
TCGv_i32 tmp;
63
const GMACModule *mod = td->module;
30
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
64
- QTestState *qts = qtest_init("-machine npcm845-evb");
65
+ QTestState *qts = qtest_init("-machine npcm750-evb");
66
67
#define CHECK_REG32(regno, value) \
68
do { \
69
g_assert_cmphex(gmac_read(qts, mod, (regno)), ==, (value)); \
70
} while (0)
71
72
-#define CHECK_REG_PCS(regno, value) \
73
- do { \
74
- g_assert_cmphex(pcs_read(qts, mod, (regno)), ==, (value)); \
75
- } while (0)
76
-
77
CHECK_REG32(NPCM_DMA_BUS_MODE, 0x00020100);
78
CHECK_REG32(NPCM_DMA_XMT_POLL_DEMAND, 0);
79
CHECK_REG32(NPCM_DMA_RCV_POLL_DEMAND, 0);
80
@@ -XXX,XX +XXX,XX @@ static void test_init(gconstpointer test_data)
81
CHECK_REG32(NPCM_GMAC_PTP_TAR, 0);
82
CHECK_REG32(NPCM_GMAC_PTP_TTSR, 0);
83
84
- /* TODO Add registers PCS */
85
- if (mod->base_addr == 0xf0802000) {
86
- CHECK_REG_PCS(NPCM_PCS_SR_CTL_ID1, 0x699e);
87
- CHECK_REG_PCS(NPCM_PCS_SR_CTL_ID2, 0);
88
- CHECK_REG_PCS(NPCM_PCS_SR_CTL_STS, 0x8000);
89
-
90
- CHECK_REG_PCS(NPCM_PCS_SR_MII_CTRL, 0x1140);
91
- CHECK_REG_PCS(NPCM_PCS_SR_MII_STS, 0x0109);
92
- CHECK_REG_PCS(NPCM_PCS_SR_MII_DEV_ID1, 0x699e);
93
- CHECK_REG_PCS(NPCM_PCS_SR_MII_DEV_ID2, 0x0ced0);
94
- CHECK_REG_PCS(NPCM_PCS_SR_MII_AN_ADV, 0x0020);
95
- CHECK_REG_PCS(NPCM_PCS_SR_MII_LP_BABL, 0);
96
- CHECK_REG_PCS(NPCM_PCS_SR_MII_AN_EXPN, 0);
97
- CHECK_REG_PCS(NPCM_PCS_SR_MII_EXT_STS, 0xc000);
98
-
99
- CHECK_REG_PCS(NPCM_PCS_SR_TIM_SYNC_ABL, 0x0003);
100
- CHECK_REG_PCS(NPCM_PCS_SR_TIM_SYNC_TX_MAX_DLY_LWR, 0x0038);
101
- CHECK_REG_PCS(NPCM_PCS_SR_TIM_SYNC_TX_MAX_DLY_UPR, 0);
102
- CHECK_REG_PCS(NPCM_PCS_SR_TIM_SYNC_TX_MIN_DLY_LWR, 0x0038);
103
- CHECK_REG_PCS(NPCM_PCS_SR_TIM_SYNC_TX_MIN_DLY_UPR, 0);
104
- CHECK_REG_PCS(NPCM_PCS_SR_TIM_SYNC_RX_MAX_DLY_LWR, 0x0058);
105
- CHECK_REG_PCS(NPCM_PCS_SR_TIM_SYNC_RX_MAX_DLY_UPR, 0);
106
- CHECK_REG_PCS(NPCM_PCS_SR_TIM_SYNC_RX_MIN_DLY_LWR, 0x0048);
107
- CHECK_REG_PCS(NPCM_PCS_SR_TIM_SYNC_RX_MIN_DLY_UPR, 0);
108
-
109
- CHECK_REG_PCS(NPCM_PCS_VR_MII_MMD_DIG_CTRL1, 0x2400);
110
- CHECK_REG_PCS(NPCM_PCS_VR_MII_AN_CTRL, 0);
111
- CHECK_REG_PCS(NPCM_PCS_VR_MII_AN_INTR_STS, 0x000a);
112
- CHECK_REG_PCS(NPCM_PCS_VR_MII_TC, 0);
113
- CHECK_REG_PCS(NPCM_PCS_VR_MII_DBG_CTRL, 0);
114
- CHECK_REG_PCS(NPCM_PCS_VR_MII_EEE_MCTRL0, 0x899c);
115
- CHECK_REG_PCS(NPCM_PCS_VR_MII_EEE_TXTIMER, 0);
116
- CHECK_REG_PCS(NPCM_PCS_VR_MII_EEE_RXTIMER, 0);
117
- CHECK_REG_PCS(NPCM_PCS_VR_MII_LINK_TIMER_CTRL, 0);
118
- CHECK_REG_PCS(NPCM_PCS_VR_MII_EEE_MCTRL1, 0);
119
- CHECK_REG_PCS(NPCM_PCS_VR_MII_DIG_STS, 0x0010);
120
- CHECK_REG_PCS(NPCM_PCS_VR_MII_ICG_ERRCNT1, 0);
121
- CHECK_REG_PCS(NPCM_PCS_VR_MII_MISC_STS, 0);
122
- CHECK_REG_PCS(NPCM_PCS_VR_MII_RX_LSTS, 0);
123
- CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_TX_BSTCTRL0, 0x00a);
124
- CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_TX_LVLCTRL0, 0x007f);
125
- CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_TX_GENCTRL0, 0x0001);
126
- CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_TX_GENCTRL1, 0);
127
- CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_TX_STS, 0);
128
- CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_RX_GENCTRL0, 0x0100);
129
- CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_RX_GENCTRL1, 0x1100);
130
- CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_RX_LOS_CTRL0, 0x000e);
131
- CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_MPLL_CTRL0, 0x0100);
132
- CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_MPLL_CTRL1, 0x0032);
133
- CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_MPLL_STS, 0x0001);
134
- CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_MISC_CTRL2, 0);
135
- CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_LVL_CTRL, 0x0019);
136
- CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_MISC_CTRL0, 0);
137
- CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_MISC_CTRL1, 0);
138
- CHECK_REG_PCS(NPCM_PCS_VR_MII_DIG_CTRL2, 0);
139
- CHECK_REG_PCS(NPCM_PCS_VR_MII_DIG_ERRCNT_SEL, 0);
140
- }
141
-
142
qtest_quit(qts);
31
}
143
}
32
144
33
static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
145
diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
34
- fp_sysreg_storefn *storefn,
146
index XXXXXXX..XXXXXXX 100644
35
- void *opaque)
147
--- a/tests/qtest/meson.build
36
+ fp_sysreg_storefn *storefn,
148
+++ b/tests/qtest/meson.build
37
+ void *opaque)
149
@@ -XXX,XX +XXX,XX @@ qtests_npcm7xx = \
38
{
150
'npcm7xx_sdhci-test',
39
/* Do a read from an M-profile floating point system register */
151
'npcm7xx_smbus-test',
40
TCGv_i32 tmp;
152
'npcm7xx_timer-test',
41
@@ -XXX,XX +XXX,XX @@ static void fp_sysreg_to_memory(DisasContext *s, void *opaque, TCGv_i32 value)
153
- 'npcm7xx_watchdog_timer-test'] + \
42
TCGv_i32 addr;
154
+ 'npcm7xx_watchdog_timer-test',
43
155
+ 'npcm_gmac-test'] + \
44
if (!a->a) {
156
(slirp.found() ? ['npcm7xx_emc-test'] : [])
45
- offset = - offset;
157
qtests_aspeed = \
46
+ offset = -offset;
158
['aspeed_hace-test',
47
}
48
49
addr = load_reg(s, a->rn);
50
@@ -XXX,XX +XXX,XX @@ static TCGv_i32 memory_to_fp_sysreg(DisasContext *s, void *opaque)
51
TCGv_i32 value = tcg_temp_new_i32();
52
53
if (!a->a) {
54
- offset = - offset;
55
+ offset = -offset;
56
}
57
58
addr = load_reg(s, a->rn);
59
--
159
--
60
2.20.1
160
2.34.1
61
62
1
Implement the scalar forms of the MVE VSUB and VMUL insns.
1
From: Luc Michel <luc.michel@amd.com>
2
2
3
An access fault is raised when the Access Flag is not set in the
4
looked-up PTE and the AFFD field is not set in the corresponding context
5
descriptor. This was already implemented for stage 2. Implement it for
6
stage 1 as well.
7
8
Signed-off-by: Luc Michel <luc.michel@amd.com>
9
Reviewed-by: Mostafa Saleh <smostafa@google.com>
10
Reviewed-by: Eric Auger <eric.auger@redhat.com>
11
Tested-by: Mostafa Saleh <smostafa@google.com>
12
Message-id: 20240213082211.3330400-1-luc.michel@amd.com
13
[PMM: tweaked comment text]
3
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
14
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
4
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
5
Message-id: 20210617121628.20116-24-peter.maydell@linaro.org
6
---
15
---
7
target/arm/helper-mve.h | 8 ++++++++
16
hw/arm/smmuv3-internal.h | 1 +
8
target/arm/mve.decode | 2 ++
17
include/hw/arm/smmu-common.h | 1 +
9
target/arm/mve_helper.c | 2 ++
18
hw/arm/smmu-common.c | 11 +++++++++++
10
target/arm/translate-mve.c | 2 ++
19
hw/arm/smmuv3.c | 1 +
11
4 files changed, 14 insertions(+)
20
4 files changed, 14 insertions(+)
12
21
13
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
22
diff --git a/hw/arm/smmuv3-internal.h b/hw/arm/smmuv3-internal.h
14
index XXXXXXX..XXXXXXX 100644
23
index XXXXXXX..XXXXXXX 100644
15
--- a/target/arm/helper-mve.h
24
--- a/hw/arm/smmuv3-internal.h
16
+++ b/target/arm/helper-mve.h
25
+++ b/hw/arm/smmuv3-internal.h
17
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
26
@@ -XXX,XX +XXX,XX @@ static inline int pa_range(STE *ste)
18
DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
27
#define CD_EPD(x, sel) extract32((x)->word[0], (16 * (sel)) + 14, 1)
19
DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
28
#define CD_ENDI(x) extract32((x)->word[0], 15, 1)
20
29
#define CD_IPS(x) extract32((x)->word[1], 0 , 3)
21
+DEF_HELPER_FLAGS_4(mve_vsub_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
30
+#define CD_AFFD(x) extract32((x)->word[1], 3 , 1)
22
+DEF_HELPER_FLAGS_4(mve_vsub_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
31
#define CD_TBI(x) extract32((x)->word[1], 6 , 2)
23
+DEF_HELPER_FLAGS_4(mve_vsub_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
32
#define CD_HD(x) extract32((x)->word[1], 10 , 1)
33
#define CD_HA(x) extract32((x)->word[1], 11 , 1)
34
diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
35
index XXXXXXX..XXXXXXX 100644
36
--- a/include/hw/arm/smmu-common.h
37
+++ b/include/hw/arm/smmu-common.h
38
@@ -XXX,XX +XXX,XX @@ typedef struct SMMUTransCfg {
39
bool disabled; /* smmu is disabled */
40
bool bypassed; /* translation is bypassed */
41
bool aborted; /* translation is aborted */
42
+ bool affd; /* AF fault disable */
43
uint32_t iotlb_hits; /* counts IOTLB hits */
44
uint32_t iotlb_misses; /* counts IOTLB misses*/
45
/* Used by stage-1 only. */
46
diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
47
index XXXXXXX..XXXXXXX 100644
48
--- a/hw/arm/smmu-common.c
49
+++ b/hw/arm/smmu-common.c
50
@@ -XXX,XX +XXX,XX @@ static int smmu_ptw_64_s1(SMMUTransCfg *cfg,
51
pte_addr, pte, iova, gpa,
52
block_size >> 20);
53
}
24
+
54
+
25
+DEF_HELPER_FLAGS_4(mve_vmul_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
55
+ /*
26
+DEF_HELPER_FLAGS_4(mve_vmul_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
56
+ * QEMU does not currently implement HTTU, so if AFFD and PTE.AF
27
+DEF_HELPER_FLAGS_4(mve_vmul_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
57
+ * are 0 we take an Access flag fault. (5.4. Context Descriptor)
58
+ * An Access flag fault takes priority over a Permission fault.
59
+ */
60
+ if (!PTE_AF(pte) && !cfg->affd) {
61
+ info->type = SMMU_PTW_ERR_ACCESS;
62
+ goto error;
63
+ }
28
+
64
+
29
DEF_HELPER_FLAGS_4(mve_vmlaldavsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
65
ap = PTE_AP(pte);
30
DEF_HELPER_FLAGS_4(mve_vmlaldavsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
66
if (is_permission_fault(ap, perm)) {
31
DEF_HELPER_FLAGS_4(mve_vmlaldavxsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
67
info->type = SMMU_PTW_ERR_PERMISSION;
32
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
68
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
33
index XXXXXXX..XXXXXXX 100644
69
index XXXXXXX..XXXXXXX 100644
34
--- a/target/arm/mve.decode
70
--- a/hw/arm/smmuv3.c
35
+++ b/target/arm/mve.decode
71
+++ b/hw/arm/smmuv3.c
36
@@ -XXX,XX +XXX,XX @@ VRMLSLDAVH 1111 1110 1 ... ... 0 ... x:1 1110 . 0 a:1 0 ... 1 @vmlaldav_no
72
@@ -XXX,XX +XXX,XX @@ static int decode_cd(SMMUTransCfg *cfg, CD *cd, SMMUEventInfo *event)
37
# Scalar operations
73
cfg->oas = MIN(oas2bits(SMMU_IDR5_OAS), cfg->oas);
38
74
cfg->tbi = CD_TBI(cd);
39
VADD_scalar 1110 1110 0 . .. ... 1 ... 0 1111 . 100 .... @2scalar
75
cfg->asid = CD_ASID(cd);
40
+VSUB_scalar 1110 1110 0 . .. ... 1 ... 1 1111 . 100 .... @2scalar
76
+ cfg->affd = CD_AFFD(cd);
41
+VMUL_scalar 1110 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
77
42
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
78
trace_smmuv3_decode_cd(cfg->oas);
43
index XXXXXXX..XXXXXXX 100644
79
44
--- a/target/arm/mve_helper.c
45
+++ b/target/arm/mve_helper.c
46
@@ -XXX,XX +XXX,XX @@ DO_2OP_U(vhsubu, do_vhsub_u)
47
DO_2OP_SCALAR(OP##w, 4, uint32_t, FN)
48
49
DO_2OP_SCALAR_U(vadd_scalar, DO_ADD)
50
+DO_2OP_SCALAR_U(vsub_scalar, DO_SUB)
51
+DO_2OP_SCALAR_U(vmul_scalar, DO_MUL)
52
53
/*
54
* Multiply add long dual accumulate ops.
55
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
56
index XXXXXXX..XXXXXXX 100644
57
--- a/target/arm/translate-mve.c
58
+++ b/target/arm/translate-mve.c
59
@@ -XXX,XX +XXX,XX @@ static bool do_2op_scalar(DisasContext *s, arg_2scalar *a,
60
}
61
62
DO_2OP_SCALAR(VADD_scalar, vadd_scalar)
63
+DO_2OP_SCALAR(VSUB_scalar, vsub_scalar)
64
+DO_2OP_SCALAR(VMUL_scalar, vmul_scalar)
65
66
static bool do_long_dual_acc(DisasContext *s, arg_vmlaldav *a,
67
MVEGenDualAccOpFn *fn)
68
--
80
--
69
2.20.1
81
2.34.1
70
71
1
A few subcases of VLDR/VSTR sysreg succeed but do not perform a
1
From: Philippe Mathieu-Daudé <philmd@linaro.org>
2
memory access:
3
* VSTR of VPR when unprivileged
4
* VLDR to VPR when unprivileged
5
* VLDR to FPCXT_NS when fpInactive
6
2
7
In these cases, even though we don't do the memory access we should
3
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
8
still update the base register and perform the stack limit check if
4
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
9
the insn's addressing mode specifies writeback. Our implementation
5
Message-id: 20240213155214.13619-2-philmd@linaro.org
10
failed to do this, because we handle these side-effects inside the
6
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
11
memory_to_fp_sysreg() and fp_sysreg_to_memory() callback functions,
7
---
12
which are only called if there's something to load or store.
8
hw/arm/stellaris.c | 6 ++++--
9
1 file changed, 4 insertions(+), 2 deletions(-)
13
10
14
Fix this by adding an extra argument to the callbacks which is set to
11
diff --git a/hw/arm/stellaris.c b/hw/arm/stellaris.c
15
true to actually perform the access and false to only do side effects
16
like writeback, and calling the callback with do_access = false
17
for the three cases listed above.
18
19
This produces slightly suboptimal code for the case of a write
20
to FPCXT_NS when the FPU is inactive and the insn didn't have
21
side effects (ie no writeback, or via VMSR), in which case we'll
22
generate a conditional branch over an unconditional branch.
23
But this doesn't seem to be important enough to merit requiring
24
the callback to report back whether it generated any code or not.
25
26
Cc: qemu-stable@nongnu.org
27
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
28
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
29
Message-id: 20210618141019.10671-5-peter.maydell@linaro.org
30
---
31
target/arm/translate-m-nocp.c | 102 ++++++++++++++++++++++++----------
32
1 file changed, 72 insertions(+), 30 deletions(-)
33
34
diff --git a/target/arm/translate-m-nocp.c b/target/arm/translate-m-nocp.c
35
index XXXXXXX..XXXXXXX 100644
12
index XXXXXXX..XXXXXXX 100644
36
--- a/target/arm/translate-m-nocp.c
13
--- a/hw/arm/stellaris.c
37
+++ b/target/arm/translate-m-nocp.c
14
+++ b/hw/arm/stellaris.c
38
@@ -XXX,XX +XXX,XX @@ static bool trans_VSCCLRM(DisasContext *s, arg_VSCCLRM *a)
15
@@ -XXX,XX +XXX,XX @@ static void stellaris_adc_trigger(void *opaque, int irq, int level)
39
40
/*
41
* Emit code to store the sysreg to its final destination; frees the
42
- * TCG temp 'value' it is passed.
43
+ * TCG temp 'value' it is passed. do_access is true to do the store,
44
+ * and false to skip it and only perform side-effects like base
45
+ * register writeback.
46
*/
47
-typedef void fp_sysreg_storefn(DisasContext *s, void *opaque, TCGv_i32 value);
48
+typedef void fp_sysreg_storefn(DisasContext *s, void *opaque, TCGv_i32 value,
49
+ bool do_access);
50
/*
51
* Emit code to load the value to be copied to the sysreg; returns
52
- * a new TCG temporary
53
+ * a new TCG temporary. do_access is true to do the store,
54
+ * and false to skip it and only perform side-effects like base
55
+ * register writeback.
56
*/
57
-typedef TCGv_i32 fp_sysreg_loadfn(DisasContext *s, void *opaque);
58
+typedef TCGv_i32 fp_sysreg_loadfn(DisasContext *s, void *opaque,
59
+ bool do_access);
60
61
/* Common decode/access checks for fp sysreg read/write */
62
typedef enum FPSysRegCheckResult {
63
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
64
65
switch (regno) {
66
case ARM_VFP_FPSCR:
67
- tmp = loadfn(s, opaque);
68
+ tmp = loadfn(s, opaque, true);
69
gen_helper_vfp_set_fpscr(cpu_env, tmp);
70
tcg_temp_free_i32(tmp);
71
gen_lookup_tb(s);
72
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
73
case ARM_VFP_FPSCR_NZCVQC:
74
{
75
TCGv_i32 fpscr;
76
- tmp = loadfn(s, opaque);
77
+ tmp = loadfn(s, opaque, true);
78
if (dc_isar_feature(aa32_mve, s)) {
79
/* QC is only present for MVE; otherwise RES0 */
80
TCGv_i32 qc = tcg_temp_new_i32();
81
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
82
break;
83
}
84
case ARM_VFP_FPCXT_NS:
85
+ {
86
+ TCGLabel *lab_active = gen_new_label();
87
+
88
lab_end = gen_new_label();
89
- /* fpInactive case: write is a NOP, so branch to end */
90
- gen_branch_fpInactive(s, TCG_COND_NE, lab_end);
91
+ gen_branch_fpInactive(s, TCG_COND_EQ, lab_active);
92
+ /*
93
+ * fpInactive case: write is a NOP, so only do side effects
94
+ * like register writeback before we branch to end
95
+ */
96
+ loadfn(s, opaque, false);
97
+ tcg_gen_br(lab_end);
98
+
99
+ gen_set_label(lab_active);
100
/*
101
* !fpInactive: if FPU disabled, take NOCP exception;
102
* otherwise PreserveFPState(), and then FPCXT_NS writes
103
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
104
break;
105
}
106
gen_preserve_fp_state(s);
107
- /* fall through */
108
+ }
109
+ /* fall through */
110
case ARM_VFP_FPCXT_S:
111
{
112
TCGv_i32 sfpa, control;
113
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
114
* Set FPSCR and CONTROL.SFPA from value; the new FPSCR takes
115
* bits [27:0] from value and zeroes bits [31:28].
116
*/
117
- tmp = loadfn(s, opaque);
118
+ tmp = loadfn(s, opaque, true);
119
sfpa = tcg_temp_new_i32();
120
tcg_gen_shri_i32(sfpa, tmp, 31);
121
control = load_cpu_field(v7m.control[M_REG_S]);
122
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
123
case ARM_VFP_VPR:
124
/* Behaves as NOP if not privileged */
125
if (IS_USER(s)) {
126
+ loadfn(s, opaque, false);
127
break;
128
}
129
- tmp = loadfn(s, opaque);
130
+ tmp = loadfn(s, opaque, true);
131
store_cpu_field(tmp, v7m.vpr);
132
break;
133
case ARM_VFP_P0:
134
{
135
TCGv_i32 vpr;
136
- tmp = loadfn(s, opaque);
137
+ tmp = loadfn(s, opaque, true);
138
vpr = load_cpu_field(v7m.vpr);
139
tcg_gen_deposit_i32(vpr, vpr, tmp,
140
R_V7M_VPR_P0_SHIFT, R_V7M_VPR_P0_LENGTH);
141
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
142
case ARM_VFP_FPSCR:
143
tmp = tcg_temp_new_i32();
144
gen_helper_vfp_get_fpscr(tmp, cpu_env);
145
- storefn(s, opaque, tmp);
146
+ storefn(s, opaque, tmp, true);
147
break;
148
case ARM_VFP_FPSCR_NZCVQC:
149
tmp = tcg_temp_new_i32();
150
gen_helper_vfp_get_fpscr(tmp, cpu_env);
151
tcg_gen_andi_i32(tmp, tmp, FPCR_NZCVQC_MASK);
152
- storefn(s, opaque, tmp);
153
+ storefn(s, opaque, tmp, true);
154
break;
155
case QEMU_VFP_FPSCR_NZCV:
156
/*
157
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
158
*/
159
tmp = load_cpu_field(vfp.xregs[ARM_VFP_FPSCR]);
160
tcg_gen_andi_i32(tmp, tmp, FPCR_NZCV_MASK);
161
- storefn(s, opaque, tmp);
162
+ storefn(s, opaque, tmp, true);
163
break;
164
case ARM_VFP_FPCXT_S:
165
{
166
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
167
* Store result before updating FPSCR etc, in case
168
* it is a memory write which causes an exception.
169
*/
170
- storefn(s, opaque, tmp);
171
+ storefn(s, opaque, tmp, true);
172
/*
173
* Now we must reset FPSCR from FPDSCR_NS, and clear
174
* CONTROL.SFPA; so we'll end the TB here.
175
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
176
gen_branch_fpInactive(s, TCG_COND_EQ, lab_active);
177
/* fpInactive case: reads as FPDSCR_NS */
178
TCGv_i32 tmp = load_cpu_field(v7m.fpdscr[M_REG_NS]);
179
- storefn(s, opaque, tmp);
180
+ storefn(s, opaque, tmp, true);
181
lab_end = gen_new_label();
182
tcg_gen_br(lab_end);
183
184
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
185
tcg_gen_or_i32(tmp, tmp, sfpa);
186
tcg_temp_free_i32(control);
187
/* Store result before updating FPSCR, in case it faults */
188
- storefn(s, opaque, tmp);
189
+ storefn(s, opaque, tmp, true);
190
/* If SFPA is zero then set FPSCR from FPDSCR_NS */
191
fpdscr = load_cpu_field(v7m.fpdscr[M_REG_NS]);
192
zero = tcg_const_i32(0);
193
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
194
case ARM_VFP_VPR:
195
/* Behaves as NOP if not privileged */
196
if (IS_USER(s)) {
197
+ storefn(s, opaque, NULL, false);
198
break;
199
}
200
tmp = load_cpu_field(v7m.vpr);
201
- storefn(s, opaque, tmp);
202
+ storefn(s, opaque, tmp, true);
203
break;
204
case ARM_VFP_P0:
205
tmp = load_cpu_field(v7m.vpr);
206
tcg_gen_extract_i32(tmp, tmp, R_V7M_VPR_P0_SHIFT, R_V7M_VPR_P0_LENGTH);
207
- storefn(s, opaque, tmp);
208
+ storefn(s, opaque, tmp, true);
209
break;
210
default:
211
g_assert_not_reached();
212
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
213
return true;
214
}
215
216
-static void fp_sysreg_to_gpr(DisasContext *s, void *opaque, TCGv_i32 value)
217
+static void fp_sysreg_to_gpr(DisasContext *s, void *opaque, TCGv_i32 value,
218
+ bool do_access)
219
{
220
arg_VMSR_VMRS *a = opaque;
221
222
+ if (!do_access) {
223
+ return;
224
+ }
225
+
226
if (a->rt == 15) {
227
/* Set the 4 flag bits in the CPSR */
228
gen_set_nzcv(value);
229
@@ -XXX,XX +XXX,XX @@ static void fp_sysreg_to_gpr(DisasContext *s, void *opaque, TCGv_i32 value)
230
}
16
}
231
}
17
}
232
18
233
-static TCGv_i32 gpr_to_fp_sysreg(DisasContext *s, void *opaque)
19
-static void stellaris_adc_reset(StellarisADCState *s)
234
+static TCGv_i32 gpr_to_fp_sysreg(DisasContext *s, void *opaque, bool do_access)
20
+static void stellaris_adc_reset_hold(Object *obj)
235
{
21
{
236
arg_VMSR_VMRS *a = opaque;
22
+ StellarisADCState *s = STELLARIS_ADC(obj);
237
23
int n;
238
+ if (!do_access) {
24
239
+ return NULL;
25
for (n = 0; n < 4; n++) {
240
+ }
26
@@ -XXX,XX +XXX,XX @@ static void stellaris_adc_init(Object *obj)
241
return load_reg(s, a->rt);
27
memory_region_init_io(&s->iomem, obj, &stellaris_adc_ops, s,
28
"adc", 0x1000);
29
sysbus_init_mmio(sbd, &s->iomem);
30
- stellaris_adc_reset(s);
31
qdev_init_gpio_in(dev, stellaris_adc_trigger, 1);
242
}
32
}
243
33
244
@@ -XXX,XX +XXX,XX @@ static bool trans_VMSR_VMRS(DisasContext *s, arg_VMSR_VMRS *a)
34
@@ -XXX,XX +XXX,XX @@ static const TypeInfo stellaris_i2c_info = {
245
}
35
static void stellaris_adc_class_init(ObjectClass *klass, void *data)
36
{
37
DeviceClass *dc = DEVICE_CLASS(klass);
38
+ ResettableClass *rc = RESETTABLE_CLASS(klass);
39
40
+ rc->phases.hold = stellaris_adc_reset_hold;
41
dc->vmsd = &vmstate_stellaris_adc;
246
}
42
}
247
43
248
-static void fp_sysreg_to_memory(DisasContext *s, void *opaque, TCGv_i32 value)
249
+static void fp_sysreg_to_memory(DisasContext *s, void *opaque, TCGv_i32 value,
250
+ bool do_access)
251
{
252
arg_vldr_sysreg *a = opaque;
253
uint32_t offset = a->imm;
254
@@ -XXX,XX +XXX,XX @@ static void fp_sysreg_to_memory(DisasContext *s, void *opaque, TCGv_i32 value)
255
offset = -offset;
256
}
257
258
+ if (!do_access && !a->w) {
259
+ return;
260
+ }
261
+
262
addr = load_reg(s, a->rn);
263
if (a->p) {
264
tcg_gen_addi_i32(addr, addr, offset);
265
@@ -XXX,XX +XXX,XX @@ static void fp_sysreg_to_memory(DisasContext *s, void *opaque, TCGv_i32 value)
266
gen_helper_v8m_stackcheck(cpu_env, addr);
267
}
268
269
- gen_aa32_st_i32(s, value, addr, get_mem_index(s),
270
- MO_UL | MO_ALIGN | s->be_data);
271
- tcg_temp_free_i32(value);
272
+ if (do_access) {
273
+ gen_aa32_st_i32(s, value, addr, get_mem_index(s),
274
+ MO_UL | MO_ALIGN | s->be_data);
275
+ tcg_temp_free_i32(value);
276
+ }
277
278
if (a->w) {
279
/* writeback */
280
@@ -XXX,XX +XXX,XX @@ static void fp_sysreg_to_memory(DisasContext *s, void *opaque, TCGv_i32 value)
281
}
282
}
283
284
-static TCGv_i32 memory_to_fp_sysreg(DisasContext *s, void *opaque)
285
+static TCGv_i32 memory_to_fp_sysreg(DisasContext *s, void *opaque,
286
+ bool do_access)
287
{
288
arg_vldr_sysreg *a = opaque;
289
uint32_t offset = a->imm;
290
TCGv_i32 addr;
291
- TCGv_i32 value = tcg_temp_new_i32();
292
+ TCGv_i32 value = NULL;
293
294
if (!a->a) {
295
offset = -offset;
296
}
297
298
+ if (!do_access && !a->w) {
299
+ return NULL;
300
+ }
301
+
302
addr = load_reg(s, a->rn);
303
if (a->p) {
304
tcg_gen_addi_i32(addr, addr, offset);
305
@@ -XXX,XX +XXX,XX @@ static TCGv_i32 memory_to_fp_sysreg(DisasContext *s, void *opaque)
306
gen_helper_v8m_stackcheck(cpu_env, addr);
307
}
308
309
- gen_aa32_ld_i32(s, value, addr, get_mem_index(s),
310
- MO_UL | MO_ALIGN | s->be_data);
311
+ if (do_access) {
312
+ value = tcg_temp_new_i32();
313
+ gen_aa32_ld_i32(s, value, addr, get_mem_index(s),
314
+ MO_UL | MO_ALIGN | s->be_data);
315
+ }
316
317
if (a->w) {
318
/* writeback */
319
--
2.20.1

--
2.34.1
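The refactored M-profile sysreg accessors above always invoke the load/store callback, passing do_access=false when the access itself behaves as a NOP, so that side effects such as address writeback are never skipped. A rough standalone sketch of that convention (plain C, invented names, not the QEMU code itself):

#include <stdbool.h>
#include <stdio.h>

typedef void StoreFn(void *opaque, int value, bool do_access);

/* Store callback: only touches the data when do_access is true, but its
 * writeback-style side effect happens on every call. */
static void store_with_writeback(void *opaque, int value, bool do_access)
{
    int *writebacks = opaque;

    if (do_access) {
        printf("stored %d\n", value);
    }
    (*writebacks)++;
}

/* Register read that behaves as a NOP when unprivileged: the callback is
 * still invoked so the writeback side effect is not lost. */
static void sysreg_read(StoreFn *storefn, void *opaque, bool privileged)
{
    if (!privileged) {
        storefn(opaque, 0, false);
        return;
    }
    storefn(opaque, 42, true);
}

int main(void)
{
    int writebacks = 0;

    sysreg_read(store_with_writeback, &writebacks, false);
    sysreg_read(store_with_writeback, &writebacks, true);
    printf("writebacks: %d\n", writebacks);
    return 0;
}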
1
From: Peter Collingbourne <pcc@google.com>
1
From: Philippe Mathieu-Daudé <philmd@linaro.org>
2
2
3
MTE3 introduces an asymmetric tag checking mode, in which loads are
3
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
4
checked synchronously and stores are checked asynchronously. Add
4
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
5
support for it.
5
Message-id: 20240213155214.13619-3-philmd@linaro.org
6
6
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
7
Signed-off-by: Peter Collingbourne <pcc@google.com>
8
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
9
Message-id: 20210616195614.11785-1-pcc@google.com
10
[PMM: Add line to emulation.rst]
11
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
12
---
8
---
13
docs/system/arm/emulation.rst | 1 +
9
hw/arm/stellaris.c | 26 ++++++++++++++++++++++----
14
target/arm/cpu64.c | 2 +-
10
1 file changed, 22 insertions(+), 4 deletions(-)
15
target/arm/mte_helper.c | 82 ++++++++++++++++++++++-------------
16
3 files changed, 53 insertions(+), 32 deletions(-)
17
11
18
diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
12
diff --git a/hw/arm/stellaris.c b/hw/arm/stellaris.c
19
index XXXXXXX..XXXXXXX 100644
13
index XXXXXXX..XXXXXXX 100644
20
--- a/docs/system/arm/emulation.rst
14
--- a/hw/arm/stellaris.c
21
+++ b/docs/system/arm/emulation.rst
15
+++ b/hw/arm/stellaris.c
22
@@ -XXX,XX +XXX,XX @@ the following architecture extensions:
16
@@ -XXX,XX +XXX,XX @@ static void stellaris_sys_instance_init(Object *obj)
23
- FEAT_LSE (Large System Extensions)
17
s->sysclk = qdev_init_clock_out(DEVICE(s), "SYSCLK");
24
- FEAT_MTE (Memory Tagging Extension)
25
- FEAT_MTE2 (Memory Tagging Extension)
26
+- FEAT_MTE3 (MTE Asymmetric Fault Handling)
27
- FEAT_PAN (Privileged access never)
28
- FEAT_PAN2 (AT S1E1R and AT S1E1W instruction variants affected by PSTATE.PAN)
29
- FEAT_PAuth (Pointer authentication)
30
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
31
index XXXXXXX..XXXXXXX 100644
32
--- a/target/arm/cpu64.c
33
+++ b/target/arm/cpu64.c
34
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
35
* during realize if the board provides no tag memory, much like
36
* we do for EL2 with the virtualization=on property.
37
*/
38
- t = FIELD_DP64(t, ID_AA64PFR1, MTE, 2);
39
+ t = FIELD_DP64(t, ID_AA64PFR1, MTE, 3);
40
cpu->isar.id_aa64pfr1 = t;
41
42
t = cpu->isar.id_aa64mmfr0;
43
diff --git a/target/arm/mte_helper.c b/target/arm/mte_helper.c
44
index XXXXXXX..XXXXXXX 100644
45
--- a/target/arm/mte_helper.c
46
+++ b/target/arm/mte_helper.c
47
@@ -XXX,XX +XXX,XX @@ void HELPER(stzgm_tags)(CPUARMState *env, uint64_t ptr, uint64_t val)
48
}
49
}
18
}
50
19
51
+static void mte_sync_check_fail(CPUARMState *env, uint32_t desc,
20
-/* I2C controller. */
52
+ uint64_t dirty_ptr, uintptr_t ra)
21
+/*
53
+{
22
+ * I2C controller.
54
+ int is_write, syn;
23
+ * ??? For now we only implement the master interface.
24
+ */
25
26
#define TYPE_STELLARIS_I2C "stellaris-i2c"
27
OBJECT_DECLARE_SIMPLE_TYPE(stellaris_i2c_state, STELLARIS_I2C)
28
@@ -XXX,XX +XXX,XX @@ static void stellaris_i2c_write(void *opaque, hwaddr offset,
29
stellaris_i2c_update(s);
30
}
31
32
-static void stellaris_i2c_reset(stellaris_i2c_state *s)
33
+static void stellaris_i2c_reset_enter(Object *obj, ResetType type)
34
{
35
+ stellaris_i2c_state *s = STELLARIS_I2C(obj);
55
+
36
+
56
+ env->exception.vaddress = dirty_ptr;
37
if (s->mcs & STELLARIS_I2C_MCS_BUSBSY)
57
+
38
i2c_end_transfer(s->bus);
58
+ is_write = FIELD_EX32(desc, MTEDESC, WRITE);
59
+ syn = syn_data_abort_no_iss(arm_current_el(env) != 0, 0, 0, 0, 0, is_write,
60
+ 0x11);
61
+ raise_exception_ra(env, EXCP_DATA_ABORT, syn, exception_target_el(env), ra);
62
+ g_assert_not_reached();
63
+}
39
+}
64
+
40
+
65
+static void mte_async_check_fail(CPUARMState *env, uint64_t dirty_ptr,
41
+static void stellaris_i2c_reset_hold(Object *obj)
66
+ uintptr_t ra, ARMMMUIdx arm_mmu_idx, int el)
67
+{
42
+{
68
+ int select;
43
+ stellaris_i2c_state *s = STELLARIS_I2C(obj);
69
+
44
70
+ if (regime_has_2_ranges(arm_mmu_idx)) {
45
s->msa = 0;
71
+ select = extract64(dirty_ptr, 55, 1);
46
s->mcs = 0;
72
+ } else {
47
@@ -XXX,XX +XXX,XX @@ static void stellaris_i2c_reset(stellaris_i2c_state *s)
73
+ select = 0;
48
s->mimr = 0;
74
+ }
49
s->mris = 0;
75
+ env->cp15.tfsr_el[el] |= 1 << select;
50
s->mcr = 0;
76
+#ifdef CONFIG_USER_ONLY
77
+ /*
78
+ * Stand in for a timer irq, setting _TIF_MTE_ASYNC_FAULT,
79
+ * which then sends a SIGSEGV when the thread is next scheduled.
80
+ * This cpu will return to the main loop at the end of the TB,
81
+ * which is rather sooner than "normal". But the alternative
82
+ * is waiting until the next syscall.
83
+ */
84
+ qemu_cpu_kick(env_cpu(env));
85
+#endif
86
+}
51
+}
87
+
52
+
88
/* Record a tag check failure. */
53
+static void stellaris_i2c_reset_exit(Object *obj)
89
static void mte_check_fail(CPUARMState *env, uint32_t desc,
54
+{
90
uint64_t dirty_ptr, uintptr_t ra)
55
+ stellaris_i2c_state *s = STELLARIS_I2C(obj);
56
+
57
stellaris_i2c_update(s);
58
}
59
60
@@ -XXX,XX +XXX,XX @@ static void stellaris_i2c_init(Object *obj)
61
memory_region_init_io(&s->iomem, obj, &stellaris_i2c_ops, s,
62
"i2c", 0x1000);
63
sysbus_init_mmio(sbd, &s->iomem);
64
- /* ??? For now we only implement the master interface. */
65
- stellaris_i2c_reset(s);
66
}
67
68
/* Analogue to Digital Converter. This is only partially implemented,
69
@@ -XXX,XX +XXX,XX @@ type_init(stellaris_machine_init)
70
static void stellaris_i2c_class_init(ObjectClass *klass, void *data)
91
{
71
{
92
int mmu_idx = FIELD_EX32(desc, MTEDESC, MIDX);
72
DeviceClass *dc = DEVICE_CLASS(klass);
93
ARMMMUIdx arm_mmu_idx = core_to_aa64_mmu_idx(mmu_idx);
73
+ ResettableClass *rc = RESETTABLE_CLASS(klass);
94
- int el, reg_el, tcf, select, is_write, syn;
74
95
+ int el, reg_el, tcf;
75
+ rc->phases.enter = stellaris_i2c_reset_enter;
96
uint64_t sctlr;
76
+ rc->phases.hold = stellaris_i2c_reset_hold;
97
77
+ rc->phases.exit = stellaris_i2c_reset_exit;
98
reg_el = regime_el(env, arm_mmu_idx);
78
dc->vmsd = &vmstate_stellaris_i2c;
99
@@ -XXX,XX +XXX,XX @@ static void mte_check_fail(CPUARMState *env, uint32_t desc,
100
switch (tcf) {
101
case 1:
102
/* Tag check fail causes a synchronous exception. */
103
- env->exception.vaddress = dirty_ptr;
104
-
105
- is_write = FIELD_EX32(desc, MTEDESC, WRITE);
106
- syn = syn_data_abort_no_iss(arm_current_el(env) != 0, 0, 0, 0, 0,
107
- is_write, 0x11);
108
- raise_exception_ra(env, EXCP_DATA_ABORT, syn,
109
- exception_target_el(env), ra);
110
- /* noreturn, but fall through to the assert anyway */
111
+ mte_sync_check_fail(env, desc, dirty_ptr, ra);
112
+ break;
113
114
case 0:
115
/*
116
@@ -XXX,XX +XXX,XX @@ static void mte_check_fail(CPUARMState *env, uint32_t desc,
117
118
case 2:
119
/* Tag check fail causes asynchronous flag set. */
120
- if (regime_has_2_ranges(arm_mmu_idx)) {
121
- select = extract64(dirty_ptr, 55, 1);
122
- } else {
123
- select = 0;
124
- }
125
- env->cp15.tfsr_el[el] |= 1 << select;
126
-#ifdef CONFIG_USER_ONLY
127
- /*
128
- * Stand in for a timer irq, setting _TIF_MTE_ASYNC_FAULT,
129
- * which then sends a SIGSEGV when the thread is next scheduled.
130
- * This cpu will return to the main loop at the end of the TB,
131
- * which is rather sooner than "normal". But the alternative
132
- * is waiting until the next syscall.
133
- */
134
- qemu_cpu_kick(env_cpu(env));
135
-#endif
136
+ mte_async_check_fail(env, dirty_ptr, ra, arm_mmu_idx, el);
137
break;
138
139
- default:
140
- /* Case 3: Reserved. */
141
- qemu_log_mask(LOG_GUEST_ERROR,
142
- "Tag check failure with SCTLR_EL%d.TCF%s "
143
- "set to reserved value %d\n",
144
- reg_el, el ? "" : "0", tcf);
145
+ case 3:
146
+ /*
147
+ * Tag check fail causes asynchronous flag set for stores, or
148
+ * a synchronous exception for loads.
149
+ */
150
+ if (FIELD_EX32(desc, MTEDESC, WRITE)) {
151
+ mte_async_check_fail(env, dirty_ptr, ra, arm_mmu_idx, el);
152
+ } else {
153
+ mte_sync_check_fail(env, desc, dirty_ptr, ra);
154
+ }
155
break;
156
}
157
}
79
}
80
158
--
2.20.1

--
2.34.1
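A minimal standalone illustration (not QEMU code; names invented for the example) of how a tag-check failure is reported for each SCTLR_ELx.TCF setting, including the asymmetric MTE3 mode added above, where loads fault synchronously and stores only set the asynchronous flag:

#include <stdbool.h>
#include <stdio.h>

enum { TCF_IGNORE = 0, TCF_SYNC = 1, TCF_ASYNC = 2, TCF_ASYM = 3 };

/* Describe what a tag check failure does for a given TCF value and
 * access direction. */
static const char *tag_check_fail_action(int tcf, bool is_write)
{
    switch (tcf) {
    case TCF_IGNORE:
        return "ignored";
    case TCF_SYNC:
        return "synchronous data abort";
    case TCF_ASYNC:
        return "asynchronous TFSR_ELx flag set";
    case TCF_ASYM:
        /* MTE3 asymmetric mode: stores async, loads sync */
        return is_write ? "asynchronous TFSR_ELx flag set"
                        : "synchronous data abort";
    default:
        return "reserved";
    }
}

int main(void)
{
    printf("TCF=3 load:  %s\n", tag_check_fail_action(TCF_ASYM, false));
    printf("TCF=3 store: %s\n", tag_check_fail_action(TCF_ASYM, true));
    return 0;
}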
1
From: Alexandre Iooss <erdnaxe@crans.org>
1
From: Philippe Mathieu-Daudé <philmd@linaro.org>
2
2
3
This adds the target guide for BBC Micro:bit.
3
QDev objects created with qdev_new() need to manually add
4
their parent relationship with object_property_add_child().
4
5
5
Information is taken from https://wiki.qemu.org/Features/MicroBit
6
This commit plugs the devices which aren't part of the SoC;
6
and from hw/arm/nrf51_soc.c.
7
they will be plugged into a SoC container in the next one.
7
8
8
Signed-off-by: Alexandre Iooss <erdnaxe@crans.org>
9
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
9
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
10
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
10
Reviewed-by: Joel Stanley <joel@jms.id.au>
11
Message-id: 20240213155214.13619-4-philmd@linaro.org
11
Message-id: 20210621075625.540471-1-erdnaxe@crans.org
12
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
12
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
13
---
13
---
14
docs/system/arm/nrf.rst | 51 ++++++++++++++++++++++++++++++++++++++
14
hw/arm/stellaris.c | 4 ++++
15
docs/system/target-arm.rst | 1 +
15
1 file changed, 4 insertions(+)
16
MAINTAINERS | 1 +
17
3 files changed, 53 insertions(+)
18
create mode 100644 docs/system/arm/nrf.rst
19
16
20
diff --git a/docs/system/arm/nrf.rst b/docs/system/arm/nrf.rst
17
diff --git a/hw/arm/stellaris.c b/hw/arm/stellaris.c
21
new file mode 100644
22
index XXXXXXX..XXXXXXX
23
--- /dev/null
24
+++ b/docs/system/arm/nrf.rst
25
@@ -XXX,XX +XXX,XX @@
26
+Nordic nRF boards (``microbit``)
27
+================================
28
+
29
+The `Nordic nRF`_ chips are a family of ARM-based System-on-Chip that
30
+are designed to be used for low-power and short-range wireless solutions.
31
+
32
+.. _Nordic nRF: https://www.nordicsemi.com/Products
33
+
34
+The nRF51 series is the first series for short range wireless applications.
35
+It is superseded by the nRF52 series.
36
+The following machines are based on this chip :
37
+
38
+- ``microbit`` BBC micro:bit board with nRF51822 SoC
39
+
40
+There are other series such as nRF52, nRF53 and nRF91 which are currently not
41
+supported by QEMU.
42
+
43
+Supported devices
44
+-----------------
45
+
46
+ * ARM Cortex-M0 (ARMv6-M)
47
+ * Serial ports (UART)
48
+ * Clock controller
49
+ * Timers
50
+ * Random Number Generator (RNG)
51
+ * GPIO controller
52
+ * NVMC
53
+ * SWI
54
+
55
+Missing devices
56
+---------------
57
+
58
+ * Watchdog
59
+ * Real-Time Clock (RTC) controller
60
+ * TWI (i2c)
61
+ * SPI controller
62
+ * Analog to Digital Converter (ADC)
63
+ * Quadrature decoder
64
+ * Radio
65
+
66
+Boot options
67
+------------
68
+
69
+The Micro:bit machine can be started using the ``-device`` option to load a
70
+firmware in `ihex format`_. Example:
71
+
72
+.. _ihex format: https://en.wikipedia.org/wiki/Intel_HEX
73
+
74
+.. code-block:: bash
75
+
76
+ $ qemu-system-arm -M microbit -device loader,file=test.hex
77
diff --git a/docs/system/target-arm.rst b/docs/system/target-arm.rst
78
index XXXXXXX..XXXXXXX 100644
18
index XXXXXXX..XXXXXXX 100644
79
--- a/docs/system/target-arm.rst
19
--- a/hw/arm/stellaris.c
80
+++ b/docs/system/target-arm.rst
20
+++ b/hw/arm/stellaris.c
81
@@ -XXX,XX +XXX,XX @@ undocumented; you can get a complete list by running
21
@@ -XXX,XX +XXX,XX @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
82
arm/digic
22
&error_fatal);
83
arm/musicpal
23
84
arm/gumstix
24
ssddev = qdev_new("ssd0323");
85
+ arm/nrf
25
+ object_property_add_child(OBJECT(ms), "oled", OBJECT(ssddev));
86
arm/nseries
26
qdev_prop_set_uint8(ssddev, "cs", 1);
87
arm/nuvoton
27
qdev_realize_and_unref(ssddev, bus, &error_fatal);
88
arm/orangepi
28
89
diff --git a/MAINTAINERS b/MAINTAINERS
29
gpio_d_splitter = qdev_new(TYPE_SPLIT_IRQ);
90
index XXXXXXX..XXXXXXX 100644
30
+ object_property_add_child(OBJECT(ms), "splitter",
91
--- a/MAINTAINERS
31
+ OBJECT(gpio_d_splitter));
92
+++ b/MAINTAINERS
32
qdev_prop_set_uint32(gpio_d_splitter, "num-lines", 2);
93
@@ -XXX,XX +XXX,XX @@ F: hw/*/microbit*.c
33
qdev_realize_and_unref(gpio_d_splitter, NULL, &error_fatal);
94
F: include/hw/*/nrf51*.h
34
qdev_connect_gpio_out(
95
F: include/hw/*/microbit*.h
35
@@ -XXX,XX +XXX,XX @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
96
F: tests/qtest/microbit-test.c
36
DeviceState *gpad;
97
+F: docs/system/arm/nrf.rst
37
98
38
gpad = qdev_new(TYPE_STELLARIS_GAMEPAD);
99
AVR Machines
39
+ object_property_add_child(OBJECT(ms), "gamepad", OBJECT(gpad));
100
-------------
40
for (i = 0; i < ARRAY_SIZE(gpad_keycode); i++) {
41
qlist_append_int(gpad_keycode_list, gpad_keycode[i]);
42
}
101
--
2.20.1

--
2.34.1
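The qdev_new() / object_property_add_child() / realize ordering applied in the stellaris patch above, spelled out as a call-order sketch. This is not a buildable unit on its own; "demo-device", the property name and the parent object are placeholders for this example:

#include "qemu/osdep.h"
#include "hw/qdev-core.h"
#include "qapi/error.h"

/* Create an unparented device, give it a name in the QOM composition
 * tree, then realize it; qdev_realize_and_unref() drops the creation
 * reference so the bus/parent keeps the only remaining one. */
static void create_child_device(Object *parent, BusState *bus)
{
    DeviceState *dev = qdev_new("demo-device");

    object_property_add_child(parent, "demo", OBJECT(dev));
    qdev_realize_and_unref(dev, bus, &error_fatal);
}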
1
Implement the MVE VQSHL insn (encoding T4, which is the
1
From: Philippe Mathieu-Daudé <philmd@linaro.org>
2
vector-shift-by-vector version).
3
2
4
The DO_SQSHL_OP and DO_UQSHL_OP macros here are derived from
3
QDev objects created with qdev_new() need to manually add
5
the neon_helper.c code for qshl_u{8,16,32} and qshl_s{8,16,32}.
4
their parent relationship with object_property_add_child().
6
5
6
Since we don't model the SoC, just use a QOM container.
7
8
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
9
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
10
Message-id: 20240213155214.13619-5-philmd@linaro.org
7
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
11
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
9
Message-id: 20210617121628.20116-33-peter.maydell@linaro.org
10
---
12
---
11
target/arm/helper-mve.h | 8 ++++++++
13
hw/arm/stellaris.c | 11 ++++++++++-
12
target/arm/mve.decode | 12 ++++++++++++
14
1 file changed, 10 insertions(+), 1 deletion(-)
13
target/arm/mve_helper.c | 34 ++++++++++++++++++++++++++++++++++
14
target/arm/translate-mve.c | 2 ++
15
4 files changed, 56 insertions(+)
16
15
17
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
16
diff --git a/hw/arm/stellaris.c b/hw/arm/stellaris.c
18
index XXXXXXX..XXXXXXX 100644
17
index XXXXXXX..XXXXXXX 100644
19
--- a/target/arm/helper-mve.h
18
--- a/hw/arm/stellaris.c
20
+++ b/target/arm/helper-mve.h
19
+++ b/hw/arm/stellaris.c
21
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vqsubub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
20
@@ -XXX,XX +XXX,XX @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
22
DEF_HELPER_FLAGS_4(mve_vqsubuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
21
* 400fe000 system control
23
DEF_HELPER_FLAGS_4(mve_vqsubuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
22
*/
24
23
25
+DEF_HELPER_FLAGS_4(mve_vqshlsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
24
+ Object *soc_container;
26
+DEF_HELPER_FLAGS_4(mve_vqshlsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
25
DeviceState *gpio_dev[7], *nvic;
27
+DEF_HELPER_FLAGS_4(mve_vqshlsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
26
qemu_irq gpio_in[7][8];
27
qemu_irq gpio_out[7][8];
28
@@ -XXX,XX +XXX,XX @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
29
flash_size = (((board->dc0 & 0xffff) + 1) << 1) * 1024;
30
sram_size = ((board->dc0 >> 18) + 1) * 1024;
31
32
+ soc_container = object_new("container");
33
+ object_property_add_child(OBJECT(ms), "soc", soc_container);
28
+
34
+
29
+DEF_HELPER_FLAGS_4(mve_vqshlub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
35
/* Flash programming is done via the SCU, so pretend it is ROM. */
30
+DEF_HELPER_FLAGS_4(mve_vqshluh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
36
memory_region_init_rom(flash, NULL, "stellaris.flash", flash_size,
31
+DEF_HELPER_FLAGS_4(mve_vqshluw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
37
&error_fatal);
32
+
38
@@ -XXX,XX +XXX,XX @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
33
DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
39
* need its sysclk output.
34
DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
40
*/
35
DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
41
ssys_dev = qdev_new(TYPE_STELLARIS_SYS);
36
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
42
+ object_property_add_child(soc_container, "sys", OBJECT(ssys_dev));
37
index XXXXXXX..XXXXXXX 100644
43
38
--- a/target/arm/mve.decode
44
/*
39
+++ b/target/arm/mve.decode
45
* Most devices come preprogrammed with a MAC address in the user data.
40
@@ -XXX,XX +XXX,XX @@
46
@@ -XXX,XX +XXX,XX @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
41
@2op .... .... .. size:2 .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn
47
sysbus_realize_and_unref(SYS_BUS_DEVICE(ssys_dev), &error_fatal);
42
@2op_nosz .... .... .... .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn size=0
48
43
49
nvic = qdev_new(TYPE_ARMV7M);
44
+# The _rev suffix indicates that Vn and Vm are reversed. This is
50
+ object_property_add_child(soc_container, "v7m", OBJECT(nvic));
45
+# the case for shifts. In the Arm ARM these insns are documented
51
qdev_prop_set_uint32(nvic, "num-irq", NUM_IRQ_LINES);
46
+# with the Vm and Vn fields in their usual places, but in the
52
qdev_prop_set_uint8(nvic, "num-prio-bits", NUM_PRIO_BITS);
47
+# assembly the operands are listed "backwards", ie in the order
53
qdev_prop_set_string(nvic, "cpu-type", ms->cpu_type);
48
+# Qd, Qm, Qn where other insns use Qd, Qn, Qm. For QEMU we choose
54
@@ -XXX,XX +XXX,XX @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
49
+# to consider Vm and Vn as being in different fields in the insn.
55
50
+# This gives us consistency with A64 and Neon.
56
dev = qdev_new(TYPE_STELLARIS_GPTM);
51
+@2op_rev .... .... .. size:2 .... .... .... .... .... &2op qd=%qd qm=%qn qn=%qm
57
sbd = SYS_BUS_DEVICE(dev);
52
+
58
+ object_property_add_child(soc_container, "gptm[*]", OBJECT(dev));
53
@2scalar .... .... .. size:2 .... .... .... .... rm:4 &2scalar qd=%qd qn=%qn
59
qdev_connect_clock_in(dev, "clk",
54
@2scalar_nosz .... .... .... .... .... .... .... rm:4 &2scalar qd=%qd qn=%qn
60
qdev_get_clock_out(ssys_dev, "SYSCLK"));
55
61
sysbus_realize_and_unref(sbd, &error_fatal);
56
@@ -XXX,XX +XXX,XX @@ VQADD_U 111 1 1111 0 . .. ... 0 ... 0 0000 . 1 . 1 ... 0 @2op
62
@@ -XXX,XX +XXX,XX @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
57
VQSUB_S 111 0 1111 0 . .. ... 0 ... 0 0010 . 1 . 1 ... 0 @2op
63
58
VQSUB_U 111 1 1111 0 . .. ... 0 ... 0 0010 . 1 . 1 ... 0 @2op
64
if (board->dc1 & (1 << 3)) { /* watchdog present */
59
65
dev = qdev_new(TYPE_LUMINARY_WATCHDOG);
60
+VQSHL_S 111 0 1111 0 . .. ... 0 ... 0 0100 . 1 . 1 ... 0 @2op_rev
66
-
61
+VQSHL_U 111 1 1111 0 . .. ... 0 ... 0 0100 . 1 . 1 ... 0 @2op_rev
67
+ object_property_add_child(soc_container, "wdg", OBJECT(dev));
62
+
68
qdev_connect_clock_in(dev, "WDOGCLK",
63
# Vector miscellaneous
69
qdev_get_clock_out(ssys_dev, "SYSCLK"));
64
70
65
VCLS 1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
71
@@ -XXX,XX +XXX,XX @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
66
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
72
SysBusDevice *sbd;
67
index XXXXXXX..XXXXXXX 100644
73
68
--- a/target/arm/mve_helper.c
74
dev = qdev_new("pl011_luminary");
69
+++ b/target/arm/mve_helper.c
75
+ object_property_add_child(soc_container, "uart[*]", OBJECT(dev));
70
@@ -XXX,XX +XXX,XX @@ DO_1OP(vfnegs, 8, uint64_t, DO_FNEGS)
76
sbd = SYS_BUS_DEVICE(dev);
71
mve_advance_vpt(env); \
77
qdev_prop_set_chr(dev, "chardev", serial_hd(i));
72
}
78
sysbus_realize_and_unref(sbd, &error_fatal);
73
79
@@ -XXX,XX +XXX,XX @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
74
+/* provide unsigned 2-op helpers for all sizes */
80
DeviceState *enet;
75
+#define DO_2OP_SAT_U(OP, FN) \
81
76
+ DO_2OP_SAT(OP##b, 1, uint8_t, FN) \
82
enet = qdev_new("stellaris_enet");
77
+ DO_2OP_SAT(OP##h, 2, uint16_t, FN) \
83
+ object_property_add_child(soc_container, "enet", OBJECT(enet));
78
+ DO_2OP_SAT(OP##w, 4, uint32_t, FN)
84
if (nd) {
79
+
85
qdev_set_nic_properties(enet, nd);
80
+/* provide signed 2-op helpers for all sizes */
86
} else {
81
+#define DO_2OP_SAT_S(OP, FN) \
82
+ DO_2OP_SAT(OP##b, 1, int8_t, FN) \
83
+ DO_2OP_SAT(OP##h, 2, int16_t, FN) \
84
+ DO_2OP_SAT(OP##w, 4, int32_t, FN)
85
+
86
#define DO_AND(N, M) ((N) & (M))
87
#define DO_BIC(N, M) ((N) & ~(M))
88
#define DO_ORR(N, M) ((N) | (M))
89
@@ -XXX,XX +XXX,XX @@ DO_2OP_SAT(vqsubsb, 1, int8_t, DO_SQSUB_B)
90
DO_2OP_SAT(vqsubsh, 2, int16_t, DO_SQSUB_H)
91
DO_2OP_SAT(vqsubsw, 4, int32_t, DO_SQSUB_W)
92
93
+/*
94
+ * This wrapper fixes up the impedance mismatch between do_sqrshl_bhs()
95
+ * and friends wanting a uint32_t* sat and our needing a bool*.
96
+ */
97
+#define WRAP_QRSHL_HELPER(FN, N, M, ROUND, satp) \
98
+ ({ \
99
+ uint32_t su32 = 0; \
100
+ typeof(N) r = FN(N, (int8_t)(M), sizeof(N) * 8, ROUND, &su32); \
101
+ if (su32) { \
102
+ *satp = true; \
103
+ } \
104
+ r; \
105
+ })
106
+
107
+#define DO_SQSHL_OP(N, M, satp) \
108
+ WRAP_QRSHL_HELPER(do_sqrshl_bhs, N, M, false, satp)
109
+#define DO_UQSHL_OP(N, M, satp) \
110
+ WRAP_QRSHL_HELPER(do_uqrshl_bhs, N, M, false, satp)
111
+
112
+DO_2OP_SAT_S(vqshls, DO_SQSHL_OP)
113
+DO_2OP_SAT_U(vqshlu, DO_UQSHL_OP)
114
+
115
#define DO_2OP_SCALAR(OP, ESIZE, TYPE, FN) \
116
void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn, \
117
uint32_t rm) \
118
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
119
index XXXXXXX..XXXXXXX 100644
120
--- a/target/arm/translate-mve.c
121
+++ b/target/arm/translate-mve.c
122
@@ -XXX,XX +XXX,XX @@ DO_2OP(VQADD_S, vqadds)
123
DO_2OP(VQADD_U, vqaddu)
124
DO_2OP(VQSUB_S, vqsubs)
125
DO_2OP(VQSUB_U, vqsubu)
126
+DO_2OP(VQSHL_S, vqshls)
127
+DO_2OP(VQSHL_U, vqshlu)
128
129
static bool do_2op_scalar(DisasContext *s, arg_2scalar *a,
130
MVEGenTwoOpScalarFn fn)
131
--
2.20.1

--
2.34.1
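A simplified standalone model (8-bit lanes only, no rounding, no predication, invented function name) of the per-element saturating shift-by-register that VQSHL performs: negative shift counts shift right, positive counts shift left and saturate, with *sat standing in for FPSCR.QC:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Saturating shift of one signed 8-bit lane by a signed shift count. */
static int8_t sat_shl_s8(int8_t val, int8_t shift, bool *sat)
{
    if (shift <= -8) {
        return val < 0 ? -1 : 0;          /* everything shifted out */
    }
    if (shift < 0) {
        return val >> -shift;             /* arithmetic right shift */
    }
    if (shift >= 8) {
        if (val == 0) {
            return 0;
        }
        *sat = true;
        return val < 0 ? INT8_MIN : INT8_MAX;
    }
    int32_t r = (int32_t)val << shift;
    if (r != (int8_t)r) {                 /* left shift overflowed */
        *sat = true;
        return val < 0 ? INT8_MIN : INT8_MAX;
    }
    return (int8_t)r;
}

int main(void)
{
    bool qc = false;

    printf("100 << 2 -> %d\n", sat_shl_s8(100, 2, &qc));   /* saturates */
    printf("100 >> 3 -> %d\n", sat_shl_s8(100, -3, &qc));
    printf("QC: %d\n", qc);
    return 0;
}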
1
Implement the vector forms of the MVE VQADD and VQSUB insns.
1
We support two different encodings for the AArch32 IMPDEF
2
CBAR register -- older cores like the Cortex A9, A7, A15
3
have this at 4, c15, c0, 0; newer cores like the
4
Cortex A35, A53, A57 and A72 have it at 1 c15 c0 0.
5
6
When we implemented this we picked which encoding to
7
use based on whether the CPU set ARM_FEATURE_AARCH64.
8
However this isn't right for three cases:
9
* the qemu-system-arm 'max' CPU, which is supposed to be
10
a variant on a Cortex-A57; it ought to use the same
11
encoding the A57 does and which the AArch64 'max'
12
exposes to AArch32 guest code
13
* the Cortex-R52, which is AArch32-only but has the CBAR
14
at the newer encoding (and where we incorrectly are
15
not yet setting ARM_FEATURE_CBAR_RO anyway)
16
* any possible future support for other v8 AArch32
17
only CPUs, or for supporting "boot the CPU into
18
AArch32 mode" on our existing cores like the A57 etc
19
20
Make the decision of the encoding be based on whether
21
the CPU implements the ARM_FEATURE_V8 flag instead.
22
23
This changes the behaviour only for the qemu-system-arm
24
'-cpu max'. We don't expect anybody to be relying on the
25
old behaviour because:
26
* it's not what the real hardware Cortex-A57 does
27
(and that's what our ID register claims we are)
28
* we don't implement the memory-mapped GICv3 support
29
which is the only thing that exists at the peripheral
30
base address pointed to by the register
2
31
3
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
32
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
4
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
33
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
5
Message-id: 20210617121628.20116-32-peter.maydell@linaro.org
34
Message-id: 20240206132931.38376-2-peter.maydell@linaro.org
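For reference, a standalone sketch of the 32-bit CBAR view that the helper.c hunk below assembles from the peripheral base address: base bits [43:32] land in CBAR[11:0] and bits [31:18] in CBAR[31:18]. Illustrative only; the function name is invented for the example:

#include <stdint.h>
#include <stdio.h>

/* Pack a peripheral base address into the 32-bit CBAR layout. */
static uint32_t cbar32_from_periphbase(uint64_t periphbase)
{
    uint32_t hi = (uint32_t)((periphbase >> 32) & 0xfff);          /* [43:32] */
    uint32_t lo = (uint32_t)(((periphbase >> 18) & 0x3fff) << 18); /* [31:18] */
    return lo | hi;
}

int main(void)
{
    printf("0x%08x\n", cbar32_from_periphbase(0x10c0000000ULL));
    return 0;
}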
6
---
35
---
7
target/arm/helper-mve.h | 16 ++++++++++++++++
36
target/arm/helper.c | 2 +-
8
target/arm/mve.decode | 5 +++++
37
1 file changed, 1 insertion(+), 1 deletion(-)
9
target/arm/mve_helper.c | 14 ++++++++++++++
10
target/arm/translate-mve.c | 4 ++++
11
4 files changed, 39 insertions(+)
12
38
13
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
39
diff --git a/target/arm/helper.c b/target/arm/helper.c
14
index XXXXXXX..XXXXXXX 100644
40
index XXXXXXX..XXXXXXX 100644
15
--- a/target/arm/helper-mve.h
41
--- a/target/arm/helper.c
16
+++ b/target/arm/helper-mve.h
42
+++ b/target/arm/helper.c
17
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vqrdmulhb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
43
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
18
DEF_HELPER_FLAGS_4(mve_vqrdmulhh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
44
* AArch64 cores we might need to add a specific feature flag
19
DEF_HELPER_FLAGS_4(mve_vqrdmulhw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
45
* to indicate cores with "flavour 2" CBAR.
20
46
*/
21
+DEF_HELPER_FLAGS_4(mve_vqaddsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
47
- if (arm_feature(env, ARM_FEATURE_AARCH64)) {
22
+DEF_HELPER_FLAGS_4(mve_vqaddsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
48
+ if (arm_feature(env, ARM_FEATURE_V8)) {
23
+DEF_HELPER_FLAGS_4(mve_vqaddsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
49
/* 32 bit view is [31:18] 0...0 [43:32]. */
24
+
50
uint32_t cbar32 = (extract64(cpu->reset_cbar, 18, 14) << 18)
25
+DEF_HELPER_FLAGS_4(mve_vqaddub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
51
| extract64(cpu->reset_cbar, 32, 12);
26
+DEF_HELPER_FLAGS_4(mve_vqadduh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
27
+DEF_HELPER_FLAGS_4(mve_vqadduw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
28
+
29
+DEF_HELPER_FLAGS_4(mve_vqsubsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
30
+DEF_HELPER_FLAGS_4(mve_vqsubsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
31
+DEF_HELPER_FLAGS_4(mve_vqsubsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
32
+
33
+DEF_HELPER_FLAGS_4(mve_vqsubub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
34
+DEF_HELPER_FLAGS_4(mve_vqsubuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
35
+DEF_HELPER_FLAGS_4(mve_vqsubuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
36
+
37
DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
38
DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
39
DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
40
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
41
index XXXXXXX..XXXXXXX 100644
42
--- a/target/arm/mve.decode
43
+++ b/target/arm/mve.decode
44
@@ -XXX,XX +XXX,XX @@ VMULL_TU 111 1 1110 0 . .. ... 1 ... 1 1110 . 0 . 0 ... 0 @2op
45
VQDMULH 1110 1111 0 . .. ... 0 ... 0 1011 . 1 . 0 ... 0 @2op
46
VQRDMULH 1111 1111 0 . .. ... 0 ... 0 1011 . 1 . 0 ... 0 @2op
47
48
+VQADD_S 111 0 1111 0 . .. ... 0 ... 0 0000 . 1 . 1 ... 0 @2op
49
+VQADD_U 111 1 1111 0 . .. ... 0 ... 0 0000 . 1 . 1 ... 0 @2op
50
+VQSUB_S 111 0 1111 0 . .. ... 0 ... 0 0010 . 1 . 1 ... 0 @2op
51
+VQSUB_U 111 1 1111 0 . .. ... 0 ... 0 0010 . 1 . 1 ... 0 @2op
52
+
53
# Vector miscellaneous
54
55
VCLS 1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
56
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
57
index XXXXXXX..XXXXXXX 100644
58
--- a/target/arm/mve_helper.c
59
+++ b/target/arm/mve_helper.c
60
@@ -XXX,XX +XXX,XX @@ DO_2OP_SAT(vqrdmulhb, 1, int8_t, DO_QRDMULH_B)
61
DO_2OP_SAT(vqrdmulhh, 2, int16_t, DO_QRDMULH_H)
62
DO_2OP_SAT(vqrdmulhw, 4, int32_t, DO_QRDMULH_W)
63
64
+DO_2OP_SAT(vqaddub, 1, uint8_t, DO_UQADD_B)
65
+DO_2OP_SAT(vqadduh, 2, uint16_t, DO_UQADD_H)
66
+DO_2OP_SAT(vqadduw, 4, uint32_t, DO_UQADD_W)
67
+DO_2OP_SAT(vqaddsb, 1, int8_t, DO_SQADD_B)
68
+DO_2OP_SAT(vqaddsh, 2, int16_t, DO_SQADD_H)
69
+DO_2OP_SAT(vqaddsw, 4, int32_t, DO_SQADD_W)
70
+
71
+DO_2OP_SAT(vqsubub, 1, uint8_t, DO_UQSUB_B)
72
+DO_2OP_SAT(vqsubuh, 2, uint16_t, DO_UQSUB_H)
73
+DO_2OP_SAT(vqsubuw, 4, uint32_t, DO_UQSUB_W)
74
+DO_2OP_SAT(vqsubsb, 1, int8_t, DO_SQSUB_B)
75
+DO_2OP_SAT(vqsubsh, 2, int16_t, DO_SQSUB_H)
76
+DO_2OP_SAT(vqsubsw, 4, int32_t, DO_SQSUB_W)
77
+
78
#define DO_2OP_SCALAR(OP, ESIZE, TYPE, FN) \
79
void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn, \
80
uint32_t rm) \
81
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
82
index XXXXXXX..XXXXXXX 100644
83
--- a/target/arm/translate-mve.c
84
+++ b/target/arm/translate-mve.c
85
@@ -XXX,XX +XXX,XX @@ DO_2OP(VMULL_TS, vmullts)
86
DO_2OP(VMULL_TU, vmulltu)
87
DO_2OP(VQDMULH, vqdmulh)
88
DO_2OP(VQRDMULH, vqrdmulh)
89
+DO_2OP(VQADD_S, vqadds)
90
+DO_2OP(VQADD_U, vqaddu)
91
+DO_2OP(VQSUB_S, vqsubs)
92
+DO_2OP(VQSUB_U, vqsubu)
93
94
static bool do_2op_scalar(DisasContext *s, arg_2scalar *a,
95
MVEGenTwoOpScalarFn fn)
96
--
52
--
97
2.20.1
53
2.34.1
98
99
diff view generated by jsdifflib
1
Implement the vector forms of the MVE VQDMULH and VQRDMULH insns.
1
The Cortex-R52 implements the Configuration Base Address Register
2
(CBAR), as a read-only register. Add ARM_FEATURE_CBAR_RO to this CPU
3
type, so that our implementation provides the register and the
4
associated qdev property.
2
5
3
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
6
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
4
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
7
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
5
Message-id: 20210617121628.20116-31-peter.maydell@linaro.org
8
Message-id: 20240206132931.38376-3-peter.maydell@linaro.org
6
---
9
---
7
target/arm/helper-mve.h | 8 ++++++++
10
target/arm/tcg/cpu32.c | 1 +
8
target/arm/mve.decode | 3 +++
11
1 file changed, 1 insertion(+)
9
target/arm/mve_helper.c | 27 +++++++++++++++++++++++++++
10
target/arm/translate-mve.c | 2 ++
11
4 files changed, 40 insertions(+)
12
12
13
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
13
diff --git a/target/arm/tcg/cpu32.c b/target/arm/tcg/cpu32.c
14
index XXXXXXX..XXXXXXX 100644
14
index XXXXXXX..XXXXXXX 100644
15
--- a/target/arm/helper-mve.h
15
--- a/target/arm/tcg/cpu32.c
16
+++ b/target/arm/helper-mve.h
16
+++ b/target/arm/tcg/cpu32.c
17
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vmulltub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
17
@@ -XXX,XX +XXX,XX @@ static void cortex_r52_initfn(Object *obj)
18
DEF_HELPER_FLAGS_4(mve_vmulltuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
18
set_feature(&cpu->env, ARM_FEATURE_PMSA);
19
DEF_HELPER_FLAGS_4(mve_vmulltuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
19
set_feature(&cpu->env, ARM_FEATURE_NEON);
20
20
set_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER);
21
+DEF_HELPER_FLAGS_4(mve_vqdmulhb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
21
+ set_feature(&cpu->env, ARM_FEATURE_CBAR_RO);
22
+DEF_HELPER_FLAGS_4(mve_vqdmulhh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
22
cpu->midr = 0x411fd133; /* r1p3 */
23
+DEF_HELPER_FLAGS_4(mve_vqdmulhw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
23
cpu->revidr = 0x00000000;
24
+
24
cpu->reset_fpsid = 0x41034023;
25
+DEF_HELPER_FLAGS_4(mve_vqrdmulhb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
26
+DEF_HELPER_FLAGS_4(mve_vqrdmulhh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
27
+DEF_HELPER_FLAGS_4(mve_vqrdmulhw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
28
+
29
DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
30
DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
31
DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
32
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
33
index XXXXXXX..XXXXXXX 100644
34
--- a/target/arm/mve.decode
35
+++ b/target/arm/mve.decode
36
@@ -XXX,XX +XXX,XX @@ VMULL_BU 111 1 1110 0 . .. ... 1 ... 0 1110 . 0 . 0 ... 0 @2op
37
VMULL_TS 111 0 1110 0 . .. ... 1 ... 1 1110 . 0 . 0 ... 0 @2op
38
VMULL_TU 111 1 1110 0 . .. ... 1 ... 1 1110 . 0 . 0 ... 0 @2op
39
40
+VQDMULH 1110 1111 0 . .. ... 0 ... 0 1011 . 1 . 0 ... 0 @2op
41
+VQRDMULH 1111 1111 0 . .. ... 0 ... 0 1011 . 1 . 0 ... 0 @2op
42
+
43
# Vector miscellaneous
44
45
VCLS 1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
46
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
47
index XXXXXXX..XXXXXXX 100644
48
--- a/target/arm/mve_helper.c
49
+++ b/target/arm/mve_helper.c
50
@@ -XXX,XX +XXX,XX @@ DO_1OP(vfnegs, 8, uint64_t, DO_FNEGS)
51
mve_advance_vpt(env); \
52
}
53
54
+#define DO_2OP_SAT(OP, ESIZE, TYPE, FN) \
55
+ void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn, void *vm) \
56
+ { \
57
+ TYPE *d = vd, *n = vn, *m = vm; \
58
+ uint16_t mask = mve_element_mask(env); \
59
+ unsigned e; \
60
+ bool qc = false; \
61
+ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
62
+ bool sat = false; \
63
+ TYPE r = FN(n[H##ESIZE(e)], m[H##ESIZE(e)], &sat); \
64
+ mergemask(&d[H##ESIZE(e)], r, mask); \
65
+ qc |= sat & mask & 1; \
66
+ } \
67
+ if (qc) { \
68
+ env->vfp.qc[0] = qc; \
69
+ } \
70
+ mve_advance_vpt(env); \
71
+ }
72
+
73
#define DO_AND(N, M) ((N) & (M))
74
#define DO_BIC(N, M) ((N) & ~(M))
75
#define DO_ORR(N, M) ((N) | (M))
76
@@ -XXX,XX +XXX,XX @@ static inline int32_t do_sat_bhw(int64_t val, int64_t min, int64_t max, bool *s)
77
#define DO_QRDMULH_W(n, m, s) do_sat_bhw(((int64_t)n * m + (1 << 30)) >> 31, \
78
INT32_MIN, INT32_MAX, s)
79
80
+DO_2OP_SAT(vqdmulhb, 1, int8_t, DO_QDMULH_B)
81
+DO_2OP_SAT(vqdmulhh, 2, int16_t, DO_QDMULH_H)
82
+DO_2OP_SAT(vqdmulhw, 4, int32_t, DO_QDMULH_W)
83
+
84
+DO_2OP_SAT(vqrdmulhb, 1, int8_t, DO_QRDMULH_B)
85
+DO_2OP_SAT(vqrdmulhh, 2, int16_t, DO_QRDMULH_H)
86
+DO_2OP_SAT(vqrdmulhw, 4, int32_t, DO_QRDMULH_W)
87
+
88
#define DO_2OP_SCALAR(OP, ESIZE, TYPE, FN) \
89
void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn, \
90
uint32_t rm) \
91
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
92
index XXXXXXX..XXXXXXX 100644
93
--- a/target/arm/translate-mve.c
94
+++ b/target/arm/translate-mve.c
95
@@ -XXX,XX +XXX,XX @@ DO_2OP(VMULL_BS, vmullbs)
96
DO_2OP(VMULL_BU, vmullbu)
97
DO_2OP(VMULL_TS, vmullts)
98
DO_2OP(VMULL_TU, vmulltu)
99
+DO_2OP(VQDMULH, vqdmulh)
100
+DO_2OP(VQRDMULH, vqrdmulh)
101
102
static bool do_2op_scalar(DisasContext *s, arg_2scalar *a,
103
MVEGenTwoOpScalarFn fn)
104
--
2.20.1

--
2.34.1
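Roughly the per-lane operation that VQRDMULH performs for 16-bit elements, as a standalone sketch (invented names; QC handling simplified to a plain flag):

#include <stdint.h>
#include <stdio.h>

/* Rounding doubling saturating multiply of one 16-bit lane:
 * (n*m + 2^14) >> 15, saturated to the int16_t range. */
static int16_t qrdmulh_h(int16_t n, int16_t m, int *qc)
{
    int32_t r = ((int32_t)n * m + (1 << 14)) >> 15;

    if (r > INT16_MAX) {
        r = INT16_MAX;
        *qc = 1;
    } else if (r < INT16_MIN) {
        r = INT16_MIN;
        *qc = 1;
    }
    return (int16_t)r;
}

int main(void)
{
    int qc = 0;

    /* INT16_MIN * INT16_MIN is the one case that saturates */
    printf("%d\n", qrdmulh_h(INT16_MIN, INT16_MIN, &qc));
    printf("qc=%d\n", qc);
    return 0;
}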
1
Implement the MVE VMLALDAV insn, which multiplies pairs of integer
1
Add the Cortex-R52 IMPDEF sysregs, by defining them here and
2
elements, accumulating them into a 64-bit result in a pair of
2
also by enabling the AUXCR feature which defines the ACTLR
3
general-purpose registers.
3
and HACTLR registers. As is our usual practice, we make these
4
simple reads-as-zero stubs for now.
4
5
5
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
6
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
6
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
7
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
7
Message-id: 20210617121628.20116-20-peter.maydell@linaro.org
8
Message-id: 20240206132931.38376-4-peter.maydell@linaro.org
8
---
9
---
9
target/arm/helper-mve.h | 8 ++++
10
target/arm/tcg/cpu32.c | 108 +++++++++++++++++++++++++++++++++++++++++
10
target/arm/translate.h | 10 ++++
11
1 file changed, 108 insertions(+)
11
target/arm/mve.decode | 15 ++++++
12
target/arm/mve_helper.c | 34 ++++++++++++++
13
target/arm/translate-mve.c | 96 ++++++++++++++++++++++++++++++++++++++
14
5 files changed, 163 insertions(+)
15
12
16
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
13
diff --git a/target/arm/tcg/cpu32.c b/target/arm/tcg/cpu32.c
17
index XXXXXXX..XXXXXXX 100644
14
index XXXXXXX..XXXXXXX 100644
18
--- a/target/arm/helper-mve.h
15
--- a/target/arm/tcg/cpu32.c
19
+++ b/target/arm/helper-mve.h
16
+++ b/target/arm/tcg/cpu32.c
20
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vmulltsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
17
@@ -XXX,XX +XXX,XX @@ static void cortex_r5_initfn(Object *obj)
21
DEF_HELPER_FLAGS_4(mve_vmulltub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
18
define_arm_cp_regs(cpu, cortexr5_cp_reginfo);
22
DEF_HELPER_FLAGS_4(mve_vmulltuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
23
DEF_HELPER_FLAGS_4(mve_vmulltuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
24
+
25
+DEF_HELPER_FLAGS_4(mve_vmlaldavsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
26
+DEF_HELPER_FLAGS_4(mve_vmlaldavsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
27
+DEF_HELPER_FLAGS_4(mve_vmlaldavxsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
28
+DEF_HELPER_FLAGS_4(mve_vmlaldavxsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
29
+
30
+DEF_HELPER_FLAGS_4(mve_vmlaldavuh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
31
+DEF_HELPER_FLAGS_4(mve_vmlaldavuw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
32
diff --git a/target/arm/translate.h b/target/arm/translate.h
33
index XXXXXXX..XXXXXXX 100644
34
--- a/target/arm/translate.h
35
+++ b/target/arm/translate.h
36
@@ -XXX,XX +XXX,XX @@ static inline int negate(DisasContext *s, int x)
37
return -x;
38
}
19
}
39
20
40
+static inline int plus_1(DisasContext *s, int x)
21
+static const ARMCPRegInfo cortex_r52_cp_reginfo[] = {
41
+{
22
+ { .name = "CPUACTLR", .cp = 15, .opc1 = 0, .crm = 15,
42
+ return x + 1;
23
+ .access = PL1_RW, .type = ARM_CP_CONST | ARM_CP_64BIT, .resetvalue = 0 },
43
+}
24
+ { .name = "IMP_ATCMREGIONR",
44
+
25
+ .cp = 15, .opc1 = 0, .crn = 9, .crm = 1, .opc2 = 0,
45
static inline int plus_2(DisasContext *s, int x)
26
+ .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
46
{
27
+ { .name = "IMP_BTCMREGIONR",
47
return x + 2;
28
+ .cp = 15, .opc1 = 0, .crn = 9, .crm = 1, .opc2 = 1,
48
@@ -XXX,XX +XXX,XX @@ static inline int times_4(DisasContext *s, int x)
29
+ .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
49
return x * 4;
30
+ { .name = "IMP_CTCMREGIONR",
50
}
31
+ .cp = 15, .opc1 = 0, .crn = 9, .crm = 1, .opc2 = 2,
51
32
+ .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
52
+static inline int times_2_plus_1(DisasContext *s, int x)
33
+ { .name = "IMP_CSCTLR",
53
+{
34
+ .cp = 15, .opc1 = 1, .crn = 9, .crm = 1, .opc2 = 0,
54
+ return x * 2 + 1;
35
+ .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
55
+}
36
+ { .name = "IMP_BPCTLR",
56
+
37
+ .cp = 15, .opc1 = 1, .crn = 9, .crm = 1, .opc2 = 1,
57
static inline int arm_dc_feature(DisasContext *dc, int feature)
38
+ .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
58
{
39
+ { .name = "IMP_MEMPROTCLR",
59
return (dc->features & (1ULL << feature)) != 0;
40
+ .cp = 15, .opc1 = 1, .crn = 9, .crm = 1, .opc2 = 2,
60
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
41
+ .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
61
index XXXXXXX..XXXXXXX 100644
42
+ { .name = "IMP_SLAVEPCTLR",
62
--- a/target/arm/mve.decode
43
+ .cp = 15, .opc1 = 0, .crn = 11, .crm = 0, .opc2 = 0,
63
+++ b/target/arm/mve.decode
44
+ .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
64
@@ -XXX,XX +XXX,XX @@ VNEG_fp 1111 1111 1 . 11 .. 01 ... 0 0111 11 . 0 ... 0 @1op
45
+ { .name = "IMP_PERIPHREGIONR",
65
VDUP 1110 1110 1 1 10 ... 0 .... 1011 . 0 0 1 0000 @vdup size=0
46
+ .cp = 15, .opc1 = 0, .crn = 15, .crm = 0, .opc2 = 0,
66
VDUP 1110 1110 1 0 10 ... 0 .... 1011 . 0 1 1 0000 @vdup size=1
47
+ .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
67
VDUP 1110 1110 1 0 10 ... 0 .... 1011 . 0 0 1 0000 @vdup size=2
48
+ { .name = "IMP_FLASHIFREGIONR",
68
+
49
+ .cp = 15, .opc1 = 0, .crn = 15, .crm = 0, .opc2 = 1,
69
+# multiply-add long dual accumulate
50
+ .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
70
+# rdahi: bits [3:1] from insn, bit 0 is 1
51
+ { .name = "IMP_BUILDOPTR",
71
+# rdalo: bits [3:1] from insn, bit 0 is 0
52
+ .cp = 15, .opc1 = 0, .crn = 15, .crm = 2, .opc2 = 0,
72
+%rdahi 20:3 !function=times_2_plus_1
53
+ .access = PL1_R, .type = ARM_CP_CONST, .resetvalue = 0 },
73
+%rdalo 13:3 !function=times_2
54
+ { .name = "IMP_PINOPTR",
74
+# size bit is 0 for 16 bit, 1 for 32 bit
55
+ .cp = 15, .opc1 = 0, .crn = 15, .crm = 2, .opc2 = 7,
75
+%size_16 16:1 !function=plus_1
56
+ .access = PL1_R, .type = ARM_CP_CONST, .resetvalue = 0 },
76
+
57
+ { .name = "IMP_QOSR",
77
+&vmlaldav rdahi rdalo size qn qm x a
58
+ .cp = 15, .opc1 = 1, .crn = 15, .crm = 3, .opc2 = 1,
78
+
59
+ .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
79
+@vmlaldav .... .... . ... ... . ... . .... .... qm:3 . \
60
+ { .name = "IMP_BUSTIMEOUTR",
80
+ qn=%qn rdahi=%rdahi rdalo=%rdalo size=%size_16 &vmlaldav
61
+ .cp = 15, .opc1 = 1, .crn = 15, .crm = 3, .opc2 = 2,
81
+VMLALDAV_S 1110 1110 1 ... ... . ... x:1 1110 . 0 a:1 0 ... 0 @vmlaldav
62
+ .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
82
+VMLALDAV_U 1111 1110 1 ... ... . ... x:1 1110 . 0 a:1 0 ... 0 @vmlaldav
63
+ { .name = "IMP_INTMONR",
83
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
64
+ .cp = 15, .opc1 = 1, .crn = 15, .crm = 3, .opc2 = 4,
84
index XXXXXXX..XXXXXXX 100644
65
+ .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
85
--- a/target/arm/mve_helper.c
66
+ { .name = "IMP_ICERR0",
86
+++ b/target/arm/mve_helper.c
67
+ .cp = 15, .opc1 = 2, .crn = 15, .crm = 0, .opc2 = 0,
87
@@ -XXX,XX +XXX,XX @@ DO_2OP_S(vhadds, do_vhadd_s)
68
+ .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
88
DO_2OP_U(vhaddu, do_vhadd_u)
69
+ { .name = "IMP_ICERR1",
89
DO_2OP_S(vhsubs, do_vhsub_s)
70
+ .cp = 15, .opc1 = 2, .crn = 15, .crm = 0, .opc2 = 1,
90
DO_2OP_U(vhsubu, do_vhsub_u)
71
+ .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
72
+ { .name = "IMP_DCERR0",
73
+ .cp = 15, .opc1 = 2, .crn = 15, .crm = 1, .opc2 = 0,
74
+ .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
75
+ { .name = "IMP_DCERR1",
76
+ .cp = 15, .opc1 = 2, .crn = 15, .crm = 1, .opc2 = 1,
77
+ .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
78
+ { .name = "IMP_TCMERR0",
79
+ .cp = 15, .opc1 = 2, .crn = 15, .crm = 2, .opc2 = 0,
80
+ .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
81
+ { .name = "IMP_TCMERR1",
82
+ .cp = 15, .opc1 = 2, .crn = 15, .crm = 2, .opc2 = 1,
83
+ .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
84
+ { .name = "IMP_TCMSYNDR0",
85
+ .cp = 15, .opc1 = 2, .crn = 15, .crm = 2, .opc2 = 2,
86
+ .access = PL1_R, .type = ARM_CP_CONST, .resetvalue = 0 },
87
+ { .name = "IMP_TCMSYNDR1",
88
+ .cp = 15, .opc1 = 2, .crn = 15, .crm = 2, .opc2 = 3,
89
+ .access = PL1_R, .type = ARM_CP_CONST, .resetvalue = 0 },
90
+ { .name = "IMP_FLASHERR0",
91
+ .cp = 15, .opc1 = 2, .crn = 15, .crm = 3, .opc2 = 0,
92
+ .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
93
+ { .name = "IMP_FLASHERR1",
94
+ .cp = 15, .opc1 = 2, .crn = 15, .crm = 3, .opc2 = 1,
95
+ .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
96
+ { .name = "IMP_CDBGDR0",
97
+ .cp = 15, .opc1 = 3, .crn = 15, .crm = 0, .opc2 = 0,
98
+ .access = PL1_R, .type = ARM_CP_CONST, .resetvalue = 0 },
99
+ { .name = "IMP_CBDGBR1",
100
+ .cp = 15, .opc1 = 3, .crn = 15, .crm = 0, .opc2 = 1,
101
+ .access = PL1_R, .type = ARM_CP_CONST, .resetvalue = 0 },
102
+ { .name = "IMP_TESTR0",
103
+ .cp = 15, .opc1 = 4, .crn = 15, .crm = 0, .opc2 = 0,
104
+ .access = PL1_R, .type = ARM_CP_CONST, .resetvalue = 0 },
105
+ { .name = "IMP_TESTR1",
106
+ .cp = 15, .opc1 = 4, .crn = 15, .crm = 0, .opc2 = 1,
107
+ .access = PL1_W, .type = ARM_CP_NOP, .resetvalue = 0 },
108
+ { .name = "IMP_CDBGDCI",
109
+ .cp = 15, .opc1 = 0, .crn = 15, .crm = 15, .opc2 = 0,
110
+ .access = PL1_W, .type = ARM_CP_NOP, .resetvalue = 0 },
111
+ { .name = "IMP_CDBGDCT",
112
+ .cp = 15, .opc1 = 3, .crn = 15, .crm = 2, .opc2 = 0,
113
+ .access = PL1_W, .type = ARM_CP_NOP, .resetvalue = 0 },
114
+ { .name = "IMP_CDBGICT",
115
+ .cp = 15, .opc1 = 3, .crn = 15, .crm = 2, .opc2 = 1,
116
+ .access = PL1_W, .type = ARM_CP_NOP, .resetvalue = 0 },
117
+ { .name = "IMP_CDBGDCD",
118
+ .cp = 15, .opc1 = 3, .crn = 15, .crm = 4, .opc2 = 0,
119
+ .access = PL1_W, .type = ARM_CP_NOP, .resetvalue = 0 },
120
+ { .name = "IMP_CDBGICD",
121
+ .cp = 15, .opc1 = 3, .crn = 15, .crm = 4, .opc2 = 1,
122
+ .access = PL1_W, .type = ARM_CP_NOP, .resetvalue = 0 },
123
+};
91
+
124
+
92
+
125
+
93
+/*
126
static void cortex_r52_initfn(Object *obj)
94
+ * Multiply add long dual accumulate ops.
127
{
95
+ */
128
ARMCPU *cpu = ARM_CPU(obj);
96
+#define DO_LDAV(OP, ESIZE, TYPE, XCHG, EVENACC, ODDACC) \
129
@@ -XXX,XX +XXX,XX @@ static void cortex_r52_initfn(Object *obj)
97
+ uint64_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vn, \
130
set_feature(&cpu->env, ARM_FEATURE_NEON);
98
+ void *vm, uint64_t a) \
131
set_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER);
99
+ { \
132
set_feature(&cpu->env, ARM_FEATURE_CBAR_RO);
100
+ uint16_t mask = mve_element_mask(env); \
133
+ set_feature(&cpu->env, ARM_FEATURE_AUXCR);
101
+ unsigned e; \
134
cpu->midr = 0x411fd133; /* r1p3 */
102
+ TYPE *n = vn, *m = vm; \
135
cpu->revidr = 0x00000000;
103
+ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
136
cpu->reset_fpsid = 0x41034023;
104
+ if (mask & 1) { \
137
@@ -XXX,XX +XXX,XX @@ static void cortex_r52_initfn(Object *obj)
105
+ if (e & 1) { \
138
106
+ a ODDACC \
139
cpu->pmsav7_dregion = 16;
107
+ (int64_t)n[H##ESIZE(e - 1 * XCHG)] * m[H##ESIZE(e)]; \
140
cpu->pmsav8r_hdregion = 16;
108
+ } else { \
109
+ a EVENACC \
110
+ (int64_t)n[H##ESIZE(e + 1 * XCHG)] * m[H##ESIZE(e)]; \
111
+ } \
112
+ } \
113
+ } \
114
+ mve_advance_vpt(env); \
115
+ return a; \
116
+ }
117
+
141
+
118
+DO_LDAV(vmlaldavsh, 2, int16_t, false, +=, +=)
142
+ define_arm_cp_regs(cpu, cortex_r52_cp_reginfo);
119
+DO_LDAV(vmlaldavxsh, 2, int16_t, true, +=, +=)
120
+DO_LDAV(vmlaldavsw, 4, int32_t, false, +=, +=)
121
+DO_LDAV(vmlaldavxsw, 4, int32_t, true, +=, +=)
122
+
123
+DO_LDAV(vmlaldavuh, 2, uint16_t, false, +=, +=)
124
+DO_LDAV(vmlaldavuw, 4, uint32_t, false, +=, +=)
125
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
126
index XXXXXXX..XXXXXXX 100644
127
--- a/target/arm/translate-mve.c
128
+++ b/target/arm/translate-mve.c
129
@@ -XXX,XX +XXX,XX @@
130
typedef void MVEGenLdStFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
131
typedef void MVEGenOneOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
132
typedef void MVEGenTwoOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr);
133
+typedef void MVEGenDualAccOpFn(TCGv_i64, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i64);
134
135
/* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
136
static inline long mve_qreg_offset(unsigned reg)
137
@@ -XXX,XX +XXX,XX @@ static void mve_update_eci(DisasContext *s)
138
}
139
}
143
}
140
144
141
+static bool mve_skip_first_beat(DisasContext *s)
145
static void cortex_r5f_initfn(Object *obj)
142
+{
143
+ /* Return true if PSR.ECI says we must skip the first beat of this insn */
144
+ switch (s->eci) {
145
+ case ECI_NONE:
146
+ return false;
147
+ case ECI_A0:
148
+ case ECI_A0A1:
149
+ case ECI_A0A1A2:
150
+ case ECI_A0A1A2B0:
151
+ return true;
152
+ default:
153
+ g_assert_not_reached();
154
+ }
155
+}
156
+
157
static bool do_ldst(DisasContext *s, arg_VLDR_VSTR *a, MVEGenLdStFn *fn)
158
{
159
TCGv_i32 addr;
160
@@ -XXX,XX +XXX,XX @@ DO_2OP(VMULL_BS, vmullbs)
161
DO_2OP(VMULL_BU, vmullbu)
162
DO_2OP(VMULL_TS, vmullts)
163
DO_2OP(VMULL_TU, vmulltu)
164
+
165
+static bool do_long_dual_acc(DisasContext *s, arg_vmlaldav *a,
166
+ MVEGenDualAccOpFn *fn)
167
+{
168
+ TCGv_ptr qn, qm;
169
+ TCGv_i64 rda;
170
+ TCGv_i32 rdalo, rdahi;
171
+
172
+ if (!dc_isar_feature(aa32_mve, s) ||
173
+ !mve_check_qreg_bank(s, a->qn | a->qm) ||
174
+ !fn) {
175
+ return false;
176
+ }
177
+ /*
178
+ * rdahi == 13 is UNPREDICTABLE; rdahi == 15 is a related
179
+ * encoding; rdalo always has bit 0 clear so cannot be 13 or 15.
180
+ */
181
+ if (a->rdahi == 13 || a->rdahi == 15) {
182
+ return false;
183
+ }
184
+ if (!mve_eci_check(s) || !vfp_access_check(s)) {
185
+ return true;
186
+ }
187
+
188
+ qn = mve_qreg_ptr(a->qn);
189
+ qm = mve_qreg_ptr(a->qm);
190
+
191
+ /*
192
+ * This insn is subject to beat-wise execution. Partial execution
193
+ * of an A=0 (no-accumulate) insn which does not execute the first
194
+ * beat must start with the current rda value, not 0.
195
+ */
196
+ if (a->a || mve_skip_first_beat(s)) {
197
+ rda = tcg_temp_new_i64();
198
+ rdalo = load_reg(s, a->rdalo);
199
+ rdahi = load_reg(s, a->rdahi);
200
+ tcg_gen_concat_i32_i64(rda, rdalo, rdahi);
201
+ tcg_temp_free_i32(rdalo);
202
+ tcg_temp_free_i32(rdahi);
203
+ } else {
204
+ rda = tcg_const_i64(0);
205
+ }
206
+
207
+ fn(rda, cpu_env, qn, qm, rda);
208
+ tcg_temp_free_ptr(qn);
209
+ tcg_temp_free_ptr(qm);
210
+
211
+ rdalo = tcg_temp_new_i32();
212
+ rdahi = tcg_temp_new_i32();
213
+ tcg_gen_extrl_i64_i32(rdalo, rda);
214
+ tcg_gen_extrh_i64_i32(rdahi, rda);
215
+ store_reg(s, a->rdalo, rdalo);
216
+ store_reg(s, a->rdahi, rdahi);
217
+ tcg_temp_free_i64(rda);
218
+ mve_update_eci(s);
219
+ return true;
220
+}
221
+
222
+static bool trans_VMLALDAV_S(DisasContext *s, arg_vmlaldav *a)
223
+{
224
+ static MVEGenDualAccOpFn * const fns[4][2] = {
225
+ { NULL, NULL },
226
+ { gen_helper_mve_vmlaldavsh, gen_helper_mve_vmlaldavxsh },
227
+ { gen_helper_mve_vmlaldavsw, gen_helper_mve_vmlaldavxsw },
228
+ { NULL, NULL },
229
+ };
230
+ return do_long_dual_acc(s, a, fns[a->size][a->x]);
231
+}
232
+
233
+static bool trans_VMLALDAV_U(DisasContext *s, arg_vmlaldav *a)
234
+{
235
+ static MVEGenDualAccOpFn * const fns[4][2] = {
236
+ { NULL, NULL },
237
+ { gen_helper_mve_vmlaldavuh, NULL },
238
+ { gen_helper_mve_vmlaldavuw, NULL },
239
+ { NULL, NULL },
240
+ };
241
+ return do_long_dual_acc(s, a, fns[a->size][a->x]);
242
+}
243
--
2.20.1

--
2.34.1
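A standalone sketch (predication and beat-wise execution ignored; names invented) of the multiply-accumulate that VMLALDAV performs, with the 64-bit result split across RdaHi:RdaLo the way the instruction returns it:

#include <stdint.h>
#include <stdio.h>

/* Multiply corresponding signed 16-bit lanes and accumulate the
 * products into a 64-bit running total. */
static int64_t vmlaldav_s16(const int16_t *n, const int16_t *m,
                            int lanes, int64_t acc)
{
    for (int e = 0; e < lanes; e++) {
        acc += (int64_t)n[e] * m[e];
    }
    return acc;
}

int main(void)
{
    int16_t n[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    int16_t m[8] = { 8, 7, 6, 5, 4, 3, 2, 1 };
    int64_t acc = vmlaldav_s16(n, m, 8, 0);
    uint32_t rdalo = (uint32_t)acc;
    uint32_t rdahi = (uint32_t)((uint64_t)acc >> 32);

    printf("acc=%lld RdaHi=0x%08x RdaLo=0x%08x\n",
           (long long)acc, rdahi, rdalo);
    return 0;
}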
1
Implement the MVE VQDMULL scalar insn. This multiplies the top or
1
Architecturally, the AArch32 MSR/MRS to/from banked register
2
bottom half of each element by the scalar, doubles and saturates
2
instructions are UNPREDICTABLE for attempts to access a banked
3
to a double-width result.
3
register that the guest could access in a more direct way (e.g.
4
using this insn to access r8_fiq when already in FIQ mode). QEMU has
5
chosen to UNDEF on all of these.
4
6
5
Note that this encoding overlaps with VQADD and VQSUB; it uses
7
However, for the case of accessing SPSR_hyp from hyp mode, it turns
6
what in VQADD and VQSUB would be the 'size=0b11' encoding.
8
out that real hardware permits this, with the same effect as if the
9
guest had directly written to SPSR. Further, there is some
10
guest code out there that assumes it can do this, because it
11
happens to work on hardware: an example Cortex-R52 startup code
12
fragment uses this, and it got copied into various other places,
13
including Zephyr. Zephyr was fixed to not use this:
14
https://github.com/zephyrproject-rtos/zephyr/issues/47330
15
but other examples are still out there, like the selftest
16
binary for the MPS3-AN536.
17
18
For convenience of being able to run guest code, permit
19
this UNPREDICTABLE access instead of UNDEFing it.
7
20
8
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
21
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
22
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
10
Message-id: 20210617121628.20116-30-peter.maydell@linaro.org
23
Message-id: 20240206132931.38376-5-peter.maydell@linaro.org
11
---
24
---
12
target/arm/helper-mve.h | 5 +++
25
target/arm/tcg/op_helper.c | 43 ++++++++++++++++++++++++++------------
13
target/arm/mve.decode | 23 +++++++++++---
26
target/arm/tcg/translate.c | 19 +++++++++++------
14
target/arm/mve_helper.c | 65 ++++++++++++++++++++++++++++++++++++++
27
2 files changed, 43 insertions(+), 19 deletions(-)
15
target/arm/translate-mve.c | 30 ++++++++++++++++++
16
4 files changed, 119 insertions(+), 4 deletions(-)
17
28
18
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
29
diff --git a/target/arm/tcg/op_helper.c b/target/arm/tcg/op_helper.c
19
index XXXXXXX..XXXXXXX 100644
30
index XXXXXXX..XXXXXXX 100644
20
--- a/target/arm/helper-mve.h
31
--- a/target/arm/tcg/op_helper.c
21
+++ b/target/arm/helper-mve.h
32
+++ b/target/arm/tcg/op_helper.c
22
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vbrsrb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
33
@@ -XXX,XX +XXX,XX @@ static void msr_mrs_banked_exc_checks(CPUARMState *env, uint32_t tgtmode,
23
DEF_HELPER_FLAGS_4(mve_vbrsrh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
34
*/
24
DEF_HELPER_FLAGS_4(mve_vbrsrw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
35
int curmode = env->uncached_cpsr & CPSR_M;
25
36
26
+DEF_HELPER_FLAGS_4(mve_vqdmullb_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
37
- if (regno == 17) {
27
+DEF_HELPER_FLAGS_4(mve_vqdmullb_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
38
- /* ELR_Hyp: a special case because access from tgtmode is OK */
28
+DEF_HELPER_FLAGS_4(mve_vqdmullt_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
39
- if (curmode != ARM_CPU_MODE_HYP && curmode != ARM_CPU_MODE_MON) {
29
+DEF_HELPER_FLAGS_4(mve_vqdmullt_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
40
- goto undef;
30
+
41
+ if (tgtmode == ARM_CPU_MODE_HYP) {
31
DEF_HELPER_FLAGS_4(mve_vmlaldavsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
42
+ /*
32
DEF_HELPER_FLAGS_4(mve_vmlaldavsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
43
+ * Handle Hyp target regs first because some are special cases
33
DEF_HELPER_FLAGS_4(mve_vmlaldavxsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
44
+ * which don't want the usual "not accessible from tgtmode" check.
34
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
45
+ */
46
+ switch (regno) {
47
+ case 16 ... 17: /* ELR_Hyp, SPSR_Hyp */
48
+ if (curmode != ARM_CPU_MODE_HYP && curmode != ARM_CPU_MODE_MON) {
49
+ goto undef;
50
+ }
51
+ break;
52
+ case 13:
53
+ if (curmode != ARM_CPU_MODE_MON) {
54
+ goto undef;
55
+ }
56
+ break;
57
+ default:
58
+ g_assert_not_reached();
59
}
60
return;
61
}
62
@@ -XXX,XX +XXX,XX @@ static void msr_mrs_banked_exc_checks(CPUARMState *env, uint32_t tgtmode,
63
}
64
}
65
66
- if (tgtmode == ARM_CPU_MODE_HYP) {
67
- /* SPSR_Hyp, r13_hyp: accessible from Monitor mode only */
68
- if (curmode != ARM_CPU_MODE_MON) {
69
- goto undef;
70
- }
71
- }
72
-
73
return;
74
75
undef:
76
@@ -XXX,XX +XXX,XX @@ void HELPER(msr_banked)(CPUARMState *env, uint32_t value, uint32_t tgtmode,
77
78
switch (regno) {
79
case 16: /* SPSRs */
80
- env->banked_spsr[bank_number(tgtmode)] = value;
81
+ if (tgtmode == (env->uncached_cpsr & CPSR_M)) {
82
+ /* Only happens for SPSR_Hyp access in Hyp mode */
83
+ env->spsr = value;
84
+ } else {
85
+ env->banked_spsr[bank_number(tgtmode)] = value;
86
+ }
87
break;
88
case 17: /* ELR_Hyp */
89
env->elr_el[2] = value;
90
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(mrs_banked)(CPUARMState *env, uint32_t tgtmode, uint32_t regno)
91
92
switch (regno) {
93
case 16: /* SPSRs */
94
- return env->banked_spsr[bank_number(tgtmode)];
95
+ if (tgtmode == (env->uncached_cpsr & CPSR_M)) {
96
+ /* Only happens for SPSR_Hyp access in Hyp mode */
97
+ return env->spsr;
98
+ } else {
99
+ return env->banked_spsr[bank_number(tgtmode)];
100
+ }
101
case 17: /* ELR_Hyp */
102
return env->elr_el[2];
103
case 13:
104
diff --git a/target/arm/tcg/translate.c b/target/arm/tcg/translate.c
35
index XXXXXXX..XXXXXXX 100644
105
index XXXXXXX..XXXXXXX 100644
36
--- a/target/arm/mve.decode
106
--- a/target/arm/tcg/translate.c
37
+++ b/target/arm/mve.decode
107
+++ b/target/arm/tcg/translate.c
38
@@ -XXX,XX +XXX,XX @@
108
@@ -XXX,XX +XXX,XX @@ static bool msr_banked_access_decode(DisasContext *s, int r, int sysm, int rn,
39
%qm 5:1 1:3
109
break;
40
%qn 7:1 17:3
110
case ARM_CPU_MODE_HYP:
41
111
/*
42
+# VQDMULL has size in bit 28: 0 for 16 bit, 1 for 32 bit
112
- * SPSR_hyp and r13_hyp can only be accessed from Monitor mode
43
+%size_28 28:1 !function=plus_1
113
- * (and so we can forbid accesses from EL2 or below). elr_hyp
44
+
114
- * can be accessed also from Hyp mode, so forbid accesses from
45
&vldr_vstr rn qd imm p a w size l u
115
- * EL0 or EL1.
46
&1op qd qm size
116
+ * r13_hyp can only be accessed from Monitor mode, and so we
47
&2op qd qm qn size
117
+ * can forbid accesses from EL2 or below.
48
@@ -XXX,XX +XXX,XX @@
118
+ * elr_hyp can be accessed also from Hyp mode, so forbid
49
@2op_nosz .... .... .... .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn size=0
119
+ * accesses from EL0 or EL1.
50
120
+ * SPSR_hyp is supposed to be in the same category as r13_hyp
51
@2scalar .... .... .. size:2 .... .... .... .... rm:4 &2scalar qd=%qd qn=%qn
121
+ * and UNPREDICTABLE if accessed from anything except Monitor
52
+@2scalar_nosz .... .... .... .... .... .... .... rm:4 &2scalar qd=%qd qn=%qn
122
+ * mode. However there is some real-world code that will do
53
123
+ * it because at least some hardware happens to permit the
54
# Vector loads and stores
124
+ * access. (Notably a standard Cortex-R52 startup code fragment
55
125
+ * does this.) So we permit SPSR_hyp from Hyp mode also, to allow
56
@@ -XXX,XX +XXX,XX @@ VHADD_U_scalar 1111 1110 0 . .. ... 0 ... 0 1111 . 100 .... @2scalar
126
+ * this (incorrect) guest code to run.
57
VHSUB_S_scalar 1110 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
127
*/
58
VHSUB_U_scalar 1111 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
128
- if (!arm_dc_feature(s, ARM_FEATURE_EL2) || s->current_el < 2 ||
59
129
- (s->current_el < 3 && *regno != 17)) {
60
-VQADD_S_scalar 1110 1110 0 . .. ... 0 ... 0 1111 . 110 .... @2scalar
130
+ if (!arm_dc_feature(s, ARM_FEATURE_EL2) || s->current_el < 2
61
-VQADD_U_scalar 1111 1110 0 . .. ... 0 ... 0 1111 . 110 .... @2scalar
131
+ || (s->current_el < 3 && *regno != 16 && *regno != 17)) {
62
-VQSUB_S_scalar 1110 1110 0 . .. ... 0 ... 1 1111 . 110 .... @2scalar
132
goto undef;
63
-VQSUB_U_scalar 1111 1110 0 . .. ... 0 ... 1 1111 . 110 .... @2scalar
133
}
64
+{
134
break;
65
+ VQADD_S_scalar 1110 1110 0 . .. ... 0 ... 0 1111 . 110 .... @2scalar
66
+ VQADD_U_scalar 1111 1110 0 . .. ... 0 ... 0 1111 . 110 .... @2scalar
67
+ VQDMULLB_scalar 111 . 1110 0 . 11 ... 0 ... 0 1111 . 110 .... @2scalar_nosz \
68
+ size=%size_28
69
+}
70
+
71
+{
72
+ VQSUB_S_scalar 1110 1110 0 . .. ... 0 ... 1 1111 . 110 .... @2scalar
73
+ VQSUB_U_scalar 1111 1110 0 . .. ... 0 ... 1 1111 . 110 .... @2scalar
74
+ VQDMULLT_scalar 111 . 1110 0 . 11 ... 0 ... 1 1111 . 110 .... @2scalar_nosz \
75
+ size=%size_28
76
+}
77
+
78
VBRSR 1111 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
79
80
VQDMULH_scalar 1110 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
81
VQRDMULH_scalar 1111 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
82
83
+
84
# Predicate operations
85
%mask_22_13 22:1 13:3
86
VPST 1111 1110 0 . 11 000 1 ... 0 1111 0100 1101 mask=%mask_22_13
87
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
88
index XXXXXXX..XXXXXXX 100644
89
--- a/target/arm/mve_helper.c
90
+++ b/target/arm/mve_helper.c
91
@@ -XXX,XX +XXX,XX @@ DO_2OP_SAT_SCALAR(vqrdmulh_scalarb, 1, int8_t, DO_QRDMULH_B)
92
DO_2OP_SAT_SCALAR(vqrdmulh_scalarh, 2, int16_t, DO_QRDMULH_H)
93
DO_2OP_SAT_SCALAR(vqrdmulh_scalarw, 4, int32_t, DO_QRDMULH_W)
94
95
+/*
96
+ * Long saturating scalar ops. As with DO_2OP_L, TYPE and H are for the
97
+ * input (smaller) type and LESIZE, LTYPE, LH for the output (long) type.
98
+ * SATMASK specifies which bits of the predicate mask matter for determining
99
+ * whether to propagate a saturation indication into FPSCR.QC -- for
100
+ * the 16x16->32 case we must check only the bit corresponding to the T or B
101
+ * half that we used, but for the 32x32->64 case we propagate if the mask
102
+ * bit is set for either half.
103
+ */
104
+#define DO_2OP_SAT_SCALAR_L(OP, TOP, ESIZE, TYPE, LESIZE, LTYPE, FN, SATMASK) \
105
+ void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn, \
106
+ uint32_t rm) \
107
+ { \
108
+ LTYPE *d = vd; \
109
+ TYPE *n = vn; \
110
+ TYPE m = rm; \
111
+ uint16_t mask = mve_element_mask(env); \
112
+ unsigned le; \
113
+ bool qc = false; \
114
+ for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) { \
115
+ bool sat = false; \
116
+ LTYPE r = FN((LTYPE)n[H##ESIZE(le * 2 + TOP)], m, &sat); \
117
+ mergemask(&d[H##LESIZE(le)], r, mask); \
118
+ qc |= sat && (mask & SATMASK); \
119
+ } \
120
+ if (qc) { \
121
+ env->vfp.qc[0] = qc; \
122
+ } \
123
+ mve_advance_vpt(env); \
124
+ }
125
+
126
+static inline int32_t do_qdmullh(int16_t n, int16_t m, bool *sat)
127
+{
128
+ int64_t r = ((int64_t)n * m) * 2;
129
+ return do_sat_bhw(r, INT32_MIN, INT32_MAX, sat);
130
+}
131
+
132
+static inline int64_t do_qdmullw(int32_t n, int32_t m, bool *sat)
133
+{
134
+ /* The multiply can't overflow, but the doubling might */
135
+ int64_t r = (int64_t)n * m;
136
+ if (r > INT64_MAX / 2) {
137
+ *sat = true;
138
+ return INT64_MAX;
139
+ } else if (r < INT64_MIN / 2) {
140
+ *sat = true;
141
+ return INT64_MIN;
142
+ } else {
143
+ return r * 2;
144
+ }
145
+}
146
+
147
+#define SATMASK16B 1
148
+#define SATMASK16T (1 << 2)
149
+#define SATMASK32 ((1 << 4) | 1)
150
+
151
+DO_2OP_SAT_SCALAR_L(vqdmullb_scalarh, 0, 2, int16_t, 4, int32_t, \
152
+ do_qdmullh, SATMASK16B)
153
+DO_2OP_SAT_SCALAR_L(vqdmullb_scalarw, 0, 4, int32_t, 8, int64_t, \
154
+ do_qdmullw, SATMASK32)
155
+DO_2OP_SAT_SCALAR_L(vqdmullt_scalarh, 1, 2, int16_t, 4, int32_t, \
156
+ do_qdmullh, SATMASK16T)
157
+DO_2OP_SAT_SCALAR_L(vqdmullt_scalarw, 1, 4, int32_t, 8, int64_t, \
158
+ do_qdmullw, SATMASK32)
159
+
160
static inline uint32_t do_vbrsrb(uint32_t n, uint32_t m)
161
{
162
m &= 0xff;
163
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
164
index XXXXXXX..XXXXXXX 100644
165
--- a/target/arm/translate-mve.c
166
+++ b/target/arm/translate-mve.c
167
@@ -XXX,XX +XXX,XX @@ DO_2OP_SCALAR(VQDMULH_scalar, vqdmulh_scalar)
168
DO_2OP_SCALAR(VQRDMULH_scalar, vqrdmulh_scalar)
169
DO_2OP_SCALAR(VBRSR, vbrsr)
170
171
+static bool trans_VQDMULLB_scalar(DisasContext *s, arg_2scalar *a)
172
+{
173
+ static MVEGenTwoOpScalarFn * const fns[] = {
174
+ NULL,
175
+ gen_helper_mve_vqdmullb_scalarh,
176
+ gen_helper_mve_vqdmullb_scalarw,
177
+ NULL,
178
+ };
179
+ if (a->qd == a->qn && a->size == MO_32) {
180
+ /* UNPREDICTABLE; we choose to undef */
181
+ return false;
182
+ }
183
+ return do_2op_scalar(s, a, fns[a->size]);
184
+}
185
+
186
+static bool trans_VQDMULLT_scalar(DisasContext *s, arg_2scalar *a)
187
+{
188
+ static MVEGenTwoOpScalarFn * const fns[] = {
189
+ NULL,
190
+ gen_helper_mve_vqdmullt_scalarh,
191
+ gen_helper_mve_vqdmullt_scalarw,
192
+ NULL,
193
+ };
194
+ if (a->qd == a->qn && a->size == MO_32) {
195
+ /* UNPREDICTABLE; we choose to undef */
196
+ return false;
197
+ }
198
+ return do_2op_scalar(s, a, fns[a->size]);
199
+}
200
+
201
static bool do_long_dual_acc(DisasContext *s, arg_vmlaldav *a,
202
MVEGenDualAccOpFn *fn)
203
{
204
--
135
--
205
2.20.1
136
2.34.1
206
207
diff view generated by jsdifflib
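(A brief illustrative aside on the do_qdmullw() helper in the VQDMULL scalar patch above: for the 32x32->64 form the multiply itself always fits in 64 bits, so only the doubling step can overflow, and that is the case the helper checks for. The standalone sketch below is not the QEMU code; the function name and test values are invented for illustration and it assumes a C99 environment.)

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

/* Doubling saturating multiply, 32x32 -> 64: the product fits, the doubling may not */
static int64_t qdmull_w(int32_t n, int32_t m, bool *sat)
{
    int64_t r = (int64_t)n * m;     /* cannot overflow int64_t */
    if (r > INT64_MAX / 2) {        /* 2 * r would exceed INT64_MAX */
        *sat = true;
        return INT64_MAX;
    }
    if (r < INT64_MIN / 2) {        /* 2 * r would go below INT64_MIN */
        *sat = true;
        return INT64_MIN;
    }
    return r * 2;
}

int main(void)
{
    bool sat = false;
    /* INT32_MIN * INT32_MIN is 2^62; doubling it does not fit, so the result saturates */
    int64_t r = qdmull_w(INT32_MIN, INT32_MIN, &sat);
    printf("%lld sat=%d\n", (long long)r, sat);
    return 0;
}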
1
Implement the MVE VQDMULH and VQRDMULH scalar insns, which multiply
1
We currently guard the CFG3 register read with
2
elements by the scalar, double, possibly round, take the high half
2
(scc_partno(s) == 0x524 && scc_partno(s) == 0x547)
3
and saturate.
3
which is clearly wrong as it is never true.
4
4
5
This register is present on all board types except AN524
6
and AN547; correct the condition.
7
8
Fixes: 6ac80818941829c0 ("hw/misc/mps2-scc: Implement changes for AN547")
5
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
10
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
6
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
11
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
7
Message-id: 20210617121628.20116-29-peter.maydell@linaro.org
12
Message-id: 20240206132931.38376-6-peter.maydell@linaro.org
8
---
13
---
9
target/arm/helper-mve.h | 8 ++++++++
14
hw/misc/mps2-scc.c | 2 +-
10
target/arm/mve.decode | 3 +++
15
1 file changed, 1 insertion(+), 1 deletion(-)
11
target/arm/mve_helper.c | 25 +++++++++++++++++++++++++
12
target/arm/translate-mve.c | 2 ++
13
4 files changed, 38 insertions(+)
14
16
15
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
17
diff --git a/hw/misc/mps2-scc.c b/hw/misc/mps2-scc.c
16
index XXXXXXX..XXXXXXX 100644
18
index XXXXXXX..XXXXXXX 100644
17
--- a/target/arm/helper-mve.h
19
--- a/hw/misc/mps2-scc.c
18
+++ b/target/arm/helper-mve.h
20
+++ b/hw/misc/mps2-scc.c
19
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vqsubu_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
21
@@ -XXX,XX +XXX,XX @@ static uint64_t mps2_scc_read(void *opaque, hwaddr offset, unsigned size)
20
DEF_HELPER_FLAGS_4(mve_vqsubu_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
22
r = s->cfg2;
21
DEF_HELPER_FLAGS_4(mve_vqsubu_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
23
break;
22
24
case A_CFG3:
23
+DEF_HELPER_FLAGS_4(mve_vqdmulh_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
25
- if (scc_partno(s) == 0x524 && scc_partno(s) == 0x547) {
24
+DEF_HELPER_FLAGS_4(mve_vqdmulh_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
26
+ if (scc_partno(s) == 0x524 || scc_partno(s) == 0x547) {
25
+DEF_HELPER_FLAGS_4(mve_vqdmulh_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
27
/* CFG3 reserved on AN524 */
26
+
28
goto bad_offset;
27
+DEF_HELPER_FLAGS_4(mve_vqrdmulh_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
29
}
28
+DEF_HELPER_FLAGS_4(mve_vqrdmulh_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
29
+DEF_HELPER_FLAGS_4(mve_vqrdmulh_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
30
+
31
DEF_HELPER_FLAGS_4(mve_vbrsrb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
32
DEF_HELPER_FLAGS_4(mve_vbrsrh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
33
DEF_HELPER_FLAGS_4(mve_vbrsrw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
34
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
35
index XXXXXXX..XXXXXXX 100644
36
--- a/target/arm/mve.decode
37
+++ b/target/arm/mve.decode
38
@@ -XXX,XX +XXX,XX @@ VQSUB_S_scalar 1110 1110 0 . .. ... 0 ... 1 1111 . 110 .... @2scalar
39
VQSUB_U_scalar 1111 1110 0 . .. ... 0 ... 1 1111 . 110 .... @2scalar
40
VBRSR 1111 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
41
42
+VQDMULH_scalar 1110 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
43
+VQRDMULH_scalar 1111 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
44
+
45
# Predicate operations
46
%mask_22_13 22:1 13:3
47
VPST 1111 1110 0 . 11 000 1 ... 0 1111 0100 1101 mask=%mask_22_13
48
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
49
index XXXXXXX..XXXXXXX 100644
50
--- a/target/arm/mve_helper.c
51
+++ b/target/arm/mve_helper.c
52
@@ -XXX,XX +XXX,XX @@ static inline int32_t do_sat_bhw(int64_t val, int64_t min, int64_t max, bool *s)
53
#define DO_UQSUB_H(n, m, s) do_sat_bhw((int64_t)n - m, 0, UINT16_MAX, s)
54
#define DO_UQSUB_W(n, m, s) do_sat_bhw((int64_t)n - m, 0, UINT32_MAX, s)
55
56
+/*
57
+ * For QDMULH and QRDMULH we simplify "double and shift by esize" into
58
+ * "shift by esize-1", adjusting the QRDMULH rounding constant to match.
59
+ */
60
+#define DO_QDMULH_B(n, m, s) do_sat_bhw(((int64_t)n * m) >> 7, \
61
+ INT8_MIN, INT8_MAX, s)
62
+#define DO_QDMULH_H(n, m, s) do_sat_bhw(((int64_t)n * m) >> 15, \
63
+ INT16_MIN, INT16_MAX, s)
64
+#define DO_QDMULH_W(n, m, s) do_sat_bhw(((int64_t)n * m) >> 31, \
65
+ INT32_MIN, INT32_MAX, s)
66
+
67
+#define DO_QRDMULH_B(n, m, s) do_sat_bhw(((int64_t)n * m + (1 << 6)) >> 7, \
68
+ INT8_MIN, INT8_MAX, s)
69
+#define DO_QRDMULH_H(n, m, s) do_sat_bhw(((int64_t)n * m + (1 << 14)) >> 15, \
70
+ INT16_MIN, INT16_MAX, s)
71
+#define DO_QRDMULH_W(n, m, s) do_sat_bhw(((int64_t)n * m + (1 << 30)) >> 31, \
72
+ INT32_MIN, INT32_MAX, s)
73
+
74
#define DO_2OP_SCALAR(OP, ESIZE, TYPE, FN) \
75
void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn, \
76
uint32_t rm) \
77
@@ -XXX,XX +XXX,XX @@ DO_2OP_SAT_SCALAR(vqsubs_scalarb, 1, int8_t, DO_SQSUB_B)
78
DO_2OP_SAT_SCALAR(vqsubs_scalarh, 2, int16_t, DO_SQSUB_H)
79
DO_2OP_SAT_SCALAR(vqsubs_scalarw, 4, int32_t, DO_SQSUB_W)
80
81
+DO_2OP_SAT_SCALAR(vqdmulh_scalarb, 1, int8_t, DO_QDMULH_B)
82
+DO_2OP_SAT_SCALAR(vqdmulh_scalarh, 2, int16_t, DO_QDMULH_H)
83
+DO_2OP_SAT_SCALAR(vqdmulh_scalarw, 4, int32_t, DO_QDMULH_W)
84
+DO_2OP_SAT_SCALAR(vqrdmulh_scalarb, 1, int8_t, DO_QRDMULH_B)
85
+DO_2OP_SAT_SCALAR(vqrdmulh_scalarh, 2, int16_t, DO_QRDMULH_H)
86
+DO_2OP_SAT_SCALAR(vqrdmulh_scalarw, 4, int32_t, DO_QRDMULH_W)
87
+
88
static inline uint32_t do_vbrsrb(uint32_t n, uint32_t m)
89
{
90
m &= 0xff;
91
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
92
index XXXXXXX..XXXXXXX 100644
93
--- a/target/arm/translate-mve.c
94
+++ b/target/arm/translate-mve.c
95
@@ -XXX,XX +XXX,XX @@ DO_2OP_SCALAR(VQADD_S_scalar, vqadds_scalar)
96
DO_2OP_SCALAR(VQADD_U_scalar, vqaddu_scalar)
97
DO_2OP_SCALAR(VQSUB_S_scalar, vqsubs_scalar)
98
DO_2OP_SCALAR(VQSUB_U_scalar, vqsubu_scalar)
99
+DO_2OP_SCALAR(VQDMULH_scalar, vqdmulh_scalar)
100
+DO_2OP_SCALAR(VQRDMULH_scalar, vqrdmulh_scalar)
101
DO_2OP_SCALAR(VBRSR, vbrsr)
102
103
static bool do_long_dual_acc(DisasContext *s, arg_vmlaldav *a,
104
--
30
--
105
2.20.1
31
2.34.1
106
32
107
33
diff view generated by jsdifflib
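(An illustrative aside on the DO_QRDMULH_* macros in the VQDMULH/VQRDMULH patch above: the architectural "double, round, shift right by esize" is folded into "shift right by esize-1" with a halved rounding constant. The standalone check below is not QEMU code, ignores saturation, and assumes the usual arithmetic right shift of negative values; it only demonstrates that the two rounding forms agree for the 16-bit element size.)

#include <stdint.h>
#include <assert.h>

static int64_t qrdmulh16_spec(int16_t n, int16_t m)
{
    /* Architectural form: double the product, add 2^15, keep bits [31:16] */
    return ((int64_t)n * m * 2 + (1 << 15)) >> 16;
}

static int64_t qrdmulh16_folded(int16_t n, int16_t m)
{
    /* Folded form used by the helpers: shift by esize-1 with a 2^14 rounding constant */
    return ((int64_t)n * m + (1 << 14)) >> 15;
}

int main(void)
{
    for (int32_t n = -32768; n < 32768; n += 257) {
        for (int32_t m = -32768; m < 32768; m += 263) {
            assert(qrdmulh16_spec((int16_t)n, (int16_t)m) ==
                   qrdmulh16_folded((int16_t)n, (int16_t)m));
        }
    }
    return 0;
}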
1
Implement MVE VHADD and VHSUB insns, which perform an addition
1
The MPS SCC device has a lot of different flavours for the various
2
or subtraction and then halve the result.
2
different MPS FPGA images, which look mostly similar but have
3
differences in how particular registers are handled. Currently we
4
handle this with a lot of open-coded checks on scc_partno(), but
5
as we add more board types this is getting a bit hard to read.
6
7
Factor out the conditions into some functions which we can
8
give more descriptive names to.
3
9
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
10
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
11
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
12
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
Message-id: 20210617121628.20116-18-peter.maydell@linaro.org
13
Message-id: 20240206132931.38376-7-peter.maydell@linaro.org
7
---
14
---
8
target/arm/helper-mve.h | 14 ++++++++++++++
15
hw/misc/mps2-scc.c | 45 +++++++++++++++++++++++++++++++--------------
9
target/arm/mve.decode | 5 +++++
16
1 file changed, 31 insertions(+), 14 deletions(-)
10
target/arm/mve_helper.c | 25 +++++++++++++++++++++++++
11
target/arm/translate-mve.c | 4 ++++
12
4 files changed, 48 insertions(+)
13
17
14
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
18
diff --git a/hw/misc/mps2-scc.c b/hw/misc/mps2-scc.c
15
index XXXXXXX..XXXXXXX 100644
19
index XXXXXXX..XXXXXXX 100644
16
--- a/target/arm/helper-mve.h
20
--- a/hw/misc/mps2-scc.c
17
+++ b/target/arm/helper-mve.h
21
+++ b/hw/misc/mps2-scc.c
18
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vabdsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
22
@@ -XXX,XX +XXX,XX @@ static int scc_partno(MPS2SCC *s)
19
DEF_HELPER_FLAGS_4(mve_vabdub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
23
return extract32(s->id, 4, 8);
20
DEF_HELPER_FLAGS_4(mve_vabduh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
24
}
21
DEF_HELPER_FLAGS_4(mve_vabduw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
25
22
+
26
+/* Is CFG_REG2 present? */
23
+DEF_HELPER_FLAGS_4(mve_vhaddsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
27
+static bool have_cfg2(MPS2SCC *s)
24
+DEF_HELPER_FLAGS_4(mve_vhaddsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
25
+DEF_HELPER_FLAGS_4(mve_vhaddsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
26
+DEF_HELPER_FLAGS_4(mve_vhaddub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
27
+DEF_HELPER_FLAGS_4(mve_vhadduh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
28
+DEF_HELPER_FLAGS_4(mve_vhadduw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
29
+
30
+DEF_HELPER_FLAGS_4(mve_vhsubsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
31
+DEF_HELPER_FLAGS_4(mve_vhsubsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
32
+DEF_HELPER_FLAGS_4(mve_vhsubsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
33
+DEF_HELPER_FLAGS_4(mve_vhsubub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
34
+DEF_HELPER_FLAGS_4(mve_vhsubuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
35
+DEF_HELPER_FLAGS_4(mve_vhsubuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
36
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
37
index XXXXXXX..XXXXXXX 100644
38
--- a/target/arm/mve.decode
39
+++ b/target/arm/mve.decode
40
@@ -XXX,XX +XXX,XX @@ VMIN_U 111 1 1111 0 . .. ... 0 ... 0 0110 . 1 . 1 ... 0 @2op
41
VABD_S 111 0 1111 0 . .. ... 0 ... 0 0111 . 1 . 0 ... 0 @2op
42
VABD_U 111 1 1111 0 . .. ... 0 ... 0 0111 . 1 . 0 ... 0 @2op
43
44
+VHADD_S 111 0 1111 0 . .. ... 0 ... 0 0000 . 1 . 0 ... 0 @2op
45
+VHADD_U 111 1 1111 0 . .. ... 0 ... 0 0000 . 1 . 0 ... 0 @2op
46
+VHSUB_S 111 0 1111 0 . .. ... 0 ... 0 0010 . 1 . 0 ... 0 @2op
47
+VHSUB_U 111 1 1111 0 . .. ... 0 ... 0 0010 . 1 . 0 ... 0 @2op
48
+
49
# Vector miscellaneous
50
51
VCLS 1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
52
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
53
index XXXXXXX..XXXXXXX 100644
54
--- a/target/arm/mve_helper.c
55
+++ b/target/arm/mve_helper.c
56
@@ -XXX,XX +XXX,XX @@ DO_2OP_U(vminu, DO_MIN)
57
58
DO_2OP_S(vabds, DO_ABD)
59
DO_2OP_U(vabdu, DO_ABD)
60
+
61
+static inline uint32_t do_vhadd_u(uint32_t n, uint32_t m)
62
+{
28
+{
63
+ return ((uint64_t)n + m) >> 1;
29
+ return scc_partno(s) == 0x524 || scc_partno(s) == 0x547;
64
+}
30
+}
65
+
31
+
66
+static inline int32_t do_vhadd_s(int32_t n, int32_t m)
32
+/* Is CFG_REG3 present? */
33
+static bool have_cfg3(MPS2SCC *s)
67
+{
34
+{
68
+ return ((int64_t)n + m) >> 1;
35
+ return scc_partno(s) != 0x524 && scc_partno(s) != 0x547;
69
+}
36
+}
70
+
37
+
71
+static inline uint32_t do_vhsub_u(uint32_t n, uint32_t m)
38
+/* Is CFG_REG5 present? */
39
+static bool have_cfg5(MPS2SCC *s)
72
+{
40
+{
73
+ return ((uint64_t)n - m) >> 1;
41
+ return scc_partno(s) == 0x524 || scc_partno(s) == 0x547;
74
+}
42
+}
75
+
43
+
76
+static inline int32_t do_vhsub_s(int32_t n, int32_t m)
44
+/* Is CFG_REG6 present? */
45
+static bool have_cfg6(MPS2SCC *s)
77
+{
46
+{
78
+ return ((int64_t)n - m) >> 1;
47
+ return scc_partno(s) == 0x524;
79
+}
48
+}
80
+
49
+
81
+DO_2OP_S(vhadds, do_vhadd_s)
50
/* Handle a write via the SYS_CFG channel to the specified function/device.
82
+DO_2OP_U(vhaddu, do_vhadd_u)
51
* Return false on error (reported to guest via SYS_CFGCTRL ERROR bit).
83
+DO_2OP_S(vhsubs, do_vhsub_s)
52
*/
84
+DO_2OP_U(vhsubu, do_vhsub_u)
53
@@ -XXX,XX +XXX,XX @@ static uint64_t mps2_scc_read(void *opaque, hwaddr offset, unsigned size)
85
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
54
r = s->cfg1;
86
index XXXXXXX..XXXXXXX 100644
55
break;
87
--- a/target/arm/translate-mve.c
56
case A_CFG2:
88
+++ b/target/arm/translate-mve.c
57
- if (scc_partno(s) != 0x524 && scc_partno(s) != 0x547) {
89
@@ -XXX,XX +XXX,XX @@ DO_2OP(VMIN_S, vmins)
58
- /* CFG2 reserved on other boards */
90
DO_2OP(VMIN_U, vminu)
59
+ if (!have_cfg2(s)) {
91
DO_2OP(VABD_S, vabds)
60
goto bad_offset;
92
DO_2OP(VABD_U, vabdu)
61
}
93
+DO_2OP(VHADD_S, vhadds)
62
r = s->cfg2;
94
+DO_2OP(VHADD_U, vhaddu)
63
break;
95
+DO_2OP(VHSUB_S, vhsubs)
64
case A_CFG3:
96
+DO_2OP(VHSUB_U, vhsubu)
65
- if (scc_partno(s) == 0x524 || scc_partno(s) == 0x547) {
66
- /* CFG3 reserved on AN524 */
67
+ if (!have_cfg3(s)) {
68
goto bad_offset;
69
}
70
/* These are user-settable DIP switches on the board. We don't
71
@@ -XXX,XX +XXX,XX @@ static uint64_t mps2_scc_read(void *opaque, hwaddr offset, unsigned size)
72
r = s->cfg4;
73
break;
74
case A_CFG5:
75
- if (scc_partno(s) != 0x524 && scc_partno(s) != 0x547) {
76
- /* CFG5 reserved on other boards */
77
+ if (!have_cfg5(s)) {
78
goto bad_offset;
79
}
80
r = s->cfg5;
81
break;
82
case A_CFG6:
83
- if (scc_partno(s) != 0x524) {
84
- /* CFG6 reserved on other boards */
85
+ if (!have_cfg6(s)) {
86
goto bad_offset;
87
}
88
r = s->cfg6;
89
@@ -XXX,XX +XXX,XX @@ static void mps2_scc_write(void *opaque, hwaddr offset, uint64_t value,
90
}
91
break;
92
case A_CFG2:
93
- if (scc_partno(s) != 0x524 && scc_partno(s) != 0x547) {
94
- /* CFG2 reserved on other boards */
95
+ if (!have_cfg2(s)) {
96
goto bad_offset;
97
}
98
/* AN524: QSPI Select signal */
99
s->cfg2 = value;
100
break;
101
case A_CFG5:
102
- if (scc_partno(s) != 0x524 && scc_partno(s) != 0x547) {
103
- /* CFG5 reserved on other boards */
104
+ if (!have_cfg5(s)) {
105
goto bad_offset;
106
}
107
/* AN524: ACLK frequency in Hz */
108
s->cfg5 = value;
109
break;
110
case A_CFG6:
111
- if (scc_partno(s) != 0x524) {
112
- /* CFG6 reserved on other boards */
113
+ if (!have_cfg6(s)) {
114
goto bad_offset;
115
}
116
/* AN524: Clock divider for BRAM */
97
--
117
--
98
2.20.1
118
2.34.1
99
119
100
120
diff view generated by jsdifflib
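(A short illustrative note on the do_vhadd_u()/do_vhsub_u() helpers in the VHADD/VHSUB patch above: the operands are widened to 64 bits before the add or subtract, so the carry or borrow is preserved, and the halving then happens on the wide intermediate. Standalone sketch only, with invented function names; not the QEMU code.)

#include <stdint.h>
#include <stdio.h>

/* Halving add: widen first so the intermediate sum cannot wrap */
static uint32_t vhadd_u32(uint32_t n, uint32_t m)
{
    return (uint32_t)(((uint64_t)n + m) >> 1);
}

/* Halving subtract: the signed 64-bit difference keeps the borrow
 * (assumes arithmetic right shift of negative values, as the helpers do)
 */
static int32_t vhsub_s32(int32_t n, int32_t m)
{
    return (int32_t)(((int64_t)n - m) >> 1);
}

int main(void)
{
    /* 0xffffffff + 0xffffffff wraps in 32 bits, but halves correctly when widened */
    printf("0x%08x\n", vhadd_u32(0xffffffffu, 0xffffffffu));
    /* INT32_MIN - 1 also needs the wide intermediate before halving */
    printf("%d\n", (int)vhsub_s32(INT32_MIN, 1));
    return 0;
}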
1
Implement the MVE VQADD and VQSUB insns, which perform saturating
1
The MPS2 SCC device is broadly the same for all FPGA images, but has
2
addition or subtraction of a scalar to/from each element. Note that individual bytes of
2
minor differences in the behaviour of the CFG registers depending on
3
each result element are used or discarded according to the predicate
3
the image. In many cases we don't really care about the functionality
4
mask, but FPSCR.QC is only set if the predicate mask for the lowest
4
controlled by these registers and a reads-as-written or similar
5
byte of the element is set.
5
behaviour is sufficient for the moment.
6
7
For the AN536 the required behaviour is:
8
9
* A_CFG0 has CPU reset and halt bits
10
- implement as reads-as-written for the moment
11
* A_CFG1 has flash or ATCM address 0 remap handling
12
- QEMU doesn't model this; implement as reads-as-written
13
* A_CFG2 has QSPI select (like AN524)
14
- implemented (no behaviour, as with AN524)
15
* A_CFG3 is MCC_MSB_ADDR "additional MCC addressing bits"
16
- QEMU doesn't care about these, so use the existing
17
RAZ behaviour for convenience
18
* A_CFG4 is board rev (like all other images)
19
- no change needed
20
* A_CFG5 is the ACLK frequency in Hz (like AN524)
21
- implemented as reads-as-written, as for other boards
22
* A_CFG6 is core 0 vector table base address
23
- implemented as reads-as-written for the moment
24
* A_CFG7 is core 1 vector table base address
25
- implemented as reads-as-written for the moment
26
27
Make the changes necessary for this; leave TODO comments where
28
appropriate to indicate where we might want to come back and
29
implement things like CPU reset.
30
31
The other aspects of the device specific to this FPGA image (like the
32
values of the board ID and similar registers) will be set via the
33
device's qdev properties.
6
34
7
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
35
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
36
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
9
Message-id: 20210617121628.20116-28-peter.maydell@linaro.org
37
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
38
Message-id: 20240206132931.38376-8-peter.maydell@linaro.org
10
---
39
---
11
target/arm/helper-mve.h | 16 ++++++++++
40
include/hw/misc/mps2-scc.h | 1 +
12
target/arm/mve.decode | 5 +++
41
hw/misc/mps2-scc.c | 101 +++++++++++++++++++++++++++++++++----
13
target/arm/mve_helper.c | 62 ++++++++++++++++++++++++++++++++++++++
42
2 files changed, 92 insertions(+), 10 deletions(-)
14
target/arm/translate-mve.c | 4 +++
43
15
4 files changed, 87 insertions(+)
44
diff --git a/include/hw/misc/mps2-scc.h b/include/hw/misc/mps2-scc.h
16
17
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
18
index XXXXXXX..XXXXXXX 100644
45
index XXXXXXX..XXXXXXX 100644
19
--- a/target/arm/helper-mve.h
46
--- a/include/hw/misc/mps2-scc.h
20
+++ b/target/arm/helper-mve.h
47
+++ b/include/hw/misc/mps2-scc.h
21
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vhsubu_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
48
@@ -XXX,XX +XXX,XX @@ struct MPS2SCC {
22
DEF_HELPER_FLAGS_4(mve_vhsubu_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
49
uint32_t cfg4;
23
DEF_HELPER_FLAGS_4(mve_vhsubu_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
50
uint32_t cfg5;
24
51
uint32_t cfg6;
25
+DEF_HELPER_FLAGS_4(mve_vqadds_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
52
+ uint32_t cfg7;
26
+DEF_HELPER_FLAGS_4(mve_vqadds_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
53
uint32_t cfgdata_rtn;
27
+DEF_HELPER_FLAGS_4(mve_vqadds_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
54
uint32_t cfgdata_out;
28
+
55
uint32_t cfgctrl;
29
+DEF_HELPER_FLAGS_4(mve_vqaddu_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
56
diff --git a/hw/misc/mps2-scc.c b/hw/misc/mps2-scc.c
30
+DEF_HELPER_FLAGS_4(mve_vqaddu_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
31
+DEF_HELPER_FLAGS_4(mve_vqaddu_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
32
+
33
+DEF_HELPER_FLAGS_4(mve_vqsubs_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
34
+DEF_HELPER_FLAGS_4(mve_vqsubs_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
35
+DEF_HELPER_FLAGS_4(mve_vqsubs_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
36
+
37
+DEF_HELPER_FLAGS_4(mve_vqsubu_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
38
+DEF_HELPER_FLAGS_4(mve_vqsubu_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
39
+DEF_HELPER_FLAGS_4(mve_vqsubu_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
40
+
41
DEF_HELPER_FLAGS_4(mve_vbrsrb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
42
DEF_HELPER_FLAGS_4(mve_vbrsrh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
43
DEF_HELPER_FLAGS_4(mve_vbrsrw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
44
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
45
index XXXXXXX..XXXXXXX 100644
57
index XXXXXXX..XXXXXXX 100644
46
--- a/target/arm/mve.decode
58
--- a/hw/misc/mps2-scc.c
47
+++ b/target/arm/mve.decode
59
+++ b/hw/misc/mps2-scc.c
48
@@ -XXX,XX +XXX,XX @@ VHADD_S_scalar 1110 1110 0 . .. ... 0 ... 0 1111 . 100 .... @2scalar
60
@@ -XXX,XX +XXX,XX @@ REG32(CFG3, 0xc)
49
VHADD_U_scalar 1111 1110 0 . .. ... 0 ... 0 1111 . 100 .... @2scalar
61
REG32(CFG4, 0x10)
50
VHSUB_S_scalar 1110 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
62
REG32(CFG5, 0x14)
51
VHSUB_U_scalar 1111 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
63
REG32(CFG6, 0x18)
52
+
64
+REG32(CFG7, 0x1c)
53
+VQADD_S_scalar 1110 1110 0 . .. ... 0 ... 0 1111 . 110 .... @2scalar
65
REG32(CFGDATA_RTN, 0xa0)
54
+VQADD_U_scalar 1111 1110 0 . .. ... 0 ... 0 1111 . 110 .... @2scalar
66
REG32(CFGDATA_OUT, 0xa4)
55
+VQSUB_S_scalar 1110 1110 0 . .. ... 0 ... 1 1111 . 110 .... @2scalar
67
REG32(CFGCTRL, 0xa8)
56
+VQSUB_U_scalar 1111 1110 0 . .. ... 0 ... 1 1111 . 110 .... @2scalar
68
@@ -XXX,XX +XXX,XX @@ static int scc_partno(MPS2SCC *s)
57
VBRSR 1111 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
69
/* Is CFG_REG2 present? */
58
70
static bool have_cfg2(MPS2SCC *s)
59
# Predicate operations
71
{
60
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
72
- return scc_partno(s) == 0x524 || scc_partno(s) == 0x547;
61
index XXXXXXX..XXXXXXX 100644
73
+ return scc_partno(s) == 0x524 || scc_partno(s) == 0x547 ||
62
--- a/target/arm/mve_helper.c
74
+ scc_partno(s) == 0x536;
63
+++ b/target/arm/mve_helper.c
75
}
64
@@ -XXX,XX +XXX,XX @@ DO_2OP_U(vhaddu, do_vhadd_u)
76
65
DO_2OP_S(vhsubs, do_vhsub_s)
77
/* Is CFG_REG3 present? */
66
DO_2OP_U(vhsubu, do_vhsub_u)
78
static bool have_cfg3(MPS2SCC *s)
67
79
{
68
+static inline int32_t do_sat_bhw(int64_t val, int64_t min, int64_t max, bool *s)
80
- return scc_partno(s) != 0x524 && scc_partno(s) != 0x547;
69
+{
81
+ return scc_partno(s) != 0x524 && scc_partno(s) != 0x547 &&
70
+ if (val > max) {
82
+ scc_partno(s) != 0x536;
71
+ *s = true;
83
}
72
+ return max;
84
73
+ } else if (val < min) {
85
/* Is CFG_REG5 present? */
74
+ *s = true;
86
static bool have_cfg5(MPS2SCC *s)
75
+ return min;
87
{
88
- return scc_partno(s) == 0x524 || scc_partno(s) == 0x547;
89
+ return scc_partno(s) == 0x524 || scc_partno(s) == 0x547 ||
90
+ scc_partno(s) == 0x536;
91
}
92
93
/* Is CFG_REG6 present? */
94
static bool have_cfg6(MPS2SCC *s)
95
{
96
- return scc_partno(s) == 0x524;
97
+ return scc_partno(s) == 0x524 || scc_partno(s) == 0x536;
98
+}
99
+
100
+/* Is CFG_REG7 present? */
101
+static bool have_cfg7(MPS2SCC *s)
102
+{
103
+ return scc_partno(s) == 0x536;
104
+}
105
+
106
+/* Does CFG_REG0 drive the 'remap' GPIO output? */
107
+static bool cfg0_is_remap(MPS2SCC *s)
108
+{
109
+ return scc_partno(s) != 0x536;
110
+}
111
+
112
+/* Is CFG_REG1 driving a set of LEDs? */
113
+static bool cfg1_is_leds(MPS2SCC *s)
114
+{
115
+ return scc_partno(s) != 0x536;
116
}
117
118
/* Handle a write via the SYS_CFG channel to the specified function/device.
119
@@ -XXX,XX +XXX,XX @@ static uint64_t mps2_scc_read(void *opaque, hwaddr offset, unsigned size)
120
if (!have_cfg3(s)) {
121
goto bad_offset;
122
}
123
- /* These are user-settable DIP switches on the board. We don't
124
+ /*
125
+ * These are user-settable DIP switches on the board. We don't
126
* model that, so just return zeroes.
127
+ *
128
+ * TODO: for AN536 this is MCC_MSB_ADDR "additional MCC addressing
129
+ * bits". These change which part of the DDR4 the motherboard
130
+ * configuration controller can see in its memory map (see the
131
+ * appnote section 2.4). QEMU doesn't model the MCC at all, so these
132
+ * bits are not interesting to us; read-as-zero is as good as anything
133
+ * else.
134
*/
135
r = 0;
136
break;
137
@@ -XXX,XX +XXX,XX @@ static uint64_t mps2_scc_read(void *opaque, hwaddr offset, unsigned size)
138
}
139
r = s->cfg6;
140
break;
141
+ case A_CFG7:
142
+ if (!have_cfg7(s)) {
143
+ goto bad_offset;
144
+ }
145
+ r = s->cfg7;
146
+ break;
147
case A_CFGDATA_RTN:
148
r = s->cfgdata_rtn;
149
break;
150
@@ -XXX,XX +XXX,XX @@ static void mps2_scc_write(void *opaque, hwaddr offset, uint64_t value,
151
* we always reflect bit 0 in the 'remap' GPIO output line,
152
* and let the board wire it up or not as it chooses.
153
* TODO on some boards bit 1 is CPU_WAIT.
154
+ *
155
+ * TODO: on the AN536 this register controls reset and halt
156
+ * for both CPUs. For the moment we don't implement this, so the
157
+ * register just reads as written.
158
*/
159
s->cfg0 = value;
160
- qemu_set_irq(s->remap, s->cfg0 & 1);
161
+ if (cfg0_is_remap(s)) {
162
+ qemu_set_irq(s->remap, s->cfg0 & 1);
163
+ }
164
break;
165
case A_CFG1:
166
s->cfg1 = value;
167
- for (size_t i = 0; i < ARRAY_SIZE(s->led); i++) {
168
- led_set_state(s->led[i], extract32(value, i, 1));
169
+ /*
170
+ * On most boards this register drives LEDs.
171
+ *
172
+ * TODO: for AN536 this controls whether flash and ATCM are
173
+ * enabled or disabled on reset. QEMU doesn't model this, and
174
+ * always wires up RAM in the ATCM area and ROM in the flash area.
175
+ */
176
+ if (cfg1_is_leds(s)) {
177
+ for (size_t i = 0; i < ARRAY_SIZE(s->led); i++) {
178
+ led_set_state(s->led[i], extract32(value, i, 1));
179
+ }
180
}
181
break;
182
case A_CFG2:
183
if (!have_cfg2(s)) {
184
goto bad_offset;
185
}
186
- /* AN524: QSPI Select signal */
187
+ /* AN524, AN536: QSPI Select signal */
188
s->cfg2 = value;
189
break;
190
case A_CFG5:
191
if (!have_cfg5(s)) {
192
goto bad_offset;
193
}
194
- /* AN524: ACLK frequency in Hz */
195
+ /* AN524, AN536: ACLK frequency in Hz */
196
s->cfg5 = value;
197
break;
198
case A_CFG6:
199
@@ -XXX,XX +XXX,XX @@ static void mps2_scc_write(void *opaque, hwaddr offset, uint64_t value,
200
goto bad_offset;
201
}
202
/* AN524: Clock divider for BRAM */
203
+ /* AN536: Core 0 vector table base address */
204
+ s->cfg6 = value;
205
+ break;
206
+ case A_CFG7:
207
+ if (!have_cfg7(s)) {
208
+ goto bad_offset;
209
+ }
210
+ /* AN536: Core 1 vector table base address */
211
s->cfg6 = value;
212
break;
213
case A_CFGDATA_OUT:
214
@@ -XXX,XX +XXX,XX @@ static void mps2_scc_finalize(Object *obj)
215
g_free(s->oscclk_reset);
216
}
217
218
+static bool cfg7_needed(void *opaque)
219
+{
220
+ MPS2SCC *s = opaque;
221
+
222
+ return have_cfg7(s);
223
+}
224
+
225
+static const VMStateDescription vmstate_cfg7 = {
226
+ .name = "mps2-scc/cfg7",
227
+ .version_id = 1,
228
+ .minimum_version_id = 1,
229
+ .needed = cfg7_needed,
230
+ .fields = (const VMStateField[]) {
231
+ VMSTATE_UINT32(cfg7, MPS2SCC),
232
+ VMSTATE_END_OF_LIST()
76
+ }
233
+ }
77
+ return val;
234
+};
78
+}
235
+
79
+
236
static const VMStateDescription mps2_scc_vmstate = {
80
+#define DO_SQADD_B(n, m, s) do_sat_bhw((int64_t)n + m, INT8_MIN, INT8_MAX, s)
237
.name = "mps2-scc",
81
+#define DO_SQADD_H(n, m, s) do_sat_bhw((int64_t)n + m, INT16_MIN, INT16_MAX, s)
238
.version_id = 3,
82
+#define DO_SQADD_W(n, m, s) do_sat_bhw((int64_t)n + m, INT32_MIN, INT32_MAX, s)
239
@@ -XXX,XX +XXX,XX @@ static const VMStateDescription mps2_scc_vmstate = {
83
+
240
VMSTATE_VARRAY_UINT32(oscclk, MPS2SCC, num_oscclk,
84
+#define DO_UQADD_B(n, m, s) do_sat_bhw((int64_t)n + m, 0, UINT8_MAX, s)
241
0, vmstate_info_uint32, uint32_t),
85
+#define DO_UQADD_H(n, m, s) do_sat_bhw((int64_t)n + m, 0, UINT16_MAX, s)
242
VMSTATE_END_OF_LIST()
86
+#define DO_UQADD_W(n, m, s) do_sat_bhw((int64_t)n + m, 0, UINT32_MAX, s)
243
+ },
87
+
244
+ .subsections = (const VMStateDescription * const []) {
88
+#define DO_SQSUB_B(n, m, s) do_sat_bhw((int64_t)n - m, INT8_MIN, INT8_MAX, s)
245
+ &vmstate_cfg7,
89
+#define DO_SQSUB_H(n, m, s) do_sat_bhw((int64_t)n - m, INT16_MIN, INT16_MAX, s)
246
+ NULL
90
+#define DO_SQSUB_W(n, m, s) do_sat_bhw((int64_t)n - m, INT32_MIN, INT32_MAX, s)
91
+
92
+#define DO_UQSUB_B(n, m, s) do_sat_bhw((int64_t)n - m, 0, UINT8_MAX, s)
93
+#define DO_UQSUB_H(n, m, s) do_sat_bhw((int64_t)n - m, 0, UINT16_MAX, s)
94
+#define DO_UQSUB_W(n, m, s) do_sat_bhw((int64_t)n - m, 0, UINT32_MAX, s)
95
96
#define DO_2OP_SCALAR(OP, ESIZE, TYPE, FN) \
97
void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn, \
98
@@ -XXX,XX +XXX,XX @@ DO_2OP_U(vhsubu, do_vhsub_u)
99
mve_advance_vpt(env); \
100
}
247
}
101
248
};
102
+#define DO_2OP_SAT_SCALAR(OP, ESIZE, TYPE, FN) \
249
103
+ void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn, \
104
+ uint32_t rm) \
105
+ { \
106
+ TYPE *d = vd, *n = vn; \
107
+ TYPE m = rm; \
108
+ uint16_t mask = mve_element_mask(env); \
109
+ unsigned e; \
110
+ bool qc = false; \
111
+ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
112
+ bool sat = false; \
113
+ mergemask(&d[H##ESIZE(e)], FN(n[H##ESIZE(e)], m, &sat), \
114
+ mask); \
115
+ qc |= sat & mask & 1; \
116
+ } \
117
+ if (qc) { \
118
+ env->vfp.qc[0] = qc; \
119
+ } \
120
+ mve_advance_vpt(env); \
121
+ }
122
+
123
/* provide unsigned 2-op scalar helpers for all sizes */
124
#define DO_2OP_SCALAR_U(OP, FN) \
125
DO_2OP_SCALAR(OP##b, 1, uint8_t, FN) \
126
@@ -XXX,XX +XXX,XX @@ DO_2OP_SCALAR_U(vhaddu_scalar, do_vhadd_u)
127
DO_2OP_SCALAR_S(vhsubs_scalar, do_vhsub_s)
128
DO_2OP_SCALAR_U(vhsubu_scalar, do_vhsub_u)
129
130
+DO_2OP_SAT_SCALAR(vqaddu_scalarb, 1, uint8_t, DO_UQADD_B)
131
+DO_2OP_SAT_SCALAR(vqaddu_scalarh, 2, uint16_t, DO_UQADD_H)
132
+DO_2OP_SAT_SCALAR(vqaddu_scalarw, 4, uint32_t, DO_UQADD_W)
133
+DO_2OP_SAT_SCALAR(vqadds_scalarb, 1, int8_t, DO_SQADD_B)
134
+DO_2OP_SAT_SCALAR(vqadds_scalarh, 2, int16_t, DO_SQADD_H)
135
+DO_2OP_SAT_SCALAR(vqadds_scalarw, 4, int32_t, DO_SQADD_W)
136
+
137
+DO_2OP_SAT_SCALAR(vqsubu_scalarb, 1, uint8_t, DO_UQSUB_B)
138
+DO_2OP_SAT_SCALAR(vqsubu_scalarh, 2, uint16_t, DO_UQSUB_H)
139
+DO_2OP_SAT_SCALAR(vqsubu_scalarw, 4, uint32_t, DO_UQSUB_W)
140
+DO_2OP_SAT_SCALAR(vqsubs_scalarb, 1, int8_t, DO_SQSUB_B)
141
+DO_2OP_SAT_SCALAR(vqsubs_scalarh, 2, int16_t, DO_SQSUB_H)
142
+DO_2OP_SAT_SCALAR(vqsubs_scalarw, 4, int32_t, DO_SQSUB_W)
143
+
144
static inline uint32_t do_vbrsrb(uint32_t n, uint32_t m)
145
{
146
m &= 0xff;
147
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
148
index XXXXXXX..XXXXXXX 100644
149
--- a/target/arm/translate-mve.c
150
+++ b/target/arm/translate-mve.c
151
@@ -XXX,XX +XXX,XX @@ DO_2OP_SCALAR(VHADD_S_scalar, vhadds_scalar)
152
DO_2OP_SCALAR(VHADD_U_scalar, vhaddu_scalar)
153
DO_2OP_SCALAR(VHSUB_S_scalar, vhsubs_scalar)
154
DO_2OP_SCALAR(VHSUB_U_scalar, vhsubu_scalar)
155
+DO_2OP_SCALAR(VQADD_S_scalar, vqadds_scalar)
156
+DO_2OP_SCALAR(VQADD_U_scalar, vqaddu_scalar)
157
+DO_2OP_SCALAR(VQSUB_S_scalar, vqsubs_scalar)
158
+DO_2OP_SCALAR(VQSUB_U_scalar, vqsubu_scalar)
159
DO_2OP_SCALAR(VBRSR, vbrsr)
160
161
static bool do_long_dual_acc(DisasContext *s, arg_vmlaldav *a,
162
--
250
--
163
2.20.1
251
2.34.1
164
252
165
253
diff view generated by jsdifflib
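(One more illustrative aside, this time on the do_sat_bhw() clamp that the DO_SQADD_*/DO_UQSUB_* macros in the VQADD/VQSUB scalar patch above rely on: each per-size saturating op computes a 64-bit intermediate and clamps it to the destination range, recording a flag that later feeds FPSCR.QC. Standalone sketch with invented names and arbitrary test values; not the QEMU code.)

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

/* Clamp a 64-bit intermediate into [min, max] and record whether it saturated */
static int64_t sat_clamp(int64_t val, int64_t min, int64_t max, bool *sat)
{
    if (val > max) {
        *sat = true;
        return max;
    }
    if (val < min) {
        *sat = true;
        return min;
    }
    return val;
}

int main(void)
{
    bool sat = false;

    /* Signed byte saturating add: 100 + 100 = 200 clamps to INT8_MAX (127) */
    int64_t r = sat_clamp((int64_t)100 + 100, INT8_MIN, INT8_MAX, &sat);
    printf("%lld sat=%d\n", (long long)r, sat);

    /* Unsigned byte saturating subtract: 10 - 20 clamps to 0 */
    sat = false;
    r = sat_clamp((int64_t)10 - 20, 0, UINT8_MAX, &sat);
    printf("%lld sat=%d\n", (long long)r, sat);
    return 0;
}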
1
Implement the forms of the MVE VLDR and VSTR insns which perform
1
The AN536 is another FPGA image for the MPS3 development board. Unlike
2
non-widening loads and stores of bytes, halfwords or words between
2
the FPGA images we already model, this board uses a Cortex-R
3
memory and vector elements of the same width (encodings T5, T6, T7).
3
family CPU, and it does not use anything like the M-profile
4
4
"Subsystem for Embedded" SoC-equivalent that we model in hw/arm/armsse.c.
5
(At the moment we know for MVE and M-profile in general that
5
It's therefore more convenient for us to model it as a completely
6
vfp_access_check() can never return false, but we include the
6
separate C file.
7
conventional return-true-on-failure check for consistency
7
8
with non-M-profile translation code.)
8
This commit adds the basic skeleton of the board model, and the
9
code to create all the RAM and ROM. We assume that we're probably
10
going to want to add more images in future, so use the same
11
base class/subclass setup that mps2-tz.c uses, even though at
12
the moment there's only a single subclass.
13
14
Following commits will add the CPUs and the peripherals.
9
15
10
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
16
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
11
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
17
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
12
Message-id: 20210617121628.20116-2-peter.maydell@linaro.org
18
Message-id: 20240206132931.38376-9-peter.maydell@linaro.org
13
---
19
---
14
target/arm/{translate-mve.c => helper-mve.h} | 19 +-
20
MAINTAINERS | 3 +-
15
target/arm/helper.h | 2 +
21
configs/devices/arm-softmmu/default.mak | 1 +
16
target/arm/internals.h | 11 ++
22
hw/arm/mps3r.c | 239 ++++++++++++++++++++++++
17
target/arm/mve.decode | 22 +++
23
hw/arm/Kconfig | 5 +
18
target/arm/mve_helper.c | 172 +++++++++++++++++++
24
hw/arm/meson.build | 1 +
19
target/arm/translate-mve.c | 119 +++++++++++++
25
5 files changed, 248 insertions(+), 1 deletion(-)
20
target/arm/meson.build | 1 +
26
create mode 100644 hw/arm/mps3r.c
21
7 files changed, 334 insertions(+), 12 deletions(-)
27
22
copy target/arm/{translate-mve.c => helper-mve.h} (61%)
28
diff --git a/MAINTAINERS b/MAINTAINERS
23
create mode 100644 target/arm/mve_helper.c
24
25
diff --git a/target/arm/translate-mve.c b/target/arm/helper-mve.h
26
similarity index 61%
27
copy from target/arm/translate-mve.c
28
copy to target/arm/helper-mve.h
29
index XXXXXXX..XXXXXXX 100644
29
index XXXXXXX..XXXXXXX 100644
30
--- a/target/arm/translate-mve.c
30
--- a/MAINTAINERS
31
+++ b/target/arm/helper-mve.h
31
+++ b/MAINTAINERS
32
@@ -XXX,XX +XXX,XX @@
32
@@ -XXX,XX +XXX,XX @@ F: include/hw/misc/imx7_*.h
33
/*
33
F: hw/pci-host/designware.c
34
- * ARM translation: M-profile MVE instructions
34
F: include/hw/pci-host/designware.h
35
+ * M-profile MVE specific helper definitions
35
36
*
36
-MPS2
37
* Copyright (c) 2021 Linaro, Ltd.
37
+MPS2 / MPS3
38
*
38
M: Peter Maydell <peter.maydell@linaro.org>
39
@@ -XXX,XX +XXX,XX @@
39
L: qemu-arm@nongnu.org
40
* You should have received a copy of the GNU Lesser General Public
40
S: Maintained
41
* License along with this library; if not, see <http://www.gnu.org/licenses/>.
41
F: hw/arm/mps2.c
42
*/
42
F: hw/arm/mps2-tz.c
43
-
43
+F: hw/arm/mps3r.c
44
-#include "qemu/osdep.h"
44
F: hw/misc/mps2-*.c
45
-#include "tcg/tcg-op.h"
45
F: include/hw/misc/mps2-*.h
46
-#include "tcg/tcg-op-gvec.h"
46
F: hw/arm/armsse.c
47
-#include "exec/exec-all.h"
47
diff --git a/configs/devices/arm-softmmu/default.mak b/configs/devices/arm-softmmu/default.mak
48
-#include "exec/gen-icount.h"
49
-#include "translate.h"
50
-#include "translate-a32.h"
51
-
52
-/* Include the generated decoder */
53
-#include "decode-mve.c.inc"
54
+DEF_HELPER_FLAGS_3(mve_vldrb, TCG_CALL_NO_WG, void, env, ptr, i32)
55
+DEF_HELPER_FLAGS_3(mve_vldrh, TCG_CALL_NO_WG, void, env, ptr, i32)
56
+DEF_HELPER_FLAGS_3(mve_vldrw, TCG_CALL_NO_WG, void, env, ptr, i32)
57
+DEF_HELPER_FLAGS_3(mve_vstrb, TCG_CALL_NO_WG, void, env, ptr, i32)
58
+DEF_HELPER_FLAGS_3(mve_vstrh, TCG_CALL_NO_WG, void, env, ptr, i32)
59
+DEF_HELPER_FLAGS_3(mve_vstrw, TCG_CALL_NO_WG, void, env, ptr, i32)
60
diff --git a/target/arm/helper.h b/target/arm/helper.h
61
index XXXXXXX..XXXXXXX 100644
48
index XXXXXXX..XXXXXXX 100644
62
--- a/target/arm/helper.h
49
--- a/configs/devices/arm-softmmu/default.mak
63
+++ b/target/arm/helper.h
50
+++ b/configs/devices/arm-softmmu/default.mak
64
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(gvec_bfmlal_idx, TCG_CALL_NO_RWG,
51
@@ -XXX,XX +XXX,XX @@ CONFIG_ARM_VIRT=y
65
#include "helper-a64.h"
52
# CONFIG_INTEGRATOR=n
66
#include "helper-sve.h"
53
# CONFIG_FSL_IMX31=n
67
#endif
54
# CONFIG_MUSICPAL=n
68
+
55
+# CONFIG_MPS3R=n
69
+#include "helper-mve.h"
56
# CONFIG_MUSCA=n
70
diff --git a/target/arm/internals.h b/target/arm/internals.h
57
# CONFIG_CHEETAH=n
71
index XXXXXXX..XXXXXXX 100644
58
# CONFIG_SX1=n
72
--- a/target/arm/internals.h
59
diff --git a/hw/arm/mps3r.c b/hw/arm/mps3r.c
73
+++ b/target/arm/internals.h
74
@@ -XXX,XX +XXX,XX @@ static inline uint64_t useronly_maybe_clean_ptr(uint32_t desc, uint64_t ptr)
75
return ptr;
76
}
77
78
+/* Values for M-profile PSR.ECI for MVE insns */
79
+enum MVEECIState {
80
+ ECI_NONE = 0, /* No completed beats */
81
+ ECI_A0 = 1, /* Completed: A0 */
82
+ ECI_A0A1 = 2, /* Completed: A0, A1 */
83
+ /* 3 is reserved */
84
+ ECI_A0A1A2 = 4, /* Completed: A0, A1, A2 */
85
+ ECI_A0A1A2B0 = 5, /* Completed: A0, A1, A2, B0 */
86
+ /* All other values reserved */
87
+};
88
+
89
#endif
90
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
91
index XXXXXXX..XXXXXXX 100644
92
--- a/target/arm/mve.decode
93
+++ b/target/arm/mve.decode
94
@@ -XXX,XX +XXX,XX @@
95
#
96
# This file is processed by scripts/decodetree.py
97
#
98
+
99
+%qd 22:1 13:3
100
+
101
+&vldr_vstr rn qd imm p a w size l
102
+
103
+@vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd
104
+
105
+# Vector loads and stores
106
+
107
+# Non-widening loads/stores (P=0 W=0 is 'related encoding')
108
+VLDR_VSTR 1110110 0 a:1 . 1 . .... ... 111100 ....... @vldr_vstr \
109
+ size=0 p=0 w=1
110
+VLDR_VSTR 1110110 0 a:1 . 1 . .... ... 111101 ....... @vldr_vstr \
111
+ size=1 p=0 w=1
112
+VLDR_VSTR 1110110 0 a:1 . 1 . .... ... 111110 ....... @vldr_vstr \
113
+ size=2 p=0 w=1
114
+VLDR_VSTR 1110110 1 a:1 . w:1 . .... ... 111100 ....... @vldr_vstr \
115
+ size=0 p=1
116
+VLDR_VSTR 1110110 1 a:1 . w:1 . .... ... 111101 ....... @vldr_vstr \
117
+ size=1 p=1
118
+VLDR_VSTR 1110110 1 a:1 . w:1 . .... ... 111110 ....... @vldr_vstr \
119
+ size=2 p=1
120
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
121
new file mode 100644
60
new file mode 100644
122
index XXXXXXX..XXXXXXX
61
index XXXXXXX..XXXXXXX
123
--- /dev/null
62
--- /dev/null
124
+++ b/target/arm/mve_helper.c
63
+++ b/hw/arm/mps3r.c
125
@@ -XXX,XX +XXX,XX @@
64
@@ -XXX,XX +XXX,XX @@
126
+/*
65
+/*
127
+ * M-profile MVE Operations
66
+ * Arm MPS3 board emulation for Cortex-R-based FPGA images.
67
+ * (For M-profile images see mps2.c and mps2tz.c.)
128
+ *
68
+ *
129
+ * Copyright (c) 2021 Linaro, Ltd.
69
+ * Copyright (c) 2017 Linaro Limited
70
+ * Written by Peter Maydell
130
+ *
71
+ *
131
+ * This library is free software; you can redistribute it and/or
72
+ * This program is free software; you can redistribute it and/or modify
132
+ * modify it under the terms of the GNU Lesser General Public
73
+ * it under the terms of the GNU General Public License version 2 or
133
+ * License as published by the Free Software Foundation; either
74
+ * (at your option) any later version.
134
+ * version 2.1 of the License, or (at your option) any later version.
75
+ */
76
+
77
+/*
78
+ * The MPS3 is an FPGA based dev board. This file handles FPGA images
79
+ * which use the Cortex-R CPUs. We model these separately from the
80
+ * M-profile images, because on M-profile the FPGA image is based on
81
+ * a "Subsystem for Embedded" which is similar to an SoC, whereas
82
+ * the R-profile FPGA images don't have that abstraction layer.
135
+ *
83
+ *
136
+ * This library is distributed in the hope that it will be useful,
84
+ * We model the following FPGA images here:
137
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
85
+ * "mps3-an536" -- dual Cortex-R52 as documented in Arm Application Note AN536
138
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
139
+ * Lesser General Public License for more details.
140
+ *
86
+ *
141
+ * You should have received a copy of the GNU Lesser General Public
87
+ * Application Note AN536:
142
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
88
+ * https://developer.arm.com/documentation/dai0536/latest/
143
+ */
89
+ */
144
+
90
+
145
+#include "qemu/osdep.h"
91
+#include "qemu/osdep.h"
92
+#include "qemu/units.h"
93
+#include "qapi/error.h"
94
+#include "exec/address-spaces.h"
146
+#include "cpu.h"
95
+#include "cpu.h"
147
+#include "internals.h"
96
+#include "hw/boards.h"
148
+#include "vec_internal.h"
97
+#include "hw/arm/boot.h"
149
+#include "exec/helper-proto.h"
98
+
150
+#include "exec/cpu_ldst.h"
99
+/* Define the layout of RAM and ROM in a board */
151
+#include "exec/exec-all.h"
100
+typedef struct RAMInfo {
152
+
101
+ const char *name;
153
+static uint16_t mve_element_mask(CPUARMState *env)
102
+ hwaddr base;
103
+ hwaddr size;
104
+ int mrindex; /* index into rams[]; -1 for the system RAM block */
105
+ int flags;
106
+} RAMInfo;
107
+
108
+/*
109
+ * The MPS3 DDR is 3GiB, but on a 32-bit host QEMU doesn't permit
110
+ * emulation of that much guest RAM, so artificially make it smaller.
111
+ */
112
+#if HOST_LONG_BITS == 32
113
+#define MPS3_DDR_SIZE (1 * GiB)
114
+#else
115
+#define MPS3_DDR_SIZE (3 * GiB)
116
+#endif
117
+
118
+/*
119
+ * Flag values:
120
+ * IS_MAIN: this is the main machine RAM
121
+ * IS_ROM: this area is read-only
122
+ */
123
+#define IS_MAIN 1
124
+#define IS_ROM 2
125
+
126
+#define MPS3R_RAM_MAX 9
127
+
128
+typedef enum MPS3RFPGAType {
129
+ FPGA_AN536,
130
+} MPS3RFPGAType;
131
+
132
+struct MPS3RMachineClass {
133
+ MachineClass parent;
134
+ MPS3RFPGAType fpga_type;
135
+ const RAMInfo *raminfo;
136
+};
137
+
138
+struct MPS3RMachineState {
139
+ MachineState parent;
140
+ MemoryRegion ram[MPS3R_RAM_MAX];
141
+};
142
+
143
+#define TYPE_MPS3R_MACHINE "mps3r"
144
+#define TYPE_MPS3R_AN536_MACHINE MACHINE_TYPE_NAME("mps3-an536")
145
+
146
+OBJECT_DECLARE_TYPE(MPS3RMachineState, MPS3RMachineClass, MPS3R_MACHINE)
147
+
148
+static const RAMInfo an536_raminfo[] = {
149
+ {
150
+ .name = "ATCM",
151
+ .base = 0x00000000,
152
+ .size = 0x00008000,
153
+ .mrindex = 0,
154
+ }, {
155
+ /* We model the QSPI flash as simple ROM for now */
156
+ .name = "QSPI",
157
+ .base = 0x08000000,
158
+ .size = 0x00800000,
159
+ .flags = IS_ROM,
160
+ .mrindex = 1,
161
+ }, {
162
+ .name = "BRAM",
163
+ .base = 0x10000000,
164
+ .size = 0x00080000,
165
+ .mrindex = 2,
166
+ }, {
167
+ .name = "DDR",
168
+ .base = 0x20000000,
169
+ .size = MPS3_DDR_SIZE,
170
+ .mrindex = -1,
171
+ }, {
172
+ .name = "ATCM0",
173
+ .base = 0xee000000,
174
+ .size = 0x00008000,
175
+ .mrindex = 3,
176
+ }, {
177
+ .name = "BTCM0",
178
+ .base = 0xee100000,
179
+ .size = 0x00008000,
180
+ .mrindex = 4,
181
+ }, {
182
+ .name = "CTCM0",
183
+ .base = 0xee200000,
184
+ .size = 0x00008000,
185
+ .mrindex = 5,
186
+ }, {
187
+ .name = "ATCM1",
188
+ .base = 0xee400000,
189
+ .size = 0x00008000,
190
+ .mrindex = 6,
191
+ }, {
192
+ .name = "BTCM1",
193
+ .base = 0xee500000,
194
+ .size = 0x00008000,
195
+ .mrindex = 7,
196
+ }, {
197
+ .name = "CTCM1",
198
+ .base = 0xee600000,
199
+ .size = 0x00008000,
200
+ .mrindex = 8,
201
+ }, {
202
+ .name = NULL,
203
+ }
204
+};
205
+
206
+static MemoryRegion *mr_for_raminfo(MPS3RMachineState *mms,
207
+ const RAMInfo *raminfo)
208
+{
209
+ /* Return an initialized MemoryRegion for the RAMInfo. */
210
+ MemoryRegion *ram;
211
+
212
+ if (raminfo->mrindex < 0) {
213
+ /* Means this RAMInfo is for QEMU's "system memory" */
214
+ MachineState *machine = MACHINE(mms);
215
+ assert(!(raminfo->flags & IS_ROM));
216
+ return machine->ram;
217
+ }
218
+
219
+ assert(raminfo->mrindex < MPS3R_RAM_MAX);
220
+ ram = &mms->ram[raminfo->mrindex];
221
+
222
+ memory_region_init_ram(ram, NULL, raminfo->name,
223
+ raminfo->size, &error_fatal);
224
+ if (raminfo->flags & IS_ROM) {
225
+ memory_region_set_readonly(ram, true);
226
+ }
227
+ return ram;
228
+}
229
+
230
+static void mps3r_common_init(MachineState *machine)
231
+{
232
+ MPS3RMachineState *mms = MPS3R_MACHINE(machine);
233
+ MPS3RMachineClass *mmc = MPS3R_MACHINE_GET_CLASS(mms);
234
+ MemoryRegion *sysmem = get_system_memory();
235
+
236
+ for (const RAMInfo *ri = mmc->raminfo; ri->name; ri++) {
237
+ MemoryRegion *mr = mr_for_raminfo(mms, ri);
238
+ memory_region_add_subregion(sysmem, ri->base, mr);
239
+ }
240
+}
241
+
242
+static void mps3r_set_default_ram_info(MPS3RMachineClass *mmc)
154
+{
243
+{
155
+ /*
244
+ /*
156
+ * Return the mask of which elements in the MVE vector should be
245
+ * Set mc->default_ram_size and default_ram_id from the
157
+ * updated. This is a combination of multiple things:
246
+ * information in mmc->raminfo.
158
+ * (1) by default, we update every lane in the vector
159
+ * (2) VPT predication stores its state in the VPR register;
160
+ * (3) low-overhead-branch tail predication will mask out part
161
+ * the vector on the final iteration of the loop
162
+ * (4) if EPSR.ECI is set then we must execute only some beats
163
+ * of the insn
164
+ * We combine all these into a 16-bit result with the same semantics
165
+ * as VPR.P0: 0 to mask the lane, 1 if it is active.
166
+ * 8-bit vector ops will look at all bits of the result;
167
+ * 16-bit ops will look at bits 0, 2, 4, ...;
168
+ * 32-bit ops will look at bits 0, 4, 8 and 12.
169
+ * Compare pseudocode GetCurInstrBeat(), though that only returns
170
+ * the 4-bit slice of the mask corresponding to a single beat.
171
+ */
247
+ */
172
+ uint16_t mask = FIELD_EX32(env->v7m.vpr, V7M_VPR, P0);
248
+ MachineClass *mc = MACHINE_CLASS(mmc);
173
+
249
+ const RAMInfo *p;
174
+ if (!(env->v7m.vpr & R_V7M_VPR_MASK01_MASK)) {
250
+
175
+ mask |= 0xff;
251
+ for (p = mmc->raminfo; p->name; p++) {
176
+ }
252
+ if (p->mrindex < 0) {
177
+ if (!(env->v7m.vpr & R_V7M_VPR_MASK23_MASK)) {
253
+ /* Found the entry for "system memory" */
178
+ mask |= 0xff00;
254
+ mc->default_ram_size = p->size;
179
+ }
255
+ mc->default_ram_id = p->name;
180
+
256
+ return;
181
+ if (env->v7m.ltpsize < 4 &&
182
+ env->regs[14] <= (1 << (4 - env->v7m.ltpsize))) {
183
+ /*
184
+ * Tail predication active, and this is the last loop iteration.
185
+ * The element size is (1 << ltpsize), and we only want to process
186
+ * loopcount elements, so we want to retain the least significant
187
+ * (loopcount * esize) predicate bits and zero out bits above that.
188
+ */
189
+ int masklen = env->regs[14] << env->v7m.ltpsize;
190
+ assert(masklen <= 16);
191
+ mask &= MAKE_64BIT_MASK(0, masklen);
192
+ }
193
+
194
+ if ((env->condexec_bits & 0xf) == 0) {
195
+ /*
196
+ * ECI bits indicate which beats are already executed;
197
+ * we handle this by effectively predicating them out.
198
+ */
199
+ int eci = env->condexec_bits >> 4;
200
+ switch (eci) {
201
+ case ECI_NONE:
202
+ break;
203
+ case ECI_A0:
204
+ mask &= 0xfff0;
205
+ break;
206
+ case ECI_A0A1:
207
+ mask &= 0xff00;
208
+ break;
209
+ case ECI_A0A1A2:
210
+ case ECI_A0A1A2B0:
211
+ mask &= 0xf000;
212
+ break;
213
+ default:
214
+ g_assert_not_reached();
215
+ }
257
+ }
216
+ }
258
+ }
217
+
259
+ g_assert_not_reached();
218
+ return mask;
260
+}
219
+}
+
+static void mve_advance_vpt(CPUARMState *env)
+{
+    /* Advance the VPT and ECI state if necessary */
+    uint32_t vpr = env->v7m.vpr;
+    unsigned mask01, mask23;
+
+    if ((env->condexec_bits & 0xf) == 0) {
+        env->condexec_bits = (env->condexec_bits == (ECI_A0A1A2B0 << 4)) ?
+            (ECI_A0 << 4) : (ECI_NONE << 4);
+    }
+
+    if (!(vpr & (R_V7M_VPR_MASK01_MASK | R_V7M_VPR_MASK23_MASK))) {
+        /* VPT not enabled, nothing to do */
+        return;
+    }
+
+    mask01 = FIELD_EX32(vpr, V7M_VPR, MASK01);
+    mask23 = FIELD_EX32(vpr, V7M_VPR, MASK23);
+    if (mask01 > 8) {
+        /* high bit set, but not 0b1000: invert the relevant half of P0 */
+        vpr ^= 0xff;
+    }
+    if (mask23 > 8) {
+        /* high bit set, but not 0b1000: invert the relevant half of P0 */
+        vpr ^= 0xff00;
+    }
+    vpr = FIELD_DP32(vpr, V7M_VPR, MASK01, mask01 << 1);
+    vpr = FIELD_DP32(vpr, V7M_VPR, MASK23, mask23 << 1);
+    env->v7m.vpr = vpr;
+}
+
+
+#define DO_VLDR(OP, MSIZE, LDTYPE, ESIZE, TYPE) \
+    void HELPER(mve_##OP)(CPUARMState *env, void *vd, uint32_t addr) \
+    { \
+        TYPE *d = vd; \
+        uint16_t mask = mve_element_mask(env); \
+        unsigned b, e; \
+        /* \
+         * R_SXTM allows the dest reg to become UNKNOWN for abandoned \
+         * beats so we don't care if we update part of the dest and \
+         * then take an exception. \
+         */ \
+        for (b = 0, e = 0; b < 16; b += ESIZE, e++) { \
+            if (mask & (1 << b)) { \
+                d[H##ESIZE(e)] = cpu_##LDTYPE##_data_ra(env, addr, GETPC()); \
+            } \
+            addr += MSIZE; \
+        } \
+        mve_advance_vpt(env); \
+    }
+
+#define DO_VSTR(OP, MSIZE, STTYPE, ESIZE, TYPE) \
+    void HELPER(mve_##OP)(CPUARMState *env, void *vd, uint32_t addr) \
+    { \
+        TYPE *d = vd; \
+        uint16_t mask = mve_element_mask(env); \
+        unsigned b, e; \
+        for (b = 0, e = 0; b < 16; b += ESIZE, e++) { \
+            if (mask & (1 << b)) { \
+                cpu_##STTYPE##_data_ra(env, addr, d[H##ESIZE(e)], GETPC()); \
+            } \
+            addr += MSIZE; \
+        } \
+        mve_advance_vpt(env); \
+    }
+
+DO_VLDR(vldrb, 1, ldub, 1, uint8_t)
+DO_VLDR(vldrh, 2, lduw, 2, uint16_t)
+DO_VLDR(vldrw, 4, ldl, 4, uint32_t)
+
+DO_VSTR(vstrb, 1, stb, 1, uint8_t)
+DO_VSTR(vstrh, 2, stw, 2, uint16_t)
+DO_VSTR(vstrw, 4, stl, 4, uint32_t)
+
+#undef DO_VLDR
+#undef DO_VSTR
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@
 
 /* Include the generated decoder */
 #include "decode-mve.c.inc"
+
+typedef void MVEGenLdStFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
+
+/* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
+static inline long mve_qreg_offset(unsigned reg)
+{
+    return offsetof(CPUARMState, vfp.zregs[reg].d[0]);
+}
+
+static TCGv_ptr mve_qreg_ptr(unsigned reg)
+{
+    TCGv_ptr ret = tcg_temp_new_ptr();
+    tcg_gen_addi_ptr(ret, cpu_env, mve_qreg_offset(reg));
+    return ret;
+}
+
+static bool mve_check_qreg_bank(DisasContext *s, int qmask)
+{
+    /*
+     * Check whether Qregs are in range. For v8.1M only Q0..Q7
+     * are supported, see VFPSmallRegisterBank().
+     */
+    return qmask < 8;
+}
+
+static bool mve_eci_check(DisasContext *s)
+{
+    /*
+     * This is a beatwise insn: check that ECI is valid (not a
+     * reserved value) and note that we are handling it.
+     * Return true if OK, false if we generated an exception.
+     */
+    s->eci_handled = true;
+    switch (s->eci) {
+    case ECI_NONE:
+    case ECI_A0:
+    case ECI_A0A1:
+    case ECI_A0A1A2:
+    case ECI_A0A1A2B0:
+        return true;
+    default:
+        /* Reserved value: INVSTATE UsageFault */
+        gen_exception_insn(s, s->pc_curr, EXCP_INVSTATE, syn_uncategorized(),
+                           default_exception_el(s));
+        return false;
+    }
+}
+
+static void mve_update_eci(DisasContext *s)
+{
+    /*
+     * The helper function will always update the CPUState field,
+     * so we only need to update the DisasContext field.
+     */
+    if (s->eci) {
+        s->eci = (s->eci == ECI_A0A1A2B0) ? ECI_A0 : ECI_NONE;
+    }
+}
+
+static bool do_ldst(DisasContext *s, arg_VLDR_VSTR *a, MVEGenLdStFn *fn)
+{
+    TCGv_i32 addr;
+    uint32_t offset;
+    TCGv_ptr qreg;
+
+    if (!dc_isar_feature(aa32_mve, s) ||
+        !mve_check_qreg_bank(s, a->qd) ||
+        !fn) {
+        return false;
+    }
+
+    /* CONSTRAINED UNPREDICTABLE: we choose to UNDEF */
+    if (a->rn == 15 || (a->rn == 13 && a->w)) {
+        return false;
+    }
+
+    if (!mve_eci_check(s) || !vfp_access_check(s)) {
+        return true;
+    }
+
+    offset = a->imm << a->size;
+    if (!a->a) {
+        offset = -offset;
+    }
+    addr = load_reg(s, a->rn);
+    if (a->p) {
+        tcg_gen_addi_i32(addr, addr, offset);
+    }
+
+    qreg = mve_qreg_ptr(a->qd);
+    fn(cpu_env, qreg, addr);
+    tcg_temp_free_ptr(qreg);
+
+    /*
+     * Writeback always happens after the last beat of the insn,
+     * regardless of predication
+     */
+    if (a->w) {
+        if (!a->p) {
+            tcg_gen_addi_i32(addr, addr, offset);
+        }
+        store_reg(s, a->rn, addr);
+    } else {
+        tcg_temp_free_i32(addr);
+    }
+    mve_update_eci(s);
+    return true;
+}
+
+static bool trans_VLDR_VSTR(DisasContext *s, arg_VLDR_VSTR *a)
+{
+    static MVEGenLdStFn * const ldstfns[4][2] = {
+        { gen_helper_mve_vstrb, gen_helper_mve_vldrb },
+        { gen_helper_mve_vstrh, gen_helper_mve_vldrh },
+        { gen_helper_mve_vstrw, gen_helper_mve_vldrw },
+        { NULL, NULL }
+    };
+    return do_ldst(s, a, ldstfns[a->size][a->l]);
+}
diff --git a/target/arm/meson.build b/target/arm/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/meson.build
+++ b/target/arm/meson.build
@@ -XXX,XX +XXX,XX @@ arm_ss.add(files(
   'helper.c',
   'iwmmxt_helper.c',
   'm_helper.c',
+  'mve_helper.c',
   'neon_helper.c',
   'op_helper.c',
   'tlb_helper.c',
--
2.20.1
+
+static void mps3r_class_init(ObjectClass *oc, void *data)
+{
+    MachineClass *mc = MACHINE_CLASS(oc);
+
+    mc->init = mps3r_common_init;
+}
+
+static void mps3r_an536_class_init(ObjectClass *oc, void *data)
+{
+    MachineClass *mc = MACHINE_CLASS(oc);
+    MPS3RMachineClass *mmc = MPS3R_MACHINE_CLASS(oc);
+    static const char * const valid_cpu_types[] = {
+        ARM_CPU_TYPE_NAME("cortex-r52"),
+        NULL
+    };
+
+    mc->desc = "ARM MPS3 with AN536 FPGA image for Cortex-R52";
+    mc->default_cpus = 2;
+    mc->min_cpus = mc->default_cpus;
+    mc->max_cpus = mc->default_cpus;
+    mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-r52");
+    mc->valid_cpu_types = valid_cpu_types;
+    mmc->raminfo = an536_raminfo;
+    mps3r_set_default_ram_info(mmc);
+}
+
+static const TypeInfo mps3r_machine_types[] = {
+    {
+        .name = TYPE_MPS3R_MACHINE,
+        .parent = TYPE_MACHINE,
+        .abstract = true,
+        .instance_size = sizeof(MPS3RMachineState),
+        .class_size = sizeof(MPS3RMachineClass),
+        .class_init = mps3r_class_init,
+    }, {
+        .name = TYPE_MPS3R_AN536_MACHINE,
+        .parent = TYPE_MPS3R_MACHINE,
+        .class_init = mps3r_an536_class_init,
+    },
+};
+
+DEFINE_TYPES(mps3r_machine_types);
diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -XXX,XX +XXX,XX @@ config MAINSTONE
     select PFLASH_CFI01
     select SMC91C111
 
+config MPS3R
+    bool
+    default y
+    depends on TCG && ARM
+
 config MUSCA
     bool
     default y
diff --git a/hw/arm/meson.build b/hw/arm/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/meson.build
+++ b/hw/arm/meson.build
@@ -XXX,XX +XXX,XX @@ arm_ss.add(when: 'CONFIG_HIGHBANK', if_true: files('highbank.c'))
 arm_ss.add(when: 'CONFIG_INTEGRATOR', if_true: files('integratorcp.c'))
 arm_ss.add(when: 'CONFIG_MAINSTONE', if_true: files('mainstone.c'))
 arm_ss.add(when: 'CONFIG_MICROBIT', if_true: files('microbit.c'))
+arm_ss.add(when: 'CONFIG_MPS3R', if_true: files('mps3r.c'))
 arm_ss.add(when: 'CONFIG_MUSICPAL', if_true: files('musicpal.c'))
 arm_ss.add(when: 'CONFIG_NETDUINOPLUS2', if_true: files('netduinoplus2.c'))
 arm_ss.add(when: 'CONFIG_OLIMEX_STM32_H405', if_true: files('olimex-stm32-h405.c'))
--
2.34.1
Create the CPUs, the GIC, and the per-CPU RAM block for
the mps3-an536 board.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20240206132931.38376-10-peter.maydell@linaro.org
---
 hw/arm/mps3r.c | 180 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 177 insertions(+), 3 deletions(-)

diff --git a/hw/arm/mps3r.c b/hw/arm/mps3r.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/mps3r.c
+++ b/hw/arm/mps3r.c
The M-profile architecture requires that accesses to FPCXT_NS when
there is no active FP state must not take a NOCP fault even if the
FPU is disabled. We were not implementing this correctly, because
in our decode we catch the NOCP faults early in m-nocp.decode.

Fix this bug by moving all the handling of M-profile FP system
register accesses from vfp.decode into m-nocp.decode and putting
it above the NOCP blocks. This provides the correct behaviour:
 * for accesses other than FPCXT_NS the trans functions call
   vfp_access_check(), which will check for FPU disabled and
   raise a NOCP exception if necessary
 * for FPCXT_NS we have the special case code that doesn't
   call vfp_access_check()
 * when these trans functions want to raise an UNDEF they return
   false, so the decoder will fall through into the NOCP blocks.
   This means that NOCP correctly takes precedence over UNDEF
   for these insns. (This is a difference from the other insns
   handled by m-nocp.decode, where UNDEF takes precedence and
   which we implement by having those trans functions call
   unallocated_encoding() in the appropriate places.)

[Note for backport to stable: this commit has a semantic dependency
on commit 9a486856e9173af, which was not marked as cc-stable because
we didn't know we'd need it for a for-stable bugfix.]

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210618141019.10671-4-peter.maydell@linaro.org
---
 target/arm/translate-a32.h    |   1 +
 target/arm/m-nocp.decode      |  24 ++
 target/arm/vfp.decode         |  14 -
 target/arm/translate-m-nocp.c | 514 +++++++++++++++++++++++++++++++++
 target/arm/translate-vfp.c    | 517 +---------------------------------
 5 files changed, 542 insertions(+), 528 deletions(-)

diff --git a/target/arm/translate-a32.h b/target/arm/translate-a32.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a32.h
+++ b/target/arm/translate-a32.h
@@ -XXX,XX +XXX,XX @@ bool disas_neon_shared(DisasContext *s, uint32_t insn);
43
void load_reg_var(DisasContext *s, TCGv_i32 var, int reg);
44
void arm_gen_condlabel(DisasContext *s);
45
bool vfp_access_check(DisasContext *s);
46
+void gen_preserve_fp_state(DisasContext *s);
47
void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp memop);
48
void read_neon_element64(TCGv_i64 dest, int reg, int ele, MemOp memop);
49
void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop);
50
diff --git a/target/arm/m-nocp.decode b/target/arm/m-nocp.decode
51
index XXXXXXX..XXXXXXX 100644
52
--- a/target/arm/m-nocp.decode
53
+++ b/target/arm/m-nocp.decode
54
@@ -XXX,XX +XXX,XX @@
14
@@ -XXX,XX +XXX,XX @@
55
56
&nocp cp
57
58
+# M-profile VLDR/VSTR to sysreg
59
+%vldr_sysreg 22:1 13:3
60
+%imm7_0x4 0:7 !function=times_4
61
+
62
+&vldr_sysreg rn reg imm a w p
63
+@vldr_sysreg .... ... . a:1 . . . rn:4 ... . ... .. ....... \
64
+ reg=%vldr_sysreg imm=%imm7_0x4 &vldr_sysreg
65
+
66
{
67
# Special cases which do not take an early NOCP: VLLDM and VLSTM
68
VLLDM_VLSTM 1110 1100 001 l:1 rn:4 0000 1010 op:1 000 0000
69
@@ -XXX,XX +XXX,XX @@
70
VSCCLRM 1110 1100 1.01 1111 .... 1011 imm:7 0 vd=%vd_dp size=3
71
VSCCLRM 1110 1100 1.01 1111 .... 1010 imm:8 vd=%vd_sp size=2
72
73
+ # FP system register accesses: these are a special case because accesses
74
+ # to FPCXT_NS succeed even if the FPU is disabled. We therefore need
75
+ # to handle them before the big NOCP blocks. Note that within these
76
+ # insns NOCP still has higher priority than UNDEFs; this is implemented
77
+ # by their returning 'false' for UNDEF so as to fall through into the
78
+ # NOCP check (in contrast to VLLDM etc, which call unallocated_encoding()
79
+ # for the UNDEFs there that must take precedence over NOCP.)
80
+
81
+ VMSR_VMRS ---- 1110 111 l:1 reg:4 rt:4 1010 0001 0000
82
+
83
+ # P=0 W=0 is SEE "Related encodings", so split into two patterns
84
+ VLDR_sysreg ---- 110 1 . . w:1 1 .... ... 0 111 11 ....... @vldr_sysreg p=1
85
+ VLDR_sysreg ---- 110 0 . . 1 1 .... ... 0 111 11 ....... @vldr_sysreg p=0 w=1
86
+ VSTR_sysreg ---- 110 1 . . w:1 0 .... ... 0 111 11 ....... @vldr_sysreg p=1
87
+ VSTR_sysreg ---- 110 0 . . 1 0 .... ... 0 111 11 ....... @vldr_sysreg p=0 w=1
88
+
89
NOCP 111- 1110 ---- ---- ---- cp:4 ---- ---- &nocp
90
NOCP 111- 110- ---- ---- ---- cp:4 ---- ---- &nocp
91
# From v8.1M onwards this range will also NOCP:
92
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
93
index XXXXXXX..XXXXXXX 100644
94
--- a/target/arm/vfp.decode
95
+++ b/target/arm/vfp.decode
96
@@ -XXX,XX +XXX,XX @@ VLDR_VSTR_hp ---- 1101 u:1 .0 l:1 rn:4 .... 1001 imm:8 vd=%vd_sp
97
VLDR_VSTR_sp ---- 1101 u:1 .0 l:1 rn:4 .... 1010 imm:8 vd=%vd_sp
98
VLDR_VSTR_dp ---- 1101 u:1 .0 l:1 rn:4 .... 1011 imm:8 vd=%vd_dp
99
100
-# M-profile VLDR/VSTR to sysreg
101
-%vldr_sysreg 22:1 13:3
102
-%imm7_0x4 0:7 !function=times_4
103
-
104
-&vldr_sysreg rn reg imm a w p
105
-@vldr_sysreg .... ... . a:1 . . . rn:4 ... . ... .. ....... \
106
- reg=%vldr_sysreg imm=%imm7_0x4 &vldr_sysreg
107
-
108
-# P=0 W=0 is SEE "Related encodings", so split into two patterns
109
-VLDR_sysreg ---- 110 1 . . w:1 1 .... ... 0 111 11 ....... @vldr_sysreg p=1
110
-VLDR_sysreg ---- 110 0 . . 1 1 .... ... 0 111 11 ....... @vldr_sysreg p=0 w=1
111
-VSTR_sysreg ---- 110 1 . . w:1 0 .... ... 0 111 11 ....... @vldr_sysreg p=1
112
-VSTR_sysreg ---- 110 0 . . 1 0 .... ... 0 111 11 ....... @vldr_sysreg p=0 w=1
113
-
114
# We split the load/store multiple up into two patterns to avoid
115
# overlap with other insns in the "Advanced SIMD load/store and 64-bit move"
116
# grouping:
117
diff --git a/target/arm/translate-m-nocp.c b/target/arm/translate-m-nocp.c
118
index XXXXXXX..XXXXXXX 100644
119
--- a/target/arm/translate-m-nocp.c
120
+++ b/target/arm/translate-m-nocp.c
121
@@ -XXX,XX +XXX,XX @@
122
123
#include "qemu/osdep.h"
15
#include "qemu/osdep.h"
124
#include "tcg/tcg-op.h"
16
#include "qemu/units.h"
125
+#include "tcg/tcg-op-gvec.h"
17
#include "qapi/error.h"
126
#include "translate.h"
18
+#include "qapi/qmp/qlist.h"
127
#include "translate-a32.h"
19
#include "exec/address-spaces.h"
128
20
#include "cpu.h"
129
@@ -XXX,XX +XXX,XX @@ static bool trans_VSCCLRM(DisasContext *s, arg_VSCCLRM *a)
21
#include "hw/boards.h"
130
return true;
22
+#include "hw/qdev-properties.h"
23
#include "hw/arm/boot.h"
24
+#include "hw/arm/bsa.h"
25
+#include "hw/intc/arm_gicv3.h"
26
27
/* Define the layout of RAM and ROM in a board */
28
typedef struct RAMInfo {
29
@@ -XXX,XX +XXX,XX @@ typedef struct RAMInfo {
30
#define IS_ROM 2
31
32
#define MPS3R_RAM_MAX 9
33
+#define MPS3R_CPU_MAX 2
34
+
35
+#define PERIPHBASE 0xf0000000
36
+#define NUM_SPIS 96
37
38
typedef enum MPS3RFPGAType {
39
FPGA_AN536,
40
@@ -XXX,XX +XXX,XX @@ struct MPS3RMachineClass {
41
MachineClass parent;
42
MPS3RFPGAType fpga_type;
43
const RAMInfo *raminfo;
44
+ hwaddr loader_start;
45
};
46
47
struct MPS3RMachineState {
48
MachineState parent;
49
+ struct arm_boot_info bootinfo;
50
MemoryRegion ram[MPS3R_RAM_MAX];
51
+ Object *cpu[MPS3R_CPU_MAX];
52
+ MemoryRegion cpu_sysmem[MPS3R_CPU_MAX];
53
+ MemoryRegion sysmem_alias[MPS3R_CPU_MAX];
54
+ MemoryRegion cpu_ram[MPS3R_CPU_MAX];
55
+ GICv3State gic;
56
};
57
58
#define TYPE_MPS3R_MACHINE "mps3r"
59
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *mr_for_raminfo(MPS3RMachineState *mms,
60
return ram;
131
}
61
}
132
62
133
+/*
63
+/*
134
+ * M-profile provides two different sets of instructions that can
64
+ * There is no defined secondary boot protocol for Linux for the AN536,
135
+ * access floating point system registers: VMSR/VMRS (which move
65
+ * because real hardware has a restriction that atomic operations between
136
+ * to/from a general purpose register) and VLDR/VSTR sysreg (which
66
+ * the two CPUs do not function correctly, and so true SMP is not
137
+ * move directly to/from memory). In some cases there are also side
67
+ * possible. Therefore for cases where the user is directly booting
138
+ * effects which must happen after any write to memory (which could
68
+ * a kernel, we treat the system as essentially uniprocessor, and
139
+ * cause an exception). So we implement the common logic for the
69
+ * put the secondary CPU into power-off state (as if the user on the
140
+ * sysreg access in gen_M_fp_sysreg_write() and gen_M_fp_sysreg_read(),
70
+ * real hardware had configured the secondary to be halted via the
141
+ * which take pointers to callback functions which will perform the
71
+ * SCC config registers).
142
+ * actual "read/write general purpose register" and "read/write
72
+ *
143
+ * memory" operations.
73
+ * Note that the default secondary boot code would not work here anyway
74
+ * as it assumes a GICv2, and we have a GICv3.
144
+ */
75
+ */
145
+
76
+static void mps3r_write_secondary_boot(ARMCPU *cpu,
146
+/*
77
+ const struct arm_boot_info *info)
147
+ * Emit code to store the sysreg to its final destination; frees the
148
+ * TCG temp 'value' it is passed.
149
+ */
150
+typedef void fp_sysreg_storefn(DisasContext *s, void *opaque, TCGv_i32 value);
151
+/*
152
+ * Emit code to load the value to be copied to the sysreg; returns
153
+ * a new TCG temporary
154
+ */
155
+typedef TCGv_i32 fp_sysreg_loadfn(DisasContext *s, void *opaque);
156
+
157
+/* Common decode/access checks for fp sysreg read/write */
158
+typedef enum FPSysRegCheckResult {
159
+ FPSysRegCheckFailed, /* caller should return false */
160
+ FPSysRegCheckDone, /* caller should return true */
161
+ FPSysRegCheckContinue, /* caller should continue generating code */
162
+} FPSysRegCheckResult;
163
+
164
+static FPSysRegCheckResult fp_sysreg_checks(DisasContext *s, int regno)
165
+{
166
+ if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
167
+ return FPSysRegCheckFailed;
168
+ }
169
+
170
+ switch (regno) {
171
+ case ARM_VFP_FPSCR:
172
+ case QEMU_VFP_FPSCR_NZCV:
173
+ break;
174
+ case ARM_VFP_FPSCR_NZCVQC:
175
+ if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
176
+ return FPSysRegCheckFailed;
177
+ }
178
+ break;
179
+ case ARM_VFP_FPCXT_S:
180
+ case ARM_VFP_FPCXT_NS:
181
+ if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
182
+ return FPSysRegCheckFailed;
183
+ }
184
+ if (!s->v8m_secure) {
185
+ return FPSysRegCheckFailed;
186
+ }
187
+ break;
188
+ case ARM_VFP_VPR:
189
+ case ARM_VFP_P0:
190
+ if (!dc_isar_feature(aa32_mve, s)) {
191
+ return FPSysRegCheckFailed;
192
+ }
193
+ break;
194
+ default:
195
+ return FPSysRegCheckFailed;
196
+ }
197
+
198
+ /*
199
+ * FPCXT_NS is a special case: it has specific handling for
200
+ * "current FP state is inactive", and must do the PreserveFPState()
201
+ * but not the usual full set of actions done by ExecuteFPCheck().
202
+ * So we don't call vfp_access_check() and the callers must handle this.
203
+ */
204
+ if (regno != ARM_VFP_FPCXT_NS && !vfp_access_check(s)) {
205
+ return FPSysRegCheckDone;
206
+ }
207
+ return FPSysRegCheckContinue;
208
+}
209
+
210
+static void gen_branch_fpInactive(DisasContext *s, TCGCond cond,
211
+ TCGLabel *label)
212
+{
78
+{
213
+ /*
79
+ /*
214
+ * FPCXT_NS is a special case: it has specific handling for
80
+ * Power the secondary CPU off. This means we don't need to write any
215
+ * "current FP state is inactive", and must do the PreserveFPState()
81
+ * boot code into guest memory. Note that the 'cpu' argument to this
216
+ * but not the usual full set of actions done by ExecuteFPCheck().
82
+ * function is the primary CPU we passed to arm_load_kernel(), not
217
+ * We don't have a TB flag that matches the fpInactive check, so we
83
+ * the secondary. Loop around all the other CPUs, as the boot.c
218
+ * do it at runtime as we don't expect FPCXT_NS accesses to be frequent.
84
+ * code does for the "disable secondaries if PSCI is enabled" case.
219
+ *
220
+ * Emit code that checks fpInactive and does a conditional
221
+ * branch to label based on it:
222
+ * if cond is TCG_COND_NE then branch if fpInactive != 0 (ie if inactive)
223
+ * if cond is TCG_COND_EQ then branch if fpInactive == 0 (ie if active)
224
+ */
85
+ */
225
+ assert(cond == TCG_COND_EQ || cond == TCG_COND_NE);
86
+ for (CPUState *cs = first_cpu; cs; cs = CPU_NEXT(cs)) {
226
+
87
+ if (cs != first_cpu) {
227
+ /* fpInactive = FPCCR_NS.ASPEN == 1 && CONTROL.FPCA == 0 */
88
+ object_property_set_bool(OBJECT(cs), "start-powered-off", true,
228
+ TCGv_i32 aspen, fpca;
89
+ &error_abort);
229
+ aspen = load_cpu_field(v7m.fpccr[M_REG_NS]);
230
+ fpca = load_cpu_field(v7m.control[M_REG_S]);
231
+ tcg_gen_andi_i32(aspen, aspen, R_V7M_FPCCR_ASPEN_MASK);
232
+ tcg_gen_xori_i32(aspen, aspen, R_V7M_FPCCR_ASPEN_MASK);
233
+ tcg_gen_andi_i32(fpca, fpca, R_V7M_CONTROL_FPCA_MASK);
234
+ tcg_gen_or_i32(fpca, fpca, aspen);
235
+ tcg_gen_brcondi_i32(tcg_invert_cond(cond), fpca, 0, label);
236
+ tcg_temp_free_i32(aspen);
237
+ tcg_temp_free_i32(fpca);
238
+}
239
+
240
+static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
241
+ fp_sysreg_loadfn *loadfn,
242
+ void *opaque)
243
+{
244
+ /* Do a write to an M-profile floating point system register */
245
+ TCGv_i32 tmp;
246
+ TCGLabel *lab_end = NULL;
247
+
248
+ switch (fp_sysreg_checks(s, regno)) {
249
+ case FPSysRegCheckFailed:
250
+ return false;
251
+ case FPSysRegCheckDone:
252
+ return true;
253
+ case FPSysRegCheckContinue:
254
+ break;
255
+ }
256
+
257
+ switch (regno) {
258
+ case ARM_VFP_FPSCR:
259
+ tmp = loadfn(s, opaque);
260
+ gen_helper_vfp_set_fpscr(cpu_env, tmp);
261
+ tcg_temp_free_i32(tmp);
262
+ gen_lookup_tb(s);
263
+ break;
264
+ case ARM_VFP_FPSCR_NZCVQC:
265
+ {
266
+ TCGv_i32 fpscr;
267
+ tmp = loadfn(s, opaque);
268
+ if (dc_isar_feature(aa32_mve, s)) {
269
+ /* QC is only present for MVE; otherwise RES0 */
270
+ TCGv_i32 qc = tcg_temp_new_i32();
271
+ tcg_gen_andi_i32(qc, tmp, FPCR_QC);
272
+ /*
273
+ * The 4 vfp.qc[] fields need only be "zero" vs "non-zero";
274
+ * here writing the same value into all elements is simplest.
275
+ */
276
+ tcg_gen_gvec_dup_i32(MO_32, offsetof(CPUARMState, vfp.qc),
277
+ 16, 16, qc);
278
+ }
90
+ }
279
+ tcg_gen_andi_i32(tmp, tmp, FPCR_NZCV_MASK);
280
+ fpscr = load_cpu_field(vfp.xregs[ARM_VFP_FPSCR]);
281
+ tcg_gen_andi_i32(fpscr, fpscr, ~FPCR_NZCV_MASK);
282
+ tcg_gen_or_i32(fpscr, fpscr, tmp);
283
+ store_cpu_field(fpscr, vfp.xregs[ARM_VFP_FPSCR]);
284
+ tcg_temp_free_i32(tmp);
285
+ break;
286
+ }
287
+ case ARM_VFP_FPCXT_NS:
288
+ lab_end = gen_new_label();
289
+ /* fpInactive case: write is a NOP, so branch to end */
290
+ gen_branch_fpInactive(s, TCG_COND_NE, lab_end);
291
+ /*
292
+ * !fpInactive: if FPU disabled, take NOCP exception;
293
+ * otherwise PreserveFPState(), and then FPCXT_NS writes
294
+ * behave the same as FPCXT_S writes.
295
+ */
296
+ if (s->fp_excp_el) {
297
+ gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
298
+ syn_uncategorized(), s->fp_excp_el);
299
+ /*
300
+ * This was only a conditional exception, so override
301
+ * gen_exception_insn()'s default to DISAS_NORETURN
302
+ */
303
+ s->base.is_jmp = DISAS_NEXT;
304
+ break;
305
+ }
306
+ gen_preserve_fp_state(s);
307
+ /* fall through */
308
+ case ARM_VFP_FPCXT_S:
309
+ {
310
+ TCGv_i32 sfpa, control;
311
+ /*
312
+ * Set FPSCR and CONTROL.SFPA from value; the new FPSCR takes
313
+ * bits [27:0] from value and zeroes bits [31:28].
314
+ */
315
+ tmp = loadfn(s, opaque);
316
+ sfpa = tcg_temp_new_i32();
317
+ tcg_gen_shri_i32(sfpa, tmp, 31);
318
+ control = load_cpu_field(v7m.control[M_REG_S]);
319
+ tcg_gen_deposit_i32(control, control, sfpa,
320
+ R_V7M_CONTROL_SFPA_SHIFT, 1);
321
+ store_cpu_field(control, v7m.control[M_REG_S]);
322
+ tcg_gen_andi_i32(tmp, tmp, ~FPCR_NZCV_MASK);
323
+ gen_helper_vfp_set_fpscr(cpu_env, tmp);
324
+ tcg_temp_free_i32(tmp);
325
+ tcg_temp_free_i32(sfpa);
326
+ break;
327
+ }
328
+ case ARM_VFP_VPR:
329
+ /* Behaves as NOP if not privileged */
330
+ if (IS_USER(s)) {
331
+ break;
332
+ }
333
+ tmp = loadfn(s, opaque);
334
+ store_cpu_field(tmp, v7m.vpr);
335
+ break;
336
+ case ARM_VFP_P0:
337
+ {
338
+ TCGv_i32 vpr;
339
+ tmp = loadfn(s, opaque);
340
+ vpr = load_cpu_field(v7m.vpr);
341
+ tcg_gen_deposit_i32(vpr, vpr, tmp,
342
+ R_V7M_VPR_P0_SHIFT, R_V7M_VPR_P0_LENGTH);
343
+ store_cpu_field(vpr, v7m.vpr);
344
+ tcg_temp_free_i32(tmp);
345
+ break;
346
+ }
347
+ default:
348
+ g_assert_not_reached();
349
+ }
350
+ if (lab_end) {
351
+ gen_set_label(lab_end);
352
+ }
353
+ return true;
354
+}
355
+
356
+static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
357
+ fp_sysreg_storefn *storefn,
358
+ void *opaque)
359
+{
360
+ /* Do a read from an M-profile floating point system register */
361
+ TCGv_i32 tmp;
362
+ TCGLabel *lab_end = NULL;
363
+ bool lookup_tb = false;
364
+
365
+ switch (fp_sysreg_checks(s, regno)) {
366
+ case FPSysRegCheckFailed:
367
+ return false;
368
+ case FPSysRegCheckDone:
369
+ return true;
370
+ case FPSysRegCheckContinue:
371
+ break;
372
+ }
373
+
374
+ if (regno == ARM_VFP_FPSCR_NZCVQC && !dc_isar_feature(aa32_mve, s)) {
375
+ /* QC is RES0 without MVE, so NZCVQC simplifies to NZCV */
376
+ regno = QEMU_VFP_FPSCR_NZCV;
377
+ }
378
+
379
+ switch (regno) {
380
+ case ARM_VFP_FPSCR:
381
+ tmp = tcg_temp_new_i32();
382
+ gen_helper_vfp_get_fpscr(tmp, cpu_env);
383
+ storefn(s, opaque, tmp);
384
+ break;
385
+ case ARM_VFP_FPSCR_NZCVQC:
386
+ tmp = tcg_temp_new_i32();
387
+ gen_helper_vfp_get_fpscr(tmp, cpu_env);
388
+ tcg_gen_andi_i32(tmp, tmp, FPCR_NZCVQC_MASK);
389
+ storefn(s, opaque, tmp);
390
+ break;
391
+ case QEMU_VFP_FPSCR_NZCV:
392
+ /*
393
+ * Read just NZCV; this is a special case to avoid the
394
+ * helper call for the "VMRS to CPSR.NZCV" insn.
395
+ */
396
+ tmp = load_cpu_field(vfp.xregs[ARM_VFP_FPSCR]);
397
+ tcg_gen_andi_i32(tmp, tmp, FPCR_NZCV_MASK);
398
+ storefn(s, opaque, tmp);
399
+ break;
400
+ case ARM_VFP_FPCXT_S:
401
+ {
402
+ TCGv_i32 control, sfpa, fpscr;
403
+ /* Bits [27:0] from FPSCR, bit [31] from CONTROL.SFPA */
404
+ tmp = tcg_temp_new_i32();
405
+ sfpa = tcg_temp_new_i32();
406
+ gen_helper_vfp_get_fpscr(tmp, cpu_env);
407
+ tcg_gen_andi_i32(tmp, tmp, ~FPCR_NZCV_MASK);
408
+ control = load_cpu_field(v7m.control[M_REG_S]);
409
+ tcg_gen_andi_i32(sfpa, control, R_V7M_CONTROL_SFPA_MASK);
410
+ tcg_gen_shli_i32(sfpa, sfpa, 31 - R_V7M_CONTROL_SFPA_SHIFT);
411
+ tcg_gen_or_i32(tmp, tmp, sfpa);
412
+ tcg_temp_free_i32(sfpa);
413
+ /*
414
+ * Store result before updating FPSCR etc, in case
415
+ * it is a memory write which causes an exception.
416
+ */
417
+ storefn(s, opaque, tmp);
418
+ /*
419
+ * Now we must reset FPSCR from FPDSCR_NS, and clear
420
+ * CONTROL.SFPA; so we'll end the TB here.
421
+ */
422
+ tcg_gen_andi_i32(control, control, ~R_V7M_CONTROL_SFPA_MASK);
423
+ store_cpu_field(control, v7m.control[M_REG_S]);
424
+ fpscr = load_cpu_field(v7m.fpdscr[M_REG_NS]);
425
+ gen_helper_vfp_set_fpscr(cpu_env, fpscr);
426
+ tcg_temp_free_i32(fpscr);
427
+ lookup_tb = true;
428
+ break;
429
+ }
430
+ case ARM_VFP_FPCXT_NS:
431
+ {
432
+ TCGv_i32 control, sfpa, fpscr, fpdscr, zero;
433
+ TCGLabel *lab_active = gen_new_label();
434
+
435
+ lookup_tb = true;
436
+
437
+ gen_branch_fpInactive(s, TCG_COND_EQ, lab_active);
438
+ /* fpInactive case: reads as FPDSCR_NS */
439
+ TCGv_i32 tmp = load_cpu_field(v7m.fpdscr[M_REG_NS]);
440
+ storefn(s, opaque, tmp);
441
+ lab_end = gen_new_label();
442
+ tcg_gen_br(lab_end);
443
+
444
+ gen_set_label(lab_active);
445
+ /*
446
+ * !fpInactive: if FPU disabled, take NOCP exception;
447
+ * otherwise PreserveFPState(), and then FPCXT_NS
448
+ * reads the same as FPCXT_S.
449
+ */
450
+ if (s->fp_excp_el) {
451
+ gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
452
+ syn_uncategorized(), s->fp_excp_el);
453
+ /*
454
+ * This was only a conditional exception, so override
455
+ * gen_exception_insn()'s default to DISAS_NORETURN
456
+ */
457
+ s->base.is_jmp = DISAS_NEXT;
458
+ break;
459
+ }
460
+ gen_preserve_fp_state(s);
461
+ tmp = tcg_temp_new_i32();
462
+ sfpa = tcg_temp_new_i32();
463
+ fpscr = tcg_temp_new_i32();
464
+ gen_helper_vfp_get_fpscr(fpscr, cpu_env);
465
+ tcg_gen_andi_i32(tmp, fpscr, ~FPCR_NZCV_MASK);
466
+ control = load_cpu_field(v7m.control[M_REG_S]);
467
+ tcg_gen_andi_i32(sfpa, control, R_V7M_CONTROL_SFPA_MASK);
468
+ tcg_gen_shli_i32(sfpa, sfpa, 31 - R_V7M_CONTROL_SFPA_SHIFT);
469
+ tcg_gen_or_i32(tmp, tmp, sfpa);
470
+ tcg_temp_free_i32(control);
471
+ /* Store result before updating FPSCR, in case it faults */
472
+ storefn(s, opaque, tmp);
473
+ /* If SFPA is zero then set FPSCR from FPDSCR_NS */
474
+ fpdscr = load_cpu_field(v7m.fpdscr[M_REG_NS]);
475
+ zero = tcg_const_i32(0);
476
+ tcg_gen_movcond_i32(TCG_COND_EQ, fpscr, sfpa, zero, fpdscr, fpscr);
477
+ gen_helper_vfp_set_fpscr(cpu_env, fpscr);
478
+ tcg_temp_free_i32(zero);
479
+ tcg_temp_free_i32(sfpa);
480
+ tcg_temp_free_i32(fpdscr);
481
+ tcg_temp_free_i32(fpscr);
482
+ break;
483
+ }
484
+ case ARM_VFP_VPR:
485
+ /* Behaves as NOP if not privileged */
486
+ if (IS_USER(s)) {
487
+ break;
488
+ }
489
+ tmp = load_cpu_field(v7m.vpr);
490
+ storefn(s, opaque, tmp);
491
+ break;
492
+ case ARM_VFP_P0:
493
+ tmp = load_cpu_field(v7m.vpr);
494
+ tcg_gen_extract_i32(tmp, tmp, R_V7M_VPR_P0_SHIFT, R_V7M_VPR_P0_LENGTH);
495
+ storefn(s, opaque, tmp);
496
+ break;
497
+ default:
498
+ g_assert_not_reached();
499
+ }
500
+
501
+ if (lab_end) {
502
+ gen_set_label(lab_end);
503
+ }
504
+ if (lookup_tb) {
505
+ gen_lookup_tb(s);
506
+ }
507
+ return true;
508
+}
509
+
510
+static void fp_sysreg_to_gpr(DisasContext *s, void *opaque, TCGv_i32 value)
511
+{
512
+ arg_VMSR_VMRS *a = opaque;
513
+
514
+ if (a->rt == 15) {
515
+ /* Set the 4 flag bits in the CPSR */
516
+ gen_set_nzcv(value);
517
+ tcg_temp_free_i32(value);
518
+ } else {
519
+ store_reg(s, a->rt, value);
520
+ }
91
+ }
521
+}
92
+}
522
+
93
+
523
+static TCGv_i32 gpr_to_fp_sysreg(DisasContext *s, void *opaque)
94
+static void mps3r_secondary_cpu_reset(ARMCPU *cpu,
95
+ const struct arm_boot_info *info)
524
+{
96
+{
525
+ arg_VMSR_VMRS *a = opaque;
97
+ /* We don't need to do anything here because the CPU will be off */
526
+
527
+ return load_reg(s, a->rt);
528
+}
98
+}
529
+
99
+
530
+static bool trans_VMSR_VMRS(DisasContext *s, arg_VMSR_VMRS *a)
100
+static void create_gic(MPS3RMachineState *mms, MemoryRegion *sysmem)
531
+{
101
+{
102
+ MachineState *machine = MACHINE(mms);
103
+ DeviceState *gicdev;
104
+ QList *redist_region_count;
105
+
106
+ object_initialize_child(OBJECT(mms), "gic", &mms->gic, TYPE_ARM_GICV3);
107
+ gicdev = DEVICE(&mms->gic);
108
+ qdev_prop_set_uint32(gicdev, "num-cpu", machine->smp.cpus);
109
+ qdev_prop_set_uint32(gicdev, "num-irq", NUM_SPIS + GIC_INTERNAL);
110
+ redist_region_count = qlist_new();
111
+ qlist_append_int(redist_region_count, machine->smp.cpus);
112
+ qdev_prop_set_array(gicdev, "redist-region-count", redist_region_count);
113
+ object_property_set_link(OBJECT(&mms->gic), "sysmem",
114
+ OBJECT(sysmem), &error_fatal);
115
+ sysbus_realize(SYS_BUS_DEVICE(&mms->gic), &error_fatal);
116
+ sysbus_mmio_map(SYS_BUS_DEVICE(&mms->gic), 0, PERIPHBASE);
117
+ sysbus_mmio_map(SYS_BUS_DEVICE(&mms->gic), 1, PERIPHBASE + 0x100000);
532
+ /*
118
+ /*
533
+ * Accesses to R15 are UNPREDICTABLE; we choose to undef.
119
+ * Wire the outputs from each CPU's generic timer and the GICv3
534
+ * FPSCR -> r15 is a special case which writes to the PSR flags;
120
+ * maintenance interrupt signal to the appropriate GIC PPI inputs,
535
+ * set a->reg to a special value to tell gen_M_fp_sysreg_read()
121
+ * and the GIC's IRQ/FIQ/VIRQ/VFIQ interrupt outputs to the CPU's inputs.
536
+ * we only care about the top 4 bits of FPSCR there.
537
+ */
122
+ */
538
+ if (a->rt == 15) {
123
+ for (int i = 0; i < machine->smp.cpus; i++) {
539
+ if (a->l && a->reg == ARM_VFP_FPSCR) {
124
+ DeviceState *cpudev = DEVICE(mms->cpu[i]);
540
+ a->reg = QEMU_VFP_FPSCR_NZCV;
125
+ SysBusDevice *gicsbd = SYS_BUS_DEVICE(&mms->gic);
541
+ } else {
126
+ int intidbase = NUM_SPIS + i * GIC_INTERNAL;
542
+ return false;
127
+ int irq;
128
+ /*
129
+ * Mapping from the output timer irq lines from the CPU to the
130
+ * GIC PPI inputs used for this board. This isn't a BSA board,
131
+ * but it uses the standard convention for the PPI numbers.
132
+ */
133
+ const int timer_irq[] = {
134
+ [GTIMER_PHYS] = ARCH_TIMER_NS_EL1_IRQ,
135
+ [GTIMER_VIRT] = ARCH_TIMER_VIRT_IRQ,
136
+ [GTIMER_HYP] = ARCH_TIMER_NS_EL2_IRQ,
137
+ };
138
+
139
+ for (irq = 0; irq < ARRAY_SIZE(timer_irq); irq++) {
140
+ qdev_connect_gpio_out(cpudev, irq,
141
+ qdev_get_gpio_in(gicdev,
142
+ intidbase + timer_irq[irq]));
543
+ }
143
+ }
544
+ }
144
+
545
+
145
+ qdev_connect_gpio_out_named(cpudev, "gicv3-maintenance-interrupt", 0,
546
+ if (a->l) {
146
+ qdev_get_gpio_in(gicdev,
547
+ /* VMRS, move FP system register to gp register */
147
+ intidbase + ARCH_GIC_MAINT_IRQ));
548
+ return gen_M_fp_sysreg_read(s, a->reg, fp_sysreg_to_gpr, a);
148
+
549
+ } else {
149
+ qdev_connect_gpio_out_named(cpudev, "pmu-interrupt", 0,
550
+ /* VMSR, move gp register to FP system register */
150
+ qdev_get_gpio_in(gicdev,
551
+ return gen_M_fp_sysreg_write(s, a->reg, gpr_to_fp_sysreg, a);
151
+ intidbase + VIRTUAL_PMU_IRQ));
152
+
153
+ sysbus_connect_irq(gicsbd, i,
154
+ qdev_get_gpio_in(cpudev, ARM_CPU_IRQ));
155
+ sysbus_connect_irq(gicsbd, i + machine->smp.cpus,
156
+ qdev_get_gpio_in(cpudev, ARM_CPU_FIQ));
157
+ sysbus_connect_irq(gicsbd, i + 2 * machine->smp.cpus,
158
+ qdev_get_gpio_in(cpudev, ARM_CPU_VIRQ));
159
+ sysbus_connect_irq(gicsbd, i + 3 * machine->smp.cpus,
160
+ qdev_get_gpio_in(cpudev, ARM_CPU_VFIQ));
552
+ }
161
+ }
553
+}
162
+}
554
+
163
+
555
+static void fp_sysreg_to_memory(DisasContext *s, void *opaque, TCGv_i32 value)
164
static void mps3r_common_init(MachineState *machine)
556
+{
165
{
557
+ arg_vldr_sysreg *a = opaque;
166
MPS3RMachineState *mms = MPS3R_MACHINE(machine);
558
+ uint32_t offset = a->imm;
167
@@ -XXX,XX +XXX,XX @@ static void mps3r_common_init(MachineState *machine)
559
+ TCGv_i32 addr;
168
MemoryRegion *mr = mr_for_raminfo(mms, ri);
560
+
169
memory_region_add_subregion(sysmem, ri->base, mr);
561
+ if (!a->a) {
170
}
562
+ offset = -offset;
171
+
172
+ assert(machine->smp.cpus <= MPS3R_CPU_MAX);
173
+ for (int i = 0; i < machine->smp.cpus; i++) {
174
+ g_autofree char *sysmem_name = g_strdup_printf("cpu-%d-memory", i);
175
+ g_autofree char *ramname = g_strdup_printf("cpu-%d-memory", i);
176
+ g_autofree char *alias_name = g_strdup_printf("sysmem-alias-%d", i);
177
+
178
+ /*
179
+ * Each CPU has some private RAM/peripherals, so create the container
180
+ * which will house those, with the whole-machine system memory being
181
+ * used where there's no CPU-specific device. Note that we need the
182
+ * sysmem_alias aliases because we can't put one MR (the original
183
+ * 'sysmem') into more than one other MR.
184
+ */
185
+ memory_region_init(&mms->cpu_sysmem[i], OBJECT(machine),
186
+ sysmem_name, UINT64_MAX);
187
+ memory_region_init_alias(&mms->sysmem_alias[i], OBJECT(machine),
188
+ alias_name, sysmem, 0, UINT64_MAX);
189
+ memory_region_add_subregion_overlap(&mms->cpu_sysmem[i], 0,
190
+ &mms->sysmem_alias[i], -1);
191
+
192
+ mms->cpu[i] = object_new(machine->cpu_type);
193
+ object_property_set_link(mms->cpu[i], "memory",
194
+ OBJECT(&mms->cpu_sysmem[i]), &error_abort);
195
+ object_property_set_int(mms->cpu[i], "reset-cbar",
196
+ PERIPHBASE, &error_abort);
197
+ qdev_realize(DEVICE(mms->cpu[i]), NULL, &error_fatal);
198
+ object_unref(mms->cpu[i]);
199
+
200
+ /* Per-CPU RAM */
201
+ memory_region_init_ram(&mms->cpu_ram[i], NULL, ramname,
202
+ 0x1000, &error_fatal);
203
+ memory_region_add_subregion(&mms->cpu_sysmem[i], 0xe7c01000,
204
+ &mms->cpu_ram[i]);
563
+ }
205
+ }
564
+
206
+
565
+ addr = load_reg(s, a->rn);
207
+ create_gic(mms, sysmem);
566
+ if (a->p) {
208
+
567
+ tcg_gen_addi_i32(addr, addr, offset);
209
+ mms->bootinfo.ram_size = machine->ram_size;
568
+ }
210
+ mms->bootinfo.board_id = -1;
569
+
211
+ mms->bootinfo.loader_start = mmc->loader_start;
570
+ if (s->v8m_stackcheck && a->rn == 13 && a->w) {
212
+ mms->bootinfo.write_secondary_boot = mps3r_write_secondary_boot;
571
+ gen_helper_v8m_stackcheck(cpu_env, addr);
213
+ mms->bootinfo.secondary_cpu_reset_hook = mps3r_secondary_cpu_reset;
572
+ }
214
+ arm_load_kernel(ARM_CPU(mms->cpu[0]), machine, &mms->bootinfo);
573
+
574
+ gen_aa32_st_i32(s, value, addr, get_mem_index(s),
575
+ MO_UL | MO_ALIGN | s->be_data);
576
+ tcg_temp_free_i32(value);
577
+
578
+ if (a->w) {
579
+ /* writeback */
580
+ if (!a->p) {
581
+ tcg_gen_addi_i32(addr, addr, offset);
582
+ }
583
+ store_reg(s, a->rn, addr);
584
+ } else {
585
+ tcg_temp_free_i32(addr);
586
+ }
587
+}
588
+
589
+static TCGv_i32 memory_to_fp_sysreg(DisasContext *s, void *opaque)
590
+{
591
+ arg_vldr_sysreg *a = opaque;
592
+ uint32_t offset = a->imm;
593
+ TCGv_i32 addr;
594
+ TCGv_i32 value = tcg_temp_new_i32();
595
+
596
+ if (!a->a) {
597
+ offset = -offset;
598
+ }
599
+
600
+ addr = load_reg(s, a->rn);
601
+ if (a->p) {
602
+ tcg_gen_addi_i32(addr, addr, offset);
603
+ }
604
+
605
+ if (s->v8m_stackcheck && a->rn == 13 && a->w) {
606
+ gen_helper_v8m_stackcheck(cpu_env, addr);
607
+ }
608
+
609
+ gen_aa32_ld_i32(s, value, addr, get_mem_index(s),
610
+ MO_UL | MO_ALIGN | s->be_data);
611
+
612
+ if (a->w) {
613
+ /* writeback */
614
+ if (!a->p) {
615
+ tcg_gen_addi_i32(addr, addr, offset);
616
+ }
617
+ store_reg(s, a->rn, addr);
618
+ } else {
619
+ tcg_temp_free_i32(addr);
620
+ }
621
+ return value;
622
+}
623
+
624
+static bool trans_VLDR_sysreg(DisasContext *s, arg_vldr_sysreg *a)
625
+{
626
+ if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
627
+ return false;
628
+ }
629
+ if (a->rn == 15) {
630
+ return false;
631
+ }
632
+ return gen_M_fp_sysreg_write(s, a->reg, memory_to_fp_sysreg, a);
633
+}
634
+
635
+static bool trans_VSTR_sysreg(DisasContext *s, arg_vldr_sysreg *a)
636
+{
637
+ if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
638
+ return false;
639
+ }
640
+ if (a->rn == 15) {
641
+ return false;
642
+ }
643
+ return gen_M_fp_sysreg_read(s, a->reg, fp_sysreg_to_memory, a);
644
+}
645
+
646
static bool trans_NOCP(DisasContext *s, arg_nocp *a)
647
{
648
/*
649
diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
650
index XXXXXXX..XXXXXXX 100644
651
--- a/target/arm/translate-vfp.c
652
+++ b/target/arm/translate-vfp.c
653
@@ -XXX,XX +XXX,XX @@ static inline long vfp_f16_offset(unsigned reg, bool top)
654
* Generate code for M-profile lazy FP state preservation if needed;
655
* this corresponds to the pseudocode PreserveFPState() function.
656
*/
657
-static void gen_preserve_fp_state(DisasContext *s)
658
+void gen_preserve_fp_state(DisasContext *s)
659
{
660
if (s->v7m_lspact) {
661
/*
662
@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
663
return true;
664
}
215
}
665
216
666
-/*
217
static void mps3r_set_default_ram_info(MPS3RMachineClass *mmc)
667
- * M-profile provides two different sets of instructions that can
218
@@ -XXX,XX +XXX,XX @@ static void mps3r_set_default_ram_info(MPS3RMachineClass *mmc)
668
- * access floating point system registers: VMSR/VMRS (which move
219
/* Found the entry for "system memory" */
669
- * to/from a general purpose register) and VLDR/VSTR sysreg (which
220
mc->default_ram_size = p->size;
670
- * move directly to/from memory). In some cases there are also side
221
mc->default_ram_id = p->name;
671
- * effects which must happen after any write to memory (which could
222
+ mmc->loader_start = p->base;
672
- * cause an exception). So we implement the common logic for the
223
return;
673
- * sysreg access in gen_M_fp_sysreg_write() and gen_M_fp_sysreg_read(),
224
}
674
- * which take pointers to callback functions which will perform the
675
- * actual "read/write general purpose register" and "read/write
676
- * memory" operations.
677
- */
678
-
679
-/*
680
- * Emit code to store the sysreg to its final destination; frees the
681
- * TCG temp 'value' it is passed.
682
- */
683
-typedef void fp_sysreg_storefn(DisasContext *s, void *opaque, TCGv_i32 value);
684
-/*
685
- * Emit code to load the value to be copied to the sysreg; returns
686
- * a new TCG temporary
687
- */
688
-typedef TCGv_i32 fp_sysreg_loadfn(DisasContext *s, void *opaque);
689
-
690
-/* Common decode/access checks for fp sysreg read/write */
691
-typedef enum FPSysRegCheckResult {
692
- FPSysRegCheckFailed, /* caller should return false */
693
- FPSysRegCheckDone, /* caller should return true */
694
- FPSysRegCheckContinue, /* caller should continue generating code */
695
-} FPSysRegCheckResult;
696
-
697
-static FPSysRegCheckResult fp_sysreg_checks(DisasContext *s, int regno)
698
-{
699
- if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
700
- return FPSysRegCheckFailed;
701
- }
702
-
703
- switch (regno) {
704
- case ARM_VFP_FPSCR:
705
- case QEMU_VFP_FPSCR_NZCV:
706
- break;
707
- case ARM_VFP_FPSCR_NZCVQC:
708
- if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
709
- return FPSysRegCheckFailed;
710
- }
711
- break;
712
- case ARM_VFP_FPCXT_S:
713
- case ARM_VFP_FPCXT_NS:
714
- if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
715
- return FPSysRegCheckFailed;
716
- }
717
- if (!s->v8m_secure) {
718
- return FPSysRegCheckFailed;
719
- }
720
- break;
721
- case ARM_VFP_VPR:
722
- case ARM_VFP_P0:
723
- if (!dc_isar_feature(aa32_mve, s)) {
724
- return FPSysRegCheckFailed;
725
- }
726
- break;
727
- default:
728
- return FPSysRegCheckFailed;
729
- }
730
-
731
- /*
732
- * FPCXT_NS is a special case: it has specific handling for
733
- * "current FP state is inactive", and must do the PreserveFPState()
734
- * but not the usual full set of actions done by ExecuteFPCheck().
735
- * So we don't call vfp_access_check() and the callers must handle this.
736
- */
737
- if (regno != ARM_VFP_FPCXT_NS && !vfp_access_check(s)) {
738
- return FPSysRegCheckDone;
739
- }
740
- return FPSysRegCheckContinue;
741
-}
742
-
743
-static void gen_branch_fpInactive(DisasContext *s, TCGCond cond,
744
- TCGLabel *label)
745
-{
746
- /*
747
- * FPCXT_NS is a special case: it has specific handling for
748
- * "current FP state is inactive", and must do the PreserveFPState()
749
- * but not the usual full set of actions done by ExecuteFPCheck().
750
- * We don't have a TB flag that matches the fpInactive check, so we
751
- * do it at runtime as we don't expect FPCXT_NS accesses to be frequent.
752
- *
753
- * Emit code that checks fpInactive and does a conditional
754
- * branch to label based on it:
755
- * if cond is TCG_COND_NE then branch if fpInactive != 0 (ie if inactive)
756
- * if cond is TCG_COND_EQ then branch if fpInactive == 0 (ie if active)
757
- */
758
- assert(cond == TCG_COND_EQ || cond == TCG_COND_NE);
759
-
760
- /* fpInactive = FPCCR_NS.ASPEN == 1 && CONTROL.FPCA == 0 */
761
- TCGv_i32 aspen, fpca;
762
- aspen = load_cpu_field(v7m.fpccr[M_REG_NS]);
763
- fpca = load_cpu_field(v7m.control[M_REG_S]);
764
- tcg_gen_andi_i32(aspen, aspen, R_V7M_FPCCR_ASPEN_MASK);
765
- tcg_gen_xori_i32(aspen, aspen, R_V7M_FPCCR_ASPEN_MASK);
766
- tcg_gen_andi_i32(fpca, fpca, R_V7M_CONTROL_FPCA_MASK);
767
- tcg_gen_or_i32(fpca, fpca, aspen);
768
- tcg_gen_brcondi_i32(tcg_invert_cond(cond), fpca, 0, label);
769
- tcg_temp_free_i32(aspen);
770
- tcg_temp_free_i32(fpca);
771
-}
772
-
773
-static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
774
- fp_sysreg_loadfn *loadfn,
775
- void *opaque)
776
-{
777
- /* Do a write to an M-profile floating point system register */
778
- TCGv_i32 tmp;
779
- TCGLabel *lab_end = NULL;
780
-
781
- switch (fp_sysreg_checks(s, regno)) {
782
- case FPSysRegCheckFailed:
783
- return false;
784
- case FPSysRegCheckDone:
785
- return true;
786
- case FPSysRegCheckContinue:
787
- break;
788
- }
789
-
790
- switch (regno) {
791
- case ARM_VFP_FPSCR:
792
- tmp = loadfn(s, opaque);
793
- gen_helper_vfp_set_fpscr(cpu_env, tmp);
794
- tcg_temp_free_i32(tmp);
795
- gen_lookup_tb(s);
796
- break;
797
- case ARM_VFP_FPSCR_NZCVQC:
798
- {
799
- TCGv_i32 fpscr;
800
- tmp = loadfn(s, opaque);
801
- if (dc_isar_feature(aa32_mve, s)) {
802
- /* QC is only present for MVE; otherwise RES0 */
803
- TCGv_i32 qc = tcg_temp_new_i32();
804
- tcg_gen_andi_i32(qc, tmp, FPCR_QC);
805
- /*
806
- * The 4 vfp.qc[] fields need only be "zero" vs "non-zero";
807
- * here writing the same value into all elements is simplest.
808
- */
809
- tcg_gen_gvec_dup_i32(MO_32, offsetof(CPUARMState, vfp.qc),
810
- 16, 16, qc);
811
- }
812
- tcg_gen_andi_i32(tmp, tmp, FPCR_NZCV_MASK);
813
- fpscr = load_cpu_field(vfp.xregs[ARM_VFP_FPSCR]);
814
- tcg_gen_andi_i32(fpscr, fpscr, ~FPCR_NZCV_MASK);
815
- tcg_gen_or_i32(fpscr, fpscr, tmp);
816
- store_cpu_field(fpscr, vfp.xregs[ARM_VFP_FPSCR]);
817
- tcg_temp_free_i32(tmp);
818
- break;
819
- }
820
- case ARM_VFP_FPCXT_NS:
821
- lab_end = gen_new_label();
822
- /* fpInactive case: write is a NOP, so branch to end */
823
- gen_branch_fpInactive(s, TCG_COND_NE, lab_end);
824
- /*
825
- * !fpInactive: if FPU disabled, take NOCP exception;
826
- * otherwise PreserveFPState(), and then FPCXT_NS writes
827
- * behave the same as FPCXT_S writes.
828
- */
829
- if (s->fp_excp_el) {
830
- gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
831
- syn_uncategorized(), s->fp_excp_el);
832
- /*
833
- * This was only a conditional exception, so override
834
- * gen_exception_insn()'s default to DISAS_NORETURN
835
- */
836
- s->base.is_jmp = DISAS_NEXT;
837
- break;
838
- }
839
- gen_preserve_fp_state(s);
840
- /* fall through */
841
- case ARM_VFP_FPCXT_S:
842
- {
843
- TCGv_i32 sfpa, control;
844
- /*
845
- * Set FPSCR and CONTROL.SFPA from value; the new FPSCR takes
846
- * bits [27:0] from value and zeroes bits [31:28].
847
- */
848
- tmp = loadfn(s, opaque);
849
- sfpa = tcg_temp_new_i32();
850
- tcg_gen_shri_i32(sfpa, tmp, 31);
851
- control = load_cpu_field(v7m.control[M_REG_S]);
852
- tcg_gen_deposit_i32(control, control, sfpa,
853
- R_V7M_CONTROL_SFPA_SHIFT, 1);
854
- store_cpu_field(control, v7m.control[M_REG_S]);
855
- tcg_gen_andi_i32(tmp, tmp, ~FPCR_NZCV_MASK);
856
- gen_helper_vfp_set_fpscr(cpu_env, tmp);
857
- tcg_temp_free_i32(tmp);
858
- tcg_temp_free_i32(sfpa);
859
- break;
860
- }
861
- case ARM_VFP_VPR:
862
- /* Behaves as NOP if not privileged */
863
- if (IS_USER(s)) {
864
- break;
865
- }
866
- tmp = loadfn(s, opaque);
867
- store_cpu_field(tmp, v7m.vpr);
868
- break;
869
- case ARM_VFP_P0:
870
- {
871
- TCGv_i32 vpr;
872
- tmp = loadfn(s, opaque);
873
- vpr = load_cpu_field(v7m.vpr);
874
- tcg_gen_deposit_i32(vpr, vpr, tmp,
875
- R_V7M_VPR_P0_SHIFT, R_V7M_VPR_P0_LENGTH);
876
- store_cpu_field(vpr, v7m.vpr);
877
- tcg_temp_free_i32(tmp);
878
- break;
879
- }
880
- default:
881
- g_assert_not_reached();
882
- }
883
- if (lab_end) {
884
- gen_set_label(lab_end);
885
- }
886
- return true;
887
-}
888
-
889
-static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
890
- fp_sysreg_storefn *storefn,
891
- void *opaque)
892
-{
893
- /* Do a read from an M-profile floating point system register */
894
- TCGv_i32 tmp;
895
- TCGLabel *lab_end = NULL;
896
- bool lookup_tb = false;
897
-
898
- switch (fp_sysreg_checks(s, regno)) {
899
- case FPSysRegCheckFailed:
900
- return false;
901
- case FPSysRegCheckDone:
902
- return true;
903
- case FPSysRegCheckContinue:
904
- break;
905
- }
906
-
907
- if (regno == ARM_VFP_FPSCR_NZCVQC && !dc_isar_feature(aa32_mve, s)) {
908
- /* QC is RES0 without MVE, so NZCVQC simplifies to NZCV */
909
- regno = QEMU_VFP_FPSCR_NZCV;
910
- }
911
-
912
- switch (regno) {
913
- case ARM_VFP_FPSCR:
914
- tmp = tcg_temp_new_i32();
915
- gen_helper_vfp_get_fpscr(tmp, cpu_env);
916
- storefn(s, opaque, tmp);
917
- break;
918
- case ARM_VFP_FPSCR_NZCVQC:
919
- tmp = tcg_temp_new_i32();
920
- gen_helper_vfp_get_fpscr(tmp, cpu_env);
921
- tcg_gen_andi_i32(tmp, tmp, FPCR_NZCVQC_MASK);
922
- storefn(s, opaque, tmp);
923
- break;
924
- case QEMU_VFP_FPSCR_NZCV:
925
- /*
926
- * Read just NZCV; this is a special case to avoid the
927
- * helper call for the "VMRS to CPSR.NZCV" insn.
928
- */
929
- tmp = load_cpu_field(vfp.xregs[ARM_VFP_FPSCR]);
930
- tcg_gen_andi_i32(tmp, tmp, FPCR_NZCV_MASK);
931
- storefn(s, opaque, tmp);
932
- break;
933
- case ARM_VFP_FPCXT_S:
934
- {
935
- TCGv_i32 control, sfpa, fpscr;
936
- /* Bits [27:0] from FPSCR, bit [31] from CONTROL.SFPA */
937
- tmp = tcg_temp_new_i32();
938
- sfpa = tcg_temp_new_i32();
939
- gen_helper_vfp_get_fpscr(tmp, cpu_env);
940
- tcg_gen_andi_i32(tmp, tmp, ~FPCR_NZCV_MASK);
941
- control = load_cpu_field(v7m.control[M_REG_S]);
942
- tcg_gen_andi_i32(sfpa, control, R_V7M_CONTROL_SFPA_MASK);
943
- tcg_gen_shli_i32(sfpa, sfpa, 31 - R_V7M_CONTROL_SFPA_SHIFT);
944
- tcg_gen_or_i32(tmp, tmp, sfpa);
945
- tcg_temp_free_i32(sfpa);
946
- /*
947
- * Store result before updating FPSCR etc, in case
948
- * it is a memory write which causes an exception.
949
- */
950
- storefn(s, opaque, tmp);
951
- /*
952
- * Now we must reset FPSCR from FPDSCR_NS, and clear
953
- * CONTROL.SFPA; so we'll end the TB here.
954
- */
955
- tcg_gen_andi_i32(control, control, ~R_V7M_CONTROL_SFPA_MASK);
956
- store_cpu_field(control, v7m.control[M_REG_S]);
957
- fpscr = load_cpu_field(v7m.fpdscr[M_REG_NS]);
958
- gen_helper_vfp_set_fpscr(cpu_env, fpscr);
959
- tcg_temp_free_i32(fpscr);
960
- lookup_tb = true;
961
- break;
962
- }
963
- case ARM_VFP_FPCXT_NS:
964
- {
965
- TCGv_i32 control, sfpa, fpscr, fpdscr, zero;
966
- TCGLabel *lab_active = gen_new_label();
967
-
968
- lookup_tb = true;
969
-
970
- gen_branch_fpInactive(s, TCG_COND_EQ, lab_active);
971
- /* fpInactive case: reads as FPDSCR_NS */
972
- TCGv_i32 tmp = load_cpu_field(v7m.fpdscr[M_REG_NS]);
973
- storefn(s, opaque, tmp);
974
- lab_end = gen_new_label();
975
- tcg_gen_br(lab_end);
976
-
977
- gen_set_label(lab_active);
978
- /*
979
- * !fpInactive: if FPU disabled, take NOCP exception;
980
- * otherwise PreserveFPState(), and then FPCXT_NS
981
- * reads the same as FPCXT_S.
982
- */
983
- if (s->fp_excp_el) {
984
- gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
985
- syn_uncategorized(), s->fp_excp_el);
986
- /*
987
- * This was only a conditional exception, so override
988
- * gen_exception_insn()'s default to DISAS_NORETURN
989
- */
990
- s->base.is_jmp = DISAS_NEXT;
991
- break;
992
- }
993
- gen_preserve_fp_state(s);
994
- tmp = tcg_temp_new_i32();
995
- sfpa = tcg_temp_new_i32();
996
- fpscr = tcg_temp_new_i32();
997
- gen_helper_vfp_get_fpscr(fpscr, cpu_env);
998
- tcg_gen_andi_i32(tmp, fpscr, ~FPCR_NZCV_MASK);
999
- control = load_cpu_field(v7m.control[M_REG_S]);
1000
- tcg_gen_andi_i32(sfpa, control, R_V7M_CONTROL_SFPA_MASK);
1001
- tcg_gen_shli_i32(sfpa, sfpa, 31 - R_V7M_CONTROL_SFPA_SHIFT);
1002
- tcg_gen_or_i32(tmp, tmp, sfpa);
1003
- tcg_temp_free_i32(control);
1004
- /* Store result before updating FPSCR, in case it faults */
1005
- storefn(s, opaque, tmp);
1006
- /* If SFPA is zero then set FPSCR from FPDSCR_NS */
1007
- fpdscr = load_cpu_field(v7m.fpdscr[M_REG_NS]);
1008
- zero = tcg_const_i32(0);
1009
- tcg_gen_movcond_i32(TCG_COND_EQ, fpscr, sfpa, zero, fpdscr, fpscr);
1010
- gen_helper_vfp_set_fpscr(cpu_env, fpscr);
1011
- tcg_temp_free_i32(zero);
1012
- tcg_temp_free_i32(sfpa);
1013
- tcg_temp_free_i32(fpdscr);
1014
- tcg_temp_free_i32(fpscr);
1015
- break;
1016
- }
1017
- case ARM_VFP_VPR:
1018
- /* Behaves as NOP if not privileged */
1019
- if (IS_USER(s)) {
1020
- break;
1021
- }
1022
- tmp = load_cpu_field(v7m.vpr);
1023
- storefn(s, opaque, tmp);
1024
- break;
1025
- case ARM_VFP_P0:
1026
- tmp = load_cpu_field(v7m.vpr);
1027
- tcg_gen_extract_i32(tmp, tmp, R_V7M_VPR_P0_SHIFT, R_V7M_VPR_P0_LENGTH);
1028
- storefn(s, opaque, tmp);
1029
- break;
1030
- default:
1031
- g_assert_not_reached();
1032
- }
1033
-
1034
- if (lab_end) {
1035
- gen_set_label(lab_end);
1036
- }
1037
- if (lookup_tb) {
1038
- gen_lookup_tb(s);
1039
- }
1040
- return true;
1041
-}
1042
-
1043
-static void fp_sysreg_to_gpr(DisasContext *s, void *opaque, TCGv_i32 value)
1044
-{
1045
- arg_VMSR_VMRS *a = opaque;
1046
-
1047
- if (a->rt == 15) {
1048
- /* Set the 4 flag bits in the CPSR */
1049
- gen_set_nzcv(value);
1050
- tcg_temp_free_i32(value);
1051
- } else {
1052
- store_reg(s, a->rt, value);
1053
- }
1054
-}
1055
-
1056
-static TCGv_i32 gpr_to_fp_sysreg(DisasContext *s, void *opaque)
1057
-{
1058
- arg_VMSR_VMRS *a = opaque;
1059
-
1060
- return load_reg(s, a->rt);
1061
-}
1062
-
1063
-static bool gen_M_VMSR_VMRS(DisasContext *s, arg_VMSR_VMRS *a)
1064
-{
1065
- /*
1066
- * Accesses to R15 are UNPREDICTABLE; we choose to undef.
1067
- * FPSCR -> r15 is a special case which writes to the PSR flags;
1068
- * set a->reg to a special value to tell gen_M_fp_sysreg_read()
1069
- * we only care about the top 4 bits of FPSCR there.
1070
- */
1071
- if (a->rt == 15) {
1072
- if (a->l && a->reg == ARM_VFP_FPSCR) {
1073
- a->reg = QEMU_VFP_FPSCR_NZCV;
1074
- } else {
1075
- return false;
1076
- }
1077
- }
1078
-
1079
- if (a->l) {
1080
- /* VMRS, move FP system register to gp register */
1081
- return gen_M_fp_sysreg_read(s, a->reg, fp_sysreg_to_gpr, a);
1082
- } else {
1083
- /* VMSR, move gp register to FP system register */
1084
- return gen_M_fp_sysreg_write(s, a->reg, gpr_to_fp_sysreg, a);
1085
- }
1086
-}
1087
-
1088
static bool trans_VMSR_VMRS(DisasContext *s, arg_VMSR_VMRS *a)
1089
{
1090
TCGv_i32 tmp;
1091
bool ignore_vfp_enabled = false;
1092
1093
if (arm_dc_feature(s, ARM_FEATURE_M)) {
1094
- return gen_M_VMSR_VMRS(s, a);
1095
+ /* M profile version was already handled in m-nocp.decode */
1096
+ return false;
1097
}
225
}
1098
226
@@ -XXX,XX +XXX,XX @@ static void mps3r_an536_class_init(ObjectClass *oc, void *data)
1099
if (!dc_isar_feature(aa32_fpsp_v2, s)) {
227
};
1100
@@ -XXX,XX +XXX,XX @@ static bool trans_VMSR_VMRS(DisasContext *s, arg_VMSR_VMRS *a)
228
1101
return true;
229
mc->desc = "ARM MPS3 with AN536 FPGA image for Cortex-R52";
1102
}
230
- mc->default_cpus = 2;
1103
231
- mc->min_cpus = mc->default_cpus;
1104
-static void fp_sysreg_to_memory(DisasContext *s, void *opaque, TCGv_i32 value)
232
- mc->max_cpus = mc->default_cpus;
1105
-{
233
+ /*
1106
- arg_vldr_sysreg *a = opaque;
234
+ * In the real FPGA image there are always two cores, but the standard
1107
- uint32_t offset = a->imm;
235
+ * initial setting for the SCC SYSCON 0x000 register is 0x21, meaning
1108
- TCGv_i32 addr;
236
+ * that the second core is held in reset and halted. Many images built for
1109
-
237
+ * the board do not expect the second core to run at startup (especially
1110
- if (!a->a) {
238
+ * since on the real FPGA image it is not possible to use LDREX/STREX
1111
- offset = -offset;
239
+ * in RAM between the two cores, so a true SMP setup isn't supported).
1112
- }
240
+ *
1113
-
241
+ * As QEMU's equivalent of this, we support both -smp 1 and -smp 2,
1114
- addr = load_reg(s, a->rn);
242
+ * with the default being -smp 1. This seems a more intuitive UI for
1115
- if (a->p) {
243
+ * QEMU users than, for instance, having a machine property to allow
1116
- tcg_gen_addi_i32(addr, addr, offset);
244
+ * the user to set the initial value of the SYSCON 0x000 register.
1117
- }
245
+ */
1118
-
246
+ mc->default_cpus = 1;
1119
- if (s->v8m_stackcheck && a->rn == 13 && a->w) {
247
+ mc->min_cpus = 1;
1120
- gen_helper_v8m_stackcheck(cpu_env, addr);
248
+ mc->max_cpus = 2;
1121
- }
249
mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-r52");
1122
-
250
mc->valid_cpu_types = valid_cpu_types;
1123
- gen_aa32_st_i32(s, value, addr, get_mem_index(s),
251
mmc->raminfo = an536_raminfo;
1124
- MO_UL | MO_ALIGN | s->be_data);
1125
- tcg_temp_free_i32(value);
1126
-
1127
- if (a->w) {
1128
- /* writeback */
1129
- if (!a->p) {
1130
- tcg_gen_addi_i32(addr, addr, offset);
1131
- }
1132
- store_reg(s, a->rn, addr);
1133
- } else {
1134
- tcg_temp_free_i32(addr);
1135
- }
1136
-}
1137
-
1138
-static TCGv_i32 memory_to_fp_sysreg(DisasContext *s, void *opaque)
1139
-{
1140
- arg_vldr_sysreg *a = opaque;
1141
- uint32_t offset = a->imm;
1142
- TCGv_i32 addr;
1143
- TCGv_i32 value = tcg_temp_new_i32();
1144
-
1145
- if (!a->a) {
1146
- offset = -offset;
1147
- }
1148
-
1149
- addr = load_reg(s, a->rn);
1150
- if (a->p) {
1151
- tcg_gen_addi_i32(addr, addr, offset);
1152
- }
1153
-
1154
- if (s->v8m_stackcheck && a->rn == 13 && a->w) {
1155
- gen_helper_v8m_stackcheck(cpu_env, addr);
1156
- }
1157
-
1158
- gen_aa32_ld_i32(s, value, addr, get_mem_index(s),
1159
- MO_UL | MO_ALIGN | s->be_data);
1160
-
1161
- if (a->w) {
1162
- /* writeback */
1163
- if (!a->p) {
1164
- tcg_gen_addi_i32(addr, addr, offset);
1165
- }
1166
- store_reg(s, a->rn, addr);
1167
- } else {
1168
- tcg_temp_free_i32(addr);
1169
- }
1170
- return value;
1171
-}
1172
-
1173
-static bool trans_VLDR_sysreg(DisasContext *s, arg_vldr_sysreg *a)
1174
-{
1175
- if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
1176
- return false;
1177
- }
1178
- if (a->rn == 15) {
1179
- return false;
1180
- }
1181
- return gen_M_fp_sysreg_write(s, a->reg, memory_to_fp_sysreg, a);
1182
-}
1183
-
1184
-static bool trans_VSTR_sysreg(DisasContext *s, arg_vldr_sysreg *a)
1185
-{
1186
- if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
1187
- return false;
1188
- }
1189
- if (a->rn == 15) {
1190
- return false;
1191
- }
1192
- return gen_M_fp_sysreg_read(s, a->reg, fp_sysreg_to_memory, a);
1193
-}
1194
1195
static bool trans_VMOV_half(DisasContext *s, arg_VMOV_single *a)
1196
{
1197
--
252
--
1198
2.20.1
253
2.34.1
1199
1200
This board has a lot of UARTs: there is one UART per CPU in the
per-CPU peripheral part of the address map, whose interrupts are
connected as per-CPU interrupt lines. Then there are 4 UARTs in the
normal part of the peripheral space, whose interrupts are shared
peripheral interrupts.

Connect and wire them all up; this involves some OR gates where
multiple overflow interrupts are wired into one GIC input.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20240206132931.38376-11-peter.maydell@linaro.org
---
 hw/arm/mps3r.c | 94 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 94 insertions(+)

diff --git a/hw/arm/mps3r.c b/hw/arm/mps3r.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/mps3r.c
+++ b/hw/arm/mps3r.c
Factor the code in full_vfp_access_check() which updates the
ownership of the FP context and creates a new FP context
out into its own function.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210618141019.10671-6-peter.maydell@linaro.org
---
 target/arm/translate-vfp.c | 104 +++++++++++++++++++++----------------
 1 file changed, 58 insertions(+), 46 deletions(-)

diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c
+++ b/target/arm/translate-vfp.c
@@ -XXX,XX +XXX,XX @@ void gen_preserve_fp_state(DisasContext *s)
21
@@ -XXX,XX +XXX,XX @@
22
#include "qapi/qmp/qlist.h"
23
#include "exec/address-spaces.h"
24
#include "cpu.h"
25
+#include "sysemu/sysemu.h"
26
#include "hw/boards.h"
27
+#include "hw/or-irq.h"
28
#include "hw/qdev-properties.h"
29
#include "hw/arm/boot.h"
30
#include "hw/arm/bsa.h"
31
+#include "hw/char/cmsdk-apb-uart.h"
32
#include "hw/intc/arm_gicv3.h"
33
34
/* Define the layout of RAM and ROM in a board */
35
@@ -XXX,XX +XXX,XX @@ typedef struct RAMInfo {
36
37
#define MPS3R_RAM_MAX 9
38
#define MPS3R_CPU_MAX 2
39
+#define MPS3R_UART_MAX 4 /* shared UART count */
40
41
#define PERIPHBASE 0xf0000000
42
#define NUM_SPIS 96
43
@@ -XXX,XX +XXX,XX @@ struct MPS3RMachineState {
44
MemoryRegion sysmem_alias[MPS3R_CPU_MAX];
45
MemoryRegion cpu_ram[MPS3R_CPU_MAX];
46
GICv3State gic;
47
+ /* per-CPU UARTs followed by the shared UARTs */
48
+ CMSDKAPBUART uart[MPS3R_CPU_MAX + MPS3R_UART_MAX];
49
+ OrIRQState cpu_uart_oflow[MPS3R_CPU_MAX];
50
+ OrIRQState uart_oflow;
51
};
52
53
#define TYPE_MPS3R_MACHINE "mps3r"
54
@@ -XXX,XX +XXX,XX @@ struct MPS3RMachineState {
55
56
OBJECT_DECLARE_TYPE(MPS3RMachineState, MPS3RMachineClass, MPS3R_MACHINE)
57
58
+/*
59
+ * Main clock frequency CLK in Hz (50MHz). In the image there are also
60
+ * ACLK, MCLK, GPUCLK and PERIPHCLK at the same frequency; for our
61
+ * model we just roll them all into one.
62
+ */
63
+#define CLK_FRQ 50000000
64
+
65
static const RAMInfo an536_raminfo[] = {
66
{
67
.name = "ATCM",
68
@@ -XXX,XX +XXX,XX @@ static void create_gic(MPS3RMachineState *mms, MemoryRegion *sysmem)
17
}
69
}
18
}
70
}
19
71
20
+/*
72
+/*
21
+ * Generate code for M-profile FP context handling: update the
73
+ * Create UART uartno, and map it into the MemoryRegion mem at address baseaddr.
22
+ * ownership of the FP context, and create a new context if
74
+ * The qemu_irq arguments are where we connect the various IRQs from the UART.
23
+ * necessary. This corresponds to the parts of the pseudocode
24
+ * ExecuteFPCheck() after the inital PreserveFPState() call.
25
+ */
75
+ */
26
+static void gen_update_fp_context(DisasContext *s)
76
+static void create_uart(MPS3RMachineState *mms, int uartno, MemoryRegion *mem,
77
+ hwaddr baseaddr, qemu_irq txirq, qemu_irq rxirq,
78
+ qemu_irq txoverirq, qemu_irq rxoverirq,
79
+ qemu_irq combirq)
27
+{
80
+{
28
+ /* Update ownership of FP context: set FPCCR.S to match current state */
81
+ g_autofree char *s = g_strdup_printf("uart%d", uartno);
29
+ if (s->v8m_fpccr_s_wrong) {
82
+ SysBusDevice *sbd;
30
+ TCGv_i32 tmp;
31
+
83
+
32
+ tmp = load_cpu_field(v7m.fpccr[M_REG_S]);
84
+ assert(uartno < ARRAY_SIZE(mms->uart));
33
+ if (s->v8m_secure) {
85
+ object_initialize_child(OBJECT(mms), s, &mms->uart[uartno],
34
+ tcg_gen_ori_i32(tmp, tmp, R_V7M_FPCCR_S_MASK);
86
+ TYPE_CMSDK_APB_UART);
35
+ } else {
87
+ qdev_prop_set_uint32(DEVICE(&mms->uart[uartno]), "pclk-frq", CLK_FRQ);
36
+ tcg_gen_andi_i32(tmp, tmp, ~R_V7M_FPCCR_S_MASK);
88
+ qdev_prop_set_chr(DEVICE(&mms->uart[uartno]), "chardev", serial_hd(uartno));
37
+ }
89
+ sbd = SYS_BUS_DEVICE(&mms->uart[uartno]);
38
+ store_cpu_field(tmp, v7m.fpccr[M_REG_S]);
90
+ sysbus_realize(sbd, &error_fatal);
39
+ /* Don't need to do this for any further FP insns in this TB */
91
+ memory_region_add_subregion(mem, baseaddr,
40
+ s->v8m_fpccr_s_wrong = false;
92
+ sysbus_mmio_get_region(sbd, 0));
41
+ }
93
+ sysbus_connect_irq(sbd, 0, txirq);
42
+
94
+ sysbus_connect_irq(sbd, 1, rxirq);
43
+ if (s->v7m_new_fp_ctxt_needed) {
95
+ sysbus_connect_irq(sbd, 2, txoverirq);
44
+ /*
96
+ sysbus_connect_irq(sbd, 3, rxoverirq);
45
+ * Create new FP context by updating CONTROL.FPCA, CONTROL.SFPA,
97
+ sysbus_connect_irq(sbd, 4, combirq);
46
+ * the FPSCR, and VPR.
47
+ */
48
+ TCGv_i32 control, fpscr;
49
+ uint32_t bits = R_V7M_CONTROL_FPCA_MASK;
50
+
51
+ fpscr = load_cpu_field(v7m.fpdscr[s->v8m_secure]);
52
+ gen_helper_vfp_set_fpscr(cpu_env, fpscr);
53
+ tcg_temp_free_i32(fpscr);
54
+ if (dc_isar_feature(aa32_mve, s)) {
55
+ TCGv_i32 z32 = tcg_const_i32(0);
56
+ store_cpu_field(z32, v7m.vpr);
57
+ }
58
+
59
+ /*
60
+ * We don't need to arrange to end the TB, because the only
61
+ * parts of FPSCR which we cache in the TB flags are the VECLEN
62
+ * and VECSTRIDE, and those don't exist for M-profile.
63
+ */
64
+
65
+ if (s->v8m_secure) {
66
+ bits |= R_V7M_CONTROL_SFPA_MASK;
67
+ }
68
+ control = load_cpu_field(v7m.control[M_REG_S]);
69
+ tcg_gen_ori_i32(control, control, bits);
70
+ store_cpu_field(control, v7m.control[M_REG_S]);
71
+ /* Don't need to do this for any further FP insns in this TB */
72
+ s->v7m_new_fp_ctxt_needed = false;
73
+ }
74
+}
98
+}
75
+
99
+
76
/*
100
static void mps3r_common_init(MachineState *machine)
77
* Check that VFP access is enabled. If it is, do the necessary
101
{
78
* M-profile lazy-FP handling and then return true.
102
MPS3RMachineState *mms = MPS3R_MACHINE(machine);
79
@@ -XXX,XX +XXX,XX @@ static bool full_vfp_access_check(DisasContext *s, bool ignore_vfp_enabled)
103
MPS3RMachineClass *mmc = MPS3R_MACHINE_GET_CLASS(mms);
80
/* Trigger lazy-state preservation if necessary */
104
MemoryRegion *sysmem = get_system_memory();
81
gen_preserve_fp_state(s);
105
+ DeviceState *gicdev;
82
106
83
- /* Update ownership of FP context: set FPCCR.S to match current state */
107
for (const RAMInfo *ri = mmc->raminfo; ri->name; ri++) {
84
- if (s->v8m_fpccr_s_wrong) {
108
MemoryRegion *mr = mr_for_raminfo(mms, ri);
85
- TCGv_i32 tmp;
109
@@ -XXX,XX +XXX,XX @@ static void mps3r_common_init(MachineState *machine)
86
-
87
- tmp = load_cpu_field(v7m.fpccr[M_REG_S]);
88
- if (s->v8m_secure) {
89
- tcg_gen_ori_i32(tmp, tmp, R_V7M_FPCCR_S_MASK);
90
- } else {
91
- tcg_gen_andi_i32(tmp, tmp, ~R_V7M_FPCCR_S_MASK);
92
- }
93
- store_cpu_field(tmp, v7m.fpccr[M_REG_S]);
94
- /* Don't need to do this for any further FP insns in this TB */
95
- s->v8m_fpccr_s_wrong = false;
96
- }
97
-
98
- if (s->v7m_new_fp_ctxt_needed) {
99
- /*
100
- * Create new FP context by updating CONTROL.FPCA, CONTROL.SFPA,
101
- * the FPSCR, and VPR.
102
- */
103
- TCGv_i32 control, fpscr;
104
- uint32_t bits = R_V7M_CONTROL_FPCA_MASK;
105
-
106
- fpscr = load_cpu_field(v7m.fpdscr[s->v8m_secure]);
107
- gen_helper_vfp_set_fpscr(cpu_env, fpscr);
108
- tcg_temp_free_i32(fpscr);
109
- if (dc_isar_feature(aa32_mve, s)) {
110
- TCGv_i32 z32 = tcg_const_i32(0);
111
- store_cpu_field(z32, v7m.vpr);
112
- }
113
-
114
- /*
115
- * We don't need to arrange to end the TB, because the only
116
- * parts of FPSCR which we cache in the TB flags are the VECLEN
117
- * and VECSTRIDE, and those don't exist for M-profile.
118
- */
119
-
120
- if (s->v8m_secure) {
121
- bits |= R_V7M_CONTROL_SFPA_MASK;
122
- }
123
- control = load_cpu_field(v7m.control[M_REG_S]);
124
- tcg_gen_ori_i32(control, control, bits);
125
- store_cpu_field(control, v7m.control[M_REG_S]);
126
- /* Don't need to do this for any further FP insns in this TB */
127
- s->v7m_new_fp_ctxt_needed = false;
128
- }
129
+ /* Update ownership of FP context and create new FP context if needed */
130
+ gen_update_fp_context(s);
131
}
110
}
132
111
133
return true;
112
create_gic(mms, sysmem);
113
+ gicdev = DEVICE(&mms->gic);
114
+
115
+ /*
116
+ * UARTs 0 and 1 are per-CPU; their interrupts are wired to
117
+ * the relevant CPU's PPI 0..3, aka INTID 16..19
118
+ */
119
+ for (int i = 0; i < machine->smp.cpus; i++) {
120
+ int intidbase = NUM_SPIS + i * GIC_INTERNAL;
121
+ g_autofree char *s = g_strdup_printf("cpu-uart-oflow-orgate%d", i);
122
+ DeviceState *orgate;
123
+
124
+ /* The two overflow IRQs from the UART are ORed together into PPI 3 */
125
+ object_initialize_child(OBJECT(mms), s, &mms->cpu_uart_oflow[i],
126
+ TYPE_OR_IRQ);
127
+ orgate = DEVICE(&mms->cpu_uart_oflow[i]);
128
+ qdev_prop_set_uint32(orgate, "num-lines", 2);
129
+ qdev_realize(orgate, NULL, &error_fatal);
130
+ qdev_connect_gpio_out(orgate, 0,
131
+ qdev_get_gpio_in(gicdev, intidbase + 19));
132
+
133
+ create_uart(mms, i, &mms->cpu_sysmem[i], 0xe7c00000,
134
+ qdev_get_gpio_in(gicdev, intidbase + 17), /* tx */
135
+ qdev_get_gpio_in(gicdev, intidbase + 16), /* rx */
136
+ qdev_get_gpio_in(orgate, 0), /* txover */
137
+ qdev_get_gpio_in(orgate, 1), /* rxover */
138
+ qdev_get_gpio_in(gicdev, intidbase + 18) /* combined */);
139
+ }
140
+ /*
141
+ * UARTs 2 to 5 are whole-system; all overflow IRQs are ORed
142
+ * together into IRQ 17
143
+ */
144
+ object_initialize_child(OBJECT(mms), "uart-oflow-orgate",
145
+ &mms->uart_oflow, TYPE_OR_IRQ);
146
+ qdev_prop_set_uint32(DEVICE(&mms->uart_oflow), "num-lines",
147
+ MPS3R_UART_MAX * 2);
148
+ qdev_realize(DEVICE(&mms->uart_oflow), NULL, &error_fatal);
149
+ qdev_connect_gpio_out(DEVICE(&mms->uart_oflow), 0,
150
+ qdev_get_gpio_in(gicdev, 17));
151
+
152
+ for (int i = 0; i < MPS3R_UART_MAX; i++) {
153
+ hwaddr baseaddr = 0xe0205000 + i * 0x1000;
154
+ int rxirq = 5 + i * 2, txirq = 6 + i * 2, combirq = 13 + i;
155
+
156
+ create_uart(mms, i + MPS3R_CPU_MAX, sysmem, baseaddr,
157
+ qdev_get_gpio_in(gicdev, txirq),
158
+ qdev_get_gpio_in(gicdev, rxirq),
159
+ qdev_get_gpio_in(DEVICE(&mms->uart_oflow), i * 2),
160
+ qdev_get_gpio_in(DEVICE(&mms->uart_oflow), i * 2 + 1),
161
+ qdev_get_gpio_in(gicdev, combirq));
162
+ }
163
164
mms->bootinfo.ram_size = machine->ram_size;
165
mms->bootinfo.board_id = -1;
134
--
166
--
135
2.20.1
167
2.34.1
136
168
137
169
diff view generated by jsdifflib
Deleted patch
1
Instead of open-coding the "take NOCP exception if FPU disabled,
2
otherwise call gen_preserve_fp_state()" code in the accessors for
3
FPCXT_NS, add an argument to vfp_access_check_m() which tells it to
4
skip the gen_update_fp_context() call, so we can use it for the
5
FPCXT_NS case.
6
1
7
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
9
Message-id: 20210618141019.10671-8-peter.maydell@linaro.org
10
---
11
target/arm/translate-a32.h | 2 +-
12
target/arm/translate-m-nocp.c | 10 ++--------
13
target/arm/translate-vfp.c | 13 ++++++++-----
14
3 files changed, 11 insertions(+), 14 deletions(-)
15
16
diff --git a/target/arm/translate-a32.h b/target/arm/translate-a32.h
17
index XXXXXXX..XXXXXXX 100644
18
--- a/target/arm/translate-a32.h
19
+++ b/target/arm/translate-a32.h
20
@@ -XXX,XX +XXX,XX @@ bool disas_neon_shared(DisasContext *s, uint32_t insn);
21
void load_reg_var(DisasContext *s, TCGv_i32 var, int reg);
22
void arm_gen_condlabel(DisasContext *s);
23
bool vfp_access_check(DisasContext *s);
24
-void gen_preserve_fp_state(DisasContext *s);
25
+bool vfp_access_check_m(DisasContext *s, bool skip_context_update);
26
void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp memop);
27
void read_neon_element64(TCGv_i64 dest, int reg, int ele, MemOp memop);
28
void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop);
29
diff --git a/target/arm/translate-m-nocp.c b/target/arm/translate-m-nocp.c
30
index XXXXXXX..XXXXXXX 100644
31
--- a/target/arm/translate-m-nocp.c
32
+++ b/target/arm/translate-m-nocp.c
33
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
34
* otherwise PreserveFPState(), and then FPCXT_NS writes
35
* behave the same as FPCXT_S writes.
36
*/
37
- if (s->fp_excp_el) {
38
- gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
39
- syn_uncategorized(), s->fp_excp_el);
40
+ if (!vfp_access_check_m(s, true)) {
41
/*
42
* This was only a conditional exception, so override
43
* gen_exception_insn()'s default to DISAS_NORETURN
44
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
45
s->base.is_jmp = DISAS_NEXT;
46
break;
47
}
48
- gen_preserve_fp_state(s);
49
}
50
/* fall through */
51
case ARM_VFP_FPCXT_S:
52
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
53
* otherwise PreserveFPState(), and then FPCXT_NS
54
* reads the same as FPCXT_S.
55
*/
56
- if (s->fp_excp_el) {
57
- gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
58
- syn_uncategorized(), s->fp_excp_el);
59
+ if (!vfp_access_check_m(s, true)) {
60
/*
61
* This was only a conditional exception, so override
62
* gen_exception_insn()'s default to DISAS_NORETURN
63
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
64
s->base.is_jmp = DISAS_NEXT;
65
break;
66
}
67
- gen_preserve_fp_state(s);
68
tmp = tcg_temp_new_i32();
69
sfpa = tcg_temp_new_i32();
70
fpscr = tcg_temp_new_i32();
71
diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
72
index XXXXXXX..XXXXXXX 100644
73
--- a/target/arm/translate-vfp.c
74
+++ b/target/arm/translate-vfp.c
75
@@ -XXX,XX +XXX,XX @@ static inline long vfp_f16_offset(unsigned reg, bool top)
76
* Generate code for M-profile lazy FP state preservation if needed;
77
* this corresponds to the pseudocode PreserveFPState() function.
78
*/
79
-void gen_preserve_fp_state(DisasContext *s)
80
+static void gen_preserve_fp_state(DisasContext *s)
81
{
82
if (s->v7m_lspact) {
83
/*
84
@@ -XXX,XX +XXX,XX @@ static bool vfp_access_check_a(DisasContext *s, bool ignore_vfp_enabled)
85
* If VFP is enabled, do the necessary M-profile lazy-FP handling and then
86
* return true. If not, emit code to generate an appropriate exception and
87
* return false.
88
+ * skip_context_update is true to skip the "update FP context" part of this.
89
*/
90
-static bool vfp_access_check_m(DisasContext *s)
91
+bool vfp_access_check_m(DisasContext *s, bool skip_context_update)
92
{
93
if (s->fp_excp_el) {
94
/*
95
@@ -XXX,XX +XXX,XX @@ static bool vfp_access_check_m(DisasContext *s)
96
/* Trigger lazy-state preservation if necessary */
97
gen_preserve_fp_state(s);
98
99
- /* Update ownership of FP context and create new FP context if needed */
100
- gen_update_fp_context(s);
101
+ if (!skip_context_update) {
102
+ /* Update ownership of FP context and create new FP context if needed */
103
+ gen_update_fp_context(s);
104
+ }
105
106
return true;
107
}
108
@@ -XXX,XX +XXX,XX @@ static bool vfp_access_check_m(DisasContext *s)
109
bool vfp_access_check(DisasContext *s)
110
{
111
if (arm_dc_feature(s, ARM_FEATURE_M)) {
112
- return vfp_access_check_m(s);
113
+ return vfp_access_check_m(s, false);
114
} else {
115
return vfp_access_check_a(s, false);
116
}
117
--
118
2.20.1
119
120
diff view generated by jsdifflib
Deleted patch
1
Implement the variants of MVE VLDR (encodings T1, T2) which perform
2
"widening" loads where bytes or halfwords are loaded from memory and
3
zero or sign-extended into halfword or word length vector elements,
4
and the narrowing MVE VSTR (encodings T1, T2) where bytes or
5
halfwords are stored from halfword or word elements.
6
1
7
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
9
Message-id: 20210617121628.20116-3-peter.maydell@linaro.org
10
---
11
target/arm/helper-mve.h | 10 ++++++++++
12
target/arm/mve.decode | 25 +++++++++++++++++++++++--
13
target/arm/mve_helper.c | 11 +++++++++++
14
target/arm/translate-mve.c | 14 ++++++++++++++
15
4 files changed, 58 insertions(+), 2 deletions(-)
16
17
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
18
index XXXXXXX..XXXXXXX 100644
19
--- a/target/arm/helper-mve.h
20
+++ b/target/arm/helper-mve.h
21
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vldrw, TCG_CALL_NO_WG, void, env, ptr, i32)
22
DEF_HELPER_FLAGS_3(mve_vstrb, TCG_CALL_NO_WG, void, env, ptr, i32)
23
DEF_HELPER_FLAGS_3(mve_vstrh, TCG_CALL_NO_WG, void, env, ptr, i32)
24
DEF_HELPER_FLAGS_3(mve_vstrw, TCG_CALL_NO_WG, void, env, ptr, i32)
25
+
26
+DEF_HELPER_FLAGS_3(mve_vldrb_sh, TCG_CALL_NO_WG, void, env, ptr, i32)
27
+DEF_HELPER_FLAGS_3(mve_vldrb_sw, TCG_CALL_NO_WG, void, env, ptr, i32)
28
+DEF_HELPER_FLAGS_3(mve_vldrb_uh, TCG_CALL_NO_WG, void, env, ptr, i32)
29
+DEF_HELPER_FLAGS_3(mve_vldrb_uw, TCG_CALL_NO_WG, void, env, ptr, i32)
30
+DEF_HELPER_FLAGS_3(mve_vldrh_sw, TCG_CALL_NO_WG, void, env, ptr, i32)
31
+DEF_HELPER_FLAGS_3(mve_vldrh_uw, TCG_CALL_NO_WG, void, env, ptr, i32)
32
+DEF_HELPER_FLAGS_3(mve_vstrb_h, TCG_CALL_NO_WG, void, env, ptr, i32)
33
+DEF_HELPER_FLAGS_3(mve_vstrb_w, TCG_CALL_NO_WG, void, env, ptr, i32)
34
+DEF_HELPER_FLAGS_3(mve_vstrh_w, TCG_CALL_NO_WG, void, env, ptr, i32)
35
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
36
index XXXXXXX..XXXXXXX 100644
37
--- a/target/arm/mve.decode
38
+++ b/target/arm/mve.decode
39
@@ -XXX,XX +XXX,XX @@
40
41
%qd 22:1 13:3
42
43
-&vldr_vstr rn qd imm p a w size l
44
+&vldr_vstr rn qd imm p a w size l u
45
46
-@vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd
47
+@vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
48
+# Note that both Rn and Qd are 3 bits only (no D bit)
49
+@vldst_wn ... u:1 ... . . . . l:1 . rn:3 qd:3 . ... .. imm:7 &vldr_vstr
50
51
# Vector loads and stores
52
53
+# Widening loads and narrowing stores:
54
+# for these P=0 W=0 is 'related encoding'; sz=11 is 'related encoding'
55
+# This means we need to expand out to multiple patterns for P, W, SZ.
56
+# For stores the U bit must be 0 but we catch that in the trans_ function.
57
+# The naming scheme here is "VLDSTB_H == in-memory byte load/store to/from
58
+# signed halfword element in register", etc.
59
+VLDSTB_H 111 . 110 0 a:1 0 1 . 0 ... ... 0 111 01 ....... @vldst_wn \
60
+ p=0 w=1 size=1
61
+VLDSTB_H 111 . 110 1 a:1 0 w:1 . 0 ... ... 0 111 01 ....... @vldst_wn \
62
+ p=1 size=1
63
+VLDSTB_W 111 . 110 0 a:1 0 1 . 0 ... ... 0 111 10 ....... @vldst_wn \
64
+ p=0 w=1 size=2
65
+VLDSTB_W 111 . 110 1 a:1 0 w:1 . 0 ... ... 0 111 10 ....... @vldst_wn \
66
+ p=1 size=2
67
+VLDSTH_W 111 . 110 0 a:1 0 1 . 1 ... ... 0 111 10 ....... @vldst_wn \
68
+ p=0 w=1 size=2
69
+VLDSTH_W 111 . 110 1 a:1 0 w:1 . 1 ... ... 0 111 10 ....... @vldst_wn \
70
+ p=1 size=2
71
+
72
# Non-widening loads/stores (P=0 W=0 is 'related encoding')
73
VLDR_VSTR 1110110 0 a:1 . 1 . .... ... 111100 ....... @vldr_vstr \
74
size=0 p=0 w=1
75
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
76
index XXXXXXX..XXXXXXX 100644
77
--- a/target/arm/mve_helper.c
78
+++ b/target/arm/mve_helper.c
79
@@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrb, 1, stb, 1, uint8_t)
80
DO_VSTR(vstrh, 2, stw, 2, uint16_t)
81
DO_VSTR(vstrw, 4, stl, 4, uint32_t)
82
83
+DO_VLDR(vldrb_sh, 1, ldsb, 2, int16_t)
84
+DO_VLDR(vldrb_sw, 1, ldsb, 4, int32_t)
85
+DO_VLDR(vldrb_uh, 1, ldub, 2, uint16_t)
86
+DO_VLDR(vldrb_uw, 1, ldub, 4, uint32_t)
87
+DO_VLDR(vldrh_sw, 2, ldsw, 4, int32_t)
88
+DO_VLDR(vldrh_uw, 2, lduw, 4, uint32_t)
89
+
90
+DO_VSTR(vstrb_h, 1, stb, 2, int16_t)
91
+DO_VSTR(vstrb_w, 1, stb, 4, int32_t)
92
+DO_VSTR(vstrh_w, 2, stw, 4, int32_t)
93
+
94
#undef DO_VLDR
95
#undef DO_VSTR
96
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
97
index XXXXXXX..XXXXXXX 100644
98
--- a/target/arm/translate-mve.c
99
+++ b/target/arm/translate-mve.c
100
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR(DisasContext *s, arg_VLDR_VSTR *a)
101
};
102
return do_ldst(s, a, ldstfns[a->size][a->l]);
103
}
104
+
105
+#define DO_VLDST_WIDE_NARROW(OP, SLD, ULD, ST) \
106
+ static bool trans_##OP(DisasContext *s, arg_VLDR_VSTR *a) \
107
+ { \
108
+ static MVEGenLdStFn * const ldstfns[2][2] = { \
109
+ { gen_helper_mve_##ST, gen_helper_mve_##SLD }, \
110
+ { NULL, gen_helper_mve_##ULD }, \
111
+ }; \
112
+ return do_ldst(s, a, ldstfns[a->u][a->l]); \
113
+ }
114
+
115
+DO_VLDST_WIDE_NARROW(VLDSTB_H, vldrb_sh, vldrb_uh, vstrb_h)
116
+DO_VLDST_WIDE_NARROW(VLDSTB_W, vldrb_sw, vldrb_uw, vstrb_w)
117
+DO_VLDST_WIDE_NARROW(VLDSTH_W, vldrh_sw, vldrh_uw, vstrh_w)
118
--
119
2.20.1
120
121
diff view generated by jsdifflib
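As a side note on the widening forms in the patch above: the in-memory element size and the register element size differ, so each loaded byte or halfword is sign- or zero-extended into a wider lane. The following is a minimal standalone C sketch of the VLDRB.S16-style pattern only (the function name is invented for illustration; the real QEMU helpers additionally apply the VPT/ECI predicate mask and the H-macro lane ordering):

#include <stdint.h>
#include <string.h>

/* Load 8 bytes from memory and sign-extend each into a 16-bit element,
 * i.e. a "widening" byte load into halfword lanes.
 */
static void vldrb_sh_sketch(int16_t dst[8], const void *mem)
{
    int8_t bytes[8];
    memcpy(bytes, mem, sizeof(bytes));  /* in-memory element size: 1 byte */
    for (int e = 0; e < 8; e++) {
        dst[e] = bytes[e];              /* implicit sign extension to 16 bits */
    }
}

The narrowing stores go the other way: a 16- or 32-bit register lane is truncated to the in-memory element size before being written out.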
Deleted patch
1
Implement the MVE VCLZ insn (and the necessary machinery
2
for MVE 1-input vector ops).
3
1
4
Note that for non-load instructions predication is always performed
5
at a byte level granularity regardless of element size (R_ZLSJ),
6
and so the masking logic here differs from that used in the VLDR
7
and VSTR helpers.
8
9
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
10
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
11
Message-id: 20210617121628.20116-4-peter.maydell@linaro.org
12
---
13
target/arm/helper-mve.h | 4 ++
14
target/arm/mve.decode | 8 ++++
15
target/arm/mve_helper.c | 82 ++++++++++++++++++++++++++++++++++++++
16
target/arm/translate-mve.c | 38 ++++++++++++++++++
17
4 files changed, 132 insertions(+)
18
19
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
20
index XXXXXXX..XXXXXXX 100644
21
--- a/target/arm/helper-mve.h
22
+++ b/target/arm/helper-mve.h
23
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vldrh_uw, TCG_CALL_NO_WG, void, env, ptr, i32)
24
DEF_HELPER_FLAGS_3(mve_vstrb_h, TCG_CALL_NO_WG, void, env, ptr, i32)
25
DEF_HELPER_FLAGS_3(mve_vstrb_w, TCG_CALL_NO_WG, void, env, ptr, i32)
26
DEF_HELPER_FLAGS_3(mve_vstrh_w, TCG_CALL_NO_WG, void, env, ptr, i32)
27
+
28
+DEF_HELPER_FLAGS_3(mve_vclzb, TCG_CALL_NO_WG, void, env, ptr, ptr)
29
+DEF_HELPER_FLAGS_3(mve_vclzh, TCG_CALL_NO_WG, void, env, ptr, ptr)
30
+DEF_HELPER_FLAGS_3(mve_vclzw, TCG_CALL_NO_WG, void, env, ptr, ptr)
31
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
32
index XXXXXXX..XXXXXXX 100644
33
--- a/target/arm/mve.decode
34
+++ b/target/arm/mve.decode
35
@@ -XXX,XX +XXX,XX @@
36
#
37
38
%qd 22:1 13:3
39
+%qm 5:1 1:3
40
41
&vldr_vstr rn qd imm p a w size l u
42
+&1op qd qm size
43
44
@vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
45
# Note that both Rn and Qd are 3 bits only (no D bit)
46
@vldst_wn ... u:1 ... . . . . l:1 . rn:3 qd:3 . ... .. imm:7 &vldr_vstr
47
48
+@1op .... .... .... size:2 .. .... .... .... .... &1op qd=%qd qm=%qm
49
+
50
# Vector loads and stores
51
52
# Widening loads and narrowing stores:
53
@@ -XXX,XX +XXX,XX @@ VLDR_VSTR 1110110 1 a:1 . w:1 . .... ... 111101 ....... @vldr_vstr \
54
size=1 p=1
55
VLDR_VSTR 1110110 1 a:1 . w:1 . .... ... 111110 ....... @vldr_vstr \
56
size=2 p=1
57
+
58
+# Vector miscellaneous
59
+
60
+VCLZ 1111 1111 1 . 11 .. 00 ... 0 0100 11 . 0 ... 0 @1op
61
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
62
index XXXXXXX..XXXXXXX 100644
63
--- a/target/arm/mve_helper.c
64
+++ b/target/arm/mve_helper.c
65
@@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, 2, stw, 4, int32_t)
66
67
#undef DO_VLDR
68
#undef DO_VSTR
69
+
70
+/*
71
+ * The mergemask(D, R, M) macro performs the operation "*D = R" but
72
+ * storing only the bytes which correspond to 1 bits in M,
73
+ * leaving other bytes in *D unchanged. We use _Generic
74
+ * to select the correct implementation based on the type of D.
75
+ */
76
+
77
+static void mergemask_ub(uint8_t *d, uint8_t r, uint16_t mask)
78
+{
79
+ if (mask & 1) {
80
+ *d = r;
81
+ }
82
+}
83
+
84
+static void mergemask_sb(int8_t *d, int8_t r, uint16_t mask)
85
+{
86
+ mergemask_ub((uint8_t *)d, r, mask);
87
+}
88
+
89
+static void mergemask_uh(uint16_t *d, uint16_t r, uint16_t mask)
90
+{
91
+ uint16_t bmask = expand_pred_b_data[mask & 3];
92
+ *d = (*d & ~bmask) | (r & bmask);
93
+}
94
+
95
+static void mergemask_sh(int16_t *d, int16_t r, uint16_t mask)
96
+{
97
+ mergemask_uh((uint16_t *)d, r, mask);
98
+}
99
+
100
+static void mergemask_uw(uint32_t *d, uint32_t r, uint16_t mask)
101
+{
102
+ uint32_t bmask = expand_pred_b_data[mask & 0xf];
103
+ *d = (*d & ~bmask) | (r & bmask);
104
+}
105
+
106
+static void mergemask_sw(int32_t *d, int32_t r, uint16_t mask)
107
+{
108
+ mergemask_uw((uint32_t *)d, r, mask);
109
+}
110
+
111
+static void mergemask_uq(uint64_t *d, uint64_t r, uint16_t mask)
112
+{
113
+ uint64_t bmask = expand_pred_b_data[mask & 0xff];
114
+ *d = (*d & ~bmask) | (r & bmask);
115
+}
116
+
117
+static void mergemask_sq(int64_t *d, int64_t r, uint16_t mask)
118
+{
119
+ mergemask_uq((uint64_t *)d, r, mask);
120
+}
121
+
122
+#define mergemask(D, R, M) \
123
+ _Generic(D, \
124
+ uint8_t *: mergemask_ub, \
125
+ int8_t *: mergemask_sb, \
126
+ uint16_t *: mergemask_uh, \
127
+ int16_t *: mergemask_sh, \
128
+ uint32_t *: mergemask_uw, \
129
+ int32_t *: mergemask_sw, \
130
+ uint64_t *: mergemask_uq, \
131
+ int64_t *: mergemask_sq)(D, R, M)
132
+
133
+#define DO_1OP(OP, ESIZE, TYPE, FN) \
134
+ void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm) \
135
+ { \
136
+ TYPE *d = vd, *m = vm; \
137
+ uint16_t mask = mve_element_mask(env); \
138
+ unsigned e; \
139
+ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
140
+ mergemask(&d[H##ESIZE(e)], FN(m[H##ESIZE(e)]), mask); \
141
+ } \
142
+ mve_advance_vpt(env); \
143
+ }
144
+
145
+#define DO_CLZ_B(N) (clz32(N) - 24)
146
+#define DO_CLZ_H(N) (clz32(N) - 16)
147
+
148
+DO_1OP(vclzb, 1, uint8_t, DO_CLZ_B)
149
+DO_1OP(vclzh, 2, uint16_t, DO_CLZ_H)
150
+DO_1OP(vclzw, 4, uint32_t, clz32)
151
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
152
index XXXXXXX..XXXXXXX 100644
153
--- a/target/arm/translate-mve.c
154
+++ b/target/arm/translate-mve.c
155
@@ -XXX,XX +XXX,XX @@
156
#include "decode-mve.c.inc"
157
158
typedef void MVEGenLdStFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
159
+typedef void MVEGenOneOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
160
161
/* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
162
static inline long mve_qreg_offset(unsigned reg)
163
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR(DisasContext *s, arg_VLDR_VSTR *a)
164
DO_VLDST_WIDE_NARROW(VLDSTB_H, vldrb_sh, vldrb_uh, vstrb_h)
165
DO_VLDST_WIDE_NARROW(VLDSTB_W, vldrb_sw, vldrb_uw, vstrb_w)
166
DO_VLDST_WIDE_NARROW(VLDSTH_W, vldrh_sw, vldrh_uw, vstrh_w)
167
+
168
+static bool do_1op(DisasContext *s, arg_1op *a, MVEGenOneOpFn fn)
169
+{
170
+ TCGv_ptr qd, qm;
171
+
172
+ if (!dc_isar_feature(aa32_mve, s) ||
173
+ !mve_check_qreg_bank(s, a->qd | a->qm) ||
174
+ !fn) {
175
+ return false;
176
+ }
177
+
178
+ if (!mve_eci_check(s) || !vfp_access_check(s)) {
179
+ return true;
180
+ }
181
+
182
+ qd = mve_qreg_ptr(a->qd);
183
+ qm = mve_qreg_ptr(a->qm);
184
+ fn(cpu_env, qd, qm);
185
+ tcg_temp_free_ptr(qd);
186
+ tcg_temp_free_ptr(qm);
187
+ mve_update_eci(s);
188
+ return true;
189
+}
190
+
191
+#define DO_1OP(INSN, FN) \
192
+ static bool trans_##INSN(DisasContext *s, arg_1op *a) \
193
+ { \
194
+ static MVEGenOneOpFn * const fns[] = { \
195
+ gen_helper_mve_##FN##b, \
196
+ gen_helper_mve_##FN##h, \
197
+ gen_helper_mve_##FN##w, \
198
+ NULL, \
199
+ }; \
200
+ return do_1op(s, a, fns[a->size]); \
201
+ }
202
+
203
+DO_1OP(VCLZ, vclz)
204
--
205
2.20.1
206
207
diff view generated by jsdifflib
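For readers not familiar with C11 _Generic, which the mergemask() macro above relies on: it selects one of several functions at compile time based on the static type of its controlling expression. A standalone sketch of just that dispatch idea (unrelated to the actual QEMU helpers; the names here are invented for illustration):

#include <stdint.h>
#include <stdio.h>

static void store_u8(uint8_t *d, uint8_t r)    { *d = r; }
static void store_u16(uint16_t *d, uint16_t r) { *d = r; }

/* _Generic picks the callee from the pointer type of D at compile time */
#define store(D, R) _Generic((D),        \
        uint8_t * : store_u8,            \
        uint16_t *: store_u16)(D, R)

int main(void)
{
    uint8_t b;
    uint16_t h;
    store(&b, 0x12);    /* resolves to store_u8 */
    store(&h, 0x3456);  /* resolves to store_u16 */
    printf("%x %x\n", b, h);
    return 0;
}

This is why mergemask() can take a destination pointer of any element type and still end up in the right width-specific merge routine with no run-time overhead.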
Deleted patch
1
Implement the MVE VCLS insn.
2
1
3
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
4
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
5
Message-id: 20210617121628.20116-5-peter.maydell@linaro.org
6
---
7
target/arm/helper-mve.h | 4 ++++
8
target/arm/mve.decode | 1 +
9
target/arm/mve_helper.c | 7 +++++++
10
target/arm/translate-mve.c | 1 +
11
4 files changed, 13 insertions(+)
12
13
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
14
index XXXXXXX..XXXXXXX 100644
15
--- a/target/arm/helper-mve.h
16
+++ b/target/arm/helper-mve.h
17
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vstrb_h, TCG_CALL_NO_WG, void, env, ptr, i32)
18
DEF_HELPER_FLAGS_3(mve_vstrb_w, TCG_CALL_NO_WG, void, env, ptr, i32)
19
DEF_HELPER_FLAGS_3(mve_vstrh_w, TCG_CALL_NO_WG, void, env, ptr, i32)
20
21
+DEF_HELPER_FLAGS_3(mve_vclsb, TCG_CALL_NO_WG, void, env, ptr, ptr)
22
+DEF_HELPER_FLAGS_3(mve_vclsh, TCG_CALL_NO_WG, void, env, ptr, ptr)
23
+DEF_HELPER_FLAGS_3(mve_vclsw, TCG_CALL_NO_WG, void, env, ptr, ptr)
24
+
25
DEF_HELPER_FLAGS_3(mve_vclzb, TCG_CALL_NO_WG, void, env, ptr, ptr)
26
DEF_HELPER_FLAGS_3(mve_vclzh, TCG_CALL_NO_WG, void, env, ptr, ptr)
27
DEF_HELPER_FLAGS_3(mve_vclzw, TCG_CALL_NO_WG, void, env, ptr, ptr)
28
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
29
index XXXXXXX..XXXXXXX 100644
30
--- a/target/arm/mve.decode
31
+++ b/target/arm/mve.decode
32
@@ -XXX,XX +XXX,XX @@ VLDR_VSTR 1110110 1 a:1 . w:1 . .... ... 111110 ....... @vldr_vstr \
33
34
# Vector miscellaneous
35
36
+VCLS 1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
37
VCLZ 1111 1111 1 . 11 .. 00 ... 0 0100 11 . 0 ... 0 @1op
38
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
39
index XXXXXXX..XXXXXXX 100644
40
--- a/target/arm/mve_helper.c
41
+++ b/target/arm/mve_helper.c
42
@@ -XXX,XX +XXX,XX @@ static void mergemask_sq(int64_t *d, int64_t r, uint16_t mask)
43
mve_advance_vpt(env); \
44
}
45
46
+#define DO_CLS_B(N) (clrsb32(N) - 24)
47
+#define DO_CLS_H(N) (clrsb32(N) - 16)
48
+
49
+DO_1OP(vclsb, 1, int8_t, DO_CLS_B)
50
+DO_1OP(vclsh, 2, int16_t, DO_CLS_H)
51
+DO_1OP(vclsw, 4, int32_t, clrsb32)
52
+
53
#define DO_CLZ_B(N) (clz32(N) - 24)
54
#define DO_CLZ_H(N) (clz32(N) - 16)
55
56
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
57
index XXXXXXX..XXXXXXX 100644
58
--- a/target/arm/translate-mve.c
59
+++ b/target/arm/translate-mve.c
60
@@ -XXX,XX +XXX,XX @@ static bool do_1op(DisasContext *s, arg_1op *a, MVEGenOneOpFn fn)
61
}
62
63
DO_1OP(VCLZ, vclz)
64
+DO_1OP(VCLS, vcls)
65
--
66
2.20.1
67
68
diff view generated by jsdifflib
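The "- 24" and "- 16" adjustments in DO_CLS_B/DO_CLS_H above (and in DO_CLZ_B/DO_CLZ_H in the previous patch) exist because the helper macros operate on the element value widened to 32 bits, which adds leading bits that are not really part of the element. A small illustrative sketch of the count-leading-zeros case, using the GCC/Clang builtin rather than QEMU's clz32()/clrsb32() wrappers (an assumption made purely for a self-contained example):

#include <stdint.h>
#include <assert.h>

/* clz of an 8-bit value: widening to 32 bits adds 24 leading zeros
 * that must be subtracted again -- the DO_CLZ_B pattern.
 * (__builtin_clz(0) is undefined, so handle zero explicitly;
 * QEMU's clz32() returns 32 for zero, which makes 32 - 24 = 8 fall
 * out naturally without a special case.)
 */
static int clz8(uint8_t x)
{
    return x ? __builtin_clz((uint32_t)x) - 24 : 8;
}

int main(void)
{
    assert(clz8(0x80) == 0);
    assert(clz8(0x01) == 7);
    assert(clz8(0x00) == 8);
    return 0;
}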
Deleted patch
1
Implement the MVE instructions VREV16, VREV32 and VREV64.
2
1
3
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
4
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
5
Message-id: 20210617121628.20116-6-peter.maydell@linaro.org
6
---
7
target/arm/helper-mve.h | 7 +++++++
8
target/arm/mve.decode | 4 ++++
9
target/arm/mve_helper.c | 7 +++++++
10
target/arm/translate-mve.c | 33 +++++++++++++++++++++++++++++++++
11
4 files changed, 51 insertions(+)
12
13
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
14
index XXXXXXX..XXXXXXX 100644
15
--- a/target/arm/helper-mve.h
16
+++ b/target/arm/helper-mve.h
17
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vclsw, TCG_CALL_NO_WG, void, env, ptr, ptr)
18
DEF_HELPER_FLAGS_3(mve_vclzb, TCG_CALL_NO_WG, void, env, ptr, ptr)
19
DEF_HELPER_FLAGS_3(mve_vclzh, TCG_CALL_NO_WG, void, env, ptr, ptr)
20
DEF_HELPER_FLAGS_3(mve_vclzw, TCG_CALL_NO_WG, void, env, ptr, ptr)
21
+
22
+DEF_HELPER_FLAGS_3(mve_vrev16b, TCG_CALL_NO_WG, void, env, ptr, ptr)
23
+DEF_HELPER_FLAGS_3(mve_vrev32b, TCG_CALL_NO_WG, void, env, ptr, ptr)
24
+DEF_HELPER_FLAGS_3(mve_vrev32h, TCG_CALL_NO_WG, void, env, ptr, ptr)
25
+DEF_HELPER_FLAGS_3(mve_vrev64b, TCG_CALL_NO_WG, void, env, ptr, ptr)
26
+DEF_HELPER_FLAGS_3(mve_vrev64h, TCG_CALL_NO_WG, void, env, ptr, ptr)
27
+DEF_HELPER_FLAGS_3(mve_vrev64w, TCG_CALL_NO_WG, void, env, ptr, ptr)
28
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
29
index XXXXXXX..XXXXXXX 100644
30
--- a/target/arm/mve.decode
31
+++ b/target/arm/mve.decode
32
@@ -XXX,XX +XXX,XX @@ VLDR_VSTR 1110110 1 a:1 . w:1 . .... ... 111110 ....... @vldr_vstr \
33
34
VCLS 1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
35
VCLZ 1111 1111 1 . 11 .. 00 ... 0 0100 11 . 0 ... 0 @1op
36
+
37
+VREV16 1111 1111 1 . 11 .. 00 ... 0 0001 01 . 0 ... 0 @1op
38
+VREV32 1111 1111 1 . 11 .. 00 ... 0 0000 11 . 0 ... 0 @1op
39
+VREV64 1111 1111 1 . 11 .. 00 ... 0 0000 01 . 0 ... 0 @1op
40
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
41
index XXXXXXX..XXXXXXX 100644
42
--- a/target/arm/mve_helper.c
43
+++ b/target/arm/mve_helper.c
44
@@ -XXX,XX +XXX,XX @@ DO_1OP(vclsw, 4, int32_t, clrsb32)
45
DO_1OP(vclzb, 1, uint8_t, DO_CLZ_B)
46
DO_1OP(vclzh, 2, uint16_t, DO_CLZ_H)
47
DO_1OP(vclzw, 4, uint32_t, clz32)
48
+
49
+DO_1OP(vrev16b, 2, uint16_t, bswap16)
50
+DO_1OP(vrev32b, 4, uint32_t, bswap32)
51
+DO_1OP(vrev32h, 4, uint32_t, hswap32)
52
+DO_1OP(vrev64b, 8, uint64_t, bswap64)
53
+DO_1OP(vrev64h, 8, uint64_t, hswap64)
54
+DO_1OP(vrev64w, 8, uint64_t, wswap64)
55
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
56
index XXXXXXX..XXXXXXX 100644
57
--- a/target/arm/translate-mve.c
58
+++ b/target/arm/translate-mve.c
59
@@ -XXX,XX +XXX,XX @@ static bool do_1op(DisasContext *s, arg_1op *a, MVEGenOneOpFn fn)
60
61
DO_1OP(VCLZ, vclz)
62
DO_1OP(VCLS, vcls)
63
+
64
+static bool trans_VREV16(DisasContext *s, arg_1op *a)
65
+{
66
+ static MVEGenOneOpFn * const fns[] = {
67
+ gen_helper_mve_vrev16b,
68
+ NULL,
69
+ NULL,
70
+ NULL,
71
+ };
72
+ return do_1op(s, a, fns[a->size]);
73
+}
74
+
75
+static bool trans_VREV32(DisasContext *s, arg_1op *a)
76
+{
77
+ static MVEGenOneOpFn * const fns[] = {
78
+ gen_helper_mve_vrev32b,
79
+ gen_helper_mve_vrev32h,
80
+ NULL,
81
+ NULL,
82
+ };
83
+ return do_1op(s, a, fns[a->size]);
84
+}
85
+
86
+static bool trans_VREV64(DisasContext *s, arg_1op *a)
87
+{
88
+ static MVEGenOneOpFn * const fns[] = {
89
+ gen_helper_mve_vrev64b,
90
+ gen_helper_mve_vrev64h,
91
+ gen_helper_mve_vrev64w,
92
+ NULL,
93
+ };
94
+ return do_1op(s, a, fns[a->size]);
95
+}
96
--
97
2.20.1
98
99
diff view generated by jsdifflib
Deleted patch
1
Implement the MVE VMVN(register) operation. Note that for
2
predication this operation is byte-by-byte.
3
1
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
Message-id: 20210617121628.20116-7-peter.maydell@linaro.org
7
---
8
target/arm/helper-mve.h | 2 ++
9
target/arm/mve.decode | 3 +++
10
target/arm/mve_helper.c | 4 ++++
11
target/arm/translate-mve.c | 5 +++++
12
4 files changed, 14 insertions(+)
13
14
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
15
index XXXXXXX..XXXXXXX 100644
16
--- a/target/arm/helper-mve.h
17
+++ b/target/arm/helper-mve.h
18
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vrev32h, TCG_CALL_NO_WG, void, env, ptr, ptr)
19
DEF_HELPER_FLAGS_3(mve_vrev64b, TCG_CALL_NO_WG, void, env, ptr, ptr)
20
DEF_HELPER_FLAGS_3(mve_vrev64h, TCG_CALL_NO_WG, void, env, ptr, ptr)
21
DEF_HELPER_FLAGS_3(mve_vrev64w, TCG_CALL_NO_WG, void, env, ptr, ptr)
22
+
23
+DEF_HELPER_FLAGS_3(mve_vmvn, TCG_CALL_NO_WG, void, env, ptr, ptr)
24
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
25
index XXXXXXX..XXXXXXX 100644
26
--- a/target/arm/mve.decode
27
+++ b/target/arm/mve.decode
28
@@ -XXX,XX +XXX,XX @@
29
@vldst_wn ... u:1 ... . . . . l:1 . rn:3 qd:3 . ... .. imm:7 &vldr_vstr
30
31
@1op .... .... .... size:2 .. .... .... .... .... &1op qd=%qd qm=%qm
32
+@1op_nosz .... .... .... .... .... .... .... .... &1op qd=%qd qm=%qm size=0
33
34
# Vector loads and stores
35
36
@@ -XXX,XX +XXX,XX @@ VCLZ 1111 1111 1 . 11 .. 00 ... 0 0100 11 . 0 ... 0 @1op
37
VREV16 1111 1111 1 . 11 .. 00 ... 0 0001 01 . 0 ... 0 @1op
38
VREV32 1111 1111 1 . 11 .. 00 ... 0 0000 11 . 0 ... 0 @1op
39
VREV64 1111 1111 1 . 11 .. 00 ... 0 0000 01 . 0 ... 0 @1op
40
+
41
+VMVN 1111 1111 1 . 11 00 00 ... 0 0101 11 . 0 ... 0 @1op_nosz
42
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
43
index XXXXXXX..XXXXXXX 100644
44
--- a/target/arm/mve_helper.c
45
+++ b/target/arm/mve_helper.c
46
@@ -XXX,XX +XXX,XX @@ DO_1OP(vrev32h, 4, uint32_t, hswap32)
47
DO_1OP(vrev64b, 8, uint64_t, bswap64)
48
DO_1OP(vrev64h, 8, uint64_t, hswap64)
49
DO_1OP(vrev64w, 8, uint64_t, wswap64)
50
+
51
+#define DO_NOT(N) (~(N))
52
+
53
+DO_1OP(vmvn, 8, uint64_t, DO_NOT)
54
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
55
index XXXXXXX..XXXXXXX 100644
56
--- a/target/arm/translate-mve.c
57
+++ b/target/arm/translate-mve.c
58
@@ -XXX,XX +XXX,XX @@ static bool trans_VREV64(DisasContext *s, arg_1op *a)
59
};
60
return do_1op(s, a, fns[a->size]);
61
}
62
+
63
+static bool trans_VMVN(DisasContext *s, arg_1op *a)
64
+{
65
+ return do_1op(s, a, gen_helper_mve_vmvn);
66
+}
67
--
68
2.20.1
69
70
diff view generated by jsdifflib
Deleted patch
1
Implement the MVE VABS functions (both integer and floating point).
2
1
3
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
4
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
5
Message-id: 20210617121628.20116-8-peter.maydell@linaro.org
6
---
7
target/arm/helper-mve.h | 6 ++++++
8
target/arm/mve.decode | 3 +++
9
target/arm/mve_helper.c | 13 +++++++++++++
10
target/arm/translate-mve.c | 15 +++++++++++++++
11
4 files changed, 37 insertions(+)
12
13
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
14
index XXXXXXX..XXXXXXX 100644
15
--- a/target/arm/helper-mve.h
16
+++ b/target/arm/helper-mve.h
17
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vrev64h, TCG_CALL_NO_WG, void, env, ptr, ptr)
18
DEF_HELPER_FLAGS_3(mve_vrev64w, TCG_CALL_NO_WG, void, env, ptr, ptr)
19
20
DEF_HELPER_FLAGS_3(mve_vmvn, TCG_CALL_NO_WG, void, env, ptr, ptr)
21
+
22
+DEF_HELPER_FLAGS_3(mve_vabsb, TCG_CALL_NO_WG, void, env, ptr, ptr)
23
+DEF_HELPER_FLAGS_3(mve_vabsh, TCG_CALL_NO_WG, void, env, ptr, ptr)
24
+DEF_HELPER_FLAGS_3(mve_vabsw, TCG_CALL_NO_WG, void, env, ptr, ptr)
25
+DEF_HELPER_FLAGS_3(mve_vfabsh, TCG_CALL_NO_WG, void, env, ptr, ptr)
26
+DEF_HELPER_FLAGS_3(mve_vfabss, TCG_CALL_NO_WG, void, env, ptr, ptr)
27
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
28
index XXXXXXX..XXXXXXX 100644
29
--- a/target/arm/mve.decode
30
+++ b/target/arm/mve.decode
31
@@ -XXX,XX +XXX,XX @@ VREV32 1111 1111 1 . 11 .. 00 ... 0 0000 11 . 0 ... 0 @1op
32
VREV64 1111 1111 1 . 11 .. 00 ... 0 0000 01 . 0 ... 0 @1op
33
34
VMVN 1111 1111 1 . 11 00 00 ... 0 0101 11 . 0 ... 0 @1op_nosz
35
+
36
+VABS 1111 1111 1 . 11 .. 01 ... 0 0011 01 . 0 ... 0 @1op
37
+VABS_fp 1111 1111 1 . 11 .. 01 ... 0 0111 01 . 0 ... 0 @1op
38
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
39
index XXXXXXX..XXXXXXX 100644
40
--- a/target/arm/mve_helper.c
41
+++ b/target/arm/mve_helper.c
42
@@ -XXX,XX +XXX,XX @@
43
#include "exec/helper-proto.h"
44
#include "exec/cpu_ldst.h"
45
#include "exec/exec-all.h"
46
+#include "tcg/tcg.h"
47
48
static uint16_t mve_element_mask(CPUARMState *env)
49
{
50
@@ -XXX,XX +XXX,XX @@ DO_1OP(vrev64w, 8, uint64_t, wswap64)
51
#define DO_NOT(N) (~(N))
52
53
DO_1OP(vmvn, 8, uint64_t, DO_NOT)
54
+
55
+#define DO_ABS(N) ((N) < 0 ? -(N) : (N))
56
+#define DO_FABSH(N) ((N) & dup_const(MO_16, 0x7fff))
57
+#define DO_FABSS(N) ((N) & dup_const(MO_32, 0x7fffffff))
58
+
59
+DO_1OP(vabsb, 1, int8_t, DO_ABS)
60
+DO_1OP(vabsh, 2, int16_t, DO_ABS)
61
+DO_1OP(vabsw, 4, int32_t, DO_ABS)
62
+
63
+/* We can do these 64 bits at a time */
64
+DO_1OP(vfabsh, 8, uint64_t, DO_FABSH)
65
+DO_1OP(vfabss, 8, uint64_t, DO_FABSS)
66
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
67
index XXXXXXX..XXXXXXX 100644
68
--- a/target/arm/translate-mve.c
69
+++ b/target/arm/translate-mve.c
70
@@ -XXX,XX +XXX,XX @@ static bool do_1op(DisasContext *s, arg_1op *a, MVEGenOneOpFn fn)
71
72
DO_1OP(VCLZ, vclz)
73
DO_1OP(VCLS, vcls)
74
+DO_1OP(VABS, vabs)
75
76
static bool trans_VREV16(DisasContext *s, arg_1op *a)
77
{
78
@@ -XXX,XX +XXX,XX @@ static bool trans_VMVN(DisasContext *s, arg_1op *a)
79
{
80
return do_1op(s, a, gen_helper_mve_vmvn);
81
}
82
+
83
+static bool trans_VABS_fp(DisasContext *s, arg_1op *a)
84
+{
85
+ static MVEGenOneOpFn * const fns[] = {
86
+ NULL,
87
+ gen_helper_mve_vfabsh,
88
+ gen_helper_mve_vfabss,
89
+ NULL,
90
+ };
91
+ if (!dc_isar_feature(aa32_mve_fp, s)) {
92
+ return false;
93
+ }
94
+ return do_1op(s, a, fns[a->size]);
95
+}
96
--
97
2.20.1
98
99
diff view generated by jsdifflib
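The DO_FABSH/DO_FABSS definitions above work because, for IEEE half and single precision, absolute value is just clearing the sign bit, so four fp16 lanes (or two fp32 lanes) can be handled with one 64-bit AND. A short illustrative sketch of the idea, with a hand-rolled stand-in for dup_const() (names invented for illustration, not the QEMU implementation):

#include <stdint.h>

/* Replicate a 16-bit constant into all four 16-bit lanes of a 64-bit
 * word, roughly what dup_const(MO_16, x) produces.
 */
static uint64_t dup16(uint16_t x)
{
    return 0x0001000100010001ULL * x;
}

/* fabs on four packed fp16 values: clear each lane's sign bit */
static uint64_t fabs_h4(uint64_t lanes)
{
    return lanes & dup16(0x7fff);
}

/* fneg on four packed fp16 values: flip each lane's sign bit
 * (the same trick the following VNEG patch uses with XOR)
 */
static uint64_t fneg_h4(uint64_t lanes)
{
    return lanes ^ dup16(0x8000);
}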
Deleted patch
1
Implement the MVE VNEG insn (both integer and floating point forms).
2
1
3
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
4
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
5
Message-id: 20210617121628.20116-9-peter.maydell@linaro.org
6
---
7
target/arm/helper-mve.h | 6 ++++++
8
target/arm/mve.decode | 2 ++
9
target/arm/mve_helper.c | 12 ++++++++++++
10
target/arm/translate-mve.c | 15 +++++++++++++++
11
4 files changed, 35 insertions(+)
12
13
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
14
index XXXXXXX..XXXXXXX 100644
15
--- a/target/arm/helper-mve.h
16
+++ b/target/arm/helper-mve.h
17
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vabsh, TCG_CALL_NO_WG, void, env, ptr, ptr)
18
DEF_HELPER_FLAGS_3(mve_vabsw, TCG_CALL_NO_WG, void, env, ptr, ptr)
19
DEF_HELPER_FLAGS_3(mve_vfabsh, TCG_CALL_NO_WG, void, env, ptr, ptr)
20
DEF_HELPER_FLAGS_3(mve_vfabss, TCG_CALL_NO_WG, void, env, ptr, ptr)
21
+
22
+DEF_HELPER_FLAGS_3(mve_vnegb, TCG_CALL_NO_WG, void, env, ptr, ptr)
23
+DEF_HELPER_FLAGS_3(mve_vnegh, TCG_CALL_NO_WG, void, env, ptr, ptr)
24
+DEF_HELPER_FLAGS_3(mve_vnegw, TCG_CALL_NO_WG, void, env, ptr, ptr)
25
+DEF_HELPER_FLAGS_3(mve_vfnegh, TCG_CALL_NO_WG, void, env, ptr, ptr)
26
+DEF_HELPER_FLAGS_3(mve_vfnegs, TCG_CALL_NO_WG, void, env, ptr, ptr)
27
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
28
index XXXXXXX..XXXXXXX 100644
29
--- a/target/arm/mve.decode
30
+++ b/target/arm/mve.decode
31
@@ -XXX,XX +XXX,XX @@ VMVN 1111 1111 1 . 11 00 00 ... 0 0101 11 . 0 ... 0 @1op_nosz
32
33
VABS 1111 1111 1 . 11 .. 01 ... 0 0011 01 . 0 ... 0 @1op
34
VABS_fp 1111 1111 1 . 11 .. 01 ... 0 0111 01 . 0 ... 0 @1op
35
+VNEG 1111 1111 1 . 11 .. 01 ... 0 0011 11 . 0 ... 0 @1op
36
+VNEG_fp 1111 1111 1 . 11 .. 01 ... 0 0111 11 . 0 ... 0 @1op
37
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
38
index XXXXXXX..XXXXXXX 100644
39
--- a/target/arm/mve_helper.c
40
+++ b/target/arm/mve_helper.c
41
@@ -XXX,XX +XXX,XX @@ DO_1OP(vabsw, 4, int32_t, DO_ABS)
42
/* We can do these 64 bits at a time */
43
DO_1OP(vfabsh, 8, uint64_t, DO_FABSH)
44
DO_1OP(vfabss, 8, uint64_t, DO_FABSS)
45
+
46
+#define DO_NEG(N) (-(N))
47
+#define DO_FNEGH(N) ((N) ^ dup_const(MO_16, 0x8000))
48
+#define DO_FNEGS(N) ((N) ^ dup_const(MO_32, 0x80000000))
49
+
50
+DO_1OP(vnegb, 1, int8_t, DO_NEG)
51
+DO_1OP(vnegh, 2, int16_t, DO_NEG)
52
+DO_1OP(vnegw, 4, int32_t, DO_NEG)
53
+
54
+/* We can do these 64 bits at a time */
55
+DO_1OP(vfnegh, 8, uint64_t, DO_FNEGH)
56
+DO_1OP(vfnegs, 8, uint64_t, DO_FNEGS)
57
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
58
index XXXXXXX..XXXXXXX 100644
59
--- a/target/arm/translate-mve.c
60
+++ b/target/arm/translate-mve.c
61
@@ -XXX,XX +XXX,XX @@ static bool do_1op(DisasContext *s, arg_1op *a, MVEGenOneOpFn fn)
62
DO_1OP(VCLZ, vclz)
63
DO_1OP(VCLS, vcls)
64
DO_1OP(VABS, vabs)
65
+DO_1OP(VNEG, vneg)
66
67
static bool trans_VREV16(DisasContext *s, arg_1op *a)
68
{
69
@@ -XXX,XX +XXX,XX @@ static bool trans_VABS_fp(DisasContext *s, arg_1op *a)
70
}
71
return do_1op(s, a, fns[a->size]);
72
}
73
+
74
+static bool trans_VNEG_fp(DisasContext *s, arg_1op *a)
75
+{
76
+ static MVEGenOneOpFn * const fns[] = {
77
+ NULL,
78
+ gen_helper_mve_vfnegh,
79
+ gen_helper_mve_vfnegs,
80
+ NULL,
81
+ };
82
+ if (!dc_isar_feature(aa32_mve_fp, s)) {
83
+ return false;
84
+ }
85
+ return do_1op(s, a, fns[a->size]);
86
+}
87
--
88
2.20.1
89
90
diff view generated by jsdifflib
Deleted patch
1
Implement the MVE VDUP insn, which duplicates a value from
2
a general-purpose register into every lane of a vector
3
register (subject to predication).
4
1
5
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
6
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
7
Message-id: 20210617121628.20116-11-peter.maydell@linaro.org
8
---
9
target/arm/helper-mve.h | 2 ++
10
target/arm/mve.decode | 10 ++++++++++
11
target/arm/mve_helper.c | 16 ++++++++++++++++
12
target/arm/translate-mve.c | 27 +++++++++++++++++++++++++++
13
4 files changed, 55 insertions(+)
14
15
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
16
index XXXXXXX..XXXXXXX 100644
17
--- a/target/arm/helper-mve.h
18
+++ b/target/arm/helper-mve.h
19
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vstrb_h, TCG_CALL_NO_WG, void, env, ptr, i32)
20
DEF_HELPER_FLAGS_3(mve_vstrb_w, TCG_CALL_NO_WG, void, env, ptr, i32)
21
DEF_HELPER_FLAGS_3(mve_vstrh_w, TCG_CALL_NO_WG, void, env, ptr, i32)
22
23
+DEF_HELPER_FLAGS_3(mve_vdup, TCG_CALL_NO_WG, void, env, ptr, i32)
24
+
25
DEF_HELPER_FLAGS_3(mve_vclsb, TCG_CALL_NO_WG, void, env, ptr, ptr)
26
DEF_HELPER_FLAGS_3(mve_vclsh, TCG_CALL_NO_WG, void, env, ptr, ptr)
27
DEF_HELPER_FLAGS_3(mve_vclsw, TCG_CALL_NO_WG, void, env, ptr, ptr)
28
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
29
index XXXXXXX..XXXXXXX 100644
30
--- a/target/arm/mve.decode
31
+++ b/target/arm/mve.decode
32
@@ -XXX,XX +XXX,XX @@
33
34
%qd 22:1 13:3
35
%qm 5:1 1:3
36
+%qn 7:1 17:3
37
38
&vldr_vstr rn qd imm p a w size l u
39
&1op qd qm size
40
@@ -XXX,XX +XXX,XX @@ VABS 1111 1111 1 . 11 .. 01 ... 0 0011 01 . 0 ... 0 @1op
41
VABS_fp 1111 1111 1 . 11 .. 01 ... 0 0111 01 . 0 ... 0 @1op
42
VNEG 1111 1111 1 . 11 .. 01 ... 0 0011 11 . 0 ... 0 @1op
43
VNEG_fp 1111 1111 1 . 11 .. 01 ... 0 0111 11 . 0 ... 0 @1op
44
+
45
+&vdup qd rt size
46
+# Qd is in the fields usually named Qn
47
+@vdup .... .... . . .. ... . rt:4 .... . . . . .... qd=%qn &vdup
48
+
49
+# B and E bits encode size, which we decode here to the usual size values
50
+VDUP 1110 1110 1 1 10 ... 0 .... 1011 . 0 0 1 0000 @vdup size=0
51
+VDUP 1110 1110 1 0 10 ... 0 .... 1011 . 0 1 1 0000 @vdup size=1
52
+VDUP 1110 1110 1 0 10 ... 0 .... 1011 . 0 0 1 0000 @vdup size=2
53
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
54
index XXXXXXX..XXXXXXX 100644
55
--- a/target/arm/mve_helper.c
56
+++ b/target/arm/mve_helper.c
57
@@ -XXX,XX +XXX,XX @@ static void mergemask_sq(int64_t *d, int64_t r, uint16_t mask)
58
uint64_t *: mergemask_uq, \
59
int64_t *: mergemask_sq)(D, R, M)
60
61
+void HELPER(mve_vdup)(CPUARMState *env, void *vd, uint32_t val)
62
+{
63
+ /*
64
+ * The generated code already replicated an 8 or 16 bit constant
65
+ * into the 32-bit value, so we only need to write the 32-bit
66
+ * value to all elements of the Qreg, allowing for predication.
67
+ */
68
+ uint32_t *d = vd;
69
+ uint16_t mask = mve_element_mask(env);
70
+ unsigned e;
71
+ for (e = 0; e < 16 / 4; e++, mask >>= 4) {
72
+ mergemask(&d[H4(e)], val, mask);
73
+ }
74
+ mve_advance_vpt(env);
75
+}
76
+
77
#define DO_1OP(OP, ESIZE, TYPE, FN) \
78
void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm) \
79
{ \
80
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
81
index XXXXXXX..XXXXXXX 100644
82
--- a/target/arm/translate-mve.c
83
+++ b/target/arm/translate-mve.c
84
@@ -XXX,XX +XXX,XX @@ DO_VLDST_WIDE_NARROW(VLDSTB_H, vldrb_sh, vldrb_uh, vstrb_h)
85
DO_VLDST_WIDE_NARROW(VLDSTB_W, vldrb_sw, vldrb_uw, vstrb_w)
86
DO_VLDST_WIDE_NARROW(VLDSTH_W, vldrh_sw, vldrh_uw, vstrh_w)
87
88
+static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
89
+{
90
+ TCGv_ptr qd;
91
+ TCGv_i32 rt;
92
+
93
+ if (!dc_isar_feature(aa32_mve, s) ||
94
+ !mve_check_qreg_bank(s, a->qd)) {
95
+ return false;
96
+ }
97
+ if (a->rt == 13 || a->rt == 15) {
98
+ /* UNPREDICTABLE; we choose to UNDEF */
99
+ return false;
100
+ }
101
+ if (!mve_eci_check(s) || !vfp_access_check(s)) {
102
+ return true;
103
+ }
104
+
105
+ qd = mve_qreg_ptr(a->qd);
106
+ rt = load_reg(s, a->rt);
107
+ tcg_gen_dup_i32(a->size, rt, rt);
108
+ gen_helper_mve_vdup(cpu_env, qd, rt);
109
+ tcg_temp_free_ptr(qd);
110
+ tcg_temp_free_i32(rt);
111
+ mve_update_eci(s);
112
+ return true;
113
+}
114
+
115
static bool do_1op(DisasContext *s, arg_1op *a, MVEGenOneOpFn fn)
116
{
117
TCGv_ptr qd, qm;
118
--
119
2.20.1
120
121
diff view generated by jsdifflib
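The comment in the mve_vdup helper above notes that the translated code has already replicated an 8- or 16-bit source value across a full 32-bit word before the helper writes it to every element. A sketch of what that replication step amounts to, using the usual size encoding from the decode above (0 = byte, 1 = halfword, 2 = word); this is plain C for illustration, not the TCG implementation:

#include <stdint.h>

/* Replicate the low 8 or 16 bits of 'val' across a 32-bit word. */
static uint32_t dup_to_32(uint32_t val, int size)
{
    switch (size) {
    case 0: /* byte */
        return (val & 0xff) * 0x01010101u;
    case 1: /* halfword */
        return (val & 0xffff) * 0x00010001u;
    default: /* word: already 32 bits wide */
        return val;
    }
}

With the value pre-replicated, the helper only ever has to write 32-bit chunks, which is why a single mve_vdup helper covers all three element sizes.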
Deleted patch
1
Implement the MVE vector logical operations operating
2
on two registers.
3
1
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
Message-id: 20210617121628.20116-12-peter.maydell@linaro.org
7
---
8
target/arm/helper-mve.h | 6 ++++++
9
target/arm/mve.decode | 9 +++++++++
10
target/arm/mve_helper.c | 26 ++++++++++++++++++++++++++
11
target/arm/translate-mve.c | 37 +++++++++++++++++++++++++++++++++++++
12
4 files changed, 78 insertions(+)
13
14
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
15
index XXXXXXX..XXXXXXX 100644
16
--- a/target/arm/helper-mve.h
17
+++ b/target/arm/helper-mve.h
18
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vnegh, TCG_CALL_NO_WG, void, env, ptr, ptr)
19
DEF_HELPER_FLAGS_3(mve_vnegw, TCG_CALL_NO_WG, void, env, ptr, ptr)
20
DEF_HELPER_FLAGS_3(mve_vfnegh, TCG_CALL_NO_WG, void, env, ptr, ptr)
21
DEF_HELPER_FLAGS_3(mve_vfnegs, TCG_CALL_NO_WG, void, env, ptr, ptr)
22
+
23
+DEF_HELPER_FLAGS_4(mve_vand, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
24
+DEF_HELPER_FLAGS_4(mve_vbic, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
25
+DEF_HELPER_FLAGS_4(mve_vorr, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
26
+DEF_HELPER_FLAGS_4(mve_vorn, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
27
+DEF_HELPER_FLAGS_4(mve_veor, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
28
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
29
index XXXXXXX..XXXXXXX 100644
30
--- a/target/arm/mve.decode
31
+++ b/target/arm/mve.decode
32
@@ -XXX,XX +XXX,XX @@
33
34
&vldr_vstr rn qd imm p a w size l u
35
&1op qd qm size
36
+&2op qd qm qn size
37
38
@vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
39
# Note that both Rn and Qd are 3 bits only (no D bit)
40
@@ -XXX,XX +XXX,XX @@
41
42
@1op .... .... .... size:2 .. .... .... .... .... &1op qd=%qd qm=%qm
43
@1op_nosz .... .... .... .... .... .... .... .... &1op qd=%qd qm=%qm size=0
44
+@2op_nosz .... .... .... .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn size=0
45
46
# Vector loads and stores
47
48
@@ -XXX,XX +XXX,XX @@ VLDR_VSTR 1110110 1 a:1 . w:1 . .... ... 111101 ....... @vldr_vstr \
49
VLDR_VSTR 1110110 1 a:1 . w:1 . .... ... 111110 ....... @vldr_vstr \
50
size=2 p=1
51
52
+# Vector 2-op
53
+VAND 1110 1111 0 . 00 ... 0 ... 0 0001 . 1 . 1 ... 0 @2op_nosz
54
+VBIC 1110 1111 0 . 01 ... 0 ... 0 0001 . 1 . 1 ... 0 @2op_nosz
55
+VORR 1110 1111 0 . 10 ... 0 ... 0 0001 . 1 . 1 ... 0 @2op_nosz
56
+VORN 1110 1111 0 . 11 ... 0 ... 0 0001 . 1 . 1 ... 0 @2op_nosz
57
+VEOR 1111 1111 0 . 00 ... 0 ... 0 0001 . 1 . 1 ... 0 @2op_nosz
58
+
59
# Vector miscellaneous
60
61
VCLS 1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
62
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
63
index XXXXXXX..XXXXXXX 100644
64
--- a/target/arm/mve_helper.c
65
+++ b/target/arm/mve_helper.c
66
@@ -XXX,XX +XXX,XX @@ DO_1OP(vnegw, 4, int32_t, DO_NEG)
67
/* We can do these 64 bits at a time */
68
DO_1OP(vfnegh, 8, uint64_t, DO_FNEGH)
69
DO_1OP(vfnegs, 8, uint64_t, DO_FNEGS)
70
+
71
+#define DO_2OP(OP, ESIZE, TYPE, FN) \
72
+ void HELPER(glue(mve_, OP))(CPUARMState *env, \
73
+ void *vd, void *vn, void *vm) \
74
+ { \
75
+ TYPE *d = vd, *n = vn, *m = vm; \
76
+ uint16_t mask = mve_element_mask(env); \
77
+ unsigned e; \
78
+ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
79
+ mergemask(&d[H##ESIZE(e)], \
80
+ FN(n[H##ESIZE(e)], m[H##ESIZE(e)]), mask); \
81
+ } \
82
+ mve_advance_vpt(env); \
83
+ }
84
+
85
+#define DO_AND(N, M) ((N) & (M))
86
+#define DO_BIC(N, M) ((N) & ~(M))
87
+#define DO_ORR(N, M) ((N) | (M))
88
+#define DO_ORN(N, M) ((N) | ~(M))
89
+#define DO_EOR(N, M) ((N) ^ (M))
90
+
91
+DO_2OP(vand, 8, uint64_t, DO_AND)
92
+DO_2OP(vbic, 8, uint64_t, DO_BIC)
93
+DO_2OP(vorr, 8, uint64_t, DO_ORR)
94
+DO_2OP(vorn, 8, uint64_t, DO_ORN)
95
+DO_2OP(veor, 8, uint64_t, DO_EOR)
96
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
97
index XXXXXXX..XXXXXXX 100644
98
--- a/target/arm/translate-mve.c
99
+++ b/target/arm/translate-mve.c
100
@@ -XXX,XX +XXX,XX @@
101
102
typedef void MVEGenLdStFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
103
typedef void MVEGenOneOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
104
+typedef void MVEGenTwoOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr);
105
106
/* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
107
static inline long mve_qreg_offset(unsigned reg)
108
@@ -XXX,XX +XXX,XX @@ static bool trans_VNEG_fp(DisasContext *s, arg_1op *a)
109
}
110
return do_1op(s, a, fns[a->size]);
111
}
112
+
113
+static bool do_2op(DisasContext *s, arg_2op *a, MVEGenTwoOpFn fn)
114
+{
115
+ TCGv_ptr qd, qn, qm;
116
+
117
+ if (!dc_isar_feature(aa32_mve, s) ||
118
+ !mve_check_qreg_bank(s, a->qd | a->qn | a->qm) ||
119
+ !fn) {
120
+ return false;
121
+ }
122
+ if (!mve_eci_check(s) || !vfp_access_check(s)) {
123
+ return true;
124
+ }
125
+
126
+ qd = mve_qreg_ptr(a->qd);
127
+ qn = mve_qreg_ptr(a->qn);
128
+ qm = mve_qreg_ptr(a->qm);
129
+ fn(cpu_env, qd, qn, qm);
130
+ tcg_temp_free_ptr(qd);
131
+ tcg_temp_free_ptr(qn);
132
+ tcg_temp_free_ptr(qm);
133
+ mve_update_eci(s);
134
+ return true;
135
+}
136
+
137
+#define DO_LOGIC(INSN, HELPER) \
138
+ static bool trans_##INSN(DisasContext *s, arg_2op *a) \
139
+ { \
140
+ return do_2op(s, a, HELPER); \
141
+ }
142
+
143
+DO_LOGIC(VAND, gen_helper_mve_vand)
144
+DO_LOGIC(VBIC, gen_helper_mve_vbic)
145
+DO_LOGIC(VORR, gen_helper_mve_vorr)
146
+DO_LOGIC(VORN, gen_helper_mve_vorn)
147
+DO_LOGIC(VEOR, gen_helper_mve_veor)
148
--
149
2.20.1
150
151
diff view generated by jsdifflib
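Pulling the last few patches together: a 2-operand MVE helper computes a full vector result but commits each byte of it only where the corresponding bit of the 16-bit predicate mask is set, since predication is byte-granular. A minimal sketch of that merge step for a byte-wise AND (illustrative only; the real helpers express this through the DO_2OP and mergemask() machinery shown earlier):

#include <stdint.h>

/* d, n, m are 16-byte MVE vectors; 'mask' has one bit per byte lane. */
static void vand_bytes_predicated(uint8_t d[16], const uint8_t n[16],
                                  const uint8_t m[16], uint16_t mask)
{
    for (int e = 0; e < 16; e++) {
        if (mask & (1u << e)) {
            d[e] = n[e] & m[e];  /* predicated lanes take the new result */
        }                        /* other lanes keep their previous value */
    }
}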
Implement the MVE VADD, VSUB and VMUL insns.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-13-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    | 12 ++++++++++++
 target/arm/mve.decode      |  5 +++++
 target/arm/mve_helper.c    | 14 ++++++++++++++
 target/arm/translate-mve.c | 16 ++++++++++++++++
 4 files changed, 47 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vbic, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vorr, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vorn, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_veor, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vaddb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vaddh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vaddw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vsubb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vsubh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vsubw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vmulb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vmulh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vmulw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@

 @1op .... .... .... size:2 .. .... .... .... .... &1op qd=%qd qm=%qm
 @1op_nosz .... .... .... .... .... .... .... .... &1op qd=%qd qm=%qm size=0
+@2op .... .... .. size:2 .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn
 @2op_nosz .... .... .... .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn size=0

 # Vector loads and stores
@@ -XXX,XX +XXX,XX @@ VORR 1110 1111 0 . 10 ... 0 ... 0 0001 . 1 . 1 ... 0 @2op_nosz
 VORN 1110 1111 0 . 11 ... 0 ... 0 0001 . 1 . 1 ... 0 @2op_nosz
 VEOR 1111 1111 0 . 00 ... 0 ... 0 0001 . 1 . 1 ... 0 @2op_nosz

+VADD 1110 1111 0 . .. ... 0 ... 0 1000 . 1 . 0 ... 0 @2op
+VSUB 1111 1111 0 . .. ... 0 ... 0 1000 . 1 . 0 ... 0 @2op
+VMUL 1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
+
 # Vector miscellaneous

 VCLS 1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_1OP(vfnegs, 8, uint64_t, DO_FNEGS)
         mve_advance_vpt(env);                           \
     }

+/* provide unsigned 2-op helpers for all sizes */
+#define DO_2OP_U(OP, FN)                        \
+    DO_2OP(OP##b, 1, uint8_t, FN)               \
+    DO_2OP(OP##h, 2, uint16_t, FN)              \
+    DO_2OP(OP##w, 4, uint32_t, FN)
+
 #define DO_AND(N, M)  ((N) & (M))
 #define DO_BIC(N, M)  ((N) & ~(M))
 #define DO_ORR(N, M)  ((N) | (M))
@@ -XXX,XX +XXX,XX @@ DO_2OP(vbic, 8, uint64_t, DO_BIC)
 DO_2OP(vorr, 8, uint64_t, DO_ORR)
 DO_2OP(vorn, 8, uint64_t, DO_ORN)
 DO_2OP(veor, 8, uint64_t, DO_EOR)
+
+#define DO_ADD(N, M) ((N) + (M))
+#define DO_SUB(N, M) ((N) - (M))
+#define DO_MUL(N, M) ((N) * (M))
+
+DO_2OP_U(vadd, DO_ADD)
+DO_2OP_U(vsub, DO_SUB)
+DO_2OP_U(vmul, DO_MUL)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_LOGIC(VBIC, gen_helper_mve_vbic)
 DO_LOGIC(VORR, gen_helper_mve_vorr)
 DO_LOGIC(VORN, gen_helper_mve_vorn)
 DO_LOGIC(VEOR, gen_helper_mve_veor)
+
+#define DO_2OP(INSN, FN) \
+    static bool trans_##INSN(DisasContext *s, arg_2op *a)       \
+    {                                                           \
+        static MVEGenTwoOpFn * const fns[] = {                  \
+            gen_helper_mve_##FN##b,                             \
+            gen_helper_mve_##FN##h,                             \
+            gen_helper_mve_##FN##w,                             \
+            NULL,                                               \
+        };                                                      \
+        return do_2op(s, a, fns[a->size]);                      \
+    }
+
+DO_2OP(VADD, vadd)
+DO_2OP(VSUB, vsub)
+DO_2OP(VMUL, vmul)

Add the GPIO, watchdog, dual-timer and I2C devices to the mps3-an536
board. These are all simple devices that just need to be created and
wired up.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20240206132931.38376-12-peter.maydell@linaro.org
---
 hw/arm/mps3r.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

diff --git a/hw/arm/mps3r.c b/hw/arm/mps3r.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/mps3r.c
+++ b/hw/arm/mps3r.c
@@ -XXX,XX +XXX,XX @@
 #include "sysemu/sysemu.h"
 #include "hw/boards.h"
 #include "hw/or-irq.h"
+#include "hw/qdev-clock.h"
 #include "hw/qdev-properties.h"
 #include "hw/arm/boot.h"
 #include "hw/arm/bsa.h"
 #include "hw/char/cmsdk-apb-uart.h"
+#include "hw/i2c/arm_sbcon_i2c.h"
 #include "hw/intc/arm_gicv3.h"
+#include "hw/misc/unimp.h"
+#include "hw/timer/cmsdk-apb-dualtimer.h"
+#include "hw/watchdog/cmsdk-apb-watchdog.h"

 /* Define the layout of RAM and ROM in a board */
 typedef struct RAMInfo {
@@ -XXX,XX +XXX,XX @@ struct MPS3RMachineState {
     CMSDKAPBUART uart[MPS3R_CPU_MAX + MPS3R_UART_MAX];
     OrIRQState cpu_uart_oflow[MPS3R_CPU_MAX];
     OrIRQState uart_oflow;
+    CMSDKAPBWatchdog watchdog;
+    CMSDKAPBDualTimer dualtimer;
+    ArmSbconI2CState i2c[5];
+    Clock *clk;
 };

 #define TYPE_MPS3R_MACHINE "mps3r"
@@ -XXX,XX +XXX,XX @@ static void mps3r_common_init(MachineState *machine)
     MemoryRegion *sysmem = get_system_memory();
     DeviceState *gicdev;

+    mms->clk = clock_new(OBJECT(machine), "CLK");
+    clock_set_hz(mms->clk, CLK_FRQ);
+
     for (const RAMInfo *ri = mmc->raminfo; ri->name; ri++) {
         MemoryRegion *mr = mr_for_raminfo(mms, ri);
         memory_region_add_subregion(sysmem, ri->base, mr);
@@ -XXX,XX +XXX,XX @@ static void mps3r_common_init(MachineState *machine)
                     qdev_get_gpio_in(gicdev, combirq));
     }

+    for (int i = 0; i < 4; i++) {
+        /* CMSDK GPIO controllers */
+        g_autofree char *s = g_strdup_printf("gpio%d", i);
+        create_unimplemented_device(s, 0xe0000000 + i * 0x1000, 0x1000);
+    }
+
+    object_initialize_child(OBJECT(mms), "watchdog", &mms->watchdog,
+                            TYPE_CMSDK_APB_WATCHDOG);
+    qdev_connect_clock_in(DEVICE(&mms->watchdog), "WDOGCLK", mms->clk);
+    sysbus_realize(SYS_BUS_DEVICE(&mms->watchdog), &error_fatal);
+    sysbus_connect_irq(SYS_BUS_DEVICE(&mms->watchdog), 0,
+                       qdev_get_gpio_in(gicdev, 0));
+    sysbus_mmio_map(SYS_BUS_DEVICE(&mms->watchdog), 0, 0xe0100000);
+
+    object_initialize_child(OBJECT(mms), "dualtimer", &mms->dualtimer,
+                            TYPE_CMSDK_APB_DUALTIMER);
+    qdev_connect_clock_in(DEVICE(&mms->dualtimer), "TIMCLK", mms->clk);
+    sysbus_realize(SYS_BUS_DEVICE(&mms->dualtimer), &error_fatal);
+    sysbus_connect_irq(SYS_BUS_DEVICE(&mms->dualtimer), 0,
+                       qdev_get_gpio_in(gicdev, 3));
+    sysbus_connect_irq(SYS_BUS_DEVICE(&mms->dualtimer), 1,
+                       qdev_get_gpio_in(gicdev, 1));
+    sysbus_connect_irq(SYS_BUS_DEVICE(&mms->dualtimer), 2,
+                       qdev_get_gpio_in(gicdev, 2));
+    sysbus_mmio_map(SYS_BUS_DEVICE(&mms->dualtimer), 0, 0xe0101000);
+
+    for (int i = 0; i < ARRAY_SIZE(mms->i2c); i++) {
+        static const hwaddr i2cbase[] = {0xe0102000, /* Touch */
+                                         0xe0103000, /* Audio */
+                                         0xe0107000, /* Shield0 */
+                                         0xe0108000, /* Shield1 */
+                                         0xe0109000}; /* DDR4 EEPROM */
+        g_autofree char *s = g_strdup_printf("i2c%d", i);
+
+        object_initialize_child(OBJECT(mms), s, &mms->i2c[i],
93
+ TYPE_ARM_SBCON_I2C);
94
+ sysbus_realize(SYS_BUS_DEVICE(&mms->i2c[i]), &error_fatal);
95
+ sysbus_mmio_map(SYS_BUS_DEVICE(&mms->i2c[i]), 0, i2cbase[i]);
96
+ if (i != 2 && i != 3) {
97
+ /*
98
+ * internal-only bus: mark it full to avoid user-created
99
+ * i2c devices being plugged into it.
100
+ */
101
+ qbus_mark_full(qdev_get_child_bus(DEVICE(&mms->i2c[i]), "i2c"));
102
+ }
103
+ }
104
+
105
mms->bootinfo.ram_size = machine->ram_size;
106
mms->bootinfo.board_id = -1;
107
mms->bootinfo.loader_start = mmc->loader_start;
109
--
108
--
110
2.20.1
109
2.34.1
111
110
112
111
Deleted patch
Implement the MVE VMULH insn, which performs a vector
multiply and returns the high half of the result.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-14-peter.maydell@linaro.org
---
target/arm/helper-mve.h | 7 +++++++
target/arm/mve.decode | 3 +++
target/arm/mve_helper.c | 26 ++++++++++++++++++++++++++
target/arm/translate-mve.c | 2 ++
4 files changed, 38 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vsubw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
DEF_HELPER_FLAGS_4(mve_vmulb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
DEF_HELPER_FLAGS_4(mve_vmulh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
DEF_HELPER_FLAGS_4(mve_vmulw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vmulhsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vmulhsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vmulhsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vmulhub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vmulhuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vmulhuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VADD 1110 1111 0 . .. ... 0 ... 0 1000 . 1 . 0 ... 0 @2op
VSUB 1111 1111 0 . .. ... 0 ... 0 1000 . 1 . 0 ... 0 @2op
VMUL 1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op

+VMULH_S 111 0 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
+VMULH_U 111 1 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
+
# Vector miscellaneous

VCLS 1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2OP(veor, 8, uint64_t, DO_EOR)
DO_2OP_U(vadd, DO_ADD)
DO_2OP_U(vsub, DO_SUB)
DO_2OP_U(vmul, DO_MUL)
+
+/*
+ * Because the computation type is at least twice as large as required,
+ * these work for both signed and unsigned source types.
+ */
+static inline uint8_t do_mulh_b(int32_t n, int32_t m)
+{
+    return (n * m) >> 8;
+}
+
+static inline uint16_t do_mulh_h(int32_t n, int32_t m)
+{
+    return (n * m) >> 16;
+}
+
+static inline uint32_t do_mulh_w(int64_t n, int64_t m)
+{
+    return (n * m) >> 32;
+}
+
+DO_2OP(vmulhsb, 1, int8_t, do_mulh_b)
+DO_2OP(vmulhsh, 2, int16_t, do_mulh_h)
+DO_2OP(vmulhsw, 4, int32_t, do_mulh_w)
+DO_2OP(vmulhub, 1, uint8_t, do_mulh_b)
+DO_2OP(vmulhuh, 2, uint16_t, do_mulh_h)
+DO_2OP(vmulhuw, 4, uint32_t, do_mulh_w)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_LOGIC(VEOR, gen_helper_mve_veor)
DO_2OP(VADD, vadd)
DO_2OP(VSUB, vsub)
DO_2OP(VMUL, vmul)
+DO_2OP(VMULH_S, vmulhs)
+DO_2OP(VMULH_U, vmulhu)
--
2.20.1
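For reference, each lane of VMULH does a widening multiply and keeps only the high half of the product. A minimal scalar sketch of the unsigned 8-bit case (illustrative only; the function name is invented, the real per-lane helpers are the do_mulh_* functions in the hunk above):

    /* One unsigned 8-bit lane of VMULH: 8x8->16 multiply, keep bits [15:8] */
    static inline uint8_t vmulh_u8_lane(uint8_t n, uint8_t m)
    {
        return (uint8_t)(((uint32_t)n * m) >> 8);
    }

The signed variants differ only in the element type; because the computation type is wider than the inputs, the same shift is correct for both.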
Deleted patch
Implement the MVE VRMULH insn, which performs a rounding multiply
and then returns the high half.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-15-peter.maydell@linaro.org
---
target/arm/helper-mve.h | 7 +++++++
target/arm/mve.decode | 3 +++
target/arm/mve_helper.c | 22 ++++++++++++++++++++++
target/arm/translate-mve.c | 2 ++
4 files changed, 34 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vmulhsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
DEF_HELPER_FLAGS_4(mve_vmulhub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
DEF_HELPER_FLAGS_4(mve_vmulhuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
DEF_HELPER_FLAGS_4(mve_vmulhuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vrmulhsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vrmulhsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vrmulhsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vrmulhub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vrmulhuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vrmulhuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VMUL 1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
VMULH_S 111 0 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
VMULH_U 111 1 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op

+VRMULH_S 111 0 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
+VRMULH_U 111 1 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
+
# Vector miscellaneous

VCLS 1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ static inline uint32_t do_mulh_w(int64_t n, int64_t m)
return (n * m) >> 32;
}

+static inline uint8_t do_rmulh_b(int32_t n, int32_t m)
+{
+    return (n * m + (1U << 7)) >> 8;
+}
+
+static inline uint16_t do_rmulh_h(int32_t n, int32_t m)
+{
+    return (n * m + (1U << 15)) >> 16;
+}
+
+static inline uint32_t do_rmulh_w(int64_t n, int64_t m)
+{
+    return (n * m + (1U << 31)) >> 32;
+}
+
DO_2OP(vmulhsb, 1, int8_t, do_mulh_b)
DO_2OP(vmulhsh, 2, int16_t, do_mulh_h)
DO_2OP(vmulhsw, 4, int32_t, do_mulh_w)
DO_2OP(vmulhub, 1, uint8_t, do_mulh_b)
DO_2OP(vmulhuh, 2, uint16_t, do_mulh_h)
DO_2OP(vmulhuw, 4, uint32_t, do_mulh_w)
+
+DO_2OP(vrmulhsb, 1, int8_t, do_rmulh_b)
+DO_2OP(vrmulhsh, 2, int16_t, do_rmulh_h)
+DO_2OP(vrmulhsw, 4, int32_t, do_rmulh_w)
+DO_2OP(vrmulhub, 1, uint8_t, do_rmulh_b)
+DO_2OP(vrmulhuh, 2, uint16_t, do_rmulh_h)
+DO_2OP(vrmulhuw, 4, uint32_t, do_rmulh_w)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_2OP(VSUB, vsub)
DO_2OP(VMUL, vmul)
DO_2OP(VMULH_S, vmulhs)
DO_2OP(VMULH_U, vmulhu)
+DO_2OP(VRMULH_S, vrmulhs)
+DO_2OP(VRMULH_U, vrmulhu)
--
2.20.1
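The rounding form differs from VMULH only in adding half of the discarded low part before the shift. A scalar sketch of one unsigned 8-bit lane, mirroring do_rmulh_b above (name invented):

    static inline uint8_t vrmulh_u8_lane(uint8_t n, uint8_t m)
    {
        /* add 1 << 7 so truncation of the low byte rounds to nearest */
        return (uint8_t)(((uint32_t)n * m + (1u << 7)) >> 8);
    }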
Deleted patch
Implement the MVE VMAX and VMIN insns.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-16-peter.maydell@linaro.org
---
target/arm/helper-mve.h | 14 ++++++++++++++
target/arm/mve.decode | 5 +++++
target/arm/mve_helper.c | 14 ++++++++++++++
target/arm/translate-mve.c | 4 ++++
4 files changed, 37 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vrmulhsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
DEF_HELPER_FLAGS_4(mve_vrmulhub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
DEF_HELPER_FLAGS_4(mve_vrmulhuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
DEF_HELPER_FLAGS_4(mve_vrmulhuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vmaxsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vmaxsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vmaxsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vmaxub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vmaxuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vmaxuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vminsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vminsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vminsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vminub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vminuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vminuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VMULH_U 111 1 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
VRMULH_S 111 0 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
VRMULH_U 111 1 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op

+VMAX_S 111 0 1111 0 . .. ... 0 ... 0 0110 . 1 . 0 ... 0 @2op
+VMAX_U 111 1 1111 0 . .. ... 0 ... 0 0110 . 1 . 0 ... 0 @2op
+VMIN_S 111 0 1111 0 . .. ... 0 ... 0 0110 . 1 . 1 ... 0 @2op
+VMIN_U 111 1 1111 0 . .. ... 0 ... 0 0110 . 1 . 1 ... 0 @2op
+
# Vector miscellaneous

VCLS 1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_1OP(vfnegs, 8, uint64_t, DO_FNEGS)
DO_2OP(OP##h, 2, uint16_t, FN) \
DO_2OP(OP##w, 4, uint32_t, FN)

+/* provide signed 2-op helpers for all sizes */
+#define DO_2OP_S(OP, FN) \
+    DO_2OP(OP##b, 1, int8_t, FN) \
+    DO_2OP(OP##h, 2, int16_t, FN) \
+    DO_2OP(OP##w, 4, int32_t, FN)
+
#define DO_AND(N, M) ((N) & (M))
#define DO_BIC(N, M) ((N) & ~(M))
#define DO_ORR(N, M) ((N) | (M))
@@ -XXX,XX +XXX,XX @@ DO_2OP(vrmulhsw, 4, int32_t, do_rmulh_w)
DO_2OP(vrmulhub, 1, uint8_t, do_rmulh_b)
DO_2OP(vrmulhuh, 2, uint16_t, do_rmulh_h)
DO_2OP(vrmulhuw, 4, uint32_t, do_rmulh_w)
+
+#define DO_MAX(N, M) ((N) >= (M) ? (N) : (M))
+#define DO_MIN(N, M) ((N) >= (M) ? (M) : (N))
+
+DO_2OP_S(vmaxs, DO_MAX)
+DO_2OP_U(vmaxu, DO_MAX)
+DO_2OP_S(vmins, DO_MIN)
+DO_2OP_U(vminu, DO_MIN)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_2OP(VMULH_S, vmulhs)
DO_2OP(VMULH_U, vmulhu)
DO_2OP(VRMULH_S, vrmulhs)
DO_2OP(VRMULH_U, vrmulhu)
+DO_2OP(VMAX_S, vmaxs)
+DO_2OP(VMAX_U, vmaxu)
+DO_2OP(VMIN_S, vmins)
+DO_2OP(VMIN_U, vminu)
--
2.20.1
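Signedness for VMAX/VMIN comes entirely from the element type that DO_2OP_S or DO_2OP_U instantiates; per lane the operation is just a compare-and-select. A one-lane sketch for signed bytes (illustrative, name invented):

    static inline int8_t vmax_s8_lane(int8_t n, int8_t m)
    {
        return n >= m ? n : m;   /* the MIN variant returns the other operand */
    }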
Deleted patch
Implement the MVE VABD insn.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-17-peter.maydell@linaro.org
---
target/arm/helper-mve.h | 7 +++++++
target/arm/mve.decode | 3 +++
target/arm/mve_helper.c | 5 +++++
target/arm/translate-mve.c | 2 ++
4 files changed, 17 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vminsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
DEF_HELPER_FLAGS_4(mve_vminub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
DEF_HELPER_FLAGS_4(mve_vminuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
DEF_HELPER_FLAGS_4(mve_vminuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vabdsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vabdsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vabdsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vabdub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vabduh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vabduw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VMAX_U 111 1 1111 0 . .. ... 0 ... 0 0110 . 1 . 0 ... 0 @2op
VMIN_S 111 0 1111 0 . .. ... 0 ... 0 0110 . 1 . 1 ... 0 @2op
VMIN_U 111 1 1111 0 . .. ... 0 ... 0 0110 . 1 . 1 ... 0 @2op

+VABD_S 111 0 1111 0 . .. ... 0 ... 0 0111 . 1 . 0 ... 0 @2op
+VABD_U 111 1 1111 0 . .. ... 0 ... 0 0111 . 1 . 0 ... 0 @2op
+
# Vector miscellaneous

VCLS 1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2OP_S(vmaxs, DO_MAX)
DO_2OP_U(vmaxu, DO_MAX)
DO_2OP_S(vmins, DO_MIN)
DO_2OP_U(vminu, DO_MIN)
+
+#define DO_ABD(N, M) ((N) >= (M) ? (N) - (M) : (M) - (N))
+
+DO_2OP_S(vabds, DO_ABD)
+DO_2OP_U(vabdu, DO_ABD)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_2OP(VMAX_S, vmaxs)
DO_2OP(VMAX_U, vmaxu)
DO_2OP(VMIN_S, vmins)
DO_2OP(VMIN_U, vminu)
+DO_2OP(VABD_S, vabds)
+DO_2OP(VABD_U, vabdu)
--
2.20.1
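Per lane, VABD is an absolute difference. A scalar sketch for unsigned bytes, matching the DO_ABD macro above (name invented; note that C's integer promotions mean the subtraction for 8- and 16-bit lanes happens in int and cannot wrap):

    static inline uint8_t vabd_u8_lane(uint8_t n, uint8_t m)
    {
        return n >= m ? n - m : m - n;
    }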
Deleted patch
1
Implement the MVE VMULL insn, which multiplies two single
2
width integer elements to produce a double width result.
3
1
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
Message-id: 20210617121628.20116-19-peter.maydell@linaro.org
7
---
8
target/arm/helper-mve.h | 14 ++++++++++++++
9
target/arm/mve.decode | 5 +++++
10
target/arm/mve_helper.c | 34 ++++++++++++++++++++++++++++++++++
11
target/arm/translate-mve.c | 4 ++++
12
4 files changed, 57 insertions(+)
13
14
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
15
index XXXXXXX..XXXXXXX 100644
16
--- a/target/arm/helper-mve.h
17
+++ b/target/arm/helper-mve.h
18
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vhsubsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
19
DEF_HELPER_FLAGS_4(mve_vhsubub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
20
DEF_HELPER_FLAGS_4(mve_vhsubuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
21
DEF_HELPER_FLAGS_4(mve_vhsubuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
22
+
23
+DEF_HELPER_FLAGS_4(mve_vmullbsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
24
+DEF_HELPER_FLAGS_4(mve_vmullbsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
25
+DEF_HELPER_FLAGS_4(mve_vmullbsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
26
+DEF_HELPER_FLAGS_4(mve_vmullbub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
27
+DEF_HELPER_FLAGS_4(mve_vmullbuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
28
+DEF_HELPER_FLAGS_4(mve_vmullbuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
29
+
30
+DEF_HELPER_FLAGS_4(mve_vmulltsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
31
+DEF_HELPER_FLAGS_4(mve_vmulltsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
32
+DEF_HELPER_FLAGS_4(mve_vmulltsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
33
+DEF_HELPER_FLAGS_4(mve_vmulltub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
34
+DEF_HELPER_FLAGS_4(mve_vmulltuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
35
+DEF_HELPER_FLAGS_4(mve_vmulltuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
36
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
37
index XXXXXXX..XXXXXXX 100644
38
--- a/target/arm/mve.decode
39
+++ b/target/arm/mve.decode
40
@@ -XXX,XX +XXX,XX @@ VHADD_U 111 1 1111 0 . .. ... 0 ... 0 0000 . 1 . 0 ... 0 @2op
41
VHSUB_S 111 0 1111 0 . .. ... 0 ... 0 0010 . 1 . 0 ... 0 @2op
42
VHSUB_U 111 1 1111 0 . .. ... 0 ... 0 0010 . 1 . 0 ... 0 @2op
43
44
+VMULL_BS 111 0 1110 0 . .. ... 1 ... 0 1110 . 0 . 0 ... 0 @2op
45
+VMULL_BU 111 1 1110 0 . .. ... 1 ... 0 1110 . 0 . 0 ... 0 @2op
46
+VMULL_TS 111 0 1110 0 . .. ... 1 ... 1 1110 . 0 . 0 ... 0 @2op
47
+VMULL_TU 111 1 1110 0 . .. ... 1 ... 1 1110 . 0 . 0 ... 0 @2op
48
+
49
# Vector miscellaneous
50
51
VCLS 1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
52
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
53
index XXXXXXX..XXXXXXX 100644
54
--- a/target/arm/mve_helper.c
55
+++ b/target/arm/mve_helper.c
56
@@ -XXX,XX +XXX,XX @@ DO_1OP(vfnegs, 8, uint64_t, DO_FNEGS)
57
DO_2OP(OP##h, 2, int16_t, FN) \
58
DO_2OP(OP##w, 4, int32_t, FN)
59
60
+/*
61
+ * "Long" operations where two half-sized inputs (taken from either the
62
+ * top or the bottom of the input vector) produce a double-width result.
63
+ * Here ESIZE, TYPE are for the input, and LESIZE, LTYPE for the output.
64
+ */
65
+#define DO_2OP_L(OP, TOP, ESIZE, TYPE, LESIZE, LTYPE, FN) \
66
+ void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn, void *vm) \
67
+ { \
68
+ LTYPE *d = vd; \
69
+ TYPE *n = vn, *m = vm; \
70
+ uint16_t mask = mve_element_mask(env); \
71
+ unsigned le; \
72
+ for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) { \
73
+ LTYPE r = FN((LTYPE)n[H##ESIZE(le * 2 + TOP)], \
74
+ m[H##ESIZE(le * 2 + TOP)]); \
75
+ mergemask(&d[H##LESIZE(le)], r, mask); \
76
+ } \
77
+ mve_advance_vpt(env); \
78
+ }
79
+
80
#define DO_AND(N, M) ((N) & (M))
81
#define DO_BIC(N, M) ((N) & ~(M))
82
#define DO_ORR(N, M) ((N) | (M))
83
@@ -XXX,XX +XXX,XX @@ DO_2OP_U(vadd, DO_ADD)
84
DO_2OP_U(vsub, DO_SUB)
85
DO_2OP_U(vmul, DO_MUL)
86
87
+DO_2OP_L(vmullbsb, 0, 1, int8_t, 2, int16_t, DO_MUL)
88
+DO_2OP_L(vmullbsh, 0, 2, int16_t, 4, int32_t, DO_MUL)
89
+DO_2OP_L(vmullbsw, 0, 4, int32_t, 8, int64_t, DO_MUL)
90
+DO_2OP_L(vmullbub, 0, 1, uint8_t, 2, uint16_t, DO_MUL)
91
+DO_2OP_L(vmullbuh, 0, 2, uint16_t, 4, uint32_t, DO_MUL)
92
+DO_2OP_L(vmullbuw, 0, 4, uint32_t, 8, uint64_t, DO_MUL)
93
+
94
+DO_2OP_L(vmulltsb, 1, 1, int8_t, 2, int16_t, DO_MUL)
95
+DO_2OP_L(vmulltsh, 1, 2, int16_t, 4, int32_t, DO_MUL)
96
+DO_2OP_L(vmulltsw, 1, 4, int32_t, 8, int64_t, DO_MUL)
97
+DO_2OP_L(vmulltub, 1, 1, uint8_t, 2, uint16_t, DO_MUL)
98
+DO_2OP_L(vmulltuh, 1, 2, uint16_t, 4, uint32_t, DO_MUL)
99
+DO_2OP_L(vmulltuw, 1, 4, uint32_t, 8, uint64_t, DO_MUL)
100
+
101
/*
102
* Because the computation type is at least twice as large as required,
103
* these work for both signed and unsigned source types.
104
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
105
index XXXXXXX..XXXXXXX 100644
106
--- a/target/arm/translate-mve.c
107
+++ b/target/arm/translate-mve.c
108
@@ -XXX,XX +XXX,XX @@ DO_2OP(VHADD_S, vhadds)
109
DO_2OP(VHADD_U, vhaddu)
110
DO_2OP(VHSUB_S, vhsubs)
111
DO_2OP(VHSUB_U, vhsubu)
112
+DO_2OP(VMULL_BS, vmullbs)
113
+DO_2OP(VMULL_BU, vmullbu)
114
+DO_2OP(VMULL_TS, vmullts)
115
+DO_2OP(VMULL_TU, vmulltu)
116
--
117
2.20.1
118
119
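The VMULL patch above produces double-width output lanes by multiplying single-width inputs taken from either the bottom or the top of each input element pair. A scalar sketch of one signed bottom-half lane (illustrative only, name invented):

    static inline int16_t vmull_bottom_s8_lane(int8_t n, int8_t m)
    {
        return (int16_t)(n * m);   /* integer promotion widens before the multiply */
    }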
Deleted patch
Implement the MVE insn VMLSLDAV, which multiplies source elements,
alternately adding and subtracting them, and accumulates into a
64-bit result in a pair of general purpose registers.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-21-peter.maydell@linaro.org
---
target/arm/helper-mve.h | 5 +++++
target/arm/mve.decode | 2 ++
target/arm/mve_helper.c | 5 +++++
target/arm/translate-mve.c | 11 +++++++++++
4 files changed, 23 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vmlaldavxsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)

DEF_HELPER_FLAGS_4(mve_vmlaldavuh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
DEF_HELPER_FLAGS_4(mve_vmlaldavuw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
+
+DEF_HELPER_FLAGS_4(mve_vmlsldavsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
+DEF_HELPER_FLAGS_4(mve_vmlsldavsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
+DEF_HELPER_FLAGS_4(mve_vmlsldavxsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
+DEF_HELPER_FLAGS_4(mve_vmlsldavxsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VDUP 1110 1110 1 0 10 ... 0 .... 1011 . 0 0 1 0000 @vdup size=2
qn=%qn rdahi=%rdahi rdalo=%rdalo size=%size_16 &vmlaldav
VMLALDAV_S 1110 1110 1 ... ... . ... x:1 1110 . 0 a:1 0 ... 0 @vmlaldav
VMLALDAV_U 1111 1110 1 ... ... . ... x:1 1110 . 0 a:1 0 ... 0 @vmlaldav
+
+VMLSLDAV 1110 1110 1 ... ... . ... x:1 1110 . 0 a:1 0 ... 1 @vmlaldav
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_LDAV(vmlaldavxsw, 4, int32_t, true, +=, +=)

DO_LDAV(vmlaldavuh, 2, uint16_t, false, +=, +=)
DO_LDAV(vmlaldavuw, 4, uint32_t, false, +=, +=)
+
+DO_LDAV(vmlsldavsh, 2, int16_t, false, +=, -=)
+DO_LDAV(vmlsldavxsh, 2, int16_t, true, +=, -=)
+DO_LDAV(vmlsldavsw, 4, int32_t, false, +=, -=)
+DO_LDAV(vmlsldavxsw, 4, int32_t, true, +=, -=)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VMLALDAV_U(DisasContext *s, arg_vmlaldav *a)
};
return do_long_dual_acc(s, a, fns[a->size][a->x]);
}
+
+static bool trans_VMLSLDAV(DisasContext *s, arg_vmlaldav *a)
+{
+    static MVEGenDualAccOpFn * const fns[4][2] = {
+        { NULL, NULL },
+        { gen_helper_mve_vmlsldavsh, gen_helper_mve_vmlsldavxsh },
+        { gen_helper_mve_vmlsldavsw, gen_helper_mve_vmlsldavxsw },
+        { NULL, NULL },
+    };
+    return do_long_dual_acc(s, a, fns[a->size][a->x]);
+}
--
2.20.1
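Ignoring predication, VMLSLDAV walks the two source vectors, multiplies corresponding elements, and alternately adds and subtracts the products into a 64-bit accumulator. A rough scalar model (illustrative; the names and the fixed 16-bit element type are assumptions for the example):

    static int64_t vmlsldav_model(const int16_t *n, const int16_t *m, int elems,
                                  int64_t acc)
    {
        for (int e = 0; e < elems; e++) {
            int64_t prod = (int64_t)n[e] * m[e];
            acc = (e & 1) ? acc - prod : acc + prod;  /* add even lanes, subtract odd */
        }
        return acc;
    }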
Deleted patch
1
Implement the MVE VRMLALDAVH and VRMLSLDAVH insns, which accumulate
2
the results of a rounded multiply of pairs of elements into a 72-bit
3
accumulator, returning the top 64 bits in a pair of general purpose
4
registers.
5
1
6
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
8
Message-id: 20210617121628.20116-22-peter.maydell@linaro.org
9
---
10
target/arm/helper-mve.h | 8 ++++++++
11
target/arm/mve.decode | 7 +++++++
12
target/arm/mve_helper.c | 37 +++++++++++++++++++++++++++++++++++++
13
target/arm/translate-mve.c | 24 ++++++++++++++++++++++++
14
4 files changed, 76 insertions(+)
15
16
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
17
index XXXXXXX..XXXXXXX 100644
18
--- a/target/arm/helper-mve.h
19
+++ b/target/arm/helper-mve.h
20
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vmlsldavsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
21
DEF_HELPER_FLAGS_4(mve_vmlsldavsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
22
DEF_HELPER_FLAGS_4(mve_vmlsldavxsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
23
DEF_HELPER_FLAGS_4(mve_vmlsldavxsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
24
+
25
+DEF_HELPER_FLAGS_4(mve_vrmlaldavhsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
26
+DEF_HELPER_FLAGS_4(mve_vrmlaldavhxsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
27
+
28
+DEF_HELPER_FLAGS_4(mve_vrmlaldavhuw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
29
+
30
+DEF_HELPER_FLAGS_4(mve_vrmlsldavhsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
31
+DEF_HELPER_FLAGS_4(mve_vrmlsldavhxsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
32
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
33
index XXXXXXX..XXXXXXX 100644
34
--- a/target/arm/mve.decode
35
+++ b/target/arm/mve.decode
36
@@ -XXX,XX +XXX,XX @@ VDUP 1110 1110 1 0 10 ... 0 .... 1011 . 0 0 1 0000 @vdup size=2
37
38
@vmlaldav .... .... . ... ... . ... . .... .... qm:3 . \
39
qn=%qn rdahi=%rdahi rdalo=%rdalo size=%size_16 &vmlaldav
40
+@vmlaldav_nosz .... .... . ... ... . ... . .... .... qm:3 . \
41
+ qn=%qn rdahi=%rdahi rdalo=%rdalo size=0 &vmlaldav
42
VMLALDAV_S 1110 1110 1 ... ... . ... x:1 1110 . 0 a:1 0 ... 0 @vmlaldav
43
VMLALDAV_U 1111 1110 1 ... ... . ... x:1 1110 . 0 a:1 0 ... 0 @vmlaldav
44
45
VMLSLDAV 1110 1110 1 ... ... . ... x:1 1110 . 0 a:1 0 ... 1 @vmlaldav
46
+
47
+VRMLALDAVH_S 1110 1110 1 ... ... 0 ... x:1 1111 . 0 a:1 0 ... 0 @vmlaldav_nosz
48
+VRMLALDAVH_U 1111 1110 1 ... ... 0 ... x:1 1111 . 0 a:1 0 ... 0 @vmlaldav_nosz
49
+
50
+VRMLSLDAVH 1111 1110 1 ... ... 0 ... x:1 1110 . 0 a:1 0 ... 1 @vmlaldav_nosz
51
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
52
index XXXXXXX..XXXXXXX 100644
53
--- a/target/arm/mve_helper.c
54
+++ b/target/arm/mve_helper.c
55
@@ -XXX,XX +XXX,XX @@
56
*/
57
58
#include "qemu/osdep.h"
59
+#include "qemu/int128.h"
60
#include "cpu.h"
61
#include "internals.h"
62
#include "vec_internal.h"
63
@@ -XXX,XX +XXX,XX @@ DO_LDAV(vmlsldavsh, 2, int16_t, false, +=, -=)
64
DO_LDAV(vmlsldavxsh, 2, int16_t, true, +=, -=)
65
DO_LDAV(vmlsldavsw, 4, int32_t, false, +=, -=)
66
DO_LDAV(vmlsldavxsw, 4, int32_t, true, +=, -=)
67
+
68
+/*
69
+ * Rounding multiply add long dual accumulate high: we must keep
70
+ * a 72-bit internal accumulator value and return the top 64 bits.
71
+ */
72
+#define DO_LDAVH(OP, ESIZE, TYPE, XCHG, EVENACC, ODDACC, TO128) \
73
+ uint64_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vn, \
74
+ void *vm, uint64_t a) \
75
+ { \
76
+ uint16_t mask = mve_element_mask(env); \
77
+ unsigned e; \
78
+ TYPE *n = vn, *m = vm; \
79
+ Int128 acc = int128_lshift(TO128(a), 8); \
80
+ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
81
+ if (mask & 1) { \
82
+ if (e & 1) { \
83
+ acc = ODDACC(acc, TO128(n[H##ESIZE(e - 1 * XCHG)] * \
84
+ m[H##ESIZE(e)])); \
85
+ } else { \
86
+ acc = EVENACC(acc, TO128(n[H##ESIZE(e + 1 * XCHG)] * \
87
+ m[H##ESIZE(e)])); \
88
+ } \
89
+ acc = int128_add(acc, 1 << 7); \
90
+ } \
91
+ } \
92
+ mve_advance_vpt(env); \
93
+ return int128_getlo(int128_rshift(acc, 8)); \
94
+ }
95
+
96
+DO_LDAVH(vrmlaldavhsw, 4, int32_t, false, int128_add, int128_add, int128_makes64)
97
+DO_LDAVH(vrmlaldavhxsw, 4, int32_t, true, int128_add, int128_add, int128_makes64)
98
+
99
+DO_LDAVH(vrmlaldavhuw, 4, uint32_t, false, int128_add, int128_add, int128_make64)
100
+
101
+DO_LDAVH(vrmlsldavhsw, 4, int32_t, false, int128_add, int128_sub, int128_makes64)
102
+DO_LDAVH(vrmlsldavhxsw, 4, int32_t, true, int128_add, int128_sub, int128_makes64)
103
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
104
index XXXXXXX..XXXXXXX 100644
105
--- a/target/arm/translate-mve.c
106
+++ b/target/arm/translate-mve.c
107
@@ -XXX,XX +XXX,XX @@ static bool trans_VMLSLDAV(DisasContext *s, arg_vmlaldav *a)
108
};
109
return do_long_dual_acc(s, a, fns[a->size][a->x]);
110
}
111
+
112
+static bool trans_VRMLALDAVH_S(DisasContext *s, arg_vmlaldav *a)
113
+{
114
+ static MVEGenDualAccOpFn * const fns[] = {
115
+ gen_helper_mve_vrmlaldavhsw, gen_helper_mve_vrmlaldavhxsw,
116
+ };
117
+ return do_long_dual_acc(s, a, fns[a->x]);
118
+}
119
+
120
+static bool trans_VRMLALDAVH_U(DisasContext *s, arg_vmlaldav *a)
121
+{
122
+ static MVEGenDualAccOpFn * const fns[] = {
123
+ gen_helper_mve_vrmlaldavhuw, NULL,
124
+ };
125
+ return do_long_dual_acc(s, a, fns[a->x]);
126
+}
127
+
128
+static bool trans_VRMLSLDAVH(DisasContext *s, arg_vmlaldav *a)
129
+{
130
+ static MVEGenDualAccOpFn * const fns[] = {
131
+ gen_helper_mve_vrmlsldavhsw, gen_helper_mve_vrmlsldavhxsw,
132
+ };
133
+ return do_long_dual_acc(s, a, fns[a->x]);
134
+}
135
--
136
2.20.1
137
138
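The VRMLALDAVH/VRMLSLDAVH patch above keeps a 72-bit running sum and returns its top 64 bits. A rough scalar model of the add-only case, using the compiler's 128-bit integer type as the wide intermediate (illustrative only; predication is ignored here):

    static int64_t vrmlaldavh_model(const int32_t *n, const int32_t *m, int elems,
                                    int64_t a)
    {
        __int128 acc = (__int128)a << 8;          /* incoming accumulator, scaled up */
        for (int e = 0; e < elems; e++) {
            acc += (__int128)((int64_t)n[e] * m[e]);
            acc += 1 << 7;                        /* per-element rounding constant */
        }
        return (int64_t)(acc >> 8);               /* top 64 bits of the 72-bit sum */
    }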
1
Implement the scalar form of the MVE VADD insn. This takes the
1
Add the remaining devices (or unimplemented-device stubs) for
2
scalar operand from a general purpose register.
2
this board: SPI controllers, SCC, FPGAIO, I2S, RTC, the
3
QSPI write-config block, and ethernet.
3
4
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
6
Message-id: 20210617121628.20116-23-peter.maydell@linaro.org
7
Message-id: 20240206132931.38376-13-peter.maydell@linaro.org
7
---
8
---
8
target/arm/helper-mve.h | 4 ++++
9
hw/arm/mps3r.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++
9
target/arm/mve.decode | 7 ++++++
10
1 file changed, 74 insertions(+)
10
target/arm/mve_helper.c | 22 +++++++++++++++++++
11
target/arm/translate-mve.c | 45 ++++++++++++++++++++++++++++++++++++++
12
4 files changed, 78 insertions(+)
13
11
14
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
12
diff --git a/hw/arm/mps3r.c b/hw/arm/mps3r.c
15
index XXXXXXX..XXXXXXX 100644
13
index XXXXXXX..XXXXXXX 100644
16
--- a/target/arm/helper-mve.h
14
--- a/hw/arm/mps3r.c
17
+++ b/target/arm/helper-mve.h
15
+++ b/hw/arm/mps3r.c
18
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vmulltub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
16
@@ -XXX,XX +XXX,XX @@
19
DEF_HELPER_FLAGS_4(mve_vmulltuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
17
#include "hw/char/cmsdk-apb-uart.h"
20
DEF_HELPER_FLAGS_4(mve_vmulltuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
18
#include "hw/i2c/arm_sbcon_i2c.h"
21
19
#include "hw/intc/arm_gicv3.h"
22
+DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
20
+#include "hw/misc/mps2-scc.h"
23
+DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
21
+#include "hw/misc/mps2-fpgaio.h"
24
+DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
22
#include "hw/misc/unimp.h"
23
+#include "hw/net/lan9118.h"
24
+#include "hw/rtc/pl031.h"
25
+#include "hw/ssi/pl022.h"
26
#include "hw/timer/cmsdk-apb-dualtimer.h"
27
#include "hw/watchdog/cmsdk-apb-watchdog.h"
28
29
@@ -XXX,XX +XXX,XX @@ struct MPS3RMachineState {
30
CMSDKAPBWatchdog watchdog;
31
CMSDKAPBDualTimer dualtimer;
32
ArmSbconI2CState i2c[5];
33
+ PL022State spi[3];
34
+ MPS2SCC scc;
35
+ MPS2FPGAIO fpgaio;
36
+ UnimplementedDeviceState i2s_audio;
37
+ PL031State rtc;
38
Clock *clk;
39
};
40
41
@@ -XXX,XX +XXX,XX @@ static const RAMInfo an536_raminfo[] = {
42
}
43
};
44
45
+static const int an536_oscclk[] = {
46
+ 24000000, /* 24MHz reference for RTC and timers */
47
+ 50000000, /* 50MHz ACLK */
48
+ 50000000, /* 50MHz MCLK */
49
+ 50000000, /* 50MHz GPUCLK */
50
+ 24576000, /* 24.576MHz AUDCLK */
51
+ 23750000, /* 23.75MHz HDLCDCLK */
52
+ 100000000, /* 100MHz DDR4_REF_CLK */
53
+};
25
+
54
+
26
DEF_HELPER_FLAGS_4(mve_vmlaldavsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
55
static MemoryRegion *mr_for_raminfo(MPS3RMachineState *mms,
27
DEF_HELPER_FLAGS_4(mve_vmlaldavsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
56
const RAMInfo *raminfo)
28
DEF_HELPER_FLAGS_4(mve_vmlaldavxsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
57
{
29
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
58
@@ -XXX,XX +XXX,XX @@ static void mps3r_common_init(MachineState *machine)
30
index XXXXXXX..XXXXXXX 100644
59
MPS3RMachineClass *mmc = MPS3R_MACHINE_GET_CLASS(mms);
31
--- a/target/arm/mve.decode
60
MemoryRegion *sysmem = get_system_memory();
32
+++ b/target/arm/mve.decode
61
DeviceState *gicdev;
33
@@ -XXX,XX +XXX,XX @@
62
+ QList *oscclk;
34
&vldr_vstr rn qd imm p a w size l u
63
35
&1op qd qm size
64
mms->clk = clock_new(OBJECT(machine), "CLK");
36
&2op qd qm qn size
65
clock_set_hz(mms->clk, CLK_FRQ);
37
+&2scalar qd qn rm size
66
@@ -XXX,XX +XXX,XX @@ static void mps3r_common_init(MachineState *machine)
38
67
}
39
@vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
68
}
40
# Note that both Rn and Qd are 3 bits only (no D bit)
69
41
@@ -XXX,XX +XXX,XX @@
70
+ for (int i = 0; i < ARRAY_SIZE(mms->spi); i++) {
42
@2op .... .... .. size:2 .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn
71
+ g_autofree char *s = g_strdup_printf("spi%d", i);
43
@2op_nosz .... .... .... .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn size=0
72
+ hwaddr baseaddr = 0xe0104000 + i * 0x1000;
44
45
+@2scalar .... .... .. size:2 .... .... .... .... rm:4 &2scalar qd=%qd qn=%qn
46
+
73
+
47
# Vector loads and stores
74
+ object_initialize_child(OBJECT(mms), s, &mms->spi[i], TYPE_PL022);
48
75
+ sysbus_realize(SYS_BUS_DEVICE(&mms->spi[i]), &error_fatal);
49
# Widening loads and narrowing stores:
76
+ sysbus_mmio_map(SYS_BUS_DEVICE(&mms->spi[i]), 0, baseaddr);
50
@@ -XXX,XX +XXX,XX @@ VRMLALDAVH_S 1110 1110 1 ... ... 0 ... x:1 1111 . 0 a:1 0 ... 0 @vmlaldav_no
77
+ sysbus_connect_irq(SYS_BUS_DEVICE(&mms->spi[i]), 0,
51
VRMLALDAVH_U 1111 1110 1 ... ... 0 ... x:1 1111 . 0 a:1 0 ... 0 @vmlaldav_nosz
78
+ qdev_get_gpio_in(gicdev, 22 + i));
52
53
VRMLSLDAVH 1111 1110 1 ... ... 0 ... x:1 1110 . 0 a:1 0 ... 1 @vmlaldav_nosz
54
+
55
+# Scalar operations
56
+
57
+VADD_scalar 1110 1110 0 . .. ... 1 ... 0 1111 . 100 .... @2scalar
58
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
59
index XXXXXXX..XXXXXXX 100644
60
--- a/target/arm/mve_helper.c
61
+++ b/target/arm/mve_helper.c
62
@@ -XXX,XX +XXX,XX @@ DO_2OP_S(vhsubs, do_vhsub_s)
63
DO_2OP_U(vhsubu, do_vhsub_u)
64
65
66
+#define DO_2OP_SCALAR(OP, ESIZE, TYPE, FN) \
67
+ void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn, \
68
+ uint32_t rm) \
69
+ { \
70
+ TYPE *d = vd, *n = vn; \
71
+ TYPE m = rm; \
72
+ uint16_t mask = mve_element_mask(env); \
73
+ unsigned e; \
74
+ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
75
+ mergemask(&d[H##ESIZE(e)], FN(n[H##ESIZE(e)], m), mask); \
76
+ } \
77
+ mve_advance_vpt(env); \
78
+ }
79
+ }
79
+
80
+
80
+/* provide unsigned 2-op scalar helpers for all sizes */
81
+ object_initialize_child(OBJECT(mms), "scc", &mms->scc, TYPE_MPS2_SCC);
81
+#define DO_2OP_SCALAR_U(OP, FN) \
82
+ qdev_prop_set_uint32(DEVICE(&mms->scc), "scc-cfg0", 0);
82
+ DO_2OP_SCALAR(OP##b, 1, uint8_t, FN) \
83
+ qdev_prop_set_uint32(DEVICE(&mms->scc), "scc-cfg4", 0x2);
83
+ DO_2OP_SCALAR(OP##h, 2, uint16_t, FN) \
84
+ qdev_prop_set_uint32(DEVICE(&mms->scc), "scc-aid", 0x00200008);
84
+ DO_2OP_SCALAR(OP##w, 4, uint32_t, FN)
85
+ qdev_prop_set_uint32(DEVICE(&mms->scc), "scc-id", 0x41055360);
86
+ oscclk = qlist_new();
87
+ for (int i = 0; i < ARRAY_SIZE(an536_oscclk); i++) {
88
+ qlist_append_int(oscclk, an536_oscclk[i]);
89
+ }
90
+ qdev_prop_set_array(DEVICE(&mms->scc), "oscclk", oscclk);
91
+ sysbus_realize(SYS_BUS_DEVICE(&mms->scc), &error_fatal);
92
+ sysbus_mmio_map(SYS_BUS_DEVICE(&mms->scc), 0, 0xe0200000);
85
+
93
+
86
+DO_2OP_SCALAR_U(vadd_scalar, DO_ADD)
94
+ create_unimplemented_device("i2s-audio", 0xe0201000, 0x1000);
87
+
95
+
88
/*
96
+ object_initialize_child(OBJECT(mms), "fpgaio", &mms->fpgaio,
89
* Multiply add long dual accumulate ops.
97
+ TYPE_MPS2_FPGAIO);
90
*/
98
+ qdev_prop_set_uint32(DEVICE(&mms->fpgaio), "prescale-clk", an536_oscclk[1]);
91
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
99
+ qdev_prop_set_uint32(DEVICE(&mms->fpgaio), "num-leds", 10);
92
index XXXXXXX..XXXXXXX 100644
100
+ qdev_prop_set_bit(DEVICE(&mms->fpgaio), "has-switches", true);
93
--- a/target/arm/translate-mve.c
101
+ qdev_prop_set_bit(DEVICE(&mms->fpgaio), "has-dbgctrl", false);
94
+++ b/target/arm/translate-mve.c
102
+ sysbus_realize(SYS_BUS_DEVICE(&mms->fpgaio), &error_fatal);
95
@@ -XXX,XX +XXX,XX @@
103
+ sysbus_mmio_map(SYS_BUS_DEVICE(&mms->fpgaio), 0, 0xe0202000);
96
typedef void MVEGenLdStFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
97
typedef void MVEGenOneOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
98
typedef void MVEGenTwoOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr);
99
+typedef void MVEGenTwoOpScalarFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
100
typedef void MVEGenDualAccOpFn(TCGv_i64, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i64);
101
102
/* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
103
@@ -XXX,XX +XXX,XX @@ DO_2OP(VMULL_BU, vmullbu)
104
DO_2OP(VMULL_TS, vmullts)
105
DO_2OP(VMULL_TU, vmulltu)
106
107
+static bool do_2op_scalar(DisasContext *s, arg_2scalar *a,
108
+ MVEGenTwoOpScalarFn fn)
109
+{
110
+ TCGv_ptr qd, qn;
111
+ TCGv_i32 rm;
112
+
104
+
113
+ if (!dc_isar_feature(aa32_mve, s) ||
105
+ create_unimplemented_device("clcd", 0xe0209000, 0x1000);
114
+ !mve_check_qreg_bank(s, a->qd | a->qn) ||
115
+ !fn) {
116
+ return false;
117
+ }
118
+ if (a->rm == 13 || a->rm == 15) {
119
+ /* UNPREDICTABLE */
120
+ return false;
121
+ }
122
+ if (!mve_eci_check(s) || !vfp_access_check(s)) {
123
+ return true;
124
+ }
125
+
106
+
126
+ qd = mve_qreg_ptr(a->qd);
107
+ object_initialize_child(OBJECT(mms), "rtc", &mms->rtc, TYPE_PL031);
127
+ qn = mve_qreg_ptr(a->qn);
108
+ sysbus_realize(SYS_BUS_DEVICE(&mms->rtc), &error_fatal);
128
+ rm = load_reg(s, a->rm);
109
+ sysbus_mmio_map(SYS_BUS_DEVICE(&mms->rtc), 0, 0xe020a000);
129
+ fn(cpu_env, qd, qn, rm);
110
+ sysbus_connect_irq(SYS_BUS_DEVICE(&mms->rtc), 0,
130
+ tcg_temp_free_i32(rm);
111
+ qdev_get_gpio_in(gicdev, 4));
131
+ tcg_temp_free_ptr(qd);
132
+ tcg_temp_free_ptr(qn);
133
+ mve_update_eci(s);
134
+ return true;
135
+}
136
+
112
+
137
+#define DO_2OP_SCALAR(INSN, FN) \
113
+ /*
138
+ static bool trans_##INSN(DisasContext *s, arg_2scalar *a) \
114
+ * In hardware this is a LAN9220; the LAN9118 is software compatible
139
+ { \
115
+ * except that it doesn't support the checksum-offload feature.
140
+ static MVEGenTwoOpScalarFn * const fns[] = { \
116
+ */
141
+ gen_helper_mve_##FN##b, \
117
+ lan9118_init(0xe0300000,
142
+ gen_helper_mve_##FN##h, \
118
+ qdev_get_gpio_in(gicdev, 18));
143
+ gen_helper_mve_##FN##w, \
144
+ NULL, \
145
+ }; \
146
+ return do_2op_scalar(s, a, fns[a->size]); \
147
+ }
148
+
119
+
149
+DO_2OP_SCALAR(VADD_scalar, vadd_scalar)
120
+ create_unimplemented_device("usb", 0xe0301000, 0x1000);
121
+ create_unimplemented_device("qspi-write-config", 0xe0600000, 0x1000);
150
+
122
+
151
static bool do_long_dual_acc(DisasContext *s, arg_vmlaldav *a,
123
mms->bootinfo.ram_size = machine->ram_size;
152
MVEGenDualAccOpFn *fn)
124
mms->bootinfo.board_id = -1;
153
{
125
mms->bootinfo.loader_start = mmc->loader_start;
154
--
126
--
155
2.20.1
127
2.34.1
156
128
157
129
Deleted patch
1
Implement the scalar variants of the MVE VHADD and VHSUB insns.
2
1
3
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
4
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
5
Message-id: 20210617121628.20116-25-peter.maydell@linaro.org
6
---
7
target/arm/helper-mve.h | 16 ++++++++++++++++
8
target/arm/mve.decode | 4 ++++
9
target/arm/mve_helper.c | 8 ++++++++
10
target/arm/translate-mve.c | 4 ++++
11
4 files changed, 32 insertions(+)
12
13
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
14
index XXXXXXX..XXXXXXX 100644
15
--- a/target/arm/helper-mve.h
16
+++ b/target/arm/helper-mve.h
17
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vmul_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
18
DEF_HELPER_FLAGS_4(mve_vmul_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
19
DEF_HELPER_FLAGS_4(mve_vmul_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
20
21
+DEF_HELPER_FLAGS_4(mve_vhadds_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
22
+DEF_HELPER_FLAGS_4(mve_vhadds_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
23
+DEF_HELPER_FLAGS_4(mve_vhadds_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
24
+
25
+DEF_HELPER_FLAGS_4(mve_vhaddu_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
26
+DEF_HELPER_FLAGS_4(mve_vhaddu_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
27
+DEF_HELPER_FLAGS_4(mve_vhaddu_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
28
+
29
+DEF_HELPER_FLAGS_4(mve_vhsubs_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
30
+DEF_HELPER_FLAGS_4(mve_vhsubs_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
31
+DEF_HELPER_FLAGS_4(mve_vhsubs_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
32
+
33
+DEF_HELPER_FLAGS_4(mve_vhsubu_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
34
+DEF_HELPER_FLAGS_4(mve_vhsubu_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
35
+DEF_HELPER_FLAGS_4(mve_vhsubu_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
36
+
37
DEF_HELPER_FLAGS_4(mve_vmlaldavsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
38
DEF_HELPER_FLAGS_4(mve_vmlaldavsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
39
DEF_HELPER_FLAGS_4(mve_vmlaldavxsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
40
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
41
index XXXXXXX..XXXXXXX 100644
42
--- a/target/arm/mve.decode
43
+++ b/target/arm/mve.decode
44
@@ -XXX,XX +XXX,XX @@ VRMLSLDAVH 1111 1110 1 ... ... 0 ... x:1 1110 . 0 a:1 0 ... 1 @vmlaldav_no
45
VADD_scalar 1110 1110 0 . .. ... 1 ... 0 1111 . 100 .... @2scalar
46
VSUB_scalar 1110 1110 0 . .. ... 1 ... 1 1111 . 100 .... @2scalar
47
VMUL_scalar 1110 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
48
+VHADD_S_scalar 1110 1110 0 . .. ... 0 ... 0 1111 . 100 .... @2scalar
49
+VHADD_U_scalar 1111 1110 0 . .. ... 0 ... 0 1111 . 100 .... @2scalar
50
+VHSUB_S_scalar 1110 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
51
+VHSUB_U_scalar 1111 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
52
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
53
index XXXXXXX..XXXXXXX 100644
54
--- a/target/arm/mve_helper.c
55
+++ b/target/arm/mve_helper.c
56
@@ -XXX,XX +XXX,XX @@ DO_2OP_U(vhsubu, do_vhsub_u)
57
DO_2OP_SCALAR(OP##b, 1, uint8_t, FN) \
58
DO_2OP_SCALAR(OP##h, 2, uint16_t, FN) \
59
DO_2OP_SCALAR(OP##w, 4, uint32_t, FN)
60
+#define DO_2OP_SCALAR_S(OP, FN) \
61
+ DO_2OP_SCALAR(OP##b, 1, int8_t, FN) \
62
+ DO_2OP_SCALAR(OP##h, 2, int16_t, FN) \
63
+ DO_2OP_SCALAR(OP##w, 4, int32_t, FN)
64
65
DO_2OP_SCALAR_U(vadd_scalar, DO_ADD)
66
DO_2OP_SCALAR_U(vsub_scalar, DO_SUB)
67
DO_2OP_SCALAR_U(vmul_scalar, DO_MUL)
68
+DO_2OP_SCALAR_S(vhadds_scalar, do_vhadd_s)
69
+DO_2OP_SCALAR_U(vhaddu_scalar, do_vhadd_u)
70
+DO_2OP_SCALAR_S(vhsubs_scalar, do_vhsub_s)
71
+DO_2OP_SCALAR_U(vhsubu_scalar, do_vhsub_u)
72
73
/*
74
* Multiply add long dual accumulate ops.
75
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
76
index XXXXXXX..XXXXXXX 100644
77
--- a/target/arm/translate-mve.c
78
+++ b/target/arm/translate-mve.c
79
@@ -XXX,XX +XXX,XX @@ static bool do_2op_scalar(DisasContext *s, arg_2scalar *a,
80
DO_2OP_SCALAR(VADD_scalar, vadd_scalar)
81
DO_2OP_SCALAR(VSUB_scalar, vsub_scalar)
82
DO_2OP_SCALAR(VMUL_scalar, vmul_scalar)
83
+DO_2OP_SCALAR(VHADD_S_scalar, vhadds_scalar)
84
+DO_2OP_SCALAR(VHADD_U_scalar, vhaddu_scalar)
85
+DO_2OP_SCALAR(VHSUB_S_scalar, vhsubs_scalar)
86
+DO_2OP_SCALAR(VHSUB_U_scalar, vhsubu_scalar)
87
88
static bool do_long_dual_acc(DisasContext *s, arg_vmlaldav *a,
89
MVEGenDualAccOpFn *fn)
90
--
91
2.20.1
92
93
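The scalar VHADD/VHSUB patch above reuses the per-lane do_vhadd_*/do_vhsub_* helpers introduced by the earlier vector patch; the operation is a halving add or subtract, i.e. the exact sum or difference computed in a wider type and then shifted right by one. A sketch of the signed 8-bit halving add (name invented, assuming that generic semantics):

    static inline int8_t vhadd_s8_lane(int8_t n, int8_t m)
    {
        return (int8_t)(((int32_t)n + m) >> 1);   /* no overflow: sum is widened first */
    }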
Deleted patch
1
Implement the MVE VBRSR insn, which reverses a specified
2
number of bits in each element, setting the rest to zero.
3
1
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
Message-id: 20210617121628.20116-26-peter.maydell@linaro.org
7
---
8
target/arm/helper-mve.h | 4 ++++
9
target/arm/mve.decode | 1 +
10
target/arm/mve_helper.c | 43 ++++++++++++++++++++++++++++++++++++++
11
target/arm/translate-mve.c | 1 +
12
4 files changed, 49 insertions(+)
13
14
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
15
index XXXXXXX..XXXXXXX 100644
16
--- a/target/arm/helper-mve.h
17
+++ b/target/arm/helper-mve.h
18
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vhsubu_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
19
DEF_HELPER_FLAGS_4(mve_vhsubu_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
20
DEF_HELPER_FLAGS_4(mve_vhsubu_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
21
22
+DEF_HELPER_FLAGS_4(mve_vbrsrb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
23
+DEF_HELPER_FLAGS_4(mve_vbrsrh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
24
+DEF_HELPER_FLAGS_4(mve_vbrsrw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
25
+
26
DEF_HELPER_FLAGS_4(mve_vmlaldavsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
27
DEF_HELPER_FLAGS_4(mve_vmlaldavsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
28
DEF_HELPER_FLAGS_4(mve_vmlaldavxsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
29
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
30
index XXXXXXX..XXXXXXX 100644
31
--- a/target/arm/mve.decode
32
+++ b/target/arm/mve.decode
33
@@ -XXX,XX +XXX,XX @@ VHADD_S_scalar 1110 1110 0 . .. ... 0 ... 0 1111 . 100 .... @2scalar
34
VHADD_U_scalar 1111 1110 0 . .. ... 0 ... 0 1111 . 100 .... @2scalar
35
VHSUB_S_scalar 1110 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
36
VHSUB_U_scalar 1111 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
37
+VBRSR 1111 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
38
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
39
index XXXXXXX..XXXXXXX 100644
40
--- a/target/arm/mve_helper.c
41
+++ b/target/arm/mve_helper.c
42
@@ -XXX,XX +XXX,XX @@ DO_2OP_SCALAR_U(vhaddu_scalar, do_vhadd_u)
43
DO_2OP_SCALAR_S(vhsubs_scalar, do_vhsub_s)
44
DO_2OP_SCALAR_U(vhsubu_scalar, do_vhsub_u)
45
46
+static inline uint32_t do_vbrsrb(uint32_t n, uint32_t m)
47
+{
48
+ m &= 0xff;
49
+ if (m == 0) {
50
+ return 0;
51
+ }
52
+ n = revbit8(n);
53
+ if (m < 8) {
54
+ n >>= 8 - m;
55
+ }
56
+ return n;
57
+}
58
+
59
+static inline uint32_t do_vbrsrh(uint32_t n, uint32_t m)
60
+{
61
+ m &= 0xff;
62
+ if (m == 0) {
63
+ return 0;
64
+ }
65
+ n = revbit16(n);
66
+ if (m < 16) {
67
+ n >>= 16 - m;
68
+ }
69
+ return n;
70
+}
71
+
72
+static inline uint32_t do_vbrsrw(uint32_t n, uint32_t m)
73
+{
74
+ m &= 0xff;
75
+ if (m == 0) {
76
+ return 0;
77
+ }
78
+ n = revbit32(n);
79
+ if (m < 32) {
80
+ n >>= 32 - m;
81
+ }
82
+ return n;
83
+}
84
+
85
+DO_2OP_SCALAR(vbrsrb, 1, uint8_t, do_vbrsrb)
86
+DO_2OP_SCALAR(vbrsrh, 2, uint16_t, do_vbrsrh)
87
+DO_2OP_SCALAR(vbrsrw, 4, uint32_t, do_vbrsrw)
88
+
89
/*
90
* Multiply add long dual accumulate ops.
91
*/
92
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
93
index XXXXXXX..XXXXXXX 100644
94
--- a/target/arm/translate-mve.c
95
+++ b/target/arm/translate-mve.c
96
@@ -XXX,XX +XXX,XX @@ DO_2OP_SCALAR(VHADD_S_scalar, vhadds_scalar)
97
DO_2OP_SCALAR(VHADD_U_scalar, vhaddu_scalar)
98
DO_2OP_SCALAR(VHSUB_S_scalar, vhsubs_scalar)
99
DO_2OP_SCALAR(VHSUB_U_scalar, vhsubu_scalar)
100
+DO_2OP_SCALAR(VBRSR, vbrsr)
101
102
static bool do_long_dual_acc(DisasContext *s, arg_vmlaldav *a,
103
MVEGenDualAccOpFn *fn)
104
--
105
2.20.1
106
107
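Per lane, VBRSR reverses the low m bits of the element and zeroes the rest, where m comes from the low byte of the scalar operand. A scalar sketch of the 8-bit case, mirroring do_vbrsrb above (name invented):

    static inline uint8_t vbrsr_u8_lane(uint8_t n, uint32_t m)
    {
        uint8_t r = 0;
        m &= 0xff;
        if (m == 0) {
            return 0;
        }
        for (int i = 0; i < 8; i++) {             /* bit-reverse the whole byte */
            r |= ((n >> i) & 1) << (7 - i);
        }
        return m < 8 ? (uint8_t)(r >> (8 - m)) : r;   /* keep only the low m bits */
    }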
Implement the MVE VPST insn, which sets the predicate mask
fields in the VPR to the immediate value encoded in the insn.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-27-peter.maydell@linaro.org
---
target/arm/mve.decode | 4 +++
target/arm/translate-mve.c | 59 ++++++++++++++++++++++++++++++++++++++
2 files changed, 63 insertions(+)

diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VHADD_U_scalar 1111 1110 0 . .. ... 0 ... 0 1111 . 100 .... @2scalar
VHSUB_S_scalar 1110 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
VHSUB_U_scalar 1111 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
VBRSR 1111 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
+
+# Predicate operations
+%mask_22_13 22:1 13:3
+VPST 1111 1110 0 . 11 000 1 ... 0 1111 0100 1101 mask=%mask_22_13
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ static void mve_update_eci(DisasContext *s)
}
}

+static void mve_update_and_store_eci(DisasContext *s)
+{
+    /*
+     * For insns which don't call a helper function that will call
+     * mve_advance_vpt(), this version updates s->eci and also stores
+     * it out to the CPUState field.
+     */
+    if (s->eci) {
+        mve_update_eci(s);
+        store_cpu_field(tcg_constant_i32(s->eci << 4), condexec_bits);
+    }
+}
+
static bool mve_skip_first_beat(DisasContext *s)
{
/* Return true if PSR.ECI says we must skip the first beat of this insn */
@@ -XXX,XX +XXX,XX @@ static bool trans_VRMLSLDAVH(DisasContext *s, arg_vmlaldav *a)
};
return do_long_dual_acc(s, a, fns[a->x]);
}
+
+static bool trans_VPST(DisasContext *s, arg_VPST *a)
+{
+    TCGv_i32 vpr;
+
+    /* mask == 0 is a "related encoding" */
+    if (!dc_isar_feature(aa32_mve, s) || !a->mask) {
+        return false;
+    }
+    if (!mve_eci_check(s) || !vfp_access_check(s)) {
+        return true;
+    }
+    /*
+     * Set the VPR mask fields. We take advantage of MASK01 and MASK23
+     * being adjacent fields in the register.
+     *
+     * This insn is not predicated, but it is subject to beat-wise
+     * execution, and the mask is updated on the odd-numbered beats.
+     * So if PSR.ECI says we should skip beat 1, we mustn't update the
+     * 01 mask field.
+     */
+    vpr = load_cpu_field(v7m.vpr);
+    switch (s->eci) {
+    case ECI_NONE:
+    case ECI_A0:
+        /* Update both 01 and 23 fields */
+        tcg_gen_deposit_i32(vpr, vpr,
+                            tcg_constant_i32(a->mask | (a->mask << 4)),
+                            R_V7M_VPR_MASK01_SHIFT,
+                            R_V7M_VPR_MASK01_LENGTH + R_V7M_VPR_MASK23_LENGTH);
+        break;
+    case ECI_A0A1:
+    case ECI_A0A1A2:
+    case ECI_A0A1A2B0:
+        /* Update only the 23 mask field */
+        tcg_gen_deposit_i32(vpr, vpr,
+                            tcg_constant_i32(a->mask),
+                            R_V7M_VPR_MASK23_SHIFT, R_V7M_VPR_MASK23_LENGTH);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    store_cpu_field(vpr, v7m.vpr);
+    mve_update_and_store_eci(s);
+    return true;
+}
--
2.20.1

Add documentation for the mps3-an536 board type.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20240206132931.38376-14-peter.maydell@linaro.org
---
docs/system/arm/mps2.rst | 37 ++++++++++++++++++++++++++++++++++---
1 file changed, 34 insertions(+), 3 deletions(-)

diff --git a/docs/system/arm/mps2.rst b/docs/system/arm/mps2.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/arm/mps2.rst
+++ b/docs/system/arm/mps2.rst
@@ -XXX,XX +XXX,XX @@
-Arm MPS2 and MPS3 boards (``mps2-an385``, ``mps2-an386``, ``mps2-an500``, ``mps2-an505``, ``mps2-an511``, ``mps2-an521``, ``mps3-an524``, ``mps3-an547``)
-=========================================================================================================================================================
+Arm MPS2 and MPS3 boards (``mps2-an385``, ``mps2-an386``, ``mps2-an500``, ``mps2-an505``, ``mps2-an511``, ``mps2-an521``, ``mps3-an524``, ``mps3-an536``, ``mps3-an547``)
+=========================================================================================================================================================================

-These board models all use Arm M-profile CPUs.
+These board models use Arm M-profile or R-profile CPUs.

The Arm MPS2, MPS2+ and MPS3 dev boards are FPGA based (the 2+ has a
bigger FPGA but is otherwise the same as the 2; the 3 has a bigger
@@ -XXX,XX +XXX,XX @@ FPGA image.

QEMU models the following FPGA images:

+FPGA images using M-profile CPUs:
+
``mps2-an385``
Cortex-M3 as documented in Arm Application Note AN385
``mps2-an386``
@@ -XXX,XX +XXX,XX @@ QEMU models the following FPGA images:
``mps3-an547``
Cortex-M55 on an MPS3, as documented in Arm Application Note AN547

+FPGA images using R-profile CPUs:
+
+``mps3-an536``
+ Dual Cortex-R52 on an MPS3, as documented in Arm Application Note AN536
+
Differences between QEMU and real hardware:

- AN385/AN386 remapping of low 16K of memory to either ZBT SSRAM1 or to
@@ -XXX,XX +XXX,XX @@ Differences between QEMU and real hardware:
flash, but only as simple ROM, so attempting to rewrite the flash
from the guest will fail
- QEMU does not model the USB controller in MPS3 boards
+- AN536 does not support runtime control of CPU reset and halt via
+ the SCC CFG_REG0 register.
+- AN536 does not support enabling or disabling the flash and ATCM
+ interfaces via the SCC CFG_REG1 register.
+- AN536 does not support setting of the initial vector table
+ base address via the SCC CFG_REG6 and CFG_REG7 register config,
+ and does not provide a mechanism for specifying these values at
+ startup, so all guest images must be built to start from TCM
+ (i.e. to expect the interrupt vector base at 0 from reset).
+- AN536 defaults to only creating a single CPU; this is the equivalent
+ of the way the real FPGA image usually runs with the second Cortex-R52
+ held in halt via the initial SCC CFG_REG0 register setting. You can
+ create the second CPU with ``-smp 2``; both CPUs will then start
+ execution immediately on startup.
+
+Note that for the AN536 the first UART is accessible only by
+CPU0, and the second UART is accessible only by CPU1. The
+first UART accessible shared between both CPUs is the third
+UART. Guest software might therefore be built to use either
+the first UART or the third UART; if you don't see any output
+from the UART you are looking at, try one of the others.
+(Even if the AN536 machine is started with a single CPU and so
+no "CPU1-only UART", the UART numbering remains the same,
+with the third UART being the first of the shared ones.)

Machine-specific options
""""""""""""""""""""""""
--
2.34.1
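When no beats of the insn have already executed, trans_VPST above writes the 4-bit immediate into both the MASK01 and MASK23 fields of the VPR with a single deposit. A scalar sketch of that update (the field positions used here are assumptions for illustration, not the real R_V7M_VPR_* definitions):

    static uint32_t vpst_write_masks(uint32_t vpr, uint32_t mask)
    {
        const int mask01_shift = 16, mask23_shift = 20;   /* assumed bit positions */
        vpr &= ~((0xfu << mask01_shift) | (0xfu << mask23_shift));
        return vpr | (mask << mask01_shift) | (mask << mask23_shift);
    }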