Series comparison

-[PULL 00/49] target-arm queue
+[PULL 00/57] target-arm queue
-target-arm queue: I have a lot more still in my to-review
+The following changes since commit 53f306f316549d20c76886903181413d20842423:
 queue, but my rule of thumb is when I get to 50 patches or
 so to send out what I have.
-thanks
+  Merge remote-tracking branch 'remotes/ehabkost-gl/tags/x86-next-pull-request' into staging (2021-06-21 11:26:04 +0100)
 -- PMM
 The following changes since commit 9a7beaad3dbba982f7a461d676b55a5c3851d312:
   Merge remote-tracking branch 'remotes/alistair/tags/pull-riscv-to-apply-20210304' into staging (2021-03-05 10:47:46 +0000)
 are available in the Git repository at:
-  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20210305
+  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20210621
-for you to fetch changes up to 2c669ff88ec6733420a000103a2b8b9e93df4945:
+for you to fetch changes up to a83f1d9263d281f938a3984cda7104d55affd43a:
-  hw/arm/mps2: Update old infocenter.arm.com URLs (2021-03-05 15:17:38 +0000)
+  docs/system: arm: Add nRF boards description (2021-06-21 17:24:33 +0100)
 ----------------------------------------------------------------
- * sbsa-ref: remove cortex-a53 from list of supported cpus
+target-arm queue:
- * sbsa-ref: add 'max' to list of allowed cpus
+ * Don't require 'virt' board to be compiled in for ACPI GHES code
- * target/arm: Add support for FEAT_SSBS, Speculative Store Bypass Safe
+ * docs: Document which architecture extensions we emulate
- * npcm7xx: add EMC model
+ * Fix bugs in M-profile FPCXT_NS accesses
- * xlnx-zynqmp: Remove obsolete 'has_rpu' property
+ * First slice of MVE patches
- * target/arm: Speed up aarch64 TBL/TBX
+ * Implement MTE3
- * virtio-mmio: improve virtio-mmio get_dev_path alog
+ * docs/system: arm: Add nRF boards description
  * target/arm: Use TCF0 and TFSRE0 for unprivileged tag checks
  * target/arm: Restrict v8M IDAU to TCG
  * target/arm/cpu: Update coding style to make checkpatch.pl happy
  * musicpal, tc6393xb, omap_lcdc, tcx: drop dead code for non-32-bit-RGB surfaces
  * Add new board: mps3-an524
 ----------------------------------------------------------------
-Doug Evans (3):
+Alexandre Iooss (1):
-      hw/net: Add npcm7xx emc model
+      docs/system: arm: Add nRF boards description
       hw/arm: Add npcm7xx emc model
       tests/qtests: Add npcm7xx emc model test
 Marcin Juszkiewicz (2):
       sbsa-ref: remove cortex-a53 from list of supported cpus
       sbsa-ref: add 'max' to list of allowed cpus
 Peter Collingbourne (1):
-      target/arm: Use TCF0 and TFSRE0 for unprivileged tag checks
+      target/arm: Implement MTE3
-Peter Maydell (34):
+Peter Maydell (55):
-      hw/arm/musicpal: Remove dead code for non-32-bit-RGB surfaces
+      hw/acpi: Provide stub version of acpi_ghes_record_errors()
-      hw/display/tc6393xb: Remove dead code for handling non-32bpp surfaces
+      hw/acpi: Provide function acpi_ghes_present()
-      hw/display/tc6393xb: Expand out macros in template header
+      target/arm: Use acpi_ghes_present() to see if we report ACPI memory errors
-      hw/display/tc6393xb: Inline tc6393xb_draw_graphic32() at its callsite
+      docs/system/arm: Document which architecture extensions we emulate
-      hw/display/omap_lcdc: Expand out macros in template header
+      target/arm/translate-vfp.c: Whitespace fixes
-      hw/display/omap_lcdc: Drop broken bigendian ifdef
+      target/arm: Handle FPU being disabled in FPCXT_NS accesses
-      hw/display/omap_lcdc: Fix coding style issues in template header
+      target/arm: Don't NOCP fault for FPCXT_NS accesses
-      hw/display/omap_lcdc: Inline template header into C file
+      target/arm: Handle writeback in VLDR/VSTR sysreg with no memory access
-      hw/display/omap_lcdc: Delete unnecessary macro
+      target/arm: Factor FP context update code out into helper function
-      hw/display/tcx: Drop unnecessary code for handling BGR format outputs
+      target/arm: Split vfp_access_check() into A and M versions
-      hw/arm/mps2-tz: Make SYSCLK frequency board-specific
+      target/arm: Handle FPU check for FPCXT_NS insns via vfp_access_check_m()
-      hw/misc/mps2-scc: Support configurable number of OSCCLK values
+      target/arm: Implement MVE VLDR/VSTR (non-widening forms)
-      hw/arm/mps2-tz: Correct the OSCCLK settings for mps2-an505 and mps2-an511
+      target/arm: Implement widening/narrowing MVE VLDR/VSTR insns
-      hw/arm/mps2-tz: Make the OSCCLK settings be configurable per-board
+      target/arm: Implement MVE VCLZ
-      hw/misc/mps2-fpgaio: Make number of LEDs configurable by board
+      target/arm: Implement MVE VCLS
-      hw/misc/mps2-fpgaio: Support SWITCH register
+      target/arm: Implement MVE VREV16, VREV32, VREV64
-      hw/arm/mps2-tz: Make FPGAIO switch and LED config per-board
+      target/arm: Implement MVE VMVN (register)
-      hw/arm/mps2-tz: Condition IRQ splitting on number of CPUs, not board type
+      target/arm: Implement MVE VABS
-      hw/arm/mps2-tz: Make number of IRQs board-specific
+      target/arm: Implement MVE VNEG
-      hw/misc/mps2-scc: Implement CFG_REG5 and CFG_REG6 for MPS3 AN524
+      tcg: Make gen_dup_i32/i64() public as tcg_gen_dup_i32/i64
-      hw/arm/mps2-tz: Correct wrong interrupt numbers for DMA and SPI
+      target/arm: Implement MVE VDUP
-      hw/arm/mps2-tz: Allow PPCPortInfo structures to specify device interrupts
+      target/arm: Implement MVE VAND, VBIC, VORR, VORN, VEOR
-      hw/arm/mps2-tz: Move device IRQ info to data structures
+      target/arm: Implement MVE VADD, VSUB, VMUL
-      hw/arm/mps2-tz: Size the uart-irq-orgate based on the number of UARTs
+      target/arm: Implement MVE VMULH
-      hw/arm/mps2-tz: Allow boards to have different PPCInfo data
+      target/arm: Implement MVE VRMULH
-      hw/arm/mps2-tz: Make RAM arrangement board-specific
+      target/arm: Implement MVE VMAX, VMIN
-      hw/arm/mps2-tz: Set MachineClass default_ram info from RAMInfo data
+      target/arm: Implement MVE VABD
-      hw/arm/mps2-tz: Support ROMs as well as RAMs
+      target/arm: Implement MVE VHADD, VHSUB
-      hw/arm/mps2-tz: Get armv7m_load_kernel() size argument from RAMInfo
+      target/arm: Implement MVE VMULL
-      hw/arm/mps2-tz: Add new mps3-an524 board
+      target/arm: Implement MVE VMLALDAV
-      hw/arm/mps2-tz: Stub out USB controller for mps3-an524
+      target/arm: Implement MVE VMLSLDAV
-      hw/arm/mps2-tz: Provide PL031 RTC on mps3-an524
+      target/arm: Implement MVE VRMLALDAVH, VRMLSLDAVH
-      docs/system/arm/mps2.rst: Document the new mps3-an524 board
+      target/arm: Implement MVE VADD (scalar)
-      hw/arm/mps2: Update old infocenter.arm.com URLs
+      target/arm: Implement MVE VSUB, VMUL (scalar)
       target/arm: Implement MVE VHADD, VHSUB (scalar)
       target/arm: Implement MVE VBRSR
       target/arm: Implement MVE VPST
       target/arm: Implement MVE VQADD and VQSUB
       target/arm: Implement MVE VQDMULH and VQRDMULH (scalar)
       target/arm: Implement MVE VQDMULL scalar
       target/arm: Implement MVE VQDMULH, VQRDMULH (vector)
       target/arm: Implement MVE VQADD, VQSUB (vector)
       target/arm: Implement MVE VQSHL (vector)
       target/arm: Implement MVE VQRSHL
       target/arm: Implement MVE VSHL insn
       target/arm: Implement MVE VRSHL
       target/arm: Implement MVE VQDMLADH and VQRDMLADH
       target/arm: Implement MVE VQDMLSDH and VQRDMLSDH
       target/arm: Implement MVE VQDMULL (vector)
       target/arm: Implement MVE VRHADD
       target/arm: Implement MVE VADC, VSBC
       target/arm: Implement MVE VCADD
       target/arm: Implement MVE VHCADD
       target/arm: Implement MVE VADDV
       target/arm: Make VMOV scalar <-> gpreg beatwise for MVE
-Philippe Mathieu-Daudé (4):
+ docs/system/arm/emulation.rst |  103 ++++
-      hw/arm/xlnx-zynqmp: Remove obsolete 'has_rpu' property
+ docs/system/arm/nrf.rst       |   51 ++
-      hw/i2c/npcm7xx_smbus: Simplify npcm7xx_smbus_init()
+ docs/system/target-arm.rst    |    7 +
-      target/arm: Restrict v8M IDAU to TCG
+ include/hw/acpi/ghes.h        |    9 +
-      target/arm/cpu: Update coding style to make checkpatch.pl happy
+ include/tcg/tcg-op.h          |    8 +
  include/tcg/tcg.h             |    1 -
  target/arm/helper-mve.h       |  357 +++++++++++++
  target/arm/helper.h           |    2 +
  target/arm/internals.h        |   11 +
  target/arm/translate-a32.h    |    3 +
  target/arm/translate.h        |   10 +
  target/arm/m-nocp.decode      |   24 +
  target/arm/mve.decode         |  240 +++++++++
  target/arm/vfp.decode         |   14 -
  hw/acpi/ghes-stub.c           |   22 +
  hw/acpi/ghes.c                |   17 +
  target/arm/cpu64.c            |    2 +-
  target/arm/kvm64.c            |    6 +-
  target/arm/mte_helper.c       |   82 +--
  target/arm/mve_helper.c       | 1160 +++++++++++++++++++++++++++++++++++++++++
  target/arm/translate-m-nocp.c |  550 +++++++++++++++++++
  target/arm/translate-mve.c    |  759 +++++++++++++++++++++++++++
  target/arm/translate-vfp.c    |  741 +++++++-------------------
  tcg/tcg-op-gvec.c             |   20 +-
  MAINTAINERS                   |    1 +
  hw/acpi/meson.build           |    6 +-
  target/arm/meson.build        |    1 +
 files changed, 3578 insertions(+), 629 deletions(-)
  create mode 100644 docs/system/arm/emulation.rst
  create mode 100644 docs/system/arm/nrf.rst
  create mode 100644 target/arm/helper-mve.h
  create mode 100644 hw/acpi/ghes-stub.c
  create mode 100644 target/arm/mve_helper.c
-Rebecca Cran (3):
-      target/arm: Add support for FEAT_SSBS, Speculative Store Bypass Safe
-      target/arm: Enable FEAT_SSBS for "max" AARCH64 CPU
-      target/arm: Set ID_PFR2.SSBS to 1 for "max" 32-bit CPU
-Richard Henderson (1):
-      target/arm: Speed up aarch64 TBL/TBX
-schspa (1):
-      virtio-mmio: improve virtio-mmio get_dev_path alog
- docs/system/arm/mps2.rst         |  24 +-
- docs/system/arm/nuvoton.rst      |   3 +-
- hw/display/omap_lcd_template.h   | 169 --------
- hw/display/tc6393xb_template.h   |  72 ----
- include/hw/arm/armsse.h          |   4 +-
- include/hw/arm/npcm7xx.h         |   2 +
- include/hw/arm/xlnx-zynqmp.h     |   2 -
- include/hw/misc/armsse-cpuid.h   |   2 +-
- include/hw/misc/armsse-mhu.h     |   2 +-
- include/hw/misc/iotkit-secctl.h  |   2 +-
- include/hw/misc/iotkit-sysctl.h  |   2 +-
- include/hw/misc/iotkit-sysinfo.h |   2 +-
- include/hw/misc/mps2-fpgaio.h    |   8 +-
- include/hw/misc/mps2-scc.h       |  10 +-
- include/hw/net/npcm7xx_emc.h     | 286 +++++++++++++
- include/ui/console.h             |  10 -
- target/arm/cpu.h                 |  15 +-
- target/arm/helper-a64.h          |   2 +-
- target/arm/internals.h           |   6 +
- hw/arm/mps2-tz.c                 | 632 +++++++++++++++++++++++-----
- hw/arm/mps2.c                    |   5 +
- hw/arm/musicpal.c                |  64 ++-
- hw/arm/npcm7xx.c                 |  50 ++-
- hw/arm/sbsa-ref.c                |   2 +-
- hw/arm/xlnx-zynqmp.c             |   6 -
- hw/display/omap_lcdc.c           | 129 +++++-
- hw/display/tc6393xb.c            |  48 +--
- hw/display/tcx.c                 |  31 +-
- hw/i2c/npcm7xx_smbus.c           |   1 -
- hw/misc/armsse-cpuid.c           |   2 +-
- hw/misc/armsse-mhu.c             |   2 +-
- hw/misc/iotkit-sysctl.c          |   2 +-
- hw/misc/iotkit-sysinfo.c         |   2 +-
- hw/misc/mps2-fpgaio.c            |  43 +-
- hw/misc/mps2-scc.c               |  93 ++++-
- hw/net/npcm7xx_emc.c             | 857 ++++++++++++++++++++++++++++++++++++++
- hw/virtio/virtio-mmio.c          |  13 +-
- target/arm/cpu.c                 |  23 +-
- target/arm/cpu64.c               |   5 +
- target/arm/cpu_tcg.c             |   8 +
- target/arm/helper-a64.c          |  32 --
- target/arm/helper.c              |  39 +-
- target/arm/mte_helper.c          |  13 +-
- target/arm/translate-a64.c       |  70 +---
- target/arm/vec_helper.c          |  48 +++
- tests/qtest/npcm7xx_emc-test.c   | 862 +++++++++++++++++++++++++++++++++++++++
- hw/net/meson.build               |   1 +
- hw/net/trace-events              |  17 +
- tests/qtest/meson.build          |   3 +-
-files changed, 3098 insertions(+), 628 deletions(-)
- delete mode 100644 hw/display/omap_lcd_template.h
- delete mode 100644 hw/display/tc6393xb_template.h
- create mode 100644 include/hw/net/npcm7xx_emc.h
- create mode 100644 hw/net/npcm7xx_emc.c
- create mode 100644 tests/qtest/npcm7xx_emc-test.c

-New patch
+[PULL 01/57] hw/acpi: Provide stub version of acpi_ghes_record_errors()
+Generic code in target/arm wants to call acpi_ghes_record_errors();
+provide a stub version so that we don't fail to link when
+CONFIG_ACPI_APEI is not set. This requires us to add a new
+ghes-stub.c file to contain it and the meson.build mechanics
+to use it when appropriate.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Dongjiu Geng <gengdongjiu1@gmail.com>
+Message-id: 20210603171259.27962-2-peter.maydell@linaro.org
+---
+ hw/acpi/ghes-stub.c | 17 +++++++++++++++++
+ hw/acpi/meson.build |  6 +++---
+files changed, 20 insertions(+), 3 deletions(-)
+ create mode 100644 hw/acpi/ghes-stub.c
+diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c
+new file mode 100644
+index XXXXXXX..XXXXXXX
+--- /dev/null
++++ b/hw/acpi/ghes-stub.c
+@@ -XXX,XX +XXX,XX @@
++/*
++ * Support for generating APEI tables and recording CPER for Guests:
++ * stub functions.
++ *
++ * Copyright (c) 2021 Linaro, Ltd
++ *
++ * This work is licensed under the terms of the GNU GPL, version 2 or later.
++ * See the COPYING file in the top-level directory.
++ */
++
++#include "qemu/osdep.h"
++#include "hw/acpi/ghes.h"
++
++int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
++{
++    return -1;
++}
+diff --git a/hw/acpi/meson.build b/hw/acpi/meson.build
+index XXXXXXX..XXXXXXX 100644
+--- a/hw/acpi/meson.build
++++ b/hw/acpi/meson.build
+@@ -XXX,XX +XXX,XX @@ acpi_ss.add(when: 'CONFIG_ACPI_PCI', if_true: files('pci.c'))
+ acpi_ss.add(when: 'CONFIG_ACPI_VMGENID', if_true: files('vmgenid.c'))
+ acpi_ss.add(when: 'CONFIG_ACPI_HW_REDUCED', if_true: files('generic_event_device.c'))
+ acpi_ss.add(when: 'CONFIG_ACPI_HMAT', if_true: files('hmat.c'))
+-acpi_ss.add(when: 'CONFIG_ACPI_APEI', if_true: files('ghes.c'))
++acpi_ss.add(when: 'CONFIG_ACPI_APEI', if_true: files('ghes.c'), if_false: files('ghes-stub.c'))
+ acpi_ss.add(when: 'CONFIG_ACPI_X86', if_true: files('core.c', 'piix4.c', 'pcihp.c'), if_false: files('acpi-stub.c'))
+ acpi_ss.add(when: 'CONFIG_ACPI_X86_ICH', if_true: files('ich9.c', 'tco.c'))
+ acpi_ss.add(when: 'CONFIG_IPMI', if_true: files('ipmi.c'), if_false: files('ipmi-stub.c'))
+ acpi_ss.add(when: 'CONFIG_PC', if_false: files('acpi-x86-stub.c'))
+ acpi_ss.add(when: 'CONFIG_TPM', if_true: files('tpm.c'))
+-softmmu_ss.add(when: 'CONFIG_ACPI', if_false: files('acpi-stub.c', 'aml-build-stub.c'))
++softmmu_ss.add(when: 'CONFIG_ACPI', if_false: files('acpi-stub.c', 'aml-build-stub.c', 'ghes-stub.c'))
+ softmmu_ss.add_all(when: 'CONFIG_ACPI', if_true: acpi_ss)
+ softmmu_ss.add(when: 'CONFIG_ALL', if_true: files('acpi-stub.c', 'aml-build-stub.c',
+-                                                  'acpi-x86-stub.c', 'ipmi-stub.c'))
++                                                  'acpi-x86-stub.c', 'ipmi-stub.c', 'ghes-stub.c'))
+--
+.20.1

-[PULL 38/49] hw/arm/mps2-tz: Move device IRQ info to data structures
+[PULL 02/57] hw/acpi: Provide function acpi_ghes_present()
-Move the specification of the IRQ information for the uart, ethernet,
+Allow code elsewhere in the system to check whether the ACPI GHES
-dma and spi devices to the data structures.  (The other devices
+table is present, so it can determine whether it is OK to try to
-handled by the PPCPortInfo structures don't have any interrupt lines
+record an error by calling acpi_ghes_record_errors().
-we need to wire up.)
 (We don't need to migrate the new 'present' field in AcpiGhesState,
 because it is set once at system initialization and doesn't change.)
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210215115138.20465-14-peter.maydell@linaro.org
+Reviewed-by: Dongjiu Geng <gengdongjiu1@gmail.com>
 Message-id: 20210603171259.27962-3-peter.maydell@linaro.org
 ---
- hw/arm/mps2-tz.c | 52 +++++++++++++++++++++++-------------------------
+ include/hw/acpi/ghes.h |  9 +++++++++
-file changed, 25 insertions(+), 27 deletions(-)
+ hw/acpi/ghes-stub.c    |  5 +++++
  hw/acpi/ghes.c         | 17 +++++++++++++++++
 files changed, 31 insertions(+)
-diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
+diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/mps2-tz.c
+--- a/include/hw/acpi/ghes.h
-+++ b/hw/arm/mps2-tz.c
++++ b/include/hw/acpi/ghes.h
-@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_uart(MPS2TZMachineState *mms, void *opaque,
+@@ -XXX,XX +XXX,XX @@ enum {
-                                const char *name, hwaddr size,
-                                const int *irqs)
+ typedef struct AcpiGhesState {
      uint64_t ghes_addr_le;
 +    bool present; /* True if GHES is present at all on this board */
  } AcpiGhesState;
  void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker);
@@ -XXX,XX +XXX,XX @@ void acpi_build_hest(GArray *table_data, BIOSLinker *linker,
  void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
                            GArray *hardware_errors);
  int acpi_ghes_record_errors(uint8_t notify, uint64_t error_physical_addr);
 +
 +/**
 + * acpi_ghes_present: Report whether ACPI GHES table is present
 + *
 + * Returns: true if the system has an ACPI GHES table and it is
 + * safe to call acpi_ghes_record_errors() to record a memory error.
 + */
 +bool acpi_ghes_present(void);
  #endif
 diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/acpi/ghes-stub.c
 +++ b/hw/acpi/ghes-stub.c
@@ -XXX,XX +XXX,XX @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
  {
-+    /* The irq[] array is tx, rx, combined, in that order */
+     return -1;
      MPS2TZMachineClass *mmc = MPS2TZ_MACHINE_GET_CLASS(mms);
      CMSDKAPBUART *uart = opaque;
      int i = uart - &mms->uart[0];
 -    int rxirqno = i * 2 + 32;
 -    int txirqno = i * 2 + 33;
 -    int combirqno = i + 42;
      SysBusDevice *s;
      DeviceState *orgate_dev = DEVICE(&mms->uart_irq_orgate);
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_uart(MPS2TZMachineState *mms, void *opaque,
      qdev_prop_set_uint32(DEVICE(uart), "pclk-frq", mmc->sysclk_frq);
      sysbus_realize(SYS_BUS_DEVICE(uart), &error_fatal);
      s = SYS_BUS_DEVICE(uart);
 -    sysbus_connect_irq(s, 0, get_sse_irq_in(mms, txirqno));
 -    sysbus_connect_irq(s, 1, get_sse_irq_in(mms, rxirqno));
 +    sysbus_connect_irq(s, 0, get_sse_irq_in(mms, irqs[0]));
 +    sysbus_connect_irq(s, 1, get_sse_irq_in(mms, irqs[1]));
      sysbus_connect_irq(s, 2, qdev_get_gpio_in(orgate_dev, i * 2));
      sysbus_connect_irq(s, 3, qdev_get_gpio_in(orgate_dev, i * 2 + 1));
 -    sysbus_connect_irq(s, 4, get_sse_irq_in(mms, combirqno));
 +    sysbus_connect_irq(s, 4, get_sse_irq_in(mms, irqs[2]));
      return sysbus_mmio_get_region(SYS_BUS_DEVICE(uart), 0);
  }
++
-@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_eth_dev(MPS2TZMachineState *mms, void *opaque,
++bool acpi_ghes_present(void)
++{
-     s = SYS_BUS_DEVICE(mms->lan9118);
++    return false;
-     sysbus_realize_and_unref(s, &error_fatal);
++}
--    sysbus_connect_irq(s, 0, get_sse_irq_in(mms, 48));
+diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
-+    sysbus_connect_irq(s, 0, get_sse_irq_in(mms, irqs[0]));
+index XXXXXXX..XXXXXXX 100644
-     return sysbus_mmio_get_region(s, 0);
+--- a/hw/acpi/ghes.c
 +++ b/hw/acpi/ghes.c
@@ -XXX,XX +XXX,XX @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
      /* Create a read-write fw_cfg file for Address */
      fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
          NULL, &(ags->ghes_addr_le), sizeof(ags->ghes_addr_le), false);
 +
 +    ags->present = true;
  }
-@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_dma(MPS2TZMachineState *mms, void *opaque,
+ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
-                               const char *name, hwaddr size,
+@@ -XXX,XX +XXX,XX @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
-                               const int *irqs)
- {
+     return ret;
 +    /* The irq[] array is DMACINTR, DMACINTERR, DMACINTTC, in that order */
      PL080State *dma = opaque;
      int i = dma - &mms->dma[0];
      SysBusDevice *s;
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_dma(MPS2TZMachineState *mms, void *opaque,
      s = SYS_BUS_DEVICE(dma);
      /* Wire up DMACINTR, DMACINTERR, DMACINTTC */
 -    sysbus_connect_irq(s, 0, get_sse_irq_in(mms, 58 + i * 3));
 -    sysbus_connect_irq(s, 1, get_sse_irq_in(mms, 56 + i * 3));
 -    sysbus_connect_irq(s, 2, get_sse_irq_in(mms, 57 + i * 3));
 +    sysbus_connect_irq(s, 0, get_sse_irq_in(mms, irqs[0]));
 +    sysbus_connect_irq(s, 1, get_sse_irq_in(mms, irqs[1]));
 +    sysbus_connect_irq(s, 2, get_sse_irq_in(mms, irqs[2]));
      g_free(mscname);
      return sysbus_mmio_get_region(s, 0);
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_spi(MPS2TZMachineState *mms, void *opaque,
       * lines are set via the "MISC" register in the MPS2 FPGAIO device.
       */
      PL022State *spi = opaque;
 -    int i = spi - &mms->spi[0];
      SysBusDevice *s;
      object_initialize_child(OBJECT(mms), name, spi, TYPE_PL022);
      sysbus_realize(SYS_BUS_DEVICE(spi), &error_fatal);
      s = SYS_BUS_DEVICE(spi);
 -    sysbus_connect_irq(s, 0, get_sse_irq_in(mms, 51 + i));
 +    sysbus_connect_irq(s, 0, get_sse_irq_in(mms, irqs[0]));
      return sysbus_mmio_get_region(s, 0);
  }
++
-@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
++bool acpi_ghes_present(void)
-         }, {
++{
-             .name = "apb_ppcexp1",
++    AcpiGedState *acpi_ged_state;
-             .ports = {
++    AcpiGhesState *ags;
--                { "spi0", make_spi, &mms->spi[0], 0x40205000, 0x1000 },
++
--                { "spi1", make_spi, &mms->spi[1], 0x40206000, 0x1000 },
++    acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
--                { "spi2", make_spi, &mms->spi[2], 0x40209000, 0x1000 },
++                                                       NULL));
--                { "spi3", make_spi, &mms->spi[3], 0x4020a000, 0x1000 },
++
--                { "spi4", make_spi, &mms->spi[4], 0x4020b000, 0x1000 },
++    if (!acpi_ged_state) {
--                { "uart0", make_uart, &mms->uart[0], 0x40200000, 0x1000 },
++        return false;
--                { "uart1", make_uart, &mms->uart[1], 0x40201000, 0x1000 },
++    }
--                { "uart2", make_uart, &mms->uart[2], 0x40202000, 0x1000 },
++    ags = &acpi_ged_state->ghes_state;
--                { "uart3", make_uart, &mms->uart[3], 0x40203000, 0x1000 },
++    return ags->present;
--                { "uart4", make_uart, &mms->uart[4], 0x40204000, 0x1000 },
++}
 +                { "spi0", make_spi, &mms->spi[0], 0x40205000, 0x1000, { 51 } },
 +                { "spi1", make_spi, &mms->spi[1], 0x40206000, 0x1000, { 52 } },
 +                { "spi2", make_spi, &mms->spi[2], 0x40209000, 0x1000, { 53 } },
 +                { "spi3", make_spi, &mms->spi[3], 0x4020a000, 0x1000, { 54 } },
 +                { "spi4", make_spi, &mms->spi[4], 0x4020b000, 0x1000, { 55 } },
 +                { "uart0", make_uart, &mms->uart[0], 0x40200000, 0x1000, { 32, 33, 42 } },
 +                { "uart1", make_uart, &mms->uart[1], 0x40201000, 0x1000, { 34, 35, 43 } },
 +                { "uart2", make_uart, &mms->uart[2], 0x40202000, 0x1000, { 36, 37, 44 } },
 +                { "uart3", make_uart, &mms->uart[3], 0x40203000, 0x1000, { 38, 39, 45 } },
 +                { "uart4", make_uart, &mms->uart[4], 0x40204000, 0x1000, { 40, 41, 46 } },
                  { "i2c0", make_i2c, &mms->i2c[0], 0x40207000, 0x1000 },
                  { "i2c1", make_i2c, &mms->i2c[1], 0x40208000, 0x1000 },
                  { "i2c2", make_i2c, &mms->i2c[2], 0x4020c000, 0x1000 },
@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
                  { "gpio1", make_unimp_dev, &mms->gpio[1], 0x40101000, 0x1000 },
                  { "gpio2", make_unimp_dev, &mms->gpio[2], 0x40102000, 0x1000 },
                  { "gpio3", make_unimp_dev, &mms->gpio[3], 0x40103000, 0x1000 },
 -                { "eth", make_eth_dev, NULL, 0x42000000, 0x100000 },
 +                { "eth", make_eth_dev, NULL, 0x42000000, 0x100000, { 48 } },
              },
          }, {
              .name = "ahb_ppcexp1",
              .ports = {
 -                { "dma0", make_dma, &mms->dma[0], 0x40110000, 0x1000 },
 -                { "dma1", make_dma, &mms->dma[1], 0x40111000, 0x1000 },
 -                { "dma2", make_dma, &mms->dma[2], 0x40112000, 0x1000 },
 -                { "dma3", make_dma, &mms->dma[3], 0x40113000, 0x1000 },
 +                { "dma0", make_dma, &mms->dma[0], 0x40110000, 0x1000, { 58, 56, 57 } },
 +                { "dma1", make_dma, &mms->dma[1], 0x40111000, 0x1000, { 61, 59, 60 } },
 +                { "dma2", make_dma, &mms->dma[2], 0x40112000, 0x1000, { 64, 62, 63 } },
 +                { "dma3", make_dma, &mms->dma[3], 0x40113000, 0x1000, { 67, 65, 66 } },
              },
          },
      };
 --
 .20.1
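The logic of patch 02 can be modelled outside QEMU as shown below. This is a simplified sketch, not the QEMU code. The QOM lookup object_resolve_path_type("", TYPE_ACPI_GED, NULL) is replaced by a hypothetical the_ged pointer. board_create_ged() and ghes_add_fw_cfg() are illustrative stand-ins for board setup and for the final step of acpi_ghes_add_fw_cfg(). The sketch shows both ways the answer can be false: the board has no GED device at all, or it has one but never set up the GHES table.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct AcpiGhesState {
    uint64_t ghes_addr_le;
    bool present; /* True if GHES is present at all on this board */
} AcpiGhesState;

/* Cut-down model of the GED device, which embeds the GHES state. */
typedef struct AcpiGedState {
    AcpiGhesState ghes_state;
} AcpiGedState;

/* Hypothetical stand-in for the QOM object tree: at most one GED device. */
static AcpiGedState *the_ged;

/* Hypothetical: a board that has a GED device registers it here. */
void board_create_ged(AcpiGedState *ged)
{
    the_ged = ged;
}

/* Mirrors the last step of acpi_ghes_add_fw_cfg(): mark GHES as present. */
void ghes_add_fw_cfg(AcpiGhesState *ags)
{
    ags->present = true;
}

bool acpi_ghes_present(void)
{
    /* In QEMU this is object_resolve_path_type("", TYPE_ACPI_GED, NULL). */
    AcpiGedState *ged = the_ged;

    if (!ged) {
        return false; /* no GED device, so no GHES table either */
    }
    return ged->ghes_state.present;
}
```

The present flag is written once during machine setup and only read afterwards. That matches the commit message's reason for not migrating it.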

-New patch
+[PULL 03/57] target/arm: Use acpi_ghes_present() to see if we report ACPI memory errors
+The virt_is_acpi_enabled() function is specific to the virt board, as
+is the check for its 'ras' property.  Use the new acpi_ghes_present()
+function to check whether we should report memory errors via
+acpi_ghes_record_errors().
+This avoids a link error if QEMU was built without support for the
+virt board, and provides a mechanism that can be used by any future
+board models that want to add ACPI memory error reporting support
+(they only need to call acpi_ghes_add_fw_cfg()).
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Dongjiu Geng <gengdongjiu1@gmail.com>
+Message-id: 20210603171259.27962-4-peter.maydell@linaro.org
+---
+ target/arm/kvm64.c | 6 +-----
+file changed, 1 insertion(+), 5 deletions(-)
+diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/kvm64.c
++++ b/target/arm/kvm64.c
+@@ -XXX,XX +XXX,XX @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
+ {
+     ram_addr_t ram_addr;
+     hwaddr paddr;
+-    Object *obj = qdev_get_machine();
+-    VirtMachineState *vms = VIRT_MACHINE(obj);
+-    bool acpi_enabled = virt_is_acpi_enabled(vms);
+     assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
+-    if (acpi_enabled && addr &&
+-            object_property_get_bool(obj, "ras", NULL)) {
++    if (acpi_ghes_present() && addr) {
+         ram_addr = qemu_ram_addr_from_host(addr);
+         if (ram_addr != RAM_ADDR_INVALID &&
+             kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
+--
+.20.1

-[PULL 34/49] hw/arm/mps2-tz: Make number of IRQs board-specific
+[PULL 04/57] docs/system/arm: Document which architecture extensions we emulate
-The AN524 has more interrupt lines than the AN505 and AN521; make
+These days the Arm architecture has a wide range of fine-grained
-numirq board-specific rather than a compile-time constant.
+optional extra architectural features. We implement quite a lot
+of these but by no means all of them. Document what we do implement,
-Since the difference is small (92 on the current boards and 95 on the
+so that users can find out without having to dig through back-issues
-new one) we don't dynamically allocate the cpu_irq_splitter[] array
+of our Changelog on the wiki.
 but leave it as a fixed length array whose size is the maximum needed
 for any of the boards.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
 Message-id: 20210617140328.28622-1-peter.maydell@linaro.org
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210215115138.20465-10-peter.maydell@linaro.org
 ---
- hw/arm/mps2-tz.c | 15 ++++++++++-----
+ docs/system/arm/emulation.rst | 102 ++++++++++++++++++++++++++++++++++
-file changed, 10 insertions(+), 5 deletions(-)
+ docs/system/target-arm.rst    |   6 ++
 files changed, 108 insertions(+)
  create mode 100644 docs/system/arm/emulation.rst
-diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
+diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/docs/system/arm/emulation.rst
@@ -XXX,XX +XXX,XX @@
 +A-profile CPU architecture support
 +==================================
 +
 +QEMU's TCG emulation includes support for the Armv5, Armv6, Armv7 and
 +Armv8 versions of the A-profile architecture. It also has support for
 +the following architecture extensions:
 +
 +- FEAT_AA32BF16 (AArch32 BFloat16 instructions)
 +- FEAT_AA32HPD (AArch32 hierarchical permission disables)
 +- FEAT_AA32I8MM (AArch32 Int8 matrix multiplication instructions)
 +- FEAT_AES (AESD and AESE instructions)
 +- FEAT_BF16 (AArch64 BFloat16 instructions)
 +- FEAT_BTI (Branch Target Identification)
 +- FEAT_DIT (Data Independent Timing instructions)
 +- FEAT_DPB (DC CVAP instruction)
 +- FEAT_DotProd (Advanced SIMD dot product instructions)
 +- FEAT_FCMA (Floating-point complex number instructions)
 +- FEAT_FHM (Floating-point half-precision multiplication instructions)
 +- FEAT_FP16 (Half-precision floating-point data processing)
 +- FEAT_FRINTTS (Floating-point to integer instructions)
 +- FEAT_FlagM (Flag manipulation instructions v2)
 +- FEAT_FlagM2 (Enhancements to flag manipulation instructions)
 +- FEAT_HPDS (Hierarchical permission disables)
 +- FEAT_I8MM (AArch64 Int8 matrix multiplication instructions)
 +- FEAT_JSCVT (JavaScript conversion instructions)
 +- FEAT_LOR (Limited ordering regions)
 +- FEAT_LRCPC (Load-acquire RCpc instructions)
 +- FEAT_LRCPC2 (Load-acquire RCpc instructions v2)
 +- FEAT_LSE (Large System Extensions)
 +- FEAT_MTE (Memory Tagging Extension)
 +- FEAT_MTE2 (Memory Tagging Extension)
 +- FEAT_PAN (Privileged access never)
 +- FEAT_PAN2 (AT S1E1R and AT S1E1W instruction variants affected by PSTATE.PAN)
 +- FEAT_PAuth (Pointer authentication)
 +- FEAT_PMULL (PMULL, PMULL2 instructions)
 +- FEAT_PMUv3p1 (PMU Extensions v3.1)
 +- FEAT_PMUv3p4 (PMU Extensions v3.4)
 +- FEAT_RDM (Advanced SIMD rounding double multiply accumulate instructions)
 +- FEAT_RNG (Random number generator)
 +- FEAT_SB (Speculation Barrier)
 +- FEAT_SEL2 (Secure EL2)
 +- FEAT_SHA1 (SHA1 instructions)
 +- FEAT_SHA256 (SHA256 instructions)
 +- FEAT_SHA3 (Advanced SIMD SHA3 instructions)
 +- FEAT_SHA512 (Advanced SIMD SHA512 instructions)
 +- FEAT_SM3 (Advanced SIMD SM3 instructions)
 +- FEAT_SM4 (Advanced SIMD SM4 instructions)
 +- FEAT_SPECRES (Speculation restriction instructions)
 +- FEAT_SSBS (Speculative Store Bypass Safe)
 +- FEAT_TLBIOS (TLB invalidate instructions in Outer Shareable domain)
 +- FEAT_TLBIRANGE (TLB invalidate range instructions)
 +- FEAT_TTCNP (Translation table Common not private translations)
 +- FEAT_TTST (Small translation tables)
 +- FEAT_UAO (Unprivileged Access Override control)
 +- FEAT_VHE (Virtualization Host Extensions)
 +- FEAT_VMID16 (16-bit VMID)
 +- FEAT_XNX (Translation table stage 2 Unprivileged Execute-never)
 +- SVE (The Scalable Vector Extension)
 +- SVE2 (The Scalable Vector Extension v2)
 +
 +For information on the specifics of these extensions, please refer
 +to the `Armv8-A Arm Architecture Reference Manual
 +<https://developer.arm.com/documentation/ddi0487/latest>`_.
 +
 +When a specific named CPU is being emulated, only those features which
 +are present in hardware for that CPU are emulated. (If a feature is
 +not in the list above then it is not supported, even if the real
 +hardware should have it.) The ``max`` CPU enables all features.
 +
 +R-profile CPU architecture support
 +==================================
 +
 +QEMU's TCG emulation support for R-profile CPUs is currently limited.
 +We emulate only the Cortex-R5 and Cortex-R5F CPUs.
 +
 +M-profile CPU architecture support
 +==================================
 +
 +QEMU's TCG emulation includes support for Armv6-M, Armv7-M, Armv8-M, and
 +Armv8.1-M versions of the M-profile architecture.  It also has support
 +for the following architecture extensions:
 +
 +- FP (Floating-point Extension)
 +- FPCXT (FPCXT access instructions)
 +- HP (Half-precision floating-point instructions)
 +- LOB (Low Overhead loops and Branch future)
 +- M (Main Extension)
 +- MPU (Memory Protection Unit Extension)
 +- PXN (Privileged Execute Never)
 +- RAS (Reliability, Serviceability and Availability): "minimum RAS Extension" only
 +- S (Security Extension)
 +- ST (System Timer Extension)
 +
 +For information on the specifics of these extensions, please refer
 +to the `Armv8-M Arm Architecture Reference Manual
 +<https://developer.arm.com/documentation/ddi0553/latest>`_.
 +
 +When a specific named CPU is being emulated, only those features which
 +are present in hardware for that CPU are emulated. (If a feature is
 +not in the list above then it is not supported, even if the real
 +hardware should have it.) There is no equivalent of the ``max`` CPU for
 +M-profile.
 diff --git a/docs/system/target-arm.rst b/docs/system/target-arm.rst
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/mps2-tz.c
+--- a/docs/system/target-arm.rst
-+++ b/hw/arm/mps2-tz.c
++++ b/docs/system/target-arm.rst
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ undocumented; you can get a complete list by running
- #include "hw/qdev-clock.h"
+    arm/virt
- #include "qom/object.h"
+    arm/xlnx-versal-virt
--#define MPS2TZ_NUMIRQ 92
++Emulated CPU architecture support
-+#define MPS2TZ_NUMIRQ_MAX 92
++=================================
++
- typedef enum MPS2TZFPGAType {
++.. toctree::
-     FPGA_AN505,
++   arm/emulation
-@@ -XXX,XX +XXX,XX @@ struct MPS2TZMachineClass {
++
-     const uint32_t *oscclk;
+ Arm CPU features
-     uint32_t fpgaio_num_leds; /* Number of LEDs in FPGAIO LED0 register */
+ ================
      bool fpgaio_has_switches; /* Does FPGAIO have SWITCH register? */
 +    int numirq; /* Number of external interrupts */
      const char *armsse_type;
  };
@@ -XXX,XX +XXX,XX @@ struct MPS2TZMachineState {
      SplitIRQ sec_resp_splitter;
      qemu_or_irq uart_irq_orgate;
      DeviceState *lan9118;
 -    SplitIRQ cpu_irq_splitter[MPS2TZ_NUMIRQ];
 +    SplitIRQ cpu_irq_splitter[MPS2TZ_NUMIRQ_MAX];
      Clock *sysclk;
      Clock *s32kclk;
  };
@@ -XXX,XX +XXX,XX @@ static qemu_irq get_sse_irq_in(MPS2TZMachineState *mms, int irqno)
  {
      /* Return a qemu_irq which will signal IRQ n to all CPUs in the SSE. */
      MachineClass *mc = MACHINE_GET_CLASS(mms);
 +    MPS2TZMachineClass *mmc = MPS2TZ_MACHINE_GET_CLASS(mms);
 -    assert(irqno < MPS2TZ_NUMIRQ);
 +    assert(irqno < mmc->numirq);
      if (mc->max_cpus > 1) {
          return qdev_get_gpio_in(DEVICE(&mms->cpu_irq_splitter[irqno]), 0);
@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
      iotkitdev = DEVICE(&mms->iotkit);
      object_property_set_link(OBJECT(&mms->iotkit), "memory",
                               OBJECT(system_memory), &error_abort);
 -    qdev_prop_set_uint32(iotkitdev, "EXP_NUMIRQ", MPS2TZ_NUMIRQ);
 +    qdev_prop_set_uint32(iotkitdev, "EXP_NUMIRQ", mmc->numirq);
      qdev_connect_clock_in(iotkitdev, "MAINCLK", mms->sysclk);
      qdev_connect_clock_in(iotkitdev, "S32KCLK", mms->s32kclk);
      sysbus_realize(SYS_BUS_DEVICE(&mms->iotkit), &error_fatal);
@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
       * board. If there is only one CPU, we can just wire the device IRQ
       * directly to the SSE's IRQ input.
       */
 +    assert(mmc->numirq <= MPS2TZ_NUMIRQ_MAX);
      if (mc->max_cpus > 1) {
 -        for (i = 0; i < MPS2TZ_NUMIRQ; i++) {
 +        for (i = 0; i < mmc->numirq; i++) {
              char *name = g_strdup_printf("mps2-irq-splitter%d", i);
              SplitIRQ *splitter = &mms->cpu_irq_splitter[i];
@@ -XXX,XX +XXX,XX @@ static void mps2tz_an505_class_init(ObjectClass *oc, void *data)
      mmc->len_oscclk = ARRAY_SIZE(an505_oscclk);
      mmc->fpgaio_num_leds = 2;
      mmc->fpgaio_has_switches = false;
 +    mmc->numirq = 92;
      mmc->armsse_type = TYPE_IOTKIT;
  }
@@ -XXX,XX +XXX,XX @@ static void mps2tz_an521_class_init(ObjectClass *oc, void *data)
      mmc->len_oscclk = ARRAY_SIZE(an505_oscclk);
      mmc->fpgaio_num_leds = 2;
      mmc->fpgaio_has_switches = false;
 +    mmc->numirq = 92;
      mmc->armsse_type = TYPE_SSE200;
  }
 --
 .20.1

-[PULL 15/49] target/arm/cpu: Update coding style to make checkpatch.pl happy
+[PULL 05/57] target/arm/translate-vfp.c: Whitespace fixes
-From: Philippe Mathieu-Daudé <f4bug@amsat.org>
+In the code for handling VFP system register accesses there is some
 stray whitespace after a unary '-' operator, and also some incorrect
 indent in a couple of function prototypes.  We're about to move this
 code to another file, so fix the code style issues first so
 checkpatch doesn't complain about the code-movement patch.
-We will move this code in the next commit. Clean it up
+Cc: qemu-stable@nongnu.org
-first to avoid checkpatch.pl errors.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20210618141019.10671-2-peter.maydell@linaro.org
 ---
  target/arm/translate-vfp.c | 11 +++++------
 file changed, 5 insertions(+), 6 deletions(-)
-Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
 Message-id: 20210221222617.2579610-3-f4bug@amsat.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  target/arm/cpu.c | 12 ++++++++----
 file changed, 8 insertions(+), 4 deletions(-)
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.c
+--- a/target/arm/translate-vfp.c
-+++ b/target/arm/cpu.c
++++ b/target/arm/translate-vfp.c
-@@ -XXX,XX +XXX,XX @@ static void cortex_a8_initfn(Object *obj)
+@@ -XXX,XX +XXX,XX @@ static void gen_branch_fpInactive(DisasContext *s, TCGCond cond,
  }
- static const ARMCPRegInfo cortexa9_cp_reginfo[] = {
+ static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
--    /* power_control should be set to maximum latency. Again,
+-
-+    /*
+                                   fp_sysreg_loadfn *loadfn,
-+     * power_control should be set to maximum latency. Again,
+-                                 void *opaque)
-      * default to 0 and set by private hook
++                                  void *opaque)
       */
      { .name = "A9_PWRCTL", .cp = 15, .crn = 15, .crm = 0, .opc1 = 0, .opc2 = 0,
@@ -XXX,XX +XXX,XX @@ static void cortex_a9_initfn(Object *obj)
      set_feature(&cpu->env, ARM_FEATURE_NEON);
      set_feature(&cpu->env, ARM_FEATURE_THUMB2EE);
      set_feature(&cpu->env, ARM_FEATURE_EL3);
 -    /* Note that A9 supports the MP extensions even for
 +    /*
 +     * Note that A9 supports the MP extensions even for
       * A9UP and single-core A9MP (which are both different
       * and valid configurations; we don't model A9UP).
       */
@@ -XXX,XX +XXX,XX @@ static uint64_t a15_l2ctlr_read(CPUARMState *env, const ARMCPRegInfo *ri)
  {
-     MachineState *ms = MACHINE(qdev_get_machine());
+     /* Do a write to an M-profile floating point system register */
+     TCGv_i32 tmp;
--    /* Linux wants the number of processors from here.
+@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
-+    /*
+ }
-+     * Linux wants the number of processors from here.
-      * Might as well set the interrupt-controller bit too.
+ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
-      */
+-                                fp_sysreg_storefn *storefn,
-     return ((ms->smp.cpus - 1) << 24) | (1 << 23);
+-                                void *opaque)
-@@ -XXX,XX +XXX,XX @@ static void cortex_a7_initfn(Object *obj)
++                                 fp_sysreg_storefn *storefn,
-     cpu->isar.id_mmfr1 = 0x40000000;
++                                 void *opaque)
-     cpu->isar.id_mmfr2 = 0x01240000;
+ {
-     cpu->isar.id_mmfr3 = 0x02102211;
+     /* Do a read from an M-profile floating point system register */
--    /* a7_mpcore_r0p5_trm, page 4-4 gives 0x01101110; but
+     TCGv_i32 tmp;
-+    /*
+@@ -XXX,XX +XXX,XX @@ static void fp_sysreg_to_memory(DisasContext *s, void *opaque, TCGv_i32 value)
-+     * a7_mpcore_r0p5_trm, page 4-4 gives 0x01101110; but
+     TCGv_i32 addr;
-      * table 4-41 gives 0x02101110, which includes the arm div insns.
-      */
+     if (!a->a) {
-     cpu->isar.id_isar0 = 0x02101110;
+-        offset = - offset;
 +        offset = -offset;
      }
      addr = load_reg(s, a->rn);
@@ -XXX,XX +XXX,XX @@ static TCGv_i32 memory_to_fp_sysreg(DisasContext *s, void *opaque)
      TCGv_i32 value = tcg_temp_new_i32();
      if (!a->a) {
 -        offset = - offset;
 +        offset = -offset;
      }
      addr = load_reg(s, a->rn);
 --
 .20.1

-New patch
+[PULL 06/57] target/arm: Handle FPU being disabled in FPCXT_NS accesses
+If the guest makes an FPCXT_NS access when the FPU is disabled,
+one of two things happens:
+ * if there is no active FP context, then the insn behaves the
+   same way as if the FPU was enabled: writes ignored, reads
+   same value as FPDSCR_NS
+ * if there is an active FP context, then we take a NOCP
+   exception
+Add code to the sysreg read/write functions which emits
+code to take the NOCP exception in the latter case.
+At the moment this will never be used, because the NOCP checks in
+m-nocp.decode happen first, and so the trans functions are never
+called when the FPU is disabled.  The code will be needed when we
+move the sysreg access insns to before the NOCP patterns in the
+following commit.
+Cc: qemu-stable@nongnu.org
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210618141019.10671-3-peter.maydell@linaro.org
+---
+ target/arm/translate-vfp.c | 32 ++++++++++++++++++++++++++++++--
+file changed, 30 insertions(+), 2 deletions(-)
+diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-vfp.c
++++ b/target/arm/translate-vfp.c
+@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
+         lab_end = gen_new_label();
+         /* fpInactive case: write is a NOP, so branch to end */
+         gen_branch_fpInactive(s, TCG_COND_NE, lab_end);
+-        /* !fpInactive: PreserveFPState(), and reads same as FPCXT_S */
++        /*
++         * !fpInactive: if FPU disabled, take NOCP exception;
++         * otherwise PreserveFPState(), and then FPCXT_NS writes
++         * behave the same as FPCXT_S writes.
++         */
++        if (s->fp_excp_el) {
++            gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
++                               syn_uncategorized(), s->fp_excp_el);
++            /*
++             * This was only a conditional exception, so override
++             * gen_exception_insn()'s default to DISAS_NORETURN
++             */
++            s->base.is_jmp = DISAS_NEXT;
++            break;
++        }
+         gen_preserve_fp_state(s);
+         /* fall through */
+     case ARM_VFP_FPCXT_S:
+@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
+         tcg_gen_br(lab_end);
+         gen_set_label(lab_active);
+-        /* !fpInactive: Reads the same as FPCXT_S, but side effects differ */
++        /*
++         * !fpInactive: if FPU disabled, take NOCP exception;
++         * otherwise PreserveFPState(), and then FPCXT_NS
++         * reads the same as FPCXT_S.
++         */
++        if (s->fp_excp_el) {
++            gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
++                               syn_uncategorized(), s->fp_excp_el);
++            /*
++             * This was only a conditional exception, so override
++             * gen_exception_insn()'s default to DISAS_NORETURN
++             */
++            s->base.is_jmp = DISAS_NEXT;
++            break;
++        }
+         gen_preserve_fp_state(s);
+         tmp = tcg_temp_new_i32();
+         sfpa = tcg_temp_new_i32();
+--
+.20.1
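The two cases in the PULL 06/57 commit message reduce to a small decision table. The C sketch below models that table as a plain function of two booleans. The enum and function names are hypothetical and are not QEMU APIs. In the actual patch the decision is split between translate time and run time: `s->fp_excp_el` (FPU disabled) is known when the code is generated, while the "FP context active" test is emitted as a runtime branch via `gen_branch_fpInactive()`.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the FPCXT_NS access rules described above.
 * Names are illustrative only; QEMU emits TCG code rather than
 * calling a function like this.
 */
typedef enum {
    FPCXT_NS_INACTIVE,   /* no active FP context: write ignored, read gives FPDSCR_NS */
    FPCXT_NS_NOCP,       /* active FP context but FPU disabled: take NOCP exception */
    FPCXT_NS_AS_FPCXT_S, /* PreserveFPState(), then behave as FPCXT_S */
} FpcxtNsOutcome;

FpcxtNsOutcome fpcxt_ns_access(bool fp_context_active, bool fpu_disabled)
{
    if (!fp_context_active) {
        /* Behaves as if the FPU were enabled, whatever its actual state. */
        return FPCXT_NS_INACTIVE;
    }
    if (fpu_disabled) {
        /* Only this combination faults. */
        return FPCXT_NS_NOCP;
    }
    return FPCXT_NS_AS_FPCXT_S;
}
```

For example, `fpcxt_ns_access(false, true)` yields `FPCXT_NS_INACTIVE`: with no active FP state the access must not fault even though the FPU is disabled, which is the behaviour the following patch restores by moving the sysreg decode ahead of the NOCP patterns.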

-[PULL 06/49] hw/net: Add npcm7xx emc model
+[PULL 07/57] target/arm: Don't NOCP fault for FPCXT_NS accesses
-From: Doug Evans <dje@google.com>
+The M-profile architecture requires that accesses to FPCXT_NS when
 there is no active FP state must not take a NOCP fault even if the
 FPU is disabled. We were not implementing this correctly, because
 in our decode we catch the NOCP faults early in m-nocp.decode.
-This is a 10/100 ethernet device that has several features.
+Fix this bug by moving all the handling of M-profile FP system
-Only the ones needed by the Linux driver have been implemented.
+register accesses from vfp.decode into m-nocp.decode and putting
-See npcm7xx_emc.c for a list of unimplemented features.
+it above the NOCP blocks. This provides the correct behaviour:
  * for accesses other than FPCXT_NS the trans functions call
    vfp_access_check(), which will check for FPU disabled and
    raise a NOCP exception if necessary
  * for FPCXT_NS we have the special case code that doesn't
    call vfp_access_check()
  * when these trans functions want to raise an UNDEF they return
    false, so the decoder will fall through into the NOCP blocks.
    This means that NOCP correctly takes precedence over UNDEF
    for these insns. (This is a difference from the other insns
    handled by m-nocp.decode, where UNDEF takes precedence and
    which we implement by having those trans functions call
    unallocated_encoding() in the appropriate places.)
-Reviewed-by: Hao Wu <wuhaotsh@google.com>
+[Note for backport to stable: this commit has a semantic dependency
-Reviewed-by: Avi Fishman <avi.fishman@nuvoton.com>
+on commit 9a486856e9173af, which was not marked as cc-stable because
-Signed-off-by: Doug Evans <dje@google.com>
+we didn't know we'd need it for a for-stable bugfix.]
-Message-id: 20210218212453.831406-2-dje@google.com
 Cc: qemu-stable@nongnu.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210618141019.10671-4-peter.maydell@linaro.org
 ---
- include/hw/net/npcm7xx_emc.h | 286 ++++++++++++
+ target/arm/translate-a32.h    |   1 +
- hw/net/npcm7xx_emc.c         | 857 +++++++++++++++++++++++++++++++++++
+ target/arm/m-nocp.decode      |  24 ++
- hw/net/meson.build           |   1 +
+ target/arm/vfp.decode         |  14 -
- hw/net/trace-events          |  17 +
+ target/arm/translate-m-nocp.c | 514 +++++++++++++++++++++++++++++++++
-files changed, 1161 insertions(+)
+ target/arm/translate-vfp.c    | 517 +---------------------------------
- create mode 100644 include/hw/net/npcm7xx_emc.h
+files changed, 542 insertions(+), 528 deletions(-)
  create mode 100644 hw/net/npcm7xx_emc.c
-diff --git a/include/hw/net/npcm7xx_emc.h b/include/hw/net/npcm7xx_emc.h
+diff --git a/target/arm/translate-a32.h b/target/arm/translate-a32.h
-new file mode 100644
+index XXXXXXX..XXXXXXX 100644
-index XXXXXXX..XXXXXXX
+--- a/target/arm/translate-a32.h
---- /dev/null
++++ b/target/arm/translate-a32.h
-+++ b/include/hw/net/npcm7xx_emc.h
+@@ -XXX,XX +XXX,XX @@ bool disas_neon_shared(DisasContext *s, uint32_t insn);
  void load_reg_var(DisasContext *s, TCGv_i32 var, int reg);
  void arm_gen_condlabel(DisasContext *s);
  bool vfp_access_check(DisasContext *s);
 +void gen_preserve_fp_state(DisasContext *s);
  void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp memop);
  void read_neon_element64(TCGv_i64 dest, int reg, int ele, MemOp memop);
  void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop);
 diff --git a/target/arm/m-nocp.decode b/target/arm/m-nocp.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/m-nocp.decode
 +++ b/target/arm/m-nocp.decode
 @@ -XXX,XX +XXX,XX @@
+ &nocp cp
++# M-profile VLDR/VSTR to sysreg
++%vldr_sysreg 22:1 13:3
++%imm7_0x4 0:7 !function=times_4
++
++&vldr_sysreg rn reg imm a w p
++@vldr_sysreg .... ... . a:1 . . . rn:4 ... . ... .. ....... \
++             reg=%vldr_sysreg imm=%imm7_0x4 &vldr_sysreg
++
+ {
+   # Special cases which do not take an early NOCP: VLLDM and VLSTM
+   VLLDM_VLSTM  1110 1100 001 l:1 rn:4 0000 1010 op:1 000 0000
+@@ -XXX,XX +XXX,XX @@
+   VSCCLRM      1110 1100 1.01 1111 .... 1011 imm:7 0   vd=%vd_dp size=3
+   VSCCLRM      1110 1100 1.01 1111 .... 1010 imm:8     vd=%vd_sp size=2
++  # FP system register accesses: these are a special case because accesses
++  # to FPCXT_NS succeed even if the FPU is disabled. We therefore need
++  # to handle them before the big NOCP blocks. Note that within these
++  # insns NOCP still has higher priority than UNDEFs; this is implemented
++  # by their returning 'false' for UNDEF so as to fall through into the
++  # NOCP check (in contrast to VLLDM etc, which call unallocated_encoding()
++  # for the UNDEFs there that must take precedence over NOCP.)
++
++  VMSR_VMRS    ---- 1110 111 l:1 reg:4 rt:4 1010 0001 0000
++
++  # P=0 W=0 is SEE "Related encodings", so split into two patterns
++  VLDR_sysreg  ---- 110 1 . . w:1 1 .... ... 0 111 11 ....... @vldr_sysreg p=1
++  VLDR_sysreg  ---- 110 0 . . 1   1 .... ... 0 111 11 ....... @vldr_sysreg p=0 w=1
++  VSTR_sysreg  ---- 110 1 . . w:1 0 .... ... 0 111 11 ....... @vldr_sysreg p=1
++  VSTR_sysreg  ---- 110 0 . . 1   0 .... ... 0 111 11 ....... @vldr_sysreg p=0 w=1
++
+   NOCP         111- 1110 ---- ---- ---- cp:4 ---- ---- &nocp
+   NOCP         111- 110- ---- ---- ---- cp:4 ---- ---- &nocp
+   # From v8.1M onwards this range will also NOCP:
+diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/vfp.decode
++++ b/target/arm/vfp.decode
+@@ -XXX,XX +XXX,XX @@ VLDR_VSTR_hp ---- 1101 u:1 .0 l:1 rn:4 .... 1001 imm:8      vd=%vd_sp
+ VLDR_VSTR_sp ---- 1101 u:1 .0 l:1 rn:4 .... 1010 imm:8      vd=%vd_sp
+ VLDR_VSTR_dp ---- 1101 u:1 .0 l:1 rn:4 .... 1011 imm:8      vd=%vd_dp
+-# M-profile VLDR/VSTR to sysreg
+-%vldr_sysreg 22:1 13:3
+-%imm7_0x4 0:7 !function=times_4
+-
+-&vldr_sysreg rn reg imm a w p
+-@vldr_sysreg .... ... . a:1 . . . rn:4 ... . ... .. ....... \
+-             reg=%vldr_sysreg imm=%imm7_0x4 &vldr_sysreg
+-
+-# P=0 W=0 is SEE "Related encodings", so split into two patterns
+-VLDR_sysreg  ---- 110 1 . . w:1 1 .... ... 0 111 11 ....... @vldr_sysreg p=1
+-VLDR_sysreg  ---- 110 0 . . 1   1 .... ... 0 111 11 ....... @vldr_sysreg p=0 w=1
+-VSTR_sysreg  ---- 110 1 . . w:1 0 .... ... 0 111 11 ....... @vldr_sysreg p=1
+-VSTR_sysreg  ---- 110 0 . . 1   0 .... ... 0 111 11 ....... @vldr_sysreg p=0 w=1
+-
+ # We split the load/store multiple up into two patterns to avoid
+ # overlap with other insns in the "Advanced SIMD load/store and 64-bit move"
+ # grouping:
+diff --git a/target/arm/translate-m-nocp.c b/target/arm/translate-m-nocp.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-m-nocp.c
++++ b/target/arm/translate-m-nocp.c
+@@ -XXX,XX +XXX,XX @@
+ #include "qemu/osdep.h"
+ #include "tcg/tcg-op.h"
++#include "tcg/tcg-op-gvec.h"
+ #include "translate.h"
+ #include "translate-a32.h"
+@@ -XXX,XX +XXX,XX @@ static bool trans_VSCCLRM(DisasContext *s, arg_VSCCLRM *a)
+     return true;
+ }
 +/*
-+ * Nuvoton NPCM7xx EMC Module
++ * M-profile provides two different sets of instructions that can
-+ *
++ * access floating point system registers: VMSR/VMRS (which move
-+ * Copyright 2020 Google LLC
++ * to/from a general purpose register) and VLDR/VSTR sysreg (which
-+ *
++ * move directly to/from memory). In some cases there are also side
-+ * This program is free software; you can redistribute it and/or modify it
++ * effects which must happen after any write to memory (which could
-+ * under the terms of the GNU General Public License as published by the
++ * cause an exception). So we implement the common logic for the
-+ * Free Software Foundation; either version 2 of the License, or
++ * sysreg access in gen_M_fp_sysreg_write() and gen_M_fp_sysreg_read(),
-+ * (at your option) any later version.
++ * which take pointers to callback functions which will perform the
-+ *
++ * actual "read/write general purpose register" and "read/write
-+ * This program is distributed in the hope that it will be useful, but WITHOUT
++ * memory" operations.
 + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
 + * for more details.
 + */
 +
-+#ifndef NPCM7XX_EMC_H
++/*
-+#define NPCM7XX_EMC_H
++ * Emit code to store the sysreg to its final destination; frees the
-+
++ * TCG temp 'value' it is passed.
-+#include "hw/irq.h"
++ */
-+#include "hw/sysbus.h"
++typedef void fp_sysreg_storefn(DisasContext *s, void *opaque, TCGv_i32 value);
-+#include "net/net.h"
++/*
-+
++ * Emit code to load the value to be copied to the sysreg; returns
-+/* 32-bit register indices. */
++ * a new TCG temporary
-+enum NPCM7xxPWMRegister {
++ */
-+    /* Control registers. */
++typedef TCGv_i32 fp_sysreg_loadfn(DisasContext *s, void *opaque);
-+    REG_CAMCMR,
++
-+    REG_CAMEN,
++/* Common decode/access checks for fp sysreg read/write */
-+
++typedef enum FPSysRegCheckResult {
-+    /* There are 16 CAMn[ML] registers. */
++    FPSysRegCheckFailed, /* caller should return false */
-+    REG_CAMM_BASE,
++    FPSysRegCheckDone, /* caller should return true */
-+    REG_CAML_BASE,
++    FPSysRegCheckContinue, /* caller should continue generating code */
-+    REG_CAMML_LAST = 0x21,
++} FPSysRegCheckResult;
 +
-+    REG_TXDLSA = 0x22,
++static FPSysRegCheckResult fp_sysreg_checks(DisasContext *s, int regno)
-+    REG_RXDLSA,
++{
-+    REG_MCMDR,
++    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
-+    REG_MIID,
++        return FPSysRegCheckFailed;
-+    REG_MIIDA,
++    }
-+    REG_FFTCR,
++
-+    REG_TSDR,
++    switch (regno) {
-+    REG_RSDR,
++    case ARM_VFP_FPSCR:
-+    REG_DMARFC,
++    case QEMU_VFP_FPSCR_NZCV:
-+    REG_MIEN,
++        break;
-+
++    case ARM_VFP_FPSCR_NZCVQC:
-+    /* Status registers. */
++        if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
-+    REG_MISTA,
++            return FPSysRegCheckFailed;
-+    REG_MGSTA,
++        }
-+    REG_MPCNT,
++        break;
-+    REG_MRPC,
++    case ARM_VFP_FPCXT_S:
-+    REG_MRPCC,
++    case ARM_VFP_FPCXT_NS:
-+    REG_MREPC,
++        if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
-+    REG_DMARFS,
++            return FPSysRegCheckFailed;
-+    REG_CTXDSA,
++        }
-+    REG_CTXBSA,
++        if (!s->v8m_secure) {
-+    REG_CRXDSA,
++            return FPSysRegCheckFailed;
-+    REG_CRXBSA,
++        }
-+
++        break;
-+    NPCM7XX_NUM_EMC_REGS,
++    case ARM_VFP_VPR:
-+};
++    case ARM_VFP_P0:
-+
++        if (!dc_isar_feature(aa32_mve, s)) {
-+/* REG_CAMCMR fields */
++            return FPSysRegCheckFailed;
-+/* Enable CAM Compare */
++        }
-+#define REG_CAMCMR_ECMP (1 << 4)
++        break;
-+/* Complement CAM Compare */
++    default:
-+#define REG_CAMCMR_CCAM (1 << 3)
++        return FPSysRegCheckFailed;
-+/* Accept Broadcast Packet */
++    }
 +#define REG_CAMCMR_ABP (1 << 2)
 +/* Accept Multicast Packet */
 +#define REG_CAMCMR_AMP (1 << 1)
 +/* Accept Unicast Packet */
 +#define REG_CAMCMR_AUP (1 << 0)
 +
 +/* REG_MCMDR fields */
 +/* Software Reset */
 +#define REG_MCMDR_SWR (1 << 24)
 +/* Internal Loopback Select */
 +#define REG_MCMDR_LBK (1 << 21)
 +/* Operation Mode Select */
 +#define REG_MCMDR_OPMOD (1 << 20)
 +/* Enable MDC Clock Generation */
 +#define REG_MCMDR_ENMDC (1 << 19)
 +/* Full-Duplex Mode Select */
 +#define REG_MCMDR_FDUP (1 << 18)
 +/* Enable SQE Checking */
 +#define REG_MCMDR_ENSEQ (1 << 17)
 +/* Send PAUSE Frame */
 +#define REG_MCMDR_SDPZ (1 << 16)
 +/* No Defer */
 +#define REG_MCMDR_NDEF (1 << 9)
 +/* Frame Transmission On */
 +#define REG_MCMDR_TXON (1 << 8)
 +/* Strip CRC Checksum */
 +#define REG_MCMDR_SPCRC (1 << 5)
 +/* Accept CRC Error Packet */
 +#define REG_MCMDR_AEP (1 << 4)
 +/* Accept Control Packet */
 +#define REG_MCMDR_ACP (1 << 3)
 +/* Accept Runt Packet */
 +#define REG_MCMDR_ARP (1 << 2)
 +/* Accept Long Packet */
 +#define REG_MCMDR_ALP (1 << 1)
 +/* Frame Reception On */
 +#define REG_MCMDR_RXON (1 << 0)
 +
 +/* REG_MIEN fields */
 +/* Enable Transmit Descriptor Unavailable Interrupt */
 +#define REG_MIEN_ENTDU (1 << 23)
 +/* Enable Transmit Completion Interrupt */
 +#define REG_MIEN_ENTXCP (1 << 18)
 +/* Enable Transmit Interrupt */
 +#define REG_MIEN_ENTXINTR (1 << 16)
 +/* Enable Receive Descriptor Unavailable Interrupt */
 +#define REG_MIEN_ENRDU (1 << 10)
 +/* Enable Receive Good Interrupt */
 +#define REG_MIEN_ENRXGD (1 << 4)
 +/* Enable Receive Interrupt */
 +#define REG_MIEN_ENRXINTR (1 << 0)
 +
 +/* REG_MISTA fields */
 +/* TODO: Add error fields and support simulated errors? */
 +/* Transmit Bus Error Interrupt */
 +#define REG_MISTA_TXBERR (1 << 24)
 +/* Transmit Descriptor Unavailable Interrupt */
 +#define REG_MISTA_TDU (1 << 23)
 +/* Transmit Completion Interrupt */
 +#define REG_MISTA_TXCP (1 << 18)
 +/* Transmit Interrupt */
 +#define REG_MISTA_TXINTR (1 << 16)
 +/* Receive Bus Error Interrupt */
 +#define REG_MISTA_RXBERR (1 << 11)
 +/* Receive Descriptor Unavailable Interrupt */
 +#define REG_MISTA_RDU (1 << 10)
 +/* DMA Early Notification Interrupt */
 +#define REG_MISTA_DENI (1 << 9)
 +/* Maximum Frame Length Interrupt */
 +#define REG_MISTA_DFOI (1 << 8)
 +/* Receive Good Interrupt */
 +#define REG_MISTA_RXGD (1 << 4)
 +/* Packet Too Long Interrupt */
 +#define REG_MISTA_PTLE (1 << 3)
 +/* Receive Interrupt */
 +#define REG_MISTA_RXINTR (1 << 0)
 +
 +/* REG_MGSTA fields */
 +/* Transmission Halted */
 +#define REG_MGSTA_TXHA (1 << 11)
 +/* Receive Halted */
 +#define REG_MGSTA_RXHA (1 << 11)
 +
 +/* REG_DMARFC fields */
 +/* Maximum Receive Frame Length */
 +#define REG_DMARFC_RXMS(word) extract32((word), 0, 16)
 +
 +/* REG MIIDA fields */
 +/* Busy Bit */
 +#define REG_MIIDA_BUSY (1 << 17)
 +
 +/* Transmit and receive descriptors */
 +typedef struct NPCM7xxEMCTxDesc NPCM7xxEMCTxDesc;
 +typedef struct NPCM7xxEMCRxDesc NPCM7xxEMCRxDesc;
 +
 +struct NPCM7xxEMCTxDesc {
 +    uint32_t flags;
 +    uint32_t txbsa;
 +    uint32_t status_and_length;
 +    uint32_t ntxdsa;
 +};
 +
 +struct NPCM7xxEMCRxDesc {
 +    uint32_t status_and_length;
 +    uint32_t rxbsa;
 +    uint32_t reserved;
 +    uint32_t nrxdsa;
 +};
 +
 +/* NPCM7xxEMCTxDesc.flags values */
 +/* Owner: 0 = cpu, 1 = emc */
 +#define TX_DESC_FLAG_OWNER_MASK (1 << 31)
 +/* Transmit interrupt enable */
 +#define TX_DESC_FLAG_INTEN (1 << 2)
 +/* CRC append */
 +#define TX_DESC_FLAG_CRCAPP (1 << 1)
 +/* Padding enable */
 +#define TX_DESC_FLAG_PADEN (1 << 0)
 +
 +/* NPCM7xxEMCTxDesc.status_and_length values */
 +/* Collision count */
 +#define TX_DESC_STATUS_CCNT_SHIFT 28
 +#define TX_DESC_STATUS_CCNT_BITSIZE 4
 +/* SQE error */
 +#define TX_DESC_STATUS_SQE (1 << 26)
 +/* Transmission paused */
 +#define TX_DESC_STATUS_PAU (1 << 25)
 +/* P transmission halted */
 +#define TX_DESC_STATUS_TXHA (1 << 24)
 +/* Late collision */
 +#define TX_DESC_STATUS_LC (1 << 23)
 +/* Transmission abort */
 +#define TX_DESC_STATUS_TXABT (1 << 22)
 +/* No carrier sense */
 +#define TX_DESC_STATUS_NCS (1 << 21)
 +/* Defer exceed */
 +#define TX_DESC_STATUS_EXDEF (1 << 20)
 +/* Transmission complete */
 +#define TX_DESC_STATUS_TXCP (1 << 19)
 +/* Transmission deferred */
 +#define TX_DESC_STATUS_DEF (1 << 17)
 +/* Transmit interrupt */
 +#define TX_DESC_STATUS_TXINTR (1 << 16)
 +
 +#define TX_DESC_PKT_LEN(word) extract32((word), 0, 16)
 +
 +/* Transmit buffer start address */
 +#define TX_DESC_TXBSA(word) ((uint32_t) (word) & ~3u)
 +
 +/* Next transmit descriptor start address */
 +#define TX_DESC_NTXDSA(word) ((uint32_t) (word) & ~3u)
 +
 +/* NPCM7xxEMCRxDesc.status_and_length values */
 +/* Owner: 0b00 = cpu, 0b01 = undefined, 0b10 = emc, 0b11 = undefined */
 +#define RX_DESC_STATUS_OWNER_SHIFT 30
 +#define RX_DESC_STATUS_OWNER_BITSIZE 2
 +#define RX_DESC_STATUS_OWNER_MASK (3 << RX_DESC_STATUS_OWNER_SHIFT)
 +/* Runt packet */
 +#define RX_DESC_STATUS_RP (1 << 22)
 +/* Alignment error */
 +#define RX_DESC_STATUS_ALIE (1 << 21)
 +/* Frame reception complete */
 +#define RX_DESC_STATUS_RXGD (1 << 20)
 +/* Packet too long */
 +#define RX_DESC_STATUS_PTLE (1 << 19)
 +/* CRC error */
 +#define RX_DESC_STATUS_CRCE (1 << 17)
 +/* Receive interrupt */
 +#define RX_DESC_STATUS_RXINTR (1 << 16)
 +
 +#define RX_DESC_PKT_LEN(word) extract32((word), 0, 16)
 +
 +/* Receive buffer start address */
 +#define RX_DESC_RXBSA(word) ((uint32_t) (word) & ~3u)
 +
 +/* Next receive descriptor start address */
 +#define RX_DESC_NRXDSA(word) ((uint32_t) (word) & ~3u)
 +
 +/* Minimum packet length, when TX_DESC_FLAG_PADEN is set. */
 +#define MIN_PACKET_LENGTH 64
 +
 +struct NPCM7xxEMCState {
 +    /*< private >*/
 +    SysBusDevice parent;
 +    /*< public >*/
 +
 +    MemoryRegion iomem;
 +
 +    qemu_irq tx_irq;
 +    qemu_irq rx_irq;
 +
 +    NICState *nic;
 +    NICConf conf;
 +
 +    /* 0 or 1, for log messages */
 +    uint8_t emc_num;
 +
 +    uint32_t regs[NPCM7XX_NUM_EMC_REGS];
 +
 +    /*
-+     * tx is active. Set to true by TSDR and then switches off when out of
++     * FPCXT_NS is a special case: it has specific handling for
-+     * descriptors. If the TXON bit in REG_MCMDR is off then this is off.
++     * "current FP state is inactive", and must do the PreserveFPState()
 +     * but not the usual full set of actions done by ExecuteFPCheck().
 +     * So we don't call vfp_access_check() and the callers must handle this.
 +     */
-+    bool tx_active;
++    if (regno != ARM_VFP_FPCXT_NS && !vfp_access_check(s)) {
-+
++        return FPSysRegCheckDone;
-+    /*
++    }
-+     * rx is active. Set to true by RSDR and then switches off when out of
++    return FPSysRegCheckContinue;
 +     * descriptors. If the RXON bit in REG_MCMDR is off then this is off.
 +     */
 +    bool rx_active;
 +};
 +
 +typedef struct NPCM7xxEMCState NPCM7xxEMCState;
 +
 +#define TYPE_NPCM7XX_EMC "npcm7xx-emc"
 +#define NPCM7XX_EMC(obj) \
 +    OBJECT_CHECK(NPCM7xxEMCState, (obj), TYPE_NPCM7XX_EMC)
 +
 +#endif /* NPCM7XX_EMC_H */
 diff --git a/hw/net/npcm7xx_emc.c b/hw/net/npcm7xx_emc.c
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/hw/net/npcm7xx_emc.c
@@ -XXX,XX +XXX,XX @@
 +/*
 + * Nuvoton NPCM7xx EMC Module
 + *
 + * Copyright 2020 Google LLC
 + *
 + * This program is free software; you can redistribute it and/or modify it
 + * under the terms of the GNU General Public License as published by the
 + * Free Software Foundation; either version 2 of the License, or
 + * (at your option) any later version.
 + *
 + * This program is distributed in the hope that it will be useful, but WITHOUT
 + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
 + * for more details.
 + *
 + * Unsupported/unimplemented features:
 + * - MCMDR.FDUP (full duplex) is ignored, half duplex is not supported
 + * - Only CAM0 is supported, CAM[1-15] are not
 + *   - writes to CAMEN.[1-15] are ignored, these bits always read as zeroes
 + * - MII is not implemented, MIIDA.BUSY and MIID always return zero
 + * - MCMDR.LBK is not implemented
 + * - MCMDR.{OPMOD,ENSQE,AEP,ARP} are not supported
 + * - H/W FIFOs are not supported, MCMDR.FFTCR is ignored
 + * - MGSTA.SQE is not supported
 + * - pause and control frames are not implemented
 + * - MGSTA.CCNT is not supported
 + * - MPCNT, DMARFS are not implemented
 + */
 +
 +#include "qemu/osdep.h"
 +
 +/* For crc32 */
 +#include <zlib.h>
 +
 +#include "qemu-common.h"
 +#include "hw/irq.h"
 +#include "hw/qdev-clock.h"
 +#include "hw/qdev-properties.h"
 +#include "hw/net/npcm7xx_emc.h"
 +#include "net/eth.h"
 +#include "migration/vmstate.h"
 +#include "qemu/bitops.h"
 +#include "qemu/error-report.h"
 +#include "qemu/log.h"
 +#include "qemu/module.h"
 +#include "qemu/units.h"
 +#include "sysemu/dma.h"
 +#include "trace.h"
 +
 +#define CRC_LENGTH 4
 +
 +/*
 + * The maximum size of a (layer 2) ethernet frame as defined by 802.3.
 + * 1518 = 6(dest macaddr) + 6(src macaddr) + 2(proto) + 4(crc) + 1500(payload)
 + * This does not include an additional 4 for the vlan field (802.1q).
 + */
 +#define MAX_ETH_FRAME_SIZE 1518
 +
 +static const char *emc_reg_name(int regno)
 +{
 +#define REG(name) case REG_ ## name: return #name;
 +    switch (regno) {
 +    REG(CAMCMR)
 +    REG(CAMEN)
 +    REG(TXDLSA)
 +    REG(RXDLSA)
 +    REG(MCMDR)
 +    REG(MIID)
 +    REG(MIIDA)
 +    REG(FFTCR)
 +    REG(TSDR)
 +    REG(RSDR)
 +    REG(DMARFC)
 +    REG(MIEN)
 +    REG(MISTA)
 +    REG(MGSTA)
 +    REG(MPCNT)
 +    REG(MRPC)
 +    REG(MRPCC)
 +    REG(MREPC)
 +    REG(DMARFS)
 +    REG(CTXDSA)
 +    REG(CTXBSA)
 +    REG(CRXDSA)
 +    REG(CRXBSA)
 +    case REG_CAMM_BASE + 0: return "CAM0M";
 +    case REG_CAML_BASE + 0: return "CAM0L";
 +    case REG_CAMM_BASE + 2 ... REG_CAMML_LAST:
 +        /* Only CAM0 is supported, fold the others into something simple. */
 +        if (regno & 1) {
 +            return "CAM<n>L";
 +        } else {
 +            return "CAM<n>M";
 +        }
 +    default: return "UNKNOWN";
 +    }
 +#undef REG
 +}
 +
-+static void emc_reset(NPCM7xxEMCState *emc)
++static void gen_branch_fpInactive(DisasContext *s, TCGCond cond,
-+{
++                                  TCGLabel *label)
 +    trace_npcm7xx_emc_reset(emc->emc_num);
 +
 +    memset(&emc->regs[0], 0, sizeof(emc->regs));
 +
 +    /* These regs have non-zero reset values. */
 +    emc->regs[REG_TXDLSA] = 0xfffffffc;
 +    emc->regs[REG_RXDLSA] = 0xfffffffc;
 +    emc->regs[REG_MIIDA] = 0x00900000;
 +    emc->regs[REG_FFTCR] = 0x0101;
 +    emc->regs[REG_DMARFC] = 0x0800;
 +    emc->regs[REG_MPCNT] = 0x7fff;
 +
 +    emc->tx_active = false;
 +    emc->rx_active = false;
 +}
 +
 +static void npcm7xx_emc_reset(DeviceState *dev)
 +{
 +    NPCM7xxEMCState *emc = NPCM7XX_EMC(dev);
 +    emc_reset(emc);
 +}
 +
 +static void emc_soft_reset(NPCM7xxEMCState *emc)
 +{
 +    /*
-+     * The docs say at least MCMDR.{LBK,OPMOD} bits are not changed during a
++     * FPCXT_NS is a special case: it has specific handling for
-+     * soft reset, but does not go into further detail. For now, KISS.
++     * "current FP state is inactive", and must do the PreserveFPState()
 +     * but not the usual full set of actions done by ExecuteFPCheck().
 +     * We don't have a TB flag that matches the fpInactive check, so we
 +     * do it at runtime as we don't expect FPCXT_NS accesses to be frequent.
 +     *
 +     * Emit code that checks fpInactive and does a conditional
 +     * branch to label based on it:
 +     *  if cond is TCG_COND_NE then branch if fpInactive != 0 (ie if inactive)
 +     *  if cond is TCG_COND_EQ then branch if fpInactive == 0 (ie if active)
 +     */
-+    uint32_t mcmdr = emc->regs[REG_MCMDR];
++    assert(cond == TCG_COND_EQ || cond == TCG_COND_NE);
-+    emc_reset(emc);
++
-+    emc->regs[REG_MCMDR] = mcmdr & (REG_MCMDR_LBK | REG_MCMDR_OPMOD);
++    /* fpInactive = FPCCR_NS.ASPEN == 1 && CONTROL.FPCA == 0 */
-+
++    TCGv_i32 aspen, fpca;
-+    qemu_set_irq(emc->tx_irq, 0);
++    aspen = load_cpu_field(v7m.fpccr[M_REG_NS]);
-+    qemu_set_irq(emc->rx_irq, 0);
++    fpca = load_cpu_field(v7m.control[M_REG_S]);
 +    tcg_gen_andi_i32(aspen, aspen, R_V7M_FPCCR_ASPEN_MASK);
 +    tcg_gen_xori_i32(aspen, aspen, R_V7M_FPCCR_ASPEN_MASK);
 +    tcg_gen_andi_i32(fpca, fpca, R_V7M_CONTROL_FPCA_MASK);
 +    tcg_gen_or_i32(fpca, fpca, aspen);
 +    tcg_gen_brcondi_i32(tcg_invert_cond(cond), fpca, 0, label);
 +    tcg_temp_free_i32(aspen);
 +    tcg_temp_free_i32(fpca);
 +}
 +
-+static void emc_set_link(NetClientState *nc)
++static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
 +                                  fp_sysreg_loadfn *loadfn,
 +                                  void *opaque)
 +{
-+    /* Nothing to do yet. */
++    /* Do a write to an M-profile floating point system register */
-+}
++    TCGv_i32 tmp;
-+
++    TCGLabel *lab_end = NULL;
-+/* MISTA.TXINTR is the union of the individual bits with their enables. */
++
-+static void emc_update_mista_txintr(NPCM7xxEMCState *emc)
++    switch (fp_sysreg_checks(s, regno)) {
-+{
++    case FPSysRegCheckFailed:
-+    /* Only look at the bits we support. */
++        return false;
-+    uint32_t mask = (REG_MISTA_TXBERR |
++    case FPSysRegCheckDone:
-+                     REG_MISTA_TDU |
++        return true;
-+                     REG_MISTA_TXCP);
++    case FPSysRegCheckContinue:
-+    if (emc->regs[REG_MISTA] & emc->regs[REG_MIEN] & mask) {
++        break;
-+        emc->regs[REG_MISTA] |= REG_MISTA_TXINTR;
++    }
-+    } else {
++
-+        emc->regs[REG_MISTA] &= ~REG_MISTA_TXINTR;
++    switch (regno) {
-+    }
++    case ARM_VFP_FPSCR:
-+}
++        tmp = loadfn(s, opaque);
-+
++        gen_helper_vfp_set_fpscr(cpu_env, tmp);
-+/* MISTA.RXINTR is the union of the individual bits with their enables. */
++        tcg_temp_free_i32(tmp);
-+static void emc_update_mista_rxintr(NPCM7xxEMCState *emc)
++        gen_lookup_tb(s);
-+{
++        break;
-+    /* Only look at the bits we support. */
++    case ARM_VFP_FPSCR_NZCVQC:
-+    uint32_t mask = (REG_MISTA_RXBERR |
++    {
-+                     REG_MISTA_RDU |
++        TCGv_i32 fpscr;
-+                     REG_MISTA_RXGD);
++        tmp = loadfn(s, opaque);
-+    if (emc->regs[REG_MISTA] & emc->regs[REG_MIEN] & mask) {
++        if (dc_isar_feature(aa32_mve, s)) {
-+        emc->regs[REG_MISTA] |= REG_MISTA_RXINTR;
++            /* QC is only present for MVE; otherwise RES0 */
-+    } else {
++            TCGv_i32 qc = tcg_temp_new_i32();
-+        emc->regs[REG_MISTA] &= ~REG_MISTA_RXINTR;
++            tcg_gen_andi_i32(qc, tmp, FPCR_QC);
-+    }
++            /*
-+}
++             * The 4 vfp.qc[] fields need only be "zero" vs "non-zero";
-+
++             * here writing the same value into all elements is simplest.
-+/* N.B. emc_update_mista_txintr must have already been called. */
++             */
-+static void emc_update_tx_irq(NPCM7xxEMCState *emc)
++            tcg_gen_gvec_dup_i32(MO_32, offsetof(CPUARMState, vfp.qc),
-+{
++                                 16, 16, qc);
-+    int level = !!(emc->regs[REG_MISTA] &
++        }
-+                   emc->regs[REG_MIEN] &
++        tcg_gen_andi_i32(tmp, tmp, FPCR_NZCV_MASK);
-+                   REG_MISTA_TXINTR);
++        fpscr = load_cpu_field(vfp.xregs[ARM_VFP_FPSCR]);
-+    trace_npcm7xx_emc_update_tx_irq(level);
++        tcg_gen_andi_i32(fpscr, fpscr, ~FPCR_NZCV_MASK);
-+    qemu_set_irq(emc->tx_irq, level);
++        tcg_gen_or_i32(fpscr, fpscr, tmp);
-+}
++        store_cpu_field(fpscr, vfp.xregs[ARM_VFP_FPSCR]);
-+
++        tcg_temp_free_i32(tmp);
-+/* N.B. emc_update_mista_rxintr must have already been called. */
++        break;
-+static void emc_update_rx_irq(NPCM7xxEMCState *emc)
++    }
-+{
++    case ARM_VFP_FPCXT_NS:
-+    int level = !!(emc->regs[REG_MISTA] &
++        lab_end = gen_new_label();
-+                   emc->regs[REG_MIEN] &
++        /* fpInactive case: write is a NOP, so branch to end */
-+                   REG_MISTA_RXINTR);
++        gen_branch_fpInactive(s, TCG_COND_NE, lab_end);
 +    trace_npcm7xx_emc_update_rx_irq(level);
 +    qemu_set_irq(emc->rx_irq, level);
 +}
 +
 +/* Update IRQ states due to changes in MIEN,MISTA. */
 +static void emc_update_irq_from_reg_change(NPCM7xxEMCState *emc)
 +{
 +    emc_update_mista_txintr(emc);
 +    emc_update_tx_irq(emc);
 +
 +    emc_update_mista_rxintr(emc);
 +    emc_update_rx_irq(emc);
 +}
 +
 +static int emc_read_tx_desc(dma_addr_t addr, NPCM7xxEMCTxDesc *desc)
 +{
 +    if (dma_memory_read(&address_space_memory, addr, desc, sizeof(*desc))) {
 +        qemu_log_mask(LOG_GUEST_ERROR, "%s: Failed to read descriptor @ 0x%"
 +                      HWADDR_PRIx "\n", __func__, addr);
 +        return -1;
 +    }
 +    desc->flags = le32_to_cpu(desc->flags);
 +    desc->txbsa = le32_to_cpu(desc->txbsa);
 +    desc->status_and_length = le32_to_cpu(desc->status_and_length);
 +    desc->ntxdsa = le32_to_cpu(desc->ntxdsa);
 +    return 0;
 +}
 +
 +static int emc_write_tx_desc(const NPCM7xxEMCTxDesc *desc, dma_addr_t addr)
 +{
 +    NPCM7xxEMCTxDesc le_desc;
 +
 +    le_desc.flags = cpu_to_le32(desc->flags);
 +    le_desc.txbsa = cpu_to_le32(desc->txbsa);
 +    le_desc.status_and_length = cpu_to_le32(desc->status_and_length);
 +    le_desc.ntxdsa = cpu_to_le32(desc->ntxdsa);
 +    if (dma_memory_write(&address_space_memory, addr, &le_desc,
 +                         sizeof(le_desc))) {
 +        qemu_log_mask(LOG_GUEST_ERROR, "%s: Failed to write descriptor @ 0x%"
 +                      HWADDR_PRIx "\n", __func__, addr);
 +        return -1;
 +    }
 +    return 0;
 +}
 +
 +static int emc_read_rx_desc(dma_addr_t addr, NPCM7xxEMCRxDesc *desc)
 +{
 +    if (dma_memory_read(&address_space_memory, addr, desc, sizeof(*desc))) {
 +        qemu_log_mask(LOG_GUEST_ERROR, "%s: Failed to read descriptor @ 0x%"
 +                      HWADDR_PRIx "\n", __func__, addr);
 +        return -1;
 +    }
 +    desc->status_and_length = le32_to_cpu(desc->status_and_length);
 +    desc->rxbsa = le32_to_cpu(desc->rxbsa);
 +    desc->reserved = le32_to_cpu(desc->reserved);
 +    desc->nrxdsa = le32_to_cpu(desc->nrxdsa);
 +    return 0;
 +}
 +
 +static int emc_write_rx_desc(const NPCM7xxEMCRxDesc *desc, dma_addr_t addr)
 +{
 +    NPCM7xxEMCRxDesc le_desc;
 +
 +    le_desc.status_and_length = cpu_to_le32(desc->status_and_length);
 +    le_desc.rxbsa = cpu_to_le32(desc->rxbsa);
 +    le_desc.reserved = cpu_to_le32(desc->reserved);
 +    le_desc.nrxdsa = cpu_to_le32(desc->nrxdsa);
 +    if (dma_memory_write(&address_space_memory, addr, &le_desc,
 +                         sizeof(le_desc))) {
 +        qemu_log_mask(LOG_GUEST_ERROR, "%s: Failed to write descriptor @ 0x%"
 +                      HWADDR_PRIx "\n", __func__, addr);
 +        return -1;
 +    }
 +    return 0;
 +}
 +
 +static void emc_set_mista(NPCM7xxEMCState *emc, uint32_t flags)
 +{
 +    trace_npcm7xx_emc_set_mista(flags);
 +    emc->regs[REG_MISTA] |= flags;
 +    if (extract32(flags, 16, 16)) {
 +        emc_update_mista_txintr(emc);
 +    }
 +    if (extract32(flags, 0, 16)) {
 +        emc_update_mista_rxintr(emc);
 +    }
 +}
 +
 +static void emc_halt_tx(NPCM7xxEMCState *emc, uint32_t mista_flag)
 +{
 +    emc->tx_active = false;
 +    emc_set_mista(emc, mista_flag);
 +}
 +
 +static void emc_halt_rx(NPCM7xxEMCState *emc, uint32_t mista_flag)
 +{
 +    emc->rx_active = false;
 +    emc_set_mista(emc, mista_flag);
 +}
 +
 +static void emc_set_next_tx_descriptor(NPCM7xxEMCState *emc,
 +                                       const NPCM7xxEMCTxDesc *tx_desc,
 +                                       uint32_t desc_addr)
 +{
 +    /* Update the current descriptor, if only to reset the owner flag. */
 +    if (emc_write_tx_desc(tx_desc, desc_addr)) {
 +        /*
-+         * We just read it so this shouldn't generally happen.
++         * !fpInactive: if FPU disabled, take NOCP exception;
-+         * Error already reported.
++         * otherwise PreserveFPState(), and then FPCXT_NS writes
 +         * behave the same as FPCXT_S writes.
 +         */
-+        emc_set_mista(emc, REG_MISTA_TXBERR);
++        if (s->fp_excp_el) {
-+    }
++            gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
-+    emc->regs[REG_CTXDSA] = TX_DESC_NTXDSA(tx_desc->ntxdsa);
++                               syn_uncategorized(), s->fp_excp_el);
-+}
++            /*
-+
++             * This was only a conditional exception, so override
-+static void emc_set_next_rx_descriptor(NPCM7xxEMCState *emc,
++             * gen_exception_insn()'s default to DISAS_NORETURN
-+                                       const NPCM7xxEMCRxDesc *rx_desc,
++             */
-+                                       uint32_t desc_addr)
++            s->base.is_jmp = DISAS_NEXT;
-+{
++            break;
-+    /* Update the current descriptor, if only to reset the owner flag. */
++        }
-+    if (emc_write_rx_desc(rx_desc, desc_addr)) {
++        gen_preserve_fp_state(s);
 +        /* fall through */
 +    case ARM_VFP_FPCXT_S:
 +    {
 +        TCGv_i32 sfpa, control;
 +        /*
-+         * We just read it so this shouldn't generally happen.
++         * Set FPSCR and CONTROL.SFPA from value; the new FPSCR takes
-+         * Error already reported.
++         * bits [27:0] from value and zeroes bits [31:28].
 +         */
-+        emc_set_mista(emc, REG_MISTA_RXBERR);
++        tmp = loadfn(s, opaque);
-+    }
++        sfpa = tcg_temp_new_i32();
-+    emc->regs[REG_CRXDSA] = RX_DESC_NRXDSA(rx_desc->nrxdsa);
++        tcg_gen_shri_i32(sfpa, tmp, 31);
-+}
++        control = load_cpu_field(v7m.control[M_REG_S]);
-+
++        tcg_gen_deposit_i32(control, control, sfpa,
-+static void emc_try_send_next_packet(NPCM7xxEMCState *emc)
++                            R_V7M_CONTROL_SFPA_SHIFT, 1);
-+{
++        store_cpu_field(control, v7m.control[M_REG_S]);
-+    /* Working buffer for sending out packets. Most packets fit in this. */
++        tcg_gen_andi_i32(tmp, tmp, ~FPCR_NZCV_MASK);
-+#define TX_BUFFER_SIZE 2048
++        gen_helper_vfp_set_fpscr(cpu_env, tmp);
-+    uint8_t tx_send_buffer[TX_BUFFER_SIZE];
++        tcg_temp_free_i32(tmp);
-+    uint32_t desc_addr = TX_DESC_NTXDSA(emc->regs[REG_CTXDSA]);
++        tcg_temp_free_i32(sfpa);
-+    NPCM7xxEMCTxDesc tx_desc;
++        break;
-+    uint32_t next_buf_addr, length;
++    }
-+    uint8_t *buf;
++    case ARM_VFP_VPR:
-+    g_autofree uint8_t *malloced_buf = NULL;
++        /* Behaves as NOP if not privileged */
-+
++        if (IS_USER(s)) {
-+    if (emc_read_tx_desc(desc_addr, &tx_desc)) {
++            break;
 +        /* Error reading descriptor, already reported. */
 +        emc_halt_tx(emc, REG_MISTA_TXBERR);
 +        emc_update_tx_irq(emc);
 +        return;
 +    }
 +
 +    /* Nothing we can do if we don't own the descriptor. */
 +    if (!(tx_desc.flags & TX_DESC_FLAG_OWNER_MASK)) {
 +        trace_npcm7xx_emc_cpu_owned_desc(desc_addr);
 +        emc_halt_tx(emc, REG_MISTA_TDU);
 +        emc_update_tx_irq(emc);
 +        return;
 +     }
 +
 +    /* Give the descriptor back regardless of what happens. */
 +    tx_desc.flags &= ~TX_DESC_FLAG_OWNER_MASK;
 +    tx_desc.status_and_length &= 0xffff;
 +
 +    /*
 +     * Despite the h/w documentation saying the tx buffer is word aligned,
 +     * the linux driver does not word align the buffer. There is value in not
 +     * aligning the buffer: See the description of NET_IP_ALIGN in linux
 +     * kernel sources.
 +     */
 +    next_buf_addr = tx_desc.txbsa;
 +    emc->regs[REG_CTXBSA] = next_buf_addr;
 +    length = TX_DESC_PKT_LEN(tx_desc.status_and_length);
 +    buf = &tx_send_buffer[0];
 +
 +    if (length > sizeof(tx_send_buffer)) {
 +        malloced_buf = g_malloc(length);
 +        buf = malloced_buf;
 +    }
 +
 +    if (dma_memory_read(&address_space_memory, next_buf_addr, buf, length)) {
 +        qemu_log_mask(LOG_GUEST_ERROR, "%s: Failed to read packet @ 0x%x\n",
 +                      __func__, next_buf_addr);
 +        emc_set_mista(emc, REG_MISTA_TXBERR);
 +        emc_set_next_tx_descriptor(emc, &tx_desc, desc_addr);
 +        emc_update_tx_irq(emc);
 +        trace_npcm7xx_emc_tx_done(emc->regs[REG_CTXDSA]);
 +        return;
 +    }
 +
 +    if ((tx_desc.flags & TX_DESC_FLAG_PADEN) && (length < MIN_PACKET_LENGTH)) {
 +        memset(buf + length, 0, MIN_PACKET_LENGTH - length);
 +        length = MIN_PACKET_LENGTH;
 +    }
 +
 +    /* N.B. emc_receive can get called here. */
 +    qemu_send_packet(qemu_get_queue(emc->nic), buf, length);
 +    trace_npcm7xx_emc_sent_packet(length);
 +
 +    tx_desc.status_and_length |= TX_DESC_STATUS_TXCP;
 +    if (tx_desc.flags & TX_DESC_FLAG_INTEN) {
 +        emc_set_mista(emc, REG_MISTA_TXCP);
 +    }
 +    if (emc->regs[REG_MISTA] & emc->regs[REG_MIEN] & REG_MISTA_TXINTR) {
 +        tx_desc.status_and_length |= TX_DESC_STATUS_TXINTR;
 +    }
 +
 +    emc_set_next_tx_descriptor(emc, &tx_desc, desc_addr);
 +    emc_update_tx_irq(emc);
 +    trace_npcm7xx_emc_tx_done(emc->regs[REG_CTXDSA]);
 +}
 +
 +static bool emc_can_receive(NetClientState *nc)
 +{
 +    NPCM7xxEMCState *emc = NPCM7XX_EMC(qemu_get_nic_opaque(nc));
 +
 +    bool can_receive = emc->rx_active;
 +    trace_npcm7xx_emc_can_receive(can_receive);
 +    return can_receive;
 +}
 +
 +/* If result is false then *fail_reason contains the reason. */
 +static bool emc_receive_filter1(NPCM7xxEMCState *emc, const uint8_t *buf,
 +                                size_t len, const char **fail_reason)
 +{
 +    eth_pkt_types_e pkt_type = get_eth_packet_type(PKT_GET_ETH_HDR(buf));
 +
 +    switch (pkt_type) {
 +    case ETH_PKT_BCAST:
 +        if (emc->regs[REG_CAMCMR] & REG_CAMCMR_CCAM) {
 +            return true;
 +        } else {
 +            *fail_reason = "Broadcast packet disabled";
 +            return !!(emc->regs[REG_CAMCMR] & REG_CAMCMR_ABP);
 +        }
-+    case ETH_PKT_MCAST:
++        tmp = loadfn(s, opaque);
-+        if (emc->regs[REG_CAMCMR] & REG_CAMCMR_CCAM) {
++        store_cpu_field(tmp, v7m.vpr);
-+            return true;
++        break;
-+        } else {
++    case ARM_VFP_P0:
-+            *fail_reason = "Multicast packet disabled";
++    {
-+            return !!(emc->regs[REG_CAMCMR] & REG_CAMCMR_AMP);
++        TCGv_i32 vpr;
-+        }
++        tmp = loadfn(s, opaque);
-+    case ETH_PKT_UCAST: {
++        vpr = load_cpu_field(v7m.vpr);
-+        bool matches;
++        tcg_gen_deposit_i32(vpr, vpr, tmp,
-+        if (emc->regs[REG_CAMCMR] & REG_CAMCMR_AUP) {
++                            R_V7M_VPR_P0_SHIFT, R_V7M_VPR_P0_LENGTH);
-+            return true;
++        store_cpu_field(vpr, v7m.vpr);
-+        }
++        tcg_temp_free_i32(tmp);
-+        matches = ((emc->regs[REG_CAMCMR] & REG_CAMCMR_ECMP) &&
++        break;
 +                   /* We only support one CAM register, CAM0. */
 +                   (emc->regs[REG_CAMEN] & (1 << 0)) &&
 +                   memcmp(buf, emc->conf.macaddr.a, ETH_ALEN) == 0);
 +        if (emc->regs[REG_CAMCMR] & REG_CAMCMR_CCAM) {
 +            *fail_reason = "MACADDR matched, comparison complemented";
 +            return !matches;
 +        } else {
 +            *fail_reason = "MACADDR didn't match";
 +            return matches;
 +        }
 +    }
 +    default:
 +        g_assert_not_reached();
 +    }
++    if (lab_end) {
++        gen_set_label(lab_end);
++    }
++    return true;
 +}
 +
-+static bool emc_receive_filter(NPCM7xxEMCState *emc, const uint8_t *buf,
++static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
-+                               size_t len)
++                                 fp_sysreg_storefn *storefn,
 +                                 void *opaque)
 +{
-+    const char *fail_reason = NULL;
++    /* Do a read from an M-profile floating point system register */
-+    bool ok = emc_receive_filter1(emc, buf, len, &fail_reason);
++    TCGv_i32 tmp;
-+    if (!ok) {
++    TCGLabel *lab_end = NULL;
-+        trace_npcm7xx_emc_packet_filtered_out(fail_reason);
++    bool lookup_tb = false;
-+    }
++
-+    return ok;
++    switch (fp_sysreg_checks(s, regno)) {
-+}
++    case FPSysRegCheckFailed:
-+
++        return false;
-+static ssize_t emc_receive(NetClientState *nc, const uint8_t *buf, size_t len1)
++    case FPSysRegCheckDone:
-+{
++        return true;
-+    NPCM7xxEMCState *emc = NPCM7XX_EMC(qemu_get_nic_opaque(nc));
++    case FPSysRegCheckContinue:
-+    const uint32_t len = len1;
++        break;
-+    size_t max_frame_len;
++    }
-+    bool long_frame;
++
-+    uint32_t desc_addr;
++    if (regno == ARM_VFP_FPSCR_NZCVQC && !dc_isar_feature(aa32_mve, s)) {
-+    NPCM7xxEMCRxDesc rx_desc;
++        /* QC is RES0 without MVE, so NZCVQC simplifies to NZCV */
-+    uint32_t crc;
++        regno = QEMU_VFP_FPSCR_NZCV;
-+    uint8_t *crc_ptr;
++    }
-+    uint32_t buf_addr;
++
-+
++    switch (regno) {
-+    trace_npcm7xx_emc_receiving_packet(len);
++    case ARM_VFP_FPSCR:
-+
++        tmp = tcg_temp_new_i32();
-+    if (!emc_can_receive(nc)) {
++        gen_helper_vfp_get_fpscr(tmp, cpu_env);
-+        qemu_log_mask(LOG_GUEST_ERROR, "%s: Unexpected packet\n", __func__);
++        storefn(s, opaque, tmp);
-+        return -1;
++        break;
-+    }
++    case ARM_VFP_FPSCR_NZCVQC:
-+
++        tmp = tcg_temp_new_i32();
-+    if (len < ETH_HLEN ||
++        gen_helper_vfp_get_fpscr(tmp, cpu_env);
-+        /* Defensive programming: drop unsupportable large packets. */
++        tcg_gen_andi_i32(tmp, tmp, FPCR_NZCVQC_MASK);
-+        len > 0xffff - CRC_LENGTH) {
++        storefn(s, opaque, tmp);
-+        qemu_log_mask(LOG_GUEST_ERROR, "%s: Dropped frame of %u bytes\n",
++        break;
-+                      __func__, len);
++    case QEMU_VFP_FPSCR_NZCV:
 +        return len;
 +    }
 +
 +    /*
 +     * DENI is set if EMC received the Length/Type field of the incoming
 +     * packet, so it will be set regardless of what happens next.
 +     */
 +    emc_set_mista(emc, REG_MISTA_DENI);
 +
 +    if (!emc_receive_filter(emc, buf, len)) {
 +        emc_update_rx_irq(emc);
 +        return len;
 +    }
 +
 +    /* Huge frames (> DMARFC) are dropped. */
 +    max_frame_len = REG_DMARFC_RXMS(emc->regs[REG_DMARFC]);
 +    if (len + CRC_LENGTH > max_frame_len) {
 +        trace_npcm7xx_emc_packet_dropped(len);
 +        emc_set_mista(emc, REG_MISTA_DFOI);
 +        emc_update_rx_irq(emc);
 +        return len;
 +    }
 +
 +    /*
 +     * Long Frames (> MAX_ETH_FRAME_SIZE) are also dropped, unless MCMDR.ALP
 +     * is set.
 +     */
 +    long_frame = false;
 +    if (len + CRC_LENGTH > MAX_ETH_FRAME_SIZE) {
 +        if (emc->regs[REG_MCMDR] & REG_MCMDR_ALP) {
 +            long_frame = true;
 +        } else {
 +            trace_npcm7xx_emc_packet_dropped(len);
 +            emc_set_mista(emc, REG_MISTA_PTLE);
 +            emc_update_rx_irq(emc);
 +            return len;
 +        }
 +    }
 +
 +    desc_addr = RX_DESC_NRXDSA(emc->regs[REG_CRXDSA]);
 +    if (emc_read_rx_desc(desc_addr, &rx_desc)) {
 +        /* Error reading descriptor, already reported. */
 +        emc_halt_rx(emc, REG_MISTA_RXBERR);
 +        emc_update_rx_irq(emc);
 +        return len;
 +    }
 +
 +    /* Nothing we can do if we don't own the descriptor. */
 +    if (!(rx_desc.status_and_length & RX_DESC_STATUS_OWNER_MASK)) {
 +        trace_npcm7xx_emc_cpu_owned_desc(desc_addr);
 +        emc_halt_rx(emc, REG_MISTA_RDU);
 +        emc_update_rx_irq(emc);
 +        return len;
 +    }
 +
 +    crc = 0;
 +    crc_ptr = (uint8_t *) &crc;
 +    if (!(emc->regs[REG_MCMDR] & REG_MCMDR_SPCRC)) {
 +        crc = cpu_to_be32(crc32(~0, buf, len));
 +    }
 +
 +    /* Give the descriptor back regardless of what happens. */
 +    rx_desc.status_and_length &= ~RX_DESC_STATUS_OWNER_MASK;
 +
 +    buf_addr = rx_desc.rxbsa;
 +    emc->regs[REG_CRXBSA] = buf_addr;
 +    if (dma_memory_write(&address_space_memory, buf_addr, buf, len) ||
 +        (!(emc->regs[REG_MCMDR] & REG_MCMDR_SPCRC) &&
 +         dma_memory_write(&address_space_memory, buf_addr + len, crc_ptr,
 +                          4))) {
 +        qemu_log_mask(LOG_GUEST_ERROR, "%s: Bus error writing packet\n",
 +                      __func__);
 +        emc_set_mista(emc, REG_MISTA_RXBERR);
 +        emc_set_next_rx_descriptor(emc, &rx_desc, desc_addr);
 +        emc_update_rx_irq(emc);
 +        trace_npcm7xx_emc_rx_done(emc->regs[REG_CRXDSA]);
 +        return len;
 +    }
 +
 +    trace_npcm7xx_emc_received_packet(len);
 +
 +    /* Note: We've already verified len+4 <= 0xffff. */
 +    rx_desc.status_and_length = len;
 +    if (!(emc->regs[REG_MCMDR] & REG_MCMDR_SPCRC)) {
 +        rx_desc.status_and_length += 4;
 +    }
 +    rx_desc.status_and_length |= RX_DESC_STATUS_RXGD;
 +    emc_set_mista(emc, REG_MISTA_RXGD);
 +
 +    if (emc->regs[REG_MISTA] & emc->regs[REG_MIEN] & REG_MISTA_RXINTR) {
 +        rx_desc.status_and_length |= RX_DESC_STATUS_RXINTR;
 +    }
 +    if (long_frame) {
 +        rx_desc.status_and_length |= RX_DESC_STATUS_PTLE;
 +    }
 +
 +    emc_set_next_rx_descriptor(emc, &rx_desc, desc_addr);
 +    emc_update_rx_irq(emc);
 +    trace_npcm7xx_emc_rx_done(emc->regs[REG_CRXDSA]);
 +    return len;
 +}
 +
 +static void emc_try_receive_next_packet(NPCM7xxEMCState *emc)
 +{
 +    if (emc_can_receive(qemu_get_queue(emc->nic))) {
 +        qemu_flush_queued_packets(qemu_get_queue(emc->nic));
 +    }
 +}
 +
 +static uint64_t npcm7xx_emc_read(void *opaque, hwaddr offset, unsigned size)
 +{
 +    NPCM7xxEMCState *emc = opaque;
 +    uint32_t reg = offset / sizeof(uint32_t);
 +    uint32_t result;
 +
 +    if (reg >= NPCM7XX_NUM_EMC_REGS) {
 +        qemu_log_mask(LOG_GUEST_ERROR,
 +                      "%s: Invalid offset 0x%04" HWADDR_PRIx "\n",
 +                      __func__, offset);
 +        return 0;
 +    }
 +
 +    switch (reg) {
 +    case REG_MIID:
 +        /*
-+         * We don't implement MII. For determinism, always return zero as
++         * Read just NZCV; this is a special case to avoid the
-+         * writes record the last value written for debugging purposes.
++         * helper call for the "VMRS to CPSR.NZCV" insn.
 +         */
-+        qemu_log_mask(LOG_UNIMP, "%s: Read of MIID, returning 0\n", __func__);
++        tmp = load_cpu_field(vfp.xregs[ARM_VFP_FPSCR]);
-+        result = 0;
++        tcg_gen_andi_i32(tmp, tmp, FPCR_NZCV_MASK);
-+        break;
++        storefn(s, opaque, tmp);
-+    case REG_TSDR:
++        break;
-+    case REG_RSDR:
++    case ARM_VFP_FPCXT_S:
-+        qemu_log_mask(LOG_GUEST_ERROR,
++    {
-+                      "%s: Read of write-only reg, %s/%d\n",
++        TCGv_i32 control, sfpa, fpscr;
-+                      __func__, emc_reg_name(reg), reg);
++        /* Bits [27:0] from FPSCR, bit [31] from CONTROL.SFPA */
-+        return 0;
++        tmp = tcg_temp_new_i32();
-+    default:
++        sfpa = tcg_temp_new_i32();
-+        result = emc->regs[reg];
++        gen_helper_vfp_get_fpscr(tmp, cpu_env);
-+        break;
++        tcg_gen_andi_i32(tmp, tmp, ~FPCR_NZCV_MASK);
-+    }
++        control = load_cpu_field(v7m.control[M_REG_S]);
-+
++        tcg_gen_andi_i32(sfpa, control, R_V7M_CONTROL_SFPA_MASK);
-+    trace_npcm7xx_emc_reg_read(emc->emc_num, result, emc_reg_name(reg), reg);
++        tcg_gen_shli_i32(sfpa, sfpa, 31 - R_V7M_CONTROL_SFPA_SHIFT);
-+    return result;
++        tcg_gen_or_i32(tmp, tmp, sfpa);
-+}
++        tcg_temp_free_i32(sfpa);
-+
++        /*
-+static void npcm7xx_emc_write(void *opaque, hwaddr offset,
++         * Store result before updating FPSCR etc, in case
-+                              uint64_t v, unsigned size)
++         * it is a memory write which causes an exception.
-+{
++         */
-+    NPCM7xxEMCState *emc = opaque;
++        storefn(s, opaque, tmp);
-+    uint32_t reg = offset / sizeof(uint32_t);
++        /*
-+    uint32_t value = v;
++         * Now we must reset FPSCR from FPDSCR_NS, and clear
-+
++         * CONTROL.SFPA; so we'll end the TB here.
-+    g_assert(size == sizeof(uint32_t));
++         */
-+
++        tcg_gen_andi_i32(control, control, ~R_V7M_CONTROL_SFPA_MASK);
-+    if (reg >= NPCM7XX_NUM_EMC_REGS) {
++        store_cpu_field(control, v7m.control[M_REG_S]);
-+        qemu_log_mask(LOG_GUEST_ERROR,
++        fpscr = load_cpu_field(v7m.fpdscr[M_REG_NS]);
-+                      "%s: Invalid offset 0x%04" HWADDR_PRIx "\n",
++        gen_helper_vfp_set_fpscr(cpu_env, fpscr);
-+                      __func__, offset);
++        tcg_temp_free_i32(fpscr);
-+        return;
++        lookup_tb = true;
-+    }
++        break;
-+
++    }
-+    trace_npcm7xx_emc_reg_write(emc->emc_num, emc_reg_name(reg), reg, value);
++    case ARM_VFP_FPCXT_NS:
-+
++    {
-+    switch (reg) {
++        TCGv_i32 control, sfpa, fpscr, fpdscr, zero;
-+    case REG_CAMCMR:
++        TCGLabel *lab_active = gen_new_label();
-+        emc->regs[reg] = value;
++
-+        break;
++        lookup_tb = true;
-+    case REG_CAMEN:
++
-+        /* Only CAM0 is supported, don't pretend otherwise. */
++        gen_branch_fpInactive(s, TCG_COND_EQ, lab_active);
-+        if (value & ~1) {
++        /* fpInactive case: reads as FPDSCR_NS */
-+            qemu_log_mask(LOG_GUEST_ERROR,
++        TCGv_i32 tmp = load_cpu_field(v7m.fpdscr[M_REG_NS]);
-+                          "%s: Only CAM0 is supported, cannot enable others"
++        storefn(s, opaque, tmp);
-+                          ": 0x%x\n",
++        lab_end = gen_new_label();
-+                          __func__, value);
++        tcg_gen_br(lab_end);
-+        }
++
-+        emc->regs[reg] = value & 1;
++        gen_set_label(lab_active);
-+        break;
++        /*
-+    case REG_CAMM_BASE + 0:
++         * !fpInactive: if FPU disabled, take NOCP exception;
-+        emc->regs[reg] = value;
++         * otherwise PreserveFPState(), and then FPCXT_NS
-+        emc->conf.macaddr.a[0] = value >> 24;
++         * reads the same as FPCXT_S.
-+        emc->conf.macaddr.a[1] = value >> 16;
++         */
-+        emc->conf.macaddr.a[2] = value >> 8;
++        if (s->fp_excp_el) {
-+        emc->conf.macaddr.a[3] = value >> 0;
++            gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
-+        break;
++                               syn_uncategorized(), s->fp_excp_el);
-+    case REG_CAML_BASE + 0:
++            /*
-+        emc->regs[reg] = value;
++             * This was only a conditional exception, so override
-+        emc->conf.macaddr.a[4] = value >> 24;
++             * gen_exception_insn()'s default to DISAS_NORETURN
-+        emc->conf.macaddr.a[5] = value >> 16;
++             */
-+        break;
++            s->base.is_jmp = DISAS_NEXT;
 +    case REG_MCMDR: {
 +        uint32_t prev;
 +        if (value & REG_MCMDR_SWR) {
 +            emc_soft_reset(emc);
 +            /* On h/w the reset happens over multiple cycles. For now KISS. */
 +            break;
 +        }
-+        prev = emc->regs[reg];
++        gen_preserve_fp_state(s);
-+        emc->regs[reg] = value;
++        tmp = tcg_temp_new_i32();
-+        /* Update tx state. */
++        sfpa = tcg_temp_new_i32();
-+        if (!(prev & REG_MCMDR_TXON) &&
++        fpscr = tcg_temp_new_i32();
-+            (value & REG_MCMDR_TXON)) {
++        gen_helper_vfp_get_fpscr(fpscr, cpu_env);
-+            emc->regs[REG_CTXDSA] = emc->regs[REG_TXDLSA];
++        tcg_gen_andi_i32(tmp, fpscr, ~FPCR_NZCV_MASK);
-+            /*
++        control = load_cpu_field(v7m.control[M_REG_S]);
-+             * Linux kernel turns TX on with CPU still holding descriptor,
++        tcg_gen_andi_i32(sfpa, control, R_V7M_CONTROL_SFPA_MASK);
-+             * which suggests we should wait for a write to TSDR before trying
++        tcg_gen_shli_i32(sfpa, sfpa, 31 - R_V7M_CONTROL_SFPA_SHIFT);
-+             * to send a packet: so we don't send one here.
++        tcg_gen_or_i32(tmp, tmp, sfpa);
-+             */
++        tcg_temp_free_i32(control);
-+        } else if ((prev & REG_MCMDR_TXON) &&
++        /* Store result before updating FPSCR, in case it faults */
-+                   !(value & REG_MCMDR_TXON)) {
++        storefn(s, opaque, tmp);
-+            emc->regs[REG_MGSTA] |= REG_MGSTA_TXHA;
++        /* If SFPA is zero then set FPSCR from FPDSCR_NS */
 +        fpdscr = load_cpu_field(v7m.fpdscr[M_REG_NS]);
 +        zero = tcg_const_i32(0);
 +        tcg_gen_movcond_i32(TCG_COND_EQ, fpscr, sfpa, zero, fpdscr, fpscr);
 +        gen_helper_vfp_set_fpscr(cpu_env, fpscr);
 +        tcg_temp_free_i32(zero);
 +        tcg_temp_free_i32(sfpa);
 +        tcg_temp_free_i32(fpdscr);
 +        tcg_temp_free_i32(fpscr);
 +        break;
 +    }
 +    case ARM_VFP_VPR:
 +        /* Behaves as NOP if not privileged */
 +        if (IS_USER(s)) {
 +            break;
 +        }
-+        if (!(value & REG_MCMDR_TXON)) {
++        tmp = load_cpu_field(v7m.vpr);
-+            emc_halt_tx(emc, 0);
++        storefn(s, opaque, tmp);
 +        break;
 +    case ARM_VFP_P0:
 +        tmp = load_cpu_field(v7m.vpr);
 +        tcg_gen_extract_i32(tmp, tmp, R_V7M_VPR_P0_SHIFT, R_V7M_VPR_P0_LENGTH);
 +        storefn(s, opaque, tmp);
 +        break;
 +    default:
 +        g_assert_not_reached();
 +    }
 +
 +    if (lab_end) {
 +        gen_set_label(lab_end);
 +    }
 +    if (lookup_tb) {
 +        gen_lookup_tb(s);
 +    }
 +    return true;
 +}
 +
 +static void fp_sysreg_to_gpr(DisasContext *s, void *opaque, TCGv_i32 value)
 +{
 +    arg_VMSR_VMRS *a = opaque;
 +
 +    if (a->rt == 15) {
 +        /* Set the 4 flag bits in the CPSR */
 +        gen_set_nzcv(value);
 +        tcg_temp_free_i32(value);
 +    } else {
 +        store_reg(s, a->rt, value);
 +    }
 +}
 +
 +static TCGv_i32 gpr_to_fp_sysreg(DisasContext *s, void *opaque)
 +{
 +    arg_VMSR_VMRS *a = opaque;
 +
 +    return load_reg(s, a->rt);
 +}
 +
 +static bool trans_VMSR_VMRS(DisasContext *s, arg_VMSR_VMRS *a)
 +{
 +    /*
 +     * Accesses to R15 are UNPREDICTABLE; we choose to undef.
 +     * FPSCR -> r15 is a special case which writes to the PSR flags;
 +     * set a->reg to a special value to tell gen_M_fp_sysreg_read()
 +     * we only care about the top 4 bits of FPSCR there.
 +     */
 +    if (a->rt == 15) {
 +        if (a->l && a->reg == ARM_VFP_FPSCR) {
 +            a->reg = QEMU_VFP_FPSCR_NZCV;
 +        } else {
 +            return false;
 +        }
-+        /* Update rx state. */
++    }
-+        if (!(prev & REG_MCMDR_RXON) &&
++
-+            (value & REG_MCMDR_RXON)) {
++    if (a->l) {
-+            emc->regs[REG_CRXDSA] = emc->regs[REG_RXDLSA];
++        /* VMRS, move FP system register to gp register */
-+        } else if ((prev & REG_MCMDR_RXON) &&
++        return gen_M_fp_sysreg_read(s, a->reg, fp_sysreg_to_gpr, a);
-+                   !(value & REG_MCMDR_RXON)) {
++    } else {
-+            emc->regs[REG_MGSTA] |= REG_MGSTA_RXHA;
++        /* VMSR, move gp register to FP system register */
 +        return gen_M_fp_sysreg_write(s, a->reg, gpr_to_fp_sysreg, a);
 +    }
 +}
 +
 +static void fp_sysreg_to_memory(DisasContext *s, void *opaque, TCGv_i32 value)
 +{
 +    arg_vldr_sysreg *a = opaque;
 +    uint32_t offset = a->imm;
 +    TCGv_i32 addr;
 +
 +    if (!a->a) {
 +        offset = -offset;
 +    }
 +
 +    addr = load_reg(s, a->rn);
 +    if (a->p) {
 +        tcg_gen_addi_i32(addr, addr, offset);
 +    }
 +
 +    if (s->v8m_stackcheck && a->rn == 13 && a->w) {
 +        gen_helper_v8m_stackcheck(cpu_env, addr);
 +    }
 +
 +    gen_aa32_st_i32(s, value, addr, get_mem_index(s),
 +                    MO_UL | MO_ALIGN | s->be_data);
 +    tcg_temp_free_i32(value);
 +
 +    if (a->w) {
 +        /* writeback */
 +        if (!a->p) {
 +            tcg_gen_addi_i32(addr, addr, offset);
 +        }
-+        if (!(value & REG_MCMDR_RXON)) {
++        store_reg(s, a->rn, addr);
-+            emc_halt_rx(emc, 0);
++    } else {
 +        tcg_temp_free_i32(addr);
 +    }
 +}
 +
 +static TCGv_i32 memory_to_fp_sysreg(DisasContext *s, void *opaque)
 +{
 +    arg_vldr_sysreg *a = opaque;
 +    uint32_t offset = a->imm;
 +    TCGv_i32 addr;
 +    TCGv_i32 value = tcg_temp_new_i32();
 +
 +    if (!a->a) {
 +        offset = -offset;
 +    }
 +
 +    addr = load_reg(s, a->rn);
 +    if (a->p) {
 +        tcg_gen_addi_i32(addr, addr, offset);
 +    }
 +
 +    if (s->v8m_stackcheck && a->rn == 13 && a->w) {
 +        gen_helper_v8m_stackcheck(cpu_env, addr);
 +    }
 +
 +    gen_aa32_ld_i32(s, value, addr, get_mem_index(s),
 +                    MO_UL | MO_ALIGN | s->be_data);
 +
 +    if (a->w) {
 +        /* writeback */
 +        if (!a->p) {
 +            tcg_gen_addi_i32(addr, addr, offset);
 +        }
-+        break;
++        store_reg(s, a->rn, addr);
-+    }
++    } else {
-+    case REG_TXDLSA:
++        tcg_temp_free_i32(addr);
-+    case REG_RXDLSA:
++    }
-+    case REG_DMARFC:
++    return value;
 +    case REG_MIID:
 +        emc->regs[reg] = value;
 +        break;
 +    case REG_MIEN:
 +        emc->regs[reg] = value;
 +        emc_update_irq_from_reg_change(emc);
 +        break;
 +    case REG_MISTA:
 +        /* Clear the bits that have 1 in "value". */
 +        emc->regs[reg] &= ~value;
 +        emc_update_irq_from_reg_change(emc);
 +        break;
 +    case REG_MGSTA:
 +        /* Clear the bits that have 1 in "value". */
 +        emc->regs[reg] &= ~value;
 +        break;
 +    case REG_TSDR:
 +        if (emc->regs[REG_MCMDR] & REG_MCMDR_TXON) {
 +            emc->tx_active = true;
 +            /* Keep trying to send packets until we run out. */
 +            while (emc->tx_active) {
 +                emc_try_send_next_packet(emc);
 +            }
 +        }
 +        break;
 +    case REG_RSDR:
 +        if (emc->regs[REG_MCMDR] & REG_MCMDR_RXON) {
 +            emc->rx_active = true;
 +            emc_try_receive_next_packet(emc);
 +        }
 +        break;
 +    case REG_MIIDA:
 +        emc->regs[reg] = value & ~REG_MIIDA_BUSY;
 +        break;
 +    case REG_MRPC:
 +    case REG_MRPCC:
 +    case REG_MREPC:
 +    case REG_CTXDSA:
 +    case REG_CTXBSA:
 +    case REG_CRXDSA:
 +    case REG_CRXBSA:
 +        qemu_log_mask(LOG_GUEST_ERROR,
 +                      "%s: Write to read-only reg %s/%d\n",
 +                      __func__, emc_reg_name(reg), reg);
 +        break;
 +    default:
 +        qemu_log_mask(LOG_UNIMP, "%s: Write to unimplemented reg %s/%d\n",
 +                      __func__, emc_reg_name(reg), reg);
 +        break;
 +    }
 +}
 +
-+static const struct MemoryRegionOps npcm7xx_emc_ops = {
++static bool trans_VLDR_sysreg(DisasContext *s, arg_vldr_sysreg *a)
 +    .read = npcm7xx_emc_read,
 +    .write = npcm7xx_emc_write,
 +    .endianness = DEVICE_LITTLE_ENDIAN,
 +    .valid = {
 +        .min_access_size = 4,
 +        .max_access_size = 4,
 +        .unaligned = false,
 +    },
 +};
 +
 +static void emc_cleanup(NetClientState *nc)
 +{
-+    /* Nothing to do yet. */
++    if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
 +        return false;
 +    }
 +    if (a->rn == 15) {
 +        return false;
 +    }
 +    return gen_M_fp_sysreg_write(s, a->reg, memory_to_fp_sysreg, a);
 +}
 +
-+static NetClientInfo net_npcm7xx_emc_info = {
++static bool trans_VSTR_sysreg(DisasContext *s, arg_vldr_sysreg *a)
 +    .type = NET_CLIENT_DRIVER_NIC,
 +    .size = sizeof(NICState),
 +    .can_receive = emc_can_receive,
 +    .receive = emc_receive,
 +    .cleanup = emc_cleanup,
 +    .link_status_changed = emc_set_link,
 +};
 +
 +static void npcm7xx_emc_realize(DeviceState *dev, Error **errp)
 +{
-+    NPCM7xxEMCState *emc = NPCM7XX_EMC(dev);
++    if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
-+    SysBusDevice *sbd = SYS_BUS_DEVICE(emc);
++        return false;
-+
++    }
-+    memory_region_init_io(&emc->iomem, OBJECT(emc), &npcm7xx_emc_ops, emc,
++    if (a->rn == 15) {
-+                          TYPE_NPCM7XX_EMC, 4 * KiB);
++        return false;
-+    sysbus_init_mmio(sbd, &emc->iomem);
++    }
-+    sysbus_init_irq(sbd, &emc->tx_irq);
++    return gen_M_fp_sysreg_read(s, a->reg, fp_sysreg_to_memory, a);
 +    sysbus_init_irq(sbd, &emc->rx_irq);
 +
 +    qemu_macaddr_default_if_unset(&emc->conf.macaddr);
 +    emc->nic = qemu_new_nic(&net_npcm7xx_emc_info, &emc->conf,
 +                            object_get_typename(OBJECT(dev)), dev->id, emc);
 +    qemu_format_nic_info_str(qemu_get_queue(emc->nic), emc->conf.macaddr.a);
 +}
 +
-+static void npcm7xx_emc_unrealize(DeviceState *dev)
+ static bool trans_NOCP(DisasContext *s, arg_nocp *a)
-+{
+ {
-+    NPCM7xxEMCState *emc = NPCM7XX_EMC(dev);
+     /*
-+
+diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
 +    qemu_del_nic(emc->nic);
 +}
 +
 +static const VMStateDescription vmstate_npcm7xx_emc = {
 +    .name = TYPE_NPCM7XX_EMC,
 +    .version_id = 0,
 +    .minimum_version_id = 0,
 +    .fields = (VMStateField[]) {
 +        VMSTATE_UINT8(emc_num, NPCM7xxEMCState),
 +        VMSTATE_UINT32_ARRAY(regs, NPCM7xxEMCState, NPCM7XX_NUM_EMC_REGS),
 +        VMSTATE_BOOL(tx_active, NPCM7xxEMCState),
 +        VMSTATE_BOOL(rx_active, NPCM7xxEMCState),
 +        VMSTATE_END_OF_LIST(),
 +    },
 +};
 +
 +static Property npcm7xx_emc_properties[] = {
 +    DEFINE_NIC_PROPERTIES(NPCM7xxEMCState, conf),
 +    DEFINE_PROP_END_OF_LIST(),
 +};
 +
 +static void npcm7xx_emc_class_init(ObjectClass *klass, void *data)
 +{
 +    DeviceClass *dc = DEVICE_CLASS(klass);
 +
 +    set_bit(DEVICE_CATEGORY_NETWORK, dc->categories);
 +    dc->desc = "NPCM7xx EMC Controller";
 +    dc->realize = npcm7xx_emc_realize;
 +    dc->unrealize = npcm7xx_emc_unrealize;
 +    dc->reset = npcm7xx_emc_reset;
 +    dc->vmsd = &vmstate_npcm7xx_emc;
 +    device_class_set_props(dc, npcm7xx_emc_properties);
 +}
 +
 +static const TypeInfo npcm7xx_emc_info = {
 +    .name = TYPE_NPCM7XX_EMC,
 +    .parent = TYPE_SYS_BUS_DEVICE,
 +    .instance_size = sizeof(NPCM7xxEMCState),
 +    .class_init = npcm7xx_emc_class_init,
 +};
 +
 +static void npcm7xx_emc_register_type(void)
 +{
 +    type_register_static(&npcm7xx_emc_info);
 +}
 +
 +type_init(npcm7xx_emc_register_type)
 diff --git a/hw/net/meson.build b/hw/net/meson.build
 index XXXXXXX..XXXXXXX 100644
---- a/hw/net/meson.build
+--- a/target/arm/translate-vfp.c
-+++ b/hw/net/meson.build
++++ b/target/arm/translate-vfp.c
-@@ -XXX,XX +XXX,XX @@ softmmu_ss.add(when: 'CONFIG_I82596_COMMON', if_true: files('i82596.c'))
+@@ -XXX,XX +XXX,XX @@ static inline long vfp_f16_offset(unsigned reg, bool top)
- softmmu_ss.add(when: 'CONFIG_SUNHME', if_true: files('sunhme.c'))
+  * Generate code for M-profile lazy FP state preservation if needed;
- softmmu_ss.add(when: 'CONFIG_FTGMAC100', if_true: files('ftgmac100.c'))
+  * this corresponds to the pseudocode PreserveFPState() function.
- softmmu_ss.add(when: 'CONFIG_SUNGEM', if_true: files('sungem.c'))
+  */
-+softmmu_ss.add(when: 'CONFIG_NPCM7XX', if_true: files('npcm7xx_emc.c'))
+-static void gen_preserve_fp_state(DisasContext *s)
++void gen_preserve_fp_state(DisasContext *s)
- softmmu_ss.add(when: 'CONFIG_ETRAXFS', if_true: files('etraxfs_eth.c'))
+ {
- softmmu_ss.add(when: 'CONFIG_COLDFIRE', if_true: files('mcf_fec.c'))
+     if (s->v7m_lspact) {
-diff --git a/hw/net/trace-events b/hw/net/trace-events
+         /*
-index XXXXXXX..XXXXXXX 100644
+@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
---- a/hw/net/trace-events
+     return true;
-+++ b/hw/net/trace-events
+ }
-@@ -XXX,XX +XXX,XX @@ imx_fec_receive_last(int last) "rx frame flags 0x%04x"
- imx_enet_receive(size_t size) "len %zu"
+-/*
- imx_enet_receive_len(uint64_t addr, int len) "rx_bd 0x%"PRIx64" length %d"
+- * M-profile provides two different sets of instructions that can
- imx_enet_receive_last(int last) "rx frame flags 0x%04x"
+- * access floating point system registers: VMSR/VMRS (which move
-+
+- * to/from a general purpose register) and VLDR/VSTR sysreg (which
-+# npcm7xx_emc.c
+- * move directly to/from memory). In some cases there are also side
-+npcm7xx_emc_reset(int emc_num) "Resetting emc%d"
+- * effects which must happen after any write to memory (which could
-+npcm7xx_emc_update_tx_irq(int level) "Setting tx irq to %d"
+- * cause an exception). So we implement the common logic for the
-+npcm7xx_emc_update_rx_irq(int level) "Setting rx irq to %d"
+- * sysreg access in gen_M_fp_sysreg_write() and gen_M_fp_sysreg_read(),
-+npcm7xx_emc_set_mista(uint32_t flags) "ORing 0x%x into MISTA"
+- * which take pointers to callback functions which will perform the
-+npcm7xx_emc_cpu_owned_desc(uint32_t addr) "Can't process cpu-owned descriptor @0x%x"
+- * actual "read/write general purpose register" and "read/write
-+npcm7xx_emc_sent_packet(uint32_t len) "Sent %u byte packet"
+- * memory" operations.
-+npcm7xx_emc_tx_done(uint32_t ctxdsa) "TX done, CTXDSA=0x%x"
+- */
-+npcm7xx_emc_can_receive(int can_receive) "Can receive: %d"
+-
-+npcm7xx_emc_packet_filtered_out(const char* fail_reason) "Packet filtered out: %s"
+-/*
-+npcm7xx_emc_packet_dropped(uint32_t len) "%u byte packet dropped"
+- * Emit code to store the sysreg to its final destination; frees the
-+npcm7xx_emc_receiving_packet(uint32_t len) "Receiving %u byte packet"
+- * TCG temp 'value' it is passed.
-+npcm7xx_emc_received_packet(uint32_t len) "Received %u byte packet"
+- */
-+npcm7xx_emc_rx_done(uint32_t crxdsa) "RX done, CRXDSA=0x%x"
+-typedef void fp_sysreg_storefn(DisasContext *s, void *opaque, TCGv_i32 value);
-+npcm7xx_emc_reg_read(int emc_num, uint32_t result, const char *name, int regno) "emc%d: 0x%x = reg[%s/%d]"
+-/*
-+npcm7xx_emc_reg_write(int emc_num, const char *name, int regno, uint32_t value) "emc%d: reg[%s/%d] = 0x%x"
+- * Emit code to load the value to be copied to the sysreg; returns
 - * a new TCG temporary
 - */
 -typedef TCGv_i32 fp_sysreg_loadfn(DisasContext *s, void *opaque);
 -
 -/* Common decode/access checks for fp sysreg read/write */
 -typedef enum FPSysRegCheckResult {
 -    FPSysRegCheckFailed, /* caller should return false */
 -    FPSysRegCheckDone, /* caller should return true */
 -    FPSysRegCheckContinue, /* caller should continue generating code */
 -} FPSysRegCheckResult;
 -
 -static FPSysRegCheckResult fp_sysreg_checks(DisasContext *s, int regno)
 -{
 -    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
 -        return FPSysRegCheckFailed;
 -    }
 -
 -    switch (regno) {
 -    case ARM_VFP_FPSCR:
 -    case QEMU_VFP_FPSCR_NZCV:
 -        break;
 -    case ARM_VFP_FPSCR_NZCVQC:
 -        if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
 -            return FPSysRegCheckFailed;
 -        }
 -        break;
 -    case ARM_VFP_FPCXT_S:
 -    case ARM_VFP_FPCXT_NS:
 -        if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
 -            return FPSysRegCheckFailed;
 -        }
 -        if (!s->v8m_secure) {
 -            return FPSysRegCheckFailed;
 -        }
 -        break;
 -    case ARM_VFP_VPR:
 -    case ARM_VFP_P0:
 -        if (!dc_isar_feature(aa32_mve, s)) {
 -            return FPSysRegCheckFailed;
 -        }
 -        break;
 -    default:
 -        return FPSysRegCheckFailed;
 -    }
 -
 -    /*
 -     * FPCXT_NS is a special case: it has specific handling for
 -     * "current FP state is inactive", and must do the PreserveFPState()
 -     * but not the usual full set of actions done by ExecuteFPCheck().
 -     * So we don't call vfp_access_check() and the callers must handle this.
 -     */
 -    if (regno != ARM_VFP_FPCXT_NS && !vfp_access_check(s)) {
 -        return FPSysRegCheckDone;
 -    }
 -    return FPSysRegCheckContinue;
 -}
 -
 -static void gen_branch_fpInactive(DisasContext *s, TCGCond cond,
 -                                  TCGLabel *label)
 -{
 -    /*
 -     * FPCXT_NS is a special case: it has specific handling for
 -     * "current FP state is inactive", and must do the PreserveFPState()
 -     * but not the usual full set of actions done by ExecuteFPCheck().
 -     * We don't have a TB flag that matches the fpInactive check, so we
 -     * do it at runtime as we don't expect FPCXT_NS accesses to be frequent.
 -     *
 -     * Emit code that checks fpInactive and does a conditional
 -     * branch to label based on it:
 -     *  if cond is TCG_COND_NE then branch if fpInactive != 0 (ie if inactive)
 -     *  if cond is TCG_COND_EQ then branch if fpInactive == 0 (ie if active)
 -     */
 -    assert(cond == TCG_COND_EQ || cond == TCG_COND_NE);
 -
 -    /* fpInactive = FPCCR_NS.ASPEN == 1 && CONTROL.FPCA == 0 */
 -    TCGv_i32 aspen, fpca;
 -    aspen = load_cpu_field(v7m.fpccr[M_REG_NS]);
 -    fpca = load_cpu_field(v7m.control[M_REG_S]);
 -    tcg_gen_andi_i32(aspen, aspen, R_V7M_FPCCR_ASPEN_MASK);
 -    tcg_gen_xori_i32(aspen, aspen, R_V7M_FPCCR_ASPEN_MASK);
 -    tcg_gen_andi_i32(fpca, fpca, R_V7M_CONTROL_FPCA_MASK);
 -    tcg_gen_or_i32(fpca, fpca, aspen);
 -    tcg_gen_brcondi_i32(tcg_invert_cond(cond), fpca, 0, label);
 -    tcg_temp_free_i32(aspen);
 -    tcg_temp_free_i32(fpca);
 -}
 -
 -static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
 -                                  fp_sysreg_loadfn *loadfn,
 -                                  void *opaque)
 -{
 -    /* Do a write to an M-profile floating point system register */
 -    TCGv_i32 tmp;
 -    TCGLabel *lab_end = NULL;
 -
 -    switch (fp_sysreg_checks(s, regno)) {
 -    case FPSysRegCheckFailed:
 -        return false;
 -    case FPSysRegCheckDone:
 -        return true;
 -    case FPSysRegCheckContinue:
 -        break;
 -    }
 -
 -    switch (regno) {
 -    case ARM_VFP_FPSCR:
 -        tmp = loadfn(s, opaque);
 -        gen_helper_vfp_set_fpscr(cpu_env, tmp);
 -        tcg_temp_free_i32(tmp);
 -        gen_lookup_tb(s);
 -        break;
 -    case ARM_VFP_FPSCR_NZCVQC:
 -    {
 -        TCGv_i32 fpscr;
 -        tmp = loadfn(s, opaque);
 -        if (dc_isar_feature(aa32_mve, s)) {
 -            /* QC is only present for MVE; otherwise RES0 */
 -            TCGv_i32 qc = tcg_temp_new_i32();
 -            tcg_gen_andi_i32(qc, tmp, FPCR_QC);
 -            /*
 -             * The 4 vfp.qc[] fields need only be "zero" vs "non-zero";
 -             * here writing the same value into all elements is simplest.
 -             */
 -            tcg_gen_gvec_dup_i32(MO_32, offsetof(CPUARMState, vfp.qc),
 -                                 16, 16, qc);
 -        }
 -        tcg_gen_andi_i32(tmp, tmp, FPCR_NZCV_MASK);
 -        fpscr = load_cpu_field(vfp.xregs[ARM_VFP_FPSCR]);
 -        tcg_gen_andi_i32(fpscr, fpscr, ~FPCR_NZCV_MASK);
 -        tcg_gen_or_i32(fpscr, fpscr, tmp);
 -        store_cpu_field(fpscr, vfp.xregs[ARM_VFP_FPSCR]);
 -        tcg_temp_free_i32(tmp);
 -        break;
 -    }
 -    case ARM_VFP_FPCXT_NS:
 -        lab_end = gen_new_label();
 -        /* fpInactive case: write is a NOP, so branch to end */
 -        gen_branch_fpInactive(s, TCG_COND_NE, lab_end);
 -        /*
 -         * !fpInactive: if FPU disabled, take NOCP exception;
 -         * otherwise PreserveFPState(), and then FPCXT_NS writes
 -         * behave the same as FPCXT_S writes.
 -         */
 -        if (s->fp_excp_el) {
 -            gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
 -                               syn_uncategorized(), s->fp_excp_el);
 -            /*
 -             * This was only a conditional exception, so override
 -             * gen_exception_insn()'s default to DISAS_NORETURN
 -             */
 -            s->base.is_jmp = DISAS_NEXT;
 -            break;
 -        }
 -        gen_preserve_fp_state(s);
 -        /* fall through */
 -    case ARM_VFP_FPCXT_S:
 -    {
 -        TCGv_i32 sfpa, control;
 -        /*
 -         * Set FPSCR and CONTROL.SFPA from value; the new FPSCR takes
 -         * bits [27:0] from value and zeroes bits [31:28].
 -         */
 -        tmp = loadfn(s, opaque);
 -        sfpa = tcg_temp_new_i32();
 -        tcg_gen_shri_i32(sfpa, tmp, 31);
 -        control = load_cpu_field(v7m.control[M_REG_S]);
 -        tcg_gen_deposit_i32(control, control, sfpa,
 -                            R_V7M_CONTROL_SFPA_SHIFT, 1);
 -        store_cpu_field(control, v7m.control[M_REG_S]);
 -        tcg_gen_andi_i32(tmp, tmp, ~FPCR_NZCV_MASK);
 -        gen_helper_vfp_set_fpscr(cpu_env, tmp);
 -        tcg_temp_free_i32(tmp);
 -        tcg_temp_free_i32(sfpa);
 -        break;
 -    }
 -    case ARM_VFP_VPR:
 -        /* Behaves as NOP if not privileged */
 -        if (IS_USER(s)) {
 -            break;
 -        }
 -        tmp = loadfn(s, opaque);
 -        store_cpu_field(tmp, v7m.vpr);
 -        break;
 -    case ARM_VFP_P0:
 -    {
 -        TCGv_i32 vpr;
 -        tmp = loadfn(s, opaque);
 -        vpr = load_cpu_field(v7m.vpr);
 -        tcg_gen_deposit_i32(vpr, vpr, tmp,
 -                            R_V7M_VPR_P0_SHIFT, R_V7M_VPR_P0_LENGTH);
 -        store_cpu_field(vpr, v7m.vpr);
 -        tcg_temp_free_i32(tmp);
 -        break;
 -    }
 -    default:
 -        g_assert_not_reached();
 -    }
 -    if (lab_end) {
 -        gen_set_label(lab_end);
 -    }
 -    return true;
 -}
 -
 -static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
 -                                 fp_sysreg_storefn *storefn,
 -                                 void *opaque)
 -{
 -    /* Do a read from an M-profile floating point system register */
 -    TCGv_i32 tmp;
 -    TCGLabel *lab_end = NULL;
 -    bool lookup_tb = false;
 -
 -    switch (fp_sysreg_checks(s, regno)) {
 -    case FPSysRegCheckFailed:
 -        return false;
 -    case FPSysRegCheckDone:
 -        return true;
 -    case FPSysRegCheckContinue:
 -        break;
 -    }
 -
 -    if (regno == ARM_VFP_FPSCR_NZCVQC && !dc_isar_feature(aa32_mve, s)) {
 -        /* QC is RES0 without MVE, so NZCVQC simplifies to NZCV */
 -        regno = QEMU_VFP_FPSCR_NZCV;
 -    }
 -
 -    switch (regno) {
 -    case ARM_VFP_FPSCR:
 -        tmp = tcg_temp_new_i32();
 -        gen_helper_vfp_get_fpscr(tmp, cpu_env);
 -        storefn(s, opaque, tmp);
 -        break;
 -    case ARM_VFP_FPSCR_NZCVQC:
 -        tmp = tcg_temp_new_i32();
 -        gen_helper_vfp_get_fpscr(tmp, cpu_env);
 -        tcg_gen_andi_i32(tmp, tmp, FPCR_NZCVQC_MASK);
 -        storefn(s, opaque, tmp);
 -        break;
 -    case QEMU_VFP_FPSCR_NZCV:
 -        /*
 -         * Read just NZCV; this is a special case to avoid the
 -         * helper call for the "VMRS to CPSR.NZCV" insn.
 -         */
 -        tmp = load_cpu_field(vfp.xregs[ARM_VFP_FPSCR]);
 -        tcg_gen_andi_i32(tmp, tmp, FPCR_NZCV_MASK);
 -        storefn(s, opaque, tmp);
 -        break;
 -    case ARM_VFP_FPCXT_S:
 -    {
 -        TCGv_i32 control, sfpa, fpscr;
 -        /* Bits [27:0] from FPSCR, bit [31] from CONTROL.SFPA */
 -        tmp = tcg_temp_new_i32();
 -        sfpa = tcg_temp_new_i32();
 -        gen_helper_vfp_get_fpscr(tmp, cpu_env);
 -        tcg_gen_andi_i32(tmp, tmp, ~FPCR_NZCV_MASK);
 -        control = load_cpu_field(v7m.control[M_REG_S]);
 -        tcg_gen_andi_i32(sfpa, control, R_V7M_CONTROL_SFPA_MASK);
 -        tcg_gen_shli_i32(sfpa, sfpa, 31 - R_V7M_CONTROL_SFPA_SHIFT);
 -        tcg_gen_or_i32(tmp, tmp, sfpa);
 -        tcg_temp_free_i32(sfpa);
 -        /*
 -         * Store result before updating FPSCR etc, in case
 -         * it is a memory write which causes an exception.
 -         */
 -        storefn(s, opaque, tmp);
 -        /*
 -         * Now we must reset FPSCR from FPDSCR_NS, and clear
 -         * CONTROL.SFPA; so we'll end the TB here.
 -         */
 -        tcg_gen_andi_i32(control, control, ~R_V7M_CONTROL_SFPA_MASK);
 -        store_cpu_field(control, v7m.control[M_REG_S]);
 -        fpscr = load_cpu_field(v7m.fpdscr[M_REG_NS]);
 -        gen_helper_vfp_set_fpscr(cpu_env, fpscr);
 -        tcg_temp_free_i32(fpscr);
 -        lookup_tb = true;
 -        break;
 -    }
 -    case ARM_VFP_FPCXT_NS:
 -    {
 -        TCGv_i32 control, sfpa, fpscr, fpdscr, zero;
 -        TCGLabel *lab_active = gen_new_label();
 -
 -        lookup_tb = true;
 -
 -        gen_branch_fpInactive(s, TCG_COND_EQ, lab_active);
 -        /* fpInactive case: reads as FPDSCR_NS */
 -        TCGv_i32 tmp = load_cpu_field(v7m.fpdscr[M_REG_NS]);
 -        storefn(s, opaque, tmp);
 -        lab_end = gen_new_label();
 -        tcg_gen_br(lab_end);
 -
 -        gen_set_label(lab_active);
 -        /*
 -         * !fpInactive: if FPU disabled, take NOCP exception;
 -         * otherwise PreserveFPState(), and then FPCXT_NS
 -         * reads the same as FPCXT_S.
 -         */
 -        if (s->fp_excp_el) {
 -            gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
 -                               syn_uncategorized(), s->fp_excp_el);
 -            /*
 -             * This was only a conditional exception, so override
 -             * gen_exception_insn()'s default to DISAS_NORETURN
 -             */
 -            s->base.is_jmp = DISAS_NEXT;
 -            break;
 -        }
 -        gen_preserve_fp_state(s);
 -        tmp = tcg_temp_new_i32();
 -        sfpa = tcg_temp_new_i32();
 -        fpscr = tcg_temp_new_i32();
 -        gen_helper_vfp_get_fpscr(fpscr, cpu_env);
 -        tcg_gen_andi_i32(tmp, fpscr, ~FPCR_NZCV_MASK);
 -        control = load_cpu_field(v7m.control[M_REG_S]);
 -        tcg_gen_andi_i32(sfpa, control, R_V7M_CONTROL_SFPA_MASK);
 -        tcg_gen_shli_i32(sfpa, sfpa, 31 - R_V7M_CONTROL_SFPA_SHIFT);
 -        tcg_gen_or_i32(tmp, tmp, sfpa);
 -        tcg_temp_free_i32(control);
 -        /* Store result before updating FPSCR, in case it faults */
 -        storefn(s, opaque, tmp);
 -        /* If SFPA is zero then set FPSCR from FPDSCR_NS */
 -        fpdscr = load_cpu_field(v7m.fpdscr[M_REG_NS]);
 -        zero = tcg_const_i32(0);
 -        tcg_gen_movcond_i32(TCG_COND_EQ, fpscr, sfpa, zero, fpdscr, fpscr);
 -        gen_helper_vfp_set_fpscr(cpu_env, fpscr);
 -        tcg_temp_free_i32(zero);
 -        tcg_temp_free_i32(sfpa);
 -        tcg_temp_free_i32(fpdscr);
 -        tcg_temp_free_i32(fpscr);
 -        break;
 -    }
 -    case ARM_VFP_VPR:
 -        /* Behaves as NOP if not privileged */
 -        if (IS_USER(s)) {
 -            break;
 -        }
 -        tmp = load_cpu_field(v7m.vpr);
 -        storefn(s, opaque, tmp);
 -        break;
 -    case ARM_VFP_P0:
 -        tmp = load_cpu_field(v7m.vpr);
 -        tcg_gen_extract_i32(tmp, tmp, R_V7M_VPR_P0_SHIFT, R_V7M_VPR_P0_LENGTH);
 -        storefn(s, opaque, tmp);
 -        break;
 -    default:
 -        g_assert_not_reached();
 -    }
 -
 -    if (lab_end) {
 -        gen_set_label(lab_end);
 -    }
 -    if (lookup_tb) {
 -        gen_lookup_tb(s);
 -    }
 -    return true;
 -}
 -
 -static void fp_sysreg_to_gpr(DisasContext *s, void *opaque, TCGv_i32 value)
 -{
 -    arg_VMSR_VMRS *a = opaque;
 -
 -    if (a->rt == 15) {
 -        /* Set the 4 flag bits in the CPSR */
 -        gen_set_nzcv(value);
 -        tcg_temp_free_i32(value);
 -    } else {
 -        store_reg(s, a->rt, value);
 -    }
 -}
 -
 -static TCGv_i32 gpr_to_fp_sysreg(DisasContext *s, void *opaque)
 -{
 -    arg_VMSR_VMRS *a = opaque;
 -
 -    return load_reg(s, a->rt);
 -}
 -
 -static bool gen_M_VMSR_VMRS(DisasContext *s, arg_VMSR_VMRS *a)
 -{
 -    /*
 -     * Accesses to R15 are UNPREDICTABLE; we choose to undef.
 -     * FPSCR -> r15 is a special case which writes to the PSR flags;
 -     * set a->reg to a special value to tell gen_M_fp_sysreg_read()
 -     * we only care about the top 4 bits of FPSCR there.
 -     */
 -    if (a->rt == 15) {
 -        if (a->l && a->reg == ARM_VFP_FPSCR) {
 -            a->reg = QEMU_VFP_FPSCR_NZCV;
 -        } else {
 -            return false;
 -        }
 -    }
 -
 -    if (a->l) {
 -        /* VMRS, move FP system register to gp register */
 -        return gen_M_fp_sysreg_read(s, a->reg, fp_sysreg_to_gpr, a);
 -    } else {
 -        /* VMSR, move gp register to FP system register */
 -        return gen_M_fp_sysreg_write(s, a->reg, gpr_to_fp_sysreg, a);
 -    }
 -}
 -
  static bool trans_VMSR_VMRS(DisasContext *s, arg_VMSR_VMRS *a)
  {
      TCGv_i32 tmp;
      bool ignore_vfp_enabled = false;
      if (arm_dc_feature(s, ARM_FEATURE_M)) {
 -        return gen_M_VMSR_VMRS(s, a);
 +        /* M profile version was already handled in m-nocp.decode */
 +        return false;
      }
      if (!dc_isar_feature(aa32_fpsp_v2, s)) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VMSR_VMRS(DisasContext *s, arg_VMSR_VMRS *a)
      return true;
  }
 -static void fp_sysreg_to_memory(DisasContext *s, void *opaque, TCGv_i32 value)
 -{
 -    arg_vldr_sysreg *a = opaque;
 -    uint32_t offset = a->imm;
 -    TCGv_i32 addr;
 -
 -    if (!a->a) {
 -        offset = -offset;
 -    }
 -
 -    addr = load_reg(s, a->rn);
 -    if (a->p) {
 -        tcg_gen_addi_i32(addr, addr, offset);
 -    }
 -
 -    if (s->v8m_stackcheck && a->rn == 13 && a->w) {
 -        gen_helper_v8m_stackcheck(cpu_env, addr);
 -    }
 -
 -    gen_aa32_st_i32(s, value, addr, get_mem_index(s),
 -                    MO_UL | MO_ALIGN | s->be_data);
 -    tcg_temp_free_i32(value);
 -
 -    if (a->w) {
 -        /* writeback */
 -        if (!a->p) {
 -            tcg_gen_addi_i32(addr, addr, offset);
 -        }
 -        store_reg(s, a->rn, addr);
 -    } else {
 -        tcg_temp_free_i32(addr);
 -    }
 -}
 -
 -static TCGv_i32 memory_to_fp_sysreg(DisasContext *s, void *opaque)
 -{
 -    arg_vldr_sysreg *a = opaque;
 -    uint32_t offset = a->imm;
 -    TCGv_i32 addr;
 -    TCGv_i32 value = tcg_temp_new_i32();
 -
 -    if (!a->a) {
 -        offset = -offset;
 -    }
 -
 -    addr = load_reg(s, a->rn);
 -    if (a->p) {
 -        tcg_gen_addi_i32(addr, addr, offset);
 -    }
 -
 -    if (s->v8m_stackcheck && a->rn == 13 && a->w) {
 -        gen_helper_v8m_stackcheck(cpu_env, addr);
 -    }
 -
 -    gen_aa32_ld_i32(s, value, addr, get_mem_index(s),
 -                    MO_UL | MO_ALIGN | s->be_data);
 -
 -    if (a->w) {
 -        /* writeback */
 -        if (!a->p) {
 -            tcg_gen_addi_i32(addr, addr, offset);
 -        }
 -        store_reg(s, a->rn, addr);
 -    } else {
 -        tcg_temp_free_i32(addr);
 -    }
 -    return value;
 -}
 -
 -static bool trans_VLDR_sysreg(DisasContext *s, arg_vldr_sysreg *a)
 -{
 -    if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
 -        return false;
 -    }
 -    if (a->rn == 15) {
 -        return false;
 -    }
 -    return gen_M_fp_sysreg_write(s, a->reg, memory_to_fp_sysreg, a);
 -}
 -
 -static bool trans_VSTR_sysreg(DisasContext *s, arg_vldr_sysreg *a)
 -{
 -    if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
 -        return false;
 -    }
 -    if (a->rn == 15) {
 -        return false;
 -    }
 -    return gen_M_fp_sysreg_read(s, a->reg, fp_sysreg_to_memory, a);
 -}
  static bool trans_VMOV_half(DisasContext *s, arg_VMOV_single *a)
  {
 --
 .20.1

-New patch
+[PULL 08/57] target/arm: Handle writeback in VLDR/VSTR sysreg with no memory access
+A few subcases of VLDR/VSTR sysreg succeed but do not perform a
 memory access:
  * VSTR of VPR when unprivileged
  * VLDR to VPR when unprivileged
  * VLDR to FPCXT_NS when fpInactive
 In these cases, even though we don't do the memory access we should
 still update the base register and perform the stack limit check if
 the insn's addressing mode specifies writeback.  Our implementation
 failed to do this, because we handle these side-effects inside the
 memory_to_fp_sysreg() and fp_sysreg_to_memory() callback functions,
 which are only called if there's something to load or store.
 Fix this by adding an extra argument to the callbacks which is set to
 true to actually perform the access and false to only do side effects
 like writeback, and calling the callback with do_access = false
 for the three cases listed above.
 This produces slightly suboptimal code for the case of a write
 to FPCXT_NS when the FPU is inactive and the insn didn't have
 side effects (ie no writeback, or via VMSR), in which case we'll
 generate a conditional branch over an unconditional branch.
 But this doesn't seem to be important enough to merit requiring
 the callback to report back whether it generated any code or not.
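The side-effect-only path can be modelled outside TCG. The following is a minimal sketch in plain C, not QEMU code: ToyCPU, SysregAddrMode and toy_sysreg_to_memory() are hypothetical names, and direct array updates stand in for the TCG ops that fp_sysreg_to_memory() emits. It assumes the same p/a/w/imm/rn decode fields the patch uses, omits the v8M stack-limit check, and shows that with do_access false the base register is still written back while memory is left untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decode fields, named after the patch's arg_vldr_sysreg. */
typedef struct {
    bool p;        /* pre-indexed: apply offset before the access */
    bool a;        /* add: offset is +imm rather than -imm */
    bool w;        /* writeback: update the base register */
    uint32_t imm;
    int rn;        /* base register number */
} SysregAddrMode;

/* Toy machine state: registers plus a small word-addressed memory. */
typedef struct {
    uint32_t regs[16];
    uint32_t mem[64];
    int accesses;  /* number of memory stores actually performed */
} ToyCPU;

/*
 * Model of the store callback: do_access decides whether memory is
 * written; base-register writeback happens either way if w is set.
 */
static void toy_sysreg_to_memory(ToyCPU *cpu, const SysregAddrMode *a,
                                 uint32_t value, bool do_access)
{
    uint32_t offset = a->imm;
    uint32_t addr;

    if (!a->a) {
        offset = -offset;
    }
    if (!do_access && !a->w) {
        return;  /* no access and no side effects: nothing to do */
    }
    assert(a->rn >= 0 && a->rn < 16);
    addr = cpu->regs[a->rn];
    if (a->p) {
        addr += offset;
    }
    if (do_access) {
        assert(addr % 4 == 0 && addr / 4 < 64);
        cpu->mem[addr / 4] = value;
        cpu->accesses++;
    }
    if (a->w) {
        if (!a->p) {
            addr += offset;  /* post-indexed: offset applied after access */
        }
        cpu->regs[a->rn] = addr;
    }
}
```

For example, a pre-indexed store with writeback and imm = 8 on a base of 16 leaves the base at 24 whether or not the store happens; only do_access decides whether mem[] changes.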
 Cc: qemu-stable@nongnu.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20210618141019.10671-5-peter.maydell@linaro.org
 ---
  target/arm/translate-m-nocp.c | 102 ++++++++++++++++++++++++----------
  1 file changed, 72 insertions(+), 30 deletions(-)
 diff --git a/target/arm/translate-m-nocp.c b/target/arm/translate-m-nocp.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-m-nocp.c
 +++ b/target/arm/translate-m-nocp.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VSCCLRM(DisasContext *s, arg_VSCCLRM *a)
  /*
   * Emit code to store the sysreg to its final destination; frees the
 - * TCG temp 'value' it is passed.
 + * TCG temp 'value' it is passed. do_access is true to do the store,
 + * and false to skip it and only perform side-effects like base
 + * register writeback.
   */
 -typedef void fp_sysreg_storefn(DisasContext *s, void *opaque, TCGv_i32 value);
 +typedef void fp_sysreg_storefn(DisasContext *s, void *opaque, TCGv_i32 value,
 +                               bool do_access);
  /*
   * Emit code to load the value to be copied to the sysreg; returns
 - * a new TCG temporary
 + * a new TCG temporary. do_access is true to do the load,
 + * and false to skip it and only perform side-effects like base
 + * register writeback.
   */
 -typedef TCGv_i32 fp_sysreg_loadfn(DisasContext *s, void *opaque);
 +typedef TCGv_i32 fp_sysreg_loadfn(DisasContext *s, void *opaque,
 +                                  bool do_access);
  /* Common decode/access checks for fp sysreg read/write */
  typedef enum FPSysRegCheckResult {
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
      switch (regno) {
      case ARM_VFP_FPSCR:
 -        tmp = loadfn(s, opaque);
 +        tmp = loadfn(s, opaque, true);
          gen_helper_vfp_set_fpscr(cpu_env, tmp);
          tcg_temp_free_i32(tmp);
          gen_lookup_tb(s);
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
      case ARM_VFP_FPSCR_NZCVQC:
      {
          TCGv_i32 fpscr;
 -        tmp = loadfn(s, opaque);
 +        tmp = loadfn(s, opaque, true);
          if (dc_isar_feature(aa32_mve, s)) {
              /* QC is only present for MVE; otherwise RES0 */
              TCGv_i32 qc = tcg_temp_new_i32();
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
          break;
      }
      case ARM_VFP_FPCXT_NS:
 +    {
 +        TCGLabel *lab_active = gen_new_label();
 +
          lab_end = gen_new_label();
 -        /* fpInactive case: write is a NOP, so branch to end */
 -        gen_branch_fpInactive(s, TCG_COND_NE, lab_end);
 +        gen_branch_fpInactive(s, TCG_COND_EQ, lab_active);
 +        /*
 +         * fpInactive case: write is a NOP, so only do side effects
 +         * like register writeback before we branch to end
 +         */
 +        loadfn(s, opaque, false);
 +        tcg_gen_br(lab_end);
 +
 +        gen_set_label(lab_active);
          /*
           * !fpInactive: if FPU disabled, take NOCP exception;
           * otherwise PreserveFPState(), and then FPCXT_NS writes
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
              break;
          }
          gen_preserve_fp_state(s);
 -        /* fall through */
 +    }
 +    /* fall through */
      case ARM_VFP_FPCXT_S:
      {
          TCGv_i32 sfpa, control;
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
           * Set FPSCR and CONTROL.SFPA from value; the new FPSCR takes
           * bits [27:0] from value and zeroes bits [31:28].
           */
 -        tmp = loadfn(s, opaque);
 +        tmp = loadfn(s, opaque, true);
          sfpa = tcg_temp_new_i32();
          tcg_gen_shri_i32(sfpa, tmp, 31);
          control = load_cpu_field(v7m.control[M_REG_S]);
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
      case ARM_VFP_VPR:
          /* Behaves as NOP if not privileged */
          if (IS_USER(s)) {
 +            loadfn(s, opaque, false);
              break;
          }
 -        tmp = loadfn(s, opaque);
 +        tmp = loadfn(s, opaque, true);
          store_cpu_field(tmp, v7m.vpr);
          break;
      case ARM_VFP_P0:
      {
          TCGv_i32 vpr;
 -        tmp = loadfn(s, opaque);
 +        tmp = loadfn(s, opaque, true);
          vpr = load_cpu_field(v7m.vpr);
          tcg_gen_deposit_i32(vpr, vpr, tmp,
                              R_V7M_VPR_P0_SHIFT, R_V7M_VPR_P0_LENGTH);
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
      case ARM_VFP_FPSCR:
          tmp = tcg_temp_new_i32();
          gen_helper_vfp_get_fpscr(tmp, cpu_env);
 -        storefn(s, opaque, tmp);
 +        storefn(s, opaque, tmp, true);
          break;
      case ARM_VFP_FPSCR_NZCVQC:
          tmp = tcg_temp_new_i32();
          gen_helper_vfp_get_fpscr(tmp, cpu_env);
          tcg_gen_andi_i32(tmp, tmp, FPCR_NZCVQC_MASK);
 -        storefn(s, opaque, tmp);
 +        storefn(s, opaque, tmp, true);
          break;
      case QEMU_VFP_FPSCR_NZCV:
          /*
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
           */
          tmp = load_cpu_field(vfp.xregs[ARM_VFP_FPSCR]);
          tcg_gen_andi_i32(tmp, tmp, FPCR_NZCV_MASK);
 -        storefn(s, opaque, tmp);
 +        storefn(s, opaque, tmp, true);
          break;
      case ARM_VFP_FPCXT_S:
      {
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
           * Store result before updating FPSCR etc, in case
           * it is a memory write which causes an exception.
           */
 -        storefn(s, opaque, tmp);
 +        storefn(s, opaque, tmp, true);
          /*
           * Now we must reset FPSCR from FPDSCR_NS, and clear
           * CONTROL.SFPA; so we'll end the TB here.
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
          gen_branch_fpInactive(s, TCG_COND_EQ, lab_active);
          /* fpInactive case: reads as FPDSCR_NS */
          TCGv_i32 tmp = load_cpu_field(v7m.fpdscr[M_REG_NS]);
 -        storefn(s, opaque, tmp);
 +        storefn(s, opaque, tmp, true);
          lab_end = gen_new_label();
          tcg_gen_br(lab_end);
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
          tcg_gen_or_i32(tmp, tmp, sfpa);
          tcg_temp_free_i32(control);
          /* Store result before updating FPSCR, in case it faults */
 -        storefn(s, opaque, tmp);
 +        storefn(s, opaque, tmp, true);
          /* If SFPA is zero then set FPSCR from FPDSCR_NS */
          fpdscr = load_cpu_field(v7m.fpdscr[M_REG_NS]);
          zero = tcg_const_i32(0);
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
      case ARM_VFP_VPR:
          /* Behaves as NOP if not privileged */
          if (IS_USER(s)) {
 +            storefn(s, opaque, NULL, false);
              break;
          }
          tmp = load_cpu_field(v7m.vpr);
 -        storefn(s, opaque, tmp);
 +        storefn(s, opaque, tmp, true);
          break;
      case ARM_VFP_P0:
          tmp = load_cpu_field(v7m.vpr);
          tcg_gen_extract_i32(tmp, tmp, R_V7M_VPR_P0_SHIFT, R_V7M_VPR_P0_LENGTH);
 -        storefn(s, opaque, tmp);
 +        storefn(s, opaque, tmp, true);
          break;
      default:
          g_assert_not_reached();
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
      return true;
  }
 -static void fp_sysreg_to_gpr(DisasContext *s, void *opaque, TCGv_i32 value)
 +static void fp_sysreg_to_gpr(DisasContext *s, void *opaque, TCGv_i32 value,
 +                             bool do_access)
  {
      arg_VMSR_VMRS *a = opaque;
 +    if (!do_access) {
 +        return;
 +    }
 +
      if (a->rt == 15) {
          /* Set the 4 flag bits in the CPSR */
          gen_set_nzcv(value);
@@ -XXX,XX +XXX,XX @@ static void fp_sysreg_to_gpr(DisasContext *s, void *opaque, TCGv_i32 value)
      }
  }
 -static TCGv_i32 gpr_to_fp_sysreg(DisasContext *s, void *opaque)
 +static TCGv_i32 gpr_to_fp_sysreg(DisasContext *s, void *opaque, bool do_access)
  {
      arg_VMSR_VMRS *a = opaque;
 +    if (!do_access) {
 +        return NULL;
 +    }
      return load_reg(s, a->rt);
  }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMSR_VMRS(DisasContext *s, arg_VMSR_VMRS *a)
      }
  }
 -static void fp_sysreg_to_memory(DisasContext *s, void *opaque, TCGv_i32 value)
 +static void fp_sysreg_to_memory(DisasContext *s, void *opaque, TCGv_i32 value,
 +                                bool do_access)
  {
      arg_vldr_sysreg *a = opaque;
      uint32_t offset = a->imm;
@@ -XXX,XX +XXX,XX @@ static void fp_sysreg_to_memory(DisasContext *s, void *opaque, TCGv_i32 value)
          offset = -offset;
      }
 +    if (!do_access && !a->w) {
 +        return;
 +    }
 +
      addr = load_reg(s, a->rn);
      if (a->p) {
          tcg_gen_addi_i32(addr, addr, offset);
@@ -XXX,XX +XXX,XX @@ static void fp_sysreg_to_memory(DisasContext *s, void *opaque, TCGv_i32 value)
          gen_helper_v8m_stackcheck(cpu_env, addr);
      }
 -    gen_aa32_st_i32(s, value, addr, get_mem_index(s),
 -                    MO_UL | MO_ALIGN | s->be_data);
 -    tcg_temp_free_i32(value);
 +    if (do_access) {
 +        gen_aa32_st_i32(s, value, addr, get_mem_index(s),
 +                        MO_UL | MO_ALIGN | s->be_data);
 +        tcg_temp_free_i32(value);
 +    }
      if (a->w) {
          /* writeback */
@@ -XXX,XX +XXX,XX @@ static void fp_sysreg_to_memory(DisasContext *s, void *opaque, TCGv_i32 value)
      }
  }
 -static TCGv_i32 memory_to_fp_sysreg(DisasContext *s, void *opaque)
 +static TCGv_i32 memory_to_fp_sysreg(DisasContext *s, void *opaque,
 +                                    bool do_access)
  {
      arg_vldr_sysreg *a = opaque;
      uint32_t offset = a->imm;
      TCGv_i32 addr;
 -    TCGv_i32 value = tcg_temp_new_i32();
 +    TCGv_i32 value = NULL;
      if (!a->a) {
          offset = -offset;
      }
 +    if (!do_access && !a->w) {
 +        return NULL;
 +    }
 +
      addr = load_reg(s, a->rn);
      if (a->p) {
          tcg_gen_addi_i32(addr, addr, offset);
@@ -XXX,XX +XXX,XX @@ static TCGv_i32 memory_to_fp_sysreg(DisasContext *s, void *opaque)
          gen_helper_v8m_stackcheck(cpu_env, addr);
      }
 -    gen_aa32_ld_i32(s, value, addr, get_mem_index(s),
 -                    MO_UL | MO_ALIGN | s->be_data);
 +    if (do_access) {
 +        value = tcg_temp_new_i32();
 +        gen_aa32_ld_i32(s, value, addr, get_mem_index(s),
 +                        MO_UL | MO_ALIGN | s->be_data);
 +    }
      if (a->w) {
          /* writeback */
 --
 2.20.1

-[PULL 44/49] hw/arm/mps2-tz: Get armv7m_load_kernel() size argument from RAMInfo
+[PULL 09/57] target/arm: Factor FP context update code out into helper function
-The armv7m_load_kernel() function takes a mem_size argument which it
+Factor the code in full_vfp_access_check() which updates the
-expects to be the size of the memory region at guest address 0.  (It
+ownership of the FP context and creates a new FP context
-uses this argument only as a limit on how large a raw image file it
+out into its own function.
 can load at address zero).
 Instead of hardcoding this value, find the RAMInfo corresponding to
 the 0 address and extract its size.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210215115138.20465-20-peter.maydell@linaro.org
+Message-id: 20210618141019.10671-6-peter.maydell@linaro.org
 ---
- hw/arm/mps2-tz.c | 17 ++++++++++++++++-
+ target/arm/translate-vfp.c | 104 +++++++++++++++++++++----------------
- 1 file changed, 16 insertions(+), 1 deletion(-)
+ 1 file changed, 58 insertions(+), 46 deletions(-)
-diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
+diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/mps2-tz.c
+--- a/target/arm/translate-vfp.c
-+++ b/hw/arm/mps2-tz.c
++++ b/target/arm/translate-vfp.c
-@@ -XXX,XX +XXX,XX @@ static void create_non_mpc_ram(MPS2TZMachineState *mms)
+@@ -XXX,XX +XXX,XX @@ void gen_preserve_fp_state(DisasContext *s)
      }
  }
-+static uint32_t boot_ram_size(MPS2TZMachineState *mms)
++/*
 + * Generate code for M-profile FP context handling: update the
 + * ownership of the FP context, and create a new context if
 + * necessary. This corresponds to the parts of the pseudocode
 + * ExecuteFPCheck() after the initial PreserveFPState() call.
 + */
 +static void gen_update_fp_context(DisasContext *s)
 +{
-+    /* Return the size of the RAM block at guest address zero */
++    /* Update ownership of FP context: set FPCCR.S to match current state */
-+    const RAMInfo *p;
++    if (s->v8m_fpccr_s_wrong) {
-+    MPS2TZMachineClass *mmc = MPS2TZ_MACHINE_GET_CLASS(mms);
++        TCGv_i32 tmp;
 +
-+    for (p = mmc->raminfo; p->name; p++) {
++        tmp = load_cpu_field(v7m.fpccr[M_REG_S]);
-+        if (p->base == 0) {
++        if (s->v8m_secure) {
-+            return p->size;
++            tcg_gen_ori_i32(tmp, tmp, R_V7M_FPCCR_S_MASK);
 +        } else {
 +            tcg_gen_andi_i32(tmp, tmp, ~R_V7M_FPCCR_S_MASK);
 +        }
++        store_cpu_field(tmp, v7m.fpccr[M_REG_S]);
++        /* Don't need to do this for any further FP insns in this TB */
++        s->v8m_fpccr_s_wrong = false;
 +    }
-+    g_assert_not_reached();
++
 +    if (s->v7m_new_fp_ctxt_needed) {
 +        /*
 +         * Create new FP context by updating CONTROL.FPCA, CONTROL.SFPA,
 +         * the FPSCR, and VPR.
 +         */
 +        TCGv_i32 control, fpscr;
 +        uint32_t bits = R_V7M_CONTROL_FPCA_MASK;
 +
 +        fpscr = load_cpu_field(v7m.fpdscr[s->v8m_secure]);
 +        gen_helper_vfp_set_fpscr(cpu_env, fpscr);
 +        tcg_temp_free_i32(fpscr);
 +        if (dc_isar_feature(aa32_mve, s)) {
 +            TCGv_i32 z32 = tcg_const_i32(0);
 +            store_cpu_field(z32, v7m.vpr);
 +        }
 +
 +        /*
 +         * We don't need to arrange to end the TB, because the only
 +         * parts of FPSCR which we cache in the TB flags are the VECLEN
 +         * and VECSTRIDE, and those don't exist for M-profile.
 +         */
 +
 +        if (s->v8m_secure) {
 +            bits |= R_V7M_CONTROL_SFPA_MASK;
 +        }
 +        control = load_cpu_field(v7m.control[M_REG_S]);
 +        tcg_gen_ori_i32(control, control, bits);
 +        store_cpu_field(control, v7m.control[M_REG_S]);
 +        /* Don't need to do this for any further FP insns in this TB */
 +        s->v7m_new_fp_ctxt_needed = false;
 +    }
 +}
 +
- static void mps2tz_common_init(MachineState *machine)
+ /*
- {
+  * Check that VFP access is enabled. If it is, do the necessary
-     MPS2TZMachineState *mms = MPS2TZ_MACHINE(machine);
+  * M-profile lazy-FP handling and then return true.
-@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
+@@ -XXX,XX +XXX,XX @@ static bool full_vfp_access_check(DisasContext *s, bool ignore_vfp_enabled)
+         /* Trigger lazy-state preservation if necessary */
-     create_non_mpc_ram(mms);
+         gen_preserve_fp_state(s);
--    armv7m_load_kernel(ARM_CPU(first_cpu), machine->kernel_filename, 0x400000);
+-        /* Update ownership of FP context: set FPCCR.S to match current state */
-+    armv7m_load_kernel(ARM_CPU(first_cpu), machine->kernel_filename,
+-        if (s->v8m_fpccr_s_wrong) {
-+                       boot_ram_size(mms));
+-            TCGv_i32 tmp;
- }
+-
+-            tmp = load_cpu_field(v7m.fpccr[M_REG_S]);
- static void mps2_tz_idau_check(IDAUInterface *ii, uint32_t address,
+-            if (s->v8m_secure) {
 -                tcg_gen_ori_i32(tmp, tmp, R_V7M_FPCCR_S_MASK);
 -            } else {
 -                tcg_gen_andi_i32(tmp, tmp, ~R_V7M_FPCCR_S_MASK);
 -            }
 -            store_cpu_field(tmp, v7m.fpccr[M_REG_S]);
 -            /* Don't need to do this for any further FP insns in this TB */
 -            s->v8m_fpccr_s_wrong = false;
 -        }
 -
 -        if (s->v7m_new_fp_ctxt_needed) {
 -            /*
 -             * Create new FP context by updating CONTROL.FPCA, CONTROL.SFPA,
 -             * the FPSCR, and VPR.
 -             */
 -            TCGv_i32 control, fpscr;
 -            uint32_t bits = R_V7M_CONTROL_FPCA_MASK;
 -
 -            fpscr = load_cpu_field(v7m.fpdscr[s->v8m_secure]);
 -            gen_helper_vfp_set_fpscr(cpu_env, fpscr);
 -            tcg_temp_free_i32(fpscr);
 -            if (dc_isar_feature(aa32_mve, s)) {
 -                TCGv_i32 z32 = tcg_const_i32(0);
 -                store_cpu_field(z32, v7m.vpr);
 -            }
 -
 -            /*
 -             * We don't need to arrange to end the TB, because the only
 -             * parts of FPSCR which we cache in the TB flags are the VECLEN
 -             * and VECSTRIDE, and those don't exist for M-profile.
 -             */
 -
 -            if (s->v8m_secure) {
 -                bits |= R_V7M_CONTROL_SFPA_MASK;
 -            }
 -            control = load_cpu_field(v7m.control[M_REG_S]);
 -            tcg_gen_ori_i32(control, control, bits);
 -            store_cpu_field(control, v7m.control[M_REG_S]);
 -            /* Don't need to do this for any further FP insns in this TB */
 -            s->v7m_new_fp_ctxt_needed = false;
 -        }
 +        /* Update ownership of FP context and create new FP context if needed */
 +        gen_update_fp_context(s);
      }
      return true;
 --
 2.20.1

-[PULL 37/49] hw/arm/mps2-tz: Allow PPCPortInfo structures to specify device interrupts
+[PULL 10/57] target/arm: Split vfp_access_check() into A and M versions
-The mps2-tz code uses PPCPortInfo data structures to define what
+vfp_access_check() and its helper routine full_vfp_access_check() have
-devices are present and how they are wired up.  Currently we use
+gradually grown and is now an awkward mix of A-profile only and
-these to specify device types and addresses, but hard-code the
+M-profile only pieces.  Refactor it into an A-profile only and an
-interrupt line wiring in each make_* helper function.  This works for
+M-profile only version, taking advantage of the fact that now the
-the two boards we have at the moment, but the AN524 has some devices
+only direct call to full_vfp_access_check() is in A-profile-only
-with different interrupt assignments.
+code.
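The shape of the split can be sketched with a hypothetical model in which each check records the exception it would raise instead of generating one. ToyDisas, ToyResult and the toy_* functions are assumptions made for illustration; only the control flow follows the patch. The A-profile check honours ignore_vfp_enabled, the M-profile check raises NOCP or performs lazy-FP bookkeeping, and the common entry point dispatches on the profile. In the real code both A-profile failures are EXCP_UDEF with different syndromes; the model keeps them apart as two enum values.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    TOY_OK,
    TOY_EXCP_FP_TRAP,   /* A-profile: UDEF with an FP access-trap syndrome */
    TOY_EXCP_UNDEF,     /* A-profile: FPEXC.EN clear, unallocated encoding */
    TOY_EXCP_NOCP,      /* M-profile: NOCP UsageFault */
} ToyResult;

typedef struct {
    bool is_m_profile;
    int fp_excp_el;       /* nonzero: FP access traps to this EL */
    bool vfp_enabled;     /* models FPEXC.EN (A-profile only) */
    int lazy_fp_updates;  /* counts M-profile lazy-FP bookkeeping */
    ToyResult last;       /* outcome of the most recent check */
} ToyDisas;

/* A-profile only: ignore_vfp_enabled skips the FPEXC.EN test. */
static bool toy_vfp_access_check_a(ToyDisas *s, bool ignore_vfp_enabled)
{
    assert(!s->is_m_profile);
    if (s->fp_excp_el) {
        s->last = TOY_EXCP_FP_TRAP;
        return false;
    }
    if (!s->vfp_enabled && !ignore_vfp_enabled) {
        s->last = TOY_EXCP_UNDEF;
        return false;
    }
    s->last = TOY_OK;
    return true;
}

/* M-profile only: NOCP if the FPU is disabled, else lazy-FP handling. */
static bool toy_vfp_access_check_m(ToyDisas *s)
{
    assert(s->is_m_profile);
    if (s->fp_excp_el) {
        s->last = TOY_EXCP_NOCP;
        return false;
    }
    s->lazy_fp_updates++;   /* stands in for preserve + context update */
    s->last = TOY_OK;
    return true;
}

/* Common entry point: dispatch on the profile. */
static bool toy_vfp_access_check(ToyDisas *s)
{
    if (s->is_m_profile) {
        return toy_vfp_access_check_m(s);
    }
    return toy_vfp_access_check_a(s, false);
}
```

Callers that must ignore FPEXC.EN (the VMSR/VMRS accesses to FPSID, FPEXC and MVFR0-2) call the A-profile check directly, as trans_VMSR_VMRS() does after this change.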
 This commit adds the framework to allow PPCPortInfo structures to
 specify interrupt numbers.  We add an array of interrupt numbers to
 the PPCPortInfo struct, and pass it through to the make_* helpers.
 The following commit will change the make_* helpers over to using the
 framework.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210215115138.20465-13-peter.maydell@linaro.org
+Message-id: 20210618141019.10671-7-peter.maydell@linaro.org
 ---
- hw/arm/mps2-tz.c | 36 ++++++++++++++++++++++++------------
+ target/arm/translate-vfp.c | 79 +++++++++++++++++++++++---------------
- 1 file changed, 24 insertions(+), 12 deletions(-)
+ 1 file changed, 48 insertions(+), 31 deletions(-)
-diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
+diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/mps2-tz.c
+--- a/target/arm/translate-vfp.c
-+++ b/hw/arm/mps2-tz.c
++++ b/target/arm/translate-vfp.c
-@@ -XXX,XX +XXX,XX @@ static qemu_irq get_sse_irq_in(MPS2TZMachineState *mms, int irqno)
+@@ -XXX,XX +XXX,XX @@ static void gen_update_fp_context(DisasContext *s)
-  * needs to be plugged into the downstream end of the PPC port.
+ }
  /*
 - * Check that VFP access is enabled. If it is, do the necessary
 - * M-profile lazy-FP handling and then return true.
 - * If not, emit code to generate an appropriate exception and
 - * return false.
 + * Check that VFP access is enabled, A-profile specific version.
 + *
 + * If VFP is enabled, return true. If not, emit code to generate an
 + * appropriate exception and return false.
   * The ignore_vfp_enabled argument specifies that we should ignore
 - * whether VFP is enabled via FPEXC[EN]: this should be true for FMXR/FMRX
 + * whether VFP is enabled via FPEXC.EN: this should be true for FMXR/FMRX
   * accesses to FPSID, FPEXC, MVFR0, MVFR1, MVFR2, and false for all other insns.
   */
- typedef MemoryRegion *MakeDevFn(MPS2TZMachineState *mms, void *opaque,
+-static bool full_vfp_access_check(DisasContext *s, bool ignore_vfp_enabled)
--                                const char *name, hwaddr size);
++static bool vfp_access_check_a(DisasContext *s, bool ignore_vfp_enabled)
 +                                const char *name, hwaddr size,
 +                                const int *irqs);
  typedef struct PPCPortInfo {
      const char *name;
@@ -XXX,XX +XXX,XX @@ typedef struct PPCPortInfo {
      void *opaque;
      hwaddr addr;
      hwaddr size;
 +    int irqs[3]; /* currently no device needs more IRQ lines than this */
  } PPCPortInfo;
  typedef struct PPCInfo {
@@ -XXX,XX +XXX,XX @@ typedef struct PPCInfo {
  } PPCInfo;
  static MemoryRegion *make_unimp_dev(MPS2TZMachineState *mms,
 -                                       void *opaque,
 -                                       const char *name, hwaddr size)
 +                                    void *opaque,
 +                                    const char *name, hwaddr size,
 +                                    const int *irqs)
  {
-     /* Initialize, configure and realize a TYPE_UNIMPLEMENTED_DEVICE,
+     if (s->fp_excp_el) {
-      * and return a pointer to its MemoryRegion.
+-        if (arm_dc_feature(s, ARM_FEATURE_M)) {
-@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_unimp_dev(MPS2TZMachineState *mms,
+-            /*
 -             * M-profile mostly catches the "FPU disabled" case early, in
 -             * disas_m_nocp(), but a few insns (eg LCTP, WLSTP, DLSTP)
 -             * which do coprocessor-checks are outside the large ranges of
 -             * the encoding space handled by the patterns in m-nocp.decode,
 -             * and for them we may need to raise NOCP here.
 -             */
 -            gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
 -                               syn_uncategorized(), s->fp_excp_el);
 -        } else {
 -            gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
 -                               syn_fp_access_trap(1, 0xe, false),
 -                               s->fp_excp_el);
 -        }
 +        gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
 +                           syn_fp_access_trap(1, 0xe, false), s->fp_excp_el);
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool full_vfp_access_check(DisasContext *s, bool ignore_vfp_enabled)
          unallocated_encoding(s);
          return false;
      }
 +    return true;
 +}
 -    if (arm_dc_feature(s, ARM_FEATURE_M)) {
 -        /* Handle M-profile lazy FP state mechanics */
 -
 -        /* Trigger lazy-state preservation if necessary */
 -        gen_preserve_fp_state(s);
 -
 -        /* Update ownership of FP context and create new FP context if needed */
 -        gen_update_fp_context(s);
 +/*
 + * Check that VFP access is enabled, M-profile specific version.
 + *
 + * If VFP is enabled, do the necessary M-profile lazy-FP handling and then
 + * return true. If not, emit code to generate an appropriate exception and
 + * return false.
 + */
 +static bool vfp_access_check_m(DisasContext *s)
 +{
 +    if (s->fp_excp_el) {
 +        /*
 +         * M-profile mostly catches the "FPU disabled" case early, in
 +         * disas_m_nocp(), but a few insns (eg LCTP, WLSTP, DLSTP)
 +         * which do coprocessor-checks are outside the large ranges of
 +         * the encoding space handled by the patterns in m-nocp.decode,
 +         * and for them we may need to raise NOCP here.
 +         */
 +        gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
 +                           syn_uncategorized(), s->fp_excp_el);
 +        return false;
      }
 +    /* Handle M-profile lazy FP state mechanics */
 +
 +    /* Trigger lazy-state preservation if necessary */
 +    gen_preserve_fp_state(s);
 +
 +    /* Update ownership of FP context and create new FP context if needed */
 +    gen_update_fp_context(s);
 +
      return true;
  }
- static MemoryRegion *make_uart(MPS2TZMachineState *mms, void *opaque,
+@@ -XXX,XX +XXX,XX @@ static bool full_vfp_access_check(DisasContext *s, bool ignore_vfp_enabled)
--                               const char *name, hwaddr size)
+  */
-+                               const char *name, hwaddr size,
+ bool vfp_access_check(DisasContext *s)
 +                               const int *irqs)
  {
-     MPS2TZMachineClass *mmc = MPS2TZ_MACHINE_GET_CLASS(mms);
+-    return full_vfp_access_check(s, false);
-     CMSDKAPBUART *uart = opaque;
++    if (arm_dc_feature(s, ARM_FEATURE_M)) {
-@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_uart(MPS2TZMachineState *mms, void *opaque,
++        return vfp_access_check_m(s);
 +    } else {
 +        return vfp_access_check_a(s, false);
 +    }
  }
- static MemoryRegion *make_scc(MPS2TZMachineState *mms, void *opaque,
+ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
--                              const char *name, hwaddr size)
+@@ -XXX,XX +XXX,XX @@ static bool trans_VMSR_VMRS(DisasContext *s, arg_VMSR_VMRS *a)
-+                              const char *name, hwaddr size,
+         return false;
-+                              const int *irqs)
+     }
- {
-     MPS2SCC *scc = opaque;
+-    if (!full_vfp_access_check(s, ignore_vfp_enabled)) {
-     DeviceState *sccdev;
++    /*
-@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_scc(MPS2TZMachineState *mms, void *opaque,
++     * Call vfp_access_check_a() directly, because we need to tell
- }
++     * it to ignore FPEXC.EN for some register accesses.
++     */
- static MemoryRegion *make_fpgaio(MPS2TZMachineState *mms, void *opaque,
++    if (!vfp_access_check_a(s, ignore_vfp_enabled)) {
--                                 const char *name, hwaddr size)
+         return true;
-+                                 const char *name, hwaddr size,
+     }
-+                                 const int *irqs)
  {
      MPS2FPGAIO *fpgaio = opaque;
      MPS2TZMachineClass *mmc = MPS2TZ_MACHINE_GET_CLASS(mms);
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_fpgaio(MPS2TZMachineState *mms, void *opaque,
  }
  static MemoryRegion *make_eth_dev(MPS2TZMachineState *mms, void *opaque,
 -                                  const char *name, hwaddr size)
 +                                  const char *name, hwaddr size,
 +                                  const int *irqs)
  {
      SysBusDevice *s;
      NICInfo *nd = &nd_table[0];
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_eth_dev(MPS2TZMachineState *mms, void *opaque,
  }
  static MemoryRegion *make_mpc(MPS2TZMachineState *mms, void *opaque,
 -                              const char *name, hwaddr size)
 +                              const char *name, hwaddr size,
 +                              const int *irqs)
  {
      TZMPC *mpc = opaque;
      int i = mpc - &mms->ssram_mpc[0];
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_mpc(MPS2TZMachineState *mms, void *opaque,
  }
  static MemoryRegion *make_dma(MPS2TZMachineState *mms, void *opaque,
 -                              const char *name, hwaddr size)
 +                              const char *name, hwaddr size,
 +                              const int *irqs)
  {
      PL080State *dma = opaque;
      int i = dma - &mms->dma[0];
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_dma(MPS2TZMachineState *mms, void *opaque,
  }
  static MemoryRegion *make_spi(MPS2TZMachineState *mms, void *opaque,
 -                              const char *name, hwaddr size)
 +                              const char *name, hwaddr size,
 +                              const int *irqs)
  {
      /*
       * The AN505 has five PL022 SPI controllers.
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_spi(MPS2TZMachineState *mms, void *opaque,
  }
  static MemoryRegion *make_i2c(MPS2TZMachineState *mms, void *opaque,
 -                              const char *name, hwaddr size)
 +                              const char *name, hwaddr size,
 +                              const int *irqs)
  {
      ArmSbconI2CState *i2c = opaque;
      SysBusDevice *s;
@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
                  continue;
              }
 -            mr = pinfo->devfn(mms, pinfo->opaque, pinfo->name, pinfo->size);
 +            mr = pinfo->devfn(mms, pinfo->opaque, pinfo->name, pinfo->size,
 +                              pinfo->irqs);
              portname = g_strdup_printf("port[%d]", port);
              object_property_set_link(OBJECT(ppc), portname, OBJECT(mr),
                                       &error_fatal);
 --
 2.20.1

-[PULL 14/49] target/arm: Restrict v8M IDAU to TCG
+[PULL 11/57] target/arm: Handle FPU check for FPCXT_NS insns via vfp_access_check_m()
-From: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Instead of open-coding the "take NOCP exception if FPU disabled,
 otherwise call gen_preserve_fp_state()" code in the accessors for
 FPCXT_NS, add an argument to vfp_access_check_m() which tells it to
 skip the gen_update_fp_context() call, so we can use it for the
 FPCXT_NS case.
-IDAU is specific to M-profile. KVM only supports A-profile.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Restrict this interface to TCG, as it is pointless (and
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-confusing) on a KVM-only build.
+Message-id: 20210618141019.10671-8-peter.maydell@linaro.org
 ---
  target/arm/translate-a32.h    |  2 +-
  target/arm/translate-m-nocp.c | 10 ++--------
  target/arm/translate-vfp.c    | 13 ++++++++-----
  3 files changed, 11 insertions(+), 14 deletions(-)
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+diff --git a/target/arm/translate-a32.h b/target/arm/translate-a32.h
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Message-id: 20210221222617.2579610-2-f4bug@amsat.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  target/arm/cpu.c     | 7 -------
  target/arm/cpu_tcg.c | 8 ++++++++
  2 files changed, 8 insertions(+), 7 deletions(-)
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.c
+--- a/target/arm/translate-a32.h
-+++ b/target/arm/cpu.c
++++ b/target/arm/translate-a32.h
-@@ -XXX,XX +XXX,XX @@ static const TypeInfo arm_cpu_type_info = {
+@@ -XXX,XX +XXX,XX @@ bool disas_neon_shared(DisasContext *s, uint32_t insn);
-     .class_init = arm_cpu_class_init,
+ void load_reg_var(DisasContext *s, TCGv_i32 var, int reg);
- };
+ void arm_gen_condlabel(DisasContext *s);
+ bool vfp_access_check(DisasContext *s);
--static const TypeInfo idau_interface_type_info = {
+-void gen_preserve_fp_state(DisasContext *s);
--    .name = TYPE_IDAU_INTERFACE,
++bool vfp_access_check_m(DisasContext *s, bool skip_context_update);
--    .parent = TYPE_INTERFACE,
+ void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp memop);
--    .class_size = sizeof(IDAUInterfaceClass),
+ void read_neon_element64(TCGv_i64 dest, int reg, int ele, MemOp memop);
--};
+ void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop);
--
+diff --git a/target/arm/translate-m-nocp.c b/target/arm/translate-m-nocp.c
- static void arm_cpu_register_types(void)
+index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-m-nocp.c
 +++ b/target/arm/translate-m-nocp.c
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
           * otherwise PreserveFPState(), and then FPCXT_NS writes
           * behave the same as FPCXT_S writes.
           */
 -        if (s->fp_excp_el) {
 -            gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
 -                               syn_uncategorized(), s->fp_excp_el);
 +        if (!vfp_access_check_m(s, true)) {
              /*
               * This was only a conditional exception, so override
               * gen_exception_insn()'s default to DISAS_NORETURN
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
              s->base.is_jmp = DISAS_NEXT;
              break;
          }
 -        gen_preserve_fp_state(s);
      }
      /* fall through */
      case ARM_VFP_FPCXT_S:
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
           * otherwise PreserveFPState(), and then FPCXT_NS
           * reads the same as FPCXT_S.
           */
 -        if (s->fp_excp_el) {
 -            gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
 -                               syn_uncategorized(), s->fp_excp_el);
 +        if (!vfp_access_check_m(s, true)) {
              /*
               * This was only a conditional exception, so override
               * gen_exception_insn()'s default to DISAS_NORETURN
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
              s->base.is_jmp = DISAS_NEXT;
              break;
          }
 -        gen_preserve_fp_state(s);
          tmp = tcg_temp_new_i32();
          sfpa = tcg_temp_new_i32();
          fpscr = tcg_temp_new_i32();
 diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-vfp.c
 +++ b/target/arm/translate-vfp.c
@@ -XXX,XX +XXX,XX @@ static inline long vfp_f16_offset(unsigned reg, bool top)
   * Generate code for M-profile lazy FP state preservation if needed;
   * this corresponds to the pseudocode PreserveFPState() function.
   */
 -void gen_preserve_fp_state(DisasContext *s)
 +static void gen_preserve_fp_state(DisasContext *s)
  {
-     const size_t cpu_count = ARRAY_SIZE(arm_cpus);
+     if (s->v7m_lspact) {
-@@ -XXX,XX +XXX,XX @@ static void arm_cpu_register_types(void)
+         /*
-     if (cpu_count) {
+@@ -XXX,XX +XXX,XX @@ static bool vfp_access_check_a(DisasContext *s, bool ignore_vfp_enabled)
-         size_t i;
+  * If VFP is enabled, do the necessary M-profile lazy-FP handling and then
+  * return true. If not, emit code to generate an appropriate exception and
--        type_register_static(&idau_interface_type_info);
+  * return false.
-         for (i = 0; i < cpu_count; ++i) {
++ * skip_context_update is true to skip the "update FP context" part of this.
-             arm_cpu_register(&arm_cpus[i]);
+  */
-         }
+-static bool vfp_access_check_m(DisasContext *s)
-diff --git a/target/arm/cpu_tcg.c b/target/arm/cpu_tcg.c
++bool vfp_access_check_m(DisasContext *s, bool skip_context_update)
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu_tcg.c
 +++ b/target/arm/cpu_tcg.c
@@ -XXX,XX +XXX,XX @@
  #include "hw/core/tcg-cpu-ops.h"
  #endif /* CONFIG_TCG */
  #include "internals.h"
 +#include "target/arm/idau.h"
  /* CPU models. These are not needed for the AArch64 linux-user build. */
  #if !defined(CONFIG_USER_ONLY) || !defined(TARGET_AARCH64)
@@ -XXX,XX +XXX,XX @@ static const ARMCPUInfo arm_tcg_cpus[] = {
      { .name = "pxa270-c5",   .initfn = pxa270c5_initfn },
  };
 +static const TypeInfo idau_interface_type_info = {
 +    .name = TYPE_IDAU_INTERFACE,
 +    .parent = TYPE_INTERFACE,
 +    .class_size = sizeof(IDAUInterfaceClass),
 +};
 +
  static void arm_tcg_cpu_register_types(void)
  {
-     size_t i;
+     if (s->fp_excp_el) {
+         /*
-+    type_register_static(&idau_interface_type_info);
+@@ -XXX,XX +XXX,XX @@ static bool vfp_access_check_m(DisasContext *s)
-     for (i = 0; i < ARRAY_SIZE(arm_tcg_cpus); ++i) {
+     /* Trigger lazy-state preservation if necessary */
-         arm_cpu_register(&arm_tcg_cpus[i]);
+     gen_preserve_fp_state(s);
 -    /* Update ownership of FP context and create new FP context if needed */
 -    gen_update_fp_context(s);
 +    if (!skip_context_update) {
 +        /* Update ownership of FP context and create new FP context if needed */
 +        gen_update_fp_context(s);
 +    }
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool vfp_access_check_m(DisasContext *s)
  bool vfp_access_check(DisasContext *s)
  {
      if (arm_dc_feature(s, ARM_FEATURE_M)) {
 -        return vfp_access_check_m(s);
 +        return vfp_access_check_m(s, false);
      } else {
          return vfp_access_check_a(s, false);
      }
 --
 .20.1

-[PULL 03/49] target/arm: Add support for FEAT_SSBS, Speculative Store Bypass Safe
+[PULL 12/57] target/arm: Implement MVE VLDR/VSTR (non-widening forms)
-From: Rebecca Cran <rebecca@nuviainc.com>
+Implement the forms of the MVE VLDR and VSTR insns which perform
 non-widening loads of bytes, halfwords or words from memory into
 vector elements of the same width (encodings T5, T6, T7).
-Add support for FEAT_SSBS. SSBS (Speculative Store Bypass Safe) is an
+(At the moment we know for MVE and M-profile in general that
-optional feature in ARMv8.0, and mandatory in ARMv8.5.
+vfp_access_check() can never return false, but we include the
 conventional return-true-on-failure check for consistency
 with non-M-profile translation code.)
-Signed-off-by: Rebecca Cran <rebecca@nuviainc.com>
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210216224543.16142-2-rebecca@nuviainc.com
+Message-id: 20210617121628.20116-2-peter.maydell@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/cpu.h           | 15 ++++++++++++++-
+ target/arm/{translate-mve.c => helper-mve.h} |  19 +-
- target/arm/internals.h     |  6 ++++++
+ target/arm/helper.h                          |   2 +
- target/arm/helper.c        | 37 +++++++++++++++++++++++++++++++++++++
+ target/arm/internals.h                       |  11 ++
- target/arm/translate-a64.c | 12 ++++++++++++
+ target/arm/mve.decode                        |  22 +++
-files changed, 69 insertions(+), 1 deletion(-)
+ target/arm/mve_helper.c                      | 172 +++++++++++++++++++
  target/arm/translate-mve.c                   | 119 +++++++++++++
  target/arm/meson.build                       |   1 +
 files changed, 334 insertions(+), 12 deletions(-)
  copy target/arm/{translate-mve.c => helper-mve.h} (61%)
  create mode 100644 target/arm/mve_helper.c
-diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+diff --git a/target/arm/translate-mve.c b/target/arm/helper-mve.h
-index XXXXXXX..XXXXXXX 100644
+similarity index 61%
---- a/target/arm/cpu.h
+copy from target/arm/translate-mve.c
-+++ b/target/arm/cpu.h
+copy to target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ void pmu_init(ARMCPU *cpu);
+index XXXXXXX..XXXXXXX 100644
- #define SCTLR_TE      (1U << 30) /* AArch32 only */
+--- a/target/arm/translate-mve.c
- #define SCTLR_EnIB    (1U << 30) /* v8.3, AArch64 only */
++++ b/target/arm/helper-mve.h
- #define SCTLR_EnIA    (1U << 31) /* v8.3, AArch64 only */
+@@ -XXX,XX +XXX,XX @@
 +#define SCTLR_DSSBS_32 (1U << 31) /* v8.5, AArch32 only */
  #define SCTLR_BT0     (1ULL << 35) /* v8.5-BTI */
  #define SCTLR_BT1     (1ULL << 36) /* v8.5-BTI */
  #define SCTLR_ITFSB   (1ULL << 37) /* v8.5-MemTag */
@@ -XXX,XX +XXX,XX @@ void pmu_init(ARMCPU *cpu);
  #define SCTLR_TCF     (3ULL << 40) /* v8.5-MemTag */
  #define SCTLR_ATA0    (1ULL << 42) /* v8.5-MemTag */
  #define SCTLR_ATA     (1ULL << 43) /* v8.5-MemTag */
 -#define SCTLR_DSSBS   (1ULL << 44) /* v8.5 */
 +#define SCTLR_DSSBS_64 (1ULL << 44) /* v8.5, AArch64 only */
  #define CPTR_TCPAC    (1U << 31)
  #define CPTR_TTA      (1U << 20)
@@ -XXX,XX +XXX,XX @@ void pmu_init(ARMCPU *cpu);
  #define CPSR_IL (1U << 20)
  #define CPSR_DIT (1U << 21)
  #define CPSR_PAN (1U << 22)
 +#define CPSR_SSBS (1U << 23)
  #define CPSR_J (1U << 24)
  #define CPSR_IT_0_1 (3U << 25)
  #define CPSR_Q (1U << 27)
@@ -XXX,XX +XXX,XX @@ void pmu_init(ARMCPU *cpu);
  #define PSTATE_A (1U << 8)
  #define PSTATE_D (1U << 9)
  #define PSTATE_BTYPE (3U << 10)
 +#define PSTATE_SSBS (1U << 12)
  #define PSTATE_IL (1U << 20)
  #define PSTATE_SS (1U << 21)
  #define PSTATE_PAN (1U << 22)
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_dit(const ARMISARegisters *id)
      return FIELD_EX32(id->id_pfr0, ID_PFR0, DIT) != 0;
  }
 +static inline bool isar_feature_aa32_ssbs(const ARMISARegisters *id)
 +{
 +    return FIELD_EX32(id->id_pfr2, ID_PFR2, SSBS) != 0;
 +}
 +
  /*
-  * 64-bit feature tests via id registers.
+- *  ARM translation: M-profile MVE instructions
 + *  M-profile MVE specific helper definitions
   *
   *  Copyright (c) 2021 Linaro, Ltd.
   *
@@ -XXX,XX +XXX,XX @@
   * You should have received a copy of the GNU Lesser General Public
   * License along with this library; if not, see <http://www.gnu.org/licenses/>.
   */
-@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_dit(const ARMISARegisters *id)
+-
-     return FIELD_EX64(id->id_aa64pfr0, ID_AA64PFR0, DIT) != 0;
+-#include "qemu/osdep.h"
- }
+-#include "tcg/tcg-op.h"
+-#include "tcg/tcg-op-gvec.h"
-+static inline bool isar_feature_aa64_ssbs(const ARMISARegisters *id)
+-#include "exec/exec-all.h"
-+{
+-#include "exec/gen-icount.h"
-+    return FIELD_EX64(id->id_aa64pfr1, ID_AA64PFR1, SSBS) != 0;
+-#include "translate.h"
-+}
+-#include "translate-a32.h"
-+
+-
- /*
+-/* Include the generated decoder */
-  * Feature tests for "does this exist in either 32-bit or 64-bit?"
+-#include "decode-mve.c.inc"
-  */
++DEF_HELPER_FLAGS_3(mve_vldrb, TCG_CALL_NO_WG, void, env, ptr, i32)
 +DEF_HELPER_FLAGS_3(mve_vldrh, TCG_CALL_NO_WG, void, env, ptr, i32)
 +DEF_HELPER_FLAGS_3(mve_vldrw, TCG_CALL_NO_WG, void, env, ptr, i32)
 +DEF_HELPER_FLAGS_3(mve_vstrb, TCG_CALL_NO_WG, void, env, ptr, i32)
 +DEF_HELPER_FLAGS_3(mve_vstrh, TCG_CALL_NO_WG, void, env, ptr, i32)
 +DEF_HELPER_FLAGS_3(mve_vstrw, TCG_CALL_NO_WG, void, env, ptr, i32)
 diff --git a/target/arm/helper.h b/target/arm/helper.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.h
 +++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(gvec_bfmlal_idx, TCG_CALL_NO_RWG,
  #include "helper-a64.h"
  #include "helper-sve.h"
  #endif
 +
 +#include "helper-mve.h"
 diff --git a/target/arm/internals.h b/target/arm/internals.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/internals.h
 +++ b/target/arm/internals.h
-@@ -XXX,XX +XXX,XX @@ static inline uint32_t aarch32_cpsr_valid_mask(uint64_t features,
+@@ -XXX,XX +XXX,XX @@ static inline uint64_t useronly_maybe_clean_ptr(uint32_t desc, uint64_t ptr)
-     if (isar_feature_aa32_dit(id)) {
+     return ptr;
          valid |= CPSR_DIT;
      }
 +    if (isar_feature_aa32_ssbs(id)) {
 +        valid |= CPSR_SSBS;
 +    }
      return valid;
  }
-@@ -XXX,XX +XXX,XX @@ static inline uint32_t aarch64_pstate_valid_mask(const ARMISARegisters *id)
-     if (isar_feature_aa64_dit(id)) {
++/* Values for M-profile PSR.ECI for MVE insns */
-         valid |= PSTATE_DIT;
++enum MVEECIState {
-     }
++    ECI_NONE = 0, /* No completed beats */
-+    if (isar_feature_aa64_ssbs(id)) {
++    ECI_A0 = 1, /* Completed: A0 */
-+        valid |= PSTATE_SSBS;
++    ECI_A0A1 = 2, /* Completed: A0, A1 */
-+    }
++    /* 3 is reserved */
-     if (isar_feature_aa64_mte(id)) {
++    ECI_A0A1A2 = 4, /* Completed: A0, A1, A2 */
-         valid |= PSTATE_TCO;
++    ECI_A0A1A2B0 = 5, /* Completed: A0, A1, A2, B0 */
-     }
++    /* All other values reserved */
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo dit_reginfo = {
      .readfn = aa64_dit_read, .writefn = aa64_dit_write
  };
 +static uint64_t aa64_ssbs_read(CPUARMState *env, const ARMCPRegInfo *ri)
 +{
 +    return env->pstate & PSTATE_SSBS;
 +}
 +
 +static void aa64_ssbs_write(CPUARMState *env, const ARMCPRegInfo *ri,
 +                           uint64_t value)
 +{
 +    env->pstate = (env->pstate & ~PSTATE_SSBS) | (value & PSTATE_SSBS);
 +}
 +
 +static const ARMCPRegInfo ssbs_reginfo = {
 +    .name = "SSBS", .state = ARM_CP_STATE_AA64,
 +    .opc0 = 3, .opc1 = 3, .crn = 4, .crm = 2, .opc2 = 6,
 +    .type = ARM_CP_NO_RAW, .access = PL0_RW,
 +    .readfn = aa64_ssbs_read, .writefn = aa64_ssbs_write
 +};
 +
- static CPAccessResult aa64_cacheop_poc_access(CPUARMState *env,
+ #endif
-                                               const ARMCPRegInfo *ri,
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
-                                               bool isread)
+index XXXXXXX..XXXXXXX 100644
-@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
+--- a/target/arm/mve.decode
-     if (cpu_isar_feature(aa64_dit, cpu)) {
++++ b/target/arm/mve.decode
-         define_one_arm_cp_reg(cpu, &dit_reginfo);
+@@ -XXX,XX +XXX,XX @@
-     }
+ #
-+    if (cpu_isar_feature(aa64_ssbs, cpu)) {
+ # This file is processed by scripts/decodetree.py
-+        define_one_arm_cp_reg(cpu, &ssbs_reginfo);
+ #
-+    }
++
++%qd 22:1 13:3
-     if (arm_feature(env, ARM_FEATURE_EL2) && cpu_isar_feature(aa64_vh, cpu)) {
++
-         define_arm_cp_regs(cpu, vhe_reginfo);
++&vldr_vstr rn qd imm p a w size l
-@@ -XXX,XX +XXX,XX @@ static void take_aarch32_exception(CPUARMState *env, int new_mode,
++
-     env->uncached_cpsr &= ~(CPSR_IL | CPSR_J);
++@vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd
-     env->daif |= mask;
++
++# Vector loads and stores
-+    if (cpu_isar_feature(aa32_ssbs, env_archcpu(env))) {
++
-+        if (env->cp15.sctlr_el[new_el] & SCTLR_DSSBS_32) {
++# Non-widening loads/stores (P=0 W=0 is 'related encoding')
-+            env->uncached_cpsr |= CPSR_SSBS;
++VLDR_VSTR        1110110 0 a:1 . 1   . .... ... 111100 .......   @vldr_vstr \
-+        } else {
++                 size=0 p=0 w=1
-+            env->uncached_cpsr &= ~CPSR_SSBS;
++VLDR_VSTR        1110110 0 a:1 . 1   . .... ... 111101 .......   @vldr_vstr \
 +                 size=1 p=0 w=1
 +VLDR_VSTR        1110110 0 a:1 . 1   . .... ... 111110 .......   @vldr_vstr \
 +                 size=2 p=0 w=1
 +VLDR_VSTR        1110110 1 a:1 . w:1 . .... ... 111100 .......   @vldr_vstr \
 +                 size=0 p=1
 +VLDR_VSTR        1110110 1 a:1 . w:1 . .... ... 111101 .......   @vldr_vstr \
 +                 size=1 p=1
 +VLDR_VSTR        1110110 1 a:1 . w:1 . .... ... 111110 .......   @vldr_vstr \
 +                 size=2 p=1
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@
 +/*
 + * M-profile MVE Operations
 + *
 + * Copyright (c) 2021 Linaro, Ltd.
 + *
 + * This library is free software; you can redistribute it and/or
 + * modify it under the terms of the GNU Lesser General Public
 + * License as published by the Free Software Foundation; either
 + * version 2.1 of the License, or (at your option) any later version.
 + *
 + * This library is distributed in the hope that it will be useful,
 + * but WITHOUT ANY WARRANTY; without even the implied warranty of
 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 + * Lesser General Public License for more details.
 + *
 + * You should have received a copy of the GNU Lesser General Public
 + * License along with this library; if not, see <http://www.gnu.org/licenses/>.
 + */
 +
 +#include "qemu/osdep.h"
 +#include "cpu.h"
 +#include "internals.h"
 +#include "vec_internal.h"
 +#include "exec/helper-proto.h"
 +#include "exec/cpu_ldst.h"
 +#include "exec/exec-all.h"
 +
 +static uint16_t mve_element_mask(CPUARMState *env)
 +{
 +    /*
 +     * Return the mask of which elements in the MVE vector should be
 +     * updated. This is a combination of multiple things:
 +     *  (1) by default, we update every lane in the vector
 +     *  (2) VPT predication stores its state in the VPR register;
 + *  (3) low-overhead-branch tail predication will mask out part
 + *      of the vector on the final iteration of the loop
 +     *  (4) if EPSR.ECI is set then we must execute only some beats
 +     *      of the insn
 +     * We combine all these into a 16-bit result with the same semantics
 +     * as VPR.P0: 0 to mask the lane, 1 if it is active.
 +     * 8-bit vector ops will look at all bits of the result;
 +     * 16-bit ops will look at bits 0, 2, 4, ...;
 +     * 32-bit ops will look at bits 0, 4, 8 and 12.
 +     * Compare pseudocode GetCurInstrBeat(), though that only returns
 +     * the 4-bit slice of the mask corresponding to a single beat.
 +     */
 +    uint16_t mask = FIELD_EX32(env->v7m.vpr, V7M_VPR, P0);
 +
 +    if (!(env->v7m.vpr & R_V7M_VPR_MASK01_MASK)) {
 +        mask |= 0xff;
 +    }
 +    if (!(env->v7m.vpr & R_V7M_VPR_MASK23_MASK)) {
 +        mask |= 0xff00;
 +    }
 +
 +    if (env->v7m.ltpsize < 4 &&
 +        env->regs[14] <= (1 << (4 - env->v7m.ltpsize))) {
 +        /*
 +         * Tail predication active, and this is the last loop iteration.
 +         * The element size is (1 << ltpsize), and we only want to process
 +         * loopcount elements, so we want to retain the least significant
 +         * (loopcount * esize) predicate bits and zero out bits above that.
 +         */
 +        int masklen = env->regs[14] << env->v7m.ltpsize;
 +        assert(masklen <= 16);
 +        mask &= MAKE_64BIT_MASK(0, masklen);
 +    }
 +
 +    if ((env->condexec_bits & 0xf) == 0) {
 +        /*
 +         * ECI bits indicate which beats are already executed;
 +         * we handle this by effectively predicating them out.
 +         */
 +        int eci = env->condexec_bits >> 4;
 +        switch (eci) {
 +        case ECI_NONE:
 +            break;
 +        case ECI_A0:
 +            mask &= 0xfff0;
 +            break;
 +        case ECI_A0A1:
 +            mask &= 0xff00;
 +            break;
 +        case ECI_A0A1A2:
 +        case ECI_A0A1A2B0:
 +            mask &= 0xf000;
 +            break;
 +        default:
 +            g_assert_not_reached();
 +        }
 +    }
 +
-     if (new_mode == ARM_CPU_MODE_HYP) {
++    return mask;
-         env->thumb = (env->cp15.sctlr_el[2] & SCTLR_TE) != 0;
++}
-         env->elr_el[2] = env->regs[15];
++
-@@ -XXX,XX +XXX,XX @@ static void arm_cpu_do_interrupt_aarch64(CPUState *cs)
++static void mve_advance_vpt(CPUARMState *env)
-         new_mode |= PSTATE_TCO;
++{
-     }
++    /* Advance the VPT and ECI state if necessary */
++    uint32_t vpr = env->v7m.vpr;
-+    if (cpu_isar_feature(aa64_ssbs, cpu)) {
++    unsigned mask01, mask23;
-+        if (env->cp15.sctlr_el[new_el] & SCTLR_DSSBS_64) {
++
-+            new_mode |= PSTATE_SSBS;
++    if ((env->condexec_bits & 0xf) == 0) {
-+        } else {
++        env->condexec_bits = (env->condexec_bits == (ECI_A0A1A2B0 << 4)) ?
-+            new_mode &= ~PSTATE_SSBS;
++            (ECI_A0 << 4) : (ECI_NONE << 4);
 +    }
 +
 +    if (!(vpr & (R_V7M_VPR_MASK01_MASK | R_V7M_VPR_MASK23_MASK))) {
 +        /* VPT not enabled, nothing to do */
 +        return;
 +    }
 +
 +    mask01 = FIELD_EX32(vpr, V7M_VPR, MASK01);
 +    mask23 = FIELD_EX32(vpr, V7M_VPR, MASK23);
 +    if (mask01 > 8) {
 +        /* high bit set, but not 0b1000: invert the relevant half of P0 */
 +        vpr ^= 0xff;
 +    }
 +    if (mask23 > 8) {
 +        /* high bit set, but not 0b1000: invert the relevant half of P0 */
 +        vpr ^= 0xff00;
 +    }
 +    vpr = FIELD_DP32(vpr, V7M_VPR, MASK01, mask01 << 1);
 +    vpr = FIELD_DP32(vpr, V7M_VPR, MASK23, mask23 << 1);
 +    env->v7m.vpr = vpr;
 +}
 +
 +
 +#define DO_VLDR(OP, MSIZE, LDTYPE, ESIZE, TYPE)                         \
 +    void HELPER(mve_##OP)(CPUARMState *env, void *vd, uint32_t addr)    \
 +    {                                                                   \
 +        TYPE *d = vd;                                                   \
 +        uint16_t mask = mve_element_mask(env);                          \
 +        unsigned b, e;                                                  \
 +        /*                                                              \
 +         * R_SXTM allows the dest reg to become UNKNOWN for abandoned   \
 +         * beats so we don't care if we update part of the dest and     \
 +         * then take an exception.                                      \
 +         */                                                             \
 +        for (b = 0, e = 0; b < 16; b += ESIZE, e++) {                   \
 +            if (mask & (1 << b)) {                                      \
 +                d[H##ESIZE(e)] = cpu_##LDTYPE##_data_ra(env, addr, GETPC()); \
 +            }                                                           \
 +            addr += MSIZE;                                              \
 +        }                                                               \
 +        mve_advance_vpt(env);                                           \
 +    }
 +
 +#define DO_VSTR(OP, MSIZE, STTYPE, ESIZE, TYPE)                         \
 +    void HELPER(mve_##OP)(CPUARMState *env, void *vd, uint32_t addr)    \
 +    {                                                                   \
 +        TYPE *d = vd;                                                   \
 +        uint16_t mask = mve_element_mask(env);                          \
 +        unsigned b, e;                                                  \
 +        for (b = 0, e = 0; b < 16; b += ESIZE, e++) {                   \
 +            if (mask & (1 << b)) {                                      \
 +                cpu_##STTYPE##_data_ra(env, addr, d[H##ESIZE(e)], GETPC()); \
 +            }                                                           \
 +            addr += MSIZE;                                              \
 +        }                                                               \
 +        mve_advance_vpt(env);                                           \
 +    }
 +
 +DO_VLDR(vldrb, 1, ldub, 1, uint8_t)
 +DO_VLDR(vldrh, 2, lduw, 2, uint16_t)
 +DO_VLDR(vldrw, 4, ldl, 4, uint32_t)
 +
 +DO_VSTR(vstrb, 1, stb, 1, uint8_t)
 +DO_VSTR(vstrh, 2, stw, 2, uint16_t)
 +DO_VSTR(vstrw, 4, stl, 4, uint32_t)
 +
 +#undef DO_VLDR
 +#undef DO_VSTR
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@
  /* Include the generated decoder */
  #include "decode-mve.c.inc"
 +
 +typedef void MVEGenLdStFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
 +
 +/* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
 +static inline long mve_qreg_offset(unsigned reg)
 +{
 +    return offsetof(CPUARMState, vfp.zregs[reg].d[0]);
 +}
 +
 +static TCGv_ptr mve_qreg_ptr(unsigned reg)
 +{
 +    TCGv_ptr ret = tcg_temp_new_ptr();
 +    tcg_gen_addi_ptr(ret, cpu_env, mve_qreg_offset(reg));
 +    return ret;
 +}
 +
 +static bool mve_check_qreg_bank(DisasContext *s, int qmask)
 +{
 +    /*
 +     * Check whether Qregs are in range. For v8.1M only Q0..Q7
 +     * are supported, see VFPSmallRegisterBank().
 +     */
 +    return qmask < 8;
 +}
 +
 +static bool mve_eci_check(DisasContext *s)
 +{
 +    /*
 +     * This is a beatwise insn: check that ECI is valid (not a
 +     * reserved value) and note that we are handling it.
 +     * Return true if OK, false if we generated an exception.
 +     */
 +    s->eci_handled = true;
 +    switch (s->eci) {
 +    case ECI_NONE:
 +    case ECI_A0:
 +    case ECI_A0A1:
 +    case ECI_A0A1A2:
 +    case ECI_A0A1A2B0:
 +        return true;
 +    default:
 +        /* Reserved value: INVSTATE UsageFault */
 +        gen_exception_insn(s, s->pc_curr, EXCP_INVSTATE, syn_uncategorized(),
 +                           default_exception_el(s));
 +        return false;
 +    }
 +}
 +
 +static void mve_update_eci(DisasContext *s)
 +{
 +    /*
 +     * The helper function will always update the CPUState field,
 +     * so we only need to update the DisasContext field.
 +     */
 +    if (s->eci) {
 +        s->eci = (s->eci == ECI_A0A1A2B0) ? ECI_A0 : ECI_NONE;
 +    }
 +}
 +
 +static bool do_ldst(DisasContext *s, arg_VLDR_VSTR *a, MVEGenLdStFn *fn)
 +{
 +    TCGv_i32 addr;
 +    uint32_t offset;
 +    TCGv_ptr qreg;
 +
 +    if (!dc_isar_feature(aa32_mve, s) ||
 +        !mve_check_qreg_bank(s, a->qd) ||
 +        !fn) {
 +        return false;
 +    }
 +
 +    /* CONSTRAINED UNPREDICTABLE: we choose to UNDEF */
 +    if (a->rn == 15 || (a->rn == 13 && a->w)) {
 +        return false;
 +    }
 +
 +    if (!mve_eci_check(s) || !vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    offset = a->imm << a->size;
 +    if (!a->a) {
 +        offset = -offset;
 +    }
 +    addr = load_reg(s, a->rn);
 +    if (a->p) {
 +        tcg_gen_addi_i32(addr, addr, offset);
 +    }
 +
 +    qreg = mve_qreg_ptr(a->qd);
 +    fn(cpu_env, qreg, addr);
 +    tcg_temp_free_ptr(qreg);
 +
 +    /*
 +     * Writeback always happens after the last beat of the insn,
 +     * regardless of predication
 +     */
 +    if (a->w) {
 +        if (!a->p) {
 +            tcg_gen_addi_i32(addr, addr, offset);
 +        }
-+    }
++        store_reg(s, a->rn, addr);
-+
++    } else {
-     pstate_write(env, PSTATE_DAIF | new_mode);
++        tcg_temp_free_i32(addr);
-     env->aarch64 = 1;
++    }
-     aarch64_restore_sp(env, new_el);
++    mve_update_eci(s);
-diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
++    return true;
-index XXXXXXX..XXXXXXX 100644
++}
---- a/target/arm/translate-a64.c
++
-+++ b/target/arm/translate-a64.c
++static bool trans_VLDR_VSTR(DisasContext *s, arg_VLDR_VSTR *a)
-@@ -XXX,XX +XXX,XX @@ static void handle_msr_i(DisasContext *s, uint32_t insn,
++{
-         tcg_temp_free_i32(t1);
++    static MVEGenLdStFn * const ldstfns[4][2] = {
-         break;
++        { gen_helper_mve_vstrb, gen_helper_mve_vldrb },
++        { gen_helper_mve_vstrh, gen_helper_mve_vldrh },
-+    case 0x19: /* SSBS */
++        { gen_helper_mve_vstrw, gen_helper_mve_vldrw },
-+        if (!dc_isar_feature(aa64_ssbs, s)) {
++        { NULL, NULL }
-+            goto do_unallocated;
++    };
-+        }
++    return do_ldst(s, a, ldstfns[a->size][a->l]);
-+        if (crm & 1) {
++}
-+            set_pstate_bits(PSTATE_SSBS);
+diff --git a/target/arm/meson.build b/target/arm/meson.build
-+        } else {
+index XXXXXXX..XXXXXXX 100644
-+            clear_pstate_bits(PSTATE_SSBS);
+--- a/target/arm/meson.build
-+        }
++++ b/target/arm/meson.build
-+        /* Don't need to rebuild hflags since SSBS is a nop */
+@@ -XXX,XX +XXX,XX @@ arm_ss.add(files(
-+        break;
+   'helper.c',
-+
+   'iwmmxt_helper.c',
-     case 0x1a: /* DIT */
+   'm_helper.c',
-         if (!dc_isar_feature(aa64_dit, s)) {
++  'mve_helper.c',
-             goto do_unallocated;
+   'neon_helper.c',
    'op_helper.c',
    'tlb_helper.c',
 --
 .20.1

-New patch
+[PULL 13/57] target/arm: Implement widening/narrowing MVE VLDR/VSTR insns
+Implement the variants of MVE VLDR (encodings T1, T2) which perform
+"widening" loads where bytes or halfwords are loaded from memory and
+zero or sign-extended into halfword or word length vector elements,
+and the narrowing MVE VSTR (encodings T1, T2) where bytes or
+halfwords are stored from halfword or word elements.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210617121628.20116-3-peter.maydell@linaro.org
+---
+ target/arm/helper-mve.h    | 10 ++++++++++
+ target/arm/mve.decode      | 25 +++++++++++++++++++++++--
+ target/arm/mve_helper.c    | 11 +++++++++++
+ target/arm/translate-mve.c | 14 ++++++++++++++
+files changed, 58 insertions(+), 2 deletions(-)
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/helper-mve.h
++++ b/target/arm/helper-mve.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vldrw, TCG_CALL_NO_WG, void, env, ptr, i32)
+ DEF_HELPER_FLAGS_3(mve_vstrb, TCG_CALL_NO_WG, void, env, ptr, i32)
+ DEF_HELPER_FLAGS_3(mve_vstrh, TCG_CALL_NO_WG, void, env, ptr, i32)
+ DEF_HELPER_FLAGS_3(mve_vstrw, TCG_CALL_NO_WG, void, env, ptr, i32)
++
++DEF_HELPER_FLAGS_3(mve_vldrb_sh, TCG_CALL_NO_WG, void, env, ptr, i32)
++DEF_HELPER_FLAGS_3(mve_vldrb_sw, TCG_CALL_NO_WG, void, env, ptr, i32)
++DEF_HELPER_FLAGS_3(mve_vldrb_uh, TCG_CALL_NO_WG, void, env, ptr, i32)
++DEF_HELPER_FLAGS_3(mve_vldrb_uw, TCG_CALL_NO_WG, void, env, ptr, i32)
++DEF_HELPER_FLAGS_3(mve_vldrh_sw, TCG_CALL_NO_WG, void, env, ptr, i32)
++DEF_HELPER_FLAGS_3(mve_vldrh_uw, TCG_CALL_NO_WG, void, env, ptr, i32)
++DEF_HELPER_FLAGS_3(mve_vstrb_h, TCG_CALL_NO_WG, void, env, ptr, i32)
++DEF_HELPER_FLAGS_3(mve_vstrb_w, TCG_CALL_NO_WG, void, env, ptr, i32)
++DEF_HELPER_FLAGS_3(mve_vstrh_w, TCG_CALL_NO_WG, void, env, ptr, i32)
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/mve.decode
++++ b/target/arm/mve.decode
+@@ -XXX,XX +XXX,XX @@
+ %qd 22:1 13:3
+-&vldr_vstr rn qd imm p a w size l
++&vldr_vstr rn qd imm p a w size l u
+-@vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd
++@vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
++# Note that both Rn and Qd are 3 bits only (no D bit)
++@vldst_wn ... u:1 ... . . . . l:1 . rn:3 qd:3 . ... .. imm:7 &vldr_vstr
+ # Vector loads and stores
++# Widening loads and narrowing stores:
++# for these P=0 W=0 is 'related encoding'; sz=11 is 'related encoding'
++# This means we need to expand out to multiple patterns for P, W, SZ.
++# For stores the U bit must be 0 but we catch that in the trans_ function.
++# The naming scheme here is "VLDSTB_H == in-memory byte load/store to/from
++# signed halfword element in register", etc.
++VLDSTB_H         111 . 110 0 a:1 0 1   . 0 ... ... 0 111 01 ....... @vldst_wn \
++                 p=0 w=1 size=1
++VLDSTB_H         111 . 110 1 a:1 0 w:1 . 0 ... ... 0 111 01 ....... @vldst_wn \
++                 p=1 size=1
++VLDSTB_W         111 . 110 0 a:1 0 1   . 0 ... ... 0 111 10 ....... @vldst_wn \
++                 p=0 w=1 size=2
++VLDSTB_W         111 . 110 1 a:1 0 w:1 . 0 ... ... 0 111 10 ....... @vldst_wn \
++                 p=1 size=2
++VLDSTH_W         111 . 110 0 a:1 0 1   . 1 ... ... 0 111 10 ....... @vldst_wn \
++                 p=0 w=1 size=2
++VLDSTH_W         111 . 110 1 a:1 0 w:1 . 1 ... ... 0 111 10 ....... @vldst_wn \
++                 p=1 size=2
++
+ # Non-widening loads/stores (P=0 W=0 is 'related encoding')
+ VLDR_VSTR        1110110 0 a:1 . 1   . .... ... 111100 .......   @vldr_vstr \
+                  size=0 p=0 w=1
+diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/mve_helper.c
++++ b/target/arm/mve_helper.c
+@@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrb, 1, stb, 1, uint8_t)
+ DO_VSTR(vstrh, 2, stw, 2, uint16_t)
+ DO_VSTR(vstrw, 4, stl, 4, uint32_t)
++DO_VLDR(vldrb_sh, 1, ldsb, 2, int16_t)
++DO_VLDR(vldrb_sw, 1, ldsb, 4, int32_t)
++DO_VLDR(vldrb_uh, 1, ldub, 2, uint16_t)
++DO_VLDR(vldrb_uw, 1, ldub, 4, uint32_t)
++DO_VLDR(vldrh_sw, 2, ldsw, 4, int32_t)
++DO_VLDR(vldrh_uw, 2, lduw, 4, uint32_t)
++
++DO_VSTR(vstrb_h, 1, stb, 2, int16_t)
++DO_VSTR(vstrb_w, 1, stb, 4, int32_t)
++DO_VSTR(vstrh_w, 2, stw, 4, int32_t)
++
+ #undef DO_VLDR
+ #undef DO_VSTR
+diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-mve.c
++++ b/target/arm/translate-mve.c
+@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR(DisasContext *s, arg_VLDR_VSTR *a)
+     };
+     return do_ldst(s, a, ldstfns[a->size][a->l]);
+ }
++
++#define DO_VLDST_WIDE_NARROW(OP, SLD, ULD, ST)                  \
++    static bool trans_##OP(DisasContext *s, arg_VLDR_VSTR *a)   \
++    {                                                           \
++        static MVEGenLdStFn * const ldstfns[2][2] = {           \
++            { gen_helper_mve_##ST, gen_helper_mve_##SLD },      \
++            { NULL, gen_helper_mve_##ULD },                     \
++        };                                                      \
++        return do_ldst(s, a, ldstfns[a->u][a->l]);              \
++    }
++
++DO_VLDST_WIDE_NARROW(VLDSTB_H, vldrb_sh, vldrb_uh, vstrb_h)
++DO_VLDST_WIDE_NARROW(VLDSTB_W, vldrb_sw, vldrb_uw, vstrb_w)
++DO_VLDST_WIDE_NARROW(VLDSTH_W, vldrh_sw, vldrh_uw, vstrh_w)
+--
+.20.1
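
The decode comments name these encodings after their data movement. For example, VLDSTB_H is an in-memory byte loaded into, or stored from, a halfword register element. The per-element behaviour of the ldsb/ldub/stb accessors used by the DO_VLDR/DO_VSTR lines can be sketched on host arrays as below. This is a hypothetical model, not QEMU code. The model_* names and the QRegH type are illustrative assumptions, and predication, alignment checks and memory faults are all ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model (not QEMU code): a 128-bit Q register as 8 halfword lanes. */
typedef struct {
    uint16_t h[8];
} QRegH;

/* Widening signed load (cf. vldrb_sh/ldsb): sign-extend each byte to 16 bits. */
static void model_vldrb_sh(QRegH *qd, const uint8_t *mem)
{
    for (int e = 0; e < 8; e++) {
        qd->h[e] = (uint16_t)(int16_t)(int8_t)mem[e];
    }
}

/* Widening unsigned load (cf. vldrb_uh/ldub): zero-extend each byte. */
static void model_vldrb_uh(QRegH *qd, const uint8_t *mem)
{
    for (int e = 0; e < 8; e++) {
        qd->h[e] = mem[e];
    }
}

/*
 * Narrowing store (cf. vstrb_h/stb): keep only the low byte of each lane.
 * A truncating store has no signed/unsigned variant; the translator's
 * ldstfns table correspondingly has NULL for the U=1 store case.
 */
static void model_vstrb_h(uint8_t *mem, const QRegH *qd)
{
    for (int e = 0; e < 8; e++) {
        mem[e] = (uint8_t)qd->h[e];
    }
}
```

For example, the bytes 0x80 and 0xff load as 0xff80 and 0xffff through the signed variant, and as 0x0080 and 0x00ff through the unsigned one. A narrowing store of either result writes the original bytes back.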

-New patch
+[PULL 14/57] target/arm: Implement MVE VCLZ
+Implement the MVE VCLZ insn (and the necessary machinery
 for MVE 1-input vector ops).
 Note that for non-load instructions predication is always performed
 at a byte level granularity regardless of element size (R_ZLSJ),
 and so the masking logic here differs from that used in the VLDR
 and VSTR helpers.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20210617121628.20116-4-peter.maydell@linaro.org
 ---
  target/arm/helper-mve.h    |  4 ++
  target/arm/mve.decode      |  8 ++++
  target/arm/mve_helper.c    | 82 ++++++++++++++++++++++++++++++++++++++
  target/arm/translate-mve.c | 38 ++++++++++++++++++
 files changed, 132 insertions(+)
 diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper-mve.h
 +++ b/target/arm/helper-mve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vldrh_uw, TCG_CALL_NO_WG, void, env, ptr, i32)
  DEF_HELPER_FLAGS_3(mve_vstrb_h, TCG_CALL_NO_WG, void, env, ptr, i32)
  DEF_HELPER_FLAGS_3(mve_vstrb_w, TCG_CALL_NO_WG, void, env, ptr, i32)
  DEF_HELPER_FLAGS_3(mve_vstrh_w, TCG_CALL_NO_WG, void, env, ptr, i32)
 +
 +DEF_HELPER_FLAGS_3(mve_vclzb, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +DEF_HELPER_FLAGS_3(mve_vclzh, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +DEF_HELPER_FLAGS_3(mve_vclzw, TCG_CALL_NO_WG, void, env, ptr, ptr)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@
  #
  %qd 22:1 13:3
 +%qm 5:1 1:3
  &vldr_vstr rn qd imm p a w size l u
 +&1op qd qm size
  @vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
  # Note that both Rn and Qd are 3 bits only (no D bit)
  @vldst_wn ... u:1 ... . . . . l:1 . rn:3 qd:3 . ... .. imm:7 &vldr_vstr
 +@1op .... .... .... size:2 .. .... .... .... .... &1op qd=%qd qm=%qm
 +
  # Vector loads and stores
  # Widening loads and narrowing stores:
@@ -XXX,XX +XXX,XX @@ VLDR_VSTR        1110110 1 a:1 . w:1 . .... ... 111101 .......   @vldr_vstr \
                   size=1 p=1
  VLDR_VSTR        1110110 1 a:1 . w:1 . .... ... 111110 .......   @vldr_vstr \
                   size=2 p=1
 +
 +# Vector miscellaneous
 +
 +VCLZ             1111 1111 1 . 11 .. 00 ... 0 0100 11 . 0 ... 0 @1op
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, 2, stw, 4, int32_t)
  #undef DO_VLDR
  #undef DO_VSTR
 +
 +/*
 + * The mergemask(D, R, M) macro performs the operation "*D = R" but
 + * storing only the bytes which correspond to 1 bits in M,
 + * leaving other bytes in *D unchanged. We use _Generic
 + * to select the correct implementation based on the type of D.
 + */
 +
 +static void mergemask_ub(uint8_t *d, uint8_t r, uint16_t mask)
 +{
 +    if (mask & 1) {
 +        *d = r;
 +    }
 +}
 +
 +static void mergemask_sb(int8_t *d, int8_t r, uint16_t mask)
 +{
 +    mergemask_ub((uint8_t *)d, r, mask);
 +}
 +
 +static void mergemask_uh(uint16_t *d, uint16_t r, uint16_t mask)
 +{
 +    uint16_t bmask = expand_pred_b_data[mask & 3];
 +    *d = (*d & ~bmask) | (r & bmask);
 +}
 +
 +static void mergemask_sh(int16_t *d, int16_t r, uint16_t mask)
 +{
 +    mergemask_uh((uint16_t *)d, r, mask);
 +}
 +
 +static void mergemask_uw(uint32_t *d, uint32_t r, uint16_t mask)
 +{
 +    uint32_t bmask = expand_pred_b_data[mask & 0xf];
 +    *d = (*d & ~bmask) | (r & bmask);
 +}
 +
 +static void mergemask_sw(int32_t *d, int32_t r, uint16_t mask)
 +{
 +    mergemask_uw((uint32_t *)d, r, mask);
 +}
 +
 +static void mergemask_uq(uint64_t *d, uint64_t r, uint16_t mask)
 +{
 +    uint64_t bmask = expand_pred_b_data[mask & 0xff];
 +    *d = (*d & ~bmask) | (r & bmask);
 +}
 +
 +static void mergemask_sq(int64_t *d, int64_t r, uint16_t mask)
 +{
 +    mergemask_uq((uint64_t *)d, r, mask);
 +}
 +
 +#define mergemask(D, R, M)                      \
 +    _Generic(D,                                 \
 +             uint8_t *: mergemask_ub,           \
 +             int8_t *:  mergemask_sb,           \
 +             uint16_t *: mergemask_uh,          \
 +             int16_t *:  mergemask_sh,          \
 +             uint32_t *: mergemask_uw,          \
 +             int32_t *:  mergemask_sw,          \
 +             uint64_t *: mergemask_uq,          \
 +             int64_t *:  mergemask_sq)(D, R, M)
 +
 +#define DO_1OP(OP, ESIZE, TYPE, FN)                                     \
 +    void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm)         \
 +    {                                                                   \
 +        TYPE *d = vd, *m = vm;                                          \
 +        uint16_t mask = mve_element_mask(env);                          \
 +        unsigned e;                                                     \
 +        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
 +            mergemask(&d[H##ESIZE(e)], FN(m[H##ESIZE(e)]), mask);       \
 +        }                                                               \
 +        mve_advance_vpt(env);                                           \
 +    }
 +
 +#define DO_CLZ_B(N)   (clz32(N) - 24)
 +#define DO_CLZ_H(N)   (clz32(N) - 16)
 +
 +DO_1OP(vclzb, 1, uint8_t, DO_CLZ_B)
 +DO_1OP(vclzh, 2, uint16_t, DO_CLZ_H)
 +DO_1OP(vclzw, 4, uint32_t, clz32)
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@
  #include "decode-mve.c.inc"
  typedef void MVEGenLdStFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
 +typedef void MVEGenOneOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
  /* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
  static inline long mve_qreg_offset(unsigned reg)
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR(DisasContext *s, arg_VLDR_VSTR *a)
  DO_VLDST_WIDE_NARROW(VLDSTB_H, vldrb_sh, vldrb_uh, vstrb_h)
  DO_VLDST_WIDE_NARROW(VLDSTB_W, vldrb_sw, vldrb_uw, vstrb_w)
  DO_VLDST_WIDE_NARROW(VLDSTH_W, vldrh_sw, vldrh_uw, vstrh_w)
 +
 +static bool do_1op(DisasContext *s, arg_1op *a, MVEGenOneOpFn fn)
 +{
 +    TCGv_ptr qd, qm;
 +
 +    if (!dc_isar_feature(aa32_mve, s) ||
 +        !mve_check_qreg_bank(s, a->qd | a->qm) ||
 +        !fn) {
 +        return false;
 +    }
 +
 +    if (!mve_eci_check(s) || !vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    qd = mve_qreg_ptr(a->qd);
 +    qm = mve_qreg_ptr(a->qm);
 +    fn(cpu_env, qd, qm);
 +    tcg_temp_free_ptr(qd);
 +    tcg_temp_free_ptr(qm);
 +    mve_update_eci(s);
 +    return true;
 +}
 +
 +#define DO_1OP(INSN, FN)                                        \
 +    static bool trans_##INSN(DisasContext *s, arg_1op *a)       \
 +    {                                                           \
 +        static MVEGenOneOpFn * const fns[] = {                  \
 +            gen_helper_mve_##FN##b,                             \
 +            gen_helper_mve_##FN##h,                             \
 +            gen_helper_mve_##FN##w,                             \
 +            NULL,                                               \
 +        };                                                      \
 +        return do_1op(s, a, fns[a->size]);                      \
 +    }
 +
 +DO_1OP(VCLZ, vclz)
 --
 .20.1
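
The byte-granular predication described in the commit message (R_ZLSJ) can be modelled on the host as below. This is a hypothetical sketch, not the QEMU helper:

- model_expand_pred_b computes on the fly what the patch reads from the expand_pred_b_data table, with each mask bit becoming one 0xff byte.
- model_clz32 stands in for QEMU's clz32.
- The host-endian H2() lane swizzle is omitted.

Note that a mask covering only one byte of a halfword lane updates only that byte.

```c
#include <assert.h>
#include <stdint.h>

/* Portable count-leading-zeros; returns 32 for 0, like QEMU's clz32(). */
static unsigned model_clz32(uint32_t x)
{
    unsigned n = 0;

    if (x == 0) {
        return 32;
    }
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Expand up to 8 predicate bits to a byte mask: bit i set -> byte i = 0xff. */
static uint64_t model_expand_pred_b(uint8_t m)
{
    uint64_t r = 0;

    for (int i = 0; i < 8; i++) {
        if (m & (1u << i)) {
            r |= (uint64_t)0xff << (i * 8);
        }
    }
    return r;
}

/*
 * Hypothetical model of a predicated VCLZ.I16 on 8 halfword lanes.
 * 'mask' has one bit per byte of the 128-bit register, so each lane
 * consumes 2 bits, mirroring "mask >>= ESIZE" in DO_1OP.
 */
static void model_vclzh(uint16_t d[8], const uint16_t m[8], uint16_t mask)
{
    for (int e = 0; e < 8; e++, mask >>= 2) {
        uint16_t r = (uint16_t)(model_clz32(m[e]) - 16);  /* cf. DO_CLZ_H */
        uint16_t bmask = (uint16_t)model_expand_pred_b(mask & 3);

        /* merge: only bytes selected by the predicate take the result */
        d[e] = (uint16_t)((d[e] & ~bmask) | (r & bmask));
    }
}
```

Take every input lane equal to 1, which has 15 leading zeros, and a mask of 0x0001. A lane 0 that starts as 0xaaaa becomes 0xaa0f, because only its low byte is written; the other lanes are left unchanged.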

-New patch
+[PULL 15/57] target/arm: Implement MVE VCLS
+Implement the MVE VCLS insn.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210617121628.20116-5-peter.maydell@linaro.org
+---
+ target/arm/helper-mve.h    | 4 ++++
+ target/arm/mve.decode      | 1 +
+ target/arm/mve_helper.c    | 7 +++++++
+ target/arm/translate-mve.c | 1 +
+files changed, 13 insertions(+)
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/helper-mve.h
++++ b/target/arm/helper-mve.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vstrb_h, TCG_CALL_NO_WG, void, env, ptr, i32)
+ DEF_HELPER_FLAGS_3(mve_vstrb_w, TCG_CALL_NO_WG, void, env, ptr, i32)
+ DEF_HELPER_FLAGS_3(mve_vstrh_w, TCG_CALL_NO_WG, void, env, ptr, i32)
++DEF_HELPER_FLAGS_3(mve_vclsb, TCG_CALL_NO_WG, void, env, ptr, ptr)
++DEF_HELPER_FLAGS_3(mve_vclsh, TCG_CALL_NO_WG, void, env, ptr, ptr)
++DEF_HELPER_FLAGS_3(mve_vclsw, TCG_CALL_NO_WG, void, env, ptr, ptr)
++
+ DEF_HELPER_FLAGS_3(mve_vclzb, TCG_CALL_NO_WG, void, env, ptr, ptr)
+ DEF_HELPER_FLAGS_3(mve_vclzh, TCG_CALL_NO_WG, void, env, ptr, ptr)
+ DEF_HELPER_FLAGS_3(mve_vclzw, TCG_CALL_NO_WG, void, env, ptr, ptr)
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/mve.decode
++++ b/target/arm/mve.decode
+@@ -XXX,XX +XXX,XX @@ VLDR_VSTR        1110110 1 a:1 . w:1 . .... ... 111110 .......   @vldr_vstr \
+ # Vector miscellaneous
++VCLS             1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
+ VCLZ             1111 1111 1 . 11 .. 00 ... 0 0100 11 . 0 ... 0 @1op
+diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/mve_helper.c
++++ b/target/arm/mve_helper.c
+@@ -XXX,XX +XXX,XX @@ static void mergemask_sq(int64_t *d, int64_t r, uint16_t mask)
+         mve_advance_vpt(env);                                           \
+     }
++#define DO_CLS_B(N)   (clrsb32(N) - 24)
++#define DO_CLS_H(N)   (clrsb32(N) - 16)
++
++DO_1OP(vclsb, 1, int8_t, DO_CLS_B)
++DO_1OP(vclsh, 2, int16_t, DO_CLS_H)
++DO_1OP(vclsw, 4, int32_t, clrsb32)
++
+ #define DO_CLZ_B(N)   (clz32(N) - 24)
+ #define DO_CLZ_H(N)   (clz32(N) - 16)
+diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-mve.c
++++ b/target/arm/translate-mve.c
+@@ -XXX,XX +XXX,XX @@ static bool do_1op(DisasContext *s, arg_1op *a, MVEGenOneOpFn fn)
+     }
+ DO_1OP(VCLZ, vclz)
++DO_1OP(VCLS, vcls)
+--
+.20.1
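
The clrsb32(N) - 24 adjustment works because an int8_t argument is sign-extended to 32 bits. That adds exactly 24 extra copies of the sign bit (16 for a halfword), which the subtraction removes. Below is a portable sketch of that arithmetic. model_clrsb32 is a hypothetical stand-in for QEMU's clrsb32, assumed to count the bits after the sign bit that equal it.

```c
#include <assert.h>
#include <stdint.h>

/* Portable count-leading-zeros; returns 32 for 0. */
static unsigned model_clz32(uint32_t x)
{
    unsigned n = 0;

    if (x == 0) {
        return 32;
    }
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Leading redundant sign bits: invert negative values, then clz minus 1. */
static unsigned model_clrsb32(int32_t x)
{
    uint32_t u = x < 0 ? ~(uint32_t)x : (uint32_t)x;

    return model_clz32(u) - 1;
}

/* Per-element VCLS for byte and halfword lanes (cf. DO_CLS_B, DO_CLS_H). */
static unsigned model_vcls8(int8_t n)
{
    return model_clrsb32(n) - 24;
}

static unsigned model_vcls16(int16_t n)
{
    return model_clrsb32(n) - 16;
}
```

For example, both 0 and -1 give 7 for a byte lane, because all seven bits after the sign bit match it. By contrast, 64 and -128 give 0.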

-New patch
+[PULL 16/57] target/arm: Implement MVE VREV16, VREV32, VREV64
+Implement the MVE instructions VREV16, VREV32 and VREV64.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210617121628.20116-6-peter.maydell@linaro.org
+---
+ target/arm/helper-mve.h    |  7 +++++++
+ target/arm/mve.decode      |  4 ++++
+ target/arm/mve_helper.c    |  7 +++++++
+ target/arm/translate-mve.c | 33 +++++++++++++++++++++++++++++++++
+files changed, 51 insertions(+)
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/helper-mve.h
++++ b/target/arm/helper-mve.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vclsw, TCG_CALL_NO_WG, void, env, ptr, ptr)
+ DEF_HELPER_FLAGS_3(mve_vclzb, TCG_CALL_NO_WG, void, env, ptr, ptr)
+ DEF_HELPER_FLAGS_3(mve_vclzh, TCG_CALL_NO_WG, void, env, ptr, ptr)
+ DEF_HELPER_FLAGS_3(mve_vclzw, TCG_CALL_NO_WG, void, env, ptr, ptr)
++
++DEF_HELPER_FLAGS_3(mve_vrev16b, TCG_CALL_NO_WG, void, env, ptr, ptr)
++DEF_HELPER_FLAGS_3(mve_vrev32b, TCG_CALL_NO_WG, void, env, ptr, ptr)
++DEF_HELPER_FLAGS_3(mve_vrev32h, TCG_CALL_NO_WG, void, env, ptr, ptr)
++DEF_HELPER_FLAGS_3(mve_vrev64b, TCG_CALL_NO_WG, void, env, ptr, ptr)
++DEF_HELPER_FLAGS_3(mve_vrev64h, TCG_CALL_NO_WG, void, env, ptr, ptr)
++DEF_HELPER_FLAGS_3(mve_vrev64w, TCG_CALL_NO_WG, void, env, ptr, ptr)
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/mve.decode
++++ b/target/arm/mve.decode
+@@ -XXX,XX +XXX,XX @@ VLDR_VSTR        1110110 1 a:1 . w:1 . .... ... 111110 .......   @vldr_vstr \
+ VCLS             1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
+ VCLZ             1111 1111 1 . 11 .. 00 ... 0 0100 11 . 0 ... 0 @1op
++
++VREV16           1111 1111 1 . 11 .. 00 ... 0 0001 01 . 0 ... 0 @1op
++VREV32           1111 1111 1 . 11 .. 00 ... 0 0000 11 . 0 ... 0 @1op
++VREV64           1111 1111 1 . 11 .. 00 ... 0 0000 01 . 0 ... 0 @1op
+diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/mve_helper.c
++++ b/target/arm/mve_helper.c
+@@ -XXX,XX +XXX,XX @@ DO_1OP(vclsw, 4, int32_t, clrsb32)
+ DO_1OP(vclzb, 1, uint8_t, DO_CLZ_B)
+ DO_1OP(vclzh, 2, uint16_t, DO_CLZ_H)
+ DO_1OP(vclzw, 4, uint32_t, clz32)
++
++DO_1OP(vrev16b, 2, uint16_t, bswap16)
++DO_1OP(vrev32b, 4, uint32_t, bswap32)
++DO_1OP(vrev32h, 4, uint32_t, hswap32)
++DO_1OP(vrev64b, 8, uint64_t, bswap64)
++DO_1OP(vrev64h, 8, uint64_t, hswap64)
++DO_1OP(vrev64w, 8, uint64_t, wswap64)
+diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-mve.c
++++ b/target/arm/translate-mve.c
+@@ -XXX,XX +XXX,XX @@ static bool do_1op(DisasContext *s, arg_1op *a, MVEGenOneOpFn fn)
+ DO_1OP(VCLZ, vclz)
+ DO_1OP(VCLS, vcls)
++
++static bool trans_VREV16(DisasContext *s, arg_1op *a)
++{
++    static MVEGenOneOpFn * const fns[] = {
++        gen_helper_mve_vrev16b,
++        NULL,
++        NULL,
++        NULL,
++    };
++    return do_1op(s, a, fns[a->size]);
++}
++
++static bool trans_VREV32(DisasContext *s, arg_1op *a)
++{
++    static MVEGenOneOpFn * const fns[] = {
++        gen_helper_mve_vrev32b,
++        gen_helper_mve_vrev32h,
++        NULL,
++        NULL,
++    };
++    return do_1op(s, a, fns[a->size]);
++}
++
++static bool trans_VREV64(DisasContext *s, arg_1op *a)
++{
++    static MVEGenOneOpFn * const fns[] = {
++        gen_helper_mve_vrev64b,
++        gen_helper_mve_vrev64h,
++        gen_helper_mve_vrev64w,
++        NULL,
++    };
++    return do_1op(s, a, fns[a->size]);
++}
+--
+.20.1
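
Each VREV helper is a DO_1OP instantiation with two parts:

- Its ESIZE is the container being reversed (2, 4 or 8 bytes).
- Its FN reverses the elements inside that container.

The size field selects the element size. The NULL table entries (for example VREV16 with size 1) make do_1op() return false, so the encoding is rejected. Below is a hypothetical portable sketch of the three swaps that are not plain byte swaps. QEMU's own hswap32, hswap64 and wswap64 are assumed to have these semantics.

```c
#include <assert.h>
#include <stdint.h>

/* VREV32.16: swap the two halfwords of a 32-bit container. */
static uint32_t model_hswap32(uint32_t h)
{
    return (h << 16) | (h >> 16);
}

/* VREV64.32: swap the two words of a 64-bit container. */
static uint64_t model_wswap64(uint64_t h)
{
    return (h << 32) | (h >> 32);
}

/* VREV64.16: reverse the four halfwords of a 64-bit container. */
static uint64_t model_hswap64(uint64_t h)
{
    const uint64_t m = 0x0000ffff0000ffffull;

    h = model_wswap64(h);                       /* swap the 32-bit halves */
    return ((h & m) << 16) | ((h >> 16) & m);   /* swap halfwords in each word */
}
```

For example, model_hswap64 turns 0x1111222233334444 into 0x4444333322221111.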

-[PULL 49/49] hw/arm/mps2: Update old infocenter.arm.com URLs
+[PULL 17/57] target/arm: Implement MVE VMVN (register)
-Update old infocenter.arm.com URLs to the equivalent developer.arm.com
+Implement the MVE VMVN(register) operation.  Note that for
-ones (the old URLs should redirect, but we might as well avoid the
+predication this operation is byte-by-byte.
 redirection notice, and the new URLs are pleasantly shorter).
 This commit covers the links to the MPS2 board TRM, the various
 Application Notes, the IoTKit and SSE-200 documents.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210215115138.20465-25-peter.maydell@linaro.org
+Message-id: 20210617121628.20116-7-peter.maydell@linaro.org
 ---
- include/hw/arm/armsse.h          |  4 ++--
+ target/arm/helper-mve.h    | 2 ++
- include/hw/misc/armsse-cpuid.h   |  2 +-
+ target/arm/mve.decode      | 3 +++
- include/hw/misc/armsse-mhu.h     |  2 +-
+ target/arm/mve_helper.c    | 4 ++++
- include/hw/misc/iotkit-secctl.h  |  2 +-
+ target/arm/translate-mve.c | 5 +++++
- include/hw/misc/iotkit-sysctl.h  |  2 +-
+files changed, 14 insertions(+)
  include/hw/misc/iotkit-sysinfo.h |  2 +-
  include/hw/misc/mps2-fpgaio.h    |  2 +-
  hw/arm/mps2-tz.c                 | 11 +++++------
  hw/misc/armsse-cpuid.c           |  2 +-
  hw/misc/armsse-mhu.c             |  2 +-
  hw/misc/iotkit-sysctl.c          |  2 +-
  hw/misc/iotkit-sysinfo.c         |  2 +-
  hw/misc/mps2-fpgaio.c            |  2 +-
  hw/misc/mps2-scc.c               |  2 +-
 files changed, 19 insertions(+), 20 deletions(-)
-diff --git a/include/hw/arm/armsse.h b/include/hw/arm/armsse.h
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/arm/armsse.h
+--- a/target/arm/helper-mve.h
-+++ b/include/hw/arm/armsse.h
++++ b/target/arm/helper-mve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vrev32h, TCG_CALL_NO_WG, void, env, ptr, ptr)
  DEF_HELPER_FLAGS_3(mve_vrev64b, TCG_CALL_NO_WG, void, env, ptr, ptr)
  DEF_HELPER_FLAGS_3(mve_vrev64h, TCG_CALL_NO_WG, void, env, ptr, ptr)
  DEF_HELPER_FLAGS_3(mve_vrev64w, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +
 +DEF_HELPER_FLAGS_3(mve_vmvn, TCG_CALL_NO_WG, void, env, ptr, ptr)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
 @@ -XXX,XX +XXX,XX @@
-  * hardware, which include the IoT Kit and the SSE-050, SSE-100 and
+ @vldst_wn ... u:1 ... . . . . l:1 . rn:3 qd:3 . ... .. imm:7 &vldr_vstr
-  * SSE-200. Currently we model:
-  *  - the Arm IoT Kit which is documented in
+ @1op .... .... .... size:2 .. .... .... .... .... &1op qd=%qd qm=%qm
-- * http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ecm0601256/index.html
++@1op_nosz .... .... .... .... .... .... .... .... &1op qd=%qd qm=%qm size=0
-+ *    https://developer.arm.com/documentation/ecm0601256/latest
-  *  - the SSE-200 which is documented in
+ # Vector loads and stores
-- * http://infocenter.arm.com/help/topic/com.arm.doc.101104_0100_00_en/corelink_sse200_subsystem_for_embedded_technical_reference_manual_101104_0100_00_en.pdf
-+ *    https://developer.arm.com/documentation/101104/latest/
+@@ -XXX,XX +XXX,XX @@ VCLZ             1111 1111 1 . 11 .. 00 ... 0 0100 11 . 0 ... 0 @1op
-  *
+ VREV16           1111 1111 1 . 11 .. 00 ... 0 0001 01 . 0 ... 0 @1op
-  * The IoTKit contains:
+ VREV32           1111 1111 1 . 11 .. 00 ... 0 0000 11 . 0 ... 0 @1op
-  *  a Cortex-M33
+ VREV64           1111 1111 1 . 11 .. 00 ... 0 0000 01 . 0 ... 0 @1op
-diff --git a/include/hw/misc/armsse-cpuid.h b/include/hw/misc/armsse-cpuid.h
++
 +VMVN             1111 1111 1 . 11 00 00 ... 0 0101 11 . 0 ... 0 @1op_nosz
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/misc/armsse-cpuid.h
+--- a/target/arm/mve_helper.c
-+++ b/include/hw/misc/armsse-cpuid.h
++++ b/target/arm/mve_helper.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ DO_1OP(vrev32h, 4, uint32_t, hswap32)
- /*
+ DO_1OP(vrev64b, 8, uint64_t, bswap64)
-  * This is a model of the "CPU_IDENTITY" register block which is part of the
+ DO_1OP(vrev64h, 8, uint64_t, hswap64)
-  * Arm SSE-200 and documented in
+ DO_1OP(vrev64w, 8, uint64_t, wswap64)
-- * http://infocenter.arm.com/help/topic/com.arm.doc.101104_0100_00_en/corelink_sse200_subsystem_for_embedded_technical_reference_manual_101104_0100_00_en.pdf
++
-+ * https://developer.arm.com/documentation/101104/latest/
++#define DO_NOT(N) (~(N))
-  *
++
-  * QEMU interface:
++DO_1OP(vmvn, 8, uint64_t, DO_NOT)
-  *  + QOM property "CPUID": the value to use for the CPUID register
+diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 diff --git a/include/hw/misc/armsse-mhu.h b/include/hw/misc/armsse-mhu.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/misc/armsse-mhu.h
+--- a/target/arm/translate-mve.c
-+++ b/include/hw/misc/armsse-mhu.h
++++ b/target/arm/translate-mve.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static bool trans_VREV64(DisasContext *s, arg_1op *a)
- /*
+     };
-  * This is a model of the Message Handling Unit (MHU) which is part of the
+     return do_1op(s, a, fns[a->size]);
-  * Arm SSE-200 and documented in
+ }
-- * http://infocenter.arm.com/help/topic/com.arm.doc.101104_0100_00_en/corelink_sse200_subsystem_for_embedded_technical_reference_manual_101104_0100_00_en.pdf
++
-+ * https://developer.arm.com/documentation/101104/latest/
++static bool trans_VMVN(DisasContext *s, arg_1op *a)
-  *
++{
-  * QEMU interface:
++    return do_1op(s, a, gen_helper_mve_vmvn);
-  *  + sysbus MMIO region 0: the system information register bank
++}
 diff --git a/include/hw/misc/iotkit-secctl.h b/include/hw/misc/iotkit-secctl.h
 index XXXXXXX..XXXXXXX 100644
 --- a/include/hw/misc/iotkit-secctl.h
 +++ b/include/hw/misc/iotkit-secctl.h
@@ -XXX,XX +XXX,XX @@
  /* This is a model of the security controller which is part of the
   * Arm IoT Kit and documented in
 - * http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ecm0601256/index.html
 + * https://developer.arm.com/documentation/ecm0601256/latest
   *
   * QEMU interface:
   *  + sysbus MMIO region 0 is the "secure privilege control block" registers
 diff --git a/include/hw/misc/iotkit-sysctl.h b/include/hw/misc/iotkit-sysctl.h
 index XXXXXXX..XXXXXXX 100644
 --- a/include/hw/misc/iotkit-sysctl.h
 +++ b/include/hw/misc/iotkit-sysctl.h
@@ -XXX,XX +XXX,XX @@
  /*
   * This is a model of the "system control element" which is part of the
   * Arm IoTKit and documented in
 - * http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ecm0601256/index.html
 + * https://developer.arm.com/documentation/ecm0601256/latest
   * Specifically, it implements the "system information block" and
   * "system control register" blocks.
   *
 diff --git a/include/hw/misc/iotkit-sysinfo.h b/include/hw/misc/iotkit-sysinfo.h
 index XXXXXXX..XXXXXXX 100644
 --- a/include/hw/misc/iotkit-sysinfo.h
 +++ b/include/hw/misc/iotkit-sysinfo.h
@@ -XXX,XX +XXX,XX @@
  /*
   * This is a model of the "system information block" which is part of the
   * Arm IoTKit and documented in
 - * http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ecm0601256/index.html
 + * https://developer.arm.com/documentation/ecm0601256/latest
   * QEMU interface:
   *  + QOM property "SYS_VERSION": value to use for SYS_VERSION register
   *  + QOM property "SYS_CONFIG": value to use for SYS_CONFIG register
 diff --git a/include/hw/misc/mps2-fpgaio.h b/include/hw/misc/mps2-fpgaio.h
 index XXXXXXX..XXXXXXX 100644
 --- a/include/hw/misc/mps2-fpgaio.h
 +++ b/include/hw/misc/mps2-fpgaio.h
@@ -XXX,XX +XXX,XX @@
  /* This is a model of the FPGAIO register block in the AN505
   * FPGA image for the MPS2 dev board; it is documented in the
   * application note:
 - * http://infocenter.arm.com/help/topic/com.arm.doc.dai0505b/index.html
 + * https://developer.arm.com/documentation/dai0505/latest/
   *
   * QEMU interface:
   *  + sysbus MMIO region 0: the register bank
 diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/mps2-tz.c
 +++ b/hw/arm/mps2-tz.c
@@ -XXX,XX +XXX,XX @@
   * https://developer.arm.com/products/system-design/development-boards/fpga-prototyping-boards/mps2
   *
   * Board TRM:
 - * http://infocenter.arm.com/help/topic/com.arm.doc.100112_0200_06_en/versatile_express_cortex_m_prototyping_systems_v2m_mps2_and_v2m_mps2plus_technical_reference_100112_0200_06_en.pdf
 + * https://developer.arm.com/documentation/100112/latest/
   * Application Note AN505:
 - * http://infocenter.arm.com/help/topic/com.arm.doc.dai0505b/index.html
 + * https://developer.arm.com/documentation/dai0505/latest/
   * Application Note AN521:
 - * http://infocenter.arm.com/help/topic/com.arm.doc.dai0521c/index.html
 + * https://developer.arm.com/documentation/dai0521/latest/
   * Application Note AN524:
   * https://developer.arm.com/documentation/dai0524/latest/
   *
   * The AN505 defers to the Cortex-M33 processor ARMv8M IoT Kit FVP User Guide
   * (ARM ECM0601256) for the details of some of the device layout:
 - *   http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ecm0601256/index.html
 + *  https://developer.arm.com/documentation/ecm0601256/latest
   * Similarly, the AN521 and AN524 use the SSE-200, and the SSE-200 TRM defines
   * most of the device layout:
 - *  http://infocenter.arm.com/help/topic/com.arm.doc.101104_0100_00_en/corelink_sse200_subsystem_for_embedded_technical_reference_manual_101104_0100_00_en.pdf
 - *
 + *  https://developer.arm.com/documentation/101104/latest/
   */
  #include "qemu/osdep.h"
 diff --git a/hw/misc/armsse-cpuid.c b/hw/misc/armsse-cpuid.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/misc/armsse-cpuid.c
 +++ b/hw/misc/armsse-cpuid.c
@@ -XXX,XX +XXX,XX @@
  /*
   * This is a model of the "CPU_IDENTITY" register block which is part of the
   * Arm SSE-200 and documented in
 - * http://infocenter.arm.com/help/topic/com.arm.doc.101104_0100_00_en/corelink_sse200_subsystem_for_embedded_technical_reference_manual_101104_0100_00_en.pdf
 + * https://developer.arm.com/documentation/101104/latest/
   *
   * It consists of one read-only CPUID register (set by QOM property), plus the
   * usual ID registers.
 diff --git a/hw/misc/armsse-mhu.c b/hw/misc/armsse-mhu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/misc/armsse-mhu.c
 +++ b/hw/misc/armsse-mhu.c
@@ -XXX,XX +XXX,XX @@
  /*
   * This is a model of the Message Handling Unit (MHU) which is part of the
   * Arm SSE-200 and documented in
 - * http://infocenter.arm.com/help/topic/com.arm.doc.101104_0100_00_en/corelink_sse200_subsystem_for_embedded_technical_reference_manual_101104_0100_00_en.pdf
 + * https://developer.arm.com/documentation/101104/latest/
   */
  #include "qemu/osdep.h"
 diff --git a/hw/misc/iotkit-sysctl.c b/hw/misc/iotkit-sysctl.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/misc/iotkit-sysctl.c
 +++ b/hw/misc/iotkit-sysctl.c
@@ -XXX,XX +XXX,XX @@
  /*
   * This is a model of the "system control element" which is part of the
   * Arm IoTKit and documented in
 - * http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ecm0601256/index.html
 + * https://developer.arm.com/documentation/ecm0601256/latest
   * Specifically, it implements the "system control register" blocks.
   */
 diff --git a/hw/misc/iotkit-sysinfo.c b/hw/misc/iotkit-sysinfo.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/misc/iotkit-sysinfo.c
 +++ b/hw/misc/iotkit-sysinfo.c
@@ -XXX,XX +XXX,XX @@
  /*
   * This is a model of the "system information block" which is part of the
   * Arm IoTKit and documented in
 - * http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ecm0601256/index.html
 + * https://developer.arm.com/documentation/ecm0601256/latest
   * It consists of 2 read-only version/config registers, plus the
   * usual ID registers.
   */
 diff --git a/hw/misc/mps2-fpgaio.c b/hw/misc/mps2-fpgaio.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/misc/mps2-fpgaio.c
 +++ b/hw/misc/mps2-fpgaio.c
@@ -XXX,XX +XXX,XX @@
  /* This is a model of the "FPGA system control and I/O" block found
   * in the AN505 FPGA image for the MPS2 devboard.
   * It is documented in AN505:
 - * http://infocenter.arm.com/help/topic/com.arm.doc.dai0505b/index.html
 + * https://developer.arm.com/documentation/dai0505/latest/
   */
  #include "qemu/osdep.h"
 diff --git a/hw/misc/mps2-scc.c b/hw/misc/mps2-scc.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/misc/mps2-scc.c
 +++ b/hw/misc/mps2-scc.c
@@ -XXX,XX +XXX,XX @@
   * found in the FPGA images of MPS2 development boards.
   *
   * Documentation of it can be found in the MPS2 TRM:
 - * http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.100112_0100_03_en/index.html
 + * https://developer.arm.com/documentation/100112/latest/
   * and also in the Application Notes documenting individual FPGA images.
   */
 --
 .20.1
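
NOT acts on each bit independently, and VMVN predication is byte-by-byte. The patch can therefore process the register 64 bits at a time (DO_1OP(vmvn, 8, uint64_t, DO_NOT)), with each chunk consuming 8 bits of the 16-bit byte mask. Below is a hypothetical host-side sketch of that reasoning. model_expand_pred_b stands in for the expand_pred_b_data lookup, and the host-endian lane swizzle is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Expand up to 8 predicate bits to a byte mask: bit i set -> byte i = 0xff. */
static uint64_t model_expand_pred_b(uint8_t m)
{
    uint64_t r = 0;

    for (int i = 0; i < 8; i++) {
        if (m & (1u << i)) {
            r |= (uint64_t)0xff << (i * 8);
        }
    }
    return r;
}

/*
 * Hypothetical model of predicated VMVN on a Q register held as two
 * 64-bit chunks; each chunk uses the next 8 bits of the byte mask.
 */
static void model_vmvn(uint64_t d[2], const uint64_t m[2], uint16_t mask)
{
    for (int e = 0; e < 2; e++, mask >>= 8) {
        uint64_t bmask = model_expand_pred_b(mask & 0xff);

        d[e] = (d[e] & ~bmask) | (~m[e] & bmask);
    }
}
```

A mask of 0x0103 enables bytes 0, 1 and 8, so only those bytes of the destination receive the inverted source.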

-[PULL 48/49] docs/system/arm/mps2.rst: Document the new mps3-an524 board
+[PULL 18/57] target/arm: Implement MVE VABS
-Add brief documentation of the new mps3-an524 board.
+Implement the MVE VABS functions (both integer and floating point).
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210215115138.20465-24-peter.maydell@linaro.org
+Message-id: 20210617121628.20116-8-peter.maydell@linaro.org
 ---
- docs/system/arm/mps2.rst | 24 ++++++++++++++++++------
+ target/arm/helper-mve.h    |  6 ++++++
-file changed, 18 insertions(+), 6 deletions(-)
+ target/arm/mve.decode      |  3 +++
  target/arm/mve_helper.c    | 13 +++++++++++++
  target/arm/translate-mve.c | 15 +++++++++++++++
 files changed, 37 insertions(+)
-diff --git a/docs/system/arm/mps2.rst b/docs/system/arm/mps2.rst
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/docs/system/arm/mps2.rst
+--- a/target/arm/helper-mve.h
-+++ b/docs/system/arm/mps2.rst
++++ b/target/arm/helper-mve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vrev64h, TCG_CALL_NO_WG, void, env, ptr, ptr)
  DEF_HELPER_FLAGS_3(mve_vrev64w, TCG_CALL_NO_WG, void, env, ptr, ptr)
  DEF_HELPER_FLAGS_3(mve_vmvn, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +
 +DEF_HELPER_FLAGS_3(mve_vabsb, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +DEF_HELPER_FLAGS_3(mve_vabsh, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +DEF_HELPER_FLAGS_3(mve_vabsw, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +DEF_HELPER_FLAGS_3(mve_vfabsh, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +DEF_HELPER_FLAGS_3(mve_vfabss, TCG_CALL_NO_WG, void, env, ptr, ptr)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VREV32           1111 1111 1 . 11 .. 00 ... 0 0000 11 . 0 ... 0 @1op
  VREV64           1111 1111 1 . 11 .. 00 ... 0 0000 01 . 0 ... 0 @1op
  VMVN             1111 1111 1 . 11 00 00 ... 0 0101 11 . 0 ... 0 @1op_nosz
 +
 +VABS             1111 1111 1 . 11 .. 01 ... 0 0011 01 . 0 ... 0 @1op
 +VABS_fp          1111 1111 1 . 11 .. 01 ... 0 0111 01 . 0 ... 0 @1op
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
 @@ -XXX,XX +XXX,XX @@
--Arm MPS2 boards (``mps2-an385``, ``mps2-an386``, ``mps2-an500``, ``mps2-an505``, ``mps2-an511``, ``mps2-an521``)
+ #include "exec/helper-proto.h"
--================================================================================================================
+ #include "exec/cpu_ldst.h"
-+Arm MPS2 and MPS3 boards (``mps2-an385``, ``mps2-an386``, ``mps2-an500``, ``mps2-an505``, ``mps2-an511``, ``mps2-an521``, ``mps3-an524``)
+ #include "exec/exec-all.h"
-+=========================================================================================================================================
++#include "tcg/tcg.h"
- These board models all use Arm M-profile CPUs.
+ static uint16_t mve_element_mask(CPUARMState *env)
+ {
--The Arm MPS2 and MPS2+ dev boards are FPGA based (the 2+ has a bigger
+@@ -XXX,XX +XXX,XX @@ DO_1OP(vrev64w, 8, uint64_t, wswap64)
--FPGA but is otherwise the same as the 2). Since the CPU itself
+ #define DO_NOT(N) (~(N))
--and most of the devices are in the FPGA, the details of the board
--as seen by the guest depend significantly on the FPGA image.
+ DO_1OP(vmvn, 8, uint64_t, DO_NOT)
 +The Arm MPS2, MPS2+ and MPS3 dev boards are FPGA based (the 2+ has a
 +bigger FPGA but is otherwise the same as the 2; the 3 has a bigger
 +FPGA again, can handle 4GB of RAM and has a USB controller and QSPI flash).
 +
-+Since the CPU itself and most of the devices are in the FPGA, the
++#define DO_ABS(N) ((N) < 0 ? -(N) : (N))
-+details of the board as seen by the guest depend significantly on the
++#define DO_FABSH(N)  ((N) & dup_const(MO_16, 0x7fff))
-+FPGA image.
++#define DO_FABSS(N)  ((N) & dup_const(MO_32, 0x7fffffff))
++
- QEMU models the following FPGA images:
++DO_1OP(vabsb, 1, int8_t, DO_ABS)
++DO_1OP(vabsh, 2, int16_t, DO_ABS)
-@@ -XXX,XX +XXX,XX @@ QEMU models the following FPGA images:
++DO_1OP(vabsw, 4, int32_t, DO_ABS)
-   Cortex-M3 'DesignStart' as documented in Arm Application Note AN511
++
- ``mps2-an521``
++/* We can do these 64 bits at a time */
-   Dual Cortex-M33 as documented in Arm Application Note AN521
++DO_1OP(vfabsh, 8, uint64_t, DO_FABSH)
-+``mps3-an524``
++DO_1OP(vfabss, 8, uint64_t, DO_FABSS)
-+  Dual Cortex-M33 on an MPS3, as documented in Arm Application Note AN524
+diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
+index XXXXXXX..XXXXXXX 100644
- Differences between QEMU and real hardware:
+--- a/target/arm/translate-mve.c
++++ b/target/arm/translate-mve.c
- - AN385/AN386 remapping of low 16K of memory to either ZBT SSRAM1 or to
+@@ -XXX,XX +XXX,XX @@ static bool do_1op(DisasContext *s, arg_1op *a, MVEGenOneOpFn fn)
-   block RAM is unimplemented (QEMU always maps this to ZBT SSRAM1, as
-   if zbt_boot_ctrl is always zero)
+ DO_1OP(VCLZ, vclz)
-+- AN524 remapping of low memory to either BRAM or to QSPI flash is
+ DO_1OP(VCLS, vcls)
-+  unimplemented (QEMU always maps this to BRAM, ignoring the
++DO_1OP(VABS, vabs)
-+  SCC CFG_REG0 memory-remap bit)
- - QEMU provides a LAN9118 ethernet rather than LAN9220; the only guest
+ static bool trans_VREV16(DisasContext *s, arg_1op *a)
-   visible difference is that the LAN9118 doesn't support checksum
+ {
-   offloading
+@@ -XXX,XX +XXX,XX @@ static bool trans_VMVN(DisasContext *s, arg_1op *a)
-+- QEMU does not model the QSPI flash in MPS3 boards as real QSPI
+ {
-+  flash, but only as simple ROM, so attempting to rewrite the flash
+     return do_1op(s, a, gen_helper_mve_vmvn);
-+  from the guest will fail
+ }
-+- QEMU does not model the USB controller in MPS3 boards
++
 +static bool trans_VABS_fp(DisasContext *s, arg_1op *a)
 +{
 +    static MVEGenOneOpFn * const fns[] = {
 +        NULL,
 +        gen_helper_mve_vfabsh,
 +        gen_helper_mve_vfabss,
 +        NULL,
 +    };
 +    if (!dc_isar_feature(aa32_mve_fp, s)) {
 +        return false;
 +    }
 +    return do_1op(s, a, fns[a->size]);
 +}
 --
 .20.1

-[PULL 39/49] hw/arm/mps2-tz: Size the uart-irq-orgate based on the number of UARTs
+[PULL 19/57] target/arm: Implement MVE VNEG
-We create an OR gate to wire together the overflow IRQs for all the
+Implement the MVE VNEG insn (both integer and floating point forms).
 UARTs on the board; this has to have twice the number of inputs as
 there are UARTs, since each UART feeds it a TX overflow and an RX
 overflow interrupt line.  Replace the hardcoded '10' with a
 calculation based on the size of the uart[] array in the
 MPS2TZMachineState.  (We rely on OR gate inputs that are never wired
 up or asserted being treated as always-zero.)
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210215115138.20465-15-peter.maydell@linaro.org
+Message-id: 20210617121628.20116-9-peter.maydell@linaro.org
 ---
- hw/arm/mps2-tz.c | 11 ++++++++---
+ target/arm/helper-mve.h    |  6 ++++++
-file changed, 8 insertions(+), 3 deletions(-)
+ target/arm/mve.decode      |  2 ++
  target/arm/mve_helper.c    | 12 ++++++++++++
  target/arm/translate-mve.c | 15 +++++++++++++++
 files changed, 35 insertions(+)
-diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/mps2-tz.c
+--- a/target/arm/helper-mve.h
-+++ b/hw/arm/mps2-tz.c
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vabsh, TCG_CALL_NO_WG, void, env, ptr, ptr)
-      */
+ DEF_HELPER_FLAGS_3(mve_vabsw, TCG_CALL_NO_WG, void, env, ptr, ptr)
-     memory_region_add_subregion(system_memory, 0x80000000, machine->ram);
+ DEF_HELPER_FLAGS_3(mve_vfabsh, TCG_CALL_NO_WG, void, env, ptr, ptr)
+ DEF_HELPER_FLAGS_3(mve_vfabss, TCG_CALL_NO_WG, void, env, ptr, ptr)
--    /* The overflow IRQs for all UARTs are ORed together.
++
-+    /*
++DEF_HELPER_FLAGS_3(mve_vnegb, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+     * The overflow IRQs for all UARTs are ORed together.
++DEF_HELPER_FLAGS_3(mve_vnegh, TCG_CALL_NO_WG, void, env, ptr, ptr)
-      * Tx, Rx and "combined" IRQs are sent to the NVIC separately.
++DEF_HELPER_FLAGS_3(mve_vnegw, TCG_CALL_NO_WG, void, env, ptr, ptr)
--     * Create the OR gate for this.
++DEF_HELPER_FLAGS_3(mve_vfnegh, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+     * Create the OR gate for this: it has one input for the TX overflow
++DEF_HELPER_FLAGS_3(mve_vfnegs, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+     * and one for the RX overflow for each UART we might have.
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
-+     * (If the board has fewer than the maximum possible number of UARTs
+index XXXXXXX..XXXXXXX 100644
-+     * those inputs are never wired up and are treated as always-zero.)
+--- a/target/arm/mve.decode
-      */
++++ b/target/arm/mve.decode
-     object_initialize_child(OBJECT(mms), "uart-irq-orgate",
+@@ -XXX,XX +XXX,XX @@ VMVN             1111 1111 1 . 11 00 00 ... 0 0101 11 . 0 ... 0 @1op_nosz
-                             &mms->uart_irq_orgate, TYPE_OR_IRQ);
--    object_property_set_int(OBJECT(&mms->uart_irq_orgate), "num-lines", 10,
+ VABS             1111 1111 1 . 11 .. 01 ... 0 0011 01 . 0 ... 0 @1op
-+    object_property_set_int(OBJECT(&mms->uart_irq_orgate), "num-lines",
+ VABS_fp          1111 1111 1 . 11 .. 01 ... 0 0111 01 . 0 ... 0 @1op
-+                            2 * ARRAY_SIZE(mms->uart),
++VNEG             1111 1111 1 . 11 .. 01 ... 0 0011 11 . 0 ... 0 @1op
-                             &error_fatal);
++VNEG_fp          1111 1111 1 . 11 .. 01 ... 0 0111 11 . 0 ... 0 @1op
-     qdev_realize(DEVICE(&mms->uart_irq_orgate), NULL, &error_fatal);
+diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
-     qdev_connect_gpio_out(DEVICE(&mms->uart_irq_orgate), 0,
+index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_1OP(vabsw, 4, int32_t, DO_ABS)
  /* We can do these 64 bits at a time */
  DO_1OP(vfabsh, 8, uint64_t, DO_FABSH)
  DO_1OP(vfabss, 8, uint64_t, DO_FABSS)
 +
 +#define DO_NEG(N)    (-(N))
 +#define DO_FNEGH(N) ((N) ^ dup_const(MO_16, 0x8000))
 +#define DO_FNEGS(N) ((N) ^ dup_const(MO_32, 0x80000000))
 +
 +DO_1OP(vnegb, 1, int8_t, DO_NEG)
 +DO_1OP(vnegh, 2, int16_t, DO_NEG)
 +DO_1OP(vnegw, 4, int32_t, DO_NEG)
 +
 +/* We can do these 64 bits at a time */
 +DO_1OP(vfnegh, 8, uint64_t, DO_FNEGH)
 +DO_1OP(vfnegs, 8, uint64_t, DO_FNEGS)
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ static bool do_1op(DisasContext *s, arg_1op *a, MVEGenOneOpFn fn)
  DO_1OP(VCLZ, vclz)
  DO_1OP(VCLS, vcls)
  DO_1OP(VABS, vabs)
 +DO_1OP(VNEG, vneg)
  static bool trans_VREV16(DisasContext *s, arg_1op *a)
  {
@@ -XXX,XX +XXX,XX @@ static bool trans_VABS_fp(DisasContext *s, arg_1op *a)
      }
      return do_1op(s, a, fns[a->size]);
  }
 +
 +static bool trans_VNEG_fp(DisasContext *s, arg_1op *a)
 +{
 +    static MVEGenOneOpFn * const fns[] = {
 +        NULL,
 +        gen_helper_mve_vfnegh,
 +        gen_helper_mve_vfnegs,
 +        NULL,
 +    };
 +    if (!dc_isar_feature(aa32_mve_fp, s)) {
 +        return false;
 +    }
 +    return do_1op(s, a, fns[a->size]);
 +}
 --
 .20.1

-[PULL 26/49] hw/arm/mps2-tz: Make SYSCLK frequency board-specific
+[PULL 20/57] tcg: Make gen_dup_i32/i64() public as tcg_gen_dup_i32/i64
-The AN524 has a different SYSCLK frequency from the AN505 and AN521;
+The Arm MVE VDUP implementation would like to be able to emit code to
-make the SYSCLK frequency a field in the MPS2TZMachineClass rather
+duplicate a byte or halfword value into an i32.  We have code to do
-than a compile-time constant so we can support the AN524.
+this already in tcg-op-gvec.c, so all we need to do is make the
 functions global.
+For consistency with other functions made available to the frontends:
+ * we rename to tcg_gen_dup_*
+ * we expose both the _i32 and _i64 forms
+ * we provide the #define for a _tl form
+Suggested-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Message-id: 20210617121628.20116-10-peter.maydell@linaro.org
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20210215115138.20465-2-peter.maydell@linaro.org
 ---
- hw/arm/mps2-tz.c | 10 ++++++----
+ include/tcg/tcg-op.h |  8 ++++++++
-file changed, 6 insertions(+), 4 deletions(-)
+ include/tcg/tcg.h    |  1 -
  tcg/tcg-op-gvec.c    | 20 ++++++++++----------
 files changed, 18 insertions(+), 11 deletions(-)
-diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
+diff --git a/include/tcg/tcg-op.h b/include/tcg/tcg-op.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/mps2-tz.c
+--- a/include/tcg/tcg-op.h
-+++ b/hw/arm/mps2-tz.c
++++ b/include/tcg/tcg-op.h
-@@ -XXX,XX +XXX,XX @@ struct MPS2TZMachineClass {
+@@ -XXX,XX +XXX,XX @@ void tcg_gen_umin_i32(TCGv_i32, TCGv_i32 arg1, TCGv_i32 arg2);
-     MachineClass parent;
+ void tcg_gen_umax_i32(TCGv_i32, TCGv_i32 arg1, TCGv_i32 arg2);
-     MPS2TZFPGAType fpga_type;
+ void tcg_gen_abs_i32(TCGv_i32, TCGv_i32);
-     uint32_t scc_id;
-+    uint32_t sysclk_frq; /* Main SYSCLK frequency in Hz */
++/* Replicate a value of size @vece from @in to all the lanes in @out */
-     const char *armsse_type;
++void tcg_gen_dup_i32(unsigned vece, TCGv_i32 out, TCGv_i32 in);
- };
++
+ static inline void tcg_gen_discard_i32(TCGv_i32 arg)
@@ -XXX,XX +XXX,XX @@ struct MPS2TZMachineState {
  OBJECT_DECLARE_TYPE(MPS2TZMachineState, MPS2TZMachineClass, MPS2TZ_MACHINE)
 -/* Main SYSCLK frequency in Hz */
 -#define SYSCLK_FRQ 20000000
  /* Slow 32Khz S32KCLK frequency in Hz */
  #define S32KCLK_FRQ (32 * 1000)
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_unimp_dev(MPS2TZMachineState *mms,
  static MemoryRegion *make_uart(MPS2TZMachineState *mms, void *opaque,
                                 const char *name, hwaddr size)
  {
-+    MPS2TZMachineClass *mmc = MPS2TZ_MACHINE_GET_CLASS(mms);
+     tcg_gen_op1_i32(INDEX_op_discard, arg);
-     CMSDKAPBUART *uart = opaque;
+@@ -XXX,XX +XXX,XX @@ void tcg_gen_umin_i64(TCGv_i64, TCGv_i64 arg1, TCGv_i64 arg2);
-     int i = uart - &mms->uart[0];
+ void tcg_gen_umax_i64(TCGv_i64, TCGv_i64 arg1, TCGv_i64 arg2);
-     int rxirqno = i * 2;
+ void tcg_gen_abs_i64(TCGv_i64, TCGv_i64);
-@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_uart(MPS2TZMachineState *mms, void *opaque,
++/* Replicate a value of size @vece from @in to all the lanes in @out */
-     object_initialize_child(OBJECT(mms), name, uart, TYPE_CMSDK_APB_UART);
++void tcg_gen_dup_i64(unsigned vece, TCGv_i64 out, TCGv_i64 in);
-     qdev_prop_set_chr(DEVICE(uart), "chardev", serial_hd(i));
++
--    qdev_prop_set_uint32(DEVICE(uart), "pclk-frq", SYSCLK_FRQ);
+ #if TCG_TARGET_REG_BITS == 64
-+    qdev_prop_set_uint32(DEVICE(uart), "pclk-frq", mmc->sysclk_frq);
+ static inline void tcg_gen_discard_i64(TCGv_i64 arg)
-     sysbus_realize(SYS_BUS_DEVICE(uart), &error_fatal);
+ {
-     s = SYS_BUS_DEVICE(uart);
+@@ -XXX,XX +XXX,XX @@ void tcg_gen_stl_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset, TCGType t);
-     sysbus_connect_irq(s, 0, get_sse_irq_in(mms, txirqno));
+ #define tcg_gen_atomic_smax_fetch_tl tcg_gen_atomic_smax_fetch_i64
-@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
+ #define tcg_gen_atomic_umax_fetch_tl tcg_gen_atomic_umax_fetch_i64
+ #define tcg_gen_dup_tl_vec  tcg_gen_dup_i64_vec
-     /* These clocks don't need migration because they are fixed-frequency */
++#define tcg_gen_dup_tl tcg_gen_dup_i64
-     mms->sysclk = clock_new(OBJECT(machine), "SYSCLK");
+ #else
--    clock_set_hz(mms->sysclk, SYSCLK_FRQ);
+ #define tcg_gen_movi_tl tcg_gen_movi_i32
-+    clock_set_hz(mms->sysclk, mmc->sysclk_frq);
+ #define tcg_gen_mov_tl tcg_gen_mov_i32
-     mms->s32kclk = clock_new(OBJECT(machine), "S32KCLK");
+@@ -XXX,XX +XXX,XX @@ void tcg_gen_stl_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset, TCGType t);
-     clock_set_hz(mms->s32kclk, S32KCLK_FRQ);
+ #define tcg_gen_atomic_smax_fetch_tl tcg_gen_atomic_smax_fetch_i32
+ #define tcg_gen_atomic_umax_fetch_tl tcg_gen_atomic_umax_fetch_i32
-@@ -XXX,XX +XXX,XX @@ static void mps2tz_an505_class_init(ObjectClass *oc, void *data)
+ #define tcg_gen_dup_tl_vec  tcg_gen_dup_i32_vec
-     mmc->fpga_type = FPGA_AN505;
++#define tcg_gen_dup_tl tcg_gen_dup_i32
-     mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-m33");
+ #endif
-     mmc->scc_id = 0x41045050;
-+    mmc->sysclk_frq = 20 * 1000 * 1000; /* 20MHz */
+ #if UINTPTR_MAX == UINT32_MAX
-     mmc->armsse_type = TYPE_IOTKIT;
+diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
 index XXXXXXX..XXXXXXX 100644
 --- a/include/tcg/tcg.h
 +++ b/include/tcg/tcg.h
@@ -XXX,XX +XXX,XX @@ uint64_t dup_const(unsigned vece, uint64_t c);
          : (qemu_build_not_reached_always(), 0))                    \
       : dup_const(VECE, C))
 -
  /*
   * Memory helpers that will be used by TCG generated code.
   */
 diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
 index XXXXXXX..XXXXXXX 100644
 --- a/tcg/tcg-op-gvec.c
 +++ b/tcg/tcg-op-gvec.c
@@ -XXX,XX +XXX,XX @@ uint64_t (dup_const)(unsigned vece, uint64_t c)
  }
-@@ -XXX,XX +XXX,XX @@ static void mps2tz_an521_class_init(ObjectClass *oc, void *data)
+ /* Duplicate IN into OUT as per VECE.  */
-     mmc->fpga_type = FPGA_AN521;
+-static void gen_dup_i32(unsigned vece, TCGv_i32 out, TCGv_i32 in)
-     mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-m33");
++void tcg_gen_dup_i32(unsigned vece, TCGv_i32 out, TCGv_i32 in)
-     mmc->scc_id = 0x41045210;
+ {
-+    mmc->sysclk_frq = 20 * 1000 * 1000; /* 20MHz */
+     switch (vece) {
-     mmc->armsse_type = TYPE_SSE200;
+     case MO_8:
@@ -XXX,XX +XXX,XX @@ static void gen_dup_i32(unsigned vece, TCGv_i32 out, TCGv_i32 in)
      }
  }
+-static void gen_dup_i64(unsigned vece, TCGv_i64 out, TCGv_i64 in)
++void tcg_gen_dup_i64(unsigned vece, TCGv_i64 out, TCGv_i64 in)
+ {
+     switch (vece) {
+     case MO_8:
+@@ -XXX,XX +XXX,XX @@ static void do_dup(unsigned vece, uint32_t dofs, uint32_t oprsz,
+                 && (vece != MO_32 || !check_size_impl(oprsz, 4))) {
+                 t_64 = tcg_temp_new_i64();
+                 tcg_gen_extu_i32_i64(t_64, in_32);
+-                gen_dup_i64(vece, t_64, t_64);
++                tcg_gen_dup_i64(vece, t_64, t_64);
+             } else {
+                 t_32 = tcg_temp_new_i32();
+-                gen_dup_i32(vece, t_32, in_32);
++                tcg_gen_dup_i32(vece, t_32, in_32);
+             }
+         } else if (in_64) {
+             /* We are given a 64-bit variable input.  */
+             t_64 = tcg_temp_new_i64();
+-            gen_dup_i64(vece, t_64, in_64);
++            tcg_gen_dup_i64(vece, t_64, in_64);
+         } else {
+             /* We are given a constant input.  */
+             /* For 64-bit hosts, use 64-bit constants for "simple" constants
+@@ -XXX,XX +XXX,XX @@ void tcg_gen_gvec_2s(uint32_t dofs, uint32_t aofs, uint32_t oprsz,
+     } else if (g->fni8 && check_size_impl(oprsz, 8)) {
+         TCGv_i64 t64 = tcg_temp_new_i64();
+-        gen_dup_i64(g->vece, t64, c);
++        tcg_gen_dup_i64(g->vece, t64, c);
+         expand_2s_i64(dofs, aofs, oprsz, t64, g->scalar_first, g->fni8);
+         tcg_temp_free_i64(t64);
+     } else if (g->fni4 && check_size_impl(oprsz, 4)) {
+         TCGv_i32 t32 = tcg_temp_new_i32();
+         tcg_gen_extrl_i64_i32(t32, c);
+-        gen_dup_i32(g->vece, t32, t32);
++        tcg_gen_dup_i32(g->vece, t32, t32);
+         expand_2s_i32(dofs, aofs, oprsz, t32, g->scalar_first, g->fni4);
+         tcg_temp_free_i32(t32);
+     } else {
+@@ -XXX,XX +XXX,XX @@ void tcg_gen_gvec_ands(unsigned vece, uint32_t dofs, uint32_t aofs,
+                        TCGv_i64 c, uint32_t oprsz, uint32_t maxsz)
+ {
+     TCGv_i64 tmp = tcg_temp_new_i64();
+-    gen_dup_i64(vece, tmp, c);
++    tcg_gen_dup_i64(vece, tmp, c);
+     tcg_gen_gvec_2s(dofs, aofs, oprsz, maxsz, tmp, &gop_ands);
+     tcg_temp_free_i64(tmp);
+ }
+@@ -XXX,XX +XXX,XX @@ void tcg_gen_gvec_xors(unsigned vece, uint32_t dofs, uint32_t aofs,
+                        TCGv_i64 c, uint32_t oprsz, uint32_t maxsz)
+ {
+     TCGv_i64 tmp = tcg_temp_new_i64();
+-    gen_dup_i64(vece, tmp, c);
++    tcg_gen_dup_i64(vece, tmp, c);
+     tcg_gen_gvec_2s(dofs, aofs, oprsz, maxsz, tmp, &gop_xors);
+     tcg_temp_free_i64(tmp);
+ }
+@@ -XXX,XX +XXX,XX @@ void tcg_gen_gvec_ors(unsigned vece, uint32_t dofs, uint32_t aofs,
+                       TCGv_i64 c, uint32_t oprsz, uint32_t maxsz)
+ {
+     TCGv_i64 tmp = tcg_temp_new_i64();
+-    gen_dup_i64(vece, tmp, c);
++    tcg_gen_dup_i64(vece, tmp, c);
+     tcg_gen_gvec_2s(dofs, aofs, oprsz, maxsz, tmp, &gop_ors);
+     tcg_temp_free_i64(tmp);
+ }
 --
 .20.1

-[PULL 46/49] hw/arm/mps2-tz: Stub out USB controller for mps3-an524
+[PULL 21/57] target/arm: Implement MVE VDUP
-The AN524 has a USB controller (an ISP1763); we don't have a model of
+Implement the MVE VDUP insn, which duplicates a value from
-it but we should provide a stub "unimplemented-device" for it.  This
+a general-purpose register into every lane of a vector
-is slightly complicated because the USB controller shares a PPC port
+register (subject to predication).
 with the ethernet controller.
 Implement a make_* function which creates a container
 MemoryRegion with both the ethernet controller and an
 unimplemented-device stub for the USB controller.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210215115138.20465-22-peter.maydell@linaro.org
+Message-id: 20210617121628.20116-11-peter.maydell@linaro.org
 ---
- hw/arm/mps2-tz.c | 48 +++++++++++++++++++++++++++++++++++++++++++++++-
+ target/arm/helper-mve.h    |  2 ++
-file changed, 47 insertions(+), 1 deletion(-)
+ target/arm/mve.decode      | 10 ++++++++++
  target/arm/mve_helper.c    | 16 ++++++++++++++++
  target/arm/translate-mve.c | 27 +++++++++++++++++++++++++++
 files changed, 55 insertions(+)
-diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/mps2-tz.c
+--- a/target/arm/helper-mve.h
-+++ b/hw/arm/mps2-tz.c
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ struct MPS2TZMachineState {
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vstrb_h, TCG_CALL_NO_WG, void, env, ptr, i32)
+ DEF_HELPER_FLAGS_3(mve_vstrb_w, TCG_CALL_NO_WG, void, env, ptr, i32)
-     ARMSSE iotkit;
+ DEF_HELPER_FLAGS_3(mve_vstrh_w, TCG_CALL_NO_WG, void, env, ptr, i32)
-     MemoryRegion ram[MPS2TZ_RAM_MAX];
-+    MemoryRegion eth_usb_container;
++DEF_HELPER_FLAGS_3(mve_vdup, TCG_CALL_NO_WG, void, env, ptr, i32)
 +
-     MPS2SCC scc;
+ DEF_HELPER_FLAGS_3(mve_vclsb, TCG_CALL_NO_WG, void, env, ptr, ptr)
-     MPS2FPGAIO fpgaio;
+ DEF_HELPER_FLAGS_3(mve_vclsh, TCG_CALL_NO_WG, void, env, ptr, ptr)
-     TZPPC ppc[5];
+ DEF_HELPER_FLAGS_3(mve_vclsw, TCG_CALL_NO_WG, void, env, ptr, ptr)
-@@ -XXX,XX +XXX,XX @@ struct MPS2TZMachineState {
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
-     UnimplementedDeviceState gfx;
+index XXXXXXX..XXXXXXX 100644
-     UnimplementedDeviceState cldc;
+--- a/target/arm/mve.decode
-     UnimplementedDeviceState rtc;
++++ b/target/arm/mve.decode
-+    UnimplementedDeviceState usb;
+@@ -XXX,XX +XXX,XX @@
-     PL080State dma[4];
-     TZMSC msc[4];
+ %qd 22:1 13:3
-     CMSDKAPBUART uart[6];
+ %qm 5:1 1:3
-@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_eth_dev(MPS2TZMachineState *mms, void *opaque,
++%qn 7:1 17:3
-     return sysbus_mmio_get_region(s, 0);
- }
+ &vldr_vstr rn qd imm p a w size l u
+ &1op qd qm size
-+static MemoryRegion *make_eth_usb(MPS2TZMachineState *mms, void *opaque,
+@@ -XXX,XX +XXX,XX @@ VABS             1111 1111 1 . 11 .. 01 ... 0 0011 01 . 0 ... 0 @1op
-+                                  const char *name, hwaddr size,
+ VABS_fp          1111 1111 1 . 11 .. 01 ... 0 0111 01 . 0 ... 0 @1op
-+                                  const int *irqs)
+ VNEG             1111 1111 1 . 11 .. 01 ... 0 0011 11 . 0 ... 0 @1op
  VNEG_fp          1111 1111 1 . 11 .. 01 ... 0 0111 11 . 0 ... 0 @1op
 +
 +&vdup qd rt size
 +# Qd is in the fields usually named Qn
 +@vdup            .... .... . . .. ... . rt:4 .... . . . . .... qd=%qn &vdup
 +
 +# B and E bits encode size, which we decode here to the usual size values
 +VDUP             1110 1110 1 1 10 ... 0 .... 1011 . 0 0 1 0000 @vdup size=0
 +VDUP             1110 1110 1 0 10 ... 0 .... 1011 . 0 1 1 0000 @vdup size=1
 +VDUP             1110 1110 1 0 10 ... 0 .... 1011 . 0 0 1 0000 @vdup size=2
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ static void mergemask_sq(int64_t *d, int64_t r, uint16_t mask)
               uint64_t *: mergemask_uq,          \
               int64_t *:  mergemask_sq)(D, R, M)
 +void HELPER(mve_vdup)(CPUARMState *env, void *vd, uint32_t val)
 +{
 +    /*
-+     * The AN524 makes the ethernet and USB share a PPC port.
++     * The generated code already replicated an 8 or 16 bit constant
-+     * irqs[] is the ethernet IRQ.
++     * into the 32-bit value, so we only need to write the 32-bit
 +     * value to all elements of the Qreg, allowing for predication.
 +     */
-+    SysBusDevice *s;
++    uint32_t *d = vd;
-+    NICInfo *nd = &nd_table[0];
++    uint16_t mask = mve_element_mask(env);
-+
++    unsigned e;
-+    memory_region_init(&mms->eth_usb_container, OBJECT(mms),
++    for (e = 0; e < 16 / 4; e++, mask >>= 4) {
-+                       "mps2-tz-eth-usb-container", 0x200000);
++        mergemask(&d[H4(e)], val, mask);
-+
++    }
-+    /*
++    mve_advance_vpt(env);
 +     * In hardware this is a LAN9220; the LAN9118 is software compatible
 +     * except that it doesn't support the checksum-offload feature.
 +     */
 +    qemu_check_nic_model(nd, "lan9118");
 +    mms->lan9118 = qdev_new(TYPE_LAN9118);
 +    qdev_set_nic_properties(mms->lan9118, nd);
 +
 +    s = SYS_BUS_DEVICE(mms->lan9118);
 +    sysbus_realize_and_unref(s, &error_fatal);
 +    sysbus_connect_irq(s, 0, get_sse_irq_in(mms, irqs[0]));
 +
 +    memory_region_add_subregion(&mms->eth_usb_container,
 +                                0, sysbus_mmio_get_region(s, 0));
 +
 +    /* The USB OTG controller is an ISP1763; we don't have a model of it. */
 +    object_initialize_child(OBJECT(mms), "usb-otg",
 +                            &mms->usb, TYPE_UNIMPLEMENTED_DEVICE);
 +    qdev_prop_set_string(DEVICE(&mms->usb), "name", "usb-otg");
 +    qdev_prop_set_uint64(DEVICE(&mms->usb), "size", 0x100000);
 +    s = SYS_BUS_DEVICE(&mms->usb);
 +    sysbus_realize(s, &error_fatal);
 +
 +    memory_region_add_subregion(&mms->eth_usb_container,
 +                                0x100000, sysbus_mmio_get_region(s, 0));
 +
 +    return &mms->eth_usb_container;
 +}
 +
- static MemoryRegion *make_mpc(MPS2TZMachineState *mms, void *opaque,
+ #define DO_1OP(OP, ESIZE, TYPE, FN)                                     \
-                               const char *name, hwaddr size,
+     void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm)         \
-                               const int *irqs)
+     {                                                                   \
-@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
+diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
-                 { "gpio1", make_unimp_dev, &mms->gpio[1], 0x41101000, 0x1000 },
+index XXXXXXX..XXXXXXX 100644
-                 { "gpio2", make_unimp_dev, &mms->gpio[2], 0x41102000, 0x1000 },
+--- a/target/arm/translate-mve.c
-                 { "gpio3", make_unimp_dev, &mms->gpio[3], 0x41103000, 0x1000 },
++++ b/target/arm/translate-mve.c
--                { "eth", make_eth_dev, NULL, 0x41400000, 0x100000, { 48 } },
+@@ -XXX,XX +XXX,XX @@ DO_VLDST_WIDE_NARROW(VLDSTB_H, vldrb_sh, vldrb_uh, vstrb_h)
-+                { "eth-usb", make_eth_usb, NULL, 0x41400000, 0x200000, { 48 } },
+ DO_VLDST_WIDE_NARROW(VLDSTB_W, vldrb_sw, vldrb_uw, vstrb_w)
-             },
+ DO_VLDST_WIDE_NARROW(VLDSTH_W, vldrh_sw, vldrh_uw, vstrh_w)
-         },
-     };
++static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
 +{
 +    TCGv_ptr qd;
 +    TCGv_i32 rt;
 +
 +    if (!dc_isar_feature(aa32_mve, s) ||
 +        !mve_check_qreg_bank(s, a->qd)) {
 +        return false;
 +    }
 +    if (a->rt == 13 || a->rt == 15) {
 +        /* UNPREDICTABLE; we choose to UNDEF */
 +        return false;
 +    }
 +    if (!mve_eci_check(s) || !vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    qd = mve_qreg_ptr(a->qd);
 +    rt = load_reg(s, a->rt);
 +    tcg_gen_dup_i32(a->size, rt, rt);
 +    gen_helper_mve_vdup(cpu_env, qd, rt);
 +    tcg_temp_free_ptr(qd);
 +    tcg_temp_free_i32(rt);
 +    mve_update_eci(s);
 +    return true;
 +}
 +
  static bool do_1op(DisasContext *s, arg_1op *a, MVEGenOneOpFn fn)
  {
      TCGv_ptr qd, qm;
 --
 .20.1

-[PULL 35/49] hw/misc/mps2-scc: Implement CFG_REG5 and CFG_REG6 for MPS3 AN524
+[PULL 22/57] target/arm: Implement MVE VAND, VBIC, VORR, VORN, VEOR
-The AN524 version of the SCC interface has different behaviour for
+Implement the MVE vector logical operations operating
-some of the CFG registers; implement it.
+on two registers.
 Each board in this family can have minor differences in the meaning
 of the CFG registers, so rather than trying to specify all the
 possible semantics via individual device properties, we make the
 behaviour conditional on the part-number field of the SCC_ID register
 which the board code already passes us.
 For the AN524, the differences are:
  * CFG3 is reserved rather than being board switches
  * CFG5 is a new register ("ACLK Frequency in Hz")
  * CFG6 is a new register ("Clock divider for BRAM")
 We implement both of the new registers as reads-as-written.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210215115138.20465-11-peter.maydell@linaro.org
+Message-id: 20210617121628.20116-12-peter.maydell@linaro.org
 ---
- include/hw/misc/mps2-scc.h |  3 ++
+ target/arm/helper-mve.h    |  6 ++++++
- hw/misc/mps2-scc.c         | 71 ++++++++++++++++++++++++++++++++++++--
+ target/arm/mve.decode      |  9 +++++++++
-files changed, 72 insertions(+), 2 deletions(-)
+ target/arm/mve_helper.c    | 26 ++++++++++++++++++++++++++
  target/arm/translate-mve.c | 37 +++++++++++++++++++++++++++++++++++++
 files changed, 78 insertions(+)
-diff --git a/include/hw/misc/mps2-scc.h b/include/hw/misc/mps2-scc.h
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/misc/mps2-scc.h
+--- a/target/arm/helper-mve.h
-+++ b/include/hw/misc/mps2-scc.h
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ struct MPS2SCC {
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vnegh, TCG_CALL_NO_WG, void, env, ptr, ptr)
+ DEF_HELPER_FLAGS_3(mve_vnegw, TCG_CALL_NO_WG, void, env, ptr, ptr)
-     uint32_t cfg0;
+ DEF_HELPER_FLAGS_3(mve_vfnegh, TCG_CALL_NO_WG, void, env, ptr, ptr)
-     uint32_t cfg1;
+ DEF_HELPER_FLAGS_3(mve_vfnegs, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+    uint32_t cfg2;
++
-     uint32_t cfg4;
++DEF_HELPER_FLAGS_4(mve_vand, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-+    uint32_t cfg5;
++DEF_HELPER_FLAGS_4(mve_vbic, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-+    uint32_t cfg6;
++DEF_HELPER_FLAGS_4(mve_vorr, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-     uint32_t cfgdata_rtn;
++DEF_HELPER_FLAGS_4(mve_vorn, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-     uint32_t cfgdata_out;
++DEF_HELPER_FLAGS_4(mve_veor, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-     uint32_t cfgctrl;
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 diff --git a/hw/misc/mps2-scc.c b/hw/misc/mps2-scc.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/misc/mps2-scc.c
+--- a/target/arm/mve.decode
-+++ b/hw/misc/mps2-scc.c
++++ b/target/arm/mve.decode
 @@ -XXX,XX +XXX,XX @@
- REG32(CFG0, 0)
+ &vldr_vstr rn qd imm p a w size l u
- REG32(CFG1, 4)
+ &1op qd qm size
-+REG32(CFG2, 8)
++&2op qd qm qn size
- REG32(CFG3, 0xc)
- REG32(CFG4, 0x10)
+ @vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
-+REG32(CFG5, 0x14)
+ # Note that both Rn and Qd are 3 bits only (no D bit)
-+REG32(CFG6, 0x18)
+@@ -XXX,XX +XXX,XX @@
- REG32(CFGDATA_RTN, 0xa0)
- REG32(CFGDATA_OUT, 0xa4)
+ @1op .... .... .... size:2 .. .... .... .... .... &1op qd=%qd qm=%qm
- REG32(CFGCTRL, 0xa8)
+ @1op_nosz .... .... .... .... .... .... .... .... &1op qd=%qd qm=%qm size=0
-@@ -XXX,XX +XXX,XX @@ REG32(DLL, 0x100)
++@2op_nosz .... .... .... .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn size=0
- REG32(AID, 0xFF8)
- REG32(ID, 0xFFC)
+ # Vector loads and stores
-+static int scc_partno(MPS2SCC *s)
+@@ -XXX,XX +XXX,XX @@ VLDR_VSTR        1110110 1 a:1 . w:1 . .... ... 111101 .......   @vldr_vstr \
  VLDR_VSTR        1110110 1 a:1 . w:1 . .... ... 111110 .......   @vldr_vstr \
                   size=2 p=1
 +# Vector 2-op
 +VAND             1110 1111 0 . 00 ... 0 ... 0 0001 . 1 . 1 ... 0 @2op_nosz
 +VBIC             1110 1111 0 . 01 ... 0 ... 0 0001 . 1 . 1 ... 0 @2op_nosz
 +VORR             1110 1111 0 . 10 ... 0 ... 0 0001 . 1 . 1 ... 0 @2op_nosz
 +VORN             1110 1111 0 . 11 ... 0 ... 0 0001 . 1 . 1 ... 0 @2op_nosz
 +VEOR             1111 1111 0 . 00 ... 0 ... 0 0001 . 1 . 1 ... 0 @2op_nosz
 +
  # Vector miscellaneous
  VCLS             1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_1OP(vnegw, 4, int32_t, DO_NEG)
  /* We can do these 64 bits at a time */
  DO_1OP(vfnegh, 8, uint64_t, DO_FNEGH)
  DO_1OP(vfnegs, 8, uint64_t, DO_FNEGS)
 +
 +#define DO_2OP(OP, ESIZE, TYPE, FN)                                     \
 +    void HELPER(glue(mve_, OP))(CPUARMState *env,                       \
 +                                void *vd, void *vn, void *vm)           \
 +    {                                                                   \
 +        TYPE *d = vd, *n = vn, *m = vm;                                 \
 +        uint16_t mask = mve_element_mask(env);                          \
 +        unsigned e;                                                     \
 +        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
 +            mergemask(&d[H##ESIZE(e)],                                  \
 +                      FN(n[H##ESIZE(e)], m[H##ESIZE(e)]), mask);        \
 +        }                                                               \
 +        mve_advance_vpt(env);                                           \
 +    }
 +
 +#define DO_AND(N, M)  ((N) & (M))
 +#define DO_BIC(N, M)  ((N) & ~(M))
 +#define DO_ORR(N, M)  ((N) | (M))
 +#define DO_ORN(N, M)  ((N) | ~(M))
 +#define DO_EOR(N, M)  ((N) ^ (M))
 +
 +DO_2OP(vand, 8, uint64_t, DO_AND)
 +DO_2OP(vbic, 8, uint64_t, DO_BIC)
 +DO_2OP(vorr, 8, uint64_t, DO_ORR)
 +DO_2OP(vorn, 8, uint64_t, DO_ORN)
 +DO_2OP(veor, 8, uint64_t, DO_EOR)
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@
  typedef void MVEGenLdStFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
  typedef void MVEGenOneOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
 +typedef void MVEGenTwoOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr);
  /* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
  static inline long mve_qreg_offset(unsigned reg)
@@ -XXX,XX +XXX,XX @@ static bool trans_VNEG_fp(DisasContext *s, arg_1op *a)
      }
      return do_1op(s, a, fns[a->size]);
  }
 +
 +static bool do_2op(DisasContext *s, arg_2op *a, MVEGenTwoOpFn fn)
 +{
-+    /* Return the partno field of the SCC_ID (0x524, 0x511, etc) */
++    TCGv_ptr qd, qn, qm;
-+    return extract32(s->id, 4, 8);
++
 +    if (!dc_isar_feature(aa32_mve, s) ||
 +        !mve_check_qreg_bank(s, a->qd | a->qn | a->qm) ||
 +        !fn) {
 +        return false;
 +    }
 +    if (!mve_eci_check(s) || !vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    qd = mve_qreg_ptr(a->qd);
 +    qn = mve_qreg_ptr(a->qn);
 +    qm = mve_qreg_ptr(a->qm);
 +    fn(cpu_env, qd, qn, qm);
 +    tcg_temp_free_ptr(qd);
 +    tcg_temp_free_ptr(qn);
 +    tcg_temp_free_ptr(qm);
 +    mve_update_eci(s);
 +    return true;
 +}
 +
- /* Handle a write via the SYS_CFG channel to the specified function/device.
++#define DO_LOGIC(INSN, HELPER)                                  \
-  * Return false on error (reported to guest via SYS_CFGCTRL ERROR bit).
++    static bool trans_##INSN(DisasContext *s, arg_2op *a)       \
-  */
++    {                                                           \
-@@ -XXX,XX +XXX,XX @@ static uint64_t mps2_scc_read(void *opaque, hwaddr offset, unsigned size)
++        return do_2op(s, a, HELPER);                            \
-     case A_CFG1:
++    }
-         r = s->cfg1;
++
-         break;
++DO_LOGIC(VAND, gen_helper_mve_vand)
-+    case A_CFG2:
++DO_LOGIC(VBIC, gen_helper_mve_vbic)
-+        if (scc_partno(s) != 0x524) {
++DO_LOGIC(VORR, gen_helper_mve_vorr)
-+            /* CFG2 reserved on other boards */
++DO_LOGIC(VORN, gen_helper_mve_vorn)
-+            goto bad_offset;
++DO_LOGIC(VEOR, gen_helper_mve_veor)
 +        }
 +        r = s->cfg2;
 +        break;
      case A_CFG3:
 +        if (scc_partno(s) == 0x524) {
 +            /* CFG3 reserved on AN524 */
 +            goto bad_offset;
 +        }
          /* These are user-settable DIP switches on the board. We don't
           * model that, so just return zeroes.
           */
@@ -XXX,XX +XXX,XX @@ static uint64_t mps2_scc_read(void *opaque, hwaddr offset, unsigned size)
      case A_CFG4:
          r = s->cfg4;
          break;
 +    case A_CFG5:
 +        if (scc_partno(s) != 0x524) {
 +            /* CFG5 reserved on other boards */
 +            goto bad_offset;
 +        }
 +        r = s->cfg5;
 +        break;
 +    case A_CFG6:
 +        if (scc_partno(s) != 0x524) {
 +            /* CFG6 reserved on other boards */
 +            goto bad_offset;
 +        }
 +        r = s->cfg6;
 +        break;
      case A_CFGDATA_RTN:
          r = s->cfgdata_rtn;
          break;
@@ -XXX,XX +XXX,XX @@ static uint64_t mps2_scc_read(void *opaque, hwaddr offset, unsigned size)
          r = s->id;
          break;
      default:
 +    bad_offset:
          qemu_log_mask(LOG_GUEST_ERROR,
                        "MPS2 SCC read: bad offset %x\n", (int) offset);
          r = 0;
@@ -XXX,XX +XXX,XX @@ static void mps2_scc_write(void *opaque, hwaddr offset, uint64_t value,
              led_set_state(s->led[i], extract32(value, i, 1));
          }
          break;
 +    case A_CFG2:
 +        if (scc_partno(s) != 0x524) {
 +            /* CFG2 reserved on other boards */
 +            goto bad_offset;
 +        }
 +        /* AN524: QSPI Select signal */
 +        s->cfg2 = value;
 +        break;
 +    case A_CFG5:
 +        if (scc_partno(s) != 0x524) {
 +            /* CFG5 reserved on other boards */
 +            goto bad_offset;
 +        }
 +        /* AN524: ACLK frequency in Hz */
 +        s->cfg5 = value;
 +        break;
 +    case A_CFG6:
 +        if (scc_partno(s) != 0x524) {
 +            /* CFG6 reserved on other boards */
 +            goto bad_offset;
 +        }
 +        /* AN524: Clock divider for BRAM */
 +        s->cfg6 = value;
 +        break;
      case A_CFGDATA_OUT:
          s->cfgdata_out = value;
          break;
@@ -XXX,XX +XXX,XX @@ static void mps2_scc_write(void *opaque, hwaddr offset, uint64_t value,
          s->dll = deposit32(s->dll, 24, 8, extract32(value, 24, 8));
          break;
      default:
 +    bad_offset:
          qemu_log_mask(LOG_GUEST_ERROR,
                        "MPS2 SCC write: bad offset 0x%x\n", (int) offset);
          break;
@@ -XXX,XX +XXX,XX @@ static void mps2_scc_reset(DeviceState *dev)
      trace_mps2_scc_reset();
      s->cfg0 = 0;
      s->cfg1 = 0;
 +    s->cfg2 = 0;
 +    s->cfg5 = 0;
 +    s->cfg6 = 0;
      s->cfgdata_rtn = 0;
      s->cfgdata_out = 0;
      s->cfgctrl = 0x100000;
@@ -XXX,XX +XXX,XX @@ static void mps2_scc_realize(DeviceState *dev, Error **errp)
  static const VMStateDescription mps2_scc_vmstate = {
      .name = "mps2-scc",
 -    .version_id = 2,
 -    .minimum_version_id = 2,
 +    .version_id = 3,
 +    .minimum_version_id = 3,
      .fields = (VMStateField[]) {
          VMSTATE_UINT32(cfg0, MPS2SCC),
          VMSTATE_UINT32(cfg1, MPS2SCC),
 +        VMSTATE_UINT32(cfg2, MPS2SCC),
 +        /* cfg3, cfg4 are read-only so need not be migrated */
 +        VMSTATE_UINT32(cfg5, MPS2SCC),
 +        VMSTATE_UINT32(cfg6, MPS2SCC),
          VMSTATE_UINT32(cfgdata_rtn, MPS2SCC),
          VMSTATE_UINT32(cfgdata_out, MPS2SCC),
          VMSTATE_UINT32(cfgctrl, MPS2SCC),
 --
 .20.1

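The logic-op helpers above treat the 128-bit Q register as two 64-bit chunks. They rely on mergemask() to honour per-byte predication, shifting the predicate mask right by ESIZE bits per element. Below is a minimal standalone sketch of that loop for VBIC. It assumes a 16-bit predicate with one bit per byte lane (bit 0 is the lowest byte) and a little-endian byte order inside each chunk. expand_pred_bytes(), vbic_sketch() and vbic_sketch_demo() are hypothetical names, not QEMU functions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: turn the low 8 predicate bits (one per byte)
 * into a 64-bit mask with 0xff in every enabled byte position.
 */
static uint64_t expand_pred_bytes(uint16_t mask)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        if (mask & (1u << i)) {
            r |= 0xffULL << (i * 8);
        }
    }
    return r;
}

/*
 * Sketch of a predicated VBIC (d = n & ~m) over a Q register held as
 * two uint64_t chunks; disabled bytes of d keep their old value.
 */
static void vbic_sketch(uint64_t d[2], const uint64_t n[2],
                        const uint64_t m[2], uint16_t mask)
{
    for (unsigned e = 0; e < 2; e++, mask >>= 8) {
        uint64_t bytes = expand_pred_bytes(mask);
        uint64_t res = n[e] & ~m[e];
        d[e] = (d[e] & ~bytes) | (res & bytes);
    }
}

/* Usage: only the low 64-bit chunk (predicate bits 0..7) is enabled. */
void vbic_sketch_demo(void)
{
    uint64_t d[2] = { 0, 0 };
    const uint64_t n[2] = { ~0ULL, ~0ULL };
    const uint64_t m[2] = { 0x0f0f0f0f0f0f0f0fULL, 0x0f0f0f0f0f0f0f0fULL };

    vbic_sketch(d, n, m, 0x00ff);
    assert(d[0] == 0xf0f0f0f0f0f0f0f0ULL); /* n & ~m in enabled bytes */
    assert(d[1] == 0);                     /* disabled bytes untouched */
}
```

Working on whole 64-bit chunks is only valid for the bitwise ops. Those ops have no carries between lanes, which is why the helpers can ignore the architectural element size.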
-[PULL 43/49] hw/arm/mps2-tz: Support ROMs as well as RAMs
+[PULL 23/57] target/arm: Implement MVE VADD, VSUB, VMUL
-The AN505 and AN521 don't have any read-only memory, but the AN524
+Implement the MVE VADD, VSUB and VMUL insns.
 does; add a flag to ROMInfo to mark a region as ROM.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210215115138.20465-19-peter.maydell@linaro.org
+Message-id: 20210617121628.20116-13-peter.maydell@linaro.org
 ---
- hw/arm/mps2-tz.c | 6 ++++++
+ target/arm/helper-mve.h    | 12 ++++++++++++
-file changed, 6 insertions(+)
+ target/arm/mve.decode      |  5 +++++
  target/arm/mve_helper.c    | 14 ++++++++++++++
  target/arm/translate-mve.c | 16 ++++++++++++++++
 files changed, 47 insertions(+)
-diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/mps2-tz.c
+--- a/target/arm/helper-mve.h
-+++ b/hw/arm/mps2-tz.c
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ typedef struct RAMInfo {
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vbic, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-  * Flag values:
+ DEF_HELPER_FLAGS_4(mve_vorr, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-  *  IS_ALIAS: this RAM area is an alias to the upstream end of the
+ DEF_HELPER_FLAGS_4(mve_vorn, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-  *    MPC specified by its .mpc value
+ DEF_HELPER_FLAGS_4(mve_veor, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-+ *  IS_ROM: this RAM area is read-only
++
-  */
++DEF_HELPER_FLAGS_4(mve_vaddb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
- #define IS_ALIAS 1
++DEF_HELPER_FLAGS_4(mve_vaddh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-+#define IS_ROM 2
++DEF_HELPER_FLAGS_4(mve_vaddw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
++
- struct MPS2TZMachineClass {
++DEF_HELPER_FLAGS_4(mve_vsubb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-     MachineClass parent;
++DEF_HELPER_FLAGS_4(mve_vsubh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-@@ -XXX,XX +XXX,XX @@ static MemoryRegion *mr_for_raminfo(MPS2TZMachineState *mms,
++DEF_HELPER_FLAGS_4(mve_vsubw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-     if (raminfo->mrindex < 0) {
++
-         /* Means this RAMInfo is for QEMU's "system memory" */
++DEF_HELPER_FLAGS_4(mve_vmulb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-         MachineState *machine = MACHINE(mms);
++DEF_HELPER_FLAGS_4(mve_vmulh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-+        assert(!(raminfo->flags & IS_ROM));
++DEF_HELPER_FLAGS_4(mve_vmulw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-         return machine->ram;
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@
  @1op .... .... .... size:2 .. .... .... .... .... &1op qd=%qd qm=%qm
  @1op_nosz .... .... .... .... .... .... .... .... &1op qd=%qd qm=%qm size=0
 +@2op .... .... .. size:2 .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn
  @2op_nosz .... .... .... .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn size=0
  # Vector loads and stores
@@ -XXX,XX +XXX,XX @@ VORR             1110 1111 0 . 10 ... 0 ... 0 0001 . 1 . 1 ... 0 @2op_nosz
  VORN             1110 1111 0 . 11 ... 0 ... 0 0001 . 1 . 1 ... 0 @2op_nosz
  VEOR             1111 1111 0 . 00 ... 0 ... 0 0001 . 1 . 1 ... 0 @2op_nosz
 +VADD             1110 1111 0 . .. ... 0 ... 0 1000 . 1 . 0 ... 0 @2op
 +VSUB             1111 1111 0 . .. ... 0 ... 0 1000 . 1 . 0 ... 0 @2op
 +VMUL             1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
 +
  # Vector miscellaneous
  VCLS             1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_1OP(vfnegs, 8, uint64_t, DO_FNEGS)
          mve_advance_vpt(env);                                           \
      }
-@@ -XXX,XX +XXX,XX @@ static MemoryRegion *mr_for_raminfo(MPS2TZMachineState *mms,
++/* provide unsigned 2-op helpers for all sizes */
++#define DO_2OP_U(OP, FN)                        \
-     memory_region_init_ram(ram, NULL, raminfo->name,
++    DO_2OP(OP##b, 1, uint8_t, FN)               \
-                            raminfo->size, &error_fatal);
++    DO_2OP(OP##h, 2, uint16_t, FN)              \
-+    if (raminfo->flags & IS_ROM) {
++    DO_2OP(OP##w, 4, uint32_t, FN)
-+        memory_region_set_readonly(ram, true);
++
  #define DO_AND(N, M)  ((N) & (M))
  #define DO_BIC(N, M)  ((N) & ~(M))
  #define DO_ORR(N, M)  ((N) | (M))
@@ -XXX,XX +XXX,XX @@ DO_2OP(vbic, 8, uint64_t, DO_BIC)
  DO_2OP(vorr, 8, uint64_t, DO_ORR)
  DO_2OP(vorn, 8, uint64_t, DO_ORN)
  DO_2OP(veor, 8, uint64_t, DO_EOR)
 +
 +#define DO_ADD(N, M) ((N) + (M))
 +#define DO_SUB(N, M) ((N) - (M))
 +#define DO_MUL(N, M) ((N) * (M))
 +
 +DO_2OP_U(vadd, DO_ADD)
 +DO_2OP_U(vsub, DO_SUB)
 +DO_2OP_U(vmul, DO_MUL)
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_LOGIC(VBIC, gen_helper_mve_vbic)
  DO_LOGIC(VORR, gen_helper_mve_vorr)
  DO_LOGIC(VORN, gen_helper_mve_vorn)
  DO_LOGIC(VEOR, gen_helper_mve_veor)
 +
 +#define DO_2OP(INSN, FN) \
 +    static bool trans_##INSN(DisasContext *s, arg_2op *a)       \
 +    {                                                           \
 +        static MVEGenTwoOpFn * const fns[] = {                  \
 +            gen_helper_mve_##FN##b,                             \
 +            gen_helper_mve_##FN##h,                             \
 +            gen_helper_mve_##FN##w,                             \
 +            NULL,                                               \
 +        };                                                      \
 +        return do_2op(s, a, fns[a->size]);                      \
 +    }
-     return ram;
++
- }
++DO_2OP(VADD, vadd)
++DO_2OP(VSUB, vsub)
 +DO_2OP(VMUL, vmul)
 --
 .20.1

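DO_2OP_U instantiates each op for byte, halfword and word lanes using unsigned types. Addition, subtraction and the low half of a multiply give the same bit pattern for two's-complement signed and unsigned operands, so one helper per size serves both. Below is a standalone sketch of the halfword VADD case with the same byte-granular predication. It assumes one predicate bit per byte, so each 16-bit lane gets two bits, low byte first. vaddh_sketch() and vaddh_sketch_demo() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: predicated VADD on eight 16-bit lanes; sums wrap modulo 2^16. */
static void vaddh_sketch(uint16_t d[8], const uint16_t n[8],
                         const uint16_t m[8], uint16_t mask)
{
    for (unsigned e = 0; e < 8; e++, mask >>= 2) {
        uint16_t res = (uint16_t)(n[e] + m[e]);
        uint16_t keep = 0;          /* bytes of this lane to overwrite */
        if (mask & 1) {
            keep |= 0x00ff;         /* low byte enabled */
        }
        if (mask & 2) {
            keep |= 0xff00;         /* high byte enabled */
        }
        d[e] = (uint16_t)((d[e] & ~keep) | (res & keep));
    }
}

/* Usage: all lanes enabled; lane 0 shows the modulo-2^16 wrap. */
void vaddh_sketch_demo(void)
{
    uint16_t d[8] = { 0x1234, 0x5678 };
    const uint16_t n[8] = { 0xffff, 1 };
    const uint16_t m[8] = { 1, 2 };

    vaddh_sketch(d, n, m, 0xffff);
    assert(d[0] == 0x0000);         /* 0xffff + 1 wraps to 0 */
    assert(d[1] == 0x0003);
}
```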
-[PULL 47/49] hw/arm/mps2-tz: Provide PL031 RTC on mps3-an524
+[PULL 24/57] target/arm: Implement MVE VMULH
-The AN524 has a PL031 RTC, which we have a model of; provide it
+Implement the MVE VMULH insn, which performs a vector
-rather than an unimplemented-device stub.
+multiply and returns the high half of the result.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210215115138.20465-23-peter.maydell@linaro.org
+Message-id: 20210617121628.20116-14-peter.maydell@linaro.org
 ---
- hw/arm/mps2-tz.c | 22 ++++++++++++++++++++--
+ target/arm/helper-mve.h    |  7 +++++++
-file changed, 20 insertions(+), 2 deletions(-)
+ target/arm/mve.decode      |  3 +++
  target/arm/mve_helper.c    | 26 ++++++++++++++++++++++++++
  target/arm/translate-mve.c |  2 ++
 files changed, 38 insertions(+)
-diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/mps2-tz.c
+--- a/target/arm/helper-mve.h
-+++ b/hw/arm/mps2-tz.c
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vsubw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
- #include "hw/misc/tz-msc.h"
+ DEF_HELPER_FLAGS_4(mve_vmulb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
- #include "hw/arm/armsse.h"
+ DEF_HELPER_FLAGS_4(mve_vmulh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
- #include "hw/dma/pl080.h"
+ DEF_HELPER_FLAGS_4(mve_vmulw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-+#include "hw/rtc/pl031.h"
++
- #include "hw/ssi/pl022.h"
++DEF_HELPER_FLAGS_4(mve_vmulhsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
- #include "hw/i2c/arm_sbcon_i2c.h"
++DEF_HELPER_FLAGS_4(mve_vmulhsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
- #include "hw/net/lan9118.h"
++DEF_HELPER_FLAGS_4(mve_vmulhsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-@@ -XXX,XX +XXX,XX @@ struct MPS2TZMachineState {
++DEF_HELPER_FLAGS_4(mve_vmulhub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-     UnimplementedDeviceState gpio[4];
++DEF_HELPER_FLAGS_4(mve_vmulhuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-     UnimplementedDeviceState gfx;
++DEF_HELPER_FLAGS_4(mve_vmulhuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-     UnimplementedDeviceState cldc;
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
--    UnimplementedDeviceState rtc;
+index XXXXXXX..XXXXXXX 100644
-     UnimplementedDeviceState usb;
+--- a/target/arm/mve.decode
-+    PL031State rtc;
++++ b/target/arm/mve.decode
-     PL080State dma[4];
+@@ -XXX,XX +XXX,XX @@ VADD             1110 1111 0 . .. ... 0 ... 0 1000 . 1 . 0 ... 0 @2op
-     TZMSC msc[4];
+ VSUB             1111 1111 0 . .. ... 0 ... 0 1000 . 1 . 0 ... 0 @2op
-     CMSDKAPBUART uart[6];
+ VMUL             1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
-@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_i2c(MPS2TZMachineState *mms, void *opaque,
-     return sysbus_mmio_get_region(s, 0);
++VMULH_S          111 0 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
- }
++VMULH_U          111 1 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
++
-+static MemoryRegion *make_rtc(MPS2TZMachineState *mms, void *opaque,
+ # Vector miscellaneous
-+                              const char *name, hwaddr size,
-+                              const int *irqs)
+ VCLS             1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2OP(veor, 8, uint64_t, DO_EOR)
  DO_2OP_U(vadd, DO_ADD)
  DO_2OP_U(vsub, DO_SUB)
  DO_2OP_U(vmul, DO_MUL)
 +
 +/*
 + * Because the computation type is at least twice as large as required,
 + * these work for both signed and unsigned source types.
 + */
 +static inline uint8_t do_mulh_b(int32_t n, int32_t m)
 +{
-+    PL031State *pl031 = opaque;
++    return (n * m) >> 8;
 +    SysBusDevice *s;
 +
 +    object_initialize_child(OBJECT(mms), name, pl031, TYPE_PL031);
 +    s = SYS_BUS_DEVICE(pl031);
 +    sysbus_realize(s, &error_fatal);
 +    /*
 +     * The board docs don't give an IRQ number for the PL031, so
 +     * presumably it is not connected.
 +     */
 +    return sysbus_mmio_get_region(s, 0);
 +}
 +
- static void create_non_mpc_ram(MPS2TZMachineState *mms)
++static inline uint16_t do_mulh_h(int32_t n, int32_t m)
- {
++{
-     /*
++    return (n * m) >> 16;
-@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
++}
++
-                 { /* port 9 reserved */ },
++static inline uint32_t do_mulh_w(int64_t n, int64_t m)
-                 { "clcd", make_unimp_dev, &mms->cldc, 0x4130a000, 0x1000 },
++{
--                { "rtc", make_unimp_dev, &mms->rtc, 0x4130b000, 0x1000 },
++    return (n * m) >> 32;
-+                { "rtc", make_rtc, &mms->rtc, 0x4130b000, 0x1000 },
++}
-             },
++
-         }, {
++DO_2OP(vmulhsb, 1, int8_t, do_mulh_b)
-             .name = "ahb_ppcexp0",
++DO_2OP(vmulhsh, 2, int16_t, do_mulh_h)
 +DO_2OP(vmulhsw, 4, int32_t, do_mulh_w)
 +DO_2OP(vmulhub, 1, uint8_t, do_mulh_b)
 +DO_2OP(vmulhuh, 2, uint16_t, do_mulh_h)
 +DO_2OP(vmulhuw, 4, uint32_t, do_mulh_w)
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_LOGIC(VEOR, gen_helper_mve_veor)
  DO_2OP(VADD, vadd)
  DO_2OP(VSUB, vsub)
  DO_2OP(VMUL, vmul)
 +DO_2OP(VMULH_S, vmulhs)
 +DO_2OP(VMULH_U, vmulhu)
 --
 .20.1

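The comment above do_mulh_b() depends on widening the operands to a type at least twice the element width. The full product is then exact for both signed and unsigned inputs, and the helper returns only its top half. Below is a standalone sketch of the same idea with two deliberate differences. First, it widens halfwords to int64_t instead of the helper's int32_t, so 0xffff * 0xffff cannot overflow a signed type in portable C. Second, it converts to an unsigned type before shifting, so the right shift of a negative product is well defined. mulh_b(), mulh_h() and mulh_sketch_demo() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* High byte of the exact 16-bit product of two 8-bit lanes. */
static inline uint8_t mulh_b(int32_t n, int32_t m)
{
    /* |n * m| <= 0xff * 0xff, so int32_t cannot overflow here. */
    return (uint8_t)((uint32_t)(n * m) >> 8);
}

/* High halfword of the exact 32-bit product of two 16-bit lanes. */
static inline uint16_t mulh_h(int64_t n, int64_t m)
{
    return (uint16_t)((uint64_t)(n * m) >> 16);
}

void mulh_sketch_demo(void)
{
    assert(mulh_b((int8_t)-128, (int8_t)-128) == 0x40);   /* 0x4000 */
    assert(mulh_b((uint8_t)0xff, (uint8_t)0xff) == 0xfe); /* 0xfe01 */
    assert(mulh_b((int8_t)-1, 1) == 0xff);                /* -1 */
    assert(mulh_h((uint16_t)0xffff, (uint16_t)0xffff) == 0xfffe);
}
```

The caller's choice of int8_t or uint8_t for the lane value determines the sign extension. That is the only difference between the VMULH_S and VMULH_U helper instances.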
-[PULL 45/49] hw/arm/mps2-tz: Add new mps3-an524 board
+[PULL 25/57] target/arm: Implement MVE VRMULH
-Add support for the mps3-an524 board; this is an SSE-200 based FPGA
+Implement the MVE VRMULH insn, which performs a rounding multiply
-image, like the existing mps2-an521.  It has a usefully larger amount
+and then returns the high half.
 of RAM, and a PL031 RTC, as well as some more minor differences.
 In real hardware this image runs on a newer generation of the FPGA
 board, the MPS3 rather than the older MPS2.  Architecturally the two
 boards are similar, so we implement the MPS3 boards in the mps2-tz.c
 file as variations of the existing MPS2 boards.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210215115138.20465-21-peter.maydell@linaro.org
+Message-id: 20210617121628.20116-15-peter.maydell@linaro.org
 ---
- hw/arm/mps2-tz.c | 139 +++++++++++++++++++++++++++++++++++++++++++++--
+ target/arm/helper-mve.h    |  7 +++++++
-file changed, 135 insertions(+), 4 deletions(-)
+ target/arm/mve.decode      |  3 +++
  target/arm/mve_helper.c    | 22 ++++++++++++++++++++++
  target/arm/translate-mve.c |  2 ++
 files changed, 34 insertions(+)
-diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/mps2-tz.c
+--- a/target/arm/helper-mve.h
-+++ b/hw/arm/mps2-tz.c
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vmulhsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-  * This source file covers the following FPGA images, for TrustZone cores:
+ DEF_HELPER_FLAGS_4(mve_vmulhub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-  *  "mps2-an505" -- Cortex-M33 as documented in ARM Application Note AN505
+ DEF_HELPER_FLAGS_4(mve_vmulhuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-  *  "mps2-an521" -- Dual Cortex-M33 as documented in Application Note AN521
+ DEF_HELPER_FLAGS_4(mve_vmulhuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 + *  "mps2-an524" -- Dual Cortex-M33 as documented in Application Note AN524
   *
   * Links to the TRM for the board itself and to the various Application
   * Notes which document the FPGA images can be found here:
@@ -XXX,XX +XXX,XX @@
   * http://infocenter.arm.com/help/topic/com.arm.doc.dai0505b/index.html
   * Application Note AN521:
   * http://infocenter.arm.com/help/topic/com.arm.doc.dai0521c/index.html
 + * Application Note AN524:
 + * https://developer.arm.com/documentation/dai0524/latest/
   *
   * The AN505 defers to the Cortex-M33 processor ARMv8M IoT Kit FVP User Guide
   * (ARM ECM0601256) for the details of some of the device layout:
   *   http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ecm0601256/index.html
 - * Similarly, the AN521 uses the SSE-200, and the SSE-200 TRM defines
 + * Similarly, the AN521 and AN524 use the SSE-200, and the SSE-200 TRM defines
   * most of the device layout:
   *  http://infocenter.arm.com/help/topic/com.arm.doc.101104_0100_00_en/corelink_sse200_subsystem_for_embedded_technical_reference_manual_101104_0100_00_en.pdf
   *
@@ -XXX,XX +XXX,XX @@
  #include "hw/qdev-clock.h"
  #include "qom/object.h"
 -#define MPS2TZ_NUMIRQ_MAX 92
 +#define MPS2TZ_NUMIRQ_MAX 95
  #define MPS2TZ_RAM_MAX 4
  typedef enum MPS2TZFPGAType {
      FPGA_AN505,
      FPGA_AN521,
 +    FPGA_AN524,
  } MPS2TZFPGAType;
  /*
@@ -XXX,XX +XXX,XX @@ struct MPS2TZMachineState {
      TZPPC ppc[5];
      TZMPC mpc[3];
      PL022State spi[5];
 -    ArmSbconI2CState i2c[4];
 +    ArmSbconI2CState i2c[5];
      UnimplementedDeviceState i2s_audio;
      UnimplementedDeviceState gpio[4];
      UnimplementedDeviceState gfx;
 +    UnimplementedDeviceState cldc;
 +    UnimplementedDeviceState rtc;
      PL080State dma[4];
      TZMSC msc[4];
 -    CMSDKAPBUART uart[5];
 +    CMSDKAPBUART uart[6];
      SplitIRQ sec_resp_splitter;
      qemu_or_irq uart_irq_orgate;
      DeviceState *lan9118;
@@ -XXX,XX +XXX,XX @@ struct MPS2TZMachineState {
  #define TYPE_MPS2TZ_MACHINE "mps2tz"
  #define TYPE_MPS2TZ_AN505_MACHINE MACHINE_TYPE_NAME("mps2-an505")
  #define TYPE_MPS2TZ_AN521_MACHINE MACHINE_TYPE_NAME("mps2-an521")
 +#define TYPE_MPS3TZ_AN524_MACHINE MACHINE_TYPE_NAME("mps3-an524")
  OBJECT_DECLARE_TYPE(MPS2TZMachineState, MPS2TZMachineClass, MPS2TZ_MACHINE)
@@ -XXX,XX +XXX,XX @@ static const uint32_t an505_oscclk[] = {
      25000000,
  };
 +static const uint32_t an524_oscclk[] = {
 +    24000000,
 +    32000000,
 +    50000000,
 +    50000000,
 +    24576000,
 +    23750000,
 +};
 +
- static const RAMInfo an505_raminfo[] = { {
++DEF_HELPER_FLAGS_4(mve_vrmulhsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-         .name = "ssram-0",
++DEF_HELPER_FLAGS_4(mve_vrmulhsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-         .base = 0x00000000,
++DEF_HELPER_FLAGS_4(mve_vrmulhsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-@@ -XXX,XX +XXX,XX @@ static const RAMInfo an505_raminfo[] = { {
++DEF_HELPER_FLAGS_4(mve_vrmulhub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-     },
++DEF_HELPER_FLAGS_4(mve_vrmulhuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
- };
++DEF_HELPER_FLAGS_4(mve_vrmulhuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
-+static const RAMInfo an524_raminfo[] = { {
+index XXXXXXX..XXXXXXX 100644
-+        .name = "bram",
+--- a/target/arm/mve.decode
-+        .base = 0x00000000,
++++ b/target/arm/mve.decode
-+        .size = 512 * KiB,
+@@ -XXX,XX +XXX,XX @@ VMUL             1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
-+        .mpc = 0,
+ VMULH_S          111 0 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
-+        .mrindex = 0,
+ VMULH_U          111 1 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
-+    }, {
-+        .name = "sram",
++VRMULH_S         111 0 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
-+        .base = 0x20000000,
++VRMULH_U         111 1 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
 +        .size = 32 * 4 * KiB,
 +        .mpc = 1,
 +        .mrindex = 1,
 +    }, {
 +        /* We don't model QSPI flash yet; for now expose it as simple ROM */
 +        .name = "QSPI",
 +        .base = 0x28000000,
 +        .size = 8 * MiB,
 +        .mpc = 1,
 +        .mrindex = 2,
 +        .flags = IS_ROM,
 +    }, {
 +        .name = "DDR",
 +        .base = 0x60000000,
 +        .size = 2 * GiB,
 +        .mpc = 2,
 +        .mrindex = -1,
 +    }, {
 +        .name = NULL,
 +    },
 +};
 +
- static const RAMInfo *find_raminfo_for_mpc(MPS2TZMachineState *mms, int mpc)
+ # Vector miscellaneous
- {
-     MPS2TZMachineClass *mmc = MPS2TZ_MACHINE_GET_CLASS(mms);
+ VCLS             1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
-@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
+diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
-         },
+index XXXXXXX..XXXXXXX 100644
-     };
+--- a/target/arm/mve_helper.c
++++ b/target/arm/mve_helper.c
-+    const PPCInfo an524_ppcs[] = { {
+@@ -XXX,XX +XXX,XX @@ static inline uint32_t do_mulh_w(int64_t n, int64_t m)
-+            .name = "apb_ppcexp0",
+     return (n * m) >> 32;
 +            .ports = {
 +                { "bram-mpc", make_mpc, &mms->mpc[0], 0x58007000, 0x1000 },
 +                { "qspi-mpc", make_mpc, &mms->mpc[1], 0x58008000, 0x1000 },
 +                { "ddr-mpc", make_mpc, &mms->mpc[2], 0x58009000, 0x1000 },
 +            },
 +        }, {
 +            .name = "apb_ppcexp1",
 +            .ports = {
 +                { "i2c0", make_i2c, &mms->i2c[0], 0x41200000, 0x1000 },
 +                { "i2c1", make_i2c, &mms->i2c[1], 0x41201000, 0x1000 },
 +                { "spi0", make_spi, &mms->spi[0], 0x41202000, 0x1000, { 52 } },
 +                { "spi1", make_spi, &mms->spi[1], 0x41203000, 0x1000, { 53 } },
 +                { "spi2", make_spi, &mms->spi[2], 0x41204000, 0x1000, { 54 } },
 +                { "i2c2", make_i2c, &mms->i2c[2], 0x41205000, 0x1000 },
 +                { "i2c3", make_i2c, &mms->i2c[3], 0x41206000, 0x1000 },
 +                { /* port 7 reserved */ },
 +                { "i2c4", make_i2c, &mms->i2c[4], 0x41208000, 0x1000 },
 +            },
 +        }, {
 +            .name = "apb_ppcexp2",
 +            .ports = {
 +                { "scc", make_scc, &mms->scc, 0x41300000, 0x1000 },
 +                { "i2s-audio", make_unimp_dev, &mms->i2s_audio,
 +                  0x41301000, 0x1000 },
 +                { "fpgaio", make_fpgaio, &mms->fpgaio, 0x41302000, 0x1000 },
 +                { "uart0", make_uart, &mms->uart[0], 0x41303000, 0x1000, { 32, 33, 42 } },
 +                { "uart1", make_uart, &mms->uart[1], 0x41304000, 0x1000, { 34, 35, 43 } },
 +                { "uart2", make_uart, &mms->uart[2], 0x41305000, 0x1000, { 36, 37, 44 } },
 +                { "uart3", make_uart, &mms->uart[3], 0x41306000, 0x1000, { 38, 39, 45 } },
 +                { "uart4", make_uart, &mms->uart[4], 0x41307000, 0x1000, { 40, 41, 46 } },
 +                { "uart5", make_uart, &mms->uart[5], 0x41308000, 0x1000, { 124, 125, 126 } },
 +
 +                { /* port 9 reserved */ },
 +                { "clcd", make_unimp_dev, &mms->cldc, 0x4130a000, 0x1000 },
 +                { "rtc", make_unimp_dev, &mms->rtc, 0x4130b000, 0x1000 },
 +            },
 +        }, {
 +            .name = "ahb_ppcexp0",
 +            .ports = {
 +                { "gpio0", make_unimp_dev, &mms->gpio[0], 0x41100000, 0x1000 },
 +                { "gpio1", make_unimp_dev, &mms->gpio[1], 0x41101000, 0x1000 },
 +                { "gpio2", make_unimp_dev, &mms->gpio[2], 0x41102000, 0x1000 },
 +                { "gpio3", make_unimp_dev, &mms->gpio[3], 0x41103000, 0x1000 },
 +                { "eth", make_eth_dev, NULL, 0x41400000, 0x100000, { 48 } },
 +            },
 +        },
 +    };
 +
      switch (mmc->fpga_type) {
      case FPGA_AN505:
      case FPGA_AN521:
          ppcs = an505_ppcs;
          num_ppcs = ARRAY_SIZE(an505_ppcs);
          break;
 +    case FPGA_AN524:
 +        ppcs = an524_ppcs;
 +        num_ppcs = ARRAY_SIZE(an524_ppcs);
 +        break;
      default:
          g_assert_not_reached();
      }
@@ -XXX,XX +XXX,XX @@ static void mps2tz_an521_class_init(ObjectClass *oc, void *data)
      mps2tz_set_default_ram_info(mmc);
  }
-+static void mps3tz_an524_class_init(ObjectClass *oc, void *data)
++static inline uint8_t do_rmulh_b(int32_t n, int32_t m)
 +{
-+    MachineClass *mc = MACHINE_CLASS(oc);
++    return (n * m + (1U << 7)) >> 8;
 +    MPS2TZMachineClass *mmc = MPS2TZ_MACHINE_CLASS(oc);
 +
 +    mc->desc = "ARM MPS3 with AN524 FPGA image for dual Cortex-M33";
 +    mc->default_cpus = 2;
 +    mc->min_cpus = mc->default_cpus;
 +    mc->max_cpus = mc->default_cpus;
 +    mmc->fpga_type = FPGA_AN524;
 +    mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-m33");
 +    mmc->scc_id = 0x41045240;
 +    mmc->sysclk_frq = 32 * 1000 * 1000; /* 32MHz */
 +    mmc->oscclk = an524_oscclk;
 +    mmc->len_oscclk = ARRAY_SIZE(an524_oscclk);
 +    mmc->fpgaio_num_leds = 10;
 +    mmc->fpgaio_has_switches = true;
 +    mmc->numirq = 95;
 +    mmc->raminfo = an524_raminfo;
 +    mmc->armsse_type = TYPE_SSE200;
 +    mps2tz_set_default_ram_info(mmc);
 +}
 +
- static const TypeInfo mps2tz_info = {
++static inline uint16_t do_rmulh_h(int32_t n, int32_t m)
-     .name = TYPE_MPS2TZ_MACHINE,
++{
-     .parent = TYPE_MACHINE,
++    return (n * m + (1U << 15)) >> 16;
-@@ -XXX,XX +XXX,XX @@ static const TypeInfo mps2tz_an521_info = {
++}
      .class_init = mps2tz_an521_class_init,
  };
 +static const TypeInfo mps3tz_an524_info = {
 +    .name = TYPE_MPS3TZ_AN524_MACHINE,
 +    .parent = TYPE_MPS2TZ_MACHINE,
 +    .class_init = mps3tz_an524_class_init,
 +};
 +
- static void mps2tz_machine_init(void)
++static inline uint32_t do_rmulh_w(int64_t n, int64_t m)
- {
++{
-     type_register_static(&mps2tz_info);
++    return (n * m + (1U << 31)) >> 32;
-     type_register_static(&mps2tz_an505_info);
++}
-     type_register_static(&mps2tz_an521_info);
++
-+    type_register_static(&mps3tz_an524_info);
+ DO_2OP(vmulhsb, 1, int8_t, do_mulh_b)
- }
+ DO_2OP(vmulhsh, 2, int16_t, do_mulh_h)
+ DO_2OP(vmulhsw, 4, int32_t, do_mulh_w)
- type_init(mps2tz_machine_init);
+ DO_2OP(vmulhub, 1, uint8_t, do_mulh_b)
  DO_2OP(vmulhuh, 2, uint16_t, do_mulh_h)
  DO_2OP(vmulhuw, 4, uint32_t, do_mulh_w)
 +
 +DO_2OP(vrmulhsb, 1, int8_t, do_rmulh_b)
 +DO_2OP(vrmulhsh, 2, int16_t, do_rmulh_h)
 +DO_2OP(vrmulhsw, 4, int32_t, do_rmulh_w)
 +DO_2OP(vrmulhub, 1, uint8_t, do_rmulh_b)
 +DO_2OP(vrmulhuh, 2, uint16_t, do_rmulh_h)
 +DO_2OP(vrmulhuw, 4, uint32_t, do_rmulh_w)
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_2OP(VSUB, vsub)
  DO_2OP(VMUL, vmul)
  DO_2OP(VMULH_S, vmulhs)
  DO_2OP(VMULH_U, vmulhu)
 +DO_2OP(VRMULH_S, vrmulhs)
 +DO_2OP(VRMULH_U, vrmulhu)
 --
 .20.1

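VRMULH differs from VMULH only in one step. Before taking the high half, it adds half of the result's least significant bit (1 << 7 for bytes), so the result is rounded rather than truncated. Below is a standalone byte-lane sketch. As in do_rmulh_b(), the addition uses unsigned arithmetic. That keeps the shift well defined and still gives the correct low 8 bits for negative products. rmulh_b() and rmulh_sketch_demo() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Rounded high byte of the product of two 8-bit lanes. */
static inline uint8_t rmulh_b(int32_t n, int32_t m)
{
    /*
     * Converting to uint32_t wraps modulo 2^32; bits [15:8] of the sum
     * match those of the exact signed sum, so the result is correct for
     * both signed and unsigned lane values.
     */
    return (uint8_t)(((uint32_t)(n * m) + (1U << 7)) >> 8);
}

void rmulh_sketch_demo(void)
{
    /* -127 / 256 rounds to 0; a truncating VMULH would give 0xff (-1). */
    assert(rmulh_b((int8_t)-1, 127) == 0x00);
    /* 128 / 256 = 0.5 rounds up to 1; truncation would give 0. */
    assert(rmulh_b((uint8_t)128, 1) == 0x01);
}
```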
-[PULL 32/49] hw/arm/mps2-tz: Make FPGAIO switch and LED config per-board
+[PULL 26/57] target/arm: Implement MVE VMAX, VMIN
-Set the FPGAIO num-leds and have-switches properties explicitly
+Implement the MVE VMAX and VMIN insns.
 per-board, rather than relying on the defaults.  The AN505 and AN521
 both have the same settings as the default values, but the AN524 will
 be different.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210215115138.20465-8-peter.maydell@linaro.org
+Message-id: 20210617121628.20116-16-peter.maydell@linaro.org
 ---
- hw/arm/mps2-tz.c | 9 +++++++++
+ target/arm/helper-mve.h    | 14 ++++++++++++++
-file changed, 9 insertions(+)
+ target/arm/mve.decode      |  5 +++++
  target/arm/mve_helper.c    | 14 ++++++++++++++
  target/arm/translate-mve.c |  4 ++++
 files changed, 37 insertions(+)
-diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/mps2-tz.c
+--- a/target/arm/helper-mve.h
-+++ b/hw/arm/mps2-tz.c
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ struct MPS2TZMachineClass {
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vrmulhsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-     uint32_t sysclk_frq; /* Main SYSCLK frequency in Hz */
+ DEF_HELPER_FLAGS_4(mve_vrmulhub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-     uint32_t len_oscclk;
+ DEF_HELPER_FLAGS_4(mve_vrmulhuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-     const uint32_t *oscclk;
+ DEF_HELPER_FLAGS_4(mve_vrmulhuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-+    uint32_t fpgaio_num_leds; /* Number of LEDs in FPGAIO LED0 register */
++
-+    bool fpgaio_has_switches; /* Does FPGAIO have SWITCH register? */
++DEF_HELPER_FLAGS_4(mve_vmaxsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-     const char *armsse_type;
++DEF_HELPER_FLAGS_4(mve_vmaxsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
- };
++DEF_HELPER_FLAGS_4(mve_vmaxsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
++DEF_HELPER_FLAGS_4(mve_vmaxub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_fpgaio(MPS2TZMachineState *mms, void *opaque,
++DEF_HELPER_FLAGS_4(mve_vmaxuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-                                  const char *name, hwaddr size)
++DEF_HELPER_FLAGS_4(mve_vmaxuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
- {
++
-     MPS2FPGAIO *fpgaio = opaque;
++DEF_HELPER_FLAGS_4(mve_vminsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-+    MPS2TZMachineClass *mmc = MPS2TZ_MACHINE_GET_CLASS(mms);
++DEF_HELPER_FLAGS_4(mve_vminsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
++DEF_HELPER_FLAGS_4(mve_vminsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-     object_initialize_child(OBJECT(mms), "fpgaio", fpgaio, TYPE_MPS2_FPGAIO);
++DEF_HELPER_FLAGS_4(mve_vminub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-+    qdev_prop_set_uint32(DEVICE(fpgaio), "num-leds", mmc->fpgaio_num_leds);
++DEF_HELPER_FLAGS_4(mve_vminuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-+    qdev_prop_set_bit(DEVICE(fpgaio), "has-switches", mmc->fpgaio_has_switches);
++DEF_HELPER_FLAGS_4(mve_vminuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-     sysbus_realize(SYS_BUS_DEVICE(fpgaio), &error_fatal);
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
-     return sysbus_mmio_get_region(SYS_BUS_DEVICE(fpgaio), 0);
+index XXXXXXX..XXXXXXX 100644
- }
+--- a/target/arm/mve.decode
-@@ -XXX,XX +XXX,XX @@ static void mps2tz_an505_class_init(ObjectClass *oc, void *data)
++++ b/target/arm/mve.decode
-     mmc->sysclk_frq = 20 * 1000 * 1000; /* 20MHz */
+@@ -XXX,XX +XXX,XX @@ VMULH_U          111 1 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
-     mmc->oscclk = an505_oscclk;
+ VRMULH_S         111 0 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
-     mmc->len_oscclk = ARRAY_SIZE(an505_oscclk);
+ VRMULH_U         111 1 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
-+    mmc->fpgaio_num_leds = 2;
-+    mmc->fpgaio_has_switches = false;
++VMAX_S           111 0 1111 0 . .. ... 0 ... 0 0110 . 1 . 0 ... 0 @2op
-     mmc->armsse_type = TYPE_IOTKIT;
++VMAX_U           111 1 1111 0 . .. ... 0 ... 0 0110 . 1 . 0 ... 0 @2op
- }
++VMIN_S           111 0 1111 0 . .. ... 0 ... 0 0110 . 1 . 1 ... 0 @2op
++VMIN_U           111 1 1111 0 . .. ... 0 ... 0 0110 . 1 . 1 ... 0 @2op
-@@ -XXX,XX +XXX,XX @@ static void mps2tz_an521_class_init(ObjectClass *oc, void *data)
++
-     mmc->sysclk_frq = 20 * 1000 * 1000; /* 20MHz */
+ # Vector miscellaneous
-     mmc->oscclk = an505_oscclk; /* AN521 is the same as AN505 here */
-     mmc->len_oscclk = ARRAY_SIZE(an505_oscclk);
+ VCLS             1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
-+    mmc->fpgaio_num_leds = 2;
+diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
-+    mmc->fpgaio_has_switches = false;
+index XXXXXXX..XXXXXXX 100644
-     mmc->armsse_type = TYPE_SSE200;
+--- a/target/arm/mve_helper.c
- }
++++ b/target/arm/mve_helper.c
+@@ -XXX,XX +XXX,XX @@ DO_1OP(vfnegs, 8, uint64_t, DO_FNEGS)
      DO_2OP(OP##h, 2, uint16_t, FN)              \
      DO_2OP(OP##w, 4, uint32_t, FN)
 +/* provide signed 2-op helpers for all sizes */
 +#define DO_2OP_S(OP, FN)                        \
 +    DO_2OP(OP##b, 1, int8_t, FN)                \
 +    DO_2OP(OP##h, 2, int16_t, FN)               \
 +    DO_2OP(OP##w, 4, int32_t, FN)
 +
  #define DO_AND(N, M)  ((N) & (M))
  #define DO_BIC(N, M)  ((N) & ~(M))
  #define DO_ORR(N, M)  ((N) | (M))
@@ -XXX,XX +XXX,XX @@ DO_2OP(vrmulhsw, 4, int32_t, do_rmulh_w)
  DO_2OP(vrmulhub, 1, uint8_t, do_rmulh_b)
  DO_2OP(vrmulhuh, 2, uint16_t, do_rmulh_h)
  DO_2OP(vrmulhuw, 4, uint32_t, do_rmulh_w)
 +
 +#define DO_MAX(N, M)  ((N) >= (M) ? (N) : (M))
 +#define DO_MIN(N, M)  ((N) >= (M) ? (M) : (N))
 +
 +DO_2OP_S(vmaxs, DO_MAX)
 +DO_2OP_U(vmaxu, DO_MAX)
 +DO_2OP_S(vmins, DO_MIN)
 +DO_2OP_U(vminu, DO_MIN)
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_2OP(VMULH_S, vmulhs)
  DO_2OP(VMULH_U, vmulhu)
  DO_2OP(VRMULH_S, vrmulhs)
  DO_2OP(VRMULH_U, vrmulhu)
 +DO_2OP(VMAX_S, vmaxs)
 +DO_2OP(VMAX_U, vmaxu)
 +DO_2OP(VMIN_S, vmins)
 +DO_2OP(VMIN_U, vminu)
 --
 .20.1
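The signed and unsigned VMAX/VMIN helpers above share the same DO_MAX/DO_MIN macros. Only the element type passed through DO_2OP_S or DO_2OP_U differs. The following is a rough standalone sketch, not QEMU code. The names vmax_s8, vmax_u8 and vmin_s8 are hypothetical, and predication via mve_element_mask() is not modelled. It shows that the same byte pattern compares differently depending on that type:

```c
#include <stdint.h>

/* Same comparison-based definitions as DO_MAX / DO_MIN in the patch. */
#define DO_MAX(N, M)  ((N) >= (M) ? (N) : (M))
#define DO_MIN(N, M)  ((N) >= (M) ? (M) : (N))

/*
 * Hypothetical lane-by-lane helpers over a 16-byte vector. The element
 * type decides whether the comparison is signed or unsigned; predication
 * (VPT and tail masking) is deliberately left out of this sketch.
 */
static void vmax_s8(int8_t d[16], const int8_t n[16], const int8_t m[16])
{
    for (int i = 0; i < 16; i++) {
        d[i] = DO_MAX(n[i], m[i]);
    }
}

static void vmax_u8(uint8_t d[16], const uint8_t n[16], const uint8_t m[16])
{
    for (int i = 0; i < 16; i++) {
        d[i] = DO_MAX(n[i], m[i]);
    }
}

static void vmin_s8(int8_t d[16], const int8_t n[16], const int8_t m[16])
{
    for (int i = 0; i < 16; i++) {
        d[i] = DO_MIN(n[i], m[i]);
    }
}
```

Suppose byte 0 of Qn holds 0x80 and byte 0 of Qm holds 0x01. In that case, the signed maximum selects 0x01, because 0x80 is -128. The unsigned maximum selects 0x80, which is 128.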

-[PULL 31/49] hw/misc/mps2-fpgaio: Support SWITCH register
+[PULL 27/57] target/arm: Implement MVE VABD
-MPS3 boards have an extra SWITCH register in the FPGAIO block which
+Implement the MVE VABD insn.
 reports the value of some switches.  Implement this, governed by a
 property the board code can use to specify whether it exists.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210215115138.20465-7-peter.maydell@linaro.org
+Message-id: 20210617121628.20116-17-peter.maydell@linaro.org
 ---
- include/hw/misc/mps2-fpgaio.h |  1 +
+ target/arm/helper-mve.h    | 7 +++++++
- hw/misc/mps2-fpgaio.c         | 10 ++++++++++
+ target/arm/mve.decode      | 3 +++
-files changed, 11 insertions(+)
+ target/arm/mve_helper.c    | 5 +++++
  target/arm/translate-mve.c | 2 ++
 files changed, 17 insertions(+)
-diff --git a/include/hw/misc/mps2-fpgaio.h b/include/hw/misc/mps2-fpgaio.h
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/misc/mps2-fpgaio.h
+--- a/target/arm/helper-mve.h
-+++ b/include/hw/misc/mps2-fpgaio.h
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ struct MPS2FPGAIO {
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vminsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-     MemoryRegion iomem;
+ DEF_HELPER_FLAGS_4(mve_vminub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-     LEDState *led[MPS2FPGAIO_MAX_LEDS];
+ DEF_HELPER_FLAGS_4(mve_vminuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-     uint32_t num_leds;
+ DEF_HELPER_FLAGS_4(mve_vminuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-+    bool has_switches;
++
++DEF_HELPER_FLAGS_4(mve_vabdsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-     uint32_t led0;
++DEF_HELPER_FLAGS_4(mve_vabdsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-     uint32_t prescale;
++DEF_HELPER_FLAGS_4(mve_vabdsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-diff --git a/hw/misc/mps2-fpgaio.c b/hw/misc/mps2-fpgaio.c
++DEF_HELPER_FLAGS_4(mve_vabdub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +DEF_HELPER_FLAGS_4(mve_vabduh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +DEF_HELPER_FLAGS_4(mve_vabduw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
---- a/hw/misc/mps2-fpgaio.c
+--- a/target/arm/mve.decode
-+++ b/hw/misc/mps2-fpgaio.c
++++ b/target/arm/mve.decode
-@@ -XXX,XX +XXX,XX @@ REG32(CLK100HZ, 0x14)
+@@ -XXX,XX +XXX,XX @@ VMAX_U           111 1 1111 0 . .. ... 0 ... 0 0110 . 1 . 0 ... 0 @2op
- REG32(COUNTER, 0x18)
+ VMIN_S           111 0 1111 0 . .. ... 0 ... 0 0110 . 1 . 1 ... 0 @2op
- REG32(PRESCALE, 0x1c)
+ VMIN_U           111 1 1111 0 . .. ... 0 ... 0 0110 . 1 . 1 ... 0 @2op
- REG32(PSCNTR, 0x20)
-+REG32(SWITCH, 0x28)
++VABD_S           111 0 1111 0 . .. ... 0 ... 0 0111 . 1 . 0 ... 0 @2op
- REG32(MISC, 0x4c)
++VABD_U           111 1 1111 0 . .. ... 0 ... 0 0111 . 1 . 0 ... 0 @2op
++
- static uint32_t counter_from_tickoff(int64_t now, int64_t tick_offset, int frq)
+ # Vector miscellaneous
-@@ -XXX,XX +XXX,XX @@ static uint64_t mps2_fpgaio_read(void *opaque, hwaddr offset, unsigned size)
-         resync_counter(s);
+ VCLS             1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
-         r = s->pscntr;
+diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
-         break;
+index XXXXXXX..XXXXXXX 100644
-+    case A_SWITCH:
+--- a/target/arm/mve_helper.c
-+        if (!s->has_switches) {
++++ b/target/arm/mve_helper.c
-+            goto bad_offset;
+@@ -XXX,XX +XXX,XX @@ DO_2OP_S(vmaxs, DO_MAX)
-+        }
+ DO_2OP_U(vmaxu, DO_MAX)
-+        /* User-togglable board switches. We don't model that, so report 0. */
+ DO_2OP_S(vmins, DO_MIN)
-+        r = 0;
+ DO_2OP_U(vminu, DO_MIN)
-+        break;
++
-     default:
++#define DO_ABD(N, M)  ((N) >= (M) ? (N) - (M) : (M) - (N))
-+    bad_offset:
++
-         qemu_log_mask(LOG_GUEST_ERROR,
++DO_2OP_S(vabds, DO_ABD)
-                       "MPS2 FPGAIO read: bad offset %x\n", (int) offset);
++DO_2OP_U(vabdu, DO_ABD)
-         r = 0;
+diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
-@@ -XXX,XX +XXX,XX @@ static Property mps2_fpgaio_properties[] = {
+index XXXXXXX..XXXXXXX 100644
-     DEFINE_PROP_UINT32("prescale-clk", MPS2FPGAIO, prescale_clk, 20000000),
+--- a/target/arm/translate-mve.c
-     /* Number of LEDs controlled by LED0 register */
++++ b/target/arm/translate-mve.c
-     DEFINE_PROP_UINT32("num-leds", MPS2FPGAIO, num_leds, 2),
+@@ -XXX,XX +XXX,XX @@ DO_2OP(VMAX_S, vmaxs)
-+    DEFINE_PROP_BOOL("has-switches", MPS2FPGAIO, has_switches, false),
+ DO_2OP(VMAX_U, vmaxu)
-     DEFINE_PROP_END_OF_LIST(),
+ DO_2OP(VMIN_S, vmins)
- };
+ DO_2OP(VMIN_U, vminu)
++DO_2OP(VABD_S, vabds)
 +DO_2OP(VABD_U, vabdu)
 --
 .20.1
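DO_ABD always subtracts the smaller operand from the larger one, so the difference is never negative before it is truncated back to the element size. Below is a minimal per-lane sketch that assumes byte elements. The names vabd_s8_lane and vabd_u8_lane are hypothetical, not QEMU functions. The sketch shows why separate signed and unsigned helpers are needed:

```c
#include <stdint.h>

/* Same definition as DO_ABD in the patch. */
#define DO_ABD(N, M)  ((N) >= (M) ? (N) - (M) : (M) - (N))

/*
 * Hypothetical single-lane helpers returning the raw 8-bit result.
 * The operands are promoted to int, so the subtraction cannot overflow;
 * the result is then truncated to the element size.
 */
static uint8_t vabd_s8_lane(int8_t n, int8_t m)
{
    return (uint8_t)DO_ABD(n, m);
}

static uint8_t vabd_u8_lane(uint8_t n, uint8_t m)
{
    return (uint8_t)DO_ABD(n, m);
}
```

Take the byte pair 0x80/0x7f. The signed form computes |-128 - 127| = 255 and stores 0xff. The unsigned form computes |128 - 127| = 1.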

-[PULL 23/49] hw/display/omap_lcdc: Inline template header into C file
+[PULL 28/57] target/arm: Implement MVE VHADD, VHSUB
-We only include the template header once, so just inline it into the
+Implement MVE VHADD and VHSUB insns, which perform an addition
-source file for the device.
+or subtraction and then halve the result.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Message-id: 20210617121628.20116-18-peter.maydell@linaro.org
 Message-id: 20210215103215.4944-9-peter.maydell@linaro.org
 ---
- hw/display/omap_lcd_template.h | 154 ---------------------------------
+ target/arm/helper-mve.h    | 14 ++++++++++++++
- hw/display/omap_lcdc.c         | 127 ++++++++++++++++++++++++++-
+ target/arm/mve.decode      |  5 +++++
-files changed, 125 insertions(+), 156 deletions(-)
+ target/arm/mve_helper.c    | 25 +++++++++++++++++++++++++
- delete mode 100644 hw/display/omap_lcd_template.h
+ target/arm/translate-mve.c |  4 ++++
 files changed, 48 insertions(+)
-diff --git a/hw/display/omap_lcd_template.h b/hw/display/omap_lcd_template.h
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 deleted file mode 100644
 index XXXXXXX..XXXXXXX
 --- a/hw/display/omap_lcd_template.h
 +++ /dev/null
@@ -XXX,XX +XXX,XX @@
 -/*
 - * QEMU OMAP LCD Emulator templates
 - *
 - * Copyright (c) 2006 Andrzej Zaborowski  <balrog@zabor.org>
 - *
 - * Redistribution and use in source and binary forms, with or without
 - * modification, are permitted provided that the following conditions
 - * are met:
 - *
 - * 1. Redistributions of source code must retain the above copyright
 - *    notice, this list of conditions and the following disclaimer.
 - * 2. Redistributions in binary form must reproduce the above copyright
 - *    notice, this list of conditions and the following disclaimer in
 - *    the documentation and/or other materials provided with the
 - *    distribution.
 - *
 - * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS''
 - * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
 - * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
 - * PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR
 - * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
 - * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
 - * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
 - * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
 - * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 - * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 - */
 -
 -/*
 - * 2-bit colour
 - */
 -static void draw_line2_32(void *opaque, uint8_t *d, const uint8_t *s,
 -                          int width, int deststep)
 -{
 -    uint16_t *pal = opaque;
 -    uint8_t v, r, g, b;
 -
 -    do {
 -        v = ldub_p((void *) s);
 -        r = (pal[v & 3] >> 4) & 0xf0;
 -        g = pal[v & 3] & 0xf0;
 -        b = (pal[v & 3] << 4) & 0xf0;
 -        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
 -        d += 4;
 -        v >>= 2;
 -        r = (pal[v & 3] >> 4) & 0xf0;
 -        g = pal[v & 3] & 0xf0;
 -        b = (pal[v & 3] << 4) & 0xf0;
 -        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
 -        d += 4;
 -        v >>= 2;
 -        r = (pal[v & 3] >> 4) & 0xf0;
 -        g = pal[v & 3] & 0xf0;
 -        b = (pal[v & 3] << 4) & 0xf0;
 -        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
 -        d += 4;
 -        v >>= 2;
 -        r = (pal[v & 3] >> 4) & 0xf0;
 -        g = pal[v & 3] & 0xf0;
 -        b = (pal[v & 3] << 4) & 0xf0;
 -        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
 -        d += 4;
 -        s++;
 -        width -= 4;
 -    } while (width > 0);
 -}
 -
 -/*
 - * 4-bit colour
 - */
 -static void draw_line4_32(void *opaque, uint8_t *d, const uint8_t *s,
 -                          int width, int deststep)
 -{
 -    uint16_t *pal = opaque;
 -    uint8_t v, r, g, b;
 -
 -    do {
 -        v = ldub_p((void *) s);
 -        r = (pal[v & 0xf] >> 4) & 0xf0;
 -        g = pal[v & 0xf] & 0xf0;
 -        b = (pal[v & 0xf] << 4) & 0xf0;
 -        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
 -        d += 4;
 -        v >>= 4;
 -        r = (pal[v & 0xf] >> 4) & 0xf0;
 -        g = pal[v & 0xf] & 0xf0;
 -        b = (pal[v & 0xf] << 4) & 0xf0;
 -        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
 -        d += 4;
 -        s++;
 -        width -= 2;
 -    } while (width > 0);
 -}
 -
 -/*
 - * 8-bit colour
 - */
 -static void draw_line8_32(void *opaque, uint8_t *d, const uint8_t *s,
 -                          int width, int deststep)
 -{
 -    uint16_t *pal = opaque;
 -    uint8_t v, r, g, b;
 -
 -    do {
 -        v = ldub_p((void *) s);
 -        r = (pal[v] >> 4) & 0xf0;
 -        g = pal[v] & 0xf0;
 -        b = (pal[v] << 4) & 0xf0;
 -        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
 -        s++;
 -        d += 4;
 -    } while (-- width != 0);
 -}
 -
 -/*
 - * 12-bit colour
 - */
 -static void draw_line12_32(void *opaque, uint8_t *d, const uint8_t *s,
 -                           int width, int deststep)
 -{
 -    uint16_t v;
 -    uint8_t r, g, b;
 -
 -    do {
 -        v = lduw_le_p((void *) s);
 -        r = (v >> 4) & 0xf0;
 -        g = v & 0xf0;
 -        b = (v << 4) & 0xf0;
 -        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
 -        s += 2;
 -        d += 4;
 -    } while (-- width != 0);
 -}
 -
 -/*
 - * 16-bit colour
 - */
 -static void draw_line16_32(void *opaque, uint8_t *d, const uint8_t *s,
 -                           int width, int deststep)
 -{
 -    uint16_t v;
 -    uint8_t r, g, b;
 -
 -    do {
 -        v = lduw_le_p((void *) s);
 -        r = (v >> 8) & 0xf8;
 -        g = (v >> 3) & 0xfc;
 -        b = (v << 3) & 0xf8;
 -        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
 -        s += 2;
 -        d += 4;
 -    } while (-- width != 0);
 -}
 diff --git a/hw/display/omap_lcdc.c b/hw/display/omap_lcdc.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/display/omap_lcdc.c
+--- a/target/arm/helper-mve.h
-+++ b/hw/display/omap_lcdc.c
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ static void omap_lcd_interrupts(struct omap_lcd_panel_s *s)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vabdsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+ DEF_HELPER_FLAGS_4(mve_vabdub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
- #define draw_line_func drawfn
+ DEF_HELPER_FLAGS_4(mve_vabduh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+ DEF_HELPER_FLAGS_4(mve_vabduw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
--#define DEPTH 32
++
--#include "omap_lcd_template.h"
++DEF_HELPER_FLAGS_4(mve_vhaddsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-+/*
++DEF_HELPER_FLAGS_4(mve_vhaddsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-+ * 2-bit colour
++DEF_HELPER_FLAGS_4(mve_vhaddsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-+ */
++DEF_HELPER_FLAGS_4(mve_vhaddub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-+static void draw_line2_32(void *opaque, uint8_t *d, const uint8_t *s,
++DEF_HELPER_FLAGS_4(mve_vhadduh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-+                          int width, int deststep)
++DEF_HELPER_FLAGS_4(mve_vhadduw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +
 +DEF_HELPER_FLAGS_4(mve_vhsubsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +DEF_HELPER_FLAGS_4(mve_vhsubsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +DEF_HELPER_FLAGS_4(mve_vhsubsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +DEF_HELPER_FLAGS_4(mve_vhsubub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +DEF_HELPER_FLAGS_4(mve_vhsubuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +DEF_HELPER_FLAGS_4(mve_vhsubuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VMIN_U           111 1 1111 0 . .. ... 0 ... 0 0110 . 1 . 1 ... 0 @2op
  VABD_S           111 0 1111 0 . .. ... 0 ... 0 0111 . 1 . 0 ... 0 @2op
  VABD_U           111 1 1111 0 . .. ... 0 ... 0 0111 . 1 . 0 ... 0 @2op
 +VHADD_S          111 0 1111 0 . .. ... 0 ... 0 0000 . 1 . 0 ... 0 @2op
 +VHADD_U          111 1 1111 0 . .. ... 0 ... 0 0000 . 1 . 0 ... 0 @2op
 +VHSUB_S          111 0 1111 0 . .. ... 0 ... 0 0010 . 1 . 0 ... 0 @2op
 +VHSUB_U          111 1 1111 0 . .. ... 0 ... 0 0010 . 1 . 0 ... 0 @2op
 +
  # Vector miscellaneous
  VCLS             1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2OP_U(vminu, DO_MIN)
  DO_2OP_S(vabds, DO_ABD)
  DO_2OP_U(vabdu, DO_ABD)
 +
 +static inline uint32_t do_vhadd_u(uint32_t n, uint32_t m)
 +{
-+    uint16_t *pal = opaque;
++    return ((uint64_t)n + m) >> 1;
 +    uint8_t v, r, g, b;
 +
 +    do {
 +        v = ldub_p((void *) s);
 +        r = (pal[v & 3] >> 4) & 0xf0;
 +        g = pal[v & 3] & 0xf0;
 +        b = (pal[v & 3] << 4) & 0xf0;
 +        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
 +        d += 4;
 +        v >>= 2;
 +        r = (pal[v & 3] >> 4) & 0xf0;
 +        g = pal[v & 3] & 0xf0;
 +        b = (pal[v & 3] << 4) & 0xf0;
 +        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
 +        d += 4;
 +        v >>= 2;
 +        r = (pal[v & 3] >> 4) & 0xf0;
 +        g = pal[v & 3] & 0xf0;
 +        b = (pal[v & 3] << 4) & 0xf0;
 +        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
 +        d += 4;
 +        v >>= 2;
 +        r = (pal[v & 3] >> 4) & 0xf0;
 +        g = pal[v & 3] & 0xf0;
 +        b = (pal[v & 3] << 4) & 0xf0;
 +        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
 +        d += 4;
 +        s++;
 +        width -= 4;
 +    } while (width > 0);
 +}
 +
-+/*
++static inline int32_t do_vhadd_s(int32_t n, int32_t m)
 + * 4-bit colour
 + */
 +static void draw_line4_32(void *opaque, uint8_t *d, const uint8_t *s,
 +                          int width, int deststep)
 +{
-+    uint16_t *pal = opaque;
++    return ((int64_t)n + m) >> 1;
 +    uint8_t v, r, g, b;
 +
 +    do {
 +        v = ldub_p((void *) s);
 +        r = (pal[v & 0xf] >> 4) & 0xf0;
 +        g = pal[v & 0xf] & 0xf0;
 +        b = (pal[v & 0xf] << 4) & 0xf0;
 +        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
 +        d += 4;
 +        v >>= 4;
 +        r = (pal[v & 0xf] >> 4) & 0xf0;
 +        g = pal[v & 0xf] & 0xf0;
 +        b = (pal[v & 0xf] << 4) & 0xf0;
 +        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
 +        d += 4;
 +        s++;
 +        width -= 2;
 +    } while (width > 0);
 +}
 +
-+/*
++static inline uint32_t do_vhsub_u(uint32_t n, uint32_t m)
 + * 8-bit colour
 + */
 +static void draw_line8_32(void *opaque, uint8_t *d, const uint8_t *s,
 +                          int width, int deststep)
 +{
-+    uint16_t *pal = opaque;
++    return ((uint64_t)n - m) >> 1;
 +    uint8_t v, r, g, b;
 +
 +    do {
 +        v = ldub_p((void *) s);
 +        r = (pal[v] >> 4) & 0xf0;
 +        g = pal[v] & 0xf0;
 +        b = (pal[v] << 4) & 0xf0;
 +        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
 +        s++;
 +        d += 4;
 +    } while (-- width != 0);
 +}
 +
-+/*
++static inline int32_t do_vhsub_s(int32_t n, int32_t m)
 + * 12-bit colour
 + */
 +static void draw_line12_32(void *opaque, uint8_t *d, const uint8_t *s,
 +                           int width, int deststep)
 +{
-+    uint16_t v;
++    return ((int64_t)n - m) >> 1;
 +    uint8_t r, g, b;
 +
 +    do {
 +        v = lduw_le_p((void *) s);
 +        r = (v >> 4) & 0xf0;
 +        g = v & 0xf0;
 +        b = (v << 4) & 0xf0;
 +        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
 +        s += 2;
 +        d += 4;
 +    } while (-- width != 0);
 +}
 +
-+/*
++DO_2OP_S(vhadds, do_vhadd_s)
-+ * 16-bit colour
++DO_2OP_U(vhaddu, do_vhadd_u)
-+ */
++DO_2OP_S(vhsubs, do_vhsub_s)
-+static void draw_line16_32(void *opaque, uint8_t *d, const uint8_t *s,
++DO_2OP_U(vhsubu, do_vhsub_u)
-+                           int width, int deststep)
+diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
-+{
+index XXXXXXX..XXXXXXX 100644
-+    uint16_t v;
+--- a/target/arm/translate-mve.c
-+    uint8_t r, g, b;
++++ b/target/arm/translate-mve.c
-+
+@@ -XXX,XX +XXX,XX @@ DO_2OP(VMIN_S, vmins)
-+    do {
+ DO_2OP(VMIN_U, vminu)
-+        v = lduw_le_p((void *) s);
+ DO_2OP(VABD_S, vabds)
-+        r = (v >> 8) & 0xf8;
+ DO_2OP(VABD_U, vabdu)
-+        g = (v >> 3) & 0xfc;
++DO_2OP(VHADD_S, vhadds)
-+        b = (v << 3) & 0xf8;
++DO_2OP(VHADD_U, vhaddu)
-+        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
++DO_2OP(VHSUB_S, vhsubs)
-+        s += 2;
++DO_2OP(VHSUB_U, vhsubu)
 +        d += 4;
 +    } while (-- width != 0);
 +}
  static void omap_update_display(void *opaque)
  {
 --
 .20.1
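The VHADD/VHSUB helpers widen to 64 bits before adding or subtracting. This means the carry or borrow out of the top bit survives the halving shift. Because the helpers take 32-bit arguments, the same functions also serve the byte and halfword forms.

The sketch below repeats the patch's do_vhadd_*/do_vhsub_* functions. It also adds a hypothetical naive_vhadd_u() that stays in 32 bits, to show what would go wrong without widening. Like QEMU, it assumes that right-shifting a negative int64_t is an arithmetic shift, which is what GCC and Clang do.

```c
#include <stdint.h>

/* As in the patch: compute in 64 bits, then halve. */
static inline uint32_t do_vhadd_u(uint32_t n, uint32_t m)
{
    return ((uint64_t)n + m) >> 1;
}

static inline int32_t do_vhadd_s(int32_t n, int32_t m)
{
    /* Relies on >> of a negative int64_t being an arithmetic shift. */
    return ((int64_t)n + m) >> 1;
}

static inline uint32_t do_vhsub_u(uint32_t n, uint32_t m)
{
    /* A borrow wraps the 64-bit value; truncation keeps the low 32 bits. */
    return ((uint64_t)n - m) >> 1;
}

static inline int32_t do_vhsub_s(int32_t n, int32_t m)
{
    return ((int64_t)n - m) >> 1;
}

/* Hypothetical naive version for comparison: the carry out of bit 31 is lost. */
static inline uint32_t naive_vhadd_u(uint32_t n, uint32_t m)
{
    return (n + m) >> 1;
}
```

For n = m = 0xffffffff, the widened version returns 0xffffffff. The naive version returns 0x7fffffff, because the 32-bit sum wraps before the shift.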

-[PULL 40/49] hw/arm/mps2-tz: Allow boards to have different PPCInfo data
+[PULL 29/57] target/arm: Implement MVE VMULL
-The AN505 and AN521 have the same device layout, but the AN524 is
+Implement the MVE VMULL insn, which multiplies two single
-somewhat different.  Allow for more than one PPCInfo array, which can
+width integer elements to produce a double width result.
 be selected based on the board type.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210215115138.20465-16-peter.maydell@linaro.org
+Message-id: 20210617121628.20116-19-peter.maydell@linaro.org
 ---
- hw/arm/mps2-tz.c | 16 ++++++++++++++--
+ target/arm/helper-mve.h    | 14 ++++++++++++++
-file changed, 14 insertions(+), 2 deletions(-)
+ target/arm/mve.decode      |  5 +++++
  target/arm/mve_helper.c    | 34 ++++++++++++++++++++++++++++++++++
  target/arm/translate-mve.c |  4 ++++
 files changed, 57 insertions(+)
-diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/mps2-tz.c
+--- a/target/arm/helper-mve.h
-+++ b/hw/arm/mps2-tz.c
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vhsubsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-     MemoryRegion *system_memory = get_system_memory();
+ DEF_HELPER_FLAGS_4(mve_vhsubub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-     DeviceState *iotkitdev;
+ DEF_HELPER_FLAGS_4(mve_vhsubuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-     DeviceState *dev_splitter;
+ DEF_HELPER_FLAGS_4(mve_vhsubuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-+    const PPCInfo *ppcs;
++
-+    int num_ppcs;
++DEF_HELPER_FLAGS_4(mve_vmullbsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-     int i;
++DEF_HELPER_FLAGS_4(mve_vmullbsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
++DEF_HELPER_FLAGS_4(mve_vmullbsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-     if (strcmp(machine->cpu_type, mc->default_cpu_type) != 0) {
++DEF_HELPER_FLAGS_4(mve_vmullbub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
++DEF_HELPER_FLAGS_4(mve_vmullbuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-      *  + wire up the PPC's control lines to the IoTKit object
++DEF_HELPER_FLAGS_4(mve_vmullbuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-      */
++
++DEF_HELPER_FLAGS_4(mve_vmulltsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
--    const PPCInfo ppcs[] = { {
++DEF_HELPER_FLAGS_4(mve_vmulltsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-+    const PPCInfo an505_ppcs[] = { {
++DEF_HELPER_FLAGS_4(mve_vmulltsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-             .name = "apb_ppcexp0",
++DEF_HELPER_FLAGS_4(mve_vmulltub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-             .ports = {
++DEF_HELPER_FLAGS_4(mve_vmulltuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-                 { "ssram-0", make_mpc, &mms->ssram_mpc[0], 0x58007000, 0x1000 },
++DEF_HELPER_FLAGS_4(mve_vmulltuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
-         },
+index XXXXXXX..XXXXXXX 100644
-     };
+--- a/target/arm/mve.decode
++++ b/target/arm/mve.decode
--    for (i = 0; i < ARRAY_SIZE(ppcs); i++) {
+@@ -XXX,XX +XXX,XX @@ VHADD_U          111 1 1111 0 . .. ... 0 ... 0 0000 . 1 . 0 ... 0 @2op
-+    switch (mmc->fpga_type) {
+ VHSUB_S          111 0 1111 0 . .. ... 0 ... 0 0010 . 1 . 0 ... 0 @2op
-+    case FPGA_AN505:
+ VHSUB_U          111 1 1111 0 . .. ... 0 ... 0 0010 . 1 . 0 ... 0 @2op
-+    case FPGA_AN521:
-+        ppcs = an505_ppcs;
++VMULL_BS         111 0 1110 0 . .. ... 1 ... 0 1110 . 0 . 0 ... 0 @2op
-+        num_ppcs = ARRAY_SIZE(an505_ppcs);
++VMULL_BU         111 1 1110 0 . .. ... 1 ... 0 1110 . 0 . 0 ... 0 @2op
-+        break;
++VMULL_TS         111 0 1110 0 . .. ... 1 ... 1 1110 . 0 . 0 ... 0 @2op
-+    default:
++VMULL_TU         111 1 1110 0 . .. ... 1 ... 1 1110 . 0 . 0 ... 0 @2op
-+        g_assert_not_reached();
++
  # Vector miscellaneous
  VCLS             1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_1OP(vfnegs, 8, uint64_t, DO_FNEGS)
      DO_2OP(OP##h, 2, int16_t, FN)               \
      DO_2OP(OP##w, 4, int32_t, FN)
 +/*
 + * "Long" operations where two half-sized inputs (taken from either the
 + * top or the bottom of the input vector) produce a double-width result.
 + * Here ESIZE, TYPE are for the input, and LESIZE, LTYPE for the output.
 + */
 +#define DO_2OP_L(OP, TOP, ESIZE, TYPE, LESIZE, LTYPE, FN)               \
 +    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn, void *vm) \
 +    {                                                                   \
 +        LTYPE *d = vd;                                                  \
 +        TYPE *n = vn, *m = vm;                                          \
 +        uint16_t mask = mve_element_mask(env);                          \
 +        unsigned le;                                                    \
 +        for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) {         \
 +            LTYPE r = FN((LTYPE)n[H##ESIZE(le * 2 + TOP)],              \
 +                         m[H##ESIZE(le * 2 + TOP)]);                    \
 +            mergemask(&d[H##LESIZE(le)], r, mask);                      \
 +        }                                                               \
 +        mve_advance_vpt(env);                                           \
 +    }
 +
-+    for (i = 0; i < num_ppcs; i++) {
+ #define DO_AND(N, M)  ((N) & (M))
-         const PPCInfo *ppcinfo = &ppcs[i];
+ #define DO_BIC(N, M)  ((N) & ~(M))
-         TZPPC *ppc = &mms->ppc[i];
+ #define DO_ORR(N, M)  ((N) | (M))
-         DeviceState *ppcdev;
+@@ -XXX,XX +XXX,XX @@ DO_2OP_U(vadd, DO_ADD)
  DO_2OP_U(vsub, DO_SUB)
  DO_2OP_U(vmul, DO_MUL)
 +DO_2OP_L(vmullbsb, 0, 1, int8_t, 2, int16_t, DO_MUL)
 +DO_2OP_L(vmullbsh, 0, 2, int16_t, 4, int32_t, DO_MUL)
 +DO_2OP_L(vmullbsw, 0, 4, int32_t, 8, int64_t, DO_MUL)
 +DO_2OP_L(vmullbub, 0, 1, uint8_t, 2, uint16_t, DO_MUL)
 +DO_2OP_L(vmullbuh, 0, 2, uint16_t, 4, uint32_t, DO_MUL)
 +DO_2OP_L(vmullbuw, 0, 4, uint32_t, 8, uint64_t, DO_MUL)
 +
 +DO_2OP_L(vmulltsb, 1, 1, int8_t, 2, int16_t, DO_MUL)
 +DO_2OP_L(vmulltsh, 1, 2, int16_t, 4, int32_t, DO_MUL)
 +DO_2OP_L(vmulltsw, 1, 4, int32_t, 8, int64_t, DO_MUL)
 +DO_2OP_L(vmulltub, 1, 1, uint8_t, 2, uint16_t, DO_MUL)
 +DO_2OP_L(vmulltuh, 1, 2, uint16_t, 4, uint32_t, DO_MUL)
 +DO_2OP_L(vmulltuw, 1, 4, uint32_t, 8, uint64_t, DO_MUL)
 +
  /*
   * Because the computation type is at least twice as large as required,
   * these work for both signed and unsigned source types.
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_2OP(VHADD_S, vhadds)
  DO_2OP(VHADD_U, vhaddu)
  DO_2OP(VHSUB_S, vhsubs)
  DO_2OP(VHSUB_U, vhsubu)
 +DO_2OP(VMULL_BS, vmullbs)
 +DO_2OP(VMULL_BU, vmullbu)
 +DO_2OP(VMULL_TS, vmullts)
 +DO_2OP(VMULL_TU, vmulltu)
 --
 .20.1
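DO_2OP_L reads input element le * 2 + TOP. The B (bottom) forms therefore use the even-numbered elements, and the T (top) forms use the odd-numbered ones. The macro also casts the first operand to LTYPE before calling FN.

Below is a simplified sketch for the unsigned halfword case. vmull_u16 is a hypothetical name, and it omits predication and QEMU's H2()/H4() host-endianness index macros. It illustrates why the cast matters: without it, 0xffff * 0xffff would be evaluated in int and overflow.

```c
#include <stdint.h>

/*
 * Hypothetical sketch of VMULL on unsigned 16-bit elements: output element
 * le (32 bits) is the product of input elements 2 * le + top.
 * top == 0 selects the bottom (even) elements, top == 1 the top (odd) ones.
 */
static void vmull_u16(uint32_t d[4], const uint16_t n[8], const uint16_t m[8],
                      int top)
{
    for (int le = 0; le < 4; le++) {
        /*
         * Cast to the wide type first, as DO_2OP_L does with (LTYPE), so the
         * multiply happens in uint32_t rather than in (signed) int.
         */
        d[le] = (uint32_t)n[le * 2 + top] * m[le * 2 + top];
    }
}
```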

-[PULL 29/49] hw/arm/mps2-tz: Make the OSCCLK settings be configurable per-board
+[PULL 30/57] target/arm: Implement MVE VMLALDAV
-The AN505 and AN521 happen to share the same OSCCLK values, but the
+Implement the MVE VMLALDAV insn, which multiplies pairs of integer
-AN524 will have a different set (and more of them), so split the
+elements, accumulating them into a 64-bit result in a pair of
-settings out to be per-board.
+general-purpose registers.
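Conceptually, the non-exchanging form of VMLALDAV computes a sum of lane-wise products and adds it to a 64-bit accumulator. The guest sees that accumulator as a low/high pair of 32-bit registers, called RdaLo and RdaHi in the Arm documentation.

The sketch below is a simplification, not the helpers added by the patch. It covers only signed halfword elements and ignores predication and the exchanging (X) variant. The function names are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: multiply each pair of signed 16-bit elements,
 * sum the products, and add the sum to a 64-bit accumulator.
 */
static int64_t vmlaldav_s16(const int16_t n[8], const int16_t m[8], int64_t acc)
{
    for (int e = 0; e < 8; e++) {
        /* Each product fits easily in 64 bits. */
        acc += (int64_t)n[e] * m[e];
    }
    return acc;
}

/* Split the 64-bit result into the low and high 32-bit register values. */
static void split_rda(uint64_t rda, uint32_t *lo, uint32_t *hi)
{
    *lo = (uint32_t)rda;
    *hi = (uint32_t)(rda >> 32);
}
```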
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210215115138.20465-5-peter.maydell@linaro.org
+Message-id: 20210617121628.20116-20-peter.maydell@linaro.org
 ---
- hw/arm/mps2-tz.c | 23 ++++++++++++++++++-----
+ target/arm/helper-mve.h    |  8 ++++
-file changed, 18 insertions(+), 5 deletions(-)
+ target/arm/translate.h     | 10 ++++
+ target/arm/mve.decode      | 15 ++++++
-diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
+ target/arm/mve_helper.c    | 34 ++++++++++++++
-index XXXXXXX..XXXXXXX 100644
+ target/arm/translate-mve.c | 96 ++++++++++++++++++++++++++++++++++++++
---- a/hw/arm/mps2-tz.c
+files changed, 163 insertions(+)
-+++ b/hw/arm/mps2-tz.c
-@@ -XXX,XX +XXX,XX @@ struct MPS2TZMachineClass {
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
-     MPS2TZFPGAType fpga_type;
+index XXXXXXX..XXXXXXX 100644
-     uint32_t scc_id;
+--- a/target/arm/helper-mve.h
-     uint32_t sysclk_frq; /* Main SYSCLK frequency in Hz */
++++ b/target/arm/helper-mve.h
-+    uint32_t len_oscclk;
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vmulltsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-+    const uint32_t *oscclk;
+ DEF_HELPER_FLAGS_4(mve_vmulltub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-     const char *armsse_type;
+ DEF_HELPER_FLAGS_4(mve_vmulltuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
- };
+ DEF_HELPER_FLAGS_4(mve_vmulltuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
++
-@@ -XXX,XX +XXX,XX @@ OBJECT_DECLARE_TYPE(MPS2TZMachineState, MPS2TZMachineClass, MPS2TZ_MACHINE)
++DEF_HELPER_FLAGS_4(mve_vmlaldavsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
- /* Slow 32Khz S32KCLK frequency in Hz */
++DEF_HELPER_FLAGS_4(mve_vmlaldavsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
- #define S32KCLK_FRQ (32 * 1000)
++DEF_HELPER_FLAGS_4(mve_vmlaldavxsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
++DEF_HELPER_FLAGS_4(mve_vmlaldavxsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
-+static const uint32_t an505_oscclk[] = {
++
-+    40000000,
++DEF_HELPER_FLAGS_4(mve_vmlaldavuh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
-+    24580000,
++DEF_HELPER_FLAGS_4(mve_vmlaldavuw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
-+    25000000,
+diff --git a/target/arm/translate.h b/target/arm/translate.h
-+};
+index XXXXXXX..XXXXXXX 100644
-+
+--- a/target/arm/translate.h
- /* Create an alias of an entire original MemoryRegion @orig
++++ b/target/arm/translate.h
-  * located at @base in the memory map.
+@@ -XXX,XX +XXX,XX @@ static inline int negate(DisasContext *s, int x)
-  */
+     return -x;
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_scc(MPS2TZMachineState *mms, void *opaque,
      MPS2SCC *scc = opaque;
      DeviceState *sccdev;
      MPS2TZMachineClass *mmc = MPS2TZ_MACHINE_GET_CLASS(mms);
 +    uint32_t i;
      object_initialize_child(OBJECT(mms), "scc", scc, TYPE_MPS2_SCC);
      sccdev = DEVICE(scc);
      qdev_prop_set_uint32(sccdev, "scc-cfg4", 0x2);
      qdev_prop_set_uint32(sccdev, "scc-aid", 0x00200008);
      qdev_prop_set_uint32(sccdev, "scc-id", mmc->scc_id);
 -    /* This will need to be per-FPGA image eventually */
 -    qdev_prop_set_uint32(sccdev, "len-oscclk", 3);
 -    qdev_prop_set_uint32(sccdev, "oscclk[0]", 40000000);
 -    qdev_prop_set_uint32(sccdev, "oscclk[1]", 24580000);
 -    qdev_prop_set_uint32(sccdev, "oscclk[2]", 25000000);
 +    qdev_prop_set_uint32(sccdev, "len-oscclk", mmc->len_oscclk);
 +    for (i = 0; i < mmc->len_oscclk; i++) {
 +        g_autofree char *propname = g_strdup_printf("oscclk[%u]", i);
 +        qdev_prop_set_uint32(sccdev, propname, mmc->oscclk[i]);
 +    }
      sysbus_realize(SYS_BUS_DEVICE(scc), &error_fatal);
      return sysbus_mmio_get_region(SYS_BUS_DEVICE(sccdev), 0);
  }
-@@ -XXX,XX +XXX,XX @@ static void mps2tz_an505_class_init(ObjectClass *oc, void *data)
-     mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-m33");
++static inline int plus_1(DisasContext *s, int x)
-     mmc->scc_id = 0x41045050;
++{
-     mmc->sysclk_frq = 20 * 1000 * 1000; /* 20MHz */
++    return x + 1;
-+    mmc->oscclk = an505_oscclk;
++}
-+    mmc->len_oscclk = ARRAY_SIZE(an505_oscclk);
++
-     mmc->armsse_type = TYPE_IOTKIT;
+ static inline int plus_2(DisasContext *s, int x)
  {
      return x + 2;
@@ -XXX,XX +XXX,XX @@ static inline int times_4(DisasContext *s, int x)
      return x * 4;
  }
-@@ -XXX,XX +XXX,XX @@ static void mps2tz_an521_class_init(ObjectClass *oc, void *data)
++static inline int times_2_plus_1(DisasContext *s, int x)
-     mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-m33");
++{
-     mmc->scc_id = 0x41045210;
++    return x * 2 + 1;
-     mmc->sysclk_frq = 20 * 1000 * 1000; /* 20MHz */
++}
-+    mmc->oscclk = an505_oscclk; /* AN521 is the same as AN505 here */
++
-+    mmc->len_oscclk = ARRAY_SIZE(an505_oscclk);
+ static inline int arm_dc_feature(DisasContext *dc, int feature)
-     mmc->armsse_type = TYPE_SSE200;
+ {
      return (dc->features & (1ULL << feature)) != 0;
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VNEG_fp          1111 1111 1 . 11 .. 01 ... 0 0111 11 . 0 ... 0 @1op
  VDUP             1110 1110 1 1 10 ... 0 .... 1011 . 0 0 1 0000 @vdup size=0
  VDUP             1110 1110 1 0 10 ... 0 .... 1011 . 0 1 1 0000 @vdup size=1
  VDUP             1110 1110 1 0 10 ... 0 .... 1011 . 0 0 1 0000 @vdup size=2
 +
 +# multiply-add long dual accumulate
 +# rdahi: bits [3:1] from insn, bit 0 is 1
 +# rdalo: bits [3:1] from insn, bit 0 is 0
 +%rdahi 20:3 !function=times_2_plus_1
 +%rdalo 13:3 !function=times_2
 +# size bit is 0 for 16 bit, 1 for 32 bit
 +%size_16 16:1 !function=plus_1
 +
 +&vmlaldav rdahi rdalo size qn qm x a
 +
 +@vmlaldav        .... .... . ... ... . ... . .... .... qm:3 . \
 +                 qn=%qn rdahi=%rdahi rdalo=%rdalo size=%size_16 &vmlaldav
 +VMLALDAV_S       1110 1110 1 ... ... . ... x:1 1110 . 0 a:1 0 ... 0 @vmlaldav
 +VMLALDAV_U       1111 1110 1 ... ... . ... x:1 1110 . 0 a:1 0 ... 0 @vmlaldav
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2OP_S(vhadds, do_vhadd_s)
  DO_2OP_U(vhaddu, do_vhadd_u)
  DO_2OP_S(vhsubs, do_vhsub_s)
  DO_2OP_U(vhsubu, do_vhsub_u)
 +
 +
 +/*
 + * Multiply add long dual accumulate ops.
 + */
 +#define DO_LDAV(OP, ESIZE, TYPE, XCHG, EVENACC, ODDACC)                 \
 +    uint64_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vn,         \
 +                                    void *vm, uint64_t a)               \
 +    {                                                                   \
 +        uint16_t mask = mve_element_mask(env);                          \
 +        unsigned e;                                                     \
 +        TYPE *n = vn, *m = vm;                                          \
 +        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
 +            if (mask & 1) {                                             \
 +                if (e & 1) {                                            \
 +                    a ODDACC                                            \
 +                        (int64_t)n[H##ESIZE(e - 1 * XCHG)] * m[H##ESIZE(e)]; \
 +                } else {                                                \
 +                    a EVENACC                                           \
 +                        (int64_t)n[H##ESIZE(e + 1 * XCHG)] * m[H##ESIZE(e)]; \
 +                }                                                       \
 +            }                                                           \
 +        }                                                               \
 +        mve_advance_vpt(env);                                           \
 +        return a;                                                       \
 +    }
 +
 +DO_LDAV(vmlaldavsh, 2, int16_t, false, +=, +=)
 +DO_LDAV(vmlaldavxsh, 2, int16_t, true, +=, +=)
 +DO_LDAV(vmlaldavsw, 4, int32_t, false, +=, +=)
 +DO_LDAV(vmlaldavxsw, 4, int32_t, true, +=, +=)
 +
 +DO_LDAV(vmlaldavuh, 2, uint16_t, false, +=, +=)
 +DO_LDAV(vmlaldavuw, 4, uint32_t, false, +=, +=)
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@
  typedef void MVEGenLdStFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
  typedef void MVEGenOneOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
  typedef void MVEGenTwoOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr);
 +typedef void MVEGenDualAccOpFn(TCGv_i64, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i64);
  /* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
  static inline long mve_qreg_offset(unsigned reg)
@@ -XXX,XX +XXX,XX @@ static void mve_update_eci(DisasContext *s)
      }
  }
++static bool mve_skip_first_beat(DisasContext *s)
++{
++    /* Return true if PSR.ECI says we must skip the first beat of this insn */
++    switch (s->eci) {
++    case ECI_NONE:
++        return false;
++    case ECI_A0:
++    case ECI_A0A1:
++    case ECI_A0A1A2:
++    case ECI_A0A1A2B0:
++        return true;
++    default:
++        g_assert_not_reached();
++    }
++}
++
+ static bool do_ldst(DisasContext *s, arg_VLDR_VSTR *a, MVEGenLdStFn *fn)
+ {
+     TCGv_i32 addr;
+@@ -XXX,XX +XXX,XX @@ DO_2OP(VMULL_BS, vmullbs)
+ DO_2OP(VMULL_BU, vmullbu)
+ DO_2OP(VMULL_TS, vmullts)
+ DO_2OP(VMULL_TU, vmulltu)
++
++static bool do_long_dual_acc(DisasContext *s, arg_vmlaldav *a,
++                             MVEGenDualAccOpFn *fn)
++{
++    TCGv_ptr qn, qm;
++    TCGv_i64 rda;
++    TCGv_i32 rdalo, rdahi;
++
++    if (!dc_isar_feature(aa32_mve, s) ||
++        !mve_check_qreg_bank(s, a->qn | a->qm) ||
++        !fn) {
++        return false;
++    }
++    /*
++     * rdahi == 13 is UNPREDICTABLE; rdahi == 15 is a related
++     * encoding; rdalo always has bit 0 clear so cannot be 13 or 15.
++     */
++    if (a->rdahi == 13 || a->rdahi == 15) {
++        return false;
++    }
++    if (!mve_eci_check(s) || !vfp_access_check(s)) {
++        return true;
++    }
++
++    qn = mve_qreg_ptr(a->qn);
++    qm = mve_qreg_ptr(a->qm);
++
++    /*
++     * This insn is subject to beat-wise execution. Partial execution
++     * of an A=0 (no-accumulate) insn which does not execute the first
++     * beat must start with the current rda value, not 0.
++     */
++    if (a->a || mve_skip_first_beat(s)) {
++        rda = tcg_temp_new_i64();
++        rdalo = load_reg(s, a->rdalo);
++        rdahi = load_reg(s, a->rdahi);
++        tcg_gen_concat_i32_i64(rda, rdalo, rdahi);
++        tcg_temp_free_i32(rdalo);
++        tcg_temp_free_i32(rdahi);
++    } else {
++        rda = tcg_const_i64(0);
++    }
++
++    fn(rda, cpu_env, qn, qm, rda);
++    tcg_temp_free_ptr(qn);
++    tcg_temp_free_ptr(qm);
++
++    rdalo = tcg_temp_new_i32();
++    rdahi = tcg_temp_new_i32();
++    tcg_gen_extrl_i64_i32(rdalo, rda);
++    tcg_gen_extrh_i64_i32(rdahi, rda);
++    store_reg(s, a->rdalo, rdalo);
++    store_reg(s, a->rdahi, rdahi);
++    tcg_temp_free_i64(rda);
++    mve_update_eci(s);
++    return true;
++}
++
++static bool trans_VMLALDAV_S(DisasContext *s, arg_vmlaldav *a)
++{
++    static MVEGenDualAccOpFn * const fns[4][2] = {
++        { NULL, NULL },
++        { gen_helper_mve_vmlaldavsh, gen_helper_mve_vmlaldavxsh },
++        { gen_helper_mve_vmlaldavsw, gen_helper_mve_vmlaldavxsw },
++        { NULL, NULL },
++    };
++    return do_long_dual_acc(s, a, fns[a->size][a->x]);
++}
++
++static bool trans_VMLALDAV_U(DisasContext *s, arg_vmlaldav *a)
++{
++    static MVEGenDualAccOpFn * const fns[4][2] = {
++        { NULL, NULL },
++        { gen_helper_mve_vmlaldavuh, NULL },
++        { gen_helper_mve_vmlaldavuw, NULL },
++        { NULL, NULL },
++    };
++    return do_long_dual_acc(s, a, fns[a->size][a->x]);
++}
 --
 .20.1
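
The per-lane arithmetic that DO_LDAV performs for the signed 16-bit VMLALDAV variants can be modelled in plain C, which may help when checking the helper or writing tests for it. This is a minimal sketch, not QEMU code. The names vmlaldav_s16_model(), decode_rdahi(), decode_rdalo(), split_rda() and initial_acc() are hypothetical. Predication through mve_element_mask() is ignored, so all eight lanes are treated as active. The initial-accumulator rule follows the comment in do_long_dual_acc() about A=0 insns that do not execute the first beat.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of signed 16-bit VMLALDAV (vmlaldavsh/vmlaldavxsh).
 * All eight lanes are treated as active; predication is not modelled.
 */
static int64_t vmlaldav_s16_model(const int16_t n[8], const int16_t m[8],
                                  bool xchg, int64_t acc)
{
    for (unsigned e = 0; e < 8; e++) {
        /* With X set, m[e] pairs with the neighbouring lane of n. */
        unsigned ne = xchg ? (e ^ 1) : e;
        /* Widen before multiplying so the product cannot overflow. */
        acc += (int64_t)n[ne] * m[e];
    }
    return acc;
}

/* RdaHi/RdaLo come from 3-bit fields with an implied bit 0 of 1/0. */
static unsigned decode_rdahi(unsigned field) { return field * 2 + 1; }
static unsigned decode_rdalo(unsigned field) { return field * 2; }

/* Split the 64-bit result across the RdaLo/RdaHi register pair. */
static void split_rda(int64_t rda, uint32_t *lo, uint32_t *hi)
{
    *lo = (uint32_t)rda;
    *hi = (uint32_t)((uint64_t)rda >> 32);
}

/* Start from RdaHi:RdaLo when A=1 or the first beat was skipped. */
static int64_t initial_acc(bool a_bit, bool skipped_first_beat, int64_t rda)
{
    return (a_bit || skipped_first_beat) ? rda : 0;
}
```

Note that decode_rdahi(6) is 13 and decode_rdahi(7) is 15, which are exactly the RdaHi values do_long_dual_acc() rejects; RdaLo is always even, so it never needs that check.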

-[PULL 28/49] hw/arm/mps2-tz: Correct the OSCCLK settings for mps2-an505 and mps2-an511
+[PULL 31/57] target/arm: Implement MVE VMLSLDAV
-We were previously using the default OSCCLK settings, which are
+Implement the MVE insn VMLSLDAV, which multiplies source elements,
-correct for the older MPS2 boards (mps2-an385, mps2-an386,
+alternately adding and subtracting them, and accumulates into a
-mps2-an500, mps2-an511), but wrong for the mps2-an505 and mps2-an521
+64-bit result in a pair of general purpose registers.
 implemented in mps2-tz.c.  Now we're setting the values explicitly we
 can fix them to be correct.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210215115138.20465-4-peter.maydell@linaro.org
+Message-id: 20210617121628.20116-21-peter.maydell@linaro.org
 ---
- hw/arm/mps2-tz.c | 4 ++--
+ target/arm/helper-mve.h    |  5 +++++
- 1 file changed, 2 insertions(+), 2 deletions(-)
+ target/arm/mve.decode      |  2 ++
  target/arm/mve_helper.c    |  5 +++++
  target/arm/translate-mve.c | 11 +++++++++++
 files changed, 23 insertions(+)
-diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/mps2-tz.c
+--- a/target/arm/helper-mve.h
-+++ b/hw/arm/mps2-tz.c
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_scc(MPS2TZMachineState *mms, void *opaque,
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vmlaldavxsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
-     qdev_prop_set_uint32(sccdev, "scc-id", mmc->scc_id);
-     /* This will need to be per-FPGA image eventually */
+ DEF_HELPER_FLAGS_4(mve_vmlaldavuh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
-     qdev_prop_set_uint32(sccdev, "len-oscclk", 3);
+ DEF_HELPER_FLAGS_4(mve_vmlaldavuw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
--    qdev_prop_set_uint32(sccdev, "oscclk[0]", 50000000);
++
--    qdev_prop_set_uint32(sccdev, "oscclk[1]", 24576000);
++DEF_HELPER_FLAGS_4(mve_vmlsldavsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
-+    qdev_prop_set_uint32(sccdev, "oscclk[0]", 40000000);
++DEF_HELPER_FLAGS_4(mve_vmlsldavsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
-+    qdev_prop_set_uint32(sccdev, "oscclk[1]", 24580000);
++DEF_HELPER_FLAGS_4(mve_vmlsldavxsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
-     qdev_prop_set_uint32(sccdev, "oscclk[2]", 25000000);
++DEF_HELPER_FLAGS_4(mve_vmlsldavxsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
-     sysbus_realize(SYS_BUS_DEVICE(scc), &error_fatal);
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
-     return sysbus_mmio_get_region(SYS_BUS_DEVICE(sccdev), 0);
+index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VDUP             1110 1110 1 0 10 ... 0 .... 1011 . 0 0 1 0000 @vdup size=2
                   qn=%qn rdahi=%rdahi rdalo=%rdalo size=%size_16 &vmlaldav
  VMLALDAV_S       1110 1110 1 ... ... . ... x:1 1110 . 0 a:1 0 ... 0 @vmlaldav
  VMLALDAV_U       1111 1110 1 ... ... . ... x:1 1110 . 0 a:1 0 ... 0 @vmlaldav
 +
 +VMLSLDAV         1110 1110 1 ... ... . ... x:1 1110 . 0 a:1 0 ... 1 @vmlaldav
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_LDAV(vmlaldavxsw, 4, int32_t, true, +=, +=)
  DO_LDAV(vmlaldavuh, 2, uint16_t, false, +=, +=)
  DO_LDAV(vmlaldavuw, 4, uint32_t, false, +=, +=)
 +
 +DO_LDAV(vmlsldavsh, 2, int16_t, false, +=, -=)
 +DO_LDAV(vmlsldavxsh, 2, int16_t, true, +=, -=)
 +DO_LDAV(vmlsldavsw, 4, int32_t, false, +=, -=)
 +DO_LDAV(vmlsldavxsw, 4, int32_t, true, +=, -=)
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VMLALDAV_U(DisasContext *s, arg_vmlaldav *a)
      };
      return do_long_dual_acc(s, a, fns[a->size][a->x]);
  }
 +
 +static bool trans_VMLSLDAV(DisasContext *s, arg_vmlaldav *a)
 +{
 +    static MVEGenDualAccOpFn * const fns[4][2] = {
 +        { NULL, NULL },
 +        { gen_helper_mve_vmlsldavsh, gen_helper_mve_vmlsldavxsh },
 +        { gen_helper_mve_vmlsldavsw, gen_helper_mve_vmlsldavxsw },
 +        { NULL, NULL },
 +    };
 +    return do_long_dual_acc(s, a, fns[a->size][a->x]);
 +}
 --
 .20.1
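
VMLSLDAV differs from VMLALDAV only in that the products for odd-numbered lanes are subtracted: the new DO_LDAV invocations pass += as EVENACC and -= as ODDACC. Below is a minimal sketch of the signed 16-bit case. vmlsldav_s16_model() is a hypothetical name, and, as before, every lane is treated as active.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of signed 16-bit VMLSLDAV (vmlsldavsh/vmlsldavxsh).
 * Even lanes add their product, odd lanes subtract it.
 */
static int64_t vmlsldav_s16_model(const int16_t n[8], const int16_t m[8],
                                  bool xchg, int64_t acc)
{
    for (unsigned e = 0; e < 8; e++) {
        /* With X set, m[e] pairs with the neighbouring lane of n. */
        unsigned ne = xchg ? (e ^ 1) : e;
        int64_t prod = (int64_t)n[ne] * m[e];
        if (e & 1) {
            acc -= prod;    /* odd lanes: ODDACC is -= */
        } else {
            acc += prod;    /* even lanes: EVENACC is += */
        }
    }
    return acc;
}
```

For example, with n = {3, 4, ...} and m = {5, 6, ...} (remaining lanes zero), the plain form gives 3*5 - 4*6 = -9, and the exchanging form gives 4*5 - 3*6 = 2.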

-[PULL 20/49] hw/display/omap_lcdc: Expand out macros in template header
+[PULL 32/57] target/arm: Implement MVE VRMLALDAVH, VRMLSLDAVH
-The omap_lcdc template header is already only included once, for
+Implement the MVE VRMLALDAVH and VRMLSLDAVH insns, which accumulate
-DEPTH==32, but it still has all the macro-driven parameterization
+the results of a rounded multiply of pairs of elements into a 72-bit
-for other depths. Expand out all the macros in the header.
+accumulator, returning the top 64 bits in a pair of general purpose
 registers.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Message-id: 20210617121628.20116-22-peter.maydell@linaro.org
 Message-id: 20210215103215.4944-6-peter.maydell@linaro.org
 ---
- hw/display/omap_lcd_template.h | 67 ++++++++++++++--------------------
+ target/arm/helper-mve.h    |  8 ++++++++
- 1 file changed, 28 insertions(+), 39 deletions(-)
+ target/arm/mve.decode      |  7 +++++++
  target/arm/mve_helper.c    | 37 +++++++++++++++++++++++++++++++++++++
  target/arm/translate-mve.c | 24 ++++++++++++++++++++++++
 files changed, 76 insertions(+)
-diff --git a/hw/display/omap_lcd_template.h b/hw/display/omap_lcd_template.h
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/display/omap_lcd_template.h
+--- a/target/arm/helper-mve.h
-+++ b/hw/display/omap_lcd_template.h
++++ b/target/arm/helper-mve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vmlsldavsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
  DEF_HELPER_FLAGS_4(mve_vmlsldavsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
  DEF_HELPER_FLAGS_4(mve_vmlsldavxsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
  DEF_HELPER_FLAGS_4(mve_vmlsldavxsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
 +
 +DEF_HELPER_FLAGS_4(mve_vrmlaldavhsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
 +DEF_HELPER_FLAGS_4(mve_vrmlaldavhxsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
 +
 +DEF_HELPER_FLAGS_4(mve_vrmlaldavhuw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
 +
 +DEF_HELPER_FLAGS_4(mve_vrmlsldavhsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
 +DEF_HELPER_FLAGS_4(mve_vrmlsldavhxsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VDUP             1110 1110 1 0 10 ... 0 .... 1011 . 0 0 1 0000 @vdup size=2
  @vmlaldav        .... .... . ... ... . ... . .... .... qm:3 . \
                   qn=%qn rdahi=%rdahi rdalo=%rdalo size=%size_16 &vmlaldav
 +@vmlaldav_nosz   .... .... . ... ... . ... . .... .... qm:3 . \
 +                 qn=%qn rdahi=%rdahi rdalo=%rdalo size=0 &vmlaldav
  VMLALDAV_S       1110 1110 1 ... ... . ... x:1 1110 . 0 a:1 0 ... 0 @vmlaldav
  VMLALDAV_U       1111 1110 1 ... ... . ... x:1 1110 . 0 a:1 0 ... 0 @vmlaldav
  VMLSLDAV         1110 1110 1 ... ... . ... x:1 1110 . 0 a:1 0 ... 1 @vmlaldav
 +
 +VRMLALDAVH_S     1110 1110 1 ... ... 0 ... x:1 1111 . 0 a:1 0 ... 0 @vmlaldav_nosz
 +VRMLALDAVH_U     1111 1110 1 ... ... 0 ... x:1 1111 . 0 a:1 0 ... 0 @vmlaldav_nosz
 +
 +VRMLSLDAVH       1111 1110 1 ... ... 0 ... x:1 1110 . 0 a:1 0 ... 1 @vmlaldav_nosz
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
 @@ -XXX,XX +XXX,XX @@
-  * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
   */
--#if DEPTH == 32
+ #include "qemu/osdep.h"
--# define BPP 4
++#include "qemu/int128.h"
--# define PIXEL_TYPE uint32_t
+ #include "cpu.h"
--#else
+ #include "internals.h"
--# error unsupport depth
+ #include "vec_internal.h"
--#endif
+@@ -XXX,XX +XXX,XX @@ DO_LDAV(vmlsldavsh, 2, int16_t, false, +=, -=)
--
+ DO_LDAV(vmlsldavxsh, 2, int16_t, true, +=, -=)
- /*
+ DO_LDAV(vmlsldavsw, 4, int32_t, false, +=, -=)
-  * 2-bit colour
+ DO_LDAV(vmlsldavxsw, 4, int32_t, true, +=, -=)
-  */
++
--static void glue(draw_line2_, DEPTH)(void *opaque,
++/*
--                uint8_t *d, const uint8_t *s, int width, int deststep)
++ * Rounding multiply add long dual accumulate high: we must keep
-+static void draw_line2_32(void *opaque, uint8_t *d, const uint8_t *s,
++ * a 72-bit internal accumulator value and return the top 64 bits.
-+                          int width, int deststep)
++ */
- {
++#define DO_LDAVH(OP, ESIZE, TYPE, XCHG, EVENACC, ODDACC, TO128)         \
-     uint16_t *pal = opaque;
++    uint64_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vn,         \
-     uint8_t v, r, g, b;
++                                    void *vm, uint64_t a)               \
-@@ -XXX,XX +XXX,XX @@ static void glue(draw_line2_, DEPTH)(void *opaque,
++    {                                                                   \
-         r = (pal[v & 3] >> 4) & 0xf0;
++        uint16_t mask = mve_element_mask(env);                          \
-         g = pal[v & 3] & 0xf0;
++        unsigned e;                                                     \
-         b = (pal[v & 3] << 4) & 0xf0;
++        TYPE *n = vn, *m = vm;                                          \
--        ((PIXEL_TYPE *) d)[0] = glue(rgb_to_pixel, DEPTH)(r, g, b);
++        Int128 acc = int128_lshift(TO128(a), 8);                        \
--        d += BPP;
++        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
-+        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
++            if (mask & 1) {                                             \
-+        d += 4;
++                if (e & 1) {                                            \
-         v >>= 2;
++                    acc = ODDACC(acc, TO128(n[H##ESIZE(e - 1 * XCHG)] * \
-         r = (pal[v & 3] >> 4) & 0xf0;
++                                            m[H##ESIZE(e)]));           \
-         g = pal[v & 3] & 0xf0;
++                } else {                                                \
-         b = (pal[v & 3] << 4) & 0xf0;
++                    acc = EVENACC(acc, TO128(n[H##ESIZE(e + 1 * XCHG)] * \
--        ((PIXEL_TYPE *) d)[0] = glue(rgb_to_pixel, DEPTH)(r, g, b);
++                                             m[H##ESIZE(e)]));          \
--        d += BPP;
++                }                                                       \
-+        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
++                acc = int128_add(acc, 1 << 7);                          \
-+        d += 4;
++            }                                                           \
-         v >>= 2;
++        }                                                               \
-         r = (pal[v & 3] >> 4) & 0xf0;
++        mve_advance_vpt(env);                                           \
-         g = pal[v & 3] & 0xf0;
++        return int128_getlo(int128_rshift(acc, 8));                     \
-         b = (pal[v & 3] << 4) & 0xf0;
++    }
--        ((PIXEL_TYPE *) d)[0] = glue(rgb_to_pixel, DEPTH)(r, g, b);
++
--        d += BPP;
++DO_LDAVH(vrmlaldavhsw, 4, int32_t, false, int128_add, int128_add, int128_makes64)
-+        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
++DO_LDAVH(vrmlaldavhxsw, 4, int32_t, true, int128_add, int128_add, int128_makes64)
-+        d += 4;
++
-         v >>= 2;
++DO_LDAVH(vrmlaldavhuw, 4, uint32_t, false, int128_add, int128_add, int128_make64)
-         r = (pal[v & 3] >> 4) & 0xf0;
++
-         g = pal[v & 3] & 0xf0;
++DO_LDAVH(vrmlsldavhsw, 4, int32_t, false, int128_add, int128_sub, int128_makes64)
-         b = (pal[v & 3] << 4) & 0xf0;
++DO_LDAVH(vrmlsldavhxsw, 4, int32_t, true, int128_add, int128_sub, int128_makes64)
--        ((PIXEL_TYPE *) d)[0] = glue(rgb_to_pixel, DEPTH)(r, g, b);
+diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
--        d += BPP;
+index XXXXXXX..XXXXXXX 100644
-+        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
+--- a/target/arm/translate-mve.c
-+        d += 4;
++++ b/target/arm/translate-mve.c
-         s ++;
+@@ -XXX,XX +XXX,XX @@ static bool trans_VMLSLDAV(DisasContext *s, arg_vmlaldav *a)
-         width -= 4;
+     };
-     } while (width > 0);
+     return do_long_dual_acc(s, a, fns[a->size][a->x]);
@@ -XXX,XX +XXX,XX @@ static void glue(draw_line2_, DEPTH)(void *opaque,
  /*
   * 4-bit colour
   */
 -static void glue(draw_line4_, DEPTH)(void *opaque,
 -                uint8_t *d, const uint8_t *s, int width, int deststep)
 +static void draw_line4_32(void *opaque, uint8_t *d, const uint8_t *s,
 +                          int width, int deststep)
  {
      uint16_t *pal = opaque;
      uint8_t v, r, g, b;
@@ -XXX,XX +XXX,XX @@ static void glue(draw_line4_, DEPTH)(void *opaque,
          r = (pal[v & 0xf] >> 4) & 0xf0;
          g = pal[v & 0xf] & 0xf0;
          b = (pal[v & 0xf] << 4) & 0xf0;
 -        ((PIXEL_TYPE *) d)[0] = glue(rgb_to_pixel, DEPTH)(r, g, b);
 -        d += BPP;
 +        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
 +        d += 4;
          v >>= 4;
          r = (pal[v & 0xf] >> 4) & 0xf0;
          g = pal[v & 0xf] & 0xf0;
          b = (pal[v & 0xf] << 4) & 0xf0;
 -        ((PIXEL_TYPE *) d)[0] = glue(rgb_to_pixel, DEPTH)(r, g, b);
 -        d += BPP;
 +        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
 +        d += 4;
          s ++;
          width -= 2;
      } while (width > 0);
@@ -XXX,XX +XXX,XX @@ static void glue(draw_line4_, DEPTH)(void *opaque,
  /*
   * 8-bit colour
   */
 -static void glue(draw_line8_, DEPTH)(void *opaque,
 -                uint8_t *d, const uint8_t *s, int width, int deststep)
 +static void draw_line8_32(void *opaque, uint8_t *d, const uint8_t *s,
 +                          int width, int deststep)
  {
      uint16_t *pal = opaque;
      uint8_t v, r, g, b;
@@ -XXX,XX +XXX,XX @@ static void glue(draw_line8_, DEPTH)(void *opaque,
          r = (pal[v] >> 4) & 0xf0;
          g = pal[v] & 0xf0;
          b = (pal[v] << 4) & 0xf0;
 -        ((PIXEL_TYPE *) d)[0] = glue(rgb_to_pixel, DEPTH)(r, g, b);
 +        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
          s ++;
 -        d += BPP;
 +        d += 4;
      } while (-- width != 0);
  }
++
- /*
++static bool trans_VRMLALDAVH_S(DisasContext *s, arg_vmlaldav *a)
-  * 12-bit colour
++{
-  */
++    static MVEGenDualAccOpFn * const fns[] = {
--static void glue(draw_line12_, DEPTH)(void *opaque,
++        gen_helper_mve_vrmlaldavhsw, gen_helper_mve_vrmlaldavhxsw,
--                uint8_t *d, const uint8_t *s, int width, int deststep)
++    };
-+static void draw_line12_32(void *opaque, uint8_t *d, const uint8_t *s,
++    return do_long_dual_acc(s, a, fns[a->x]);
-+                           int width, int deststep)
++}
- {
++
-     uint16_t v;
++static bool trans_VRMLALDAVH_U(DisasContext *s, arg_vmlaldav *a)
-     uint8_t r, g, b;
++{
-@@ -XXX,XX +XXX,XX @@ static void glue(draw_line12_, DEPTH)(void *opaque,
++    static MVEGenDualAccOpFn * const fns[] = {
-         r = (v >> 4) & 0xf0;
++        gen_helper_mve_vrmlaldavhuw, NULL,
-         g = v & 0xf0;
++    };
-         b = (v << 4) & 0xf0;
++    return do_long_dual_acc(s, a, fns[a->x]);
--        ((PIXEL_TYPE *) d)[0] = glue(rgb_to_pixel, DEPTH)(r, g, b);
++}
-+        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
++
-         s += 2;
++static bool trans_VRMLSLDAVH(DisasContext *s, arg_vmlaldav *a)
--        d += BPP;
++{
-+        d += 4;
++    static MVEGenDualAccOpFn * const fns[] = {
-     } while (-- width != 0);
++        gen_helper_mve_vrmlsldavhsw, gen_helper_mve_vrmlsldavhxsw,
- }
++    };
++    return do_long_dual_acc(s, a, fns[a->x]);
- /*
++}
   * 16-bit colour
   */
 -static void glue(draw_line16_, DEPTH)(void *opaque,
 -                uint8_t *d, const uint8_t *s, int width, int deststep)
 +static void draw_line16_32(void *opaque, uint8_t *d, const uint8_t *s,
 +                           int width, int deststep)
  {
  #if defined(HOST_WORDS_BIGENDIAN) == defined(TARGET_WORDS_BIGENDIAN)
      memcpy(d, s, width * 2);
@@ -XXX,XX +XXX,XX @@ static void glue(draw_line16_, DEPTH)(void *opaque,
          r = (v >> 8) & 0xf8;
          g = (v >> 3) & 0xfc;
          b = (v << 3) & 0xf8;
 -        ((PIXEL_TYPE *) d)[0] = glue(rgb_to_pixel, DEPTH)(r, g, b);
 +        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
          s += 2;
 -        d += BPP;
 +        d += 4;
      } while (-- width != 0);
  #endif
  }
 -
 -#undef DEPTH
 -#undef BPP
 -#undef PIXEL_TYPE
 --
 .20.1
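
The wider-than-64-bit accumulator used by VRMLALDAVH and VRMLSLDAVH can be sketched with the GCC/Clang __int128 extension standing in for QEMU's Int128. This is a hedged sketch, not the helper itself. vrmlaldavh_s32_model() is a hypothetical name, all four 32-bit lanes are treated as active, and the sketch assumes that the value (Rda << 8) + product + (1 << 7) is reduced to its top 64 bits after each lane. The DO_LDAVH macro above instead keeps the full Int128 across the loop and shifts only once at the end, so treat the exact rounding point as an assumption to check against the Arm pseudocode. The sketch also widens each element to 64 bits before multiplying, because a 32 x 32 product needs 64 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of signed VRMLALDAVH (sub == false) and
 * VRMLSLDAVH (sub == true) on four 32-bit lanes.
 * Assumption: after each lane, the 72-bit value is rounded and
 * reduced back to its top 64 bits.
 */
static int64_t vrmlaldavh_s32_model(const int32_t n[4], const int32_t m[4],
                                    bool xchg, bool sub, int64_t rda)
{
    for (unsigned e = 0; e < 4; e++) {
        unsigned ne = xchg ? (e ^ 1) : e;
        /* Widen first: the 32 x 32 product needs 64 bits. */
        int64_t prod = (int64_t)n[ne] * m[e];
        if (sub && (e & 1)) {
            prod = -prod;   /* VRMLSLDAVH subtracts odd-lane products */
        }
        /* (rda << 8) + prod + rounding constant, in 128-bit arithmetic. */
        __int128 acc = (__int128)rda * 256 + prod + (1 << 7);
        rda = (int64_t)(acc >> 8);  /* keep the top 64 bits */
    }
    return rda;
}
```

Multiplying by 256 rather than shifting left avoids undefined behaviour when rda is negative. The right shift of a negative __int128 relies on GCC and Clang implementing it as an arithmetic shift.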

-[PULL 30/49] hw/misc/mps2-fpgaio: Make number of LEDs configurable by board
+[PULL 33/57] target/arm: Implement MVE VADD (scalar)
-The MPS2 board has 2 LEDs, but the MPS3 board has 10 LEDs.  The
+Implement the scalar form of the MVE VADD insn. This takes the
-FPGAIO device is similar on both sets of boards, but the LED0
+scalar operand from a general purpose register.
 register has correspondingly more bits that have an effect.  Add a
 device property for number of LEDs.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210215115138.20465-6-peter.maydell@linaro.org
+Message-id: 20210617121628.20116-23-peter.maydell@linaro.org
 ---
- include/hw/misc/mps2-fpgaio.h |  5 ++++-
+ target/arm/helper-mve.h    |  4 ++++
- hw/misc/mps2-fpgaio.c         | 31 +++++++++++++++++++++++--------
+ target/arm/mve.decode      |  7 ++++++
- 2 files changed, 27 insertions(+), 9 deletions(-)
+ target/arm/mve_helper.c    | 22 +++++++++++++++++++
  target/arm/translate-mve.c | 45 ++++++++++++++++++++++++++++++++++++++
 files changed, 78 insertions(+)
-diff --git a/include/hw/misc/mps2-fpgaio.h b/include/hw/misc/mps2-fpgaio.h
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/misc/mps2-fpgaio.h
+--- a/target/arm/helper-mve.h
-+++ b/include/hw/misc/mps2-fpgaio.h
++++ b/target/arm/helper-mve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vmulltub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
  DEF_HELPER_FLAGS_4(mve_vmulltuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
  DEF_HELPER_FLAGS_4(mve_vmulltuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +
  DEF_HELPER_FLAGS_4(mve_vmlaldavsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
  DEF_HELPER_FLAGS_4(mve_vmlaldavsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
  DEF_HELPER_FLAGS_4(mve_vmlaldavxsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
 @@ -XXX,XX +XXX,XX @@
- #define TYPE_MPS2_FPGAIO "mps2-fpgaio"
+ &vldr_vstr rn qd imm p a w size l u
- OBJECT_DECLARE_SIMPLE_TYPE(MPS2FPGAIO, MPS2_FPGAIO)
+ &1op qd qm size
+ &2op qd qm qn size
-+#define MPS2FPGAIO_MAX_LEDS 32
++&2scalar qd qn rm size
  @vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
  # Note that both Rn and Qd are 3 bits only (no D bit)
@@ -XXX,XX +XXX,XX @@
  @2op .... .... .. size:2 .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn
  @2op_nosz .... .... .... .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn size=0
 +@2scalar .... .... .. size:2 .... .... .... .... rm:4 &2scalar qd=%qd qn=%qn
 +
- struct MPS2FPGAIO {
+ # Vector loads and stores
-     /*< private >*/
-     SysBusDevice parent_obj;
+ # Widening loads and narrowing stores:
+@@ -XXX,XX +XXX,XX @@ VRMLALDAVH_S     1110 1110 1 ... ... 0 ... x:1 1111 . 0 a:1 0 ... 0 @vmlaldav_no
-     /*< public >*/
+ VRMLALDAVH_U     1111 1110 1 ... ... 0 ... x:1 1111 . 0 a:1 0 ... 0 @vmlaldav_nosz
-     MemoryRegion iomem;
--    LEDState *led[2];
+ VRMLSLDAVH       1111 1110 1 ... ... 0 ... x:1 1110 . 0 a:1 0 ... 1 @vmlaldav_nosz
-+    LEDState *led[MPS2FPGAIO_MAX_LEDS];
++
-+    uint32_t num_leds;
++# Scalar operations
++
-     uint32_t led0;
++VADD_scalar      1110 1110 0 . .. ... 1 ... 0 1111 . 100 .... @2scalar
-     uint32_t prescale;
+diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 diff --git a/hw/misc/mps2-fpgaio.c b/hw/misc/mps2-fpgaio.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/misc/mps2-fpgaio.c
+--- a/target/arm/mve_helper.c
-+++ b/hw/misc/mps2-fpgaio.c
++++ b/target/arm/mve_helper.c
-@@ -XXX,XX +XXX,XX @@ static void mps2_fpgaio_write(void *opaque, hwaddr offset, uint64_t value,
+@@ -XXX,XX +XXX,XX @@ DO_2OP_S(vhsubs, do_vhsub_s)
+ DO_2OP_U(vhsubu, do_vhsub_u)
-     switch (offset) {
-     case A_LED0:
--        s->led0 = value & 0x3;
++#define DO_2OP_SCALAR(OP, ESIZE, TYPE, FN)                              \
--        led_set_state(s->led[0], value & 0x01);
++    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn,   \
--        led_set_state(s->led[1], value & 0x02);
++                                uint32_t rm)                            \
-+        if (s->num_leds != 0) {
++    {                                                                   \
-+            uint32_t i;
++        TYPE *d = vd, *n = vn;                                          \
-+
++        TYPE m = rm;                                                    \
-+            s->led0 = value & MAKE_64BIT_MASK(0, s->num_leds);
++        uint16_t mask = mve_element_mask(env);                          \
-+            for (i = 0; i < s->num_leds; i++) {
++        unsigned e;                                                     \
-+                led_set_state(s->led[i], value & (1 << i));
++        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
-+            }
++            mergemask(&d[H##ESIZE(e)], FN(n[H##ESIZE(e)], m), mask);    \
-+        }
++        }                                                               \
-         break;
++        mve_advance_vpt(env);                                           \
      case A_PRESCALE:
          resync_counter(s);
@@ -XXX,XX +XXX,XX @@ static void mps2_fpgaio_reset(DeviceState *dev)
      s->pscntr = 0;
      s->pscntr_sync_ticks = now;
 -    for (size_t i = 0; i < ARRAY_SIZE(s->led); i++) {
 +    for (size_t i = 0; i < s->num_leds; i++) {
          device_cold_reset(DEVICE(s->led[i]));
      }
  }
@@ -XXX,XX +XXX,XX @@ static void mps2_fpgaio_init(Object *obj)
  static void mps2_fpgaio_realize(DeviceState *dev, Error **errp)
  {
      MPS2FPGAIO *s = MPS2_FPGAIO(dev);
 +    uint32_t i;
 -    s->led[0] = led_create_simple(OBJECT(dev), GPIO_POLARITY_ACTIVE_HIGH,
 -                                  LED_COLOR_GREEN, "USERLED0");
 -    s->led[1] = led_create_simple(OBJECT(dev), GPIO_POLARITY_ACTIVE_HIGH,
 -                                  LED_COLOR_GREEN, "USERLED1");
 +    if (s->num_leds > MPS2FPGAIO_MAX_LEDS) {
 +        error_setg(errp, "num-leds cannot be greater than %d",
 +                   MPS2FPGAIO_MAX_LEDS);
 +        return;
 +    }
 +
-+    for (i = 0; i < s->num_leds; i++) {
++/* provide unsigned 2-op scalar helpers for all sizes */
-+        g_autofree char *ledname = g_strdup_printf("USERLED%d", i);
++#define DO_2OP_SCALAR_U(OP, FN)                 \
-+        s->led[i] = led_create_simple(OBJECT(dev), GPIO_POLARITY_ACTIVE_HIGH,
++    DO_2OP_SCALAR(OP##b, 1, uint8_t, FN)        \
-+                                      LED_COLOR_GREEN, ledname);
++    DO_2OP_SCALAR(OP##h, 2, uint16_t, FN)       \
 +    DO_2OP_SCALAR(OP##w, 4, uint32_t, FN)
 +
 +DO_2OP_SCALAR_U(vadd_scalar, DO_ADD)
 +
  /*
   * Multiply add long dual accumulate ops.
   */
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@
  typedef void MVEGenLdStFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
  typedef void MVEGenOneOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
  typedef void MVEGenTwoOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr);
 +typedef void MVEGenTwoOpScalarFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
  typedef void MVEGenDualAccOpFn(TCGv_i64, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i64);
  /* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
@@ -XXX,XX +XXX,XX @@ DO_2OP(VMULL_BU, vmullbu)
  DO_2OP(VMULL_TS, vmullts)
  DO_2OP(VMULL_TU, vmulltu)
 +static bool do_2op_scalar(DisasContext *s, arg_2scalar *a,
 +                          MVEGenTwoOpScalarFn fn)
 +{
 +    TCGv_ptr qd, qn;
 +    TCGv_i32 rm;
 +
 +    if (!dc_isar_feature(aa32_mve, s) ||
 +        !mve_check_qreg_bank(s, a->qd | a->qn) ||
 +        !fn) {
 +        return false;
 +    }
- }
++    if (a->rm == 13 || a->rm == 15) {
++        /* UNPREDICTABLE */
- static bool mps2_fpgaio_counters_needed(void *opaque)
++        return false;
-@@ -XXX,XX +XXX,XX @@ static const VMStateDescription mps2_fpgaio_vmstate = {
++    }
- static Property mps2_fpgaio_properties[] = {
++    if (!mve_eci_check(s) || !vfp_access_check(s)) {
-     /* Frequency of the prescale counter */
++        return true;
-     DEFINE_PROP_UINT32("prescale-clk", MPS2FPGAIO, prescale_clk, 20000000),
++    }
-+    /* Number of LEDs controlled by LED0 register */
++
-+    DEFINE_PROP_UINT32("num-leds", MPS2FPGAIO, num_leds, 2),
++    qd = mve_qreg_ptr(a->qd);
-     DEFINE_PROP_END_OF_LIST(),
++    qn = mve_qreg_ptr(a->qn);
- };
++    rm = load_reg(s, a->rm);
++    fn(cpu_env, qd, qn, rm);
 +    tcg_temp_free_i32(rm);
 +    tcg_temp_free_ptr(qd);
 +    tcg_temp_free_ptr(qn);
 +    mve_update_eci(s);
 +    return true;
 +}
 +
 +#define DO_2OP_SCALAR(INSN, FN) \
 +    static bool trans_##INSN(DisasContext *s, arg_2scalar *a)   \
 +    {                                                           \
 +        static MVEGenTwoOpScalarFn * const fns[] = {            \
 +            gen_helper_mve_##FN##b,                             \
 +            gen_helper_mve_##FN##h,                             \
 +            gen_helper_mve_##FN##w,                             \
 +            NULL,                                               \
 +        };                                                      \
 +        return do_2op_scalar(s, a, fns[a->size]);               \
 +    }
 +
 +DO_2OP_SCALAR(VADD_scalar, vadd_scalar)
 +
  static bool do_long_dual_acc(DisasContext *s, arg_vmlaldav *a,
                               MVEGenDualAccOpFn *fn)
  {
 --
 .20.1
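The DO_2OP_SCALAR helper in the patch above applies FN to each element of Qn together with the scalar from Rm. It then merges the result into Qd so that only the destination bytes enabled by the predicate mask are written. The sketch below is a standalone model of that per-element loop, not QEMU code. It assumes a 16-byte vector and a 16-bit mask with one bit per byte. The names (`merge_bytes`, `SKETCH_2OP_SCALAR`, `sketch_vadd_scalarb`, `sketch_vadd_scalarw`) are hypothetical, and host byte order is taken as-is, whereas QEMU uses its H1/H2/H4 macros to correct for it.

```c
#include <stdint.h>
#include <string.h>

/*
 * Sketch of an MVE-style "vector op scalar" helper. The 128-bit vector is
 * modelled as 16 bytes; bit i of 'mask' enables byte i of the destination.
 */

/* Copy only the enabled bytes of one result element into the destination. */
static void merge_bytes(uint8_t *dst, const void *val, unsigned esize,
                        uint16_t mask)
{
    const uint8_t *src = val;

    for (unsigned b = 0; b < esize; b++) {
        if (mask & (1u << b)) {
            dst[b] = src[b];
        }
    }
}

#define SKETCH_ADD(N, M) ((N) + (M))

#define SKETCH_2OP_SCALAR(NAME, ESIZE, TYPE, FN)                          \
    static void NAME(uint8_t *vd, const uint8_t *vn, uint32_t rm,         \
                     uint16_t mask)                                       \
    {                                                                     \
        TYPE m = (TYPE)rm;  /* scalar is truncated to the element size */ \
        for (unsigned e = 0; e < 16 / (ESIZE); e++, mask >>= (ESIZE)) {   \
            TYPE n, r;                                                    \
            memcpy(&n, vn + e * (ESIZE), (ESIZE));                        \
            r = (TYPE)FN(n, m);                                           \
            /* low ESIZE bits of mask belong to this element */           \
            merge_bytes(vd + e * (ESIZE), &r, (ESIZE), mask);             \
        }                                                                 \
    }

SKETCH_2OP_SCALAR(sketch_vadd_scalarb, 1, uint8_t, SKETCH_ADD)
SKETCH_2OP_SCALAR(sketch_vadd_scalarw, 4, uint32_t, SKETCH_ADD)
```

For example, with byte elements, a scalar of 0x101 acts as 1, and a mask of 0x00ff updates only the first eight bytes. The remaining bytes of Qd keep their old contents.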

-[PULL 27/49] hw/misc/mps2-scc: Support configurable number of OSCCLK values
+[PULL 34/57] target/arm: Implement MVE VSUB, VMUL (scalar)
-Currently the MPS2 SCC device implements a fixed number of OSCCLK
+Implement the scalar forms of the MVE VSUB and VMUL insns.
 values (3).  The variant of this device in the MPS3 AN524 board has 6
 OSCCLK values.  Switch to using a PROP_ARRAY, which allows board code
 to specify how large the OSCCLK array should be as well as its
 values.
 With a variable-length property array, the SCC no longer specifies
 default values for the OSCCLKs, so we must set them explicitly in the
 board code.  These defaults are actually incorrect for the an521 and
 an505; we will correct this bug in a following patch.
 This is a migration compatibility break for all the mps boards.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210215115138.20465-3-peter.maydell@linaro.org
+Message-id: 20210617121628.20116-24-peter.maydell@linaro.org
 ---
- include/hw/misc/mps2-scc.h |  7 +++----
+ target/arm/helper-mve.h    | 8 ++++++++
- hw/arm/mps2-tz.c           |  5 +++++
+ target/arm/mve.decode      | 2 ++
- hw/arm/mps2.c              |  5 +++++
+ target/arm/mve_helper.c    | 2 ++
- hw/misc/mps2-scc.c         | 24 +++++++++++++-----------
+ target/arm/translate-mve.c | 2 ++
-files changed, 26 insertions(+), 15 deletions(-)
+files changed, 14 insertions(+)
-diff --git a/include/hw/misc/mps2-scc.h b/include/hw/misc/mps2-scc.h
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/misc/mps2-scc.h
+--- a/target/arm/helper-mve.h
-+++ b/include/hw/misc/mps2-scc.h
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- #define TYPE_MPS2_SCC "mps2-scc"
+ DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- OBJECT_DECLARE_SIMPLE_TYPE(MPS2SCC, MPS2_SCC)
+ DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
--#define NUM_OSCCLK 3
++DEF_HELPER_FLAGS_4(mve_vsub_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
--
++DEF_HELPER_FLAGS_4(mve_vsub_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- struct MPS2SCC {
++DEF_HELPER_FLAGS_4(mve_vsub_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-     /*< private >*/
++
-     SysBusDevice parent_obj;
++DEF_HELPER_FLAGS_4(mve_vmul_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-@@ -XXX,XX +XXX,XX @@ struct MPS2SCC {
++DEF_HELPER_FLAGS_4(mve_vmul_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-     uint32_t dll;
++DEF_HELPER_FLAGS_4(mve_vmul_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-     uint32_t aid;
++
-     uint32_t id;
+ DEF_HELPER_FLAGS_4(mve_vmlaldavsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
--    uint32_t oscclk[NUM_OSCCLK];
+ DEF_HELPER_FLAGS_4(mve_vmlaldavsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
--    uint32_t oscclk_reset[NUM_OSCCLK];
+ DEF_HELPER_FLAGS_4(mve_vmlaldavxsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
-+    uint32_t num_oscclk;
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 +    uint32_t *oscclk;
 +    uint32_t *oscclk_reset;
  };
  #endif
 diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/mps2-tz.c
+--- a/target/arm/mve.decode
-+++ b/hw/arm/mps2-tz.c
++++ b/target/arm/mve.decode
-@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_scc(MPS2TZMachineState *mms, void *opaque,
+@@ -XXX,XX +XXX,XX @@ VRMLSLDAVH       1111 1110 1 ... ... 0 ... x:1 1110 . 0 a:1 0 ... 1 @vmlaldav_no
-     qdev_prop_set_uint32(sccdev, "scc-cfg4", 0x2);
+ # Scalar operations
-     qdev_prop_set_uint32(sccdev, "scc-aid", 0x00200008);
-     qdev_prop_set_uint32(sccdev, "scc-id", mmc->scc_id);
+ VADD_scalar      1110 1110 0 . .. ... 1 ... 0 1111 . 100 .... @2scalar
-+    /* This will need to be per-FPGA image eventually */
++VSUB_scalar      1110 1110 0 . .. ... 1 ... 1 1111 . 100 .... @2scalar
-+    qdev_prop_set_uint32(sccdev, "len-oscclk", 3);
++VMUL_scalar      1110 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
-+    qdev_prop_set_uint32(sccdev, "oscclk[0]", 50000000);
+diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 +    qdev_prop_set_uint32(sccdev, "oscclk[1]", 24576000);
 +    qdev_prop_set_uint32(sccdev, "oscclk[2]", 25000000);
      sysbus_realize(SYS_BUS_DEVICE(scc), &error_fatal);
      return sysbus_mmio_get_region(SYS_BUS_DEVICE(sccdev), 0);
  }
 diff --git a/hw/arm/mps2.c b/hw/arm/mps2.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/mps2.c
+--- a/target/arm/mve_helper.c
-+++ b/hw/arm/mps2.c
++++ b/target/arm/mve_helper.c
-@@ -XXX,XX +XXX,XX @@ static void mps2_common_init(MachineState *machine)
+@@ -XXX,XX +XXX,XX @@ DO_2OP_U(vhsubu, do_vhsub_u)
-     qdev_prop_set_uint32(sccdev, "scc-cfg4", 0x2);
+     DO_2OP_SCALAR(OP##w, 4, uint32_t, FN)
-     qdev_prop_set_uint32(sccdev, "scc-aid", 0x00200008);
-     qdev_prop_set_uint32(sccdev, "scc-id", mmc->scc_id);
+ DO_2OP_SCALAR_U(vadd_scalar, DO_ADD)
-+    /* All these FPGA images have the same OSCCLK configuration */
++DO_2OP_SCALAR_U(vsub_scalar, DO_SUB)
-+    qdev_prop_set_uint32(sccdev, "len-oscclk", 3);
++DO_2OP_SCALAR_U(vmul_scalar, DO_MUL)
-+    qdev_prop_set_uint32(sccdev, "oscclk[0]", 50000000);
-+    qdev_prop_set_uint32(sccdev, "oscclk[1]", 24576000);
+ /*
-+    qdev_prop_set_uint32(sccdev, "oscclk[2]", 25000000);
+  * Multiply add long dual accumulate ops.
-     sysbus_realize(SYS_BUS_DEVICE(&mms->scc), &error_fatal);
+diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
      sysbus_mmio_map(SYS_BUS_DEVICE(sccdev), 0, 0x4002f000);
      object_initialize_child(OBJECT(mms), "fpgaio",
 diff --git a/hw/misc/mps2-scc.c b/hw/misc/mps2-scc.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/misc/mps2-scc.c
+--- a/target/arm/translate-mve.c
-+++ b/hw/misc/mps2-scc.c
++++ b/target/arm/translate-mve.c
-@@ -XXX,XX +XXX,XX @@ static bool scc_cfg_write(MPS2SCC *s, unsigned function,
+@@ -XXX,XX +XXX,XX @@ static bool do_2op_scalar(DisasContext *s, arg_2scalar *a,
  {
      trace_mps2_scc_cfg_write(function, device, value);
 -    if (function != 1 || device >= NUM_OSCCLK) {
 +    if (function != 1 || device >= s->num_oscclk) {
          qemu_log_mask(LOG_GUEST_ERROR,
                        "MPS2 SCC config write: bad function %d device %d\n",
                        function, device);
@@ -XXX,XX +XXX,XX @@ static bool scc_cfg_write(MPS2SCC *s, unsigned function,
  static bool scc_cfg_read(MPS2SCC *s, unsigned function,
                           unsigned device, uint32_t *value)
  {
 -    if (function != 1 || device >= NUM_OSCCLK) {
 +    if (function != 1 || device >= s->num_oscclk) {
          qemu_log_mask(LOG_GUEST_ERROR,
                        "MPS2 SCC config read: bad function %d device %d\n",
                        function, device);
@@ -XXX,XX +XXX,XX @@ static void mps2_scc_reset(DeviceState *dev)
      s->cfgctrl = 0x100000;
      s->cfgstat = 0;
      s->dll = 0xffff0001;
 -    for (i = 0; i < NUM_OSCCLK; i++) {
 +    for (i = 0; i < s->num_oscclk; i++) {
          s->oscclk[i] = s->oscclk_reset[i];
      }
-     for (i = 0; i < ARRAY_SIZE(s->led); i++) {
-@@ -XXX,XX +XXX,XX @@ static void mps2_scc_realize(DeviceState *dev, Error **errp)
+ DO_2OP_SCALAR(VADD_scalar, vadd_scalar)
-                                       LED_COLOR_GREEN, name);
++DO_2OP_SCALAR(VSUB_scalar, vsub_scalar)
-         g_free(name);
++DO_2OP_SCALAR(VMUL_scalar, vmul_scalar)
-     }
-+
+ static bool do_long_dual_acc(DisasContext *s, arg_vmlaldav *a,
-+    s->oscclk = g_new0(uint32_t, s->num_oscclk);
+                              MVEGenDualAccOpFn *fn)
  }
  static const VMStateDescription mps2_scc_vmstate = {
      .name = "mps2-scc",
 -    .version_id = 1,
 -    .minimum_version_id = 1,
 +    .version_id = 2,
 +    .minimum_version_id = 2,
      .fields = (VMStateField[]) {
          VMSTATE_UINT32(cfg0, MPS2SCC),
          VMSTATE_UINT32(cfg1, MPS2SCC),
@@ -XXX,XX +XXX,XX @@ static const VMStateDescription mps2_scc_vmstate = {
          VMSTATE_UINT32(cfgctrl, MPS2SCC),
          VMSTATE_UINT32(cfgstat, MPS2SCC),
          VMSTATE_UINT32(dll, MPS2SCC),
 -        VMSTATE_UINT32_ARRAY(oscclk, MPS2SCC, NUM_OSCCLK),
 +        VMSTATE_VARRAY_UINT32(oscclk, MPS2SCC, num_oscclk,
 +                              0, vmstate_info_uint32, uint32_t),
          VMSTATE_END_OF_LIST()
      }
  };
@@ -XXX,XX +XXX,XX @@ static Property mps2_scc_properties[] = {
      DEFINE_PROP_UINT32("scc-cfg4", MPS2SCC, cfg4, 0),
      DEFINE_PROP_UINT32("scc-aid", MPS2SCC, aid, 0),
      DEFINE_PROP_UINT32("scc-id", MPS2SCC, id, 0),
 -    /* These are the initial settings for the source clocks on the board.
 +    /*
 +     * These are the initial settings for the source clocks on the board.
       * In hardware they can be configured via a config file read by the
       * motherboard configuration controller to suit the FPGA image.
 -     * These default values are used by most of the standard FPGA images.
       */
 -    DEFINE_PROP_UINT32("oscclk0", MPS2SCC, oscclk_reset[0], 50000000),
 -    DEFINE_PROP_UINT32("oscclk1", MPS2SCC, oscclk_reset[1], 24576000),
 -    DEFINE_PROP_UINT32("oscclk2", MPS2SCC, oscclk_reset[2], 25000000),
 +    DEFINE_PROP_ARRAY("oscclk", MPS2SCC, num_oscclk, oscclk_reset,
 +                      qdev_prop_uint32, uint32_t),
      DEFINE_PROP_END_OF_LIST(),
  };
 --
 .20.1
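The OSCCLK change in the patch above replaces a fixed-size array with a count supplied by the board plus arrays sized to match. Every config access is then bounds-checked against that run-time count instead of NUM_OSCCLK. The following is a minimal standalone sketch of that pattern. `SccSketch` and the `scc_sketch_*` functions are hypothetical stand-ins for the MPS2SCC state and its realize/reset/cfg handlers, and plain calloc() stands in for QEMU's property-array machinery.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    uint32_t num_oscclk;     /* chosen by the board, like "len-oscclk" */
    uint32_t *oscclk_reset;  /* reset values supplied by the board */
    uint32_t *oscclk;        /* current values, allocated at realize time */
} SccSketch;

static bool scc_sketch_realize(SccSketch *s, const uint32_t *vals, uint32_t n)
{
    size_t count = n ? n : 1;   /* avoid calloc(0, ...) */

    s->num_oscclk = n;
    s->oscclk_reset = calloc(count, sizeof(uint32_t));
    s->oscclk = calloc(count, sizeof(uint32_t));
    if (!s->oscclk_reset || !s->oscclk) {
        free(s->oscclk_reset);
        free(s->oscclk);
        s->oscclk_reset = s->oscclk = NULL;
        s->num_oscclk = 0;
        return false;
    }
    if (n) {
        memcpy(s->oscclk_reset, vals, n * sizeof(uint32_t));
    }
    return true;
}

static void scc_sketch_reset(SccSketch *s)
{
    for (uint32_t i = 0; i < s->num_oscclk; i++) {
        s->oscclk[i] = s->oscclk_reset[i];
    }
}

/* Config accesses are bounds-checked against the run-time count. */
static bool scc_sketch_cfg_read(const SccSketch *s, unsigned function,
                                unsigned device, uint32_t *value)
{
    if (function != 1 || device >= s->num_oscclk) {
        return false;
    }
    *value = s->oscclk[device];
    return true;
}

static bool scc_sketch_cfg_write(SccSketch *s, unsigned function,
                                 unsigned device, uint32_t value)
{
    if (function != 1 || device >= s->num_oscclk) {
        return false;
    }
    s->oscclk[device] = value;
    return true;
}

static void scc_sketch_free(SccSketch *s)
{
    free(s->oscclk);
    free(s->oscclk_reset);
}
```

The array length is now a run-time value, so the migrated state also has to carry a variable-length array. This is why the patch switches to VMSTATE_VARRAY_UINT32, bumps the vmstate version, and describes the change as a migration compatibility break.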

-[PULL 25/49] hw/display/tcx: Drop unnecessary code for handling BGR format outputs
+[PULL 35/57] target/arm: Implement MVE VHADD, VHSUB (scalar)
-For a long time now the UI layer has guaranteed that the console
+Implement the scalar variants of the MVE VHADD and VHSUB insns.
 surface is always 32 bits per pixel, RGB. The TCX code already
 assumes 32bpp, but it still has some checks of is_surface_bgr()
 in an attempt to support 32bpp BGR. is_surface_bgr() will always
 return false for the qemu_console_surface(), unless the display
 device itself has deliberately created an alternate-format
 surface via a function like qemu_create_displaysurface_from().
 Drop the never-used BGR-handling code, and assert that we have
 a 32-bit surface rather than just doing nothing if it isn't.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210215102149.20513-1-peter.maydell@linaro.org
+Message-id: 20210617121628.20116-25-peter.maydell@linaro.org
 ---
- hw/display/tcx.c | 31 ++++++++-----------------------
+ target/arm/helper-mve.h    | 16 ++++++++++++++++
-file changed, 8 insertions(+), 23 deletions(-)
+ target/arm/mve.decode      |  4 ++++
  target/arm/mve_helper.c    |  8 ++++++++
  target/arm/translate-mve.c |  4 ++++
 files changed, 32 insertions(+)
-diff --git a/hw/display/tcx.c b/hw/display/tcx.c
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/display/tcx.c
+--- a/target/arm/helper-mve.h
-+++ b/hw/display/tcx.c
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ static int tcx_check_dirty(TCXState *s, DirtyBitmapSnapshot *snap,
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vmul_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_4(mve_vmul_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- static void update_palette_entries(TCXState *s, int start, int end)
+ DEF_HELPER_FLAGS_4(mve_vmul_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- {
--    DisplaySurface *surface = qemu_console_surface(s->con);
++DEF_HELPER_FLAGS_4(mve_vhadds_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-     int i;
++DEF_HELPER_FLAGS_4(mve_vhadds_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
++DEF_HELPER_FLAGS_4(mve_vhadds_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-     for (i = start; i < end; i++) {
++
--        if (is_surface_bgr(surface)) {
++DEF_HELPER_FLAGS_4(mve_vhaddu_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
--            s->palette[i] = rgb_to_pixel32bgr(s->r[i], s->g[i], s->b[i]);
++DEF_HELPER_FLAGS_4(mve_vhaddu_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
--        } else {
++DEF_HELPER_FLAGS_4(mve_vhaddu_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
--            s->palette[i] = rgb_to_pixel32(s->r[i], s->g[i], s->b[i]);
++
--        }
++DEF_HELPER_FLAGS_4(mve_vhsubs_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+        s->palette[i] = rgb_to_pixel32(s->r[i], s->g[i], s->b[i]);
++DEF_HELPER_FLAGS_4(mve_vhsubs_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-     }
++DEF_HELPER_FLAGS_4(mve_vhsubs_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-     tcx_set_dirty(s, 0, memory_region_size(&s->vram_mem));
++
- }
++DEF_HELPER_FLAGS_4(mve_vhsubu_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-@@ -XXX,XX +XXX,XX @@ static void tcx_draw_cursor32(TCXState *s1, uint8_t *d,
++DEF_HELPER_FLAGS_4(mve_vhsubu_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- }
++DEF_HELPER_FLAGS_4(mve_vhsubu_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +
  DEF_HELPER_FLAGS_4(mve_vmlaldavsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
  DEF_HELPER_FLAGS_4(mve_vmlaldavsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
  DEF_HELPER_FLAGS_4(mve_vmlaldavxsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VRMLSLDAVH       1111 1110 1 ... ... 0 ... x:1 1110 . 0 a:1 0 ... 1 @vmlaldav_no
  VADD_scalar      1110 1110 0 . .. ... 1 ... 0 1111 . 100 .... @2scalar
  VSUB_scalar      1110 1110 0 . .. ... 1 ... 1 1111 . 100 .... @2scalar
  VMUL_scalar      1110 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
 +VHADD_S_scalar   1110 1110 0 . .. ... 0 ... 0 1111 . 100 .... @2scalar
 +VHADD_U_scalar   1111 1110 0 . .. ... 0 ... 0 1111 . 100 .... @2scalar
 +VHSUB_S_scalar   1110 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
 +VHSUB_U_scalar   1111 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2OP_U(vhsubu, do_vhsub_u)
      DO_2OP_SCALAR(OP##b, 1, uint8_t, FN)        \
      DO_2OP_SCALAR(OP##h, 2, uint16_t, FN)       \
      DO_2OP_SCALAR(OP##w, 4, uint32_t, FN)
 +#define DO_2OP_SCALAR_S(OP, FN)                 \
 +    DO_2OP_SCALAR(OP##b, 1, int8_t, FN)         \
 +    DO_2OP_SCALAR(OP##h, 2, int16_t, FN)        \
 +    DO_2OP_SCALAR(OP##w, 4, int32_t, FN)
  DO_2OP_SCALAR_U(vadd_scalar, DO_ADD)
  DO_2OP_SCALAR_U(vsub_scalar, DO_SUB)
  DO_2OP_SCALAR_U(vmul_scalar, DO_MUL)
 +DO_2OP_SCALAR_S(vhadds_scalar, do_vhadd_s)
 +DO_2OP_SCALAR_U(vhaddu_scalar, do_vhadd_u)
 +DO_2OP_SCALAR_S(vhsubs_scalar, do_vhsub_s)
 +DO_2OP_SCALAR_U(vhsubu_scalar, do_vhsub_u)
  /*
--  XXX Could be much more optimal:
+  * Multiply add long dual accumulate ops.
--  * detect if line/page/whole screen is in 24 bit mode
+diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
--  * if destination is also BGR, use memcpy
+index XXXXXXX..XXXXXXX 100644
--  */
+--- a/target/arm/translate-mve.c
-+ * XXX Could be much more optimal:
++++ b/target/arm/translate-mve.c
-+ * detect if line/page/whole screen is in 24 bit mode
+@@ -XXX,XX +XXX,XX @@ static bool do_2op_scalar(DisasContext *s, arg_2scalar *a,
-+ */
+ DO_2OP_SCALAR(VADD_scalar, vadd_scalar)
- static inline void tcx24_draw_line32(TCXState *s1, uint8_t *d,
+ DO_2OP_SCALAR(VSUB_scalar, vsub_scalar)
-                                      const uint8_t *s, int width,
+ DO_2OP_SCALAR(VMUL_scalar, vmul_scalar)
-                                      const uint32_t *cplane,
++DO_2OP_SCALAR(VHADD_S_scalar, vhadds_scalar)
-                                      const uint32_t *s24)
++DO_2OP_SCALAR(VHADD_U_scalar, vhaddu_scalar)
- {
++DO_2OP_SCALAR(VHSUB_S_scalar, vhsubs_scalar)
--    DisplaySurface *surface = qemu_console_surface(s1->con);
++DO_2OP_SCALAR(VHSUB_U_scalar, vhsubu_scalar)
--    int x, bgr, r, g, b;
-+    int x, r, g, b;
+ static bool do_long_dual_acc(DisasContext *s, arg_vmlaldav *a,
-     uint8_t val, *p8;
+                              MVEGenDualAccOpFn *fn)
      uint32_t *p = (uint32_t *)d;
      uint32_t dval;
 -    bgr = is_surface_bgr(surface);
      for(x = 0; x < width; x++, s++, s24++) {
          if (be32_to_cpu(*cplane) & 0x03000000) {
              /* 24-bit direct, BGR order */
@@ -XXX,XX +XXX,XX @@ static inline void tcx24_draw_line32(TCXState *s1, uint8_t *d,
              b = *p8++;
              g = *p8++;
              r = *p8;
 -            if (bgr)
 -                dval = rgb_to_pixel32bgr(r, g, b);
 -            else
 -                dval = rgb_to_pixel32(r, g, b);
 +            dval = rgb_to_pixel32(r, g, b);
          } else {
              /* 8-bit pseudocolor */
              val = *s;
@@ -XXX,XX +XXX,XX @@ static void tcx_update_display(void *opaque)
      int y, y_start, dd, ds;
      uint8_t *d, *s;
 -    if (surface_bits_per_pixel(surface) != 32) {
 -        return;
 -    }
 +    assert(surface_bits_per_pixel(surface) == 32);
      page = 0;
      y_start = -1;
@@ -XXX,XX +XXX,XX @@ static void tcx24_update_display(void *opaque)
      uint8_t *d, *s;
      uint32_t *cptr, *s24;
 -    if (surface_bits_per_pixel(surface) != 32) {
 -            return;
 -    }
 +    assert(surface_bits_per_pixel(surface) == 32);
      page = 0;
      y_start = -1;
 --
 .20.1
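The scalar VHADD/VHSUB variants in the patch above reuse the existing halving helpers through DO_2OP_SCALAR_S and DO_2OP_SCALAR_U. The sketch below shows what a halving operation computes. It forms the exact sum or difference in a wider type and then shifts right by one, so the result always fits back in the element. All function names are hypothetical, and predication is omitted for brevity. The sketch assumes the compiler performs an arithmetic right shift on negative signed values, which is implementation-defined in C but is what GCC and Clang do.

```c
#include <stdint.h>

/* Halving add/sub: exact result in 64 bits, then halve (rounds down). */
static inline int32_t sketch_vhadd_s(int32_t n, int32_t m)
{
    return (int32_t)(((int64_t)n + m) >> 1);   /* assumes arithmetic shift */
}

static inline uint32_t sketch_vhadd_u(uint32_t n, uint32_t m)
{
    return (uint32_t)(((uint64_t)n + m) >> 1);
}

static inline int32_t sketch_vhsub_s(int32_t n, int32_t m)
{
    return (int32_t)(((int64_t)n - m) >> 1);   /* assumes arithmetic shift */
}

static inline uint32_t sketch_vhsub_u(uint32_t n, uint32_t m)
{
    /* Wraps modulo 2^64 first; the low bits still match (n - m) >> 1. */
    return (uint32_t)(((uint64_t)n - m) >> 1);
}

/*
 * Scalar form for signed byte elements: like "TYPE m = rm" in the helper,
 * Rm is first converted to the element type (0xff becomes -1 on
 * two's-complement compilers).
 */
static void sketch_vhadds_scalarb(int8_t d[16], const int8_t n[16],
                                  uint32_t rm)
{
    int8_t m = (int8_t)rm;

    for (int e = 0; e < 16; e++) {
        d[e] = (int8_t)sketch_vhadd_s(n[e], m);
    }
}
```

Because of the wider intermediate, even extreme inputs such as 127 + 127 or -128 + -128 halve to a value that still fits in a signed byte.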

-[PULL 42/49] hw/arm/mps2-tz: Set MachineClass default_ram info from RAMInfo data
+[PULL 36/57] target/arm: Implement MVE VBRSR
-Instead of hardcoding the MachineClass default_ram_size and
+Implement the MVE VBRSR insn, which reverses a specified
-default_ram_id fields, set them on class creation by finding the
+number of bits in each element, setting the rest to zero.
 entry in the RAMInfo array which is marked as being the QEMU system
 RAM.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210215115138.20465-18-peter.maydell@linaro.org
+Message-id: 20210617121628.20116-26-peter.maydell@linaro.org
 ---
- hw/arm/mps2-tz.c | 24 ++++++++++++++++++++++--
+ target/arm/helper-mve.h    |  4 ++++
-file changed, 22 insertions(+), 2 deletions(-)
+ target/arm/mve.decode      |  1 +
  target/arm/mve_helper.c    | 43 ++++++++++++++++++++++++++++++++++++++
  target/arm/translate-mve.c |  1 +
 files changed, 49 insertions(+)
-diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/mps2-tz.c
+--- a/target/arm/helper-mve.h
-+++ b/hw/arm/mps2-tz.c
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ static void mps2tz_class_init(ObjectClass *oc, void *data)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vhsubu_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_4(mve_vhsubu_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-     mc->init = mps2tz_common_init;
+ DEF_HELPER_FLAGS_4(mve_vhsubu_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-     iic->check = mps2_tz_idau_check;
--    mc->default_ram_size = 16 * MiB;
++DEF_HELPER_FLAGS_4(mve_vbrsrb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
--    mc->default_ram_id = "mps.ram";
++DEF_HELPER_FLAGS_4(mve_vbrsrh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vbrsrw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +
  DEF_HELPER_FLAGS_4(mve_vmlaldavsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
  DEF_HELPER_FLAGS_4(mve_vmlaldavsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
  DEF_HELPER_FLAGS_4(mve_vmlaldavxsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VHADD_S_scalar   1110 1110 0 . .. ... 0 ... 0 1111 . 100 .... @2scalar
  VHADD_U_scalar   1111 1110 0 . .. ... 0 ... 0 1111 . 100 .... @2scalar
  VHSUB_S_scalar   1110 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
  VHSUB_U_scalar   1111 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
 +VBRSR            1111 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2OP_SCALAR_U(vhaddu_scalar, do_vhadd_u)
  DO_2OP_SCALAR_S(vhsubs_scalar, do_vhsub_s)
  DO_2OP_SCALAR_U(vhsubu_scalar, do_vhsub_u)
 +static inline uint32_t do_vbrsrb(uint32_t n, uint32_t m)
 +{
 +    m &= 0xff;
 +    if (m == 0) {
 +        return 0;
 +    }
 +    n = revbit8(n);
 +    if (m < 8) {
 +        n >>= 8 - m;
 +    }
 +    return n;
 +}
 +
-+static void mps2tz_set_default_ram_info(MPS2TZMachineClass *mmc)
++static inline uint32_t do_vbrsrh(uint32_t n, uint32_t m)
 +{
-+    /*
++    m &= 0xff;
-+     * Set mc->default_ram_size and default_ram_id from the
++    if (m == 0) {
-+     * information in mmc->raminfo.
++        return 0;
-+     */
++    }
-+    MachineClass *mc = MACHINE_CLASS(mmc);
++    n = revbit16(n);
-+    const RAMInfo *p;
++    if (m < 16) {
 +        n >>= 16 - m;
 +    }
 +    return n;
 +}
 +
-+    for (p = mmc->raminfo; p->name; p++) {
++static inline uint32_t do_vbrsrw(uint32_t n, uint32_t m)
-+        if (p->mrindex < 0) {
++{
-+            /* Found the entry for "system memory" */
++    m &= 0xff;
-+            mc->default_ram_size = p->size;
++    if (m == 0) {
-+            mc->default_ram_id = p->name;
++        return 0;
 +            return;
 +        }
 +    }
-+    g_assert_not_reached();
++    n = revbit32(n);
- }
++    if (m < 32) {
++        n >>= 32 - m;
- static void mps2tz_an505_class_init(ObjectClass *oc, void *data)
++    }
-@@ -XXX,XX +XXX,XX @@ static void mps2tz_an505_class_init(ObjectClass *oc, void *data)
++    return n;
-     mmc->numirq = 92;
++}
-     mmc->raminfo = an505_raminfo;
++
-     mmc->armsse_type = TYPE_IOTKIT;
++DO_2OP_SCALAR(vbrsrb, 1, uint8_t, do_vbrsrb)
-+    mps2tz_set_default_ram_info(mmc);
++DO_2OP_SCALAR(vbrsrh, 2, uint16_t, do_vbrsrh)
- }
++DO_2OP_SCALAR(vbrsrw, 4, uint32_t, do_vbrsrw)
++
- static void mps2tz_an521_class_init(ObjectClass *oc, void *data)
+ /*
-@@ -XXX,XX +XXX,XX @@ static void mps2tz_an521_class_init(ObjectClass *oc, void *data)
+  * Multiply add long dual accumulate ops.
-     mmc->numirq = 92;
+  */
-     mmc->raminfo = an505_raminfo; /* AN521 is the same as AN505 here */
+diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
-     mmc->armsse_type = TYPE_SSE200;
+index XXXXXXX..XXXXXXX 100644
-+    mps2tz_set_default_ram_info(mmc);
+--- a/target/arm/translate-mve.c
- }
++++ b/target/arm/translate-mve.c
+@@ -XXX,XX +XXX,XX @@ DO_2OP_SCALAR(VHADD_S_scalar, vhadds_scalar)
- static const TypeInfo mps2tz_info = {
+ DO_2OP_SCALAR(VHADD_U_scalar, vhaddu_scalar)
  DO_2OP_SCALAR(VHSUB_S_scalar, vhsubs_scalar)
  DO_2OP_SCALAR(VHSUB_U_scalar, vhsubu_scalar)
 +DO_2OP_SCALAR(VBRSR, vbrsr)
  static bool do_long_dual_acc(DisasContext *s, arg_vmlaldav *a,
                               MVEGenDualAccOpFn *fn)
 --
 .20.1
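VBRSR, as described in the commit message above, reverses the bottom m bits of each element and zeroes the rest. The patch gets there by fully bit-reversing the element with revbit8/16/32 and then shifting right by (esize - m). The sketch below is an independent model that reverses only the low m bits directly, which gives the same result. Like the helper, it uses only the bottom 8 bits of the scalar, returns 0 for a count of 0, and treats counts of esize or more as a full reversal. `sketch_revbits` and `sketch_vbrsr` are hypothetical names, not QEMU functions.

```c
#include <stdint.h>

/* Reverse the low 'bits' bits of x; higher bits of the result are zero. */
static uint32_t sketch_revbits(uint32_t x, unsigned bits)
{
    uint32_t r = 0;

    for (unsigned i = 0; i < bits; i++) {
        r = (r << 1) | ((x >> i) & 1);
    }
    return r;
}

/* VBRSR-style result for one element of 'esize' bits (8, 16 or 32). */
static uint32_t sketch_vbrsr(uint32_t n, uint32_t m, unsigned esize)
{
    m &= 0xff;              /* only the bottom byte of Rm is used */
    if (m == 0) {
        return 0;
    }
    if (m > esize) {
        m = esize;          /* counts >= esize reverse the whole element */
    }
    return sketch_revbits(n, m);
}
```

For example, with 8-bit elements and m = 4, the element 0xF1 yields 0x08. The low nibble 0001 is reversed to 1000, and the upper bits are cleared, matching revbit8(0xF1) >> 4 in the patch's helper.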

-[PULL 33/49] hw/arm/mps2-tz: Condition IRQ splitting on number of CPUs, not board type
+[PULL 37/57] target/arm: Implement MVE VPST
-In the mps2-tz board code, we handle devices whose interrupt lines
+Implement the MVE VPST insn, which sets the predicate mask
-must be wired to all CPUs by creating IRQ splitter devices for the
+fields in the VPR to the immediate value encoded in the insn.
 AN521, because it has 2 CPUs, but wiring the device IRQ directly to
 the SSE/IoTKit input for the AN505, which has only 1 CPU.
 We can avoid making an explicit check on the board type constant by
 instead creating and using the IRQ splitters for any board with more
 than 1 CPU.  This avoids having to add extra cases to the
 conditionals every time we add new boards.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210215115138.20465-9-peter.maydell@linaro.org
+Message-id: 20210617121628.20116-27-peter.maydell@linaro.org
 ---
- hw/arm/mps2-tz.c | 19 +++++++++----------
+ target/arm/mve.decode      |  4 +++
-file changed, 9 insertions(+), 10 deletions(-)
+ target/arm/translate-mve.c | 59 ++++++++++++++++++++++++++++++++++++++
 files changed, 63 insertions(+)
-diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/mps2-tz.c
+--- a/target/arm/mve.decode
-+++ b/hw/arm/mps2-tz.c
++++ b/target/arm/mve.decode
-@@ -XXX,XX +XXX,XX @@ static void make_ram_alias(MemoryRegion *mr, const char *name,
+@@ -XXX,XX +XXX,XX @@ VHADD_U_scalar   1111 1110 0 . .. ... 0 ... 0 1111 . 100 .... @2scalar
- static qemu_irq get_sse_irq_in(MPS2TZMachineState *mms, int irqno)
+ VHSUB_S_scalar   1110 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
- {
+ VHSUB_U_scalar   1111 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
-     /* Return a qemu_irq which will signal IRQ n to all CPUs in the SSE. */
+ VBRSR            1111 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
--    MPS2TZMachineClass *mmc = MPS2TZ_MACHINE_GET_CLASS(mms);
++
-+    MachineClass *mc = MACHINE_GET_CLASS(mms);
++# Predicate operations
++%mask_22_13      22:1 13:3
-     assert(irqno < MPS2TZ_NUMIRQ);
++VPST             1111 1110 0 . 11 000 1 ... 0 1111 0100 1101 mask=%mask_22_13
+diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
--    switch (mmc->fpga_type) {
+index XXXXXXX..XXXXXXX 100644
--    case FPGA_AN505:
+--- a/target/arm/translate-mve.c
--        return qdev_get_gpio_in_named(DEVICE(&mms->iotkit), "EXP_IRQ", irqno);
++++ b/target/arm/translate-mve.c
--    case FPGA_AN521:
+@@ -XXX,XX +XXX,XX @@ static void mve_update_eci(DisasContext *s)
 +    if (mc->max_cpus > 1) {
          return qdev_get_gpio_in(DEVICE(&mms->cpu_irq_splitter[irqno]), 0);
 -    default:
 -        g_assert_not_reached();
 +    } else {
 +        return qdev_get_gpio_in_named(DEVICE(&mms->iotkit), "EXP_IRQ", irqno);
      }
  }
-@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
++static void mve_update_and_store_eci(DisasContext *s)
-     sysbus_realize(SYS_BUS_DEVICE(&mms->iotkit), &error_fatal);
++{
++    /*
-     /*
++     * For insns which don't call a helper function that will call
--     * The AN521 needs us to create splitters to feed the IRQ inputs
++     * mve_advance_vpt(), this version updates s->eci and also stores
--     * for each CPU in the SSE-200 from each device in the board.
++     * it out to the CPUState field.
-+     * If this board has more than one CPU, then we need to create splitters
++     */
-+     * to feed the IRQ inputs for each CPU in the SSE from each device in the
++    if (s->eci) {
-+     * board. If there is only one CPU, we can just wire the device IRQ
++        mve_update_eci(s);
-+     * directly to the SSE's IRQ input.
++        store_cpu_field(tcg_constant_i32(s->eci << 4), condexec_bits);
-      */
++    }
--    if (mmc->fpga_type == FPGA_AN521) {
++}
-+    if (mc->max_cpus > 1) {
++
-         for (i = 0; i < MPS2TZ_NUMIRQ; i++) {
+ static bool mve_skip_first_beat(DisasContext *s)
-             char *name = g_strdup_printf("mps2-irq-splitter%d", i);
+ {
-             SplitIRQ *splitter = &mms->cpu_irq_splitter[i];
+     /* Return true if PSR.ECI says we must skip the first beat of this insn */
@@ -XXX,XX +XXX,XX @@ static bool trans_VRMLSLDAVH(DisasContext *s, arg_vmlaldav *a)
      };
      return do_long_dual_acc(s, a, fns[a->x]);
  }
 +
 +static bool trans_VPST(DisasContext *s, arg_VPST *a)
 +{
 +    TCGv_i32 vpr;
 +
 +    /* mask == 0 is a "related encoding" */
 +    if (!dc_isar_feature(aa32_mve, s) || !a->mask) {
 +        return false;
 +    }
 +    if (!mve_eci_check(s) || !vfp_access_check(s)) {
 +        return true;
 +    }
 +    /*
 +     * Set the VPR mask fields. We take advantage of MASK01 and MASK23
 +     * being adjacent fields in the register.
 +     *
 +     * This insn is not predicated, but it is subject to beat-wise
 +     * execution, and the mask is updated on the odd-numbered beats.
 +     * So if PSR.ECI says we should skip beat 1, we mustn't update the
 +     * 01 mask field.
 +     */
 +    vpr = load_cpu_field(v7m.vpr);
 +    switch (s->eci) {
 +    case ECI_NONE:
 +    case ECI_A0:
 +        /* Update both 01 and 23 fields */
 +        tcg_gen_deposit_i32(vpr, vpr,
 +                            tcg_constant_i32(a->mask | (a->mask << 4)),
 +                            R_V7M_VPR_MASK01_SHIFT,
 +                            R_V7M_VPR_MASK01_LENGTH + R_V7M_VPR_MASK23_LENGTH);
 +        break;
 +    case ECI_A0A1:
 +    case ECI_A0A1A2:
 +    case ECI_A0A1A2B0:
 +        /* Update only the 23 mask field */
 +        tcg_gen_deposit_i32(vpr, vpr,
 +                            tcg_constant_i32(a->mask),
 +                            R_V7M_VPR_MASK23_SHIFT, R_V7M_VPR_MASK23_LENGTH);
 +        break;
 +    default:
 +        g_assert_not_reached();
 +    }
 +    store_cpu_field(vpr, v7m.vpr);
 +    mve_update_and_store_eci(s);
 +    return true;
 +}
 --
 2.20.1

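trans_VPST() above depends on two pieces of bit manipulation. First, decodetree assembles the 4-bit mask from insn bit 22 (as the top bit) and bits [15:13]. Second, the mask is deposited into one or both VPR mask fields, depending on PSR.ECI. The following is a minimal standalone sketch of both steps, not QEMU code. It assumes MASK01 occupies VPR bits [19:16] and MASK23 occupies bits [23:20], which is our reading of the Armv8.1-M VPR layout; QEMU uses its R_V7M_VPR_* constants instead. The names `extract_vpst_mask`, `deposit32_sketch`, `vpst_update` and the `ECI_*_S` enum are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed Armv8.1-M VPR layout (a sketch, not QEMU's R_V7M_VPR_* macros):
 * P0 in bits [15:0], MASK01 in bits [19:16], MASK23 in bits [23:20].
 */
#define VPR_MASK01_SHIFT  16
#define VPR_MASK01_LENGTH 4
#define VPR_MASK23_SHIFT  20
#define VPR_MASK23_LENGTH 4

/* Hypothetical stand-ins for the ECI states handled by trans_VPST(). */
enum eci_state { ECI_NONE_S, ECI_A0_S, ECI_A0A1_S, ECI_A0A1A2_S, ECI_A0A1A2B0_S };

/* %mask_22_13: insn bit 22 becomes mask bit 3, insn bits [15:13] mask bits [2:0]. */
static uint32_t extract_vpst_mask(uint32_t insn)
{
    return (((insn >> 22) & 1u) << 3) | ((insn >> 13) & 7u);
}

/* Replace 'len' bits of 'base' starting at 'shift' with the low bits of 'val'. */
static uint32_t deposit32_sketch(uint32_t base, unsigned shift, unsigned len,
                                 uint32_t val)
{
    assert(len > 0 && len < 32 && shift + len <= 32);
    uint32_t field = ((1u << len) - 1u) << shift;
    return (base & ~field) | ((val << shift) & field);
}

/* Update the VPR mask fields the way trans_VPST() does for each ECI state. */
static uint32_t vpst_update(uint32_t vpr, uint32_t mask, enum eci_state eci)
{
    switch (eci) {
    case ECI_NONE_S:
    case ECI_A0_S:
        /* Beat 1 still to run: MASK01 and MASK23 are adjacent, so one deposit. */
        return deposit32_sketch(vpr, VPR_MASK01_SHIFT,
                                VPR_MASK01_LENGTH + VPR_MASK23_LENGTH,
                                mask | (mask << 4));
    default:
        /* Beat 1 already executed: leave MASK01 alone, update MASK23 only. */
        return deposit32_sketch(vpr, VPR_MASK23_SHIFT, VPR_MASK23_LENGTH, mask);
    }
}
```

For example, with ECI_A0A1 an existing MASK01 value survives, and only MASK23 takes the new mask. This is the case that the comment in trans_VPST() guards against.
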
-[PULL 12/49] virtio-mmio: improve virtio-mmio get_dev_path alog
+[PULL 38/57] target/arm: Implement MVE VQADD and VQSUB
-From: schspa <schspa@gmail.com>
+Implement the MVE VQADD and VQSUB insns, which perform saturating
 addition or subtraction of a scalar to or from each element.  Note that individual bytes of
 each result element are used or discarded according to the predicate
 mask, but FPSCR.QC is only set if the predicate mask for the lowest
 byte of the element is set.
-At the moment the following QEMU command line triggers an assertion
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-failure on the xlnx-versal SoC:
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-  qemu-system-aarch64 \
+Message-id: 20210617121628.20116-28-peter.maydell@linaro.org
-      -machine xlnx-versal-virt -nographic -smp 2 -m 128 \
+---
-      -fsdev local,id=shareid,path=${HOME}/work,security_model=none \
+ target/arm/helper-mve.h    | 16 ++++++++++
-      -device virtio-9p-device,fsdev=shareid,mount_tag=share \
+ target/arm/mve.decode      |  5 +++
-      -fsdev local,id=shareid1,path=${HOME}/Music,security_model=none \
+ target/arm/mve_helper.c    | 62 ++++++++++++++++++++++++++++++++++++++
-      -device virtio-9p-device,fsdev=shareid1,mount_tag=share1
+ target/arm/translate-mve.c |  4 +++
  4 files changed, 87 insertions(+)
-  qemu-system-aarch64: ../migration/savevm.c:860:
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
   vmstate_register_with_alias_id:
   Assertion `!se->compat || se->instance_id == 0' failed.
 This problem was fixed for the arm virt platform in commit f58b39d2d5b
 ("virtio-mmio: format transport base address in BusClass.get_dev_path").
 That fix works on the arm virt platform, but the problem remains on
 the xlnx-versal SoC.
 The main difference between arm virt and xlnx-versal is that they
 create the virtio-mmio qdev in different ways. On arm virt, the board
 calls sysbus_create_simple("virtio-mmio", base, pic[irq]), which calls
 sysbus_mmio_map() internally and so correctly assigns the base address
 to the device's MMIO region. The xlnx-versal implementation does not
 do this.
 However, xlnx-versal can't switch to sysbus_create_simple() to create
 the virtio-mmio device, because xlnx-versal's CPUs use
 VersalVirt.soc.fpd.apu.mr as their memory, which is a subregion of
 system_memory. sysbus_create_simple() would add the virtio device to
 system_memory, where the CPUs can't access it.
 Nor can xlnx-versal add a sysbus_mmio_map() call, because that would
 add the memory region to system_memory, and it can't then also be
 added to VersalVirt.soc.fpd.apu.mr.
 We can solve this by putting the correct base address offset into the
 dev_path.
 This patch was tested on the aarch64 virt and xlnx-versal platforms.
 Signed-off-by: schspa <schspa@gmail.com>
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  hw/virtio/virtio-mmio.c | 13 +++++++------
  1 file changed, 7 insertions(+), 6 deletions(-)
 diff --git a/hw/virtio/virtio-mmio.c b/hw/virtio/virtio-mmio.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/virtio/virtio-mmio.c
+--- a/target/arm/helper-mve.h
-+++ b/hw/virtio/virtio-mmio.c
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ static char *virtio_mmio_bus_get_dev_path(DeviceState *dev)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vhsubu_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-     BusState *virtio_mmio_bus;
+ DEF_HELPER_FLAGS_4(mve_vhsubu_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-     VirtIOMMIOProxy *virtio_mmio_proxy;
+ DEF_HELPER_FLAGS_4(mve_vhsubu_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-     char *proxy_path;
--    SysBusDevice *proxy_sbd;
++DEF_HELPER_FLAGS_4(mve_vqadds_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-     char *path;
++DEF_HELPER_FLAGS_4(mve_vqadds_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+    MemoryRegionSection section;
++DEF_HELPER_FLAGS_4(mve_vqadds_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
++
-     virtio_mmio_bus = qdev_get_parent_bus(dev);
++DEF_HELPER_FLAGS_4(mve_vqaddu_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-     virtio_mmio_proxy = VIRTIO_MMIO(virtio_mmio_bus->parent);
++DEF_HELPER_FLAGS_4(mve_vqaddu_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-@@ -XXX,XX +XXX,XX @@ static char *virtio_mmio_bus_get_dev_path(DeviceState *dev)
++DEF_HELPER_FLAGS_4(mve_vqaddu_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +
 +DEF_HELPER_FLAGS_4(mve_vqsubs_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqsubs_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqsubs_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +
 +DEF_HELPER_FLAGS_4(mve_vqsubu_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqsubu_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqsubu_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +
  DEF_HELPER_FLAGS_4(mve_vbrsrb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
  DEF_HELPER_FLAGS_4(mve_vbrsrh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
  DEF_HELPER_FLAGS_4(mve_vbrsrw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VHADD_S_scalar   1110 1110 0 . .. ... 0 ... 0 1111 . 100 .... @2scalar
  VHADD_U_scalar   1111 1110 0 . .. ... 0 ... 0 1111 . 100 .... @2scalar
  VHSUB_S_scalar   1110 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
  VHSUB_U_scalar   1111 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
 +
 +VQADD_S_scalar   1110 1110 0 . .. ... 0 ... 0 1111 . 110 .... @2scalar
 +VQADD_U_scalar   1111 1110 0 . .. ... 0 ... 0 1111 . 110 .... @2scalar
 +VQSUB_S_scalar   1110 1110 0 . .. ... 0 ... 1 1111 . 110 .... @2scalar
 +VQSUB_U_scalar   1111 1110 0 . .. ... 0 ... 1 1111 . 110 .... @2scalar
  VBRSR            1111 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
  # Predicate operations
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2OP_U(vhaddu, do_vhadd_u)
  DO_2OP_S(vhsubs, do_vhsub_s)
  DO_2OP_U(vhsubu, do_vhsub_u)
 +static inline int32_t do_sat_bhw(int64_t val, int64_t min, int64_t max, bool *s)
 +{
 +    if (val > max) {
 +        *s = true;
 +        return max;
 +    } else if (val < min) {
 +        *s = true;
 +        return min;
 +    }
 +    return val;
 +}
 +
 +#define DO_SQADD_B(n, m, s) do_sat_bhw((int64_t)n + m, INT8_MIN, INT8_MAX, s)
 +#define DO_SQADD_H(n, m, s) do_sat_bhw((int64_t)n + m, INT16_MIN, INT16_MAX, s)
 +#define DO_SQADD_W(n, m, s) do_sat_bhw((int64_t)n + m, INT32_MIN, INT32_MAX, s)
 +
 +#define DO_UQADD_B(n, m, s) do_sat_bhw((int64_t)n + m, 0, UINT8_MAX, s)
 +#define DO_UQADD_H(n, m, s) do_sat_bhw((int64_t)n + m, 0, UINT16_MAX, s)
 +#define DO_UQADD_W(n, m, s) do_sat_bhw((int64_t)n + m, 0, UINT32_MAX, s)
 +
 +#define DO_SQSUB_B(n, m, s) do_sat_bhw((int64_t)n - m, INT8_MIN, INT8_MAX, s)
 +#define DO_SQSUB_H(n, m, s) do_sat_bhw((int64_t)n - m, INT16_MIN, INT16_MAX, s)
 +#define DO_SQSUB_W(n, m, s) do_sat_bhw((int64_t)n - m, INT32_MIN, INT32_MAX, s)
 +
 +#define DO_UQSUB_B(n, m, s) do_sat_bhw((int64_t)n - m, 0, UINT8_MAX, s)
 +#define DO_UQSUB_H(n, m, s) do_sat_bhw((int64_t)n - m, 0, UINT16_MAX, s)
 +#define DO_UQSUB_W(n, m, s) do_sat_bhw((int64_t)n - m, 0, UINT32_MAX, s)
  #define DO_2OP_SCALAR(OP, ESIZE, TYPE, FN)                              \
      void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn,   \
@@ -XXX,XX +XXX,XX @@ DO_2OP_U(vhsubu, do_vhsub_u)
          mve_advance_vpt(env);                                           \
      }
-     /* Otherwise, we append the base address of the transport. */
++#define DO_2OP_SAT_SCALAR(OP, ESIZE, TYPE, FN)                          \
--    proxy_sbd = SYS_BUS_DEVICE(virtio_mmio_proxy);
++    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn,   \
--    assert(proxy_sbd->num_mmio == 1);
++                                uint32_t rm)                            \
--    assert(proxy_sbd->mmio[0].memory == &virtio_mmio_proxy->iomem);
++    {                                                                   \
-+    section = memory_region_find(&virtio_mmio_proxy->iomem, 0, 0x200);
++        TYPE *d = vd, *n = vn;                                          \
-+    assert(section.mr);
++        TYPE m = rm;                                                    \
++        uint16_t mask = mve_element_mask(env);                          \
-     if (proxy_path) {
++        unsigned e;                                                     \
-         path = g_strdup_printf("%s/virtio-mmio@" TARGET_FMT_plx, proxy_path,
++        bool qc = false;                                                \
--                               proxy_sbd->mmio[0].addr);
++        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
-+                               section.offset_within_address_space);
++            bool sat = false;                                           \
-     } else {
++            mergemask(&d[H##ESIZE(e)], FN(n[H##ESIZE(e)], m, &sat),     \
-         path = g_strdup_printf("virtio-mmio@" TARGET_FMT_plx,
++                      mask);                                            \
--                               proxy_sbd->mmio[0].addr);
++            qc |= sat & mask & 1;                                       \
-+                               section.offset_within_address_space);
++        }                                                               \
-     }
++        if (qc) {                                                       \
-+    memory_region_unref(section.mr);
++            env->vfp.qc[0] = qc;                                        \
 +        }                                                               \
 +        mve_advance_vpt(env);                                           \
 +    }
 +
-     g_free(proxy_path);
+ /* provide unsigned 2-op scalar helpers for all sizes */
-     return path;
+ #define DO_2OP_SCALAR_U(OP, FN)                 \
- }
+     DO_2OP_SCALAR(OP##b, 1, uint8_t, FN)        \
@@ -XXX,XX +XXX,XX @@ DO_2OP_SCALAR_U(vhaddu_scalar, do_vhadd_u)
  DO_2OP_SCALAR_S(vhsubs_scalar, do_vhsub_s)
  DO_2OP_SCALAR_U(vhsubu_scalar, do_vhsub_u)
 +DO_2OP_SAT_SCALAR(vqaddu_scalarb, 1, uint8_t, DO_UQADD_B)
 +DO_2OP_SAT_SCALAR(vqaddu_scalarh, 2, uint16_t, DO_UQADD_H)
 +DO_2OP_SAT_SCALAR(vqaddu_scalarw, 4, uint32_t, DO_UQADD_W)
 +DO_2OP_SAT_SCALAR(vqadds_scalarb, 1, int8_t, DO_SQADD_B)
 +DO_2OP_SAT_SCALAR(vqadds_scalarh, 2, int16_t, DO_SQADD_H)
 +DO_2OP_SAT_SCALAR(vqadds_scalarw, 4, int32_t, DO_SQADD_W)
 +
 +DO_2OP_SAT_SCALAR(vqsubu_scalarb, 1, uint8_t, DO_UQSUB_B)
 +DO_2OP_SAT_SCALAR(vqsubu_scalarh, 2, uint16_t, DO_UQSUB_H)
 +DO_2OP_SAT_SCALAR(vqsubu_scalarw, 4, uint32_t, DO_UQSUB_W)
 +DO_2OP_SAT_SCALAR(vqsubs_scalarb, 1, int8_t, DO_SQSUB_B)
 +DO_2OP_SAT_SCALAR(vqsubs_scalarh, 2, int16_t, DO_SQSUB_H)
 +DO_2OP_SAT_SCALAR(vqsubs_scalarw, 4, int32_t, DO_SQSUB_W)
 +
  static inline uint32_t do_vbrsrb(uint32_t n, uint32_t m)
  {
      m &= 0xff;
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_2OP_SCALAR(VHADD_S_scalar, vhadds_scalar)
  DO_2OP_SCALAR(VHADD_U_scalar, vhaddu_scalar)
  DO_2OP_SCALAR(VHSUB_S_scalar, vhsubs_scalar)
  DO_2OP_SCALAR(VHSUB_U_scalar, vhsubu_scalar)
 +DO_2OP_SCALAR(VQADD_S_scalar, vqadds_scalar)
 +DO_2OP_SCALAR(VQADD_U_scalar, vqaddu_scalar)
 +DO_2OP_SCALAR(VQSUB_S_scalar, vqsubs_scalar)
 +DO_2OP_SCALAR(VQSUB_U_scalar, vqsubu_scalar)
  DO_2OP_SCALAR(VBRSR, vbrsr)
  static bool do_long_dual_acc(DisasContext *s, arg_vmlaldav *a,
 --
 2.20.1

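The VQADD/VQSUB commit message states two rules. Predication applies byte by byte, but FPSCR.QC is set only when the predicate bit for the lowest byte of the element is set. The following is a standalone sketch of those rules for the signed 16-bit scalar case. It is not the QEMU helper: `sat_bhw` and `vqadd_s16_scalar` are hypothetical names, and a plain array with the low byte first stands in for QEMU's `mergemask()` and `H2()` machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Clamp val into [min, max], noting saturation in *s (cf. do_sat_bhw). */
static int64_t sat_bhw(int64_t val, int64_t min, int64_t max, bool *s)
{
    assert(min <= max);
    if (val > max) {
        *s = true;
        return max;
    }
    if (val < min) {
        *s = true;
        return min;
    }
    return val;
}

/*
 * Hypothetical VQADD.S16 Qd, Qn, Rm over one 128-bit vector: 8 halfword
 * lanes and a 16-bit predicate with one bit per byte (lane e owns bits
 * 2e and 2e+1, low byte first). Returns the new FPSCR.QC contribution.
 */
static bool vqadd_s16_scalar(int16_t d[8], const int16_t n[8], uint32_t rm,
                             uint16_t mask)
{
    int16_t m = (int16_t)rm;   /* only the low 16 bits of Rm are used */
    bool qc = false;

    for (int e = 0; e < 8; e++, mask >>= 2) {
        bool sat = false;
        uint16_t r = (uint16_t)sat_bhw((int64_t)n[e] + m,
                                       INT16_MIN, INT16_MAX, &sat);
        /* Expand this lane's two predicate bits into a per-byte write mask. */
        uint16_t bytes = (uint16_t)(((mask & 1) ? 0x00ffu : 0) |
                                    ((mask & 2) ? 0xff00u : 0));
        d[e] = (int16_t)(((uint16_t)d[e] & ~bytes) | (r & bytes));
        /* Saturation only reaches QC if the lane's lowest byte is active. */
        qc |= sat && (mask & 1);
    }
    return qc;
}
```

With a predicate of 0x0002, lane 0 saturates and its high byte is written. QC stays clear, however, because the lane's low-byte predicate bit is 0.
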
-[PULL 24/49] hw/display/omap_lcdc: Delete unnecessary macro
+[PULL 39/57] target/arm: Implement MVE VQDMULH and VQRDMULH (scalar)
-The macro draw_line_func is used only once; just expand it.
+Implement the MVE VQDMULH and VQRDMULH scalar insns, which multiply
 elements by the scalar, double, possibly round, take the high half
 and saturate.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Message-id: 20210617121628.20116-29-peter.maydell@linaro.org
 Message-id: 20210215103215.4944-10-peter.maydell@linaro.org
 ---
- hw/display/omap_lcdc.c | 4 +---
+ target/arm/helper-mve.h    |  8 ++++++++
- 1 file changed, 1 insertion(+), 3 deletions(-)
+ target/arm/mve.decode      |  3 +++
  target/arm/mve_helper.c    | 25 +++++++++++++++++++++++++
  target/arm/translate-mve.c |  2 ++
  4 files changed, 38 insertions(+)
-diff --git a/hw/display/omap_lcdc.c b/hw/display/omap_lcdc.c
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/display/omap_lcdc.c
+--- a/target/arm/helper-mve.h
-+++ b/hw/display/omap_lcdc.c
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ static void omap_lcd_interrupts(struct omap_lcd_panel_s *s)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vqsubu_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-     qemu_irq_lower(s->irq);
+ DEF_HELPER_FLAGS_4(mve_vqsubu_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- }
+ DEF_HELPER_FLAGS_4(mve_vqsubu_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
--#define draw_line_func drawfn
++DEF_HELPER_FLAGS_4(mve_vqdmulh_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
--
++DEF_HELPER_FLAGS_4(mve_vqdmulh_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- /*
++DEF_HELPER_FLAGS_4(mve_vqdmulh_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-  * 2-bit colour
++
-  */
++DEF_HELPER_FLAGS_4(mve_vqrdmulh_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-@@ -XXX,XX +XXX,XX @@ static void omap_update_display(void *opaque)
++DEF_HELPER_FLAGS_4(mve_vqrdmulh_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqrdmulh_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +
  DEF_HELPER_FLAGS_4(mve_vbrsrb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
  DEF_HELPER_FLAGS_4(mve_vbrsrh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
  DEF_HELPER_FLAGS_4(mve_vbrsrw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VQSUB_S_scalar   1110 1110 0 . .. ... 0 ... 1 1111 . 110 .... @2scalar
  VQSUB_U_scalar   1111 1110 0 . .. ... 0 ... 1 1111 . 110 .... @2scalar
  VBRSR            1111 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
 +VQDMULH_scalar   1110 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
 +VQRDMULH_scalar  1111 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
 +
  # Predicate operations
  %mask_22_13      22:1 13:3
  VPST             1111 1110 0 . 11 000 1 ... 0 1111 0100 1101 mask=%mask_22_13
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ static inline int32_t do_sat_bhw(int64_t val, int64_t min, int64_t max, bool *s)
  #define DO_UQSUB_H(n, m, s) do_sat_bhw((int64_t)n - m, 0, UINT16_MAX, s)
  #define DO_UQSUB_W(n, m, s) do_sat_bhw((int64_t)n - m, 0, UINT32_MAX, s)
 +/*
 + * For QDMULH and QRDMULH we simplify "double and shift by esize" into
 + * "shift by esize-1", adjusting the QRDMULH rounding constant to match.
 + */
 +#define DO_QDMULH_B(n, m, s) do_sat_bhw(((int64_t)n * m) >> 7, \
 +                                        INT8_MIN, INT8_MAX, s)
 +#define DO_QDMULH_H(n, m, s) do_sat_bhw(((int64_t)n * m) >> 15, \
 +                                        INT16_MIN, INT16_MAX, s)
 +#define DO_QDMULH_W(n, m, s) do_sat_bhw(((int64_t)n * m) >> 31, \
 +                                        INT32_MIN, INT32_MAX, s)
 +
 +#define DO_QRDMULH_B(n, m, s) do_sat_bhw(((int64_t)n * m + (1 << 6)) >> 7, \
 +                                         INT8_MIN, INT8_MAX, s)
 +#define DO_QRDMULH_H(n, m, s) do_sat_bhw(((int64_t)n * m + (1 << 14)) >> 15, \
 +                                         INT16_MIN, INT16_MAX, s)
 +#define DO_QRDMULH_W(n, m, s) do_sat_bhw(((int64_t)n * m + (1 << 30)) >> 31, \
 +                                         INT32_MIN, INT32_MAX, s)
 +
  #define DO_2OP_SCALAR(OP, ESIZE, TYPE, FN)                              \
      void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn,   \
                                  uint32_t rm)                            \
@@ -XXX,XX +XXX,XX @@ DO_2OP_SAT_SCALAR(vqsubs_scalarb, 1, int8_t, DO_SQSUB_B)
  DO_2OP_SAT_SCALAR(vqsubs_scalarh, 2, int16_t, DO_SQSUB_H)
  DO_2OP_SAT_SCALAR(vqsubs_scalarw, 4, int32_t, DO_SQSUB_W)
 +DO_2OP_SAT_SCALAR(vqdmulh_scalarb, 1, int8_t, DO_QDMULH_B)
 +DO_2OP_SAT_SCALAR(vqdmulh_scalarh, 2, int16_t, DO_QDMULH_H)
 +DO_2OP_SAT_SCALAR(vqdmulh_scalarw, 4, int32_t, DO_QDMULH_W)
 +DO_2OP_SAT_SCALAR(vqrdmulh_scalarb, 1, int8_t, DO_QRDMULH_B)
 +DO_2OP_SAT_SCALAR(vqrdmulh_scalarh, 2, int16_t, DO_QRDMULH_H)
 +DO_2OP_SAT_SCALAR(vqrdmulh_scalarw, 4, int32_t, DO_QRDMULH_W)
 +
  static inline uint32_t do_vbrsrb(uint32_t n, uint32_t m)
  {
-     struct omap_lcd_panel_s *omap_lcd = (struct omap_lcd_panel_s *) opaque;
+     m &= 0xff;
-     DisplaySurface *surface;
+diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
--    draw_line_func draw_line;
+index XXXXXXX..XXXXXXX 100644
-+    drawfn draw_line;
+--- a/target/arm/translate-mve.c
-     int size, height, first, last;
++++ b/target/arm/translate-mve.c
-     int width, linesize, step, bpp, frame_offset;
+@@ -XXX,XX +XXX,XX @@ DO_2OP_SCALAR(VQADD_S_scalar, vqadds_scalar)
-     hwaddr frame_base;
+ DO_2OP_SCALAR(VQADD_U_scalar, vqaddu_scalar)
  DO_2OP_SCALAR(VQSUB_S_scalar, vqsubs_scalar)
  DO_2OP_SCALAR(VQSUB_U_scalar, vqsubu_scalar)
 +DO_2OP_SCALAR(VQDMULH_scalar, vqdmulh_scalar)
 +DO_2OP_SCALAR(VQRDMULH_scalar, vqrdmulh_scalar)
  DO_2OP_SCALAR(VBRSR, vbrsr)
  static bool do_long_dual_acc(DisasContext *s, arg_vmlaldav *a,
 --
 2.20.1

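The comment in the QDMULH/QRDMULH patch says that "double and shift by esize" is simplified to "shift by esize-1", with the rounding constant halved to match. For 8-bit lanes this rests on the identity (2x + 2^7) >> 8 == (x + 2^6) >> 7. The following sketch checks that identity exhaustively over all int8 inputs and shows the one saturating case. The function names are hypothetical. Like QEMU, the sketch assumes that `>>` on a negative int64_t is an arithmetic shift, which is implementation-defined in ISO C but holds on mainstream compilers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural shape for 8-bit lanes: double, add 1 << 7, take the high byte. */
static int64_t qrdmulh8_arch(int8_t n, int8_t m)
{
    return (2 * (int64_t)n * m + (1 << 7)) >> 8;
}

/* Shape used by DO_QRDMULH_B: skip the doubling, shift by 7, round with 1 << 6. */
static int64_t qrdmulh8_simplified(int8_t n, int8_t m)
{
    return ((int64_t)n * m + (1 << 6)) >> 7;
}

/* Saturate the simplified result to int8, recording saturation in *sat. */
static int8_t qrdmulh8_sat(int8_t n, int8_t m, bool *sat)
{
    assert(sat);
    int64_t r = qrdmulh8_simplified(n, m);
    if (r > INT8_MAX) {
        *sat = true;
        return INT8_MAX;
    }
    if (r < INT8_MIN) {
        *sat = true;
        return INT8_MIN;
    }
    return (int8_t)r;
}

/* Count int8 input pairs where the two shapes disagree (expected: none). */
static int count_mismatches(void)
{
    int bad = 0;
    for (int n = INT8_MIN; n <= INT8_MAX; n++) {
        for (int m = INT8_MIN; m <= INT8_MAX; m++) {
            if (qrdmulh8_arch((int8_t)n, (int8_t)m) !=
                qrdmulh8_simplified((int8_t)n, (int8_t)m)) {
                bad++;
            }
        }
    }
    return bad;
}
```

The only 8-bit input pair that saturates is -128 * -128. Its product, 16384, rounds and shifts to 128, which is one more than INT8_MAX. The same argument without the rounding term covers DO_QDMULH_B.
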
-[PULL 22/49] hw/display/omap_lcdc: Fix coding style issues in template header
+[PULL 40/57] target/arm: Implement MVE VQDMULL scalar
-Fix some minor coding style issues in the template header,
+Implement the MVE VQDMULL scalar insn. This multiplies the top or
-so checkpatch doesn't complain when we move the code.
+bottom half of each element by the scalar, doubles and saturates
 to a double-width result.
 Note that this encoding overlaps with VQADD and VQSUB; it uses
 what in VQADD and VQSUB would be the 'size=0b11' encoding.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Message-id: 20210617121628.20116-30-peter.maydell@linaro.org
 Message-id: 20210215103215.4944-8-peter.maydell@linaro.org
 ---
- hw/display/omap_lcd_template.h | 6 +++---
+ target/arm/helper-mve.h    |  5 +++
- 1 file changed, 3 insertions(+), 3 deletions(-)
+ target/arm/mve.decode      | 23 +++++++++++---
+ target/arm/mve_helper.c    | 65 ++++++++++++++++++++++++++++++++++++++
-diff --git a/hw/display/omap_lcd_template.h b/hw/display/omap_lcd_template.h
+ target/arm/translate-mve.c | 30 ++++++++++++++++++
-index XXXXXXX..XXXXXXX 100644
+ 4 files changed, 119 insertions(+), 4 deletions(-)
---- a/hw/display/omap_lcd_template.h
-+++ b/hw/display/omap_lcd_template.h
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ static void draw_line2_32(void *opaque, uint8_t *d, const uint8_t *s,
+index XXXXXXX..XXXXXXX 100644
-         b = (pal[v & 3] << 4) & 0xf0;
+--- a/target/arm/helper-mve.h
-         ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
++++ b/target/arm/helper-mve.h
-         d += 4;
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vbrsrb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
--        s ++;
+ DEF_HELPER_FLAGS_4(mve_vbrsrh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+        s++;
+ DEF_HELPER_FLAGS_4(mve_vbrsrw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-         width -= 4;
-     } while (width > 0);
++DEF_HELPER_FLAGS_4(mve_vqdmullb_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- }
++DEF_HELPER_FLAGS_4(mve_vqdmullb_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-@@ -XXX,XX +XXX,XX @@ static void draw_line4_32(void *opaque, uint8_t *d, const uint8_t *s,
++DEF_HELPER_FLAGS_4(mve_vqdmullt_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-         b = (pal[v & 0xf] << 4) & 0xf0;
++DEF_HELPER_FLAGS_4(mve_vqdmullt_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-         ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
++
-         d += 4;
+ DEF_HELPER_FLAGS_4(mve_vmlaldavsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
--        s ++;
+ DEF_HELPER_FLAGS_4(mve_vmlaldavsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
-+        s++;
+ DEF_HELPER_FLAGS_4(mve_vmlaldavxsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
-         width -= 2;
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
-     } while (width > 0);
+index XXXXXXX..XXXXXXX 100644
- }
+--- a/target/arm/mve.decode
-@@ -XXX,XX +XXX,XX @@ static void draw_line8_32(void *opaque, uint8_t *d, const uint8_t *s,
++++ b/target/arm/mve.decode
-         g = pal[v] & 0xf0;
+@@ -XXX,XX +XXX,XX @@
-         b = (pal[v] << 4) & 0xf0;
+ %qm 5:1 1:3
-         ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
+ %qn 7:1 17:3
--        s ++;
-+        s++;
++# VQDMULL has size in bit 28: 0 for 16 bit, 1 for 32 bit
-         d += 4;
++%size_28 28:1 !function=plus_1
-     } while (-- width != 0);
++
- }
+ &vldr_vstr rn qd imm p a w size l u
  &1op qd qm size
  &2op qd qm qn size
@@ -XXX,XX +XXX,XX @@
  @2op_nosz .... .... .... .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn size=0
  @2scalar .... .... .. size:2 .... .... .... .... rm:4 &2scalar qd=%qd qn=%qn
 +@2scalar_nosz .... .... .... .... .... .... .... rm:4 &2scalar qd=%qd qn=%qn
  # Vector loads and stores
@@ -XXX,XX +XXX,XX @@ VHADD_U_scalar   1111 1110 0 . .. ... 0 ... 0 1111 . 100 .... @2scalar
  VHSUB_S_scalar   1110 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
  VHSUB_U_scalar   1111 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
 -VQADD_S_scalar   1110 1110 0 . .. ... 0 ... 0 1111 . 110 .... @2scalar
 -VQADD_U_scalar   1111 1110 0 . .. ... 0 ... 0 1111 . 110 .... @2scalar
 -VQSUB_S_scalar   1110 1110 0 . .. ... 0 ... 1 1111 . 110 .... @2scalar
 -VQSUB_U_scalar   1111 1110 0 . .. ... 0 ... 1 1111 . 110 .... @2scalar
 +{
 +  VQADD_S_scalar  1110  1110 0 . .. ... 0 ... 0 1111 . 110 .... @2scalar
 +  VQADD_U_scalar  1111  1110 0 . .. ... 0 ... 0 1111 . 110 .... @2scalar
 +  VQDMULLB_scalar 111 . 1110 0 . 11 ... 0 ... 0 1111 . 110 .... @2scalar_nosz \
 +                  size=%size_28
 +}
 +
 +{
 +  VQSUB_S_scalar  1110  1110 0 . .. ... 0 ... 1 1111 . 110 .... @2scalar
 +  VQSUB_U_scalar  1111  1110 0 . .. ... 0 ... 1 1111 . 110 .... @2scalar
 +  VQDMULLT_scalar 111 . 1110 0 . 11 ... 0 ... 1 1111 . 110 .... @2scalar_nosz \
 +                  size=%size_28
 +}
 +
  VBRSR            1111 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
  VQDMULH_scalar   1110 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
  VQRDMULH_scalar  1111 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
 +
  # Predicate operations
  %mask_22_13      22:1 13:3
  VPST             1111 1110 0 . 11 000 1 ... 0 1111 0100 1101 mask=%mask_22_13
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2OP_SAT_SCALAR(vqrdmulh_scalarb, 1, int8_t, DO_QRDMULH_B)
  DO_2OP_SAT_SCALAR(vqrdmulh_scalarh, 2, int16_t, DO_QRDMULH_H)
  DO_2OP_SAT_SCALAR(vqrdmulh_scalarw, 4, int32_t, DO_QRDMULH_W)
 +/*
 + * Long saturating scalar ops. As with DO_2OP_L, TYPE and H are for the
 + * input (smaller) type and LESIZE, LTYPE, LH for the output (long) type.
 + * SATMASK specifies which bits of the predicate mask matter for determining
 + * whether to propagate a saturation indication into FPSCR.QC -- for
 + * the 16x16->32 case we must check only the bit corresponding to the T or B
 + * half that we used, but for the 32x32->64 case we propagate if the mask
 + * bit is set for either half.
 + */
 +#define DO_2OP_SAT_SCALAR_L(OP, TOP, ESIZE, TYPE, LESIZE, LTYPE, FN, SATMASK) \
 +    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn,   \
 +                                uint32_t rm)                            \
 +    {                                                                   \
 +        LTYPE *d = vd;                                                  \
 +        TYPE *n = vn;                                                   \
 +        TYPE m = rm;                                                    \
 +        uint16_t mask = mve_element_mask(env);                          \
 +        unsigned le;                                                    \
 +        bool qc = false;                                                \
 +        for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) {         \
 +            bool sat = false;                                           \
 +            LTYPE r = FN((LTYPE)n[H##ESIZE(le * 2 + TOP)], m, &sat);    \
 +            mergemask(&d[H##LESIZE(le)], r, mask);                      \
 +            qc |= sat && (mask & SATMASK);                              \
 +        }                                                               \
 +        if (qc) {                                                       \
 +            env->vfp.qc[0] = qc;                                        \
 +        }                                                               \
 +        mve_advance_vpt(env);                                           \
 +    }
 +
 +static inline int32_t do_qdmullh(int16_t n, int16_t m, bool *sat)
 +{
 +    int64_t r = ((int64_t)n * m) * 2;
 +    return do_sat_bhw(r, INT32_MIN, INT32_MAX, sat);
 +}
 +
 +static inline int64_t do_qdmullw(int32_t n, int32_t m, bool *sat)
 +{
 +    /* The multiply can't overflow, but the doubling might */
 +    int64_t r = (int64_t)n * m;
 +    if (r > INT64_MAX / 2) {
 +        *sat = true;
 +        return INT64_MAX;
 +    } else if (r < INT64_MIN / 2) {
 +        *sat = true;
 +        return INT64_MIN;
 +    } else {
 +        return r * 2;
 +    }
 +}
 +
 +#define SATMASK16B 1
 +#define SATMASK16T (1 << 2)
 +#define SATMASK32 ((1 << 4) | 1)
 +
 +DO_2OP_SAT_SCALAR_L(vqdmullb_scalarh, 0, 2, int16_t, 4, int32_t, \
 +                    do_qdmullh, SATMASK16B)
 +DO_2OP_SAT_SCALAR_L(vqdmullb_scalarw, 0, 4, int32_t, 8, int64_t, \
 +                    do_qdmullw, SATMASK32)
 +DO_2OP_SAT_SCALAR_L(vqdmullt_scalarh, 1, 2, int16_t, 4, int32_t, \
 +                    do_qdmullh, SATMASK16T)
 +DO_2OP_SAT_SCALAR_L(vqdmullt_scalarw, 1, 4, int32_t, 8, int64_t, \
 +                    do_qdmullw, SATMASK32)
 +
  static inline uint32_t do_vbrsrb(uint32_t n, uint32_t m)
  {
      m &= 0xff;
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_2OP_SCALAR(VQDMULH_scalar, vqdmulh_scalar)
  DO_2OP_SCALAR(VQRDMULH_scalar, vqrdmulh_scalar)
  DO_2OP_SCALAR(VBRSR, vbrsr)
 +static bool trans_VQDMULLB_scalar(DisasContext *s, arg_2scalar *a)
 +{
 +    static MVEGenTwoOpScalarFn * const fns[] = {
 +        NULL,
 +        gen_helper_mve_vqdmullb_scalarh,
 +        gen_helper_mve_vqdmullb_scalarw,
 +        NULL,
 +    };
 +    if (a->qd == a->qn && a->size == MO_32) {
 +        /* UNPREDICTABLE; we choose to undef */
 +        return false;
 +    }
 +    return do_2op_scalar(s, a, fns[a->size]);
 +}
 +
 +static bool trans_VQDMULLT_scalar(DisasContext *s, arg_2scalar *a)
 +{
 +    static MVEGenTwoOpScalarFn * const fns[] = {
 +        NULL,
 +        gen_helper_mve_vqdmullt_scalarh,
 +        gen_helper_mve_vqdmullt_scalarw,
 +        NULL,
 +    };
 +    if (a->qd == a->qn && a->size == MO_32) {
 +        /* UNPREDICTABLE; we choose to undef */
 +        return false;
 +    }
 +    return do_2op_scalar(s, a, fns[a->size]);
 +}
 +
  static bool do_long_dual_acc(DisasContext *s, arg_vmlaldav *a,
                               MVEGenDualAccOpFn *fn)
  {
 --
 2.20.1

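Two details of the VQDMULL scalar patch can be checked in isolation. First, in do_qdmullw() the 32x32 multiply cannot overflow an int64_t, but the doubling can. Second, `%size_28 28:1 !function=plus_1` maps insn bit 28 onto MO_16 (1) or MO_32 (2), which is how trans_VQDMULLB_scalar() and trans_VQDMULLT_scalar() index their `fns[]` tables. The following is a hedged standalone sketch of both details. `qdmull32` and `vqdmull_size` are hypothetical names, and the MO_16/MO_32 values assume QEMU's MemOp numbering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Same logic as do_qdmullw(): a 32x32 product always fits in int64_t,
 * so only the doubling needs an overflow check.
 */
static int64_t qdmull32(int32_t n, int32_t m, bool *sat)
{
    assert(sat);
    int64_t r = (int64_t)n * m;
    if (r > INT64_MAX / 2) {
        *sat = true;
        return INT64_MAX;
    }
    if (r < INT64_MIN / 2) {
        *sat = true;
        return INT64_MIN;
    }
    return r * 2;
}

/* %size_28 with !function=plus_1: bit 28 clear -> 1 (MO_16), set -> 2 (MO_32). */
static int vqdmull_size(uint32_t insn)
{
    return (int)((insn >> 28) & 1u) + 1;
}
```

Only INT32_MIN * INT32_MIN saturates: its product, 2^62, would double to 2^63, which exceeds INT64_MAX. Every other product doubles without overflow, including INT32_MIN * INT32_MAX.
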
-[PULL 10/49] target/arm: Speed up aarch64 TBL/TBX
+[PULL 41/57] target/arm: Implement MVE VQDMULH, VQRDMULH (vector)
-From: Richard Henderson <richard.henderson@linaro.org>
+Implement the vector forms of the MVE VQDMULH and VQRDMULH insns.
-Always perform one call instead of two for 16-byte operands.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Use byte loads/stores directly into the vector register file
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-instead of extractions and deposits to a 64-bit local variable.
+Message-id: 20210617121628.20116-31-peter.maydell@linaro.org
 ---
  target/arm/helper-mve.h    |  8 ++++++++
  target/arm/mve.decode      |  3 +++
  target/arm/mve_helper.c    | 27 +++++++++++++++++++++++++++
  target/arm/translate-mve.c |  2 ++
 4 files changed, 40 insertions(+)
-In order to easily receive pointers into the vector register file,
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 convert the helper to the gvec out-of-line signature.  Move the
 helper into vec_helper.c, where it can make use of H1 and clear_tail.
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
 Tested-by: Alex Bennée <alex.bennee@linaro.org>
 Message-id: 20210224230532.276878-1-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  target/arm/helper-a64.h    |  2 +-
  target/arm/helper-a64.c    | 32 ---------------------
  target/arm/translate-a64.c | 58 +++++---------------------------------
  target/arm/vec_helper.c    | 48 +++++++++++++++++++++++++++++++
 4 files changed, 56 insertions(+), 84 deletions(-)
 diff --git a/target/arm/helper-a64.h b/target/arm/helper-a64.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper-a64.h
+--- a/target/arm/helper-mve.h
-+++ b/target/arm/helper-a64.h
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(vfp_cmps_a64, i64, f32, f32, ptr)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vmulltub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
- DEF_HELPER_3(vfp_cmpes_a64, i64, f32, f32, ptr)
+ DEF_HELPER_FLAGS_4(mve_vmulltuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
- DEF_HELPER_3(vfp_cmpd_a64, i64, f64, f64, ptr)
+ DEF_HELPER_FLAGS_4(mve_vmulltuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
- DEF_HELPER_3(vfp_cmped_a64, i64, f64, f64, ptr)
--DEF_HELPER_FLAGS_5(simd_tbl, TCG_CALL_NO_RWG_SE, i64, env, i64, i64, i32, i32)
++DEF_HELPER_FLAGS_4(mve_vqdmulhb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-+DEF_HELPER_FLAGS_4(simd_tblx, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
++DEF_HELPER_FLAGS_4(mve_vqdmulhh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
- DEF_HELPER_FLAGS_3(vfp_mulxs, TCG_CALL_NO_RWG, f32, f32, f32, ptr)
++DEF_HELPER_FLAGS_4(mve_vqdmulhw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
- DEF_HELPER_FLAGS_3(vfp_mulxd, TCG_CALL_NO_RWG, f64, f64, f64, ptr)
++
- DEF_HELPER_FLAGS_3(neon_ceq_f64, TCG_CALL_NO_RWG, i64, i64, i64, ptr)
++DEF_HELPER_FLAGS_4(mve_vqrdmulhb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
++DEF_HELPER_FLAGS_4(mve_vqrdmulhh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +DEF_HELPER_FLAGS_4(mve_vqrdmulhw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +
  DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
  DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
  DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper-a64.c
+--- a/target/arm/mve.decode
-+++ b/target/arm/helper-a64.c
++++ b/target/arm/mve.decode
-@@ -XXX,XX +XXX,XX @@ float64 HELPER(vfp_mulxd)(float64 a, float64 b, void *fpstp)
+@@ -XXX,XX +XXX,XX @@ VMULL_BU         111 1 1110 0 . .. ... 1 ... 0 1110 . 0 . 0 ... 0 @2op
-     return float64_mul(a, b, fpst);
+ VMULL_TS         111 0 1110 0 . .. ... 1 ... 1 1110 . 0 . 0 ... 0 @2op
- }
+ VMULL_TU         111 1 1110 0 . .. ... 1 ... 1 1110 . 0 . 0 ... 0 @2op
--uint64_t HELPER(simd_tbl)(CPUARMState *env, uint64_t result, uint64_t indices,
++VQDMULH          1110 1111 0 . .. ... 0 ... 0 1011 . 1 . 0 ... 0 @2op
--                          uint32_t rn, uint32_t numregs)
++VQRDMULH         1111 1111 0 . .. ... 0 ... 0 1011 . 1 . 0 ... 0 @2op
--{
++
--    /* Helper function for SIMD TBL and TBX. We have to do the table
+ # Vector miscellaneous
--     * lookup part for the 64 bits worth of indices we're passed in.
--     * result is the initial results vector (either zeroes for TBL
+ VCLS             1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
--     * or some guest values for TBX), rn the register number where
+diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 -     * the table starts, and numregs the number of registers in the table.
 -     * We return the results of the lookups.
 -     */
 -    int shift;
 -
 -    for (shift = 0; shift < 64; shift += 8) {
 -        int index = extract64(indices, shift, 8);
 -        if (index < 16 * numregs) {
 -            /* Convert index (a byte offset into the virtual table
 -             * which is a series of 128-bit vectors concatenated)
 -             * into the correct register element plus a bit offset
 -             * into that element, bearing in mind that the table
 -             * can wrap around from V31 to V0.
 -             */
 -            int elt = (rn * 2 + (index >> 3)) % 64;
 -            int bitidx = (index & 7) * 8;
 -            uint64_t *q = aa64_vfp_qreg(env, elt >> 1);
 -            uint64_t val = extract64(q[elt & 1], bitidx, 8);
 -
 -            result = deposit64(result, shift, 8, val);
 -        }
 -    }
 -    return result;
 -}
 -
  /* 64bit/double versions of the neon float compare functions */
  uint64_t HELPER(neon_ceq_f64)(float64 a, float64 b, void *fpstp)
  {
 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-a64.c
+--- a/target/arm/mve_helper.c
-+++ b/target/arm/translate-a64.c
++++ b/target/arm/mve_helper.c
-@@ -XXX,XX +XXX,XX @@ static void disas_simd_tb(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ DO_1OP(vfnegs, 8, uint64_t, DO_FNEGS)
-     int rm = extract32(insn, 16, 5);
+         mve_advance_vpt(env);                                           \
      int rn = extract32(insn, 5, 5);
      int rd = extract32(insn, 0, 5);
 -    int is_tblx = extract32(insn, 12, 1);
 -    int len = extract32(insn, 13, 2);
 -    TCGv_i64 tcg_resl, tcg_resh, tcg_idx;
 -    TCGv_i32 tcg_regno, tcg_numregs;
 +    int is_tbx = extract32(insn, 12, 1);
 +    int len = (extract32(insn, 13, 2) + 1) * 16;
      if (op2 != 0) {
          unallocated_encoding(s);
@@ -XXX,XX +XXX,XX @@ static void disas_simd_tb(DisasContext *s, uint32_t insn)
          return;
      }
--    /* This does a table lookup: for every byte element in the input
++#define DO_2OP_SAT(OP, ESIZE, TYPE, FN)                                 \
--     * we index into a table formed from up to four vector registers,
++    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn, void *vm) \
--     * and then the output is the result of the lookups. Our helper
++    {                                                                   \
--     * function does the lookup operation for a single 64 bit part of
++        TYPE *d = vd, *n = vn, *m = vm;                                 \
--     * the input.
++        uint16_t mask = mve_element_mask(env);                          \
--     */
++        unsigned e;                                                     \
--    tcg_resl = tcg_temp_new_i64();
++        bool qc = false;                                                \
--    tcg_resh = NULL;
++        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
--
++            bool sat = false;                                           \
--    if (is_tblx) {
++            TYPE r = FN(n[H##ESIZE(e)], m[H##ESIZE(e)], &sat);          \
--        read_vec_element(s, tcg_resl, rd, 0, MO_64);
++            mergemask(&d[H##ESIZE(e)], r, mask);                        \
--    } else {
++            qc |= sat & mask & 1;                                       \
--        tcg_gen_movi_i64(tcg_resl, 0);
++        }                                                               \
--    }
++        if (qc) {                                                       \
--
++            env->vfp.qc[0] = qc;                                        \
--    if (is_q) {
++        }                                                               \
--        tcg_resh = tcg_temp_new_i64();
++        mve_advance_vpt(env);                                           \
 -        if (is_tblx) {
 -            read_vec_element(s, tcg_resh, rd, 1, MO_64);
 -        } else {
 -            tcg_gen_movi_i64(tcg_resh, 0);
 -        }
 -    }
 -
 -    tcg_idx = tcg_temp_new_i64();
 -    tcg_regno = tcg_const_i32(rn);
 -    tcg_numregs = tcg_const_i32(len + 1);
 -    read_vec_element(s, tcg_idx, rm, 0, MO_64);
 -    gen_helper_simd_tbl(tcg_resl, cpu_env, tcg_resl, tcg_idx,
 -                        tcg_regno, tcg_numregs);
 -    if (is_q) {
 -        read_vec_element(s, tcg_idx, rm, 1, MO_64);
 -        gen_helper_simd_tbl(tcg_resh, cpu_env, tcg_resh, tcg_idx,
 -                            tcg_regno, tcg_numregs);
 -    }
 -    tcg_temp_free_i64(tcg_idx);
 -    tcg_temp_free_i32(tcg_regno);
 -    tcg_temp_free_i32(tcg_numregs);
 -
 -    write_vec_element(s, tcg_resl, rd, 0, MO_64);
 -    tcg_temp_free_i64(tcg_resl);
 -
 -    if (is_q) {
 -        write_vec_element(s, tcg_resh, rd, 1, MO_64);
 -        tcg_temp_free_i64(tcg_resh);
 -    }
 -    clear_vec_high(s, is_q, rd);
 +    tcg_gen_gvec_2_ptr(vec_full_reg_offset(s, rd),
 +                       vec_full_reg_offset(s, rm), cpu_env,
 +                       is_q ? 16 : 8, vec_full_reg_size(s),
 +                       (len << 6) | (is_tbx << 5) | rn,
 +                       gen_helper_simd_tblx);
  }
  /* ZIP/UZP/TRN
 diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vec_helper.c
 +++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ DO_VRINT_RMODE(gvec_vrint_rm_h, helper_rinth, uint16_t)
  DO_VRINT_RMODE(gvec_vrint_rm_s, helper_rints, uint32_t)
  #undef DO_VRINT_RMODE
 +
 +#ifdef TARGET_AARCH64
 +void HELPER(simd_tblx)(void *vd, void *vm, void *venv, uint32_t desc)
 +{
 +    const uint8_t *indices = vm;
 +    CPUARMState *env = venv;
 +    size_t oprsz = simd_oprsz(desc);
 +    uint32_t rn = extract32(desc, SIMD_DATA_SHIFT, 5);
 +    bool is_tbx = extract32(desc, SIMD_DATA_SHIFT + 5, 1);
 +    uint32_t table_len = desc >> (SIMD_DATA_SHIFT + 6);
 +    union {
 +        uint8_t b[16];
 +        uint64_t d[2];
 +    } result;
 +
 +    /*
 +     * We must construct the final result in a temp, lest the output
 +     * overlaps the input table.  For TBL, begin with zero; for TBX,
 +     * begin with the original register contents.  Note that we always
 +     * copy 16 bytes here to avoid an extra branch; clearing the high
 +     * bits of the register for oprsz == 8 is handled below.
 +     */
 +    if (is_tbx) {
 +        memcpy(&result, vd, 16);
 +    } else {
 +        memset(&result, 0, 16);
 +    }
 +
-+    for (size_t i = 0; i < oprsz; ++i) {
+ #define DO_AND(N, M)  ((N) & (M))
-+        uint32_t index = indices[H1(i)];
+ #define DO_BIC(N, M)  ((N) & ~(M))
  #define DO_ORR(N, M)  ((N) | (M))
@@ -XXX,XX +XXX,XX @@ static inline int32_t do_sat_bhw(int64_t val, int64_t min, int64_t max, bool *s)
  #define DO_QRDMULH_W(n, m, s) do_sat_bhw(((int64_t)n * m + (1 << 30)) >> 31, \
                                           INT32_MIN, INT32_MAX, s)
 +DO_2OP_SAT(vqdmulhb, 1, int8_t, DO_QDMULH_B)
 +DO_2OP_SAT(vqdmulhh, 2, int16_t, DO_QDMULH_H)
 +DO_2OP_SAT(vqdmulhw, 4, int32_t, DO_QDMULH_W)
 +
-+        if (index < table_len) {
++DO_2OP_SAT(vqrdmulhb, 1, int8_t, DO_QRDMULH_B)
-+            /*
++DO_2OP_SAT(vqrdmulhh, 2, int16_t, DO_QRDMULH_H)
-+             * Convert index (a byte offset into the virtual table
++DO_2OP_SAT(vqrdmulhw, 4, int32_t, DO_QRDMULH_W)
 +             * which is a series of 128-bit vectors concatenated)
 +             * into the correct register element, bearing in mind
 +             * that the table can wrap around from V31 to V0.
 +             */
 +            const uint8_t *table = (const uint8_t *)
 +                aa64_vfp_qreg(env, (rn + (index >> 4)) % 32);
 +            result.b[H1(i)] = table[H1(index % 16)];
 +        }
 +    }
 +
-+    memcpy(vd, &result, 16);
+ #define DO_2OP_SCALAR(OP, ESIZE, TYPE, FN)                              \
-+    clear_tail(vd, oprsz, simd_maxsz(desc));
+     void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn,   \
-+}
+                                 uint32_t rm)                            \
-+#endif
+diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_2OP(VMULL_BS, vmullbs)
  DO_2OP(VMULL_BU, vmullbu)
  DO_2OP(VMULL_TS, vmullts)
  DO_2OP(VMULL_TU, vmulltu)
 +DO_2OP(VQDMULH, vqdmulh)
 +DO_2OP(VQRDMULH, vqrdmulh)
  static bool do_2op_scalar(DisasContext *s, arg_2scalar *a,
                            MVEGenTwoOpScalarFn fn)
 --
 .20.1
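Patch 41 reuses DO_2OP_SAT, which runs FN on each lane, merges the result under the predicate mask and ORs the saturation flags of active lanes into QC. Below is a minimal C sketch of the same shape for 16-bit lanes. The Q15 formulas follow the pattern of DO_QRDMULH_W shown above ((n * m + (1 << 30)) >> 31 for 32-bit lanes), scaled down to 16 bits. The helper names and the simplified per-lane predicate test are assumptions for illustration, not QEMU's DO_QDMULH_H/DO_QRDMULH_H.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Clamp a wide intermediate into int16_t, recording saturation in *sat. */
static int16_t sat16(int64_t v, bool *sat)
{
    if (v > INT16_MAX) {
        *sat = true;
        return INT16_MAX;
    }
    if (v < INT16_MIN) {
        *sat = true;
        return INT16_MIN;
    }
    return (int16_t)v;
}

/*
 * VQDMULH on Q15 values: the high half of 2 * n * m, which equals
 * (n * m) >> 15. Right-shifting a negative int64_t is
 * implementation-defined in C but arithmetic on mainstream compilers.
 */
static int16_t qdmulh16(int16_t n, int16_t m, bool *sat)
{
    return sat16(((int64_t)n * m) >> 15, sat);
}

/* VQRDMULH: the same, but add half an output LSB before shifting. */
static int16_t qrdmulh16(int16_t n, int16_t m, bool *sat)
{
    return sat16(((int64_t)n * m + (1 << 14)) >> 15, sat);
}

/*
 * One vector op over eight int16_t lanes in the style of DO_2OP_SAT.
 * 'mask' holds one predicate bit per byte; for simplicity a lane here
 * counts as active when the bit for its low byte is set, whereas
 * QEMU's mergemask() merges byte by byte. Returns true if an active
 * lane saturated (QC would be set).
 */
static bool vqdmulh_vec16(int16_t d[8], const int16_t n[8],
                          const int16_t m[8], uint16_t mask, bool round)
{
    bool qc = false;

    for (int e = 0; e < 8; e++, mask >>= 2) {
        bool sat = false;
        int16_t r = round ? qrdmulh16(n[e], m[e], &sat)
                          : qdmulh16(n[e], m[e], &sat);
        if (mask & 1) {
            d[e] = r;
            qc |= sat;
        }
    }
    return qc;
}
```

As with the real helpers, saturation is only possible when both inputs are the most negative value, because that is the only product whose doubled value does not fit.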

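The old [PULL 10/49] replaces the pair of 64-bit simd_tbl calls with a single gvec helper, simd_tblx, that reads and writes the vector register file directly. The lookup semantics it implements can be sketched in standalone C as follows. The VRegs type and tblx() function are hypothetical: the sketch stores register bytes in architectural order, so it needs none of QEMU's H1() adjustments, and it takes the table length as a register count instead of decoding it from the simd_data() descriptor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical register file: V0..V31, 16 bytes each, architectural order. */
typedef struct {
    uint8_t v[32][16];
} VRegs;

/*
 * TBL/TBX sketch. Each of the 'oprsz' (8 or 16) index bytes in Vm
 * selects a byte from a table of 'nregs' (1 to 4) consecutive
 * registers starting at Vn, wrapping from V31 to V0. Out-of-range
 * indices give 0 for TBL and keep the old destination byte for TBX.
 */
static void tblx(VRegs *regs, int rd, int rm, int rn, int nregs,
                 bool is_tbx, int oprsz)
{
    const uint8_t *indices = regs->v[rm];
    unsigned table_len = 16u * (unsigned)nregs;
    uint8_t result[16];

    /* Build the result in a temporary: Vd may overlap the table or Vm. */
    if (is_tbx) {
        memcpy(result, regs->v[rd], sizeof(result));
    } else {
        memset(result, 0, sizeof(result));
    }

    for (int i = 0; i < oprsz; i++) {
        unsigned index = indices[i];
        if (index < table_len) {
            result[i] = regs->v[(rn + index / 16) % 32][index % 16];
        }
    }

    /* A 64-bit (oprsz == 8) form zeroes the high half of Vd. */
    if (oprsz == 8) {
        memset(result + 8, 0, 8);
    }
    memcpy(regs->v[rd], result, sizeof(result));
}
```

Building the result in a temporary mirrors the comment in the new helper: without it, writing Vd early could corrupt table entries that later indices still need.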
-[PULL 21/49] hw/display/omap_lcdc: Drop broken bigendian ifdef
+[PULL 42/57] target/arm: Implement MVE VQADD, VQSUB (vector)
-The draw_line16_32() function in the omap_lcdc template header
+Implement the vector forms of the MVE VQADD and VQSUB insns.
 includes an ifdef for the case where HOST_WORDS_BIGENDIAN matches
 TARGET_WORDS_BIGENDIAN.  This is trying to optimise for "source
 bitmap and destination bitmap format match", but it is broken,
 because in this function the formats don't match: the source is
 16-bit colour and the destination is 32-bit colour, so a memcpy()
 will produce corrupted graphics output.  Drop the bogus ifdef.
-This bug was introduced in commit ea644cf343129, when we dropped
-support for DEPTH values other than 32 from the template header.
-The old #if line was
-  #if DEPTH == 16 && defined(HOST_WORDS_BIGENDIAN) == defined(TARGET_WORDS_BIGENDIAN)
-and this was mistakenly changed to
-  #if defined(HOST_WORDS_BIGENDIAN) == defined(TARGET_WORDS_BIGENDIAN)
-rather than deleting the #if as now having an always-false condition.
-Fixes: ea644cf343129
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Message-id: 20210617121628.20116-32-peter.maydell@linaro.org
 Message-id: 20210215103215.4944-7-peter.maydell@linaro.org
 ---
- hw/display/omap_lcd_template.h | 4 ----
+ target/arm/helper-mve.h    | 16 ++++++++++++++++
- 1 file changed, 4 deletions(-)
+ target/arm/mve.decode      |  5 +++++
  target/arm/mve_helper.c    | 14 ++++++++++++++
  target/arm/translate-mve.c |  4 ++++
 4 files changed, 39 insertions(+)
-diff --git a/hw/display/omap_lcd_template.h b/hw/display/omap_lcd_template.h
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/display/omap_lcd_template.h
+--- a/target/arm/helper-mve.h
-+++ b/hw/display/omap_lcd_template.h
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ static void draw_line12_32(void *opaque, uint8_t *d, const uint8_t *s,
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vqrdmulhb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
- static void draw_line16_32(void *opaque, uint8_t *d, const uint8_t *s,
+ DEF_HELPER_FLAGS_4(mve_vqrdmulhh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-                            int width, int deststep)
+ DEF_HELPER_FLAGS_4(mve_vqrdmulhw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
- {
--#if defined(HOST_WORDS_BIGENDIAN) == defined(TARGET_WORDS_BIGENDIAN)
++DEF_HELPER_FLAGS_4(mve_vqaddsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
--    memcpy(d, s, width * 2);
++DEF_HELPER_FLAGS_4(mve_vqaddsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
--#else
++DEF_HELPER_FLAGS_4(mve_vqaddsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-     uint16_t v;
++
-     uint8_t r, g, b;
++DEF_HELPER_FLAGS_4(mve_vqaddub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
++DEF_HELPER_FLAGS_4(mve_vqadduh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-@@ -XXX,XX +XXX,XX @@ static void draw_line16_32(void *opaque, uint8_t *d, const uint8_t *s,
++DEF_HELPER_FLAGS_4(mve_vqadduw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-         s += 2;
++
-         d += 4;
++DEF_HELPER_FLAGS_4(mve_vqsubsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-     } while (-- width != 0);
++DEF_HELPER_FLAGS_4(mve_vqsubsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
--#endif
++DEF_HELPER_FLAGS_4(mve_vqsubsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
- }
++
 +DEF_HELPER_FLAGS_4(mve_vqsubub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +DEF_HELPER_FLAGS_4(mve_vqsubuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +DEF_HELPER_FLAGS_4(mve_vqsubuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +
  DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
  DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
  DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VMULL_TU         111 1 1110 0 . .. ... 1 ... 1 1110 . 0 . 0 ... 0 @2op
  VQDMULH          1110 1111 0 . .. ... 0 ... 0 1011 . 1 . 0 ... 0 @2op
  VQRDMULH         1111 1111 0 . .. ... 0 ... 0 1011 . 1 . 0 ... 0 @2op
 +VQADD_S          111 0 1111 0 . .. ... 0 ... 0 0000 . 1 . 1 ... 0 @2op
 +VQADD_U          111 1 1111 0 . .. ... 0 ... 0 0000 . 1 . 1 ... 0 @2op
 +VQSUB_S          111 0 1111 0 . .. ... 0 ... 0 0010 . 1 . 1 ... 0 @2op
 +VQSUB_U          111 1 1111 0 . .. ... 0 ... 0 0010 . 1 . 1 ... 0 @2op
 +
  # Vector miscellaneous
  VCLS             1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2OP_SAT(vqrdmulhb, 1, int8_t, DO_QRDMULH_B)
  DO_2OP_SAT(vqrdmulhh, 2, int16_t, DO_QRDMULH_H)
  DO_2OP_SAT(vqrdmulhw, 4, int32_t, DO_QRDMULH_W)
 +DO_2OP_SAT(vqaddub, 1, uint8_t, DO_UQADD_B)
 +DO_2OP_SAT(vqadduh, 2, uint16_t, DO_UQADD_H)
 +DO_2OP_SAT(vqadduw, 4, uint32_t, DO_UQADD_W)
 +DO_2OP_SAT(vqaddsb, 1, int8_t, DO_SQADD_B)
 +DO_2OP_SAT(vqaddsh, 2, int16_t, DO_SQADD_H)
 +DO_2OP_SAT(vqaddsw, 4, int32_t, DO_SQADD_W)
 +
 +DO_2OP_SAT(vqsubub, 1, uint8_t, DO_UQSUB_B)
 +DO_2OP_SAT(vqsubuh, 2, uint16_t, DO_UQSUB_H)
 +DO_2OP_SAT(vqsubuw, 4, uint32_t, DO_UQSUB_W)
 +DO_2OP_SAT(vqsubsb, 1, int8_t, DO_SQSUB_B)
 +DO_2OP_SAT(vqsubsh, 2, int16_t, DO_SQSUB_H)
 +DO_2OP_SAT(vqsubsw, 4, int32_t, DO_SQSUB_W)
 +
  #define DO_2OP_SCALAR(OP, ESIZE, TYPE, FN)                              \
      void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn,   \
                                  uint32_t rm)                            \
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_2OP(VMULL_TS, vmullts)
  DO_2OP(VMULL_TU, vmulltu)
  DO_2OP(VQDMULH, vqdmulh)
  DO_2OP(VQRDMULH, vqrdmulh)
 +DO_2OP(VQADD_S, vqadds)
 +DO_2OP(VQADD_U, vqaddu)
 +DO_2OP(VQSUB_S, vqsubs)
 +DO_2OP(VQSUB_U, vqsubu)
  static bool do_2op_scalar(DisasContext *s, arg_2scalar *a,
                            MVEGenTwoOpScalarFn fn)
 --
 .20.1
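The DO_UQADD_*, DO_SQADD_*, DO_UQSUB_* and DO_SQSUB_* operations used by patch 42 are defined earlier in mve_helper.c, outside this excerpt (the 14 added lines there are only the DO_2OP_SAT invocations). As a minimal sketch of the saturating arithmetic they stand for, here are hypothetical 8-bit C versions that compute the exact result in int, clamp it to the lane's range, and report saturation through a bool *, the shape DO_2OP_SAT expects of FN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Clamp an int result into [lo, hi], recording saturation in *sat. */
static int clamp_sat(int v, int lo, int hi, bool *sat)
{
    if (v > hi) {
        *sat = true;
        return hi;
    }
    if (v < lo) {
        *sat = true;
        return lo;
    }
    return v;
}

/*
 * Hypothetical 8-bit saturating add/subtract. Integer promotion widens
 * the operands to int, so the exact result is computed first and then
 * clamped to the lane's range.
 */
static uint8_t uqadd8(uint8_t a, uint8_t b, bool *sat)
{
    return (uint8_t)clamp_sat(a + b, 0, UINT8_MAX, sat);
}

static uint8_t uqsub8(uint8_t a, uint8_t b, bool *sat)
{
    return (uint8_t)clamp_sat(a - b, 0, UINT8_MAX, sat);
}

static int8_t sqadd8(int8_t a, int8_t b, bool *sat)
{
    return (int8_t)clamp_sat(a + b, INT8_MIN, INT8_MAX, sat);
}

static int8_t sqsub8(int8_t a, int8_t b, bool *sat)
{
    return (int8_t)clamp_sat(a - b, INT8_MIN, INT8_MAX, sat);
}
```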

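The old [PULL 21/49] commit message explains that draw_line16_32() must convert rather than copy: each 16-bit source pixel becomes a 32-bit destination pixel, so memcpy(d, s, width * 2) fills only half of the destination line, with raw 16-bit data that is then read back as 32-bit pixels. Below is a minimal C sketch of a per-pixel conversion, assuming an RGB565 source already in host byte order and a 0x00RRGGBB destination. The function names are hypothetical, and the bit-replication scheme is one common choice, not necessarily the one omap_lcd_template.h uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expand one RGB565 pixel to 0x00RRGGBB by replicating the high bits. */
static uint32_t rgb565_to_xrgb8888(uint16_t v)
{
    uint32_t r = (v >> 11) & 0x1f;
    uint32_t g = (v >> 5) & 0x3f;
    uint32_t b = v & 0x1f;

    r = (r << 3) | (r >> 2);
    g = (g << 2) | (g >> 4);
    b = (b << 3) | (b >> 2);
    return (r << 16) | (g << 8) | b;
}

/*
 * Convert a line of 'width' 16-bit source pixels into 32-bit
 * destination pixels. Source pixels are 2 bytes and destination
 * pixels 4 bytes, so no single memcpy() can produce this output.
 */
static void draw_line_565_to_32(uint32_t *d, const uint16_t *s, size_t width)
{
    for (size_t i = 0; i < width; i++) {
        d[i] = rgb565_to_xrgb8888(s[i]);
    }
}
```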
-[PULL 09/49] hw/arm/xlnx-zynqmp: Remove obsolete 'has_rpu' property
+[PULL 43/57] target/arm: Implement MVE VQSHL (vector)
-From: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Implement the MVE VQSHL insn (encoding T4, which is the
 vector-shift-by-vector version).
-We hint the 'has_rpu' property is no longer required since commit
+The DO_SQSHL_OP and DO_UQSHL_OP macros here are derived from
-ec448b4 ("xlnx-zynqmp: Properly support the smp command line
+the neon_helper.c code for qshl_u{8,16,32} and qshl_s{8,16,32}.
 option") which was released in QEMU v2.11.0.
-Beside, this device is marked 'user_creatable = false', so the
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-only thing that could be setting the property is the board code
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-that creates the device.
+Message-id: 20210617121628.20116-33-peter.maydell@linaro.org
 ---
  target/arm/helper-mve.h    |  8 ++++++++
  target/arm/mve.decode      | 12 ++++++++++++
  target/arm/mve_helper.c    | 34 ++++++++++++++++++++++++++++++++++
  target/arm/translate-mve.c |  2 ++
 4 files changed, 56 insertions(+)
-Since the property is not user-facing, we can remove it without
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 going through the deprecation process.
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Message-id: 20210219144350.1979905-1-f4bug@amsat.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  include/hw/arm/xlnx-zynqmp.h | 2 --
  hw/arm/xlnx-zynqmp.c         | 6 ------
 2 files changed, 8 deletions(-)
 diff --git a/include/hw/arm/xlnx-zynqmp.h b/include/hw/arm/xlnx-zynqmp.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/arm/xlnx-zynqmp.h
+--- a/target/arm/helper-mve.h
-+++ b/include/hw/arm/xlnx-zynqmp.h
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ struct XlnxZynqMPState {
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vqsubub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-     bool secure;
+ DEF_HELPER_FLAGS_4(mve_vqsubuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-     /* Has the ARM Virtualization extensions?  */
+ DEF_HELPER_FLAGS_4(mve_vqsubuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-     bool virt;
--    /* Has the RPU subsystem?  */
++DEF_HELPER_FLAGS_4(mve_vqshlsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
--    bool has_rpu;
++DEF_HELPER_FLAGS_4(mve_vqshlsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
++DEF_HELPER_FLAGS_4(mve_vqshlsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-     /* CAN bus. */
++
-     CanBusState *canbus[XLNX_ZYNQMP_NUM_CAN];
++DEF_HELPER_FLAGS_4(mve_vqshlub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-diff --git a/hw/arm/xlnx-zynqmp.c b/hw/arm/xlnx-zynqmp.c
++DEF_HELPER_FLAGS_4(mve_vqshluh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +DEF_HELPER_FLAGS_4(mve_vqshluw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +
  DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
  DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
  DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/xlnx-zynqmp.c
+--- a/target/arm/mve.decode
-+++ b/hw/arm/xlnx-zynqmp.c
++++ b/target/arm/mve.decode
-@@ -XXX,XX +XXX,XX @@ static void xlnx_zynqmp_realize(DeviceState *dev, Error **errp)
+@@ -XXX,XX +XXX,XX @@
-         }
+ @2op .... .... .. size:2 .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn
  @2op_nosz .... .... .... .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn size=0
 +# The _rev suffix indicates that Vn and Vm are reversed. This is
 +# the case for shifts. In the Arm ARM these insns are documented
 +# with the Vm and Vn fields in their usual places, but in the
 +# assembly the operands are listed "backwards", ie in the order
 +# Qd, Qm, Qn where other insns use Qd, Qn, Qm. For QEMU we choose
 +# to consider Vm and Vn as being in different fields in the insn.
 +# This gives us consistency with A64 and Neon.
 +@2op_rev .... .... .. size:2 .... .... .... .... .... &2op qd=%qd qm=%qn qn=%qm
 +
  @2scalar .... .... .. size:2 .... .... .... .... rm:4 &2scalar qd=%qd qn=%qn
  @2scalar_nosz .... .... .... .... .... .... .... rm:4 &2scalar qd=%qd qn=%qn
@@ -XXX,XX +XXX,XX @@ VQADD_U          111 1 1111 0 . .. ... 0 ... 0 0000 . 1 . 1 ... 0 @2op
  VQSUB_S          111 0 1111 0 . .. ... 0 ... 0 0010 . 1 . 1 ... 0 @2op
  VQSUB_U          111 1 1111 0 . .. ... 0 ... 0 0010 . 1 . 1 ... 0 @2op
 +VQSHL_S          111 0 1111 0 . .. ... 0 ... 0 0100 . 1 . 1 ... 0 @2op_rev
 +VQSHL_U          111 1 1111 0 . .. ... 0 ... 0 0100 . 1 . 1 ... 0 @2op_rev
 +
  # Vector miscellaneous
  VCLS             1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_1OP(vfnegs, 8, uint64_t, DO_FNEGS)
          mve_advance_vpt(env);                                           \
      }
--    if (s->has_rpu) {
++/* provide unsigned 2-op helpers for all sizes */
--        info_report("The 'has_rpu' property is no longer required, to use the "
++#define DO_2OP_SAT_U(OP, FN)                    \
--                    "RPUs just use -smp 6.");
++    DO_2OP_SAT(OP##b, 1, uint8_t, FN)           \
--    }
++    DO_2OP_SAT(OP##h, 2, uint16_t, FN)          \
--
++    DO_2OP_SAT(OP##w, 4, uint32_t, FN)
-     xlnx_zynqmp_create_rpu(ms, s, boot_cpu, &err);
++
-     if (err) {
++/* provide signed 2-op helpers for all sizes */
-         error_propagate(errp, err);
++#define DO_2OP_SAT_S(OP, FN)                    \
-@@ -XXX,XX +XXX,XX @@ static Property xlnx_zynqmp_props[] = {
++    DO_2OP_SAT(OP##b, 1, int8_t, FN)            \
-     DEFINE_PROP_STRING("boot-cpu", XlnxZynqMPState, boot_cpu),
++    DO_2OP_SAT(OP##h, 2, int16_t, FN)           \
-     DEFINE_PROP_BOOL("secure", XlnxZynqMPState, secure, false),
++    DO_2OP_SAT(OP##w, 4, int32_t, FN)
-     DEFINE_PROP_BOOL("virtualization", XlnxZynqMPState, virt, false),
++
--    DEFINE_PROP_BOOL("has_rpu", XlnxZynqMPState, has_rpu, false),
+ #define DO_AND(N, M)  ((N) & (M))
-     DEFINE_PROP_LINK("ddr-ram", XlnxZynqMPState, ddr_ram, TYPE_MEMORY_REGION,
+ #define DO_BIC(N, M)  ((N) & ~(M))
-                      MemoryRegion *),
+ #define DO_ORR(N, M)  ((N) | (M))
-     DEFINE_PROP_LINK("canbus0", XlnxZynqMPState, canbus[0], TYPE_CAN_BUS,
+@@ -XXX,XX +XXX,XX @@ DO_2OP_SAT(vqsubsb, 1, int8_t, DO_SQSUB_B)
  DO_2OP_SAT(vqsubsh, 2, int16_t, DO_SQSUB_H)
  DO_2OP_SAT(vqsubsw, 4, int32_t, DO_SQSUB_W)
 +/*
 + * This wrapper fixes up the impedance mismatch between do_sqrshl_bhs()
 + * and friends wanting a uint32_t* sat and our needing a bool*.
 + */
 +#define WRAP_QRSHL_HELPER(FN, N, M, ROUND, satp)                        \
 +    ({                                                                  \
 +        uint32_t su32 = 0;                                              \
 +        typeof(N) r = FN(N, (int8_t)(M), sizeof(N) * 8, ROUND, &su32);  \
 +        if (su32) {                                                     \
 +            *satp = true;                                               \
 +        }                                                               \
 +        r;                                                              \
 +    })
 +
 +#define DO_SQSHL_OP(N, M, satp) \
 +    WRAP_QRSHL_HELPER(do_sqrshl_bhs, N, M, false, satp)
 +#define DO_UQSHL_OP(N, M, satp) \
 +    WRAP_QRSHL_HELPER(do_uqrshl_bhs, N, M, false, satp)
 +
 +DO_2OP_SAT_S(vqshls, DO_SQSHL_OP)
 +DO_2OP_SAT_U(vqshlu, DO_UQSHL_OP)
 +
  #define DO_2OP_SCALAR(OP, ESIZE, TYPE, FN)                              \
      void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn,   \
                                  uint32_t rm)                            \
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_2OP(VQADD_S, vqadds)
  DO_2OP(VQADD_U, vqaddu)
  DO_2OP(VQSUB_S, vqsubs)
  DO_2OP(VQSUB_U, vqsubu)
 +DO_2OP(VQSHL_S, vqshls)
 +DO_2OP(VQSHL_U, vqshlu)
  static bool do_2op_scalar(DisasContext *s, arg_2scalar *a,
                            MVEGenTwoOpScalarFn fn)
 --
 .20.1
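Patches 43 and 44 wrap QEMU's do_sqrshl_bhs()/do_uqrshl_bhs() (ROUND false for VQSHL, true for VQRSHL) and pass the shift count as (int8_t)(M), so only the low byte of each lane of the shift operand matters. The sketch below is an independent C model of the signed 16-bit case, not those functions: non-negative counts shift left and saturate on overflow, negative counts shift right, and rounding adds half of the lowest kept bit position before shifting. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Clamp a wide value into int16_t, recording saturation in *sat. */
static int16_t clamp16(int64_t v, bool *sat)
{
    if (v > INT16_MAX) {
        *sat = true;
        return INT16_MAX;
    }
    if (v < INT16_MIN) {
        *sat = true;
        return INT16_MIN;
    }
    return (int16_t)v;
}

/*
 * Signed saturating (optionally rounding) shift of one int16_t lane.
 * Only the low byte of 'shift_operand' is used, as a signed count.
 */
static int16_t sqrshl16(int16_t src, int16_t shift_operand, bool round,
                        bool *sat)
{
    int shift = (int8_t)(uint8_t)shift_operand;

    if (shift >= 0) {
        if (shift >= 16) {
            /* Any nonzero value overflows a 16-bit lane. */
            if (src == 0) {
                return 0;
            }
            *sat = true;
            return src > 0 ? INT16_MAX : INT16_MIN;
        }
        /* Multiply instead of '<<' so negative values are well defined. */
        return clamp16((int64_t)src * ((int64_t)1 << shift), sat);
    }

    /* Right shift; counts of 16 or more give the same result as 16. */
    int n = -shift > 16 ? 16 : -shift;
    int64_t v = src;
    if (round) {
        v += (int64_t)1 << (n - 1);   /* round to nearest, ties upward */
    }
    return (int16_t)(v >> n);         /* arithmetic shift assumed */
}
```

Right shifts never saturate; only left shifts can overflow the lane, which is why the saturation flag is touched only on that path.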

-[PULL 19/49] hw/display/tc6393xb: Inline tc6393xb_draw_graphic32() at its callsite
+[PULL 44/57] target/arm: Implement MVE VQRSHL
-The function tc6393xb_draw_graphic32() is called in exactly one place,
+Implement the MVE VQRSHL (vector) insn.  Again, the code to perform
-so just inline the function body at its callsite. This allows us to
+the actual shifts is borrowed from neon_helper.c.
 drop the template header entirely.
 The code move includes a single added space after 'for' to fix
 the coding style.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Message-id: 20210617121628.20116-34-peter.maydell@linaro.org
 Message-id: 20210215103215.4944-5-peter.maydell@linaro.org
 ---
- hw/display/tc6393xb_template.h | 45 ----------------------------------
+ target/arm/helper-mve.h    | 8 ++++++++
- hw/display/tc6393xb.c          | 23 ++++++++++++++---
+ target/arm/mve.decode      | 3 +++
- 2 files changed, 19 insertions(+), 49 deletions(-)
+ target/arm/mve_helper.c    | 6 ++++++
- delete mode 100644 hw/display/tc6393xb_template.h
+ target/arm/translate-mve.c | 2 ++
 4 files changed, 19 insertions(+)
-diff --git a/hw/display/tc6393xb_template.h b/hw/display/tc6393xb_template.h
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 deleted file mode 100644
 index XXXXXXX..XXXXXXX
 --- a/hw/display/tc6393xb_template.h
 +++ /dev/null
@@ -XXX,XX +XXX,XX @@
 -/*
 - * Toshiba TC6393XB I/O Controller.
 - * Found in Sharp Zaurus SL-6000 (tosa) or some
 - * Toshiba e-Series PDAs.
 - *
 - * FB support code. Based on G364 fb emulator
 - *
 - * Copyright (c) 2007 Hervé Poussineau
 - *
 - * This program is free software; you can redistribute it and/or
 - * modify it under the terms of the GNU General Public License as
 - * published by the Free Software Foundation; either version 2 of
 - * the License, or (at your option) any later version.
 - *
 - * This program is distributed in the hope that it will be useful,
 - * but WITHOUT ANY WARRANTY; without even the implied warranty of
 - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 - * GNU General Public License for more details.
 - *
 - * You should have received a copy of the GNU General Public License along
 - * with this program; if not, see <http://www.gnu.org/licenses/>.
 - */
 -
 -static void tc6393xb_draw_graphic32(TC6393xbState *s)
 -{
 -    DisplaySurface *surface = qemu_console_surface(s->con);
 -    int i;
 -    uint16_t *data_buffer;
 -    uint8_t *data_display;
 -
 -    data_buffer = s->vram_ptr;
 -    data_display = surface_data(surface);
 -    for(i = 0; i < s->scr_height; i++) {
 -        int j;
 -        for (j = 0; j < s->scr_width; j++, data_display += 4, data_buffer++) {
 -            uint16_t color = *data_buffer;
 -            uint32_t dest_color = rgb_to_pixel32(
 -                           ((color & 0xf800) * 0x108) >> 11,
 -                           ((color & 0x7e0) * 0x41) >> 9,
 -                           ((color & 0x1f) * 0x21) >> 2
 -                           );
 -            *(uint32_t *)data_display = dest_color;
 -        }
 -    }
 -}
 diff --git a/hw/display/tc6393xb.c b/hw/display/tc6393xb.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/display/tc6393xb.c
+--- a/target/arm/helper-mve.h
-+++ b/hw/display/tc6393xb.c
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ static void tc6393xb_nand_writeb(TC6393xbState *s, hwaddr addr, uint32_t value)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vqshlub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-                                         (uint32_t) addr, value & 0xff);
+ DEF_HELPER_FLAGS_4(mve_vqshluh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
- }
+ DEF_HELPER_FLAGS_4(mve_vqshluw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
--#define BITS 32
++DEF_HELPER_FLAGS_4(mve_vqrshlsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
--#include "tc6393xb_template.h"
++DEF_HELPER_FLAGS_4(mve_vqrshlsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
--
++DEF_HELPER_FLAGS_4(mve_vqrshlsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
  static void tc6393xb_draw_graphic(TC6393xbState *s, int full_update)
  {
 -    tc6393xb_draw_graphic32(s);
 +    DisplaySurface *surface = qemu_console_surface(s->con);
 +    int i;
 +    uint16_t *data_buffer;
 +    uint8_t *data_display;
 +
-+    data_buffer = s->vram_ptr;
++DEF_HELPER_FLAGS_4(mve_vqrshlub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-+    data_display = surface_data(surface);
++DEF_HELPER_FLAGS_4(mve_vqrshluh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-+    for (i = 0; i < s->scr_height; i++) {
++DEF_HELPER_FLAGS_4(mve_vqrshluw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-+        int j;
++
-+        for (j = 0; j < s->scr_width; j++, data_display += 4, data_buffer++) {
+ DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+            uint16_t color = *data_buffer;
+ DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+            uint32_t dest_color = rgb_to_pixel32(
+ DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+                           ((color & 0xf800) * 0x108) >> 11,
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
-+                           ((color & 0x7e0) * 0x41) >> 9,
+index XXXXXXX..XXXXXXX 100644
-+                           ((color & 0x1f) * 0x21) >> 2
+--- a/target/arm/mve.decode
-+                           );
++++ b/target/arm/mve.decode
-+            *(uint32_t *)data_display = dest_color;
+@@ -XXX,XX +XXX,XX @@ VQSUB_U          111 1 1111 0 . .. ... 0 ... 0 0010 . 1 . 1 ... 0 @2op
-+        }
+ VQSHL_S          111 0 1111 0 . .. ... 0 ... 0 0100 . 1 . 1 ... 0 @2op_rev
-+    }
+ VQSHL_U          111 1 1111 0 . .. ... 0 ... 0 0100 . 1 . 1 ... 0 @2op_rev
-     dpy_gfx_update_full(s->con);
- }
++VQRSHL_S         111 0 1111 0 . .. ... 0 ... 0 0101 . 1 . 1 ... 0 @2op_rev
++VQRSHL_U         111 1 1111 0 . .. ... 0 ... 0 0101 . 1 . 1 ... 0 @2op_rev
 +
  # Vector miscellaneous
  VCLS             1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2OP_SAT(vqsubsw, 4, int32_t, DO_SQSUB_W)
      WRAP_QRSHL_HELPER(do_sqrshl_bhs, N, M, false, satp)
  #define DO_UQSHL_OP(N, M, satp) \
      WRAP_QRSHL_HELPER(do_uqrshl_bhs, N, M, false, satp)
 +#define DO_SQRSHL_OP(N, M, satp) \
 +    WRAP_QRSHL_HELPER(do_sqrshl_bhs, N, M, true, satp)
 +#define DO_UQRSHL_OP(N, M, satp) \
 +    WRAP_QRSHL_HELPER(do_uqrshl_bhs, N, M, true, satp)
  DO_2OP_SAT_S(vqshls, DO_SQSHL_OP)
  DO_2OP_SAT_U(vqshlu, DO_UQSHL_OP)
 +DO_2OP_SAT_S(vqrshls, DO_SQRSHL_OP)
 +DO_2OP_SAT_U(vqrshlu, DO_UQRSHL_OP)
  #define DO_2OP_SCALAR(OP, ESIZE, TYPE, FN)                              \
      void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn,   \
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_2OP(VQSUB_S, vqsubs)
  DO_2OP(VQSUB_U, vqsubu)
  DO_2OP(VQSHL_S, vqshls)
  DO_2OP(VQSHL_U, vqshlu)
 +DO_2OP(VQRSHL_S, vqrshls)
 +DO_2OP(VQRSHL_U, vqrshlu)
  static bool do_2op_scalar(DisasContext *s, arg_2scalar *a,
                            MVEGenTwoOpScalarFn fn)
 --
 .20.1

-[PULL 18/49] hw/display/tc6393xb: Expand out macros in template header
+[PULL 45/57] target/arm: Implement MVE VSHL insn
-Now the template header is included only for BITS==32, expand
+Implement the MVE VSHL insn (vector form).
 out all the macros that depended on the BITS setting.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210215103215.4944-4-peter.maydell@linaro.org
+Message-id: 20210617121628.20116-35-peter.maydell@linaro.org
 ---
- hw/display/tc6393xb_template.h | 35 ++++------------------------------
+ target/arm/helper-mve.h    | 8 ++++++++
-file changed, 4 insertions(+), 31 deletions(-)
+ target/arm/mve.decode      | 3 +++
  target/arm/mve_helper.c    | 6 ++++++
  target/arm/translate-mve.c | 2 ++
 files changed, 19 insertions(+)
-diff --git a/hw/display/tc6393xb_template.h b/hw/display/tc6393xb_template.h
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/display/tc6393xb_template.h
+--- a/target/arm/helper-mve.h
-+++ b/hw/display/tc6393xb_template.h
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vqsubub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-  * with this program; if not, see <http://www.gnu.org/licenses/>.
+ DEF_HELPER_FLAGS_4(mve_vqsubuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-  */
+ DEF_HELPER_FLAGS_4(mve_vqsubuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
--#if BITS == 8
++DEF_HELPER_FLAGS_4(mve_vshlsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
--# define SET_PIXEL(addr, color)  (*(uint8_t *)addr = color)
++DEF_HELPER_FLAGS_4(mve_vshlsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
--#elif BITS == 15 || BITS == 16
++DEF_HELPER_FLAGS_4(mve_vshlsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
--# define SET_PIXEL(addr, color)  (*(uint16_t *)addr = color)
++
--#elif BITS == 24
++DEF_HELPER_FLAGS_4(mve_vshlub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
--# define SET_PIXEL(addr, color)  \
++DEF_HELPER_FLAGS_4(mve_vshluh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
--    do {                         \
++DEF_HELPER_FLAGS_4(mve_vshluw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
--        addr[0] = color;         \
++
--        addr[1] = (color) >> 8;  \
+ DEF_HELPER_FLAGS_4(mve_vqshlsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
--        addr[2] = (color) >> 16; \
+ DEF_HELPER_FLAGS_4(mve_vqshlsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
--    } while (0)
+ DEF_HELPER_FLAGS_4(mve_vqshlsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
--#elif BITS == 32
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
--# define SET_PIXEL(addr, color)  (*(uint32_t *)addr = color)
+index XXXXXXX..XXXXXXX 100644
--#else
+--- a/target/arm/mve.decode
--# error unknown bit depth
++++ b/target/arm/mve.decode
--#endif
+@@ -XXX,XX +XXX,XX @@ VQADD_U          111 1 1111 0 . .. ... 0 ... 0 0000 . 1 . 1 ... 0 @2op
--
+ VQSUB_S          111 0 1111 0 . .. ... 0 ... 0 0010 . 1 . 1 ... 0 @2op
--
+ VQSUB_U          111 1 1111 0 . .. ... 0 ... 0 0010 . 1 . 1 ... 0 @2op
--static void glue(tc6393xb_draw_graphic, BITS)(TC6393xbState *s)
-+static void tc6393xb_draw_graphic32(TC6393xbState *s)
++VSHL_S           111 0 1111 0 . .. ... 0 ... 0 0100 . 1 . 0 ... 0 @2op_rev
 +VSHL_U           111 1 1111 0 . .. ... 0 ... 0 0100 . 1 . 0 ... 0 @2op_rev
 +
  VQSHL_S          111 0 1111 0 . .. ... 0 ... 0 0100 . 1 . 1 ... 0 @2op_rev
  VQSHL_U          111 1 1111 0 . .. ... 0 ... 0 0100 . 1 . 1 ... 0 @2op_rev
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2OP_U(vhaddu, do_vhadd_u)
  DO_2OP_S(vhsubs, do_vhsub_s)
  DO_2OP_U(vhsubu, do_vhsub_u)
 +#define DO_VSHLS(N, M) do_sqrshl_bhs(N, (int8_t)(M), sizeof(N) * 8, false, NULL)
 +#define DO_VSHLU(N, M) do_uqrshl_bhs(N, (int8_t)(M), sizeof(N) * 8, false, NULL)
 +
 +DO_2OP_S(vshls, DO_VSHLS)
 +DO_2OP_U(vshlu, DO_VSHLU)
 +
  static inline int32_t do_sat_bhw(int64_t val, int64_t min, int64_t max, bool *s)
  {
-     DisplaySurface *surface = qemu_console_surface(s->con);
+     if (val > max) {
-     int i;
+diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
-@@ -XXX,XX +XXX,XX @@ static void glue(tc6393xb_draw_graphic, BITS)(TC6393xbState *s)
+index XXXXXXX..XXXXXXX 100644
-     data_buffer = s->vram_ptr;
+--- a/target/arm/translate-mve.c
-     data_display = surface_data(surface);
++++ b/target/arm/translate-mve.c
-     for(i = 0; i < s->scr_height; i++) {
+@@ -XXX,XX +XXX,XX @@ DO_2OP(VQADD_S, vqadds)
--#if (BITS == 16)
+ DO_2OP(VQADD_U, vqaddu)
--        memcpy(data_display, data_buffer, s->scr_width * 2);
+ DO_2OP(VQSUB_S, vqsubs)
--        data_buffer += s->scr_width;
+ DO_2OP(VQSUB_U, vqsubu)
--        data_display += surface_stride(surface);
++DO_2OP(VSHL_S, vshls)
--#else
++DO_2OP(VSHL_U, vshlu)
-         int j;
+ DO_2OP(VQSHL_S, vqshls)
--        for (j = 0; j < s->scr_width; j++, data_display += BITS / 8, data_buffer++) {
+ DO_2OP(VQSHL_U, vqshlu)
-+        for (j = 0; j < s->scr_width; j++, data_display += 4, data_buffer++) {
+ DO_2OP(VQRSHL_S, vqrshls)
              uint16_t color = *data_buffer;
 -            uint32_t dest_color = glue(rgb_to_pixel, BITS)(
 +            uint32_t dest_color = rgb_to_pixel32(
                             ((color & 0xf800) * 0x108) >> 11,
                             ((color & 0x7e0) * 0x41) >> 9,
                             ((color & 0x1f) * 0x21) >> 2
                             );
 -            SET_PIXEL(data_display, dest_color);
 +            *(uint32_t *)data_display = dest_color;
          }
 -#endif
      }
  }
 -
 -#undef BITS
 -#undef SET_PIXEL
 --
 .20.1
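A reading aid for the VSHL patch above. The new helpers pass each Qm element through `(int8_t)(M)`, so only its bottom byte is used as a signed shift count. They also pass `round=false` and `sat=NULL`, which selects a plain, non-saturating shift. As understood here (an assumption about `do_sqrshl_bhs`/`do_uqrshl_bhs`, not a copy of them):

- positive counts shift left, and the result wraps;
- negative counts shift right;
- counts past the element width give 0, or sign fill for a signed right shift.

The sketch below models one 8-bit lane under that reading. The function names are hypothetical.

```c
#include <stdint.h>

/* Portable floor(x / 2^s) for 0 <= s <= 30 (never right-shifts a negative). */
static int32_t asr32(int32_t x, int s)
{
    return x < 0 ? ~(~x >> s) : x >> s;
}

/* Keep the low 8 bits of x and reinterpret them as a two's complement byte. */
static int8_t wrap_s8(int32_t x)
{
    int32_t v = (int32_t)((uint32_t)x & 0xffu);
    return (int8_t)(v >= 0x80 ? v - 0x100 : v);
}

/* Hypothetical model of one signed byte lane of VSHL (no rounding). */
static int8_t vshl_s8_lane(int8_t n, int8_t shift)
{
    const int bits = 8;

    if (shift >= bits) {
        return 0;                          /* every bit shifted out */
    }
    if (shift >= 0) {
        return wrap_s8(n * (1 << shift));  /* wraps, never saturates */
    }
    /* Right shift: counts at or past the width just replicate the sign. */
    return (int8_t)asr32(n, -shift >= bits ? bits - 1 : -shift);
}

/* Hypothetical model of one unsigned byte lane of VSHL (no rounding). */
static uint8_t vshl_u8_lane(uint8_t n, int8_t shift)
{
    const int bits = 8;

    if (shift >= bits || shift <= -bits) {
        return 0;                          /* every bit shifted out */
    }
    if (shift >= 0) {
        return (uint8_t)(n << shift);      /* high bits are discarded */
    }
    return (uint8_t)(n >> -shift);
}
```

For example, `vshl_s8_lane(64, 1)` wraps to -128, while `vshl_s8_lane(-3, -1)` rounds toward minus infinity and gives -2.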

-[PULL 17/49] hw/display/tc6393xb: Remove dead code for handling non-32bpp surfaces
+[PULL 46/57] target/arm: Implement MVE VRSHL
-For a long time now the UI layer has guaranteed that the console
+Implement the MVE VRSHL insn (vector form).
 surface is always 32 bits per pixel RGB. Remove the legacy dead
 code from the tc6393xb display device which was handling the
 possibility that the console surface was some other format.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210215103215.4944-3-peter.maydell@linaro.org
+Message-id: 20210617121628.20116-36-peter.maydell@linaro.org
 ---
- include/ui/console.h  | 10 ----------
+ target/arm/helper-mve.h    | 8 ++++++++
- hw/display/tc6393xb.c | 33 +--------------------------------
+ target/arm/mve.decode      | 3 +++
-files changed, 1 insertion(+), 42 deletions(-)
+ target/arm/mve_helper.c    | 4 ++++
  target/arm/translate-mve.c | 2 ++
 files changed, 17 insertions(+)
-diff --git a/include/ui/console.h b/include/ui/console.h
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/ui/console.h
+--- a/target/arm/helper-mve.h
-+++ b/include/ui/console.h
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ PixelFormat qemu_default_pixelformat(int bpp);
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vshlub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
- DisplaySurface *qemu_create_displaysurface(int width, int height);
+ DEF_HELPER_FLAGS_4(mve_vshluh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
- void qemu_free_displaysurface(DisplaySurface *surface);
+ DEF_HELPER_FLAGS_4(mve_vshluw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
--static inline int is_surface_bgr(DisplaySurface *surface)
++DEF_HELPER_FLAGS_4(mve_vrshlsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
--{
++DEF_HELPER_FLAGS_4(mve_vrshlsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
--    if (PIXMAN_FORMAT_BPP(surface->format) == 32 &&
++DEF_HELPER_FLAGS_4(mve_vrshlsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
--        PIXMAN_FORMAT_TYPE(surface->format) == PIXMAN_TYPE_ABGR) {
++
--        return 1;
++DEF_HELPER_FLAGS_4(mve_vrshlub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
--    } else {
++DEF_HELPER_FLAGS_4(mve_vrshluh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
--        return 0;
++DEF_HELPER_FLAGS_4(mve_vrshluw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
--    }
++
--}
+ DEF_HELPER_FLAGS_4(mve_vqshlsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
--
+ DEF_HELPER_FLAGS_4(mve_vqshlsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
- static inline int is_buffer_shared(DisplaySurface *surface)
+ DEF_HELPER_FLAGS_4(mve_vqshlsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VQSUB_U          111 1 1111 0 . .. ... 0 ... 0 0010 . 1 . 1 ... 0 @2op
  VSHL_S           111 0 1111 0 . .. ... 0 ... 0 0100 . 1 . 0 ... 0 @2op_rev
  VSHL_U           111 1 1111 0 . .. ... 0 ... 0 0100 . 1 . 0 ... 0 @2op_rev
 +VRSHL_S          111 0 1111 0 . .. ... 0 ... 0 0101 . 1 . 0 ... 0 @2op_rev
 +VRSHL_U          111 1 1111 0 . .. ... 0 ... 0 0101 . 1 . 0 ... 0 @2op_rev
 +
  VQSHL_S          111 0 1111 0 . .. ... 0 ... 0 0100 . 1 . 1 ... 0 @2op_rev
  VQSHL_U          111 1 1111 0 . .. ... 0 ... 0 0100 . 1 . 1 ... 0 @2op_rev
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2OP_U(vhsubu, do_vhsub_u)
  #define DO_VSHLS(N, M) do_sqrshl_bhs(N, (int8_t)(M), sizeof(N) * 8, false, NULL)
  #define DO_VSHLU(N, M) do_uqrshl_bhs(N, (int8_t)(M), sizeof(N) * 8, false, NULL)
 +#define DO_VRSHLS(N, M) do_sqrshl_bhs(N, (int8_t)(M), sizeof(N) * 8, true, NULL)
 +#define DO_VRSHLU(N, M) do_uqrshl_bhs(N, (int8_t)(M), sizeof(N) * 8, true, NULL)
  DO_2OP_S(vshls, DO_VSHLS)
  DO_2OP_U(vshlu, DO_VSHLU)
 +DO_2OP_S(vrshls, DO_VRSHLS)
 +DO_2OP_U(vrshlu, DO_VRSHLU)
  static inline int32_t do_sat_bhw(int64_t val, int64_t min, int64_t max, bool *s)
  {
-     return !(surface->flags & QEMU_ALLOCATED_FLAG);
+diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 diff --git a/hw/display/tc6393xb.c b/hw/display/tc6393xb.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/display/tc6393xb.c
+--- a/target/arm/translate-mve.c
-+++ b/hw/display/tc6393xb.c
++++ b/target/arm/translate-mve.c
-@@ -XXX,XX +XXX,XX @@ static void tc6393xb_nand_writeb(TC6393xbState *s, hwaddr addr, uint32_t value)
+@@ -XXX,XX +XXX,XX @@ DO_2OP(VQSUB_S, vqsubs)
-                                         (uint32_t) addr, value & 0xff);
+ DO_2OP(VQSUB_U, vqsubu)
- }
+ DO_2OP(VSHL_S, vshls)
+ DO_2OP(VSHL_U, vshlu)
--#define BITS 8
++DO_2OP(VRSHL_S, vrshls)
--#include "tc6393xb_template.h"
++DO_2OP(VRSHL_U, vrshlu)
--#define BITS 15
+ DO_2OP(VQSHL_S, vqshls)
--#include "tc6393xb_template.h"
+ DO_2OP(VQSHL_U, vqshlu)
--#define BITS 16
+ DO_2OP(VQRSHL_S, vqrshls)
 -#include "tc6393xb_template.h"
 -#define BITS 24
 -#include "tc6393xb_template.h"
  #define BITS 32
  #include "tc6393xb_template.h"
  static void tc6393xb_draw_graphic(TC6393xbState *s, int full_update)
  {
 -    DisplaySurface *surface = qemu_console_surface(s->con);
 -
 -    switch (surface_bits_per_pixel(surface)) {
 -        case 8:
 -            tc6393xb_draw_graphic8(s);
 -            break;
 -        case 15:
 -            tc6393xb_draw_graphic15(s);
 -            break;
 -        case 16:
 -            tc6393xb_draw_graphic16(s);
 -            break;
 -        case 24:
 -            tc6393xb_draw_graphic24(s);
 -            break;
 -        case 32:
 -            tc6393xb_draw_graphic32(s);
 -            break;
 -        default:
 -            printf("tc6393xb: unknown depth %d\n",
 -                   surface_bits_per_pixel(surface));
 -            return;
 -    }
 -
 +    tc6393xb_draw_graphic32(s);
      dpy_gfx_update_full(s->con);
  }
 --
 .20.1
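A reading aid for the VRSHL patch above. `DO_VRSHLS`/`DO_VRSHLU` differ from the VSHL macros only in passing `round=true`. For right shifts, this adds half the weight of the lowest surviving bit before shifting, so results round to nearest with ties toward plus infinity.

The sketch below is a hypothetical model of one 8-bit lane; the names are not QEMU's. Its edge cases follow the author's reading of `do_sqrshl_bhs`/`do_uqrshl_bhs`, which is an assumption:

- a signed shift by the full width or more always rounds to 0;
- an unsigned shift by exactly the width returns the top bit.

```c
#include <stdint.h>

/* Portable floor(x / 2^s) for 0 <= s <= 30 (never right-shifts a negative). */
static int32_t asr32(int32_t x, int s)
{
    return x < 0 ? ~(~x >> s) : x >> s;
}

/* Hypothetical model of one signed byte lane of VRSHL. */
static int8_t vrshl_s8_lane(int8_t n, int8_t shift)
{
    const int bits = 8;
    int s;

    if (shift >= bits || shift <= -bits) {
        return 0;      /* shifted out, or rounding of the sign bit gives 0 */
    }
    if (shift >= 0) {
        /* Left shifts behave as in VSHL: keep the low 8 bits. */
        int32_t v = (int32_t)((uint32_t)(n * (1 << shift)) & 0xffu);
        return (int8_t)(v >= 0x80 ? v - 0x100 : v);
    }
    s = -shift;
    /* Add the rounding constant 2^(s-1), then shift arithmetically. */
    return (int8_t)asr32(n + (1 << (s - 1)), s);
}

/* Hypothetical model of one unsigned byte lane of VRSHL. */
static uint8_t vrshl_u8_lane(uint8_t n, int8_t shift)
{
    const int bits = 8;
    int s;

    if (shift >= bits || shift < -bits) {
        return 0;
    }
    if (shift >= 0) {
        return (uint8_t)(n << shift);
    }
    s = -shift;
    return (uint8_t)((n + (1u << (s - 1))) >> s);
}
```

For example, `vrshl_s8_lane(-3, -1)` gives -1, where the truncating VSHL would give -2.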

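The following PULL 47/57 patch describes VQDMLADH/VQRDMLADH as: multiply, add pairs of products, double, possibly round, saturate to twice the element size, and return the high half. Its `DO_VQDMLADH_OP` helper writes only lanes where `(e & 1) == XCHG`, so the plain form updates even lanes and the X form updates odd lanes; other lanes keep their old value.

Below is a minimal sketch of the 16-bit case. It assumes eight `int16_t` lanes per vector and ignores VPT predication, which QEMU applies through `mve_element_mask()`. The returned flag stands in for FPSCR.QC. All function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Portable floor(x / 2^s) for 0 <= s < 63. */
static int64_t asr64(int64_t x, int s)
{
    return x < 0 ? ~(~x >> s) : x >> s;
}

/*
 * (a * b + c * d) * 2, plus 2^15 when rounding, saturated to int32_t;
 * returns the high 16 bits and sets *sat if saturation occurred.
 */
static int16_t qdmladh_h(int16_t a, int16_t b, int16_t c, int16_t d,
                         bool round, bool *sat)
{
    int64_t r = ((int64_t)a * b + (int64_t)c * d) * 2 + (round ? 1 << 15 : 0);

    if (r > INT32_MAX) {
        r = INT32_MAX;
        *sat = true;
    } else if (r < INT32_MIN) {
        r = INT32_MIN;
        *sat = true;
    }
    return (int16_t)asr64(r, 16);
}

/*
 * Hypothetical model of VQ{R}DMLADH{X}.S16 on one vector.
 * Returns true if any written lane saturated (would set FPSCR.QC).
 */
static bool vqdmladh_s16(int16_t d[8], const int16_t n[8],
                         const int16_t m[8], bool xchg, bool round)
{
    bool qc = false;

    for (int e = 0; e < 8; e++) {
        bool sat = false;

        if ((e & 1) != xchg) {
            continue;   /* this lane of d is left unchanged */
        }
        if (!xchg) {
            /* even lane e: n[e]*m[e] + n[e+1]*m[e+1] */
            d[e] = qdmladh_h(n[e], m[e], n[e + 1], m[e + 1], round, &sat);
        } else {
            /* odd lane e: n[e]*m[e-1] + n[e-1]*m[e] */
            d[e] = qdmladh_h(n[e], m[e - 1], n[e - 1], m[e], round, &sat);
        }
        qc |= sat;
    }
    return qc;
}
```

The int64_t intermediate holds the full 16-bit result without overflow. The patch's 32-bit helper cannot do this, which is why it uses three checked 64-bit additions and adds half the rounding constant before doubling.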
-[PULL 41/49] hw/arm/mps2-tz: Make RAM arrangement board-specific
+[PULL 47/57] target/arm: Implement MVE VQDMLADH and VQRDMLADH
-The AN505 and AN521 have the same layout of RAM; the AN524 does not.
+Implement the MVE VQDMLADH and VQRDMLADH insns.  These multiply
-Replace the current hard-coding of where the RAM is and which parts
+elements, and then add pairs of products, double, possibly round,
-of it are behind which MPCs with a data-driven approach.
+saturate and return the high half of the result.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210215115138.20465-17-peter.maydell@linaro.org
+Message-id: 20210617121628.20116-37-peter.maydell@linaro.org
 ---
- hw/arm/mps2-tz.c | 175 +++++++++++++++++++++++++++++++++++++----------
+ target/arm/helper-mve.h    | 16 +++++++
-file changed, 138 insertions(+), 37 deletions(-)
+ target/arm/mve.decode      |  5 +++
  target/arm/mve_helper.c    | 89 ++++++++++++++++++++++++++++++++++++++
  target/arm/translate-mve.c |  4 ++
 files changed, 114 insertions(+)
-diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/mps2-tz.c
+--- a/target/arm/helper-mve.h
-+++ b/hw/arm/mps2-tz.c
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vqrshlub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
- #include "qom/object.h"
+ DEF_HELPER_FLAGS_4(mve_vqrshluh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+ DEF_HELPER_FLAGS_4(mve_vqrshluw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
- #define MPS2TZ_NUMIRQ_MAX 92
-+#define MPS2TZ_RAM_MAX 4
++DEF_HELPER_FLAGS_4(mve_vqdmladhb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
++DEF_HELPER_FLAGS_4(mve_vqdmladhh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
- typedef enum MPS2TZFPGAType {
++DEF_HELPER_FLAGS_4(mve_vqdmladhw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-     FPGA_AN505,
++
-     FPGA_AN521,
++DEF_HELPER_FLAGS_4(mve_vqdmladhxb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
- } MPS2TZFPGAType;
++DEF_HELPER_FLAGS_4(mve_vqdmladhxh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +DEF_HELPER_FLAGS_4(mve_vqdmladhxw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +
 +DEF_HELPER_FLAGS_4(mve_vqrdmladhb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +DEF_HELPER_FLAGS_4(mve_vqrdmladhh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +DEF_HELPER_FLAGS_4(mve_vqrdmladhw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +
 +DEF_HELPER_FLAGS_4(mve_vqrdmladhxb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +DEF_HELPER_FLAGS_4(mve_vqrdmladhxh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +DEF_HELPER_FLAGS_4(mve_vqrdmladhxw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +
  DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
  DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
  DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VQSHL_U          111 1 1111 0 . .. ... 0 ... 0 0100 . 1 . 1 ... 0 @2op_rev
  VQRSHL_S         111 0 1111 0 . .. ... 0 ... 0 0101 . 1 . 1 ... 0 @2op_rev
  VQRSHL_U         111 1 1111 0 . .. ... 0 ... 0 0101 . 1 . 1 ... 0 @2op_rev
 +VQDMLADH         1110 1110 0 . .. ... 0 ... 0 1110 . 0 . 0 ... 0 @2op
 +VQDMLADHX        1110 1110 0 . .. ... 0 ... 1 1110 . 0 . 0 ... 0 @2op
 +VQRDMLADH        1110 1110 0 . .. ... 0 ... 0 1110 . 0 . 0 ... 1 @2op
 +VQRDMLADHX       1110 1110 0 . .. ... 0 ... 1 1110 . 0 . 0 ... 1 @2op
 +
  # Vector miscellaneous
  VCLS             1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2OP_SAT_U(vqshlu, DO_UQSHL_OP)
  DO_2OP_SAT_S(vqrshls, DO_SQRSHL_OP)
  DO_2OP_SAT_U(vqrshlu, DO_UQRSHL_OP)
 +/*
-+ * Define the layout of RAM in a board, including which parts are
++ * Multiply add dual returning high half
-+ * behind which MPCs.
++ * The 'FN' here takes four inputs A, B, C, D, a 0/1 indicator of
-+ * mrindex specifies the index into mms->ram[] to use for the backing RAM;
++ * whether to add the rounding constant, and the pointer to the
-+ * -1 means "use the system RAM".
++ * saturation flag, and should do "(A * B + C * D) * 2 + rounding constant",
 + * saturate to twice the input size and return the high half; or
 + * (A * B - C * D) etc for VQDMLSDH.
 + */
-+typedef struct RAMInfo {
++#define DO_VQDMLADH_OP(OP, ESIZE, TYPE, XCHG, ROUND, FN)                \
-+    const char *name;
++    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn,   \
-+    uint32_t base;
++                                void *vm)                               \
-+    uint32_t size;
++    {                                                                   \
-+    int mpc; /* MPC number, -1 for "not behind an MPC" */
++        TYPE *d = vd, *n = vn, *m = vm;                                 \
-+    int mrindex;
++        uint16_t mask = mve_element_mask(env);                          \
-+    int flags;
++        unsigned e;                                                     \
-+} RAMInfo;
++        bool qc = false;                                                \
 +        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
 +            bool sat = false;                                           \
 +            if ((e & 1) == XCHG) {                                      \
 +                TYPE r = FN(n[H##ESIZE(e)],                             \
 +                            m[H##ESIZE(e - XCHG)],                      \
 +                            n[H##ESIZE(e + (1 - 2 * XCHG))],            \
 +                            m[H##ESIZE(e + (1 - XCHG))],                \
 +                            ROUND, &sat);                               \
 +                mergemask(&d[H##ESIZE(e)], r, mask);                    \
 +                qc |= sat & mask & 1;                                   \
 +            }                                                           \
 +        }                                                               \
 +        if (qc) {                                                       \
 +            env->vfp.qc[0] = qc;                                        \
 +        }                                                               \
 +        mve_advance_vpt(env);                                           \
 +    }
 +
-+/*
++static int8_t do_vqdmladh_b(int8_t a, int8_t b, int8_t c, int8_t d,
-+ * Flag values:
++                            int round, bool *sat)
 + *  IS_ALIAS: this RAM area is an alias to the upstream end of the
 + *    MPC specified by its .mpc value
 + */
 +#define IS_ALIAS 1
 +
  struct MPS2TZMachineClass {
      MachineClass parent;
      MPS2TZFPGAType fpga_type;
@@ -XXX,XX +XXX,XX @@ struct MPS2TZMachineClass {
      uint32_t fpgaio_num_leds; /* Number of LEDs in FPGAIO LED0 register */
      bool fpgaio_has_switches; /* Does FPGAIO have SWITCH register? */
      int numirq; /* Number of external interrupts */
 +    const RAMInfo *raminfo;
      const char *armsse_type;
  };
@@ -XXX,XX +XXX,XX @@ struct MPS2TZMachineState {
      MachineState parent;
      ARMSSE iotkit;
 -    MemoryRegion ssram[3];
 -    MemoryRegion ssram1_m;
 +    MemoryRegion ram[MPS2TZ_RAM_MAX];
      MPS2SCC scc;
      MPS2FPGAIO fpgaio;
      TZPPC ppc[5];
 -    TZMPC ssram_mpc[3];
 +    TZMPC mpc[3];
      PL022State spi[5];
      ArmSbconI2CState i2c[4];
      UnimplementedDeviceState i2s_audio;
@@ -XXX,XX +XXX,XX @@ static const uint32_t an505_oscclk[] = {
      25000000,
  };
 +static const RAMInfo an505_raminfo[] = { {
 +        .name = "ssram-0",
 +        .base = 0x00000000,
 +        .size = 0x00400000,
 +        .mpc = 0,
 +        .mrindex = 0,
 +    }, {
 +        .name = "ssram-1",
 +        .base = 0x28000000,
 +        .size = 0x00200000,
 +        .mpc = 1,
 +        .mrindex = 1,
 +    }, {
 +        .name = "ssram-2",
 +        .base = 0x28200000,
 +        .size = 0x00200000,
 +        .mpc = 2,
 +        .mrindex = 2,
 +    }, {
 +        .name = "ssram-0-alias",
 +        .base = 0x00400000,
 +        .size = 0x00400000,
 +        .mpc = 0,
 +        .mrindex = 3,
 +        .flags = IS_ALIAS,
 +    }, {
 +        /* Use the largest bit of contiguous RAM as our "system memory" */
 +        .name = "mps.ram",
 +        .base = 0x80000000,
 +        .size = 16 * MiB,
 +        .mpc = -1,
 +        .mrindex = -1,
 +    }, {
 +        .name = NULL,
 +    },
 +};
 +
 +static const RAMInfo *find_raminfo_for_mpc(MPS2TZMachineState *mms, int mpc)
 +{
-+    MPS2TZMachineClass *mmc = MPS2TZ_MACHINE_GET_CLASS(mms);
++    int64_t r = ((int64_t)a * b + (int64_t)c * d) * 2 + (round << 7);
-+    const RAMInfo *p;
++    return do_sat_bhw(r, INT16_MIN, INT16_MAX, sat) >> 8;
 +
 +    for (p = mmc->raminfo; p->name; p++) {
 +        if (p->mpc == mpc && !(p->flags & IS_ALIAS)) {
 +            return p;
 +        }
 +    }
 +    /* if raminfo array doesn't have an entry for each MPC this is a bug */
 +    g_assert_not_reached();
 +}
 +
-+static MemoryRegion *mr_for_raminfo(MPS2TZMachineState *mms,
++static int16_t do_vqdmladh_h(int16_t a, int16_t b, int16_t c, int16_t d,
-+                                    const RAMInfo *raminfo)
++                             int round, bool *sat)
 +{
-+    /* Return an initialized MemoryRegion for the RAMInfo. */
++    int64_t r = ((int64_t)a * b + (int64_t)c * d) * 2 + (round << 15);
-+    MemoryRegion *ram;
++    return do_sat_bhw(r, INT32_MIN, INT32_MAX, sat) >> 16;
 +
 +    if (raminfo->mrindex < 0) {
 +        /* Means this RAMInfo is for QEMU's "system memory" */
 +        MachineState *machine = MACHINE(mms);
 +        return machine->ram;
 +    }
 +
 +    assert(raminfo->mrindex < MPS2TZ_RAM_MAX);
 +    ram = &mms->ram[raminfo->mrindex];
 +
 +    memory_region_init_ram(ram, NULL, raminfo->name,
 +                           raminfo->size, &error_fatal);
 +    return ram;
 +}
 +
- /* Create an alias of an entire original MemoryRegion @orig
++static int32_t do_vqdmladh_w(int32_t a, int32_t b, int32_t c, int32_t d,
-  * located at @base in the memory map.
++                             int round, bool *sat)
   */
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_mpc(MPS2TZMachineState *mms, void *opaque,
                                const int *irqs)
  {
      TZMPC *mpc = opaque;
 -    int i = mpc - &mms->ssram_mpc[0];
 -    MemoryRegion *ssram = &mms->ssram[i];
 +    int i = mpc - &mms->mpc[0];
      MemoryRegion *upstream;
 -    char *mpcname = g_strdup_printf("%s-mpc", name);
 -    static uint32_t ramsize[] = { 0x00400000, 0x00200000, 0x00200000 };
 -    static uint32_t rambase[] = { 0x00000000, 0x28000000, 0x28200000 };
 +    const RAMInfo *raminfo = find_raminfo_for_mpc(mms, i);
 +    MemoryRegion *ram = mr_for_raminfo(mms, raminfo);
 -    memory_region_init_ram(ssram, NULL, name, ramsize[i], &error_fatal);
 -
 -    object_initialize_child(OBJECT(mms), mpcname, mpc, TYPE_TZ_MPC);
 -    object_property_set_link(OBJECT(mpc), "downstream", OBJECT(ssram),
 +    object_initialize_child(OBJECT(mms), name, mpc, TYPE_TZ_MPC);
 +    object_property_set_link(OBJECT(mpc), "downstream", OBJECT(ram),
                               &error_fatal);
      sysbus_realize(SYS_BUS_DEVICE(mpc), &error_fatal);
      /* Map the upstream end of the MPC into system memory */
      upstream = sysbus_mmio_get_region(SYS_BUS_DEVICE(mpc), 1);
 -    memory_region_add_subregion(get_system_memory(), rambase[i], upstream);
 +    memory_region_add_subregion(get_system_memory(), raminfo->base, upstream);
      /* and connect its interrupt to the IoTKit */
      qdev_connect_gpio_out_named(DEVICE(mpc), "irq", 0,
                                  qdev_get_gpio_in_named(DEVICE(&mms->iotkit),
                                                         "mpcexp_status", i));
 -    /* The first SSRAM is a special case as it has an alias; accesses to
 -     * the alias region at 0x00400000 must also go to the MPC upstream.
 -     */
 -    if (i == 0) {
 -        make_ram_alias(&mms->ssram1_m, "mps.ssram1_m", upstream, 0x00400000);
 -    }
 -
 -    g_free(mpcname);
      /* Return the register interface MR for our caller to map behind the PPC */
      return sysbus_mmio_get_region(SYS_BUS_DEVICE(mpc), 0);
  }
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_i2c(MPS2TZMachineState *mms, void *opaque,
      return sysbus_mmio_get_region(s, 0);
  }
 +static void create_non_mpc_ram(MPS2TZMachineState *mms)
 +{
++    int64_t m1 = (int64_t)a * b;
++    int64_t m2 = (int64_t)c * d;
++    int64_t r;
 +    /*
-+     * Handle the RAMs which are either not behind MPCs or which are
++     * Architecturally we should do the entire add, double, round
-+     * aliases to another MPC.
++     * and then check for saturation. We do three saturating adds,
 +     * but we need to be careful about the order. If the first
 +     * m1 + m2 saturates then it's impossible for the *2+rc to
 +     * bring it back into the non-saturated range. However, if
 +     * m1 + m2 is negative then it's possible that doing the doubling
 +     * would take the intermediate result below INT64_MAX and the
 +     * addition of the rounding constant then brings it back in range.
 +     * So we add half the rounding constant before doubling rather
 +     * than adding the rounding constant after the doubling.
 +     */
-+    const RAMInfo *p;
++    if (sadd64_overflow(m1, m2, &r) ||
-+    MPS2TZMachineClass *mmc = MPS2TZ_MACHINE_GET_CLASS(mms);
++        sadd64_overflow(r, (round << 30), &r) ||
-+
++        sadd64_overflow(r, r, &r)) {
-+    for (p = mmc->raminfo; p->name; p++) {
++        *sat = true;
-+        if (p->flags & IS_ALIAS) {
++        return r < 0 ? INT32_MAX : INT32_MIN;
 +            SysBusDevice *mpc_sbd = SYS_BUS_DEVICE(&mms->mpc[p->mpc]);
 +            MemoryRegion *upstream = sysbus_mmio_get_region(mpc_sbd, 1);
 +            make_ram_alias(&mms->ram[p->mrindex], p->name, upstream, p->base);
 +        } else if (p->mpc == -1) {
 +            /* RAM not behind an MPC */
 +            MemoryRegion *mr = mr_for_raminfo(mms, p);
 +            memory_region_add_subregion(get_system_memory(), p->base, mr);
 +        }
 +    }
++    return r >> 32;
 +}
 +
- static void mps2tz_common_init(MachineState *machine)
++DO_VQDMLADH_OP(vqdmladhb, 1, int8_t, 0, 0, do_vqdmladh_b)
- {
++DO_VQDMLADH_OP(vqdmladhh, 2, int16_t, 0, 0, do_vqdmladh_h)
-     MPS2TZMachineState *mms = MPS2TZ_MACHINE(machine);
++DO_VQDMLADH_OP(vqdmladhw, 4, int32_t, 0, 0, do_vqdmladh_w)
-@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
++DO_VQDMLADH_OP(vqdmladhxb, 1, int8_t, 1, 0, do_vqdmladh_b)
-     qdev_connect_gpio_out_named(iotkitdev, "sec_resp_cfg", 0,
++DO_VQDMLADH_OP(vqdmladhxh, 2, int16_t, 1, 0, do_vqdmladh_h)
-                                 qdev_get_gpio_in(dev_splitter, 0));
++DO_VQDMLADH_OP(vqdmladhxw, 4, int32_t, 1, 0, do_vqdmladh_w)
 -    /* The IoTKit sets up much of the memory layout, including
 +    /*
 +     * The IoTKit sets up much of the memory layout, including
       * the aliases between secure and non-secure regions in the
 -     * address space. The FPGA itself contains:
 -     *
 -     * 0x00000000..0x003fffff  SSRAM1
 -     * 0x00400000..0x007fffff  alias of SSRAM1
 -     * 0x28000000..0x283fffff  4MB SSRAM2 + SSRAM3
 -     * 0x40100000..0x4fffffff  AHB Master Expansion 1 interface devices
 -     * 0x80000000..0x80ffffff  16MB PSRAM
 -     */
 -
 -    /* The FPGA images have an odd combination of different RAMs,
 +     * address space, and also most of the devices in the system.
 +     * The FPGA itself contains various RAMs and some additional devices.
 +     * The FPGA images have an odd combination of different RAMs,
       * because in hardware they are different implementations and
       * connected to different buses, giving varying performance/size
       * tradeoffs. For QEMU they're all just RAM, though. We arbitrarily
 -     * call the 16MB our "system memory", as it's the largest lump.
 +     * call the largest lump our "system memory".
       */
 -    memory_region_add_subregion(system_memory, 0x80000000, machine->ram);
      /*
       * The overflow IRQs for all UARTs are ORed together.
@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
      const PPCInfo an505_ppcs[] = { {
              .name = "apb_ppcexp0",
              .ports = {
 -                { "ssram-0", make_mpc, &mms->ssram_mpc[0], 0x58007000, 0x1000 },
 -                { "ssram-1", make_mpc, &mms->ssram_mpc[1], 0x58008000, 0x1000 },
 -                { "ssram-2", make_mpc, &mms->ssram_mpc[2], 0x58009000, 0x1000 },
 +                { "ssram-0-mpc", make_mpc, &mms->mpc[0], 0x58007000, 0x1000 },
 +                { "ssram-1-mpc", make_mpc, &mms->mpc[1], 0x58008000, 0x1000 },
 +                { "ssram-2-mpc", make_mpc, &mms->mpc[2], 0x58009000, 0x1000 },
              },
          }, {
              .name = "apb_ppcexp1",
@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
      create_unimplemented_device("FPGA NS PC", 0x48007000, 0x1000);
 +    create_non_mpc_ram(mms);
 +
-     armv7m_load_kernel(ARM_CPU(first_cpu), machine->kernel_filename, 0x400000);
++DO_VQDMLADH_OP(vqrdmladhb, 1, int8_t, 0, 1, do_vqdmladh_b)
- }
++DO_VQDMLADH_OP(vqrdmladhh, 2, int16_t, 0, 1, do_vqdmladh_h)
++DO_VQDMLADH_OP(vqrdmladhw, 4, int32_t, 0, 1, do_vqdmladh_w)
-@@ -XXX,XX +XXX,XX @@ static void mps2tz_an505_class_init(ObjectClass *oc, void *data)
++DO_VQDMLADH_OP(vqrdmladhxb, 1, int8_t, 1, 1, do_vqdmladh_b)
-     mmc->fpgaio_num_leds = 2;
++DO_VQDMLADH_OP(vqrdmladhxh, 2, int16_t, 1, 1, do_vqdmladh_h)
-     mmc->fpgaio_has_switches = false;
++DO_VQDMLADH_OP(vqrdmladhxw, 4, int32_t, 1, 1, do_vqdmladh_w)
-     mmc->numirq = 92;
++
-+    mmc->raminfo = an505_raminfo;
+ #define DO_2OP_SCALAR(OP, ESIZE, TYPE, FN)                              \
-     mmc->armsse_type = TYPE_IOTKIT;
+     void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn,   \
- }
+                                 uint32_t rm)                            \
+diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
-@@ -XXX,XX +XXX,XX @@ static void mps2tz_an521_class_init(ObjectClass *oc, void *data)
+index XXXXXXX..XXXXXXX 100644
-     mmc->fpgaio_num_leds = 2;
+--- a/target/arm/translate-mve.c
-     mmc->fpgaio_has_switches = false;
++++ b/target/arm/translate-mve.c
-     mmc->numirq = 92;
+@@ -XXX,XX +XXX,XX @@ DO_2OP(VQSHL_S, vqshls)
-+    mmc->raminfo = an505_raminfo; /* AN521 is the same as AN505 here */
+ DO_2OP(VQSHL_U, vqshlu)
-     mmc->armsse_type = TYPE_SSE200;
+ DO_2OP(VQRSHL_S, vqrshls)
- }
+ DO_2OP(VQRSHL_U, vqrshlu)
++DO_2OP(VQDMLADH, vqdmladh)
 +DO_2OP(VQDMLADHX, vqdmladhx)
 +DO_2OP(VQRDMLADH, vqrdmladh)
 +DO_2OP(VQRDMLADHX, vqrdmladhx)
  static bool do_2op_scalar(DisasContext *s, arg_2scalar *a,
                            MVEGenTwoOpScalarFn fn)
 --
 .20.1

-[PULL 11/49] hw/i2c/npcm7xx_smbus: Simplify npcm7xx_smbus_init()
+[PULL 48/57] target/arm: Implement MVE VQDMLSDH and VQRDMLSDH
-From: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Implement the MVE VQDMLSDH and VQRDMLSDH insns, which are
 like VQDMLADH and VQRDMLADH except that products are subtracted
 rather than added.
-The STATUS register will be reset to IDLE in
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-npcm7xx_smbus_enter_reset(), no need to preset
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-it in instance_init().
+Message-id: 20210617121628.20116-38-peter.maydell@linaro.org
 ---
  target/arm/helper-mve.h    | 16 ++++++++++++++
  target/arm/mve.decode      |  5 +++++
  target/arm/mve_helper.c    | 44 ++++++++++++++++++++++++++++++++++++++
  target/arm/translate-mve.c |  4 ++++
 files changed, 69 insertions(+)
-Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 Reviewed-by: Hao Wu <wuhaotsh@google.com>
 Message-id: 20210228224813.312532-1-f4bug@amsat.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  hw/i2c/npcm7xx_smbus.c | 1 -
 file changed, 1 deletion(-)
 diff --git a/hw/i2c/npcm7xx_smbus.c b/hw/i2c/npcm7xx_smbus.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/i2c/npcm7xx_smbus.c
+--- a/target/arm/helper-mve.h
-+++ b/hw/i2c/npcm7xx_smbus.c
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ static void npcm7xx_smbus_init(Object *obj)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vqrdmladhxb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-     sysbus_init_mmio(sbd, &s->iomem);
+ DEF_HELPER_FLAGS_4(mve_vqrdmladhxh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+ DEF_HELPER_FLAGS_4(mve_vqrdmladhxw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-     s->bus = i2c_init_bus(DEVICE(s), "i2c-bus");
--    s->status = NPCM7XX_SMBUS_STATUS_IDLE;
++DEF_HELPER_FLAGS_4(mve_vqdmlsdhb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +DEF_HELPER_FLAGS_4(mve_vqdmlsdhh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +DEF_HELPER_FLAGS_4(mve_vqdmlsdhw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +
 +DEF_HELPER_FLAGS_4(mve_vqdmlsdhxb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +DEF_HELPER_FLAGS_4(mve_vqdmlsdhxh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +DEF_HELPER_FLAGS_4(mve_vqdmlsdhxw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +
 +DEF_HELPER_FLAGS_4(mve_vqrdmlsdhb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +DEF_HELPER_FLAGS_4(mve_vqrdmlsdhh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +DEF_HELPER_FLAGS_4(mve_vqrdmlsdhw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +
 +DEF_HELPER_FLAGS_4(mve_vqrdmlsdhxb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +DEF_HELPER_FLAGS_4(mve_vqrdmlsdhxh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +DEF_HELPER_FLAGS_4(mve_vqrdmlsdhxw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +
  DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
  DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
  DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VQDMLADHX        1110 1110 0 . .. ... 0 ... 1 1110 . 0 . 0 ... 0 @2op
  VQRDMLADH        1110 1110 0 . .. ... 0 ... 0 1110 . 0 . 0 ... 1 @2op
  VQRDMLADHX       1110 1110 0 . .. ... 0 ... 1 1110 . 0 . 0 ... 1 @2op
 +VQDMLSDH         1111 1110 0 . .. ... 0 ... 0 1110 . 0 . 0 ... 0 @2op
 +VQDMLSDHX        1111 1110 0 . .. ... 0 ... 1 1110 . 0 . 0 ... 0 @2op
 +VQRDMLSDH        1111 1110 0 . .. ... 0 ... 0 1110 . 0 . 0 ... 1 @2op
 +VQRDMLSDHX       1111 1110 0 . .. ... 0 ... 1 1110 . 0 . 0 ... 1 @2op
 +
  # Vector miscellaneous
  VCLS             1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ static int32_t do_vqdmladh_w(int32_t a, int32_t b, int32_t c, int32_t d,
      return r >> 32;
  }
- static const VMStateDescription vmstate_npcm7xx_smbus = {
++static int8_t do_vqdmlsdh_b(int8_t a, int8_t b, int8_t c, int8_t d,
 +                            int round, bool *sat)
 +{
 +    int64_t r = ((int64_t)a * b - (int64_t)c * d) * 2 + (round << 7);
 +    return do_sat_bhw(r, INT16_MIN, INT16_MAX, sat) >> 8;
 +}
 +
 +static int16_t do_vqdmlsdh_h(int16_t a, int16_t b, int16_t c, int16_t d,
 +                             int round, bool *sat)
 +{
 +    int64_t r = ((int64_t)a * b - (int64_t)c * d) * 2 + (round << 15);
 +    return do_sat_bhw(r, INT32_MIN, INT32_MAX, sat) >> 16;
 +}
 +
 +static int32_t do_vqdmlsdh_w(int32_t a, int32_t b, int32_t c, int32_t d,
 +                             int round, bool *sat)
 +{
 +    int64_t m1 = (int64_t)a * b;
 +    int64_t m2 = (int64_t)c * d;
 +    int64_t r;
 +    /* The same ordering issue as in do_vqdmladh_w applies here too */
 +    if (ssub64_overflow(m1, m2, &r) ||
 +        sadd64_overflow(r, (round << 30), &r) ||
 +        sadd64_overflow(r, r, &r)) {
 +        *sat = true;
 +        return r < 0 ? INT32_MAX : INT32_MIN;
 +    }
 +    return r >> 32;
 +}
 +
  DO_VQDMLADH_OP(vqdmladhb, 1, int8_t, 0, 0, do_vqdmladh_b)
  DO_VQDMLADH_OP(vqdmladhh, 2, int16_t, 0, 0, do_vqdmladh_h)
  DO_VQDMLADH_OP(vqdmladhw, 4, int32_t, 0, 0, do_vqdmladh_w)
@@ -XXX,XX +XXX,XX @@ DO_VQDMLADH_OP(vqrdmladhxb, 1, int8_t, 1, 1, do_vqdmladh_b)
  DO_VQDMLADH_OP(vqrdmladhxh, 2, int16_t, 1, 1, do_vqdmladh_h)
  DO_VQDMLADH_OP(vqrdmladhxw, 4, int32_t, 1, 1, do_vqdmladh_w)
 +DO_VQDMLADH_OP(vqdmlsdhb, 1, int8_t, 0, 0, do_vqdmlsdh_b)
 +DO_VQDMLADH_OP(vqdmlsdhh, 2, int16_t, 0, 0, do_vqdmlsdh_h)
 +DO_VQDMLADH_OP(vqdmlsdhw, 4, int32_t, 0, 0, do_vqdmlsdh_w)
 +DO_VQDMLADH_OP(vqdmlsdhxb, 1, int8_t, 1, 0, do_vqdmlsdh_b)
 +DO_VQDMLADH_OP(vqdmlsdhxh, 2, int16_t, 1, 0, do_vqdmlsdh_h)
 +DO_VQDMLADH_OP(vqdmlsdhxw, 4, int32_t, 1, 0, do_vqdmlsdh_w)
 +
 +DO_VQDMLADH_OP(vqrdmlsdhb, 1, int8_t, 0, 1, do_vqdmlsdh_b)
 +DO_VQDMLADH_OP(vqrdmlsdhh, 2, int16_t, 0, 1, do_vqdmlsdh_h)
 +DO_VQDMLADH_OP(vqrdmlsdhw, 4, int32_t, 0, 1, do_vqdmlsdh_w)
 +DO_VQDMLADH_OP(vqrdmlsdhxb, 1, int8_t, 1, 1, do_vqdmlsdh_b)
 +DO_VQDMLADH_OP(vqrdmlsdhxh, 2, int16_t, 1, 1, do_vqdmlsdh_h)
 +DO_VQDMLADH_OP(vqrdmlsdhxw, 4, int32_t, 1, 1, do_vqdmlsdh_w)
 +
  #define DO_2OP_SCALAR(OP, ESIZE, TYPE, FN)                              \
      void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn,   \
                                  uint32_t rm)                            \
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_2OP(VQDMLADH, vqdmladh)
  DO_2OP(VQDMLADHX, vqdmladhx)
  DO_2OP(VQRDMLADH, vqrdmladh)
  DO_2OP(VQRDMLADHX, vqrdmladhx)
 +DO_2OP(VQDMLSDH, vqdmlsdh)
 +DO_2OP(VQDMLSDHX, vqdmlsdhx)
 +DO_2OP(VQRDMLSDH, vqrdmlsdh)
 +DO_2OP(VQRDMLSDHX, vqrdmlsdhx)
  static bool do_2op_scalar(DisasContext *s, arg_2scalar *a,
                            MVEGenTwoOpScalarFn fn)
 --
 .20.1

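The per-element arithmetic of the 16-bit do_vqdmlsdh_h() helper in patch 48 can be tried out on its own. The sketch below is a standalone re-creation, not QEMU code. sat_to_range() is a hypothetical stand-in for do_sat_bhw(). The lane pairing, the exchange (X) variants and the predication handled by DO_VQDMLADH_OP are left out. The operands are easiest to read as Q15 fixed-point values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for do_sat_bhw(): clamp val to [min, max], set *sat if clamped. */
static int64_t sat_to_range(int64_t val, int64_t min, int64_t max, bool *sat)
{
    if (val > max) {
        *sat = true;
        return max;
    }
    if (val < min) {
        *sat = true;
        return min;
    }
    return val;
}

/*
 * One 16-bit VQ(R)DMLSDH result: double (a * b - c * d), add the
 * rounding constant 1 << 15 if round is set, saturate to 32 bits
 * and keep the high half.
 */
static int16_t vqdmlsdh_h(int16_t a, int16_t b, int16_t c, int16_t d,
                          int round, bool *sat)
{
    assert(round == 0 || round == 1);
    int64_t r = ((int64_t)a * b - (int64_t)c * d) * 2 + ((int64_t)round << 15);
    return (int16_t)(sat_to_range(r, INT32_MIN, INT32_MAX, sat) >> 16);
}
```

In Q15 terms, vqdmlsdh_h(16384, 16384, 8192, 8192, 0, &sat) computes 0.5 * 0.5 - 0.25 * 0.25 = 0.1875 and returns 6144. By contrast, vqdmlsdh_h(INT16_MIN, INT16_MIN, 0, 0, 0, &sat) would need to return +1.0, which is not representable. It therefore saturates to 32767 and sets sat. Passing round = 1 gives the VQRDMLSDH behaviour.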
-[PULL 05/49] target/arm: Set ID_PFR2.SSBS to 1 for "max" 32-bit CPU
+[PULL 49/57] target/arm: Implement MVE VQDMULL (vector)
-From: Rebecca Cran <rebecca@nuviainc.com>
+Implement the vector form of the MVE VQDMULL insn.
-Enable FEAT_SSBS for the "max" 32-bit CPU.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20210617121628.20116-39-peter.maydell@linaro.org
 ---
  target/arm/helper-mve.h    |  5 +++++
  target/arm/mve.decode      |  5 +++++
  target/arm/mve_helper.c    | 30 ++++++++++++++++++++++++++++++
  target/arm/translate-mve.c | 30 ++++++++++++++++++++++++++++++
 files changed, 70 insertions(+)
-Signed-off-by: Rebecca Cran <rebecca@nuviainc.com>
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20210216224543.16142-4-rebecca@nuviainc.com
 [PMM: fix typo causing compilation failure]
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  target/arm/cpu.c | 4 ++++
 file changed, 4 insertions(+)
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.c
+--- a/target/arm/helper-mve.h
-+++ b/target/arm/cpu.c
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ static void arm_max_initfn(Object *obj)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vqrdmlsdhxb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-         t = cpu->isar.id_pfr0;
+ DEF_HELPER_FLAGS_4(mve_vqrdmlsdhxh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-         t = FIELD_DP32(t, ID_PFR0, DIT, 1);
+ DEF_HELPER_FLAGS_4(mve_vqrdmlsdhxw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-         cpu->isar.id_pfr0 = t;
 +DEF_HELPER_FLAGS_4(mve_vqdmullbh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +DEF_HELPER_FLAGS_4(mve_vqdmullbw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +DEF_HELPER_FLAGS_4(mve_vqdmullth, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +DEF_HELPER_FLAGS_4(mve_vqdmulltw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +
-+        t = cpu->isar.id_pfr2;
+ DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+        t = FIELD_DP32(t, ID_PFR2, SSBS, 1);
+ DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+        cpu->isar.id_pfr2 = t;
+ DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-     }
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
- #endif
+index XXXXXXX..XXXXXXX 100644
- }
+--- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@
  @1op_nosz .... .... .... .... .... .... .... .... &1op qd=%qd qm=%qm size=0
  @2op .... .... .. size:2 .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn
  @2op_nosz .... .... .... .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn size=0
 +@2op_sz28 .... .... .... .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn \
 +     size=%size_28
  # The _rev suffix indicates that Vn and Vm are reversed. This is
  # the case for shifts. In the Arm ARM these insns are documented
@@ -XXX,XX +XXX,XX @@ VQDMLSDHX        1111 1110 0 . .. ... 0 ... 1 1110 . 0 . 0 ... 0 @2op
  VQRDMLSDH        1111 1110 0 . .. ... 0 ... 0 1110 . 0 . 0 ... 1 @2op
  VQRDMLSDHX       1111 1110 0 . .. ... 0 ... 1 1110 . 0 . 0 ... 1 @2op
 +VQDMULLB         111 . 1110 0 . 11 ... 0 ... 0 1111 . 0 . 0 ... 1 @2op_sz28
 +VQDMULLT         111 . 1110 0 . 11 ... 0 ... 1 1111 . 0 . 0 ... 1 @2op_sz28
 +
  # Vector miscellaneous
  VCLS             1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2OP_SAT_SCALAR_L(vqdmullt_scalarh, 1, 2, int16_t, 4, int32_t, \
  DO_2OP_SAT_SCALAR_L(vqdmullt_scalarw, 1, 4, int32_t, 8, int64_t, \
                      do_qdmullw, SATMASK32)
 +/*
 + * Long saturating ops
 + */
 +#define DO_2OP_SAT_L(OP, TOP, ESIZE, TYPE, LESIZE, LTYPE, FN, SATMASK)  \
 +    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn,   \
 +                                void *vm)                               \
 +    {                                                                   \
 +        LTYPE *d = vd;                                                  \
 +        TYPE *n = vn, *m = vm;                                          \
 +        uint16_t mask = mve_element_mask(env);                          \
 +        unsigned le;                                                    \
 +        bool qc = false;                                                \
 +        for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) {         \
 +            bool sat = false;                                           \
 +            LTYPE op1 = n[H##ESIZE(le * 2 + TOP)];                      \
 +            LTYPE op2 = m[H##ESIZE(le * 2 + TOP)];                      \
 +            mergemask(&d[H##LESIZE(le)], FN(op1, op2, &sat), mask);     \
 +            qc |= sat && (mask & SATMASK);                              \
 +        }                                                               \
 +        if (qc) {                                                       \
 +            env->vfp.qc[0] = qc;                                        \
 +        }                                                               \
 +        mve_advance_vpt(env);                                           \
 +    }
 +
 +DO_2OP_SAT_L(vqdmullbh, 0, 2, int16_t, 4, int32_t, do_qdmullh, SATMASK16B)
 +DO_2OP_SAT_L(vqdmullbw, 0, 4, int32_t, 8, int64_t, do_qdmullw, SATMASK32)
 +DO_2OP_SAT_L(vqdmullth, 1, 2, int16_t, 4, int32_t, do_qdmullh, SATMASK16T)
 +DO_2OP_SAT_L(vqdmulltw, 1, 4, int32_t, 8, int64_t, do_qdmullw, SATMASK32)
 +
  static inline uint32_t do_vbrsrb(uint32_t n, uint32_t m)
  {
      m &= 0xff;
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_2OP(VQDMLSDHX, vqdmlsdhx)
  DO_2OP(VQRDMLSDH, vqrdmlsdh)
  DO_2OP(VQRDMLSDHX, vqrdmlsdhx)
 +static bool trans_VQDMULLB(DisasContext *s, arg_2op *a)
 +{
 +    static MVEGenTwoOpFn * const fns[] = {
 +        NULL,
 +        gen_helper_mve_vqdmullbh,
 +        gen_helper_mve_vqdmullbw,
 +        NULL,
 +    };
 +    if (a->size == MO_32 && (a->qd == a->qm || a->qd == a->qn)) {
 +        /* UNPREDICTABLE; we choose to undef */
 +        return false;
 +    }
 +    return do_2op(s, a, fns[a->size]);
 +}
 +
 +static bool trans_VQDMULLT(DisasContext *s, arg_2op *a)
 +{
 +    static MVEGenTwoOpFn * const fns[] = {
 +        NULL,
 +        gen_helper_mve_vqdmullth,
 +        gen_helper_mve_vqdmulltw,
 +        NULL,
 +    };
 +    if (a->size == MO_32 && (a->qd == a->qm || a->qd == a->qn)) {
 +        /* UNPREDICTABLE; we choose to undef */
 +        return false;
 +    }
 +    return do_2op(s, a, fns[a->size]);
 +}
 +
  static bool do_2op_scalar(DisasContext *s, arg_2scalar *a,
                            MVEGenTwoOpScalarFn fn)
  {
 --
 .20.1

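The DO_2OP_SAT_L macro in patch 49 reads every other narrow lane, with TOP selecting the bottom or top element of each pair. It then writes a doubled, saturated product into a lane twice as wide. The following is a minimal sketch of the 16-to-32-bit case. It assumes plain host-order arrays instead of QEMU's H2()/H4() index macros and ignores the predication mask. vqdmull_h() is a hypothetical name. The only 16-bit input pair that can saturate is INT16_MIN * INT16_MIN, because 2 * 2^30 = 2^31 exceeds INT32_MAX.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch of VQDMULLB (top = 0) or VQDMULLT (top = 1) on 16-bit lanes:
 * take lane 2 * i + top of n and m, double their product, saturate it
 * to 32 bits and store it in 32-bit lane i of d. Sets *qc on saturation.
 */
static void vqdmull_h(int32_t d[4], const int16_t n[8], const int16_t m[8],
                      int top, bool *qc)
{
    for (int i = 0; i < 4; i++) {
        int64_t r = (int64_t)n[2 * i + top] * m[2 * i + top] * 2;
        if (r > INT32_MAX) {
            r = INT32_MAX;
            *qc = true;
        } else if (r < INT32_MIN) {
            r = INT32_MIN;
            *qc = true;
        }
        d[i] = (int32_t)r;
    }
}
```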
-[PULL 04/49] target/arm: Enable FEAT_SSBS for "max" AARCH64 CPU
+[PULL 50/57] target/arm: Implement MVE VRHADD
-From: Rebecca Cran <rebecca@nuviainc.com>
+Implement the MVE VRHADD insn, which performs a rounded halving
 addition.
-Set ID_AA64PFR1_EL1.SSBS to 2 and ID_PFR2.SSBS to 1.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20210617121628.20116-40-peter.maydell@linaro.org
 ---
  target/arm/helper-mve.h    | 8 ++++++++
  target/arm/mve.decode      | 3 +++
  target/arm/mve_helper.c    | 6 ++++++
  target/arm/translate-mve.c | 2 ++
 files changed, 19 insertions(+)
-Signed-off-by: Rebecca Cran <rebecca@nuviainc.com>
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20210216224543.16142-3-rebecca@nuviainc.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  target/arm/cpu64.c | 5 +++++
 file changed, 5 insertions(+)
 diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu64.c
+--- a/target/arm/helper-mve.h
-+++ b/target/arm/cpu64.c
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vqdmullbw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+ DEF_HELPER_FLAGS_4(mve_vqdmullth, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-         t = cpu->isar.id_aa64pfr1;
+ DEF_HELPER_FLAGS_4(mve_vqdmulltw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-         t = FIELD_DP64(t, ID_AA64PFR1, BT, 1);
-+        t = FIELD_DP64(t, ID_AA64PFR1, SSBS, 2);
++DEF_HELPER_FLAGS_4(mve_vrhaddsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-         /*
++DEF_HELPER_FLAGS_4(mve_vrhaddsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-          * Begin with full support for MTE. This will be downgraded to MTE=0
++DEF_HELPER_FLAGS_4(mve_vrhaddsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
           * during realize if the board provides no tag memory, much like
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
          u = FIELD_DP32(u, ID_PFR0, DIT, 1);
          cpu->isar.id_pfr0 = u;
 +        u = cpu->isar.id_pfr2;
 +        u = FIELD_DP32(u, ID_PFR2, SSBS, 1);
 +        cpu->isar.id_pfr2 = u;
 +
-         u = cpu->isar.id_mmfr3;
++DEF_HELPER_FLAGS_4(mve_vrhaddub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-         u = FIELD_DP32(u, ID_MMFR3, PAN, 2); /* ATS1E1 */
++DEF_HELPER_FLAGS_4(mve_vrhadduh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-         cpu->isar.id_mmfr3 = u;
++DEF_HELPER_FLAGS_4(mve_vrhadduw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +
  DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
  DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
  DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VQRDMLSDHX       1111 1110 0 . .. ... 0 ... 1 1110 . 0 . 0 ... 1 @2op
  VQDMULLB         111 . 1110 0 . 11 ... 0 ... 0 1111 . 0 . 0 ... 1 @2op_sz28
  VQDMULLT         111 . 1110 0 . 11 ... 0 ... 1 1111 . 0 . 0 ... 1 @2op_sz28
 +VRHADD_S         111 0 1111 0 . .. ... 0 ... 0 0001 . 1 . 0 ... 0 @2op
 +VRHADD_U         111 1 1111 0 . .. ... 0 ... 0 0001 . 1 . 0 ... 0 @2op
 +
  # Vector miscellaneous
  VCLS             1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2OP_U(vshlu, DO_VSHLU)
  DO_2OP_S(vrshls, DO_VRSHLS)
  DO_2OP_U(vrshlu, DO_VRSHLU)
 +#define DO_RHADD_S(N, M) (((int64_t)(N) + (M) + 1) >> 1)
 +#define DO_RHADD_U(N, M) (((uint64_t)(N) + (M) + 1) >> 1)
 +
 +DO_2OP_S(vrhadds, DO_RHADD_S)
 +DO_2OP_U(vrhaddu, DO_RHADD_U)
 +
  static inline int32_t do_sat_bhw(int64_t val, int64_t min, int64_t max, bool *s)
  {
      if (val > max) {
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_2OP(VQDMLSDH, vqdmlsdh)
  DO_2OP(VQDMLSDHX, vqdmlsdhx)
  DO_2OP(VQRDMLSDH, vqrdmlsdh)
  DO_2OP(VQRDMLSDHX, vqrdmlsdhx)
 +DO_2OP(VRHADD_S, vrhadds)
 +DO_2OP(VRHADD_U, vrhaddu)
  static bool trans_VQDMULLB(DisasContext *s, arg_2op *a)
  {
 --
 .20.1

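DO_RHADD_S and DO_RHADD_U in patch 50 widen the operands to 64 bits before computing (n + m + 1) >> 1. As a result, the intermediate sum cannot overflow, even for 32-bit lanes, and the + 1 rounds exact halves upward. Below is a standalone sketch of the 32-bit lane case; rhadd_s32() and rhadd_u32() are hypothetical names. For negative signed sums it relies on >> being an arithmetic shift. GCC and Clang guarantee this, but C itself leaves it implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Signed rounding halving add, widened to 64 bits as DO_RHADD_S does. */
static int32_t rhadd_s32(int32_t n, int32_t m)
{
    return (int32_t)(((int64_t)n + m + 1) >> 1);
}

/* Unsigned rounding halving add, widened to 64 bits as DO_RHADD_U does. */
static uint32_t rhadd_u32(uint32_t n, uint32_t m)
{
    return (uint32_t)(((uint64_t)n + m + 1) >> 1);
}
```

For example, rhadd_s32(1, 2) rounds 1.5 up to 2, and rhadd_s32(-1, -2) rounds -1.5 up to -1. In both rhadd_s32(INT32_MAX, INT32_MAX) and rhadd_u32(UINT32_MAX, UINT32_MAX) the input value is returned unchanged, because the widened sum does not wrap.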
-[PULL 08/49] tests/qtests: Add npcm7xx emc model test
+[PULL 51/57] target/arm: Implement MVE VADC, VSBC
-From: Doug Evans <dje@google.com>
+Implement the MVE VADC and VSBC insns.  These perform an
 add-with-carry or subtract-with-carry of the 32-bit elements in each
 lane of the input vectors, where the carry-out of each add is the
 carry-in of the next.  The initial carry input is either 1 or is from
 FPSCR.C; the carry out at the end is written back to FPSCR.C.
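The carry chain just described can be modelled in isolation. This sketch follows the shape of do_vadc() in this patch, but drops predication and the FPSCR update: it returns the final carry instead of writing FPSCR.C. adc_chain() is a hypothetical name. Subtraction reuses the adder by inverting m, computing n + ~m + carry. This is why the VSBC and VSBCI helpers pass an all-ones inv. According to the helpers in this patch, the I forms start from a fixed carry (0 for VADCI, 1 for VSBCI), while VADC and VSBC read it from FPSCR.C. Lane 0 is the least significant, so the four lanes behave like one 128-bit integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch of the VADC/VSBC carry chain over four 32-bit lanes, without
 * predication. inv = 0 gives add-with-carry; inv = UINT32_MAX gives
 * subtract-with-carry. Returns the carry out of the last lane.
 */
static bool adc_chain(uint32_t d[4], const uint32_t n[4], const uint32_t m[4],
                      uint32_t inv, bool carry_in)
{
    for (int e = 0; e < 4; e++) {
        uint64_t r = carry_in;
        r += n[e];
        r += m[e] ^ inv;
        carry_in = r >> 32;   /* carry out of this lane feeds the next */
        d[e] = (uint32_t)r;
    }
    return carry_in;
}
```

For instance, with n = {0xffffffff, 0xffffffff, 0, 0}, m = {1, 0, 0, 0}, inv = 0 and an initial carry of 0 (the VADCI case), the carry ripples through lanes 0 and 1. The result is d = {0, 0, 1, 0} with no final carry.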
-Reviewed-by: Hao Wu <wuhaotsh@google.com>
-Reviewed-by: Avi Fishman <avi.fishman@nuvoton.com>
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Doug Evans <dje@google.com>
-Message-id: 20210218212453.831406-4-dje@google.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210617121628.20116-41-peter.maydell@linaro.org
 ---
- tests/qtest/npcm7xx_emc-test.c | 862 +++++++++++++++++++++++++++++++++
+ target/arm/helper-mve.h    |  5 ++++
- tests/qtest/meson.build        |   3 +-
+ target/arm/mve.decode      |  5 ++++
-files changed, 864 insertions(+), 1 deletion(-)
+ target/arm/mve_helper.c    | 52 ++++++++++++++++++++++++++++++++++++++
- create mode 100644 tests/qtest/npcm7xx_emc-test.c
+ target/arm/translate-mve.c | 37 +++++++++++++++++++++++++++
 files changed, 99 insertions(+)
-diff --git a/tests/qtest/npcm7xx_emc-test.c b/tests/qtest/npcm7xx_emc-test.c
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
-new file mode 100644
+index XXXXXXX..XXXXXXX 100644
-index XXXXXXX..XXXXXXX
+--- a/target/arm/helper-mve.h
---- /dev/null
++++ b/target/arm/helper-mve.h
-+++ b/tests/qtest/npcm7xx_emc-test.c
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vrhaddub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-@@ -XXX,XX +XXX,XX @@
+ DEF_HELPER_FLAGS_4(mve_vrhadduh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-+/*
+ DEF_HELPER_FLAGS_4(mve_vrhadduw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-+ * QTests for Nuvoton NPCM7xx EMC Modules.
-+ *
++DEF_HELPER_FLAGS_4(mve_vadc, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-+ * Copyright 2020 Google LLC
++DEF_HELPER_FLAGS_4(mve_vadci, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-+ *
++DEF_HELPER_FLAGS_4(mve_vsbc, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-+ * This program is free software; you can redistribute it and/or modify it
++DEF_HELPER_FLAGS_4(mve_vsbci, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 + * under the terms of the GNU General Public License as published by the
 + * Free Software Foundation; either version 2 of the License, or
 + * (at your option) any later version.
 + *
 + * This program is distributed in the hope that it will be useful, but WITHOUT
 + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
 + * for more details.
 + */
 +
-+#include "qemu/osdep.h"
+ DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+#include "qemu-common.h"
+ DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+#include "libqos/libqos.h"
+ DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+#include "qapi/qmp/qdict.h"
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
-+#include "qapi/qmp/qnum.h"
+index XXXXXXX..XXXXXXX 100644
-+#include "qemu/bitops.h"
+--- a/target/arm/mve.decode
-+#include "qemu/iov.h"
++++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VQDMULLT         111 . 1110 0 . 11 ... 0 ... 1 1111 . 0 . 0 ... 1 @2op_sz28
  VRHADD_S         111 0 1111 0 . .. ... 0 ... 0 0001 . 1 . 0 ... 0 @2op
  VRHADD_U         111 1 1111 0 . .. ... 0 ... 0 0001 . 1 . 0 ... 0 @2op
 +VADC             1110 1110 0 . 11 ... 0 ... 0 1111 . 0 . 0 ... 0 @2op_nosz
 +VSBC             1111 1110 0 . 11 ... 0 ... 0 1111 . 0 . 0 ... 0 @2op_nosz
 +VADCI            1110 1110 0 . 11 ... 0 ... 1 1111 . 0 . 0 ... 0 @2op_nosz
 +VSBCI            1111 1110 0 . 11 ... 0 ... 1 1111 . 0 . 0 ... 0 @2op_nosz
 +
-+/* Name of the emc device. */
+ # Vector miscellaneous
-+#define TYPE_NPCM7XX_EMC "npcm7xx-emc"
  VCLS             1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2OP_U(vrshlu, DO_VRSHLU)
  DO_2OP_S(vrhadds, DO_RHADD_S)
  DO_2OP_U(vrhaddu, DO_RHADD_U)
 +static void do_vadc(CPUARMState *env, uint32_t *d, uint32_t *n, uint32_t *m,
 +                    uint32_t inv, uint32_t carry_in, bool update_flags)
 +{
 +    uint16_t mask = mve_element_mask(env);
 +    unsigned e;
 +
-+/* Timeout for various operations, in seconds. */
++    /* If any additions trigger, we will update flags. */
-+#define TIMEOUT_SECONDS 10
++    if (mask & 0x1111) {
 +        update_flags = true;
 +    }
 +
-+/* Address in memory of the descriptor. */
++    for (e = 0; e < 16 / 4; e++, mask >>= 4) {
-+#define DESC_ADDR (1 << 20) /* 1 MiB */
++        uint64_t r = carry_in;
 +        r += n[H4(e)];
 +        r += m[H4(e)] ^ inv;
 +        if (mask & 1) {
 +            carry_in = r >> 32;
 +        }
 +        mergemask(&d[H4(e)], r, mask);
 +    }
 +
-+/* Address in memory of the data packet. */
++    if (update_flags) {
-+#define DATA_ADDR (DESC_ADDR + 4096)
++        /* Store C, clear NZV. */
-+
++        env->vfp.xregs[ARM_VFP_FPSCR] &= ~FPCR_NZCV_MASK;
-+#define CRC_LENGTH 4
++        env->vfp.xregs[ARM_VFP_FPSCR] |= carry_in * FPCR_C;
 +
 +#define NUM_TX_DESCRIPTORS 3
 +#define NUM_RX_DESCRIPTORS 2
 +
 +/* Size of tx,rx test buffers. */
 +#define TX_DATA_LEN 64
 +#define RX_DATA_LEN 64
 +
 +#define TX_STEP_COUNT 10000
 +#define RX_STEP_COUNT 10000
 +
 +/* 32-bit register indices. */
 +typedef enum NPCM7xxPWMRegister {
 +    /* Control registers. */
 +    REG_CAMCMR,
 +    REG_CAMEN,
 +
 +    /* There are 16 CAMn[ML] registers. */
 +    REG_CAMM_BASE,
 +    REG_CAML_BASE,
 +
 +    REG_TXDLSA = 0x22,
 +    REG_RXDLSA,
 +    REG_MCMDR,
 +    REG_MIID,
 +    REG_MIIDA,
 +    REG_FFTCR,
 +    REG_TSDR,
 +    REG_RSDR,
 +    REG_DMARFC,
 +    REG_MIEN,
 +
 +    /* Status registers. */
 +    REG_MISTA,
 +    REG_MGSTA,
 +    REG_MPCNT,
 +    REG_MRPC,
 +    REG_MRPCC,
 +    REG_MREPC,
 +    REG_DMARFS,
 +    REG_CTXDSA,
 +    REG_CTXBSA,
 +    REG_CRXDSA,
 +    REG_CRXBSA,
 +
 +    NPCM7XX_NUM_EMC_REGS,
 +} NPCM7xxPWMRegister;
 +
 +enum { NUM_CAMML_REGS = 16 };
 +
 +/* REG_CAMCMR fields */
 +/* Enable CAM Compare */
 +#define REG_CAMCMR_ECMP (1 << 4)
 +/* Accept Unicast Packet */
 +#define REG_CAMCMR_AUP (1 << 0)
 +
 +/* REG_MCMDR fields */
 +/* Software Reset */
 +#define REG_MCMDR_SWR (1 << 24)
 +/* Frame Transmission On */
 +#define REG_MCMDR_TXON (1 << 8)
 +/* Accept Long Packet */
 +#define REG_MCMDR_ALP (1 << 1)
 +/* Frame Reception On */
 +#define REG_MCMDR_RXON (1 << 0)
 +
 +/* REG_MIEN fields */
 +/* Enable Transmit Completion Interrupt */
 +#define REG_MIEN_ENTXCP (1 << 18)
 +/* Enable Transmit Interrupt */
 +#define REG_MIEN_ENTXINTR (1 << 16)
 +/* Enable Receive Good Interrupt */
 +#define REG_MIEN_ENRXGD (1 << 4)
 +/* Enable Receive Interrupt */
 +#define REG_MIEN_ENRXINTR (1 << 0)
 +
 +/* REG_MISTA fields */
 +/* Transmit Bus Error Interrupt */
 +#define REG_MISTA_TXBERR (1 << 24)
 +/* Transmit Descriptor Unavailable Interrupt */
 +#define REG_MISTA_TDU (1 << 23)
 +/* Transmit Completion Interrupt */
 +#define REG_MISTA_TXCP (1 << 18)
 +/* Transmit Interrupt */
 +#define REG_MISTA_TXINTR (1 << 16)
 +/* Receive Bus Error Interrupt */
 +#define REG_MISTA_RXBERR (1 << 11)
 +/* Receive Descriptor Unavailable Interrupt */
 +#define REG_MISTA_RDU (1 << 10)
 +/* DMA Early Notification Interrupt */
 +#define REG_MISTA_DENI (1 << 9)
 +/* Maximum Frame Length Interrupt */
 +#define REG_MISTA_DFOI (1 << 8)
 +/* Receive Good Interrupt */
 +#define REG_MISTA_RXGD (1 << 4)
 +/* Packet Too Long Interrupt */
 +#define REG_MISTA_PTLE (1 << 3)
 +/* Receive Interrupt */
 +#define REG_MISTA_RXINTR (1 << 0)
 +
 +typedef struct NPCM7xxEMCTxDesc NPCM7xxEMCTxDesc;
 +typedef struct NPCM7xxEMCRxDesc NPCM7xxEMCRxDesc;
 +
 +struct NPCM7xxEMCTxDesc {
 +    uint32_t flags;
 +    uint32_t txbsa;
 +    uint32_t status_and_length;
 +    uint32_t ntxdsa;
 +};
 +
 +struct NPCM7xxEMCRxDesc {
 +    uint32_t status_and_length;
 +    uint32_t rxbsa;
 +    uint32_t reserved;
 +    uint32_t nrxdsa;
 +};
 +
 +/* NPCM7xxEMCTxDesc.flags values */
 +/* Owner: 0 = cpu, 1 = emc */
 +#define TX_DESC_FLAG_OWNER_MASK (1 << 31)
 +/* Transmit interrupt enable */
 +#define TX_DESC_FLAG_INTEN (1 << 2)
 +
 +/* NPCM7xxEMCTxDesc.status_and_length values */
 +/* Transmission complete */
 +#define TX_DESC_STATUS_TXCP (1 << 19)
 +/* Transmit interrupt */
 +#define TX_DESC_STATUS_TXINTR (1 << 16)
 +
 +/* NPCM7xxEMCRxDesc.status_and_length values */
 +/* Owner: 0b00 = cpu, 0b10 = emc */
 +#define RX_DESC_STATUS_OWNER_SHIFT 30
 +#define RX_DESC_STATUS_OWNER_MASK 0xc0000000
 +/* Frame Reception Complete */
 +#define RX_DESC_STATUS_RXGD (1 << 20)
 +/* Packet too long */
 +#define RX_DESC_STATUS_PTLE (1 << 19)
 +/* Receive Interrupt */
 +#define RX_DESC_STATUS_RXINTR (1 << 16)
 +
 +#define RX_DESC_PKT_LEN(word) ((uint32_t) (word) & 0xffff)
 +
 +typedef struct EMCModule {
 +    int rx_irq;
 +    int tx_irq;
 +    uint64_t base_addr;
 +} EMCModule;
 +
 +typedef struct TestData {
 +    const EMCModule *module;
 +} TestData;
 +
 +static const EMCModule emc_module_list[] = {
 +    {
 +        .rx_irq     = 15,
 +        .tx_irq     = 16,
 +        .base_addr  = 0xf0825000
 +    },
 +    {
 +        .rx_irq     = 114,
 +        .tx_irq     = 115,
 +        .base_addr  = 0xf0826000
 +    }
-+};
++    mve_advance_vpt(env);
 +
 +/* Returns the index of the EMC module. */
 +static int emc_module_index(const EMCModule *mod)
 +{
 +    ptrdiff_t diff = mod - emc_module_list;
 +
 +    g_assert_true(diff >= 0 && diff < ARRAY_SIZE(emc_module_list));
 +
 +    return diff;
 +}
 +
-+static void packet_test_clear(void *sockets)
++void HELPER(mve_vadc)(CPUARMState *env, void *vd, void *vn, void *vm)
 +{
-+    int *test_sockets = sockets;
++    bool carry_in = env->vfp.xregs[ARM_VFP_FPSCR] & FPCR_C;
-+
++    do_vadc(env, vd, vn, vm, 0, carry_in, false);
 +    close(test_sockets[0]);
 +    g_free(test_sockets);
 +}
 +
-+static int *packet_test_init(int module_num, GString *cmd_line)
++void HELPER(mve_vsbc)(CPUARMState *env, void *vd, void *vn, void *vm)
 +{
-+    int *test_sockets = g_new(int, 2);
++    bool carry_in = env->vfp.xregs[ARM_VFP_FPSCR] & FPCR_C;
-+    int ret = socketpair(PF_UNIX, SOCK_STREAM, 0, test_sockets);
++    do_vadc(env, vd, vn, vm, -1, carry_in, false);
 +    g_assert_cmpint(ret, != , -1);
 +
 +    /*
 +     * KISS and use -nic. We specify two nics (both emc{0,1}) because there's
 +     * currently no way to specify only emc1: The driver implicitly relies on
 +     * emc[i] == nd_table[i].
 +     */
 +    if (module_num == 0) {
 +        g_string_append_printf(cmd_line,
 +                               " -nic socket,fd=%d,model=" TYPE_NPCM7XX_EMC " "
 +                               " -nic user,model=" TYPE_NPCM7XX_EMC " ",
 +                               test_sockets[1]);
 +    } else {
 +        g_string_append_printf(cmd_line,
 +                               " -nic user,model=" TYPE_NPCM7XX_EMC " "
 +                               " -nic socket,fd=%d,model=" TYPE_NPCM7XX_EMC " ",
 +                               test_sockets[1]);
 +    }
 +
 +    g_test_queue_destroy(packet_test_clear, test_sockets);
 +    return test_sockets;
 +}
 +
-+static uint32_t emc_read(QTestState *qts, const EMCModule *mod,
++
-+                         NPCM7xxPWMRegister regno)
++void HELPER(mve_vadci)(CPUARMState *env, void *vd, void *vn, void *vm)
 +{
-+    return qtest_readl(qts, mod->base_addr + regno * sizeof(uint32_t));
++    do_vadc(env, vd, vn, vm, 0, 0, true);
 +}
 +
-+static void emc_write(QTestState *qts, const EMCModule *mod,
++void HELPER(mve_vsbci)(CPUARMState *env, void *vd, void *vn, void *vm)
 +                      NPCM7xxPWMRegister regno, uint32_t value)
 +{
-+    qtest_writel(qts, mod->base_addr + regno * sizeof(uint32_t), value);
++    do_vadc(env, vd, vn, vm, -1, 1, true);
 +}
 +
-+static void emc_read_tx_desc(QTestState *qts, uint32_t addr,
+ static inline int32_t do_sat_bhw(int64_t val, int64_t min, int64_t max, bool *s)
-+                             NPCM7xxEMCTxDesc *desc)
+ {
      if (val > max) {
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VQDMULLT(DisasContext *s, arg_2op *a)
      return do_2op(s, a, fns[a->size]);
  }
 +/*
 + * VADC and VSBC: these perform an add-with-carry or subtract-with-carry
 + * of the 32-bit elements in each lane of the input vectors, where the
 + * carry-out of each add is the carry-in of the next.  The initial carry
 + * input is either fixed (0 for VADCI, 1 for VSBCI) or is from FPSCR.C
 + * (for VADC and VSBC); the carry out at the end is written back to FPSCR.C.
 + * These insns are subject to beat-wise execution.  Partial execution
 + * of an I=1 (initial carry input fixed) insn which does not
 + * execute the first beat must start with the current FPSCR.NZCV
 + * value, not the fixed constant input.
 + */
 +static bool trans_VADC(DisasContext *s, arg_2op *a)
 +{
-+    qtest_memread(qts, addr, desc, sizeof(*desc));
++    return do_2op(s, a, gen_helper_mve_vadc);
 +    desc->flags = le32_to_cpu(desc->flags);
 +    desc->txbsa = le32_to_cpu(desc->txbsa);
 +    desc->status_and_length = le32_to_cpu(desc->status_and_length);
 +    desc->ntxdsa = le32_to_cpu(desc->ntxdsa);
 +}
 +
-+static void emc_write_tx_desc(QTestState *qts, const NPCM7xxEMCTxDesc *desc,
++static bool trans_VADCI(DisasContext *s, arg_2op *a)
 +                              uint32_t addr)
 +{
-+    NPCM7xxEMCTxDesc le_desc;
++    if (mve_skip_first_beat(s)) {
-+
++        return trans_VADC(s, a);
-+    le_desc.flags = cpu_to_le32(desc->flags);
++    }
-+    le_desc.txbsa = cpu_to_le32(desc->txbsa);
++    return do_2op(s, a, gen_helper_mve_vadci);
 +    le_desc.status_and_length = cpu_to_le32(desc->status_and_length);
 +    le_desc.ntxdsa = cpu_to_le32(desc->ntxdsa);
 +    qtest_memwrite(qts, addr, &le_desc, sizeof(le_desc));
 +}
 +
-+static void emc_read_rx_desc(QTestState *qts, uint32_t addr,
++static bool trans_VSBC(DisasContext *s, arg_2op *a)
 +                             NPCM7xxEMCRxDesc *desc)
 +{
-+    qtest_memread(qts, addr, desc, sizeof(*desc));
++    return do_2op(s, a, gen_helper_mve_vsbc);
 +    desc->status_and_length = le32_to_cpu(desc->status_and_length);
 +    desc->rxbsa = le32_to_cpu(desc->rxbsa);
 +    desc->reserved = le32_to_cpu(desc->reserved);
 +    desc->nrxdsa = le32_to_cpu(desc->nrxdsa);
 +}
 +
-+static void emc_write_rx_desc(QTestState *qts, const NPCM7xxEMCRxDesc *desc,
++static bool trans_VSBCI(DisasContext *s, arg_2op *a)
 +                              uint32_t addr)
 +{
-+    NPCM7xxEMCRxDesc le_desc;
++    if (mve_skip_first_beat(s)) {
-+
++        return trans_VSBC(s, a);
-+    le_desc.status_and_length = cpu_to_le32(desc->status_and_length);
++    }
-+    le_desc.rxbsa = cpu_to_le32(desc->rxbsa);
++    return do_2op(s, a, gen_helper_mve_vsbci);
 +    le_desc.reserved = cpu_to_le32(desc->reserved);
 +    le_desc.nrxdsa = cpu_to_le32(desc->nrxdsa);
 +    qtest_memwrite(qts, addr, &le_desc, sizeof(le_desc));
 +}
 +
-+/*
+ static bool do_2op_scalar(DisasContext *s, arg_2scalar *a,
-+ * Reset the EMC module.
+                           MVEGenTwoOpScalarFn fn)
-+ * The module must be reset before, e.g., TXDLSA,RXDLSA are changed.
+ {
 + */
 +static bool emc_soft_reset(QTestState *qts, const EMCModule *mod)
 +{
 +    uint32_t val;
 +    uint64_t end_time;
 +
 +    emc_write(qts, mod, REG_MCMDR, REG_MCMDR_SWR);
 +
 +    /*
 +     * Wait for device to reset as the linux driver does.
 +     * During reset the AHB reads 0 for all registers. So first wait for
 +     * something that resets to non-zero, and then wait for SWR becoming 0.
 +     */
 +    end_time = g_get_monotonic_time() + TIMEOUT_SECONDS * G_TIME_SPAN_SECOND;
 +
 +    do {
 +        qtest_clock_step(qts, 100);
 +        val = emc_read(qts, mod, REG_FFTCR);
 +    } while (val == 0 && g_get_monotonic_time() < end_time);
 +    if (val != 0) {
 +        do {
 +            qtest_clock_step(qts, 100);
 +            val = emc_read(qts, mod, REG_MCMDR);
 +            if ((val & REG_MCMDR_SWR) == 0) {
 +                /*
 +                 * N.B. The CAMs have been reset here, so macaddr matching of
 +                 * incoming packets will not work.
 +                 */
 +                return true;
 +            }
 +        } while (g_get_monotonic_time() < end_time);
 +    }
 +
 +    g_message("%s: Timeout expired", __func__);
 +    return false;
 +}
 +
 +/* Check emc registers are reset to default value. */
 +static void test_init(gconstpointer test_data)
 +{
 +    const TestData *td = test_data;
 +    const EMCModule *mod = td->module;
 +    QTestState *qts = qtest_init("-machine quanta-gsj");
 +    int i;
 +
 +#define CHECK_REG(regno, value) \
 +  do { \
 +    g_assert_cmphex(emc_read(qts, mod, (regno)), ==, (value)); \
 +  } while (0)
 +
 +    CHECK_REG(REG_CAMCMR, 0);
 +    CHECK_REG(REG_CAMEN, 0);
 +    CHECK_REG(REG_TXDLSA, 0xfffffffc);
 +    CHECK_REG(REG_RXDLSA, 0xfffffffc);
 +    CHECK_REG(REG_MCMDR, 0);
 +    CHECK_REG(REG_MIID, 0);
 +    CHECK_REG(REG_MIIDA, 0x00900000);
 +    CHECK_REG(REG_FFTCR, 0x0101);
 +    CHECK_REG(REG_DMARFC, 0x0800);
 +    CHECK_REG(REG_MIEN, 0);
 +    CHECK_REG(REG_MISTA, 0);
 +    CHECK_REG(REG_MGSTA, 0);
 +    CHECK_REG(REG_MPCNT, 0x7fff);
 +    CHECK_REG(REG_MRPC, 0);
 +    CHECK_REG(REG_MRPCC, 0);
 +    CHECK_REG(REG_MREPC, 0);
 +    CHECK_REG(REG_DMARFS, 0);
 +    CHECK_REG(REG_CTXDSA, 0);
 +    CHECK_REG(REG_CTXBSA, 0);
 +    CHECK_REG(REG_CRXDSA, 0);
 +    CHECK_REG(REG_CRXBSA, 0);
 +
 +#undef CHECK_REG
 +
 +    for (i = 0; i < NUM_CAMML_REGS; ++i) {
 +        g_assert_cmpuint(emc_read(qts, mod, REG_CAMM_BASE + i * 2), ==,
 +                         0);
 +        g_assert_cmpuint(emc_read(qts, mod, REG_CAML_BASE + i * 2), ==,
 +                         0);
 +    }
 +
 +    qtest_quit(qts);
 +}
 +
 +static bool emc_wait_irq(QTestState *qts, const EMCModule *mod, int step,
 +                         bool is_tx)
 +{
 +    uint64_t end_time =
 +        g_get_monotonic_time() + TIMEOUT_SECONDS * G_TIME_SPAN_SECOND;
 +
 +    do {
 +        if (qtest_get_irq(qts, is_tx ? mod->tx_irq : mod->rx_irq)) {
 +            return true;
 +        }
 +        qtest_clock_step(qts, step);
 +    } while (g_get_monotonic_time() < end_time);
 +
 +    g_message("%s: Timeout expired", __func__);
 +    return false;
 +}
 +
 +static bool emc_wait_mista(QTestState *qts, const EMCModule *mod, int step,
 +                           uint32_t flag)
 +{
 +    uint64_t end_time =
 +        g_get_monotonic_time() + TIMEOUT_SECONDS * G_TIME_SPAN_SECOND;
 +
 +    do {
 +        uint32_t mista = emc_read(qts, mod, REG_MISTA);
 +        if (mista & flag) {
 +            return true;
 +        }
 +        qtest_clock_step(qts, step);
 +    } while (g_get_monotonic_time() < end_time);
 +
 +    g_message("%s: Timeout expired", __func__);
 +    return false;
 +}
 +
 +static bool wait_socket_readable(int fd)
 +{
 +    fd_set read_fds;
 +    struct timeval tv;
 +    int rv;
 +
 +    FD_ZERO(&read_fds);
 +    FD_SET(fd, &read_fds);
 +    tv.tv_sec = TIMEOUT_SECONDS;
 +    tv.tv_usec = 0;
 +    rv = select(fd + 1, &read_fds, NULL, NULL, &tv);
 +    if (rv == -1) {
 +        perror("select");
 +    } else if (rv == 0) {
 +        g_message("%s: Timeout expired", __func__);
 +    }
 +    return rv == 1;
 +}
 +
 +/* Initialize *desc (in host endian format). */
 +static void init_tx_desc(NPCM7xxEMCTxDesc *desc, size_t count,
 +                         uint32_t desc_addr)
 +{
 +    g_assert(count >= 2);
 +    memset(&desc[0], 0, sizeof(*desc) * count);
 +    /* Leave the last one alone, owned by the cpu -> stops transmission. */
 +    for (size_t i = 0; i < count - 1; ++i) {
 +        desc[i].flags =
 +            (TX_DESC_FLAG_OWNER_MASK | /* owner = 1: emc */
 +             TX_DESC_FLAG_INTEN |
 +             0 | /* crc append = 0 */
 +             0 /* padding enable = 0 */);
 +        desc[i].status_and_length =
 +            (0 | /* collision count = 0 */
 +             0 | /* SQE = 0 */
 +             0 | /* PAU = 0 */
 +             0 | /* TXHA = 0 */
 +             0 | /* LC = 0 */
 +             0 | /* TXABT = 0 */
 +             0 | /* NCS = 0 */
 +             0 | /* EXDEF = 0 */
 +             0 | /* TXCP = 0 */
 +             0 | /* DEF = 0 */
 +             0 | /* TXINTR = 0 */
 +             0 /* length filled in later */);
 +        desc[i].ntxdsa = desc_addr + (i + 1) * sizeof(*desc);
 +    }
 +}
 +
 +static void enable_tx(QTestState *qts, const EMCModule *mod,
 +                      const NPCM7xxEMCTxDesc *desc, size_t count,
 +                      uint32_t desc_addr, uint32_t mien_flags)
 +{
 +    /* Write the descriptors to guest memory. */
 +    for (size_t i = 0; i < count; ++i) {
 +        emc_write_tx_desc(qts, desc + i, desc_addr + i * sizeof(*desc));
 +    }
 +
 +    /* Trigger sending the packet. */
 +    /* The module must be reset before changing TXDLSA. */
 +    g_assert(emc_soft_reset(qts, mod));
 +    emc_write(qts, mod, REG_TXDLSA, desc_addr);
 +    emc_write(qts, mod, REG_CTXDSA, ~0);
 +    emc_write(qts, mod, REG_MIEN, REG_MIEN_ENTXCP | mien_flags);
 +    {
 +        uint32_t mcmdr = emc_read(qts, mod, REG_MCMDR);
 +        mcmdr |= REG_MCMDR_TXON;
 +        emc_write(qts, mod, REG_MCMDR, mcmdr);
 +    }
 +
 +    /* Prod the device to send the packet. */
 +    emc_write(qts, mod, REG_TSDR, 1);
 +}
 +
 +static void emc_send_verify1(QTestState *qts, const EMCModule *mod, int fd,
 +                             bool with_irq, uint32_t desc_addr,
 +                             uint32_t next_desc_addr,
 +                             const char *test_data, int test_size)
 +{
 +    NPCM7xxEMCTxDesc result_desc;
 +    uint32_t expected_mask, expected_value, recv_len;
 +    int ret;
 +    char buffer[TX_DATA_LEN];
 +
 +    g_assert(wait_socket_readable(fd));
 +
 +    /* Read the descriptor back. */
 +    emc_read_tx_desc(qts, desc_addr, &result_desc);
 +    /* Descriptor should be owned by cpu now. */
 +    g_assert((result_desc.flags & TX_DESC_FLAG_OWNER_MASK) == 0);
 +    /* Test the status bits, ignoring the length field. */
 +    expected_mask = 0xffff << 16;
 +    expected_value = TX_DESC_STATUS_TXCP;
 +    if (with_irq) {
 +        expected_value |= TX_DESC_STATUS_TXINTR;
 +    }
 +    g_assert_cmphex((result_desc.status_and_length & expected_mask), ==,
 +                    expected_value);
 +
 +    /* Check data sent to the backend. */
 +    recv_len = ~0;
 +    ret = qemu_recv(fd, &recv_len, sizeof(recv_len), MSG_DONTWAIT);
 +    g_assert_cmpint(ret, == , sizeof(recv_len));
 +
 +    g_assert(wait_socket_readable(fd));
 +    memset(buffer, 0xff, sizeof(buffer));
 +    ret = qemu_recv(fd, buffer, test_size, MSG_DONTWAIT);
 +    g_assert_cmpmem(buffer, ret, test_data, test_size);
 +}
 +
 +static void emc_send_verify(QTestState *qts, const EMCModule *mod, int fd,
 +                            bool with_irq)
 +{
 +    NPCM7xxEMCTxDesc desc[NUM_TX_DESCRIPTORS];
 +    uint32_t desc_addr = DESC_ADDR;
 +    static const char test1_data[] = "TEST1";
 +    static const char test2_data[] = "Testing 1 2 3 ...";
 +    uint32_t data1_addr = DATA_ADDR;
 +    uint32_t data2_addr = data1_addr + sizeof(test1_data);
 +    bool got_tdu;
 +    uint32_t end_desc_addr;
 +
 +    /* Prepare test data buffer. */
 +    qtest_memwrite(qts, data1_addr, test1_data, sizeof(test1_data));
 +    qtest_memwrite(qts, data2_addr, test2_data, sizeof(test2_data));
 +
 +    init_tx_desc(&desc[0], NUM_TX_DESCRIPTORS, desc_addr);
 +    desc[0].txbsa = data1_addr;
 +    desc[0].status_and_length |= sizeof(test1_data);
 +    desc[1].txbsa = data2_addr;
 +    desc[1].status_and_length |= sizeof(test2_data);
 +
 +    enable_tx(qts, mod, &desc[0], NUM_TX_DESCRIPTORS, desc_addr,
 +              with_irq ? REG_MIEN_ENTXINTR : 0);
 +
 +    /*
 +     * It's problematic to observe the interrupt for each packet.
 +     * Instead just wait until all the packets go out.
 +     */
 +    got_tdu = false;
 +    while (!got_tdu) {
 +        if (with_irq) {
 +            g_assert_true(emc_wait_irq(qts, mod, TX_STEP_COUNT,
 +                                       /*is_tx=*/true));
 +        } else {
 +            g_assert_true(emc_wait_mista(qts, mod, TX_STEP_COUNT,
 +                                         REG_MISTA_TXINTR));
 +        }
 +        got_tdu = !!(emc_read(qts, mod, REG_MISTA) & REG_MISTA_TDU);
 +        /* If we don't have TDU yet, reset the interrupt. */
 +        if (!got_tdu) {
 +            emc_write(qts, mod, REG_MISTA,
 +                      emc_read(qts, mod, REG_MISTA) & 0xffff0000);
 +        }
 +    }
 +
 +    end_desc_addr = desc_addr + 2 * sizeof(desc[0]);
 +    g_assert_cmphex(emc_read(qts, mod, REG_CTXDSA), ==, end_desc_addr);
 +    g_assert_cmphex(emc_read(qts, mod, REG_MISTA), ==,
 +                    REG_MISTA_TXCP | REG_MISTA_TXINTR | REG_MISTA_TDU);
 +
 +    emc_send_verify1(qts, mod, fd, with_irq,
 +                     desc_addr, end_desc_addr,
 +                     test1_data, sizeof(test1_data));
 +    emc_send_verify1(qts, mod, fd, with_irq,
 +                     desc_addr + sizeof(desc[0]), end_desc_addr,
 +                     test2_data, sizeof(test2_data));
 +}
 +
 +/* Initialize *desc (in host endian format). */
 +static void init_rx_desc(NPCM7xxEMCRxDesc *desc, size_t count,
 +                         uint32_t desc_addr, uint32_t data_addr)
 +{
 +    g_assert_true(count >= 2);
 +    memset(desc, 0, sizeof(*desc) * count);
 +    desc[0].rxbsa = data_addr;
 +    desc[0].status_and_length =
 +        (0b10 << RX_DESC_STATUS_OWNER_SHIFT | /* owner = 10: emc */
 +         0 | /* RP = 0 */
 +         0 | /* ALIE = 0 */
 +         0 | /* RXGD = 0 */
 +         0 | /* PTLE = 0 */
 +         0 | /* CRCE = 0 */
 +         0 | /* RXINTR = 0 */
 +         0   /* length (filled in later) */);
 +    /* Leave the last one alone, owned by the cpu -> stops transmission. */
 +    desc[0].nrxdsa = desc_addr + sizeof(*desc);
 +}
 +
 +static void enable_rx(QTestState *qts, const EMCModule *mod,
 +                      const NPCM7xxEMCRxDesc *desc, size_t count,
 +                      uint32_t desc_addr, uint32_t mien_flags,
 +                      uint32_t mcmdr_flags)
 +{
 +    /*
 +     * Write the descriptor to guest memory.
 +     * FWIW, IWBN if the docs said the buffer needs to be at least DMARFC
 +     * bytes.
 +     */
 +    for (size_t i = 0; i < count; ++i) {
 +        emc_write_rx_desc(qts, desc + i, desc_addr + i * sizeof(*desc));
 +    }
 +
 +    /* Trigger receiving the packet. */
 +    /* The module must be reset before changing RXDLSA. */
 +    g_assert(emc_soft_reset(qts, mod));
 +    emc_write(qts, mod, REG_RXDLSA, desc_addr);
 +    emc_write(qts, mod, REG_MIEN, REG_MIEN_ENRXGD | mien_flags);
 +
 +    /*
 +     * We don't know what the device's macaddr is, so just accept all
 +     * unicast packets (AUP).
 +     */
 +    emc_write(qts, mod, REG_CAMCMR, REG_CAMCMR_AUP);
 +    emc_write(qts, mod, REG_CAMEN, 1 << 0);
 +    {
 +        uint32_t mcmdr = emc_read(qts, mod, REG_MCMDR);
 +        mcmdr |= REG_MCMDR_RXON | mcmdr_flags;
 +        emc_write(qts, mod, REG_MCMDR, mcmdr);
 +    }
 +
 +    /* Prod the device to accept a packet. */
 +    emc_write(qts, mod, REG_RSDR, 1);
 +}
 +
 +static void emc_recv_verify(QTestState *qts, const EMCModule *mod, int fd,
 +                            bool with_irq)
 +{
 +    NPCM7xxEMCRxDesc desc[NUM_RX_DESCRIPTORS];
 +    uint32_t desc_addr = DESC_ADDR;
 +    uint32_t data_addr = DATA_ADDR;
 +    int ret;
 +    uint32_t expected_mask, expected_value;
 +    NPCM7xxEMCRxDesc result_desc;
 +
 +    /* Prepare test data buffer. */
 +    const char test[RX_DATA_LEN] = "TEST";
 +    int len = htonl(sizeof(test));
 +    const struct iovec iov[] = {
 +        {
 +            .iov_base = &len,
 +            .iov_len = sizeof(len),
 +        },{
 +            .iov_base = (char *) test,
 +            .iov_len = sizeof(test),
 +        },
 +    };
 +
 +    /*
 +     * Reset the device BEFORE sending a test packet, otherwise the packet
 +     * may get swallowed by an active device of an earlier test.
 +     */
 +    init_rx_desc(&desc[0], NUM_RX_DESCRIPTORS, desc_addr, data_addr);
 +    enable_rx(qts, mod, &desc[0], NUM_RX_DESCRIPTORS, desc_addr,
 +              with_irq ? REG_MIEN_ENRXINTR : 0, 0);
 +
 +    /* Send test packet to device's socket. */
 +    ret = iov_send(fd, iov, 2, 0, sizeof(len) + sizeof(test));
 +    g_assert_cmpint(ret, == , sizeof(test) + sizeof(len));
 +
 +    /* Wait for RX interrupt. */
 +    if (with_irq) {
 +        g_assert_true(emc_wait_irq(qts, mod, RX_STEP_COUNT, /*is_tx=*/false));
 +    } else {
 +        g_assert_true(emc_wait_mista(qts, mod, RX_STEP_COUNT, REG_MISTA_RXGD));
 +    }
 +
 +    g_assert_cmphex(emc_read(qts, mod, REG_CRXDSA), ==,
 +                    desc_addr + sizeof(desc[0]));
 +
 +    expected_mask = 0xffff;
 +    expected_value = (REG_MISTA_DENI |
 +                      REG_MISTA_RXGD |
 +                      REG_MISTA_RXINTR);
 +    g_assert_cmphex((emc_read(qts, mod, REG_MISTA) & expected_mask),
 +                    ==, expected_value);
 +
 +    /* Read the descriptor back. */
 +    emc_read_rx_desc(qts, desc_addr, &result_desc);
 +    /* Descriptor should be owned by cpu now. */
 +    g_assert((result_desc.status_and_length & RX_DESC_STATUS_OWNER_MASK) == 0);
 +    /* Test the status bits, ignoring the length field. */
 +    expected_mask = 0xffff << 16;
 +    expected_value = RX_DESC_STATUS_RXGD;
 +    if (with_irq) {
 +        expected_value |= RX_DESC_STATUS_RXINTR;
 +    }
 +    g_assert_cmphex((result_desc.status_and_length & expected_mask), ==,
 +                    expected_value);
 +    g_assert_cmpint(RX_DESC_PKT_LEN(result_desc.status_and_length), ==,
 +                    RX_DATA_LEN + CRC_LENGTH);
 +
 +    {
 +        char buffer[RX_DATA_LEN];
 +        qtest_memread(qts, data_addr, buffer, sizeof(buffer));
 +        g_assert_cmpstr(buffer, == , "TEST");
 +    }
 +}
 +
 +static void emc_test_ptle(QTestState *qts, const EMCModule *mod, int fd)
 +{
 +    NPCM7xxEMCRxDesc desc[NUM_RX_DESCRIPTORS];
 +    uint32_t desc_addr = DESC_ADDR;
 +    uint32_t data_addr = DATA_ADDR;
 +    int ret;
 +    NPCM7xxEMCRxDesc result_desc;
 +    uint32_t expected_mask, expected_value;
 +
 +    /* Prepare test data buffer. */
 +#define PTLE_DATA_LEN 1600
 +    char test_data[PTLE_DATA_LEN];
 +    int len = htonl(sizeof(test_data));
 +    const struct iovec iov[] = {
 +        {
 +            .iov_base = &len,
 +            .iov_len = sizeof(len),
 +        },{
 +            .iov_base = (char *) test_data,
 +            .iov_len = sizeof(test_data),
 +        },
 +    };
 +    memset(test_data, 42, sizeof(test_data));
 +
 +    /*
 +     * Reset the device BEFORE sending a test packet, otherwise the packet
 +     * may get swallowed by an active device of an earlier test.
 +     */
 +    init_rx_desc(&desc[0], NUM_RX_DESCRIPTORS, desc_addr, data_addr);
 +    enable_rx(qts, mod, &desc[0], NUM_RX_DESCRIPTORS, desc_addr,
 +              REG_MIEN_ENRXINTR, REG_MCMDR_ALP);
 +
 +    /* Send test packet to device's socket. */
 +    ret = iov_send(fd, iov, 2, 0, sizeof(len) + sizeof(test_data));
 +    g_assert_cmpint(ret, == , sizeof(test_data) + sizeof(len));
 +
 +    /* Wait for RX interrupt. */
 +    g_assert_true(emc_wait_irq(qts, mod, RX_STEP_COUNT, /*is_tx=*/false));
 +
 +    /* Read the descriptor back. */
 +    emc_read_rx_desc(qts, desc_addr, &result_desc);
 +    /* Descriptor should be owned by cpu now. */
 +    g_assert((result_desc.status_and_length & RX_DESC_STATUS_OWNER_MASK) == 0);
 +    /* Test the status bits, ignoring the length field. */
 +    expected_mask = 0xffff << 16;
 +    expected_value = (RX_DESC_STATUS_RXGD |
 +                      RX_DESC_STATUS_PTLE |
 +                      RX_DESC_STATUS_RXINTR);
 +    g_assert_cmphex((result_desc.status_and_length & expected_mask), ==,
 +                    expected_value);
 +    g_assert_cmpint(RX_DESC_PKT_LEN(result_desc.status_and_length), ==,
 +                    PTLE_DATA_LEN + CRC_LENGTH);
 +
 +    {
 +        char buffer[PTLE_DATA_LEN];
 +        qtest_memread(qts, data_addr, buffer, sizeof(buffer));
 +        g_assert(memcmp(buffer, test_data, PTLE_DATA_LEN) == 0);
 +    }
 +}
 +
 +static void test_tx(gconstpointer test_data)
 +{
 +    const TestData *td = test_data;
 +    GString *cmd_line = g_string_new("-machine quanta-gsj");
 +    int *test_sockets = packet_test_init(emc_module_index(td->module),
 +                                         cmd_line);
 +    QTestState *qts = qtest_init(cmd_line->str);
 +
 +    /*
 +     * TODO: For pedantic correctness test_sockets[0] should be closed after
 +     * the fork and before the exec, but that will require some harness
 +     * improvements.
 +     */
 +    close(test_sockets[1]);
 +    /* Defensive programming */
 +    test_sockets[1] = -1;
 +
 +    qtest_irq_intercept_in(qts, "/machine/soc/a9mpcore/gic");
 +
 +    emc_send_verify(qts, td->module, test_sockets[0], /*with_irq=*/false);
 +    emc_send_verify(qts, td->module, test_sockets[0], /*with_irq=*/true);
 +
 +    qtest_quit(qts);
 +}
 +
 +static void test_rx(gconstpointer test_data)
 +{
 +    const TestData *td = test_data;
 +    GString *cmd_line = g_string_new("-machine quanta-gsj");
 +    int *test_sockets = packet_test_init(emc_module_index(td->module),
 +                                         cmd_line);
 +    QTestState *qts = qtest_init(cmd_line->str);
 +
 +    /*
 +     * TODO: For pedantic correctness test_sockets[0] should be closed after
 +     * the fork and before the exec, but that will require some harness
 +     * improvements.
 +     */
 +    close(test_sockets[1]);
 +    /* Defensive programming */
 +    test_sockets[1] = -1;
 +
 +    qtest_irq_intercept_in(qts, "/machine/soc/a9mpcore/gic");
 +
 +    emc_recv_verify(qts, td->module, test_sockets[0], /*with_irq=*/false);
 +    emc_recv_verify(qts, td->module, test_sockets[0], /*with_irq=*/true);
 +    emc_test_ptle(qts, td->module, test_sockets[0]);
 +
 +    qtest_quit(qts);
 +}
 +
 +static void emc_add_test(const char *name, const TestData* td,
 +                         GTestDataFunc fn)
 +{
 +    g_autofree char *full_name = g_strdup_printf(
 +            "npcm7xx_emc/emc[%d]/%s", emc_module_index(td->module), name);
 +    qtest_add_data_func(full_name, td, fn);
 +}
 +#define add_test(name, td) emc_add_test(#name, td, test_##name)
 +
 +int main(int argc, char **argv)
 +{
 +    TestData test_data_list[ARRAY_SIZE(emc_module_list)];
 +
 +    g_test_init(&argc, &argv, NULL);
 +
 +    for (int i = 0; i < ARRAY_SIZE(emc_module_list); ++i) {
 +        TestData *td = &test_data_list[i];
 +
 +        td->module = &emc_module_list[i];
 +
 +        add_test(init, td);
 +        add_test(tx, td);
 +        add_test(rx, td);
 +    }
 +
 +    return g_test_run();
 +}
 diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
 index XXXXXXX..XXXXXXX 100644
 --- a/tests/qtest/meson.build
 +++ b/tests/qtest/meson.build
@@ -XXX,XX +XXX,XX @@ qtests_npcm7xx = \
     'npcm7xx_rng-test',
     'npcm7xx_smbus-test',
     'npcm7xx_timer-test',
 -   'npcm7xx_watchdog_timer-test']
 +   'npcm7xx_watchdog_timer-test'] + \
 +   (slirp.found() ? ['npcm7xx_emc-test'] : [])
  qtests_arm = \
    (config_all_devices.has_key('CONFIG_CMSDK_APB_DUALTIMER') ? ['cmsdk-apb-dualtimer-test'] : []) + \
    (config_all_devices.has_key('CONFIG_CMSDK_APB_TIMER') ? ['cmsdk-apb-timer-test'] : []) + \
 --
 .20.1

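As a reading aid for the VADC/VSBC comment in the translate-mve.c hunk: the lane-to-lane carry chain can be modelled in plain C. This is a minimal sketch under stated assumptions, not QEMU's `do_vadc()`. The function name `vadc_model` and its parameters are hypothetical, and the model ignores predication (`mve_element_mask()`) and beat-wise partial execution. Passing `inv = 0` models VADC/VADCI, and `inv = 0xffffffff` models VSBC/VSBCI, mirroring the `0` / `-1` arguments that the helpers in this patch pass to `do_vadc()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the VADC/VSBC carry chain over the four 32-bit
 * lanes of a 128-bit vector. The carry-out of lane e is the carry-in of
 * lane e + 1. inv = 0 gives add-with-carry (VADC); inv = 0xffffffff gives
 * subtract-with-carry (VSBC), computed as n + ~m + carry, where carry == 1
 * means "no borrow". Predication and beat-wise execution are not modelled.
 */
static bool vadc_model(uint32_t d[4], const uint32_t n[4],
                       const uint32_t m[4], uint32_t inv, bool carry_in)
{
    bool carry = carry_in;

    assert(inv == 0 || inv == UINT32_MAX);
    for (int e = 0; e < 4; e++) {
        /* 64-bit sum so that bit 32 holds the carry-out of this lane. */
        uint64_t r = (uint64_t)n[e] + (uint32_t)(m[e] ^ inv) + carry;
        d[e] = (uint32_t)r;
        carry = (r >> 32) != 0;
    }
    return carry; /* the value the real insn writes back to FPSCR.C */
}
```

With `inv = 0xffffffff` and `carry_in = true` (the VSBCI case) the four lanes compute a 128-bit `n - m`; a returned carry of 0 signals a borrow, following the Arm convention that C=1 means no borrow.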
-[PULL 07/49] hw/arm: Add npcm7xx emc model
+[PULL 52/57] target/arm: Implement MVE VCADD
-From: Doug Evans <dje@google.com>
+Implement the MVE VCADD insn, which performs a complex add with
 rotate.  Note that the size=0b11 encoding is VSBC.
-This is a 10/100 ethernet device that has several features.
+The architecture grants some leeway for the "destination and Vm
-Only the ones needed by the Linux driver have been implemented.
+source overlap" case for the size MO_32 case, but we choose not to
-See npcm7xx_emc.c for a list of unimplemented features.
+make use of it, instead always calculating all 16 bytes worth of
 results before setting the destination register.
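The rotation semantics can be sketched in C for the 32-bit element size. This is an illustrative model only: `vcadd32_model` is a hypothetical name, and predication is ignored. It assumes even lanes hold real parts and odd lanes imaginary parts, matching the `DO_VCADD` macro in this patch, which pairs `n[e]` with `m[e + 1]` for even `e` and with `m[e - 1]` for odd `e`. Rotation by 90 degrees adds `i*m` to `n`; rotation by 270 degrees adds `-i*m`. Like the patch, the sketch computes every result into a temporary before writing the destination, so `d` may alias `m`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical model of VCADD for 32-bit elements (size MO_32).
 * rot90:  d.re = n.re - m.im, d.im = n.im + m.re   (n + i*m)
 * rot270: d.re = n.re + m.im, d.im = n.im - m.re   (n - i*m)
 * Arithmetic wraps modulo 2^32. All results go into r[] first, so the
 * destination may overlap m without corrupting later lanes.
 */
static void vcadd32_model(uint32_t d[4], const uint32_t n[4],
                          const uint32_t m[4], bool rot270)
{
    uint32_t r[4];

    for (int e = 0; e < 4; e++) {
        if (!(e & 1)) {
            /* Real lane: combine with the imaginary part of m. */
            r[e] = rot270 ? n[e] + m[e + 1] : n[e] - m[e + 1];
        } else {
            /* Imaginary lane: combine with the real part of m. */
            r[e] = rot270 ? n[e] - m[e - 1] : n[e] + m[e - 1];
        }
    }
    memcpy(d, r, sizeof(r));
}
```

Calling `vcadd32_model(m, n, m, false)` gives the same result as writing to a separate buffer; writing each lane directly into `d` would instead let lane 1 read an already-overwritten `m[0]`.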
-Reviewed-by: Hao Wu <wuhaotsh@google.com>
-Reviewed-by: Avi Fishman <avi.fishman@nuvoton.com>
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Doug Evans <dje@google.com>
-Message-id: 20210218212453.831406-3-dje@google.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210617121628.20116-42-peter.maydell@linaro.org
 ---
- docs/system/arm/nuvoton.rst |  3 ++-
+ target/arm/helper-mve.h    |  8 ++++++++
- include/hw/arm/npcm7xx.h    |  2 ++
+ target/arm/mve.decode      |  9 +++++++--
- hw/arm/npcm7xx.c            | 50 +++++++++++++++++++++++++++++++++++--
+ target/arm/mve_helper.c    | 29 +++++++++++++++++++++++++++++
-files changed, 52 insertions(+), 3 deletions(-)
+ target/arm/translate-mve.c |  7 +++++++
 files changed, 51 insertions(+), 2 deletions(-)
-diff --git a/docs/system/arm/nuvoton.rst b/docs/system/arm/nuvoton.rst
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/docs/system/arm/nuvoton.rst
+--- a/target/arm/helper-mve.h
-+++ b/docs/system/arm/nuvoton.rst
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ Supported devices
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vadci, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-  * Analog to Digital Converter (ADC)
+ DEF_HELPER_FLAGS_4(mve_vsbc, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-  * Pulse Width Modulation (PWM)
+ DEF_HELPER_FLAGS_4(mve_vsbci, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-  * SMBus controller (SMBF)
-+ * Ethernet controller (EMC)
++DEF_HELPER_FLAGS_4(mve_vcadd90b, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
++DEF_HELPER_FLAGS_4(mve_vcadd90h, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
- Missing devices
++DEF_HELPER_FLAGS_4(mve_vcadd90w, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
- ---------------
++
-@@ -XXX,XX +XXX,XX @@ Missing devices
++DEF_HELPER_FLAGS_4(mve_vcadd270b, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-    * Shared memory (SHM)
++DEF_HELPER_FLAGS_4(mve_vcadd270h, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-    * eSPI slave interface
++DEF_HELPER_FLAGS_4(mve_vcadd270w, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
++
-- * Ethernet controllers (GMAC and EMC)
+ DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+ * Ethernet controller (GMAC)
+ DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-  * USB device (USBD)
+ DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-  * Peripheral SPI controller (PSPI)
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
   * SD/MMC host
 diff --git a/include/hw/arm/npcm7xx.h b/include/hw/arm/npcm7xx.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/arm/npcm7xx.h
+--- a/target/arm/mve.decode
-+++ b/include/hw/arm/npcm7xx.h
++++ b/target/arm/mve.decode
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ VRHADD_S         111 0 1111 0 . .. ... 0 ... 0 0001 . 1 . 0 ... 0 @2op
- #include "hw/misc/npcm7xx_gcr.h"
+ VRHADD_U         111 1 1111 0 . .. ... 0 ... 0 0001 . 1 . 0 ... 0 @2op
- #include "hw/misc/npcm7xx_pwm.h"
- #include "hw/misc/npcm7xx_rng.h"
+ VADC             1110 1110 0 . 11 ... 0 ... 0 1111 . 0 . 0 ... 0 @2op_nosz
-+#include "hw/net/npcm7xx_emc.h"
+-VSBC             1111 1110 0 . 11 ... 0 ... 0 1111 . 0 . 0 ... 0 @2op_nosz
- #include "hw/nvram/npcm7xx_otp.h"
+ VADCI            1110 1110 0 . 11 ... 0 ... 1 1111 . 0 . 0 ... 0 @2op_nosz
- #include "hw/timer/npcm7xx_timer.h"
+-VSBCI            1111 1110 0 . 11 ... 0 ... 1 1111 . 0 . 0 ... 0 @2op_nosz
- #include "hw/ssi/npcm7xx_fiu.h"
++
-@@ -XXX,XX +XXX,XX @@ typedef struct NPCM7xxState {
++{
-     EHCISysBusState     ehci;
++  VSBC           1111 1110 0 . 11 ... 0 ... 0 1111 . 0 . 0 ... 0 @2op_nosz
-     OHCISysBusState     ohci;
++  VSBCI          1111 1110 0 . 11 ... 0 ... 1 1111 . 0 . 0 ... 0 @2op_nosz
-     NPCM7xxFIUState     fiu[2];
++  VCADD90        1111 1110 0 . .. ... 0 ... 0 1111 . 0 . 0 ... 0 @2op
-+    NPCM7xxEMCState     emc[2];
++  VCADD270       1111 1110 0 . .. ... 0 ... 1 1111 . 0 . 0 ... 0 @2op
- } NPCM7xxState;
++}
- #define TYPE_NPCM7XX    "npcm7xx"
+ # Vector miscellaneous
-diff --git a/hw/arm/npcm7xx.c b/hw/arm/npcm7xx.c
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/npcm7xx.c
+--- a/target/arm/mve_helper.c
-+++ b/hw/arm/npcm7xx.c
++++ b/target/arm/mve_helper.c
-@@ -XXX,XX +XXX,XX @@ enum NPCM7xxInterrupt {
+@@ -XXX,XX +XXX,XX @@ void HELPER(mve_vsbci)(CPUARMState *env, void *vd, void *vn, void *vm)
-     NPCM7XX_UART1_IRQ,
+     do_vadc(env, vd, vn, vm, -1, 1, true);
      NPCM7XX_UART2_IRQ,
      NPCM7XX_UART3_IRQ,
 +    NPCM7XX_EMC1RX_IRQ          = 15,
 +    NPCM7XX_EMC1TX_IRQ,
      NPCM7XX_TIMER0_IRQ          = 32,   /* Timer Module 0 */
      NPCM7XX_TIMER1_IRQ,
      NPCM7XX_TIMER2_IRQ,
@@ -XXX,XX +XXX,XX @@ enum NPCM7xxInterrupt {
      NPCM7XX_SMBUS15_IRQ,
      NPCM7XX_PWM0_IRQ            = 93,   /* PWM module 0 */
      NPCM7XX_PWM1_IRQ,                   /* PWM module 1 */
 +    NPCM7XX_EMC2RX_IRQ          = 114,
 +    NPCM7XX_EMC2TX_IRQ,
      NPCM7XX_GPIO0_IRQ           = 116,
      NPCM7XX_GPIO1_IRQ,
      NPCM7XX_GPIO2_IRQ,
@@ -XXX,XX +XXX,XX @@ static const hwaddr npcm7xx_smbus_addr[] = {
      0xf008f000,
  };
 +/* Register base address for each EMC Module */
 +static const hwaddr npcm7xx_emc_addr[] = {
 +    0xf0825000,
 +    0xf0826000,
 +};
 +
  static const struct {
      hwaddr regs_addr;
      uint32_t unconnected_pins;
@@ -XXX,XX +XXX,XX @@ static void npcm7xx_init(Object *obj)
      for (i = 0; i < ARRAY_SIZE(s->pwm); i++) {
          object_initialize_child(obj, "pwm[*]", &s->pwm[i], TYPE_NPCM7XX_PWM);
      }
 +
 +    for (i = 0; i < ARRAY_SIZE(s->emc); i++) {
 +        object_initialize_child(obj, "emc[*]", &s->emc[i], TYPE_NPCM7XX_EMC);
 +    }
  }
- static void npcm7xx_realize(DeviceState *dev, Error **errp)
++#define DO_VCADD(OP, ESIZE, TYPE, FN0, FN1)                             \
-@@ -XXX,XX +XXX,XX @@ static void npcm7xx_realize(DeviceState *dev, Error **errp)
++    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn, void *vm) \
-         sysbus_connect_irq(sbd, i, npcm7xx_irq(s, NPCM7XX_PWM0_IRQ + i));
++    {                                                                   \
-     }
++        TYPE *d = vd, *n = vn, *m = vm;                                 \
++        uint16_t mask = mve_element_mask(env);                          \
-+    /*
++        unsigned e;                                                     \
-+     * EMC Modules. Cannot fail.
++        TYPE r[16 / ESIZE];                                             \
-+     * The mapping of the device to its netdev backend works as follows:
++        /* Calculate all results first to avoid overwriting inputs */   \
-+     * emc[i] = nd_table[i]
++        for (e = 0; e < 16 / ESIZE; e++) {                              \
-+     * This works around the inability to specify the netdev property for the
++            if (!(e & 1)) {                                             \
-+     * emc device: it's not pluggable and thus the -device option can't be
++                r[e] = FN0(n[H##ESIZE(e)], m[H##ESIZE(e + 1)]);         \
-+     * used.
++            } else {                                                    \
-+     */
++                r[e] = FN1(n[H##ESIZE(e)], m[H##ESIZE(e - 1)]);         \
-+    QEMU_BUILD_BUG_ON(ARRAY_SIZE(npcm7xx_emc_addr) != ARRAY_SIZE(s->emc));
++            }                                                           \
-+    QEMU_BUILD_BUG_ON(ARRAY_SIZE(s->emc) != 2);
++        }                                                               \
-+    for (i = 0; i < ARRAY_SIZE(s->emc); i++) {
++        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
-+        s->emc[i].emc_num = i;
++            mergemask(&d[H##ESIZE(e)], r[e], mask);                     \
-+        SysBusDevice *sbd = SYS_BUS_DEVICE(&s->emc[i]);
++        }                                                               \
-+        if (nd_table[i].used) {
++        mve_advance_vpt(env);                                           \
 +            qemu_check_nic_model(&nd_table[i], TYPE_NPCM7XX_EMC);
 +            qdev_set_nic_properties(DEVICE(sbd), &nd_table[i]);
 +        }
 +        /*
 +         * The device exists regardless of whether it's connected to a QEMU
 +         * netdev backend. So always instantiate it even if there is no
 +         * backend.
 +         */
 +        sysbus_realize(sbd, &error_abort);
 +        sysbus_mmio_map(sbd, 0, npcm7xx_emc_addr[i]);
 +        int tx_irq = i == 0 ? NPCM7XX_EMC1TX_IRQ : NPCM7XX_EMC2TX_IRQ;
 +        int rx_irq = i == 0 ? NPCM7XX_EMC1RX_IRQ : NPCM7XX_EMC2RX_IRQ;
 +        /*
 +         * N.B. The values for the second argument sysbus_connect_irq are
 +         * chosen to match the registration order in npcm7xx_emc_realize.
 +         */
 +        sysbus_connect_irq(sbd, 0, npcm7xx_irq(s, tx_irq));
 +        sysbus_connect_irq(sbd, 1, npcm7xx_irq(s, rx_irq));
 +    }
 +
-     /*
++#define DO_VCADD_ALL(OP, FN0, FN1)              \
-      * Flash Interface Unit (FIU). Can fail if incorrect number of chip selects
++    DO_VCADD(OP##b, 1, int8_t, FN0, FN1)        \
-      * specified, but this is a programming error.
++    DO_VCADD(OP##h, 2, int16_t, FN0, FN1)       \
-@@ -XXX,XX +XXX,XX @@ static void npcm7xx_realize(DeviceState *dev, Error **errp)
++    DO_VCADD(OP##w, 4, int32_t, FN0, FN1)
-     create_unimplemented_device("npcm7xx.vcd",          0xf0810000,  64 * KiB);
++
-     create_unimplemented_device("npcm7xx.ece",          0xf0820000,   8 * KiB);
++DO_VCADD_ALL(vcadd90, DO_SUB, DO_ADD)
-     create_unimplemented_device("npcm7xx.vdma",         0xf0822000,   8 * KiB);
++DO_VCADD_ALL(vcadd270, DO_ADD, DO_SUB)
--    create_unimplemented_device("npcm7xx.emc1",         0xf0825000,   4 * KiB);
++
--    create_unimplemented_device("npcm7xx.emc2",         0xf0826000,   4 * KiB);
+ static inline int32_t do_sat_bhw(int64_t val, int64_t min, int64_t max, bool *s)
-     create_unimplemented_device("npcm7xx.usbd[0]",      0xf0830000,   4 * KiB);
+ {
-     create_unimplemented_device("npcm7xx.usbd[1]",      0xf0831000,   4 * KiB);
+     if (val > max) {
-     create_unimplemented_device("npcm7xx.usbd[2]",      0xf0832000,   4 * KiB);
+diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_2OP(VQRDMLSDH, vqrdmlsdh)
  DO_2OP(VQRDMLSDHX, vqrdmlsdhx)
  DO_2OP(VRHADD_S, vrhadds)
  DO_2OP(VRHADD_U, vrhaddu)
 +/*
 + * VCADD Qd == Qm at size MO_32 is UNPREDICTABLE; we choose not to diagnose
 + * so we can reuse the DO_2OP macro. (Our implementation calculates the
 + * "expected" results in this case.)
 + */
 +DO_2OP(VCADD90, vcadd90)
 +DO_2OP(VCADD270, vcadd270)
  static bool trans_VQDMULLB(DisasContext *s, arg_2op *a)
  {
 --
 .20.1
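The DO_VCADD helpers above treat even lanes as real parts and odd lanes as imaginary parts. Each even lane of Qn is combined with the next lane of Qm, and each odd lane with the previous one. All results are computed before the predicated merge, so Qd may alias an input. A minimal standalone sketch of the halfword case follows. The QReg16 type and vcadd_h() are hypothetical stand-ins for QEMU's raw vector pointers and generated helpers, and mask is assumed to carry one predicate bit per byte, as mve_element_mask() does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one 128-bit Q register as eight 16-bit lanes. */
typedef struct {
    uint16_t h[8];
} QReg16;

/*
 * Sketch of VCADD.I16: rot90 != 0 selects VCADD90 (subtract on even
 * lanes, add on odd lanes); rot90 == 0 selects VCADD270 (the reverse).
 * 'mask' carries one predicate bit per byte, low byte first.
 */
static void vcadd_h(QReg16 *d, const QReg16 *n, const QReg16 *m,
                    uint16_t mask, int rot90)
{
    uint16_t r[8];
    unsigned e;

    assert(d && n && m);
    /* Calculate all results first so d may alias n or m. */
    for (e = 0; e < 8; e++) {
        if (!(e & 1)) {
            /* Even (real) lane: pair with the next (imaginary) lane of m. */
            r[e] = rot90 ? (uint16_t)(n->h[e] - m->h[e + 1])
                         : (uint16_t)(n->h[e] + m->h[e + 1]);
        } else {
            /* Odd (imaginary) lane: pair with the previous (real) lane of m. */
            r[e] = rot90 ? (uint16_t)(n->h[e] + m->h[e - 1])
                         : (uint16_t)(n->h[e] - m->h[e - 1]);
        }
    }
    /* Merge byte by byte: each 16-bit lane consumes two predicate bits. */
    for (e = 0; e < 8; e++, mask >>= 2) {
        uint16_t bmask = (uint16_t)(((mask & 1) ? 0x00ff : 0) |
                                    ((mask & 2) ? 0xff00 : 0));
        d->h[e] = (uint16_t)((d->h[e] & ~bmask) | (r[e] & bmask));
    }
}
```

The sketch folds the macro's FN0/FN1 pair into a rot90 flag. Arithmetic wraps modulo 2^16, the same truncation the macro gets when it stores into its int16_t result array.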

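In the npcm7xx EMC wiring above, each controller's RX interrupt gets an explicit enum value and its TX interrupt is the next enumerator, so EMC1 uses 15/16 and EMC2 uses 114/115. Output 0 of the device is connected to TX and output 1 to RX. A small sketch of that selection follows; emc_irqs() and its lookup tables are hypothetical replacements for the inline ternaries in npcm7xx_realize().

```c
#include <assert.h>

/* Interrupt numbers as laid out in the NPCM7xxInterrupt enum above. */
enum {
    EMC1RX_IRQ = 15,
    EMC1TX_IRQ,          /* 16 */
    EMC2RX_IRQ = 114,
    EMC2TX_IRQ,          /* 115 */
};

/*
 * Hypothetical helper: return the TX and RX interrupt numbers for EMC
 * module i, in the order used by the sysbus_connect_irq() calls.
 */
static void emc_irqs(int i, int *tx_irq, int *rx_irq)
{
    static const int tx[] = { EMC1TX_IRQ, EMC2TX_IRQ };
    static const int rx[] = { EMC1RX_IRQ, EMC2RX_IRQ };

    assert(i >= 0 && i < 2);
    *tx_irq = tx[i];
    *rx_irq = rx[i];
}
```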
-[PULL 02/49] sbsa-ref: add 'max' to list of allowed cpus
+[PULL 53/57] target/arm: Implement MVE VHCADD
-From: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
+Implement the MVE VHCADD insn, which is similar to VCADD
 but performs a halving step. This one overlaps with VADC.
-Let's add 'max' cpu while work goes on adding newer CPU types than
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Cortex-A72. This allows us to check SVE etc support.
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20210617121628.20116-43-peter.maydell@linaro.org
 ---
  target/arm/helper-mve.h    | 8 ++++++++
  target/arm/mve.decode      | 8 ++++++--
  target/arm/mve_helper.c    | 2 ++
  target/arm/translate-mve.c | 4 +++-
  4 files changed, 19 insertions(+), 3 deletions(-)
-Signed-off-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 Acked-by: Leif Lindholm <leif@nuviainc.com>
 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Message-id: 20210216150122.3830863-3-marcin.juszkiewicz@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  hw/arm/sbsa-ref.c | 1 +
  1 file changed, 1 insertion(+)
 diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/sbsa-ref.c
+--- a/target/arm/helper-mve.h
-+++ b/hw/arm/sbsa-ref.c
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ static const int sbsa_ref_irqmap[] = {
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vcadd270b, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
- static const char * const valid_cpus[] = {
+ DEF_HELPER_FLAGS_4(mve_vcadd270h, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-     ARM_CPU_TYPE_NAME("cortex-a57"),
+ DEF_HELPER_FLAGS_4(mve_vcadd270w, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-     ARM_CPU_TYPE_NAME("cortex-a72"),
-+    ARM_CPU_TYPE_NAME("max"),
++DEF_HELPER_FLAGS_4(mve_vhcadd90b, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
- };
++DEF_HELPER_FLAGS_4(mve_vhcadd90h, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
++DEF_HELPER_FLAGS_4(mve_vhcadd90w, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
- static bool cpu_type_valid(const char *cpu)
++
 +DEF_HELPER_FLAGS_4(mve_vhcadd270b, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +DEF_HELPER_FLAGS_4(mve_vhcadd270h, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +DEF_HELPER_FLAGS_4(mve_vhcadd270w, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +
  DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
  DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
  DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VQDMULLT         111 . 1110 0 . 11 ... 0 ... 1 1111 . 0 . 0 ... 1 @2op_sz28
  VRHADD_S         111 0 1111 0 . .. ... 0 ... 0 0001 . 1 . 0 ... 0 @2op
  VRHADD_U         111 1 1111 0 . .. ... 0 ... 0 0001 . 1 . 0 ... 0 @2op
 -VADC             1110 1110 0 . 11 ... 0 ... 0 1111 . 0 . 0 ... 0 @2op_nosz
 -VADCI            1110 1110 0 . 11 ... 0 ... 1 1111 . 0 . 0 ... 0 @2op_nosz
 +{
 +  VADC           1110 1110 0 . 11 ... 0 ... 0 1111 . 0 . 0 ... 0 @2op_nosz
 +  VADCI          1110 1110 0 . 11 ... 0 ... 1 1111 . 0 . 0 ... 0 @2op_nosz
 +  VHCADD90       1110 1110 0 . .. ... 0 ... 0 1111 . 0 . 0 ... 0 @2op
 +  VHCADD270      1110 1110 0 . .. ... 0 ... 1 1111 . 0 . 0 ... 0 @2op
 +}
  {
    VSBC           1111 1110 0 . 11 ... 0 ... 0 1111 . 0 . 0 ... 0 @2op_nosz
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(mve_vsbci)(CPUARMState *env, void *vd, void *vn, void *vm)
  DO_VCADD_ALL(vcadd90, DO_SUB, DO_ADD)
  DO_VCADD_ALL(vcadd270, DO_ADD, DO_SUB)
 +DO_VCADD_ALL(vhcadd90, do_vhsub_s, do_vhadd_s)
 +DO_VCADD_ALL(vhcadd270, do_vhadd_s, do_vhsub_s)
  static inline int32_t do_sat_bhw(int64_t val, int64_t min, int64_t max, bool *s)
  {
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_2OP(VRHADD_U, vrhaddu)
  /*
   * VCADD Qd == Qm at size MO_32 is UNPREDICTABLE; we choose not to diagnose
   * so we can reuse the DO_2OP macro. (Our implementation calculates the
 - * "expected" results in this case.)
 + * "expected" results in this case.) Similarly for VHCADD.
   */
  DO_2OP(VCADD90, vcadd90)
  DO_2OP(VCADD270, vcadd270)
 +DO_2OP(VHCADD90, vhcadd90)
 +DO_2OP(VHCADD270, vhcadd270)
  static bool trans_VQDMULLB(DisasContext *s, arg_2op *a)
  {
 --
 .20.1

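VHCADD and VADC share an encoding apart from the size field. VADC/VADCI fix size to 0b11, which is not a valid VHCADD element size, and the decodetree group braces try the patterns in order. As an illustrative sketch only, a hand-written equivalent of that group might look like this. classify_vadc_group() and its enum are hypothetical; the mask and value are derived from the mve.decode bit patterns above, with bit 12 left out because it selects the variant.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    INSN_NONE,
    INSN_VADC,
    INSN_VADCI,
    INSN_VHCADD90,
    INSN_VHCADD270,
} VadcGroupInsn;

/*
 * Fixed bits of "1110 1110 0 . .. ... 0 ... x 1111 . 0 . 0 ... 0":
 * bits [31:23], 16, [11:8], 6, 4 and 0.
 */
#define VADC_GROUP_MASK  0xff810f51u
#define VADC_GROUP_VALUE 0xee000f00u

static VadcGroupInsn classify_vadc_group(uint32_t insn)
{
    unsigned size = (insn >> 20) & 3;
    unsigned b12 = (insn >> 12) & 1;

    if ((insn & VADC_GROUP_MASK) != VADC_GROUP_VALUE) {
        return INSN_NONE;                   /* not in this group */
    }
    if (size == 3) {
        /* Tried first in the group: size 0b11 means VADC/VADCI. */
        return b12 ? INSN_VADCI : INSN_VADC;
    }
    /* Any other size is a valid VHCADD element size. */
    return b12 ? INSN_VHCADD270 : INSN_VHCADD90;
}
```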
-[PULL 36/49] hw/arm/mps2-tz: Correct wrong interrupt numbers for DMA and SPI
+[PULL 54/57] target/arm: Implement MVE VADDV
-On the MPS2 boards, the first 32 interrupt lines are entirely
+Implement the MVE VADDV insn, which performs an addition
-internal to the SSE; interrupt lines for devices outside the SSE
+across vector lanes.
 start at 32.  In the application notes that document each FPGA image,
 the interrupt wiring is documented from the point of view of the CPU,
 so '0' is the first of the SSE's interrupts and the devices in the
 FPGA image itself are '32' and up: so the UART 0 Receive interrupt is
 32, the SPI #0 interrupt is 51, and so on.
 Within our implementation, because the external interrupts must be
 connected to the EXP_IRQ[0...n] lines of the SSE object, we made the
 get_sse_irq_in() function take an irqno whose values start at 0 for
 the first FPGA device interrupt.  In this numbering scheme the UART 0
 Receive interrupt is 0, the SPI #0 interrupt is 19, and so on.
 The result of these two different numbering schemes has been that
 half of the devices were wired up to the wrong IRQs: the UART IRQs
 are wired up correctly, but the DMA and SPI devices were passing
 start-at-32 values to get_sse_irq_in() and so being mis-connected.
 Fix the bug by making get_sse_irq_in() take values specified with the
 same scheme that the hardware manuals use, to avoid confusion.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210215115138.20465-12-peter.maydell@linaro.org
+Message-id: 20210617121628.20116-44-peter.maydell@linaro.org
 ---
- hw/arm/mps2-tz.c | 24 +++++++++++++++++-------
+ target/arm/helper-mve.h    |  7 +++++++
- 1 file changed, 17 insertions(+), 7 deletions(-)
+ target/arm/mve.decode      |  2 ++
  target/arm/mve_helper.c    | 24 +++++++++++++++++++++
  target/arm/translate-mve.c | 43 ++++++++++++++++++++++++++++++++++++++
  4 files changed, 76 insertions(+)
-diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/mps2-tz.c
+--- a/target/arm/helper-mve.h
-+++ b/hw/arm/mps2-tz.c
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ static void make_ram_alias(MemoryRegion *mr, const char *name,
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vrmlaldavhuw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
- static qemu_irq get_sse_irq_in(MPS2TZMachineState *mms, int irqno)
+ DEF_HELPER_FLAGS_4(mve_vrmlsldavhsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
- {
+ DEF_HELPER_FLAGS_4(mve_vrmlsldavhxsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
--    /* Return a qemu_irq which will signal IRQ n to all CPUs in the SSE. */
++
-+    /*
++DEF_HELPER_FLAGS_3(mve_vaddvsb, TCG_CALL_NO_WG, i32, env, ptr, i32)
-+     * Return a qemu_irq which will signal IRQ n to all CPUs in the
++DEF_HELPER_FLAGS_3(mve_vaddvub, TCG_CALL_NO_WG, i32, env, ptr, i32)
-+     * SSE.  The irqno should be as the CPU sees it, so the first
++DEF_HELPER_FLAGS_3(mve_vaddvsh, TCG_CALL_NO_WG, i32, env, ptr, i32)
-+     * external-to-the-SSE interrupt is 32.
++DEF_HELPER_FLAGS_3(mve_vaddvuh, TCG_CALL_NO_WG, i32, env, ptr, i32)
-+     */
++DEF_HELPER_FLAGS_3(mve_vaddvsw, TCG_CALL_NO_WG, i32, env, ptr, i32)
-     MachineClass *mc = MACHINE_GET_CLASS(mms);
++DEF_HELPER_FLAGS_3(mve_vaddvuw, TCG_CALL_NO_WG, i32, env, ptr, i32)
-     MPS2TZMachineClass *mmc = MPS2TZ_MACHINE_GET_CLASS(mms);
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
+index XXXXXXX..XXXXXXX 100644
--    assert(irqno < mmc->numirq);
+--- a/target/arm/mve.decode
-+    assert(irqno >= 32 && irqno < (mmc->numirq + 32));
++++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VBRSR            1111 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
  VQDMULH_scalar   1110 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
  VQRDMULH_scalar  1111 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
 +# Vector add across vector
 +VADDV            111 u:1 1110 1111 size:2 01 ... 0 1111 0 0 a:1 0 qm:3 0 rda=%rdalo
  # Predicate operations
  %mask_22_13      22:1 13:3
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_LDAVH(vrmlaldavhuw, 4, uint32_t, false, int128_add, int128_add, int128_make64
  DO_LDAVH(vrmlsldavhsw, 4, int32_t, false, int128_add, int128_sub, int128_makes64)
  DO_LDAVH(vrmlsldavhxsw, 4, int32_t, true, int128_add, int128_sub, int128_makes64)
 +
 +/* Vector add across vector */
 +#define DO_VADDV(OP, ESIZE, TYPE)                               \
 +    uint32_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vm, \
 +                                    uint32_t ra)                \
 +    {                                                           \
 +        uint16_t mask = mve_element_mask(env);                  \
 +        unsigned e;                                             \
 +        TYPE *m = vm;                                           \
 +        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {      \
 +            if (mask & 1) {                                     \
 +                ra += m[H##ESIZE(e)];                           \
 +            }                                                   \
 +        }                                                       \
 +        mve_advance_vpt(env);                                   \
 +        return ra;                                              \
 +    }                                                           \
 +
 +DO_VADDV(vaddvsb, 1, int8_t)
 +DO_VADDV(vaddvsh, 2, int16_t)
 +DO_VADDV(vaddvsw, 4, int32_t)
 +DO_VADDV(vaddvub, 1, uint8_t)
 +DO_VADDV(vaddvuh, 2, uint16_t)
 +DO_VADDV(vaddvuw, 4, uint32_t)
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ typedef void MVEGenOneOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
  typedef void MVEGenTwoOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr);
  typedef void MVEGenTwoOpScalarFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
  typedef void MVEGenDualAccOpFn(TCGv_i64, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i64);
 +typedef void MVEGenVADDVFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32);
  /* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
  static inline long mve_qreg_offset(unsigned reg)
@@ -XXX,XX +XXX,XX @@ static bool trans_VPST(DisasContext *s, arg_VPST *a)
      mve_update_and_store_eci(s);
      return true;
  }
 +
 +static bool trans_VADDV(DisasContext *s, arg_VADDV *a)
 +{
 +    /* VADDV: vector add across vector */
 +    static MVEGenVADDVFn * const fns[4][2] = {
 +        { gen_helper_mve_vaddvsb, gen_helper_mve_vaddvub },
 +        { gen_helper_mve_vaddvsh, gen_helper_mve_vaddvuh },
 +        { gen_helper_mve_vaddvsw, gen_helper_mve_vaddvuw },
 +        { NULL, NULL }
 +    };
 +    TCGv_ptr qm;
 +    TCGv_i32 rda;
 +
 +    if (!dc_isar_feature(aa32_mve, s) ||
 +        a->size == 3) {
 +        return false;
 +    }
 +    if (!mve_eci_check(s) || !vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    /*
-+     * Convert from "CPU irq number" (as listed in the FPGA image
++     * This insn is subject to beat-wise execution. Partial execution
-+     * documentation) to the SSE external-interrupt number.
++     * of an A=0 (no-accumulate) insn which does not execute the first
 +     * beat must start with the current value of Rda, not zero.
 +     */
-+    irqno -= 32;
++    if (a->a || mve_skip_first_beat(s)) {
++        /* Accumulate input from Rda */
-     if (mc->max_cpus > 1) {
++        rda = load_reg(s, a->rda);
-         return qdev_get_gpio_in(DEVICE(&mms->cpu_irq_splitter[irqno]), 0);
++    } else {
-@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_uart(MPS2TZMachineState *mms, void *opaque,
++        /* Accumulate starting at zero */
-     MPS2TZMachineClass *mmc = MPS2TZ_MACHINE_GET_CLASS(mms);
++        rda = tcg_const_i32(0);
-     CMSDKAPBUART *uart = opaque;
++    }
-     int i = uart - &mms->uart[0];
++
--    int rxirqno = i * 2;
++    qm = mve_qreg_ptr(a->qm);
--    int txirqno = i * 2 + 1;
++    fns[a->size][a->u](rda, cpu_env, qm, rda);
--    int combirqno = i + 10;
++    store_reg(s, a->rda, rda);
-+    int rxirqno = i * 2 + 32;
++    tcg_temp_free_ptr(qm);
-+    int txirqno = i * 2 + 33;
++
-+    int combirqno = i + 42;
++    mve_update_eci(s);
-     SysBusDevice *s;
++    return true;
-     DeviceState *orgate_dev = DEVICE(&mms->uart_irq_orgate);
++}
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_eth_dev(MPS2TZMachineState *mms, void *opaque,
      s = SYS_BUS_DEVICE(mms->lan9118);
      sysbus_realize_and_unref(s, &error_fatal);
 -    sysbus_connect_irq(s, 0, get_sse_irq_in(mms, 16));
 +    sysbus_connect_irq(s, 0, get_sse_irq_in(mms, 48));
      return sysbus_mmio_get_region(s, 0);
  }
@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
                              &error_fatal);
      qdev_realize(DEVICE(&mms->uart_irq_orgate), NULL, &error_fatal);
      qdev_connect_gpio_out(DEVICE(&mms->uart_irq_orgate), 0,
 -                          get_sse_irq_in(mms, 15));
 +                          get_sse_irq_in(mms, 47));
      /* Most of the devices in the FPGA are behind Peripheral Protection
       * Controllers. The required order for initializing things is:
 --
 .20.1
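The mps2-tz fix makes get_sse_irq_in() accept interrupt numbers as the CPU sees them, with devices outside the SSE starting at 32, and subtract 32 internally. The arithmetic can be checked in isolation with a sketch like the following. The function names and the numirq value of 96 used in the checks are hypothetical; the UART, SPI #0 and Ethernet numbers come from the patch text.

```c
#include <assert.h>

#define SSE_FIRST_EXT_IRQ 32   /* IRQs 0..31 are internal to the SSE */

/* Convert a CPU IRQ number to an index into the SSE's EXP_IRQ[] inputs. */
static int cpu_irq_to_exp_irq(int irqno, int numirq)
{
    assert(irqno >= SSE_FIRST_EXT_IRQ && irqno < numirq + SSE_FIRST_EXT_IRQ);
    return irqno - SSE_FIRST_EXT_IRQ;
}

/* UART i interrupt numbers in CPU numbering, as in make_uart() above. */
static int uart_rx_irq(int i)   { return i * 2 + 32; }
static int uart_tx_irq(int i)   { return i * 2 + 33; }
static int uart_comb_irq(int i) { return i + 42; }
```

For example, SPI #0 at CPU IRQ 51 lands on EXP_IRQ[19], and the UART OR gate at 47 lands on EXP_IRQ[15], matching the old start-at-0 values.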

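VADDV sums the active lanes of Qm into a 32-bit general-purpose register. It starts from Rda when A=1 (or when ECI says the first beat has already executed) and from zero otherwise. Because the accumulator is 32 bits wide, the element type passed to DO_VADDV decides whether narrow lanes are sign- or zero-extended before the add. A standalone sketch for byte lanes follows; vaddv_b() is a hypothetical name, mask has one predicate bit per byte, and the caller is assumed to pass either the current Rda or 0 as ra, as trans_VADDV() does.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of VADDV on byte lanes: add each active lane of m into the
 * 32-bit accumulator ra and return the new value. Signed lanes are
 * sign-extended and unsigned lanes zero-extended before the add,
 * which wraps modulo 2^32.
 */
static uint32_t vaddv_b(const uint8_t m[16], uint16_t mask, uint32_t ra,
                        int is_unsigned)
{
    for (unsigned e = 0; e < 16; e++, mask >>= 1) {
        if (mask & 1) {
            uint32_t v = is_unsigned ? (uint32_t)m[e]
                                     : (uint32_t)(int32_t)(int8_t)m[e];
            ra += v;
        }
    }
    return ra;
}
```

With every byte set to 0xff and all lanes active, the unsigned form returns 4080 while the signed form returns -16.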
-[PULL 16/49] hw/arm/musicpal: Remove dead code for non-32-bit-RGB surfaces
+[PULL 55/57] target/arm: Make VMOV scalar <-> gpreg beatwise for MVE
-For a long time now the UI layer has guaranteed that the console
+In a CPU with MVE, the VMOV (vector lane to general-purpose register)
-surface is always 32 bits per pixel RGB. Remove the legacy dead
+and VMOV (general-purpose register to vector lane) insns are not
-code from the musicpal display device which was handling the
+predicated, but they are subject to beatwise execution if they
-possibility that the console surface was some other format.
+are not in an IT block.
 Since our implementation always executes all 4 beats in one tick,
 this means only that we need to handle PSR.ECI:
  * we must do the usual check for bad ECI state
  * we must advance ECI state if the insn succeeds
  * if ECI says we should not be executing the beat corresponding
    to the lane of the vector register being accessed then we
    should skip performing the move
 Note that if PSR.ECI is non-zero then we cannot be in an IT block.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210215103215.4944-2-peter.maydell@linaro.org
+Message-id: 20210617121628.20116-45-peter.maydell@linaro.org
 ---
- hw/arm/musicpal.c | 64 ++++++++++++++++++-----------------------------
+ target/arm/translate-a32.h |  2 +
- 1 file changed, 24 insertions(+), 40 deletions(-)
+ target/arm/translate-mve.c |  4 +-
  target/arm/translate-vfp.c | 77 +++++++++++++++++++++++++++++++++++---
  3 files changed, 75 insertions(+), 8 deletions(-)
-diff --git a/hw/arm/musicpal.c b/hw/arm/musicpal.c
+diff --git a/target/arm/translate-a32.h b/target/arm/translate-a32.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/musicpal.c
+--- a/target/arm/translate-a32.h
-+++ b/hw/arm/musicpal.c
++++ b/target/arm/translate-a32.h
-@@ -XXX,XX +XXX,XX @@ static uint8_t scale_lcd_color(musicpal_lcd_state *s, uint8_t col)
+@@ -XXX,XX +XXX,XX @@ long neon_full_reg_offset(unsigned reg);
  long neon_element_offset(int reg, int element, MemOp memop);
  void gen_rev16(TCGv_i32 dest, TCGv_i32 var);
  void clear_eci_state(DisasContext *s);
 +bool mve_eci_check(DisasContext *s);
 +void mve_update_and_store_eci(DisasContext *s);
  static inline TCGv_i32 load_cpu_offset(int offset)
  {
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ static bool mve_check_qreg_bank(DisasContext *s, int qmask)
      return qmask < 8;
  }
 -static bool mve_eci_check(DisasContext *s)
 +bool mve_eci_check(DisasContext *s)
  {
      /*
       * This is a beatwise insn: check that ECI is valid (not a
@@ -XXX,XX +XXX,XX @@ static void mve_update_eci(DisasContext *s)
      }
  }
--#define SET_LCD_PIXEL(depth, type) \
+-static void mve_update_and_store_eci(DisasContext *s)
--static inline void glue(set_lcd_pixel, depth) \
++void mve_update_and_store_eci(DisasContext *s)
--        (musicpal_lcd_state *s, int x, int y, type col) \
+ {
--{ \
+     /*
--    int dx, dy; \
+      * For insns which don't call a helper function that will call
--    DisplaySurface *surface = qemu_console_surface(s->con); \
+diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
--    type *pixel = &((type *) surface_data(surface))[(y * 128 * 3 + x) * 3]; \
+index XXXXXXX..XXXXXXX 100644
--\
+--- a/target/arm/translate-vfp.c
--    for (dy = 0; dy < 3; dy++, pixel += 127 * 3) \
++++ b/target/arm/translate-vfp.c
--        for (dx = 0; dx < 3; dx++, pixel++) \
+@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
--            *pixel = col; \
+     return true;
-+static inline void set_lcd_pixel32(musicpal_lcd_state *s,
+ }
-+                                   int x, int y, uint32_t col)
 +static bool mve_skip_vmov(DisasContext *s, int vn, int index, int size)
 +{
-+    int dx, dy;
++    /*
-+    DisplaySurface *surface = qemu_console_surface(s->con);
++     * In a CPU with MVE, the VMOV (vector lane to general-purpose register)
-+    uint32_t *pixel =
++     * and VMOV (general-purpose register to vector lane) insns are not
-+        &((uint32_t *) surface_data(surface))[(y * 128 * 3 + x) * 3];
++     * predicated, but they are subject to beatwise execution if they are
 +     * not in an IT block.
 +     *
 +     * Since our implementation always executes all 4 beats in one tick,
 +     * this means only that if PSR.ECI says we should not be executing
 +     * the beat corresponding to the lane of the vector register being
 +     * accessed then we should skip performing the move, and that we need
 +     * to do the usual check for bad ECI state and advance of ECI state.
 +     *
 +     * Note that if PSR.ECI is non-zero then we cannot be in an IT block.
 +     *
 +     * Return true if this VMOV scalar <-> gpreg should be skipped because
 +     * the MVE PSR.ECI state says we skip the beat where the store happens.
 +     */
 +
-+    for (dy = 0; dy < 3; dy++, pixel += 127 * 3) {
++    /* Calculate the byte offset into Qn which we're going to access */
-+        for (dx = 0; dx < 3; dx++, pixel++) {
++    int ofs = (index << size) + ((vn & 1) * 8);
-+            *pixel = col;
++
 +    if (!dc_isar_feature(aa32_mve, s)) {
 +        return false;
 +    }
 +
 +    switch (s->eci) {
 +    case ECI_NONE:
 +        return false;
 +    case ECI_A0:
 +        return ofs < 4;
 +    case ECI_A0A1:
 +        return ofs < 8;
 +    case ECI_A0A1A2:
 +    case ECI_A0A1A2B0:
 +        return ofs < 12;
 +    default:
 +        g_assert_not_reached();
 +    }
 +}
 +
  static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
  {
      /* VMOV scalar to general purpose register */
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
          return false;
      }
 +    if (dc_isar_feature(aa32_mve, s)) {
 +        if (!mve_eci_check(s)) {
 +            return true;
 +        }
 +    }
++
+     if (!vfp_access_check(s)) {
+         return true;
+     }
+-    tmp = tcg_temp_new_i32();
+-    read_neon_element32(tmp, a->vn, a->index, a->size | (a->u ? 0 : MO_SIGN));
+-    store_reg(s, a->rt, tmp);
++    if (!mve_skip_vmov(s, a->vn, a->index, a->size)) {
++        tmp = tcg_temp_new_i32();
++        read_neon_element32(tmp, a->vn, a->index,
++                            a->size | (a->u ? 0 : MO_SIGN));
++        store_reg(s, a->rt, tmp);
++    }
++    if (dc_isar_feature(aa32_mve, s)) {
++        mve_update_and_store_eci(s);
++    }
+     return true;
  }
--SET_LCD_PIXEL(8, uint8_t)
--SET_LCD_PIXEL(16, uint16_t)
+@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
--SET_LCD_PIXEL(32, uint32_t)
+         return false;
+     }
- static void lcd_refresh(void *opaque)
- {
++    if (dc_isar_feature(aa32_mve, s)) {
-     musicpal_lcd_state *s = opaque;
++        if (!mve_eci_check(s)) {
--    DisplaySurface *surface = qemu_console_surface(s->con);
++            return true;
      int x, y, col;
 -    switch (surface_bits_per_pixel(surface)) {
 -    case 0:
 -        return;
 -#define LCD_REFRESH(depth, func) \
 -    case depth: \
 -        col = func(scale_lcd_color(s, (MP_LCD_TEXTCOLOR >> 16) & 0xff), \
 -                   scale_lcd_color(s, (MP_LCD_TEXTCOLOR >> 8) & 0xff), \
 -                   scale_lcd_color(s, MP_LCD_TEXTCOLOR & 0xff)); \
 -        for (x = 0; x < 128; x++) { \
 -            for (y = 0; y < 64; y++) { \
 -                if (s->video_ram[x + (y/8)*128] & (1 << (y % 8))) { \
 -                    glue(set_lcd_pixel, depth)(s, x, y, col); \
 -                } else { \
 -                    glue(set_lcd_pixel, depth)(s, x, y, 0); \
 -                } \
 -            } \
 -        } \
 -        break;
 -    LCD_REFRESH(8, rgb_to_pixel8)
 -    LCD_REFRESH(16, rgb_to_pixel16)
 -    LCD_REFRESH(32, (is_surface_bgr(surface) ?
 -                     rgb_to_pixel32bgr : rgb_to_pixel32))
 -    default:
 -        hw_error("unsupported colour depth %i\n",
 -                 surface_bits_per_pixel(surface));
 +    col = rgb_to_pixel32(scale_lcd_color(s, (MP_LCD_TEXTCOLOR >> 16) & 0xff),
 +                         scale_lcd_color(s, (MP_LCD_TEXTCOLOR >> 8) & 0xff),
 +                         scale_lcd_color(s, MP_LCD_TEXTCOLOR & 0xff));
 +    for (x = 0; x < 128; x++) {
 +        for (y = 0; y < 64; y++) {
 +            if (s->video_ram[x + (y / 8) * 128] & (1 << (y % 8))) {
 +                set_lcd_pixel32(s, x, y, col);
 +            } else {
 +                set_lcd_pixel32(s, x, y, 0);
 +            }
 +        }
++    }
++
+     if (!vfp_access_check(s)) {
+         return true;
      }
-     dpy_gfx_update(s->con, 0, 0, 128*3, 64*3);
+-    tmp = load_reg(s, a->rt);
 -    write_neon_element32(tmp, a->vn, a->index, a->size);
 -    tcg_temp_free_i32(tmp);
 +    if (!mve_skip_vmov(s, a->vn, a->index, a->size)) {
 +        tmp = load_reg(s, a->rt);
 +        write_neon_element32(tmp, a->vn, a->index, a->size);
 +        tcg_temp_free_i32(tmp);
 +    }
 +    if (dc_isar_feature(aa32_mve, s)) {
 +        mve_update_and_store_eci(s);
 +    }
      return true;
  }
 --
 .20.1
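mve_skip_vmov() maps the accessed lane to a byte offset in the Q register and compares it against the beats that PSR.ECI says have already run; beat k covers bytes 4k to 4k+3. A standalone sketch of just that decision follows. skip_vmov() and the EciState enum are local stand-ins (the enumerator names follow the patch, the numeric values are arbitrary), and the aa32_mve feature check is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Local mirror of the ECI states named in the patch; values are arbitrary. */
typedef enum {
    ECI_NONE,       /* no beats of this insn executed yet */
    ECI_A0,         /* beat 0 already executed */
    ECI_A0A1,       /* beats 0-1 already executed */
    ECI_A0A1A2,     /* beats 0-2 already executed */
    ECI_A0A1A2B0,   /* beats 0-2, plus beat 0 of the next insn */
} EciState;

/*
 * vn is a D register number, so an odd vn selects the upper 8 bytes of
 * the containing Q register. Skip the move if its beat already ran.
 */
static bool skip_vmov(EciState eci, int vn, int index, int size)
{
    int ofs;

    assert(size >= 0 && size <= 2);   /* byte, halfword or word lane */
    ofs = (index << size) + ((vn & 1) * 8);

    switch (eci) {
    case ECI_NONE:
        return false;
    case ECI_A0:
        return ofs < 4;
    case ECI_A0A1:
        return ofs < 8;
    case ECI_A0A1A2:
    case ECI_A0A1A2B0:
        return ofs < 12;
    }
    return false;
}
```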

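After the dead-code removal in the musicpal patch, lcd_refresh() only draws into a 32-bit surface. Each pixel of the 128x64 monochrome LCD becomes a 3x3 block in a 384x192 surface, and video_ram packs eight vertically adjacent pixels into each byte. A self-contained sketch of that scaling follows; render_lcd() is hypothetical and writes into a plain uint32_t array in place of the DisplaySurface.

```c
#include <assert.h>
#include <stdint.h>

#define LCD_W 128
#define LCD_H 64
#define SCALE 3

/*
 * Hypothetical stand-in for lcd_refresh(): bit (y % 8) of byte
 * (x + (y / 8) * 128) gives pixel (x, y). Each pixel becomes a
 * SCALE x SCALE block of 'col' (lit) or 0 (unlit) in a 384 x 192 buffer.
 */
static void render_lcd(const uint8_t video_ram[LCD_W * LCD_H / 8],
                       uint32_t fb[LCD_W * SCALE * LCD_H * SCALE],
                       uint32_t col)
{
    for (int x = 0; x < LCD_W; x++) {
        for (int y = 0; y < LCD_H; y++) {
            int lit = video_ram[x + (y / 8) * LCD_W] & (1 << (y % 8));
            /* Top-left corner of the block, as in set_lcd_pixel32(). */
            uint32_t *pixel = &fb[(y * LCD_W * SCALE + x) * SCALE];

            for (int dy = 0; dy < SCALE; dy++, pixel += LCD_W * SCALE) {
                for (int dx = 0; dx < SCALE; dx++) {
                    pixel[dx] = lit ? col : 0;
                }
            }
        }
    }
}
```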
-[PULL 13/49] target/arm: Use TCF0 and TFSRE0 for unprivileged tag checks
+[PULL 56/57] target/arm: Implement MTE3
 From: Peter Collingbourne <pcc@google.com>
-Section D6.7 of the ARM ARM states:
+MTE3 introduces an asymmetric tag checking mode, in which loads are
+checked synchronously and stores are checked asynchronously. Add
-For the purpose of determining Tag Check Fault handling, unprivileged
+support for it.
 load and store instructions are treated as if executed at EL0 when
 executed at either:
 - EL1, when the Effective value of PSTATE.UAO is 0.
 - EL2, when both the Effective value of HCR_EL2.{E2H, TGE} is {1, 1}
   and the Effective value of PSTATE.UAO is 0.
 ARM has confirmed a defect in the pseudocode function
 AArch64.TagCheckFault that makes it inconsistent with the above
 wording. The remedy is to adjust references to PSTATE.EL in that
 function to instead refer to AArch64.AccessUsesEL(acctype), so
 that unprivileged instructions use SCTLR_EL1.TCF0 and TFSRE0_EL1.
 The exception type for synchronous tag check faults remains unchanged.
 This patch implements the described change by partially reverting
 commits 50244cc76abc and cc97b0019bb5.
 Signed-off-by: Peter Collingbourne <pcc@google.com>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210219201820.2672077-1-pcc@google.com
+Message-id: 20210616195614.11785-1-pcc@google.com
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+[PMM: Add line to emulation.rst]
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/helper.c     |  2 +-
+ docs/system/arm/emulation.rst |  1 +
- target/arm/mte_helper.c | 13 +++++++++----
+ target/arm/cpu64.c            |  2 +-
- 2 files changed, 10 insertions(+), 5 deletions(-)
+ target/arm/mte_helper.c       | 82 ++++++++++++++++++++++-------------
  3 files changed, 53 insertions(+), 32 deletions(-)
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/docs/system/arm/emulation.rst
-+++ b/target/arm/helper.c
++++ b/docs/system/arm/emulation.rst
-@@ -XXX,XX +XXX,XX @@ static uint32_t rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
+@@ -XXX,XX +XXX,XX @@ the following architecture extensions:
-         if (FIELD_EX32(flags, TBFLAG_A64, UNPRIV)
+ - FEAT_LSE (Large System Extensions)
-             && tbid
+ - FEAT_MTE (Memory Tagging Extension)
-             && !(env->pstate & PSTATE_TCO)
+ - FEAT_MTE2 (Memory Tagging Extension)
--            && (sctlr & SCTLR_TCF)
++- FEAT_MTE3 (MTE Asymmetric Fault Handling)
-+            && (sctlr & SCTLR_TCF0)
+ - FEAT_PAN (Privileged access never)
-             && allocation_tag_access_enabled(env, 0, sctlr)) {
+ - FEAT_PAN2 (AT S1E1R and AT S1E1W instruction variants affected by PSTATE.PAN)
-             flags = FIELD_DP32(flags, TBFLAG_A64, MTE0_ACTIVE, 1);
+ - FEAT_PAuth (Pointer authentication)
-         }
+diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu64.c
 +++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
           * during realize if the board provides no tag memory, much like
           * we do for EL2 with the virtualization=on property.
           */
 -        t = FIELD_DP64(t, ID_AA64PFR1, MTE, 2);
 +        t = FIELD_DP64(t, ID_AA64PFR1, MTE, 3);
          cpu->isar.id_aa64pfr1 = t;
          t = cpu->isar.id_aa64mmfr0;
 diff --git a/target/arm/mte_helper.c b/target/arm/mte_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mte_helper.c
 +++ b/target/arm/mte_helper.c
+@@ -XXX,XX +XXX,XX @@ void HELPER(stzgm_tags)(CPUARMState *env, uint64_t ptr, uint64_t val)
+     }
+ }
++static void mte_sync_check_fail(CPUARMState *env, uint32_t desc,
++                                uint64_t dirty_ptr, uintptr_t ra)
++{
++    int is_write, syn;
++
++    env->exception.vaddress = dirty_ptr;
++
++    is_write = FIELD_EX32(desc, MTEDESC, WRITE);
++    syn = syn_data_abort_no_iss(arm_current_el(env) != 0, 0, 0, 0, 0, is_write,
++                                0x11);
++    raise_exception_ra(env, EXCP_DATA_ABORT, syn, exception_target_el(env), ra);
++    g_assert_not_reached();
++}
++
++static void mte_async_check_fail(CPUARMState *env, uint64_t dirty_ptr,
++                                 uintptr_t ra, ARMMMUIdx arm_mmu_idx, int el)
++{
++    int select;
++
++    if (regime_has_2_ranges(arm_mmu_idx)) {
++        select = extract64(dirty_ptr, 55, 1);
++    } else {
++        select = 0;
++    }
++    env->cp15.tfsr_el[el] |= 1 << select;
++#ifdef CONFIG_USER_ONLY
++    /*
++     * Stand in for a timer irq, setting _TIF_MTE_ASYNC_FAULT,
++     * which then sends a SIGSEGV when the thread is next scheduled.
++     * This cpu will return to the main loop at the end of the TB,
++     * which is rather sooner than "normal".  But the alternative
++     * is waiting until the next syscall.
++     */
++    qemu_cpu_kick(env_cpu(env));
++#endif
++}
++
+ /* Record a tag check failure.  */
+ static void mte_check_fail(CPUARMState *env, uint32_t desc,
+                            uint64_t dirty_ptr, uintptr_t ra)
+ {
+     int mmu_idx = FIELD_EX32(desc, MTEDESC, MIDX);
+     ARMMMUIdx arm_mmu_idx = core_to_aa64_mmu_idx(mmu_idx);
+-    int el, reg_el, tcf, select, is_write, syn;
++    int el, reg_el, tcf;
+     uint64_t sctlr;
+     reg_el = regime_el(env, arm_mmu_idx);
 @@ -XXX,XX +XXX,XX @@ static void mte_check_fail(CPUARMState *env, uint32_t desc,
-     reg_el = regime_el(env, arm_mmu_idx);
+     switch (tcf) {
-     sctlr = env->cp15.sctlr_el[reg_el];
+     case 1:
+         /* Tag check fail causes a synchronous exception. */
--    el = arm_current_el(env);
+-        env->exception.vaddress = dirty_ptr;
--    if (el == 0) {
+-
-+    switch (arm_mmu_idx) {
+-        is_write = FIELD_EX32(desc, MTEDESC, WRITE);
-+    case ARMMMUIdx_E10_0:
+-        syn = syn_data_abort_no_iss(arm_current_el(env) != 0, 0, 0, 0, 0,
-+    case ARMMMUIdx_E20_0:
+-                                    is_write, 0x11);
-+        el = 0;
+-        raise_exception_ra(env, EXCP_DATA_ABORT, syn,
-         tcf = extract64(sctlr, 38, 2);
+-                           exception_target_el(env), ra);
--    } else {
+-        /* noreturn, but fall through to the assert anyway */
 +        mte_sync_check_fail(env, desc, dirty_ptr, ra);
 +        break;
-+    default:
-+        el = reg_el;
+     case 0:
-         tcf = extract64(sctlr, 40, 2);
+         /*
@@ -XXX,XX +XXX,XX @@ static void mte_check_fail(CPUARMState *env, uint32_t desc,
      case 2:
          /* Tag check fail causes asynchronous flag set.  */
 -        if (regime_has_2_ranges(arm_mmu_idx)) {
 -            select = extract64(dirty_ptr, 55, 1);
 -        } else {
 -            select = 0;
 -        }
 -        env->cp15.tfsr_el[el] |= 1 << select;
 -#ifdef CONFIG_USER_ONLY
 -        /*
 -         * Stand in for a timer irq, setting _TIF_MTE_ASYNC_FAULT,
 -         * which then sends a SIGSEGV when the thread is next scheduled.
 -         * This cpu will return to the main loop at the end of the TB,
 -         * which is rather sooner than "normal".  But the alternative
 -         * is waiting until the next syscall.
 -         */
 -        qemu_cpu_kick(env_cpu(env));
 -#endif
 +        mte_async_check_fail(env, dirty_ptr, ra, arm_mmu_idx, el);
          break;
 -    default:
 -        /* Case 3: Reserved. */
 -        qemu_log_mask(LOG_GUEST_ERROR,
 -                      "Tag check failure with SCTLR_EL%d.TCF%s "
 -                      "set to reserved value %d\n",
 -                      reg_el, el ? "" : "0", tcf);
 +    case 3:
 +        /*
 +         * Tag check fail causes asynchronous flag set for stores, or
 +         * a synchronous exception for loads.
 +         */
 +        if (FIELD_EX32(desc, MTEDESC, WRITE)) {
 +            mte_async_check_fail(env, dirty_ptr, ra, arm_mmu_idx, el);
 +        } else {
 +            mte_sync_check_fail(env, desc, dirty_ptr, ra);
 +        }
          break;
      }
+ }
@@ -XXX,XX +XXX,XX @@ static void mte_check_fail(CPUARMState *env, uint32_t desc,
          env->exception.vaddress = dirty_ptr;
          is_write = FIELD_EX32(desc, MTEDESC, WRITE);
 -        syn = syn_data_abort_no_iss(el != 0, 0, 0, 0, 0, is_write, 0x11);
 +        syn = syn_data_abort_no_iss(arm_current_el(env) != 0, 0, 0, 0, 0,
 +                                    is_write, 0x11);
          raise_exception(env, EXCP_DATA_ABORT, syn, exception_target_el(env));
          /* noreturn, but fall through to the assert anyway */
 --
 2.20.1

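The tag-check-failure dispatch that the MTE3 range-diff above introduces in mte_check_fail() can be modelled in isolation. The following is a minimal standalone sketch, not QEMU code: the TagFailAction enum and the three helper names are hypothetical, and the bit positions (TCF0 at SCTLR[39:38], TCF at SCTLR[41:40], VA bit 55 selecting the TFSR bit) are taken from the diffs in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical labels for what a failed tag check does (not a QEMU type). */
typedef enum {
    TAG_FAIL_IGNORE = 0, /* TCF == 0: the failure has no effect */
    TAG_FAIL_SYNC   = 1, /* synchronous data abort */
    TAG_FAIL_ASYNC  = 2, /* accumulate a flag in TFSR_ELx / TFSRE0_EL1 */
} TagFailAction;

/* SCTLR_ELx.TCF0 is bits [39:38] (unprivileged accesses); TCF is bits [41:40]. */
static unsigned sctlr_tcf(uint64_t sctlr, bool unpriv_access)
{
    return (unsigned)((sctlr >> (unpriv_access ? 38 : 40)) & 3);
}

/* With MTE3, TCF == 3 is "asymmetric": async for stores, sync for loads. */
static TagFailAction tag_fail_action(unsigned tcf, bool is_write)
{
    assert(tcf <= 3);
    switch (tcf) {
    case 0:
        return TAG_FAIL_IGNORE;
    case 1:
        return TAG_FAIL_SYNC;
    case 2:
        return TAG_FAIL_ASYNC;
    default:
        return is_write ? TAG_FAIL_ASYNC : TAG_FAIL_SYNC;
    }
}

/*
 * Async failures set TFSR bit 0 or 1; when the regime has two VA ranges,
 * bit 55 of the faulting address selects which one.
 */
static uint64_t tfsr_after_async_fail(uint64_t tfsr, uint64_t dirty_ptr,
                                      bool two_ranges)
{
    unsigned select = two_ranges ? (unsigned)((dirty_ptr >> 55) & 1) : 0;
    return tfsr | (UINT64_C(1) << select);
}
```

Keeping the synchronous and asynchronous paths in separate helpers, as the patch does with mte_sync_check_fail() and mte_async_check_fail(), is what lets the new case 3 reuse both without duplicating the exception and TFSR logic.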
-[PULL 01/49] sbsa-ref: remove cortex-a53 from list of supported cpus
+[PULL 57/57] docs/system: arm: Add nRF boards description
-From: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
+From: Alexandre Iooss <erdnaxe@crans.org>
-Cortex-A53 supports 40bits of address space. sbsa-ref's memory starts
+This adds the target guide for BBC Micro:bit.
 above this limit.
-Signed-off-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
+Information is taken from https://wiki.qemu.org/Features/MicroBit
 and from hw/arm/nrf51_soc.c.
 Signed-off-by: Alexandre Iooss <erdnaxe@crans.org>
 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Acked-by: Leif Lindholm <leif@nuviainc.com>
+Reviewed-by: Joel Stanley <joel@jms.id.au>
-Message-id: 20210216150122.3830863-2-marcin.juszkiewicz@linaro.org
+Message-id: 20210621075625.540471-1-erdnaxe@crans.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/arm/sbsa-ref.c | 1 -
+ docs/system/arm/nrf.rst    | 51 ++++++++++++++++++++++++++++++++++++++
- 1 file changed, 1 deletion(-)
+ docs/system/target-arm.rst |  1 +
  MAINTAINERS                |  1 +
  3 files changed, 53 insertions(+)
  create mode 100644 docs/system/arm/nrf.rst
-diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
+diff --git a/docs/system/arm/nrf.rst b/docs/system/arm/nrf.rst
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/docs/system/arm/nrf.rst
@@ -XXX,XX +XXX,XX @@
 +Nordic nRF boards (``microbit``)
 +================================
 +
 +The `Nordic nRF`_ chips are a family of ARM-based System-on-Chip that
 +are designed to be used for low-power and short-range wireless solutions.
 +
 +.. _Nordic nRF: https://www.nordicsemi.com/Products
 +
 +The nRF51 series is the first series for short range wireless applications.
 +It is superseded by the nRF52 series.
 +The following machines are based on this chip :
 +
 +- ``microbit``       BBC micro:bit board with nRF51822 SoC
 +
 +There are other series such as nRF52, nRF53 and nRF91 which are currently not
 +supported by QEMU.
 +
 +Supported devices
 +-----------------
 +
 + * ARM Cortex-M0 (ARMv6-M)
 + * Serial ports (UART)
 + * Clock controller
 + * Timers
 + * Random Number Generator (RNG)
 + * GPIO controller
 + * NVMC
 + * SWI
 +
 +Missing devices
 +---------------
 +
 + * Watchdog
 + * Real-Time Clock (RTC) controller
 + * TWI (i2c)
 + * SPI controller
 + * Analog to Digital Converter (ADC)
 + * Quadrature decoder
 + * Radio
 +
 +Boot options
 +------------
 +
 +The Micro:bit machine can be started using the ``-device`` option to load a
 +firmware in `ihex format`_. Example:
 +
 +.. _ihex format: https://en.wikipedia.org/wiki/Intel_HEX
 +
 +.. code-block:: bash
 +
 +  $ qemu-system-arm -M microbit -device loader,file=test.hex
 diff --git a/docs/system/target-arm.rst b/docs/system/target-arm.rst
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/sbsa-ref.c
+--- a/docs/system/target-arm.rst
-+++ b/hw/arm/sbsa-ref.c
++++ b/docs/system/target-arm.rst
-@@ -XXX,XX +XXX,XX @@ static const int sbsa_ref_irqmap[] = {
+@@ -XXX,XX +XXX,XX @@ undocumented; you can get a complete list by running
- };
+    arm/digic
+    arm/musicpal
- static const char * const valid_cpus[] = {
+    arm/gumstix
--    ARM_CPU_TYPE_NAME("cortex-a53"),
++   arm/nrf
-     ARM_CPU_TYPE_NAME("cortex-a57"),
+    arm/nseries
-     ARM_CPU_TYPE_NAME("cortex-a72"),
+    arm/nuvoton
- };
+    arm/orangepi
 diff --git a/MAINTAINERS b/MAINTAINERS
 index XXXXXXX..XXXXXXX 100644
 --- a/MAINTAINERS
 +++ b/MAINTAINERS
@@ -XXX,XX +XXX,XX @@ F: hw/*/microbit*.c
  F: include/hw/*/nrf51*.h
  F: include/hw/*/microbit*.h
  F: tests/qtest/microbit-test.c
 +F: docs/system/arm/nrf.rst
  AVR Machines
  -------------
 --
 2.20.1

target-arm queue: I still have a lot more in my to-review
queue, but my rule of thumb is to send out what I have once
I get to 50 patches or so.

thanks
-- PMM

The following changes since commit 9a7beaad3dbba982f7a461d676b55a5c3851d312:

  Merge remote-tracking branch 'remotes/alistair/tags/pull-riscv-to-apply-20210304' into staging (2021-03-05 10:47:46 +0000)

are available in the Git repository at:

  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20210305

for you to fetch changes up to 2c669ff88ec6733420a000103a2b8b9e93df4945:

  hw/arm/mps2: Update old infocenter.arm.com URLs (2021-03-05 15:17:38 +0000)

----------------------------------------------------------------
 * sbsa-ref: remove cortex-a53 from list of supported cpus
 * sbsa-ref: add 'max' to list of allowed cpus
 * target/arm: Add support for FEAT_SSBS, Speculative Store Bypass Safe
 * npcm7xx: add EMC model
 * xlnx-zynqmp: Remove obsolete 'has_rpu' property
 * target/arm: Speed up aarch64 TBL/TBX
 * virtio-mmio: improve virtio-mmio get_dev_path alog
 * target/arm: Use TCF0 and TFSRE0 for unprivileged tag checks
 * target/arm: Restrict v8M IDAU to TCG
 * target/arm/cpu: Update coding style to make checkpatch.pl happy
 * musicpal, tc6393xb, omap_lcdc, tcx: drop dead code for non-32-bit-RGB surfaces
 * Add new board: mps3-an524

----------------------------------------------------------------
Doug Evans (3):
      hw/net: Add npcm7xx emc model
      hw/arm: Add npcm7xx emc model
      tests/qtests: Add npcm7xx emc model test

Marcin Juszkiewicz (2):
      sbsa-ref: remove cortex-a53 from list of supported cpus
      sbsa-ref: add 'max' to list of allowed cpus

Peter Collingbourne (1):
      target/arm: Use TCF0 and TFSRE0 for unprivileged tag checks

Peter Maydell (34):
      hw/arm/musicpal: Remove dead code for non-32-bit-RGB surfaces
      hw/display/tc6393xb: Remove dead code for handling non-32bpp surfaces
      hw/display/tc6393xb: Expand out macros in template header
      hw/display/tc6393xb: Inline tc6393xb_draw_graphic32() at its callsite
      hw/display/omap_lcdc: Expand out macros in template header
      hw/display/omap_lcdc: Drop broken bigendian ifdef
      hw/display/omap_lcdc: Fix coding style issues in template header
      hw/display/omap_lcdc: Inline template header into C file
      hw/display/omap_lcdc: Delete unnecessary macro
      hw/display/tcx: Drop unnecessary code for handling BGR format outputs
      hw/arm/mps2-tz: Make SYSCLK frequency board-specific
      hw/misc/mps2-scc: Support configurable number of OSCCLK values
      hw/arm/mps2-tz: Correct the OSCCLK settings for mps2-an505 and mps2-an511
      hw/arm/mps2-tz: Make the OSCCLK settings be configurable per-board
      hw/misc/mps2-fpgaio: Make number of LEDs configurable by board
      hw/misc/mps2-fpgaio: Support SWITCH register
      hw/arm/mps2-tz: Make FPGAIO switch and LED config per-board
      hw/arm/mps2-tz: Condition IRQ splitting on number of CPUs, not board type
      hw/arm/mps2-tz: Make number of IRQs board-specific
      hw/misc/mps2-scc: Implement CFG_REG5 and CFG_REG6 for MPS3 AN524
      hw/arm/mps2-tz: Correct wrong interrupt numbers for DMA and SPI
      hw/arm/mps2-tz: Allow PPCPortInfo structures to specify device interrupts
      hw/arm/mps2-tz: Move device IRQ info to data structures
      hw/arm/mps2-tz: Size the uart-irq-orgate based on the number of UARTs
      hw/arm/mps2-tz: Allow boards to have different PPCInfo data
      hw/arm/mps2-tz: Make RAM arrangement board-specific
      hw/arm/mps2-tz: Set MachineClass default_ram info from RAMInfo data
      hw/arm/mps2-tz: Support ROMs as well as RAMs
      hw/arm/mps2-tz: Get armv7m_load_kernel() size argument from RAMInfo
      hw/arm/mps2-tz: Add new mps3-an524 board
      hw/arm/mps2-tz: Stub out USB controller for mps3-an524
      hw/arm/mps2-tz: Provide PL031 RTC on mps3-an524
      docs/system/arm/mps2.rst: Document the new mps3-an524 board
      hw/arm/mps2: Update old infocenter.arm.com URLs

Philippe Mathieu-Daudé (4):
      hw/arm/xlnx-zynqmp: Remove obsolete 'has_rpu' property
      hw/i2c/npcm7xx_smbus: Simplify npcm7xx_smbus_init()
      target/arm: Restrict v8M IDAU to TCG
      target/arm/cpu: Update coding style to make checkpatch.pl happy

Rebecca Cran (3):
      target/arm: Add support for FEAT_SSBS, Speculative Store Bypass Safe
      target/arm: Enable FEAT_SSBS for "max" AARCH64 CPU
      target/arm: Set ID_PFR2.SSBS to 1 for "max" 32-bit CPU

Richard Henderson (1):
      target/arm: Speed up aarch64 TBL/TBX

schspa (1):
      virtio-mmio: improve virtio-mmio get_dev_path alog

From: Rebecca Cran <rebecca@nuviainc.com>

Add support for FEAT_SSBS. SSBS (Speculative Store Bypass Safe) is an
optional feature in ARMv8.0, and mandatory in ARMv8.5.

Signed-off-by: Rebecca Cran <rebecca@nuviainc.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210216224543.16142-2-rebecca@nuviainc.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h           | 15 ++++++++++++++-
 target/arm/internals.h     |  6 ++++++
 target/arm/helper.c        | 37 +++++++++++++++++++++++++++++++++++++
 target/arm/translate-a64.c | 12 ++++++++++++
 4 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ void pmu_init(ARMCPU *cpu);
 #define SCTLR_TE      (1U << 30) /* AArch32 only */
 #define SCTLR_EnIB    (1U << 30) /* v8.3, AArch64 only */
 #define SCTLR_EnIA    (1U << 31) /* v8.3, AArch64 only */
+#define SCTLR_DSSBS_32 (1U << 31) /* v8.5, AArch32 only */
 #define SCTLR_BT0     (1ULL << 35) /* v8.5-BTI */
 #define SCTLR_BT1     (1ULL << 36) /* v8.5-BTI */
 #define SCTLR_ITFSB   (1ULL << 37) /* v8.5-MemTag */
@@ -XXX,XX +XXX,XX @@ void pmu_init(ARMCPU *cpu);
 #define SCTLR_TCF     (3ULL << 40) /* v8.5-MemTag */
 #define SCTLR_ATA0    (1ULL << 42) /* v8.5-MemTag */
 #define SCTLR_ATA     (1ULL << 43) /* v8.5-MemTag */
-#define SCTLR_DSSBS   (1ULL << 44) /* v8.5 */
+#define SCTLR_DSSBS_64 (1ULL << 44) /* v8.5, AArch64 only */
 
 #define CPTR_TCPAC    (1U << 31)
 #define CPTR_TTA      (1U << 20)
@@ -XXX,XX +XXX,XX @@ void pmu_init(ARMCPU *cpu);
 #define CPSR_IL (1U << 20)
 #define CPSR_DIT (1U << 21)
 #define CPSR_PAN (1U << 22)
+#define CPSR_SSBS (1U << 23)
 #define CPSR_J (1U << 24)
 #define CPSR_IT_0_1 (3U << 25)
 #define CPSR_Q (1U << 27)
@@ -XXX,XX +XXX,XX @@ void pmu_init(ARMCPU *cpu);
 #define PSTATE_A (1U << 8)
 #define PSTATE_D (1U << 9)
 #define PSTATE_BTYPE (3U << 10)
+#define PSTATE_SSBS (1U << 12)
 #define PSTATE_IL (1U << 20)
 #define PSTATE_SS (1U << 21)
 #define PSTATE_PAN (1U << 22)
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_dit(const ARMISARegisters *id)
     return FIELD_EX32(id->id_pfr0, ID_PFR0, DIT) != 0;
 }
 
+static inline bool isar_feature_aa32_ssbs(const ARMISARegisters *id)
+{
+    return FIELD_EX32(id->id_pfr2, ID_PFR2, SSBS) != 0;
+}
+
 /*
  * 64-bit feature tests via id registers.
  */
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_dit(const ARMISARegisters *id)
     return FIELD_EX64(id->id_aa64pfr0, ID_AA64PFR0, DIT) != 0;
 }
 
+static inline bool isar_feature_aa64_ssbs(const ARMISARegisters *id)
+{
+    return FIELD_EX64(id->id_aa64pfr1, ID_AA64PFR1, SSBS) != 0;
+}
+
 /*
  * Feature tests for "does this exist in either 32-bit or 64-bit?"
  */
diff --git a/target/arm/internals.h b/target/arm/internals.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ static inline uint32_t aarch32_cpsr_valid_mask(uint64_t features,
     if (isar_feature_aa32_dit(id)) {
         valid |= CPSR_DIT;
     }
+    if (isar_feature_aa32_ssbs(id)) {
+        valid |= CPSR_SSBS;
+    }
 
     return valid;
 }
@@ -XXX,XX +XXX,XX @@ static inline uint32_t aarch64_pstate_valid_mask(const ARMISARegisters *id)
     if (isar_feature_aa64_dit(id)) {
         valid |= PSTATE_DIT;
     }
+    if (isar_feature_aa64_ssbs(id)) {
+        valid |= PSTATE_SSBS;
+    }
     if (isar_feature_aa64_mte(id)) {
         valid |= PSTATE_TCO;
     }
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo dit_reginfo = {
     .readfn = aa64_dit_read, .writefn = aa64_dit_write
 };
 
+static uint64_t aa64_ssbs_read(CPUARMState *env, const ARMCPRegInfo *ri)
+{
+    return env->pstate & PSTATE_SSBS;
+}
+
+static void aa64_ssbs_write(CPUARMState *env, const ARMCPRegInfo *ri,
+                           uint64_t value)
+{
+    env->pstate = (env->pstate & ~PSTATE_SSBS) | (value & PSTATE_SSBS);
+}
+
+static const ARMCPRegInfo ssbs_reginfo = {
+    .name = "SSBS", .state = ARM_CP_STATE_AA64,
+    .opc0 = 3, .opc1 = 3, .crn = 4, .crm = 2, .opc2 = 6,
+    .type = ARM_CP_NO_RAW, .access = PL0_RW,
+    .readfn = aa64_ssbs_read, .writefn = aa64_ssbs_write
+};
+
 static CPAccessResult aa64_cacheop_poc_access(CPUARMState *env,
                                               const ARMCPRegInfo *ri,
                                               bool isread)
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
     if (cpu_isar_feature(aa64_dit, cpu)) {
         define_one_arm_cp_reg(cpu, &dit_reginfo);
     }
+    if (cpu_isar_feature(aa64_ssbs, cpu)) {
+        define_one_arm_cp_reg(cpu, &ssbs_reginfo);
+    }
 
     if (arm_feature(env, ARM_FEATURE_EL2) && cpu_isar_feature(aa64_vh, cpu)) {
         define_arm_cp_regs(cpu, vhe_reginfo);
@@ -XXX,XX +XXX,XX @@ static void take_aarch32_exception(CPUARMState *env, int new_mode,
     env->uncached_cpsr &= ~(CPSR_IL | CPSR_J);
     env->daif |= mask;
 
+    if (cpu_isar_feature(aa32_ssbs, env_archcpu(env))) {
+        if (env->cp15.sctlr_el[new_el] & SCTLR_DSSBS_32) {
+            env->uncached_cpsr |= CPSR_SSBS;
+        } else {
+            env->uncached_cpsr &= ~CPSR_SSBS;
+        }
+    }
+
     if (new_mode == ARM_CPU_MODE_HYP) {
         env->thumb = (env->cp15.sctlr_el[2] & SCTLR_TE) != 0;
         env->elr_el[2] = env->regs[15];
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_do_interrupt_aarch64(CPUState *cs)
         new_mode |= PSTATE_TCO;
     }
 
+    if (cpu_isar_feature(aa64_ssbs, cpu)) {
+        if (env->cp15.sctlr_el[new_el] & SCTLR_DSSBS_64) {
+            new_mode |= PSTATE_SSBS;
+        } else {
+            new_mode &= ~PSTATE_SSBS;
+        }
+    }
+
     pstate_write(env, PSTATE_DAIF | new_mode);
     env->aarch64 = 1;
     aarch64_restore_sp(env, new_el);
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void handle_msr_i(DisasContext *s, uint32_t insn,
         tcg_temp_free_i32(t1);
         break;
 
+    case 0x19: /* SSBS */
+        if (!dc_isar_feature(aa64_ssbs, s)) {
+            goto do_unallocated;
+        }
+        if (crm & 1) {
+            set_pstate_bits(PSTATE_SSBS);
+        } else {
+            clear_pstate_bits(PSTATE_SSBS);
+        }
+        /* Don't need to rebuild hflags since SSBS is a nop */
+        break;
+
     case 0x1a: /* DIT */
         if (!dc_isar_feature(aa64_dit, s)) {
             goto do_unallocated;
-- 
2.20.1

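The SSBS patch above touches three paths: exception entry on AArch64, exception entry on AArch32, and the MSR SSBS, #imm form decoded in handle_msr_i() (op1:op2 = 0x19, value in CRm bit 0). A minimal standalone sketch of the bit manipulation, assuming only the constants shown in the cpu.h hunk; the function names are hypothetical and this is not how QEMU structures the code.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions copied from the cpu.h hunk of the patch. */
#define SCTLR_DSSBS_32 (1U << 31)   /* AArch32 SCTLR.DSSBS */
#define SCTLR_DSSBS_64 (1ULL << 44) /* AArch64 SCTLR_ELx.DSSBS */
#define CPSR_SSBS      (1U << 23)
#define PSTATE_SSBS    (1U << 12)

/* AArch64 exception entry: PSTATE.SSBS takes the target EL's SCTLR.DSSBS. */
static uint32_t a64_entry_pstate(uint32_t new_mode, uint64_t sctlr_target_el)
{
    if (sctlr_target_el & SCTLR_DSSBS_64) {
        return new_mode | PSTATE_SSBS;
    }
    return new_mode & ~PSTATE_SSBS;
}

/* AArch32 exception entry: CPSR.SSBS takes the target EL's SCTLR.DSSBS. */
static uint32_t a32_entry_cpsr(uint32_t cpsr, uint64_t sctlr_target_el)
{
    if (sctlr_target_el & SCTLR_DSSBS_32) {
        return cpsr | CPSR_SSBS;
    }
    return cpsr & ~CPSR_SSBS;
}

/* MSR SSBS, #imm: bit 0 of the 4-bit CRm field supplies the new value. */
static uint32_t msr_ssbs_imm(uint32_t pstate, unsigned crm)
{
    assert(crm <= 15);
    return (crm & 1) ? (pstate | PSTATE_SSBS) : (pstate & ~PSTATE_SSBS);
}
```

Because QEMU does not model speculative store bypass, the bit is purely architectural state here, which is why the translator does not need to rebuild hflags after writing it.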
From: Rebecca Cran <rebecca@nuviainc.com>

Set ID_AA64PFR1_EL1.SSBS to 2 and ID_PFR2.SSBS to 1.

Signed-off-by: Rebecca Cran <rebecca@nuviainc.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210216224543.16142-3-rebecca@nuviainc.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu64.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
 
         t = cpu->isar.id_aa64pfr1;
         t = FIELD_DP64(t, ID_AA64PFR1, BT, 1);
+        t = FIELD_DP64(t, ID_AA64PFR1, SSBS, 2);
         /*
          * Begin with full support for MTE. This will be downgraded to MTE=0
          * during realize if the board provides no tag memory, much like
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
         u = FIELD_DP32(u, ID_PFR0, DIT, 1);
         cpu->isar.id_pfr0 = u;
 
+        u = cpu->isar.id_pfr2;
+        u = FIELD_DP32(u, ID_PFR2, SSBS, 1);
+        cpu->isar.id_pfr2 = u;
+
         u = cpu->isar.id_mmfr3;
         u = FIELD_DP32(u, ID_MMFR3, PAN, 2); /* ATS1E1 */
         cpu->isar.id_mmfr3 = u;
-- 
2.20.1

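FIELD_DP64() and FIELD_DP32() in the hunk above deposit a value into a named bit field of an ID register. A minimal sketch of the same operation with plain shifts and masks, assuming SSBS occupies bits [7:4] of both ID_AA64PFR1_EL1 and ID_PFR2 and BT occupies bits [3:0] of ID_AA64PFR1_EL1, as in the Arm ARM. The helper and macro names are hypothetical and are not QEMU's deposit64()/extract64(). Per the Arm ARM, ID_AA64PFR1_EL1.SSBS == 2 advertises the MSR/MRS SSBS instructions in addition to the PSTATE.SSBS mechanism, which matches the system register added by the previous patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field positions (bits [3:0] and [7:4]); hypothetical macro names. */
#define ID_AA64PFR1_BT_SHIFT   0
#define ID_AA64PFR1_SSBS_SHIFT 4
#define ID_PFR2_SSBS_SHIFT     4

/* Replace `length` bits of `value` starting at `start` with `fieldval`. */
static uint64_t deposit64_sketch(uint64_t value, unsigned start,
                                 unsigned length, uint64_t fieldval)
{
    /* This sketch does not handle a full 64-bit field. */
    assert(length > 0 && length < 64 && start + length <= 64);
    uint64_t mask = ((UINT64_C(1) << length) - 1) << start;
    return (value & ~mask) | ((fieldval << start) & mask);
}

/* Read back `length` bits of `value` starting at `start`. */
static uint64_t extract64_sketch(uint64_t value, unsigned start,
                                 unsigned length)
{
    assert(length > 0 && length < 64 && start + length <= 64);
    return (value >> start) & ((UINT64_C(1) << length) - 1);
}
```

For example, depositing BT = 1 and then SSBS = 2 into a zeroed register yields 0x21; the mask step ensures that writing one field never disturbs its neighbours.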
From: Doug Evans <dje@google.com>

This is a 10/100 ethernet device that has several features.
Only the ones needed by the Linux driver have been implemented.
See npcm7xx_emc.c for a list of unimplemented features.

Reviewed-by: Hao Wu <wuhaotsh@google.com>
Reviewed-by: Avi Fishman <avi.fishman@nuvoton.com>
Signed-off-by: Doug Evans <dje@google.com>
Message-id: 20210218212453.831406-2-dje@google.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/net/npcm7xx_emc.h | 286 ++++++++++++
 hw/net/npcm7xx_emc.c         | 857 +++++++++++++++++++++++++++++++++++
 hw/net/meson.build           |   1 +
 hw/net/trace-events          |  17 +
 4 files changed, 1161 insertions(+)
 create mode 100644 include/hw/net/npcm7xx_emc.h
 create mode 100644 hw/net/npcm7xx_emc.c

diff --git a/include/hw/net/npcm7xx_emc.h b/include/hw/net/npcm7xx_emc.h
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/include/hw/net/npcm7xx_emc.h
@@ -XXX,XX +XXX,XX @@
+/*
+ * Nuvoton NPCM7xx EMC Module
+ *
+ * Copyright 2020 Google LLC
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#ifndef NPCM7XX_EMC_H
+#define NPCM7XX_EMC_H
+
+#include "hw/irq.h"
+#include "hw/sysbus.h"
+#include "net/net.h"
+
+/* 32-bit register indices. */
+enum NPCM7xxPWMRegister {
+    /* Control registers. */
+    REG_CAMCMR,
+    REG_CAMEN,
+
+    /* There are 16 CAMn[ML] registers. */
+    REG_CAMM_BASE,
+    REG_CAML_BASE,
+    REG_CAMML_LAST = 0x21,
+
+    REG_TXDLSA = 0x22,
+    REG_RXDLSA,
+    REG_MCMDR,
+    REG_MIID,
+    REG_MIIDA,
+    REG_FFTCR,
+    REG_TSDR,
+    REG_RSDR,
+    REG_DMARFC,
+    REG_MIEN,
+
+    /* Status registers. */
+    REG_MISTA,
+    REG_MGSTA,
+    REG_MPCNT,
+    REG_MRPC,
+    REG_MRPCC,
+    REG_MREPC,
+    REG_DMARFS,
+    REG_CTXDSA,
+    REG_CTXBSA,
+    REG_CRXDSA,
+    REG_CRXBSA,
+
+    NPCM7XX_NUM_EMC_REGS,
+};
+
+/* REG_CAMCMR fields */
+/* Enable CAM Compare */
+#define REG_CAMCMR_ECMP (1 << 4)
+/* Complement CAM Compare */
+#define REG_CAMCMR_CCAM (1 << 3)
+/* Accept Broadcast Packet */
+#define REG_CAMCMR_ABP (1 << 2)
+/* Accept Multicast Packet */
+#define REG_CAMCMR_AMP (1 << 1)
+/* Accept Unicast Packet */
+#define REG_CAMCMR_AUP (1 << 0)
+
+/* REG_MCMDR fields */
+/* Software Reset */
+#define REG_MCMDR_SWR (1 << 24)
+/* Internal Loopback Select */
+#define REG_MCMDR_LBK (1 << 21)
+/* Operation Mode Select */
+#define REG_MCMDR_OPMOD (1 << 20)
+/* Enable MDC Clock Generation */
+#define REG_MCMDR_ENMDC (1 << 19)
+/* Full-Duplex Mode Select */
+#define REG_MCMDR_FDUP (1 << 18)
+/* Enable SQE Checking */
+#define REG_MCMDR_ENSEQ (1 << 17)
+/* Send PAUSE Frame */
+#define REG_MCMDR_SDPZ (1 << 16)
+/* No Defer */
+#define REG_MCMDR_NDEF (1 << 9)
+/* Frame Transmission On */
+#define REG_MCMDR_TXON (1 << 8)
+/* Strip CRC Checksum */
+#define REG_MCMDR_SPCRC (1 << 5)
+/* Accept CRC Error Packet */
+#define REG_MCMDR_AEP (1 << 4)
+/* Accept Control Packet */
+#define REG_MCMDR_ACP (1 << 3)
+/* Accept Runt Packet */
+#define REG_MCMDR_ARP (1 << 2)
+/* Accept Long Packet */
+#define REG_MCMDR_ALP (1 << 1)
+/* Frame Reception On */
+#define REG_MCMDR_RXON (1 << 0)
+
+/* REG_MIEN fields */
+/* Enable Transmit Descriptor Unavailable Interrupt */
+#define REG_MIEN_ENTDU (1 << 23)
+/* Enable Transmit Completion Interrupt */
+#define REG_MIEN_ENTXCP (1 << 18)
+/* Enable Transmit Interrupt */
+#define REG_MIEN_ENTXINTR (1 << 16)
+/* Enable Receive Descriptor Unavailable Interrupt */
+#define REG_MIEN_ENRDU (1 << 10)
+/* Enable Receive Good Interrupt */
+#define REG_MIEN_ENRXGD (1 << 4)
+/* Enable Receive Interrupt */
+#define REG_MIEN_ENRXINTR (1 << 0)
+
+/* REG_MISTA fields */
+/* TODO: Add error fields and support simulated errors? */
+/* Transmit Bus Error Interrupt */
+#define REG_MISTA_TXBERR (1 << 24)
+/* Transmit Descriptor Unavailable Interrupt */
+#define REG_MISTA_TDU (1 << 23)
+/* Transmit Completion Interrupt */
+#define REG_MISTA_TXCP (1 << 18)
+/* Transmit Interrupt */
+#define REG_MISTA_TXINTR (1 << 16)
+/* Receive Bus Error Interrupt */
+#define REG_MISTA_RXBERR (1 << 11)
+/* Receive Descriptor Unavailable Interrupt */
+#define REG_MISTA_RDU (1 << 10)
+/* DMA Early Notification Interrupt */
+#define REG_MISTA_DENI (1 << 9)
+/* Maximum Frame Length Interrupt */
+#define REG_MISTA_DFOI (1 << 8)
+/* Receive Good Interrupt */
+#define REG_MISTA_RXGD (1 << 4)
+/* Packet Too Long Interrupt */
+#define REG_MISTA_PTLE (1 << 3)
+/* Receive Interrupt */
+#define REG_MISTA_RXINTR (1 << 0)
+
+/* REG_MGSTA fields */
+/* Transmission Halted */
+#define REG_MGSTA_TXHA (1 << 11)
+/* Receive Halted */
+#define REG_MGSTA_RXHA (1 << 11)
+
+/* REG_DMARFC fields */
+/* Maximum Receive Frame Length */
+#define REG_DMARFC_RXMS(word) extract32((word), 0, 16)
+
+/* REG MIIDA fields */
+/* Busy Bit */
+#define REG_MIIDA_BUSY (1 << 17)
+
+/* Transmit and receive descriptors */
+typedef struct NPCM7xxEMCTxDesc NPCM7xxEMCTxDesc;
+typedef struct NPCM7xxEMCRxDesc NPCM7xxEMCRxDesc;
+
+struct NPCM7xxEMCTxDesc {
+    uint32_t flags;
+    uint32_t txbsa;
+    uint32_t status_and_length;
+    uint32_t ntxdsa;
+};
+
+struct NPCM7xxEMCRxDesc {
+    uint32_t status_and_length;
+    uint32_t rxbsa;
+    uint32_t reserved;
+    uint32_t nrxdsa;
+};
+
+/* NPCM7xxEMCTxDesc.flags values */
+/* Owner: 0 = cpu, 1 = emc */
+#define TX_DESC_FLAG_OWNER_MASK (1 << 31)
+/* Transmit interrupt enable */
+#define TX_DESC_FLAG_INTEN (1 << 2)
+/* CRC append */
+#define TX_DESC_FLAG_CRCAPP (1 << 1)
+/* Padding enable */
+#define TX_DESC_FLAG_PADEN (1 << 0)
+
+/* NPCM7xxEMCTxDesc.status_and_length values */
+/* Collision count */
+#define TX_DESC_STATUS_CCNT_SHIFT 28
+#define TX_DESC_STATUS_CCNT_BITSIZE 4
+/* SQE error */
+#define TX_DESC_STATUS_SQE (1 << 26)
+/* Transmission paused */
+#define TX_DESC_STATUS_PAU (1 << 25)
+/* P transmission halted */
+#define TX_DESC_STATUS_TXHA (1 << 24)
+/* Late collision */
+#define TX_DESC_STATUS_LC (1 << 23)
+/* Transmission abort */
+#define TX_DESC_STATUS_TXABT (1 << 22)
+/* No carrier sense */
+#define TX_DESC_STATUS_NCS (1 << 21)
+/* Defer exceed */
+#define TX_DESC_STATUS_EXDEF (1 << 20)
+/* Transmission complete */
+#define TX_DESC_STATUS_TXCP (1 << 19)
+/* Transmission deferred */
+#define TX_DESC_STATUS_DEF (1 << 17)
+/* Transmit interrupt */
+#define TX_DESC_STATUS_TXINTR (1 << 16)
+
+#define TX_DESC_PKT_LEN(word) extract32((word), 0, 16)
+
+/* Transmit buffer start address */
+#define TX_DESC_TXBSA(word) ((uint32_t) (word) & ~3u)
+
+/* Next transmit descriptor start address */
+#define TX_DESC_NTXDSA(word) ((uint32_t) (word) & ~3u)
+
+/* NPCM7xxEMCRxDesc.status_and_length values */
+/* Owner: 0b00 = cpu, 0b01 = undefined, 0b10 = emc, 0b11 = undefined */
+#define RX_DESC_STATUS_OWNER_SHIFT 30
+#define RX_DESC_STATUS_OWNER_BITSIZE 2
+#define RX_DESC_STATUS_OWNER_MASK (3 << RX_DESC_STATUS_OWNER_SHIFT)
+/* Runt packet */
+#define RX_DESC_STATUS_RP (1 << 22)
+/* Alignment error */
+#define RX_DESC_STATUS_ALIE (1 << 21)
+/* Frame reception complete */
+#define RX_DESC_STATUS_RXGD (1 << 20)
+/* Packet too long */
+#define RX_DESC_STATUS_PTLE (1 << 19)
+/* CRC error */
+#define RX_DESC_STATUS_CRCE (1 << 17)
+/* Receive interrupt */
+#define RX_DESC_STATUS_RXINTR (1 << 16)
+
+#define RX_DESC_PKT_LEN(word) extract32((word), 0, 16)
+
+/* Receive buffer start address */
+#define RX_DESC_RXBSA(word) ((uint32_t) (word) & ~3u)
+
+/* Next receive descriptor start address */
+#define RX_DESC_NRXDSA(word) ((uint32_t) (word) & ~3u)
+
+/* Minimum packet length, when TX_DESC_FLAG_PADEN is set. */
+#define MIN_PACKET_LENGTH 64
+
+struct NPCM7xxEMCState {
+    /*< private >*/
+    SysBusDevice parent;
+    /*< public >*/
+
+    MemoryRegion iomem;
+
+    qemu_irq tx_irq;
+    qemu_irq rx_irq;
+
+    NICState *nic;
+    NICConf conf;
+
+    /* 0 or 1, for log messages */
+    uint8_t emc_num;
+
+    uint32_t regs[NPCM7XX_NUM_EMC_REGS];
+
+    /*
+     * tx is active. Set to true by TSDR and then switches off when out of
+     * descriptors. If the TXON bit in REG_MCMDR is off then this is off.
+     */
+    bool tx_active;
+
+    /*
+     * rx is active. Set to true by RSDR and then switches off when out of
+     * descriptors. If the RXON bit in REG_MCMDR is off then this is off.
+     */
+    bool rx_active;
+};
+
+typedef struct NPCM7xxEMCState NPCM7xxEMCState;
+
+#define TYPE_NPCM7XX_EMC "npcm7xx-emc"
+#define NPCM7XX_EMC(obj) \
+    OBJECT_CHECK(NPCM7xxEMCState, (obj), TYPE_NPCM7XX_EMC)
+
+#endif /* NPCM7XX_EMC_H */
diff --git a/hw/net/npcm7xx_emc.c b/hw/net/npcm7xx_emc.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/hw/net/npcm7xx_emc.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * Nuvoton NPCM7xx EMC Module
+ *
+ * Copyright 2020 Google LLC
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * Unsupported/unimplemented features:
+ * - MCMDR.FDUP (full duplex) is ignored, half duplex is not supported
+ * - Only CAM0 is supported, CAM[1-15] are not
+ *   - writes to CAMEN.[1-15] are ignored, these bits always read as zeroes
+ * - MII is not implemented, MIIDA.BUSY and MIID always return zero
+ * - MCMDR.LBK is not implemented
+ * - MCMDR.{OPMOD,ENSQE,AEP,ARP} are not supported
+ * - H/W FIFOs are not supported, MCMDR.FFTCR is ignored
+ * - MGSTA.SQE is not supported
+ * - pause and control frames are not implemented
+ * - MGSTA.CCNT is not supported
+ * - MPCNT, DMARFS are not implemented
+ */
+
+#include "qemu/osdep.h"
+
+/* For crc32 */
+#include <zlib.h>
+
+#include "qemu-common.h"
+#include "hw/irq.h"
+#include "hw/qdev-clock.h"
+#include "hw/qdev-properties.h"
+#include "hw/net/npcm7xx_emc.h"
+#include "net/eth.h"
+#include "migration/vmstate.h"
+#include "qemu/bitops.h"
+#include "qemu/error-report.h"
+#include "qemu/log.h"
+#include "qemu/module.h"
+#include "qemu/units.h"
+#include "sysemu/dma.h"
+#include "trace.h"
+
+#define CRC_LENGTH 4
+
+/*
+ * The maximum size of a (layer 2) ethernet frame as defined by 802.3.
+ * 1518 = 6(dest macaddr) + 6(src macaddr) + 2(proto) + 4(crc) + 1500(payload)
+ * This does not include an additional 4 for the vlan field (802.1q).
+ */
+#define MAX_ETH_FRAME_SIZE 1518
+
+static const char *emc_reg_name(int regno)
+{
+#define REG(name) case REG_ ## name: return #name;
+    switch (regno) {
+    REG(CAMCMR)
+    REG(CAMEN)
+    REG(TXDLSA)
+    REG(RXDLSA)
+    REG(MCMDR)
+    REG(MIID)
+    REG(MIIDA)
+    REG(FFTCR)
+    REG(TSDR)
+    REG(RSDR)
+    REG(DMARFC)
+    REG(MIEN)
+    REG(MISTA)
+    REG(MGSTA)
+    REG(MPCNT)
+    REG(MRPC)
+    REG(MRPCC)
+    REG(MREPC)
+    REG(DMARFS)
+    REG(CTXDSA)
+    REG(CTXBSA)
+    REG(CRXDSA)
+    REG(CRXBSA)
+    case REG_CAMM_BASE + 0: return "CAM0M";
+    case REG_CAML_BASE + 0: return "CAM0L";
+    case REG_CAMM_BASE + 2 ... REG_CAMML_LAST:
+        /* Only CAM0 is supported, fold the others into something simple. */
+        if (regno & 1) {
+            return "CAM<n>L";
+        } else {
+            return "CAM<n>M";
+        }
+    default: return "UNKNOWN";
+    }
+#undef REG
+}
+
+static void emc_reset(NPCM7xxEMCState *emc)
+{
+    trace_npcm7xx_emc_reset(emc->emc_num);
+
+    memset(&emc->regs[0], 0, sizeof(emc->regs));
+
+    /* These regs have non-zero reset values. */
+    emc->regs[REG_TXDLSA] = 0xfffffffc;
+    emc->regs[REG_RXDLSA] = 0xfffffffc;
+    emc->regs[REG_MIIDA] = 0x00900000;
+    emc->regs[REG_FFTCR] = 0x0101;
+    emc->regs[REG_DMARFC] = 0x0800;
+    emc->regs[REG_MPCNT] = 0x7fff;
+
+    emc->tx_active = false;
+    emc->rx_active = false;
+}
+
+static void npcm7xx_emc_reset(DeviceState *dev)
+{
+    NPCM7xxEMCState *emc = NPCM7XX_EMC(dev);
+    emc_reset(emc);
+}
+
+static void emc_soft_reset(NPCM7xxEMCState *emc)
+{
+    /*
+     * The docs say at least MCMDR.{LBK,OPMOD} bits are not changed during a
+     * soft reset, but do not go into further detail. For now, KISS.
+     */
+    uint32_t mcmdr = emc->regs[REG_MCMDR];
+    emc_reset(emc);
+    emc->regs[REG_MCMDR] = mcmdr & (REG_MCMDR_LBK | REG_MCMDR_OPMOD);
+
+    qemu_set_irq(emc->tx_irq, 0);
+    qemu_set_irq(emc->rx_irq, 0);
+}
+
+static void emc_set_link(NetClientState *nc)
+{
+    /* Nothing to do yet. */
+}
+
+/* MISTA.TXINTR is the union of the individual bits with their enables. */
+static void emc_update_mista_txintr(NPCM7xxEMCState *emc)
+{
+    /* Only look at the bits we support. */
+    uint32_t mask = (REG_MISTA_TXBERR |
+                     REG_MISTA_TDU |
+                     REG_MISTA_TXCP);
+    if (emc->regs[REG_MISTA] & emc->regs[REG_MIEN] & mask) {
+        emc->regs[REG_MISTA] |= REG_MISTA_TXINTR;
+    } else {
+        emc->regs[REG_MISTA] &= ~REG_MISTA_TXINTR;
+    }
+}
+
+/* MISTA.RXINTR is the union of the individual bits with their enables. */
+static void emc_update_mista_rxintr(NPCM7xxEMCState *emc)
+{
+    /* Only look at the bits we support. */
+    uint32_t mask = (REG_MISTA_RXBERR |
+                     REG_MISTA_RDU |
+                     REG_MISTA_RXGD);
+    if (emc->regs[REG_MISTA] & emc->regs[REG_MIEN] & mask) {
+        emc->regs[REG_MISTA] |= REG_MISTA_RXINTR;
+    } else {
+        emc->regs[REG_MISTA] &= ~REG_MISTA_RXINTR;
+    }
+}
+
+/* N.B. emc_update_mista_txintr must have already been called. */
+static void emc_update_tx_irq(NPCM7xxEMCState *emc)
+{
+    int level = !!(emc->regs[REG_MISTA] &
+                   emc->regs[REG_MIEN] &
+                   REG_MISTA_TXINTR);
+    trace_npcm7xx_emc_update_tx_irq(level);
+    qemu_set_irq(emc->tx_irq, level);
+}
+
+/* N.B. emc_update_mista_rxintr must have already been called. */
+static void emc_update_rx_irq(NPCM7xxEMCState *emc)
+{
+    int level = !!(emc->regs[REG_MISTA] &
+                   emc->regs[REG_MIEN] &
+                   REG_MISTA_RXINTR);
+    trace_npcm7xx_emc_update_rx_irq(level);
+    qemu_set_irq(emc->rx_irq, level);
+}
+
+/* Update IRQ states due to changes in MIEN,MISTA. */
+static void emc_update_irq_from_reg_change(NPCM7xxEMCState *emc)
+{
+    emc_update_mista_txintr(emc);
+    emc_update_tx_irq(emc);
+
+    emc_update_mista_rxintr(emc);
+    emc_update_rx_irq(emc);
+}
+
+static int emc_read_tx_desc(dma_addr_t addr, NPCM7xxEMCTxDesc *desc)
+{
+    if (dma_memory_read(&address_space_memory, addr, desc, sizeof(*desc))) {
+        qemu_log_mask(LOG_GUEST_ERROR, "%s: Failed to read descriptor @ 0x%"
+                      HWADDR_PRIx "\n", __func__, addr);
+        return -1;
+    }
+    desc->flags = le32_to_cpu(desc->flags);
+    desc->txbsa = le32_to_cpu(desc->txbsa);
+    desc->status_and_length = le32_to_cpu(desc->status_and_length);
+    desc->ntxdsa = le32_to_cpu(desc->ntxdsa);
+    return 0;
+}
+
+static int emc_write_tx_desc(const NPCM7xxEMCTxDesc *desc, dma_addr_t addr)
+{
+    NPCM7xxEMCTxDesc le_desc;
+
+    le_desc.flags = cpu_to_le32(desc->flags);
+    le_desc.txbsa = cpu_to_le32(desc->txbsa);
+    le_desc.status_and_length = cpu_to_le32(desc->status_and_length);
+    le_desc.ntxdsa = cpu_to_le32(desc->ntxdsa);
+    if (dma_memory_write(&address_space_memory, addr, &le_desc,
+                         sizeof(le_desc))) {
+        qemu_log_mask(LOG_GUEST_ERROR, "%s: Failed to write descriptor @ 0x%"
+                      HWADDR_PRIx "\n", __func__, addr);
+        return -1;
+    }
+    return 0;
+}
+
+static int emc_read_rx_desc(dma_addr_t addr, NPCM7xxEMCRxDesc *desc)
+{
+    if (dma_memory_read(&address_space_memory, addr, desc, sizeof(*desc))) {
+        qemu_log_mask(LOG_GUEST_ERROR, "%s: Failed to read descriptor @ 0x%"
+                      HWADDR_PRIx "\n", __func__, addr);
+        return -1;
+    }
+    desc->status_and_length = le32_to_cpu(desc->status_and_length);
+    desc->rxbsa = le32_to_cpu(desc->rxbsa);
+    desc->reserved = le32_to_cpu(desc->reserved);
+    desc->nrxdsa = le32_to_cpu(desc->nrxdsa);
+    return 0;
+}
+
+static int emc_write_rx_desc(const NPCM7xxEMCRxDesc *desc, dma_addr_t addr)
+{
+    NPCM7xxEMCRxDesc le_desc;
+
+    le_desc.status_and_length = cpu_to_le32(desc->status_and_length);
+    le_desc.rxbsa = cpu_to_le32(desc->rxbsa);
+    le_desc.reserved = cpu_to_le32(desc->reserved);
+    le_desc.nrxdsa = cpu_to_le32(desc->nrxdsa);
+    if (dma_memory_write(&address_space_memory, addr, &le_desc,
+                         sizeof(le_desc))) {
+        qemu_log_mask(LOG_GUEST_ERROR, "%s: Failed to write descriptor @ 0x%"
+                      HWADDR_PRIx "\n", __func__, addr);
+        return -1;
+    }
+    return 0;
+}
+
+static void emc_set_mista(NPCM7xxEMCState *emc, uint32_t flags)
+{
+    trace_npcm7xx_emc_set_mista(flags);
+    emc->regs[REG_MISTA] |= flags;
+    if (extract32(flags, 16, 16)) {
+        emc_update_mista_txintr(emc);
+    }
+    if (extract32(flags, 0, 16)) {
+        emc_update_mista_rxintr(emc);
+    }
+}
+
+static void emc_halt_tx(NPCM7xxEMCState *emc, uint32_t mista_flag)
+{
+    emc->tx_active = false;
+    emc_set_mista(emc, mista_flag);
+}
+
+static void emc_halt_rx(NPCM7xxEMCState *emc, uint32_t mista_flag)
+{
+    emc->rx_active = false;
+    emc_set_mista(emc, mista_flag);
+}
+
+static void emc_set_next_tx_descriptor(NPCM7xxEMCState *emc,
+                                       const NPCM7xxEMCTxDesc *tx_desc,
+                                       uint32_t desc_addr)
+{
+    /* Update the current descriptor, if only to reset the owner flag. */
+    if (emc_write_tx_desc(tx_desc, desc_addr)) {
+        /*
+         * We just read it so this shouldn't generally happen.
+         * Error already reported.
+         */
+        emc_set_mista(emc, REG_MISTA_TXBERR);
+    }
+    emc->regs[REG_CTXDSA] = TX_DESC_NTXDSA(tx_desc->ntxdsa);
+}
+
+static void emc_set_next_rx_descriptor(NPCM7xxEMCState *emc,
+                                       const NPCM7xxEMCRxDesc *rx_desc,
+                                       uint32_t desc_addr)
+{
+    /* Update the current descriptor, if only to reset the owner flag. */
+    if (emc_write_rx_desc(rx_desc, desc_addr)) {
+        /*
+         * We just read it so this shouldn't generally happen.
+         * Error already reported.
+         */
+        emc_set_mista(emc, REG_MISTA_RXBERR);
+    }
+    emc->regs[REG_CRXDSA] = RX_DESC_NRXDSA(rx_desc->nrxdsa);
+}
+
+static void emc_try_send_next_packet(NPCM7xxEMCState *emc)
+{
+    /* Working buffer for sending out packets. Most packets fit in this. */
+#define TX_BUFFER_SIZE 2048
+    uint8_t tx_send_buffer[TX_BUFFER_SIZE];
+    uint32_t desc_addr = TX_DESC_NTXDSA(emc->regs[REG_CTXDSA]);
+    NPCM7xxEMCTxDesc tx_desc;
+    uint32_t next_buf_addr, length;
+    uint8_t *buf;
+    g_autofree uint8_t *malloced_buf = NULL;
+
+    if (emc_read_tx_desc(desc_addr, &tx_desc)) {
+        /* Error reading descriptor, already reported. */
+        emc_halt_tx(emc, REG_MISTA_TXBERR);
+        emc_update_tx_irq(emc);
+        return;
+    }
+
+    /* Nothing we can do if we don't own the descriptor. */
+    if (!(tx_desc.flags & TX_DESC_FLAG_OWNER_MASK)) {
+        trace_npcm7xx_emc_cpu_owned_desc(desc_addr);
+        emc_halt_tx(emc, REG_MISTA_TDU);
+        emc_update_tx_irq(emc);
+        return;
+    }
+
+    /* Give the descriptor back regardless of what happens. */
+    tx_desc.flags &= ~TX_DESC_FLAG_OWNER_MASK;
+    tx_desc.status_and_length &= 0xffff;
+
+    /*
+     * Despite the h/w documentation saying the tx buffer is word aligned,
+     * the linux driver does not word align the buffer. There is value in not
+     * aligning the buffer: See the description of NET_IP_ALIGN in linux
+     * kernel sources.
+     */
+    next_buf_addr = tx_desc.txbsa;
+    emc->regs[REG_CTXBSA] = next_buf_addr;
+    length = TX_DESC_PKT_LEN(tx_desc.status_and_length);
+    buf = &tx_send_buffer[0];
+
+    if (length > sizeof(tx_send_buffer)) {
+        malloced_buf = g_malloc(length);
+        buf = malloced_buf;
+    }
+
+    if (dma_memory_read(&address_space_memory, next_buf_addr, buf, length)) {
+        qemu_log_mask(LOG_GUEST_ERROR, "%s: Failed to read packet @ 0x%x\n",
+                      __func__, next_buf_addr);
+        emc_set_mista(emc, REG_MISTA_TXBERR);
+        emc_set_next_tx_descriptor(emc, &tx_desc, desc_addr);
+        emc_update_tx_irq(emc);
+        trace_npcm7xx_emc_tx_done(emc->regs[REG_CTXDSA]);
+        return;
+    }
+
+    if ((tx_desc.flags & TX_DESC_FLAG_PADEN) && (length < MIN_PACKET_LENGTH)) {
+        memset(buf + length, 0, MIN_PACKET_LENGTH - length);
+        length = MIN_PACKET_LENGTH;
+    }
+
+    /* N.B. emc_receive can get called here. */
+    qemu_send_packet(qemu_get_queue(emc->nic), buf, length);
+    trace_npcm7xx_emc_sent_packet(length);
+
+    tx_desc.status_and_length |= TX_DESC_STATUS_TXCP;
+    if (tx_desc.flags & TX_DESC_FLAG_INTEN) {
+        emc_set_mista(emc, REG_MISTA_TXCP);
+    }
+    if (emc->regs[REG_MISTA] & emc->regs[REG_MIEN] & REG_MISTA_TXINTR) {
+        tx_desc.status_and_length |= TX_DESC_STATUS_TXINTR;
+    }
+
+    emc_set_next_tx_descriptor(emc, &tx_desc, desc_addr);
+    emc_update_tx_irq(emc);
+    trace_npcm7xx_emc_tx_done(emc->regs[REG_CTXDSA]);
+}
+
+static bool emc_can_receive(NetClientState *nc)
+{
+    NPCM7xxEMCState *emc = NPCM7XX_EMC(qemu_get_nic_opaque(nc));
+
+    bool can_receive = emc->rx_active;
+    trace_npcm7xx_emc_can_receive(can_receive);
+    return can_receive;
+}
+
+/* If result is false then *fail_reason contains the reason. */
+static bool emc_receive_filter1(NPCM7xxEMCState *emc, const uint8_t *buf,
+                                size_t len, const char **fail_reason)
+{
+    eth_pkt_types_e pkt_type = get_eth_packet_type(PKT_GET_ETH_HDR(buf));
+
+    switch (pkt_type) {
+    case ETH_PKT_BCAST:
+        if (emc->regs[REG_CAMCMR] & REG_CAMCMR_CCAM) {
+            return true;
+        } else {
+            *fail_reason = "Broadcast packet disabled";
+            return !!(emc->regs[REG_CAMCMR] & REG_CAMCMR_ABP);
+        }
+    case ETH_PKT_MCAST:
+        if (emc->regs[REG_CAMCMR] & REG_CAMCMR_CCAM) {
+            return true;
+        } else {
+            *fail_reason = "Multicast packet disabled";
+            return !!(emc->regs[REG_CAMCMR] & REG_CAMCMR_AMP);
+        }
+    case ETH_PKT_UCAST: {
+        bool matches;
+        if (emc->regs[REG_CAMCMR] & REG_CAMCMR_AUP) {
+            return true;
+        }
+        matches = ((emc->regs[REG_CAMCMR] & REG_CAMCMR_ECMP) &&
+                   /* We only support one CAM register, CAM0. */
+                   (emc->regs[REG_CAMEN] & (1 << 0)) &&
+                   memcmp(buf, emc->conf.macaddr.a, ETH_ALEN) == 0);
+        if (emc->regs[REG_CAMCMR] & REG_CAMCMR_CCAM) {
+            *fail_reason = "MACADDR matched, comparison complemented";
+            return !matches;
+        } else {
+            *fail_reason = "MACADDR didn't match";
+            return matches;
+        }
+    }
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static bool emc_receive_filter(NPCM7xxEMCState *emc, const uint8_t *buf,
+                               size_t len)
+{
+    const char *fail_reason = NULL;
+    bool ok = emc_receive_filter1(emc, buf, len, &fail_reason);
+    if (!ok) {
+        trace_npcm7xx_emc_packet_filtered_out(fail_reason);
+    }
+    return ok;
+}
+
+static ssize_t emc_receive(NetClientState *nc, const uint8_t *buf, size_t len1)
+{
+    NPCM7xxEMCState *emc = NPCM7XX_EMC(qemu_get_nic_opaque(nc));
+    const uint32_t len = len1;
+    size_t max_frame_len;
+    bool long_frame;
+    uint32_t desc_addr;
+    NPCM7xxEMCRxDesc rx_desc;
+    uint32_t crc;
+    uint8_t *crc_ptr;
+    uint32_t buf_addr;
+
+    trace_npcm7xx_emc_receiving_packet(len);
+
+    if (!emc_can_receive(nc)) {
+        qemu_log_mask(LOG_GUEST_ERROR, "%s: Unexpected packet\n", __func__);
+        return -1;
+    }
+
+    if (len < ETH_HLEN ||
+        /* Defensive programming: drop unsupportably large packets. */
+        len > 0xffff - CRC_LENGTH) {
+        qemu_log_mask(LOG_GUEST_ERROR, "%s: Dropped frame of %u bytes\n",
+                      __func__, len);
+        return len;
+    }
+
+    /*
+     * DENI is set if EMC received the Length/Type field of the incoming
+     * packet, so it will be set regardless of what happens next.
+     */
+    emc_set_mista(emc, REG_MISTA_DENI);
+
+    if (!emc_receive_filter(emc, buf, len)) {
+        emc_update_rx_irq(emc);
+        return len;
+    }
+
+    /* Huge frames (> DMARFC) are dropped. */
+    max_frame_len = REG_DMARFC_RXMS(emc->regs[REG_DMARFC]);
+    if (len + CRC_LENGTH > max_frame_len) {
+        trace_npcm7xx_emc_packet_dropped(len);
+        emc_set_mista(emc, REG_MISTA_DFOI);
+        emc_update_rx_irq(emc);
+        return len;
+    }
+
+    /*
+     * Long Frames (> MAX_ETH_FRAME_SIZE) are also dropped, unless MCMDR.ALP
+     * is set.
+     */
+    long_frame = false;
+    if (len + CRC_LENGTH > MAX_ETH_FRAME_SIZE) {
+        if (emc->regs[REG_MCMDR] & REG_MCMDR_ALP) {
+            long_frame = true;
+        } else {
+            trace_npcm7xx_emc_packet_dropped(len);
+            emc_set_mista(emc, REG_MISTA_PTLE);
+            emc_update_rx_irq(emc);
+            return len;
+        }
+    }
+
+    desc_addr = RX_DESC_NRXDSA(emc->regs[REG_CRXDSA]);
+    if (emc_read_rx_desc(desc_addr, &rx_desc)) {
+        /* Error reading descriptor, already reported. */
+        emc_halt_rx(emc, REG_MISTA_RXBERR);
+        emc_update_rx_irq(emc);
+        return len;
+    }
+
+    /* Nothing we can do if we don't own the descriptor. */
+    if (!(rx_desc.status_and_length & RX_DESC_STATUS_OWNER_MASK)) {
+        trace_npcm7xx_emc_cpu_owned_desc(desc_addr);
+        emc_halt_rx(emc, REG_MISTA_RDU);
+        emc_update_rx_irq(emc);
+        return len;
+    }
+
+    crc = 0;
+    crc_ptr = (uint8_t *) &crc;
+    if (!(emc->regs[REG_MCMDR] & REG_MCMDR_SPCRC)) {
+        crc = cpu_to_be32(crc32(~0, buf, len));
+    }
+
+    /* Give the descriptor back regardless of what happens. */
+    rx_desc.status_and_length &= ~RX_DESC_STATUS_OWNER_MASK;
+
+    buf_addr = rx_desc.rxbsa;
+    emc->regs[REG_CRXBSA] = buf_addr;
+    if (dma_memory_write(&address_space_memory, buf_addr, buf, len) ||
+        (!(emc->regs[REG_MCMDR] & REG_MCMDR_SPCRC) &&
+         dma_memory_write(&address_space_memory, buf_addr + len, crc_ptr,
+                          4))) {
+        qemu_log_mask(LOG_GUEST_ERROR, "%s: Bus error writing packet\n",
+                      __func__);
+        emc_set_mista(emc, REG_MISTA_RXBERR);
+        emc_set_next_rx_descriptor(emc, &rx_desc, desc_addr);
+        emc_update_rx_irq(emc);
+        trace_npcm7xx_emc_rx_done(emc->regs[REG_CRXDSA]);
+        return len;
+    }
+
+    trace_npcm7xx_emc_received_packet(len);
+
+    /* Note: We've already verified len+4 <= 0xffff. */
+    rx_desc.status_and_length = len;
+    if (!(emc->regs[REG_MCMDR] & REG_MCMDR_SPCRC)) {
+        rx_desc.status_and_length += 4;
+    }
+    rx_desc.status_and_length |= RX_DESC_STATUS_RXGD;
+    emc_set_mista(emc, REG_MISTA_RXGD);
+
+    if (emc->regs[REG_MISTA] & emc->regs[REG_MIEN] & REG_MISTA_RXINTR) {
+        rx_desc.status_and_length |= RX_DESC_STATUS_RXINTR;
+    }
+    if (long_frame) {
+        rx_desc.status_and_length |= RX_DESC_STATUS_PTLE;
+    }
+
+    emc_set_next_rx_descriptor(emc, &rx_desc, desc_addr);
+    emc_update_rx_irq(emc);
+    trace_npcm7xx_emc_rx_done(emc->regs[REG_CRXDSA]);
+    return len;
+}
+
+static void emc_try_receive_next_packet(NPCM7xxEMCState *emc)
+{
+    if (emc_can_receive(qemu_get_queue(emc->nic))) {
+        qemu_flush_queued_packets(qemu_get_queue(emc->nic));
+    }
+}
+
+static uint64_t npcm7xx_emc_read(void *opaque, hwaddr offset, unsigned size)
+{
+    NPCM7xxEMCState *emc = opaque;
+    uint32_t reg = offset / sizeof(uint32_t);
+    uint32_t result;
+
+    if (reg >= NPCM7XX_NUM_EMC_REGS) {
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "%s: Invalid offset 0x%04" HWADDR_PRIx "\n",
+                      __func__, offset);
+        return 0;
+    }
+
+    switch (reg) {
+    case REG_MIID:
+        /*
+         * We don't implement MII. Writes record the last value written for
+         * debugging purposes, but for determinism reads always return zero.
+         */
+        qemu_log_mask(LOG_UNIMP, "%s: Read of MIID, returning 0\n", __func__);
+        result = 0;
+        break;
+    case REG_TSDR:
+    case REG_RSDR:
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "%s: Read of write-only reg, %s/%d\n",
+                      __func__, emc_reg_name(reg), reg);
+        return 0;
+    default:
+        result = emc->regs[reg];
+        break;
+    }
+
+    trace_npcm7xx_emc_reg_read(emc->emc_num, result, emc_reg_name(reg), reg);
+    return result;
+}
+
+static void npcm7xx_emc_write(void *opaque, hwaddr offset,
+                              uint64_t v, unsigned size)
+{
+    NPCM7xxEMCState *emc = opaque;
+    uint32_t reg = offset / sizeof(uint32_t);
+    uint32_t value = v;
+
+    g_assert(size == sizeof(uint32_t));
+
+    if (reg >= NPCM7XX_NUM_EMC_REGS) {
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "%s: Invalid offset 0x%04" HWADDR_PRIx "\n",
+                      __func__, offset);
+        return;
+    }
+
+    trace_npcm7xx_emc_reg_write(emc->emc_num, emc_reg_name(reg), reg, value);
+
+    switch (reg) {
+    case REG_CAMCMR:
+        emc->regs[reg] = value;
+        break;
+    case REG_CAMEN:
+        /* Only CAM0 is supported, don't pretend otherwise. */
+        if (value & ~1) {
+            qemu_log_mask(LOG_GUEST_ERROR,
+                          "%s: Only CAM0 is supported, cannot enable others"
+                          ": 0x%x\n",
+                          __func__, value);
+        }
+        emc->regs[reg] = value & 1;
+        break;
+    case REG_CAMM_BASE + 0:
+        emc->regs[reg] = value;
+        emc->conf.macaddr.a[0] = value >> 24;
+        emc->conf.macaddr.a[1] = value >> 16;
+        emc->conf.macaddr.a[2] = value >> 8;
+        emc->conf.macaddr.a[3] = value >> 0;
+        break;
+    case REG_CAML_BASE + 0:
+        emc->regs[reg] = value;
+        emc->conf.macaddr.a[4] = value >> 24;
+        emc->conf.macaddr.a[5] = value >> 16;
+        break;
+    case REG_MCMDR: {
+        uint32_t prev;
+        if (value & REG_MCMDR_SWR) {
+            emc_soft_reset(emc);
+            /* On h/w the reset happens over multiple cycles. For now KISS. */
+            break;
+        }
+        prev = emc->regs[reg];
+        emc->regs[reg] = value;
+        /* Update tx state. */
+        if (!(prev & REG_MCMDR_TXON) &&
+            (value & REG_MCMDR_TXON)) {
+            emc->regs[REG_CTXDSA] = emc->regs[REG_TXDLSA];
+            /*
+             * Linux kernel turns TX on with CPU still holding descriptor,
+             * which suggests we should wait for a write to TSDR before trying
+             * to send a packet: so we don't send one here.
+             */
+        } else if ((prev & REG_MCMDR_TXON) &&
+                   !(value & REG_MCMDR_TXON)) {
+            emc->regs[REG_MGSTA] |= REG_MGSTA_TXHA;
+        }
+        if (!(value & REG_MCMDR_TXON)) {
+            emc_halt_tx(emc, 0);
+        }
+        /* Update rx state. */
+        if (!(prev & REG_MCMDR_RXON) &&
+            (value & REG_MCMDR_RXON)) {
+            emc->regs[REG_CRXDSA] = emc->regs[REG_RXDLSA];
+        } else if ((prev & REG_MCMDR_RXON) &&
+                   !(value & REG_MCMDR_RXON)) {
+            emc->regs[REG_MGSTA] |= REG_MGSTA_RXHA;
+        }
+        if (!(value & REG_MCMDR_RXON)) {
+            emc_halt_rx(emc, 0);
+        }
+        break;
+    }
+    case REG_TXDLSA:
+    case REG_RXDLSA:
+    case REG_DMARFC:
+    case REG_MIID:
+        emc->regs[reg] = value;
+        break;
+    case REG_MIEN:
+        emc->regs[reg] = value;
+        emc_update_irq_from_reg_change(emc);
+        break;
+    case REG_MISTA:
+        /* Clear the bits that have 1 in "value". */
+        emc->regs[reg] &= ~value;
+        emc_update_irq_from_reg_change(emc);
+        break;
+    case REG_MGSTA:
+        /* Clear the bits that have 1 in "value". */
+        emc->regs[reg] &= ~value;
+        break;
+    case REG_TSDR:
+        if (emc->regs[REG_MCMDR] & REG_MCMDR_TXON) {
+            emc->tx_active = true;
+            /* Keep trying to send packets until we run out. */
+            while (emc->tx_active) {
+                emc_try_send_next_packet(emc);
+            }
+        }
+        break;
+    case REG_RSDR:
+        if (emc->regs[REG_MCMDR] & REG_MCMDR_RXON) {
+            emc->rx_active = true;
+            emc_try_receive_next_packet(emc);
+        }
+        break;
+    case REG_MIIDA:
+        emc->regs[reg] = value & ~REG_MIIDA_BUSY;
+        break;
+    case REG_MRPC:
+    case REG_MRPCC:
+    case REG_MREPC:
+    case REG_CTXDSA:
+    case REG_CTXBSA:
+    case REG_CRXDSA:
+    case REG_CRXBSA:
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "%s: Write to read-only reg %s/%d\n",
+                      __func__, emc_reg_name(reg), reg);
+        break;
+    default:
+        qemu_log_mask(LOG_UNIMP, "%s: Write to unimplemented reg %s/%d\n",
+                      __func__, emc_reg_name(reg), reg);
+        break;
+    }
+}
+
+static const struct MemoryRegionOps npcm7xx_emc_ops = {
+    .read = npcm7xx_emc_read,
+    .write = npcm7xx_emc_write,
+    .endianness = DEVICE_LITTLE_ENDIAN,
+    .valid = {
+        .min_access_size = 4,
+        .max_access_size = 4,
+        .unaligned = false,
+    },
+};
+
+static void emc_cleanup(NetClientState *nc)
+{
+    /* Nothing to do yet. */
+}
+
+static NetClientInfo net_npcm7xx_emc_info = {
+    .type = NET_CLIENT_DRIVER_NIC,
+    .size = sizeof(NICState),
+    .can_receive = emc_can_receive,
+    .receive = emc_receive,
+    .cleanup = emc_cleanup,
+    .link_status_changed = emc_set_link,
+};
+
+static void npcm7xx_emc_realize(DeviceState *dev, Error **errp)
+{
+    NPCM7xxEMCState *emc = NPCM7XX_EMC(dev);
+    SysBusDevice *sbd = SYS_BUS_DEVICE(emc);
+
+    memory_region_init_io(&emc->iomem, OBJECT(emc), &npcm7xx_emc_ops, emc,
+                          TYPE_NPCM7XX_EMC, 4 * KiB);
+    sysbus_init_mmio(sbd, &emc->iomem);
+    sysbus_init_irq(sbd, &emc->tx_irq);
+    sysbus_init_irq(sbd, &emc->rx_irq);
+
+    qemu_macaddr_default_if_unset(&emc->conf.macaddr);
+    emc->nic = qemu_new_nic(&net_npcm7xx_emc_info, &emc->conf,
+                            object_get_typename(OBJECT(dev)), dev->id, emc);
+    qemu_format_nic_info_str(qemu_get_queue(emc->nic), emc->conf.macaddr.a);
+}
+
+static void npcm7xx_emc_unrealize(DeviceState *dev)
+{
+    NPCM7xxEMCState *emc = NPCM7XX_EMC(dev);
+
+    qemu_del_nic(emc->nic);
+}
+
+static const VMStateDescription vmstate_npcm7xx_emc = {
+    .name = TYPE_NPCM7XX_EMC,
+    .version_id = 0,
+    .minimum_version_id = 0,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT8(emc_num, NPCM7xxEMCState),
+        VMSTATE_UINT32_ARRAY(regs, NPCM7xxEMCState, NPCM7XX_NUM_EMC_REGS),
+        VMSTATE_BOOL(tx_active, NPCM7xxEMCState),
+        VMSTATE_BOOL(rx_active, NPCM7xxEMCState),
+        VMSTATE_END_OF_LIST(),
+    },
+};
+
+static Property npcm7xx_emc_properties[] = {
+    DEFINE_NIC_PROPERTIES(NPCM7xxEMCState, conf),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void npcm7xx_emc_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    set_bit(DEVICE_CATEGORY_NETWORK, dc->categories);
+    dc->desc = "NPCM7xx EMC Controller";
+    dc->realize = npcm7xx_emc_realize;
+    dc->unrealize = npcm7xx_emc_unrealize;
+    dc->reset = npcm7xx_emc_reset;
+    dc->vmsd = &vmstate_npcm7xx_emc;
+    device_class_set_props(dc, npcm7xx_emc_properties);
+}
+
+static const TypeInfo npcm7xx_emc_info = {
+    .name = TYPE_NPCM7XX_EMC,
+    .parent = TYPE_SYS_BUS_DEVICE,
+    .instance_size = sizeof(NPCM7xxEMCState),
+    .class_init = npcm7xx_emc_class_init,
+};
+
+static void npcm7xx_emc_register_type(void)
+{
+    type_register_static(&npcm7xx_emc_info);
+}
+
+type_init(npcm7xx_emc_register_type)
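In npcm7xx_emc.c above, MISTA.TXINTR is a summary bit: emc_update_mista_txintr() sets it when any supported TX cause bit is both pending in MISTA and enabled in MIEN, emc_update_tx_irq() additionally requires TXINTR itself to be enabled in MIEN, and a guest write to MISTA clears the bits written as 1. The following is a minimal standalone sketch of that TX-side logic. The bit positions are hypothetical placeholders (only the "TX bits live in the upper 16 bits" split is taken from emc_set_mista()); the real REG_MISTA_* values are in npcm7xx_emc.h. All demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions; the real REG_MISTA_* values differ. */
#define DEMO_TXINTR  (1u << 16)
#define DEMO_TXCP    (1u << 18)
#define DEMO_TDU     (1u << 23)
#define DEMO_TXBERR  (1u << 24)

struct demo_regs {
    uint32_t mista;
    uint32_t mien;
};

/* TXINTR summarizes the supported TX cause bits that are enabled in MIEN. */
static void demo_update_txintr(struct demo_regs *r)
{
    const uint32_t causes = DEMO_TXBERR | DEMO_TDU | DEMO_TXCP;

    if (r->mista & r->mien & causes) {
        r->mista |= DEMO_TXINTR;
    } else {
        r->mista &= ~DEMO_TXINTR;
    }
}

/* The IRQ line is raised only if TXINTR is pending and itself enabled. */
static int demo_tx_irq_level(const struct demo_regs *r)
{
    return !!(r->mista & r->mien & DEMO_TXINTR);
}

/* Guest write to MISTA: write-1-to-clear, then recompute the summary. */
static void demo_write_mista(struct demo_regs *r, uint32_t value)
{
    r->mista &= ~value;
    demo_update_txintr(r);
}
```

The RX half works the same way on the lower 16 bits (RXBERR, RDU, RXGD summarized into RXINTR); the sketch leaves it out for brevity.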
diff --git a/hw/net/meson.build b/hw/net/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/meson.build
+++ b/hw/net/meson.build
@@ -XXX,XX +XXX,XX @@ softmmu_ss.add(when: 'CONFIG_I82596_COMMON', if_true: files('i82596.c'))
 softmmu_ss.add(when: 'CONFIG_SUNHME', if_true: files('sunhme.c'))
 softmmu_ss.add(when: 'CONFIG_FTGMAC100', if_true: files('ftgmac100.c'))
 softmmu_ss.add(when: 'CONFIG_SUNGEM', if_true: files('sungem.c'))
+softmmu_ss.add(when: 'CONFIG_NPCM7XX', if_true: files('npcm7xx_emc.c'))
 
 softmmu_ss.add(when: 'CONFIG_ETRAXFS', if_true: files('etraxfs_eth.c'))
 softmmu_ss.add(when: 'CONFIG_COLDFIRE', if_true: files('mcf_fec.c'))
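The CAM0M/CAM0L cases in npcm7xx_emc_write() build the station MAC address most-significant byte first: CAM0M supplies bytes 0 to 3 and the upper half of CAM0L supplies bytes 4 and 5, while the lower half of CAM0L does not contribute to conf.macaddr. A small standalone sketch of that packing and its inverse (the register values a guest would program) follows; the demo_* function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Unpack CAM0M/CAM0L the same way npcm7xx_emc_write() does. */
static void demo_cam0_to_mac(uint32_t cam0m, uint32_t cam0l, uint8_t mac[6])
{
    mac[0] = (uint8_t)(cam0m >> 24);
    mac[1] = (uint8_t)(cam0m >> 16);
    mac[2] = (uint8_t)(cam0m >> 8);
    mac[3] = (uint8_t)cam0m;
    mac[4] = (uint8_t)(cam0l >> 24);
    mac[5] = (uint8_t)(cam0l >> 16);   /* CAM0L bits 15..0 are unused here */
}

/* Inverse: the register values that encode a given MAC address. */
static void demo_mac_to_cam0(const uint8_t mac[6], uint32_t *cam0m,
                             uint32_t *cam0l)
{
    *cam0m = (uint32_t)mac[0] << 24 | (uint32_t)mac[1] << 16 |
             (uint32_t)mac[2] << 8 | (uint32_t)mac[3];
    *cam0l = (uint32_t)mac[4] << 24 | (uint32_t)mac[5] << 16;
}
```

For example, CAM0M = 0x52540012 and CAM0L = 0x3456xxxx decode to 52:54:00:12:34:56 regardless of the low 16 bits of CAM0L.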
diff --git a/hw/net/trace-events b/hw/net/trace-events
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/trace-events
+++ b/hw/net/trace-events
@@ -XXX,XX +XXX,XX @@ imx_fec_receive_last(int last) "rx frame flags 0x%04x"
 imx_enet_receive(size_t size) "len %zu"
 imx_enet_receive_len(uint64_t addr, int len) "rx_bd 0x%"PRIx64" length %d"
 imx_enet_receive_last(int last) "rx frame flags 0x%04x"
+
+# npcm7xx_emc.c
+npcm7xx_emc_reset(int emc_num) "Resetting emc%d"
+npcm7xx_emc_update_tx_irq(int level) "Setting tx irq to %d"
+npcm7xx_emc_update_rx_irq(int level) "Setting rx irq to %d"
+npcm7xx_emc_set_mista(uint32_t flags) "ORing 0x%x into MISTA"
+npcm7xx_emc_cpu_owned_desc(uint32_t addr) "Can't process cpu-owned descriptor @0x%x"
+npcm7xx_emc_sent_packet(uint32_t len) "Sent %u byte packet"
+npcm7xx_emc_tx_done(uint32_t ctxdsa) "TX done, CTXDSA=0x%x"
+npcm7xx_emc_can_receive(int can_receive) "Can receive: %d"
+npcm7xx_emc_packet_filtered_out(const char* fail_reason) "Packet filtered out: %s"
+npcm7xx_emc_packet_dropped(uint32_t len) "%u byte packet dropped"
+npcm7xx_emc_receiving_packet(uint32_t len) "Receiving %u byte packet"
+npcm7xx_emc_received_packet(uint32_t len) "Received %u byte packet"
+npcm7xx_emc_rx_done(uint32_t crxdsa) "RX done, CRXDSA=0x%x"
+npcm7xx_emc_reg_read(int emc_num, uint32_t result, const char *name, int regno) "emc%d: 0x%x = reg[%s/%d]"
+npcm7xx_emc_reg_write(int emc_num, const char *name, int regno, uint32_t value) "emc%d: reg[%s/%d] = 0x%x"
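The decision made by emc_receive_filter1() in npcm7xx_emc.c can be restated as a small truth table: with CCAM set, broadcast and multicast frames are always accepted and the unicast CAM0 comparison is inverted; otherwise broadcast needs ABP, multicast needs AMP, and unicast needs AUP or a CAM0 match (ECMP set, CAMEN bit 0 set, destination equal to the MAC). The sketch below expresses that with plain booleans in place of the CAMCMR/CAMEN register bits; demo_camcmr and demo_rx_filter are hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum demo_pkt_type { DEMO_UCAST, DEMO_MCAST, DEMO_BCAST };

/* Booleans standing in for REG_CAMCMR_* bits and CAMEN bit 0. */
struct demo_camcmr {
    bool aup;      /* REG_CAMCMR_AUP: accept any unicast frame */
    bool amp;      /* REG_CAMCMR_AMP: accept multicast frames */
    bool abp;      /* REG_CAMCMR_ABP: accept broadcast frames */
    bool ecmp;     /* REG_CAMCMR_ECMP: compare against CAM0 */
    bool ccam;     /* REG_CAMCMR_CCAM: complement the CAM compare */
    bool cam0_en;  /* CAMEN bit 0 */
};

static bool demo_rx_filter(const struct demo_camcmr *c, enum demo_pkt_type t,
                           const uint8_t dst[6], const uint8_t cam0[6])
{
    bool matches;

    switch (t) {
    case DEMO_BCAST:
        return c->ccam || c->abp;
    case DEMO_MCAST:
        return c->ccam || c->amp;
    case DEMO_UCAST:
    default:
        if (c->aup) {
            return true;
        }
        matches = c->ecmp && c->cam0_en && memcmp(dst, cam0, 6) == 0;
        return c->ccam ? !matches : matches;
    }
}
```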
-- 
2.20.1
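The size handling in emc_receive() from the patch above applies three checks in order: frames shorter than ETH_HLEN or too large for the 16-bit length field are dropped; frames whose length plus the 4-byte CRC exceeds the DMARFC limit are dropped with DFOI; and frames whose length plus CRC exceeds 1518 bytes are dropped with PTLE unless MCMDR.ALP is set, in which case they are kept and flagged PTLE in the descriptor. The length written back to the descriptor includes the CRC unless MCMDR.SPCRC is set. The standalone sketch below reproduces only those checks (the address filter, which the model applies between the first and second check, is omitted); the rxms, alp and spcrc parameters and all demo_* names are hypothetical stand-ins for the register fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_ETH_HLEN           14
#define DEMO_CRC_LENGTH         4
#define DEMO_MAX_ETH_FRAME_SIZE 1518

enum demo_rx_verdict {
    DEMO_RX_DROP_BAD_LEN,  /* too short, or too big for a 16-bit length */
    DEMO_RX_DROP_DFOI,     /* larger than the DMARFC limit */
    DEMO_RX_DROP_PTLE,     /* long frame while MCMDR.ALP is clear */
    DEMO_RX_ACCEPT,
};

/*
 * On DEMO_RX_ACCEPT, *desc_len is the length stored in the RX descriptor
 * and *long_frame says whether RX_DESC_STATUS_PTLE would be set.
 */
static enum demo_rx_verdict demo_rx_size_check(uint32_t len, uint32_t rxms,
                                               bool alp, bool spcrc,
                                               uint32_t *desc_len,
                                               bool *long_frame)
{
    if (len < DEMO_ETH_HLEN || len > 0xffff - DEMO_CRC_LENGTH) {
        return DEMO_RX_DROP_BAD_LEN;
    }
    /* Both limits count the CRC, even when SPCRC strips it. */
    if (len + DEMO_CRC_LENGTH > rxms) {
        return DEMO_RX_DROP_DFOI;
    }
    *long_frame = len + DEMO_CRC_LENGTH > DEMO_MAX_ETH_FRAME_SIZE;
    if (*long_frame && !alp) {
        return DEMO_RX_DROP_PTLE;
    }
    *desc_len = spcrc ? len : len + DEMO_CRC_LENGTH;
    return DEMO_RX_ACCEPT;
}
```

With a limit of 2048 bytes, for instance, a 1514-byte frame is accepted with a descriptor length of 1518, while a 1515-byte frame is accepted only when ALP is set.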

From: Doug Evans <dje@google.com>

This is a 10/100 ethernet device that has several features.
Only the ones needed by the Linux driver have been implemented.
See npcm7xx_emc.c for a list of unimplemented features.

Reviewed-by: Hao Wu <wuhaotsh@google.com>
Reviewed-by: Avi Fishman <avi.fishman@nuvoton.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Doug Evans <dje@google.com>
Message-id: 20210218212453.831406-3-dje@google.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 docs/system/arm/nuvoton.rst |  3 ++-
 include/hw/arm/npcm7xx.h    |  2 ++
 hw/arm/npcm7xx.c            | 50 +++++++++++++++++++++++++++++++++++--
 3 files changed, 52 insertions(+), 3 deletions(-)
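In the npcm7xx.c change below, emc[i] is bound to nd_table[i], and sysbus IRQ index 0 is connected to the TX interrupt and index 1 to the RX interrupt, because npcm7xx_emc_realize() registers tx_irq before rx_irq. The interrupt numbers come from the enum additions in this patch (EMC1 RX/TX at 15/16, EMC2 RX/TX at 114/115). A standalone sketch of that mapping, with hypothetical demo_* names, follows.

```c
#include <assert.h>

/* Interrupt numbers as added to enum NPCM7xxInterrupt by this patch. */
enum {
    DEMO_EMC1RX_IRQ = 15,
    DEMO_EMC1TX_IRQ,          /* 16 */
    DEMO_EMC2RX_IRQ = 114,
    DEMO_EMC2TX_IRQ,          /* 115 */
};

/*
 * Map (EMC instance, sysbus IRQ index) to the interrupt number wired up
 * by npcm7xx_realize(): index 0 is TX, index 1 is RX.
 */
static int demo_emc_irq(int emc_index, int sysbus_irq_index)
{
    assert(emc_index == 0 || emc_index == 1);
    assert(sysbus_irq_index == 0 || sysbus_irq_index == 1);

    if (sysbus_irq_index == 0) {
        return emc_index == 0 ? DEMO_EMC1TX_IRQ : DEMO_EMC2TX_IRQ;
    }
    return emc_index == 0 ? DEMO_EMC1RX_IRQ : DEMO_EMC2RX_IRQ;
}
```

Note that RX precedes TX in the interrupt numbering while TX precedes RX in the sysbus IRQ order, which is why the realize code picks the numbers explicitly instead of using an offset.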

diff --git a/docs/system/arm/nuvoton.rst b/docs/system/arm/nuvoton.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/arm/nuvoton.rst
+++ b/docs/system/arm/nuvoton.rst
@@ -XXX,XX +XXX,XX @@ Supported devices
  * Analog to Digital Converter (ADC)
  * Pulse Width Modulation (PWM)
  * SMBus controller (SMBF)
+ * Ethernet controller (EMC)
 
 Missing devices
 ---------------
@@ -XXX,XX +XXX,XX @@ Missing devices
    * Shared memory (SHM)
    * eSPI slave interface
 
- * Ethernet controllers (GMAC and EMC)
+ * Ethernet controller (GMAC)
  * USB device (USBD)
  * Peripheral SPI controller (PSPI)
  * SD/MMC host
diff --git a/include/hw/arm/npcm7xx.h b/include/hw/arm/npcm7xx.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/npcm7xx.h
+++ b/include/hw/arm/npcm7xx.h
@@ -XXX,XX +XXX,XX @@
 #include "hw/misc/npcm7xx_gcr.h"
 #include "hw/misc/npcm7xx_pwm.h"
 #include "hw/misc/npcm7xx_rng.h"
+#include "hw/net/npcm7xx_emc.h"
 #include "hw/nvram/npcm7xx_otp.h"
 #include "hw/timer/npcm7xx_timer.h"
 #include "hw/ssi/npcm7xx_fiu.h"
@@ -XXX,XX +XXX,XX @@ typedef struct NPCM7xxState {
     EHCISysBusState     ehci;
     OHCISysBusState     ohci;
     NPCM7xxFIUState     fiu[2];
+    NPCM7xxEMCState     emc[2];
 } NPCM7xxState;
 
 #define TYPE_NPCM7XX    "npcm7xx"
diff --git a/hw/arm/npcm7xx.c b/hw/arm/npcm7xx.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/npcm7xx.c
+++ b/hw/arm/npcm7xx.c
@@ -XXX,XX +XXX,XX @@ enum NPCM7xxInterrupt {
     NPCM7XX_UART1_IRQ,
     NPCM7XX_UART2_IRQ,
     NPCM7XX_UART3_IRQ,
+    NPCM7XX_EMC1RX_IRQ          = 15,
+    NPCM7XX_EMC1TX_IRQ,
     NPCM7XX_TIMER0_IRQ          = 32,   /* Timer Module 0 */
     NPCM7XX_TIMER1_IRQ,
     NPCM7XX_TIMER2_IRQ,
@@ -XXX,XX +XXX,XX @@ enum NPCM7xxInterrupt {
     NPCM7XX_SMBUS15_IRQ,
     NPCM7XX_PWM0_IRQ            = 93,   /* PWM module 0 */
     NPCM7XX_PWM1_IRQ,                   /* PWM module 1 */
+    NPCM7XX_EMC2RX_IRQ          = 114,
+    NPCM7XX_EMC2TX_IRQ,
     NPCM7XX_GPIO0_IRQ           = 116,
     NPCM7XX_GPIO1_IRQ,
     NPCM7XX_GPIO2_IRQ,
@@ -XXX,XX +XXX,XX @@ static const hwaddr npcm7xx_smbus_addr[] = {
     0xf008f000,
 };
 
+/* Register base address for each EMC Module */
+static const hwaddr npcm7xx_emc_addr[] = {
+    0xf0825000,
+    0xf0826000,
+};
+
 static const struct {
     hwaddr regs_addr;
     uint32_t unconnected_pins;
@@ -XXX,XX +XXX,XX @@ static void npcm7xx_init(Object *obj)
     for (i = 0; i < ARRAY_SIZE(s->pwm); i++) {
         object_initialize_child(obj, "pwm[*]", &s->pwm[i], TYPE_NPCM7XX_PWM);
     }
+
+    for (i = 0; i < ARRAY_SIZE(s->emc); i++) {
+        object_initialize_child(obj, "emc[*]", &s->emc[i], TYPE_NPCM7XX_EMC);
+    }
 }
 
 static void npcm7xx_realize(DeviceState *dev, Error **errp)
@@ -XXX,XX +XXX,XX @@ static void npcm7xx_realize(DeviceState *dev, Error **errp)
         sysbus_connect_irq(sbd, i, npcm7xx_irq(s, NPCM7XX_PWM0_IRQ + i));
     }
 
+    /*
+     * EMC Modules. Cannot fail.
+     * The mapping of the device to its netdev backend works as follows:
+     * emc[i] = nd_table[i]
+     * This works around the inability to specify the netdev property for the
+     * emc device: it's not pluggable and thus the -device option can't be
+     * used.
+     */
+    QEMU_BUILD_BUG_ON(ARRAY_SIZE(npcm7xx_emc_addr) != ARRAY_SIZE(s->emc));
+    QEMU_BUILD_BUG_ON(ARRAY_SIZE(s->emc) != 2);
+    for (i = 0; i < ARRAY_SIZE(s->emc); i++) {
+        s->emc[i].emc_num = i;
+        SysBusDevice *sbd = SYS_BUS_DEVICE(&s->emc[i]);
+        if (nd_table[i].used) {
+            qemu_check_nic_model(&nd_table[i], TYPE_NPCM7XX_EMC);
+            qdev_set_nic_properties(DEVICE(sbd), &nd_table[i]);
+        }
+        /*
+         * The device exists regardless of whether it's connected to a QEMU
+         * netdev backend. So always instantiate it even if there is no
+         * backend.
+         */
+        sysbus_realize(sbd, &error_abort);
+        sysbus_mmio_map(sbd, 0, npcm7xx_emc_addr[i]);
+        int tx_irq = i == 0 ? NPCM7XX_EMC1TX_IRQ : NPCM7XX_EMC2TX_IRQ;
+        int rx_irq = i == 0 ? NPCM7XX_EMC1RX_IRQ : NPCM7XX_EMC2RX_IRQ;
+        /*
+         * N.B. The values for the second argument of sysbus_connect_irq are
+         * chosen to match the registration order in npcm7xx_emc_realize.
+         */
+        sysbus_connect_irq(sbd, 0, npcm7xx_irq(s, tx_irq));
+        sysbus_connect_irq(sbd, 1, npcm7xx_irq(s, rx_irq));
+    }
+
     /*
      * Flash Interface Unit (FIU). Can fail if incorrect number of chip selects
      * specified, but this is a programming error.
@@ -XXX,XX +XXX,XX @@ static void npcm7xx_realize(DeviceState *dev, Error **errp)
     create_unimplemented_device("npcm7xx.vcd",          0xf0810000,  64 * KiB);
     create_unimplemented_device("npcm7xx.ece",          0xf0820000,   8 * KiB);
     create_unimplemented_device("npcm7xx.vdma",         0xf0822000,   8 * KiB);
-    create_unimplemented_device("npcm7xx.emc1",         0xf0825000,   4 * KiB);
-    create_unimplemented_device("npcm7xx.emc2",         0xf0826000,   4 * KiB);
     create_unimplemented_device("npcm7xx.usbd[0]",      0xf0830000,   4 * KiB);
     create_unimplemented_device("npcm7xx.usbd[1]",      0xf0831000,   4 * KiB);
     create_unimplemented_device("npcm7xx.usbd[2]",      0xf0832000,   4 * KiB);
-- 
2.20.1
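
The board wiring above gives each EMC instance a fixed MMIO base and a TX/RX pair of GIC inputs. Sysbus IRQ output 0 carries TX and output 1 carries RX. The following is a minimal standalone sketch of that mapping, using only the constants from this patch. The `EmcWiring` struct and `emc_wiring()` helper are hypothetical and not part of QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the IRQ numbers from enum NPCM7xxInterrupt. */
enum {
    EMC1RX_IRQ = 15,
    EMC1TX_IRQ,         /* 16 */
    EMC2RX_IRQ = 114,
    EMC2TX_IRQ,         /* 115 */
};

/* Local copy of npcm7xx_emc_addr[]. */
static const uint64_t emc_addr[] = { 0xf0825000, 0xf0826000 };

/* Hypothetical summary of how emc[i] (backed by nd_table[i]) is wired. */
typedef struct EmcWiring {
    uint64_t base;  /* sysbus MMIO region 0 */
    int tx_irq;     /* sysbus IRQ output 0 */
    int rx_irq;     /* sysbus IRQ output 1 */
} EmcWiring;

static EmcWiring emc_wiring(int i)
{
    assert(i == 0 || i == 1);
    EmcWiring w = {
        .base = emc_addr[i],
        .tx_irq = i == 0 ? EMC1TX_IRQ : EMC2TX_IRQ,
        .rx_irq = i == 0 ? EMC1RX_IRQ : EMC2RX_IRQ,
    };
    return w;
}
```

Calling `emc_wiring(1)` yields base 0xf0826000, TX IRQ 115 and RX IRQ 114. These are the same values that the qtest in the next patch hard-codes in `emc_module_list[]`.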

From: Doug Evans <dje@google.com>

Reviewed-by: Hao Wu <wuhaotsh@google.com>
Reviewed-by: Avi Fishman <avi.fishman@nuvoton.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Doug Evans <dje@google.com>
Message-id: 20210218212453.831406-4-dje@google.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 tests/qtest/npcm7xx_emc-test.c | 862 +++++++++++++++++++++++++++++++++
 tests/qtest/meson.build        |   3 +-
 2 files changed, 864 insertions(+), 1 deletion(-)
 create mode 100644 tests/qtest/npcm7xx_emc-test.c

diff --git a/tests/qtest/npcm7xx_emc-test.c b/tests/qtest/npcm7xx_emc-test.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/tests/qtest/npcm7xx_emc-test.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * QTests for Nuvoton NPCM7xx EMC Modules.
+ *
+ * Copyright 2020 Google LLC
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "libqos/libqos.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/qmp/qnum.h"
+#include "qemu/bitops.h"
+#include "qemu/iov.h"
+
+/* Name of the emc device. */
+#define TYPE_NPCM7XX_EMC "npcm7xx-emc"
+
+/* Timeout for various operations, in seconds. */
+#define TIMEOUT_SECONDS 10
+
+/* Address in memory of the descriptor. */
+#define DESC_ADDR (1 << 20) /* 1 MiB */
+
+/* Address in memory of the data packet. */
+#define DATA_ADDR (DESC_ADDR + 4096)
+
+#define CRC_LENGTH 4
+
+#define NUM_TX_DESCRIPTORS 3
+#define NUM_RX_DESCRIPTORS 2
+
+/* Size of tx,rx test buffers. */
+#define TX_DATA_LEN 64
+#define RX_DATA_LEN 64
+
+#define TX_STEP_COUNT 10000
+#define RX_STEP_COUNT 10000
+
+/* 32-bit register indices. */
+typedef enum NPCM7xxPWMRegister {
+    /* Control registers. */
+    REG_CAMCMR,
+    REG_CAMEN,
+
+    /* There are 16 CAMn[ML] registers. */
+    REG_CAMM_BASE,
+    REG_CAML_BASE,
+
+    REG_TXDLSA = 0x22,
+    REG_RXDLSA,
+    REG_MCMDR,
+    REG_MIID,
+    REG_MIIDA,
+    REG_FFTCR,
+    REG_TSDR,
+    REG_RSDR,
+    REG_DMARFC,
+    REG_MIEN,
+
+    /* Status registers. */
+    REG_MISTA,
+    REG_MGSTA,
+    REG_MPCNT,
+    REG_MRPC,
+    REG_MRPCC,
+    REG_MREPC,
+    REG_DMARFS,
+    REG_CTXDSA,
+    REG_CTXBSA,
+    REG_CRXDSA,
+    REG_CRXBSA,
+
+    NPCM7XX_NUM_EMC_REGS,
+} NPCM7xxPWMRegister;
+
+enum { NUM_CAMML_REGS = 16 };
+
+/* REG_CAMCMR fields */
+/* Enable CAM Compare */
+#define REG_CAMCMR_ECMP (1 << 4)
+/* Accept Unicast Packet */
+#define REG_CAMCMR_AUP (1 << 0)
+
+/* REG_MCMDR fields */
+/* Software Reset */
+#define REG_MCMDR_SWR (1 << 24)
+/* Frame Transmission On */
+#define REG_MCMDR_TXON (1 << 8)
+/* Accept Long Packet */
+#define REG_MCMDR_ALP (1 << 1)
+/* Frame Reception On */
+#define REG_MCMDR_RXON (1 << 0)
+
+/* REG_MIEN fields */
+/* Enable Transmit Completion Interrupt */
+#define REG_MIEN_ENTXCP (1 << 18)
+/* Enable Transmit Interrupt */
+#define REG_MIEN_ENTXINTR (1 << 16)
+/* Enable Receive Good Interrupt */
+#define REG_MIEN_ENRXGD (1 << 4)
+/* Enable Receive Interrupt */
+#define REG_MIEN_ENRXINTR (1 << 0)
+
+/* REG_MISTA fields */
+/* Transmit Bus Error Interrupt */
+#define REG_MISTA_TXBERR (1 << 24)
+/* Transmit Descriptor Unavailable Interrupt */
+#define REG_MISTA_TDU (1 << 23)
+/* Transmit Completion Interrupt */
+#define REG_MISTA_TXCP (1 << 18)
+/* Transmit Interrupt */
+#define REG_MISTA_TXINTR (1 << 16)
+/* Receive Bus Error Interrupt */
+#define REG_MISTA_RXBERR (1 << 11)
+/* Receive Descriptor Unavailable Interrupt */
+#define REG_MISTA_RDU (1 << 10)
+/* DMA Early Notification Interrupt */
+#define REG_MISTA_DENI (1 << 9)
+/* Maximum Frame Length Interrupt */
+#define REG_MISTA_DFOI (1 << 8)
+/* Receive Good Interrupt */
+#define REG_MISTA_RXGD (1 << 4)
+/* Packet Too Long Interrupt */
+#define REG_MISTA_PTLE (1 << 3)
+/* Receive Interrupt */
+#define REG_MISTA_RXINTR (1 << 0)
+
+typedef struct NPCM7xxEMCTxDesc NPCM7xxEMCTxDesc;
+typedef struct NPCM7xxEMCRxDesc NPCM7xxEMCRxDesc;
+
+struct NPCM7xxEMCTxDesc {
+    uint32_t flags;
+    uint32_t txbsa;
+    uint32_t status_and_length;
+    uint32_t ntxdsa;
+};
+
+struct NPCM7xxEMCRxDesc {
+    uint32_t status_and_length;
+    uint32_t rxbsa;
+    uint32_t reserved;
+    uint32_t nrxdsa;
+};
+
+/* NPCM7xxEMCTxDesc.flags values */
+/* Owner: 0 = cpu, 1 = emc */
+#define TX_DESC_FLAG_OWNER_MASK (1 << 31)
+/* Transmit interrupt enable */
+#define TX_DESC_FLAG_INTEN (1 << 2)
+
+/* NPCM7xxEMCTxDesc.status_and_length values */
+/* Transmission complete */
+#define TX_DESC_STATUS_TXCP (1 << 19)
+/* Transmit interrupt */
+#define TX_DESC_STATUS_TXINTR (1 << 16)
+
+/* NPCM7xxEMCRxDesc.status_and_length values */
+/* Owner: 0b00 = cpu, 0b10 = emc */
+#define RX_DESC_STATUS_OWNER_SHIFT 30
+#define RX_DESC_STATUS_OWNER_MASK 0xc0000000
+/* Frame Reception Complete */
+#define RX_DESC_STATUS_RXGD (1 << 20)
+/* Packet too long */
+#define RX_DESC_STATUS_PTLE (1 << 19)
+/* Receive Interrupt */
+#define RX_DESC_STATUS_RXINTR (1 << 16)
+
+#define RX_DESC_PKT_LEN(word) ((uint32_t) (word) & 0xffff)
+
+typedef struct EMCModule {
+    int rx_irq;
+    int tx_irq;
+    uint64_t base_addr;
+} EMCModule;
+
+typedef struct TestData {
+    const EMCModule *module;
+} TestData;
+
+static const EMCModule emc_module_list[] = {
+    {
+        .rx_irq     = 15,
+        .tx_irq     = 16,
+        .base_addr  = 0xf0825000
+    },
+    {
+        .rx_irq     = 114,
+        .tx_irq     = 115,
+        .base_addr  = 0xf0826000
+    }
+};
+
+/* Returns the index of the EMC module. */
+static int emc_module_index(const EMCModule *mod)
+{
+    ptrdiff_t diff = mod - emc_module_list;
+
+    g_assert_true(diff >= 0 && diff < ARRAY_SIZE(emc_module_list));
+
+    return diff;
+}
+
+static void packet_test_clear(void *sockets)
+{
+    int *test_sockets = sockets;
+
+    close(test_sockets[0]);
+    g_free(test_sockets);
+}
+
+static int *packet_test_init(int module_num, GString *cmd_line)
+{
+    int *test_sockets = g_new(int, 2);
+    int ret = socketpair(PF_UNIX, SOCK_STREAM, 0, test_sockets);
+    g_assert_cmpint(ret, != , -1);
+
+    /*
+     * KISS and use -nic. We specify two nics (both emc{0,1}) because there's
+     * currently no way to specify only emc1: The driver implicitly relies on
+     * emc[i] == nd_table[i].
+     */
+    if (module_num == 0) {
+        g_string_append_printf(cmd_line,
+                               " -nic socket,fd=%d,model=" TYPE_NPCM7XX_EMC " "
+                               " -nic user,model=" TYPE_NPCM7XX_EMC " ",
+                               test_sockets[1]);
+    } else {
+        g_string_append_printf(cmd_line,
+                               " -nic user,model=" TYPE_NPCM7XX_EMC " "
+                               " -nic socket,fd=%d,model=" TYPE_NPCM7XX_EMC " ",
+                               test_sockets[1]);
+    }
+
+    g_test_queue_destroy(packet_test_clear, test_sockets);
+    return test_sockets;
+}
+
+static uint32_t emc_read(QTestState *qts, const EMCModule *mod,
+                         NPCM7xxPWMRegister regno)
+{
+    return qtest_readl(qts, mod->base_addr + regno * sizeof(uint32_t));
+}
+
+static void emc_write(QTestState *qts, const EMCModule *mod,
+                      NPCM7xxPWMRegister regno, uint32_t value)
+{
+    qtest_writel(qts, mod->base_addr + regno * sizeof(uint32_t), value);
+}
+
+static void emc_read_tx_desc(QTestState *qts, uint32_t addr,
+                             NPCM7xxEMCTxDesc *desc)
+{
+    qtest_memread(qts, addr, desc, sizeof(*desc));
+    desc->flags = le32_to_cpu(desc->flags);
+    desc->txbsa = le32_to_cpu(desc->txbsa);
+    desc->status_and_length = le32_to_cpu(desc->status_and_length);
+    desc->ntxdsa = le32_to_cpu(desc->ntxdsa);
+}
+
+static void emc_write_tx_desc(QTestState *qts, const NPCM7xxEMCTxDesc *desc,
+                              uint32_t addr)
+{
+    NPCM7xxEMCTxDesc le_desc;
+
+    le_desc.flags = cpu_to_le32(desc->flags);
+    le_desc.txbsa = cpu_to_le32(desc->txbsa);
+    le_desc.status_and_length = cpu_to_le32(desc->status_and_length);
+    le_desc.ntxdsa = cpu_to_le32(desc->ntxdsa);
+    qtest_memwrite(qts, addr, &le_desc, sizeof(le_desc));
+}
+
+static void emc_read_rx_desc(QTestState *qts, uint32_t addr,
+                             NPCM7xxEMCRxDesc *desc)
+{
+    qtest_memread(qts, addr, desc, sizeof(*desc));
+    desc->status_and_length = le32_to_cpu(desc->status_and_length);
+    desc->rxbsa = le32_to_cpu(desc->rxbsa);
+    desc->reserved = le32_to_cpu(desc->reserved);
+    desc->nrxdsa = le32_to_cpu(desc->nrxdsa);
+}
+
+static void emc_write_rx_desc(QTestState *qts, const NPCM7xxEMCRxDesc *desc,
+                              uint32_t addr)
+{
+    NPCM7xxEMCRxDesc le_desc;
+
+    le_desc.status_and_length = cpu_to_le32(desc->status_and_length);
+    le_desc.rxbsa = cpu_to_le32(desc->rxbsa);
+    le_desc.reserved = cpu_to_le32(desc->reserved);
+    le_desc.nrxdsa = cpu_to_le32(desc->nrxdsa);
+    qtest_memwrite(qts, addr, &le_desc, sizeof(le_desc));
+}
+
+/*
+ * Reset the EMC module.
+ * The module must be reset before registers such as TXDLSA and RXDLSA
+ * are changed.
+ */
+static bool emc_soft_reset(QTestState *qts, const EMCModule *mod)
+{
+    uint32_t val;
+    uint64_t end_time;
+
+    emc_write(qts, mod, REG_MCMDR, REG_MCMDR_SWR);
+
+    /*
+     * Wait for the device to reset, as the Linux driver does.
+     * During reset, all registers read as 0 over the AHB, so first wait
+     * for a register that resets to a non-zero value (FFTCR), and then
+     * wait for SWR to become 0.
+     */
+    end_time = g_get_monotonic_time() + TIMEOUT_SECONDS * G_TIME_SPAN_SECOND;
+
+    do {
+        qtest_clock_step(qts, 100);
+        val = emc_read(qts, mod, REG_FFTCR);
+    } while (val == 0 && g_get_monotonic_time() < end_time);
+    if (val != 0) {
+        do {
+            qtest_clock_step(qts, 100);
+            val = emc_read(qts, mod, REG_MCMDR);
+            if ((val & REG_MCMDR_SWR) == 0) {
+                /*
+                 * N.B. The CAMs have been reset here, so macaddr matching of
+                 * incoming packets will not work.
+                 */
+                return true;
+            }
+        } while (g_get_monotonic_time() < end_time);
+    }
+
+    g_message("%s: Timeout expired", __func__);
+    return false;
+}
+
+/* Check emc registers are reset to default value. */
+static void test_init(gconstpointer test_data)
+{
+    const TestData *td = test_data;
+    const EMCModule *mod = td->module;
+    QTestState *qts = qtest_init("-machine quanta-gsj");
+    int i;
+
+#define CHECK_REG(regno, value) \
+  do { \
+    g_assert_cmphex(emc_read(qts, mod, (regno)), ==, (value)); \
+  } while (0)
+
+    CHECK_REG(REG_CAMCMR, 0);
+    CHECK_REG(REG_CAMEN, 0);
+    CHECK_REG(REG_TXDLSA, 0xfffffffc);
+    CHECK_REG(REG_RXDLSA, 0xfffffffc);
+    CHECK_REG(REG_MCMDR, 0);
+    CHECK_REG(REG_MIID, 0);
+    CHECK_REG(REG_MIIDA, 0x00900000);
+    CHECK_REG(REG_FFTCR, 0x0101);
+    CHECK_REG(REG_DMARFC, 0x0800);
+    CHECK_REG(REG_MIEN, 0);
+    CHECK_REG(REG_MISTA, 0);
+    CHECK_REG(REG_MGSTA, 0);
+    CHECK_REG(REG_MPCNT, 0x7fff);
+    CHECK_REG(REG_MRPC, 0);
+    CHECK_REG(REG_MRPCC, 0);
+    CHECK_REG(REG_MREPC, 0);
+    CHECK_REG(REG_DMARFS, 0);
+    CHECK_REG(REG_CTXDSA, 0);
+    CHECK_REG(REG_CTXBSA, 0);
+    CHECK_REG(REG_CRXDSA, 0);
+    CHECK_REG(REG_CRXBSA, 0);
+
+#undef CHECK_REG
+
+    for (i = 0; i < NUM_CAMML_REGS; ++i) {
+        g_assert_cmpuint(emc_read(qts, mod, REG_CAMM_BASE + i * 2), ==,
+                         0);
+        g_assert_cmpuint(emc_read(qts, mod, REG_CAML_BASE + i * 2), ==,
+                         0);
+    }
+
+    qtest_quit(qts);
+}
+
+static bool emc_wait_irq(QTestState *qts, const EMCModule *mod, int step,
+                         bool is_tx)
+{
+    uint64_t end_time =
+        g_get_monotonic_time() + TIMEOUT_SECONDS * G_TIME_SPAN_SECOND;
+
+    do {
+        if (qtest_get_irq(qts, is_tx ? mod->tx_irq : mod->rx_irq)) {
+            return true;
+        }
+        qtest_clock_step(qts, step);
+    } while (g_get_monotonic_time() < end_time);
+
+    g_message("%s: Timeout expired", __func__);
+    return false;
+}
+
+static bool emc_wait_mista(QTestState *qts, const EMCModule *mod, int step,
+                           uint32_t flag)
+{
+    uint64_t end_time =
+        g_get_monotonic_time() + TIMEOUT_SECONDS * G_TIME_SPAN_SECOND;
+
+    do {
+        uint32_t mista = emc_read(qts, mod, REG_MISTA);
+        if (mista & flag) {
+            return true;
+        }
+        qtest_clock_step(qts, step);
+    } while (g_get_monotonic_time() < end_time);
+
+    g_message("%s: Timeout expired", __func__);
+    return false;
+}
+
+static bool wait_socket_readable(int fd)
+{
+    fd_set read_fds;
+    struct timeval tv;
+    int rv;
+
+    FD_ZERO(&read_fds);
+    FD_SET(fd, &read_fds);
+    tv.tv_sec = TIMEOUT_SECONDS;
+    tv.tv_usec = 0;
+    rv = select(fd + 1, &read_fds, NULL, NULL, &tv);
+    if (rv == -1) {
+        perror("select");
+    } else if (rv == 0) {
+        g_message("%s: Timeout expired", __func__);
+    }
+    return rv == 1;
+}
+
+/* Initialize *desc (in host endian format). */
+static void init_tx_desc(NPCM7xxEMCTxDesc *desc, size_t count,
+                         uint32_t desc_addr)
+{
+    g_assert(count >= 2);
+    memset(&desc[0], 0, sizeof(*desc) * count);
+    /* Leave the last one alone, owned by the cpu -> stops transmission. */
+    for (size_t i = 0; i < count - 1; ++i) {
+        desc[i].flags =
+            (TX_DESC_FLAG_OWNER_MASK | /* owner = 1: emc */
+             TX_DESC_FLAG_INTEN |
+             0 | /* crc append = 0 */
+             0 /* padding enable = 0 */);
+        desc[i].status_and_length =
+            (0 | /* collision count = 0 */
+             0 | /* SQE = 0 */
+             0 | /* PAU = 0 */
+             0 | /* TXHA = 0 */
+             0 | /* LC = 0 */
+             0 | /* TXABT = 0 */
+             0 | /* NCS = 0 */
+             0 | /* EXDEF = 0 */
+             0 | /* TXCP = 0 */
+             0 | /* DEF = 0 */
+             0 | /* TXINTR = 0 */
+             0 /* length filled in later */);
+        desc[i].ntxdsa = desc_addr + (i + 1) * sizeof(*desc);
+    }
+}
+
+static void enable_tx(QTestState *qts, const EMCModule *mod,
+                      const NPCM7xxEMCTxDesc *desc, size_t count,
+                      uint32_t desc_addr, uint32_t mien_flags)
+{
+    /* Write the descriptors to guest memory. */
+    for (size_t i = 0; i < count; ++i) {
+        emc_write_tx_desc(qts, desc + i, desc_addr + i * sizeof(*desc));
+    }
+
+    /* Trigger sending the packet. */
+    /* The module must be reset before changing TXDLSA. */
+    g_assert(emc_soft_reset(qts, mod));
+    emc_write(qts, mod, REG_TXDLSA, desc_addr);
+    emc_write(qts, mod, REG_CTXDSA, ~0);
+    emc_write(qts, mod, REG_MIEN, REG_MIEN_ENTXCP | mien_flags);
+    {
+        uint32_t mcmdr = emc_read(qts, mod, REG_MCMDR);
+        mcmdr |= REG_MCMDR_TXON;
+        emc_write(qts, mod, REG_MCMDR, mcmdr);
+    }
+
+    /* Prod the device to send the packet. */
+    emc_write(qts, mod, REG_TSDR, 1);
+}
+
+static void emc_send_verify1(QTestState *qts, const EMCModule *mod, int fd,
+                             bool with_irq, uint32_t desc_addr,
+                             uint32_t next_desc_addr,
+                             const char *test_data, int test_size)
+{
+    NPCM7xxEMCTxDesc result_desc;
+    uint32_t expected_mask, expected_value, recv_len;
+    int ret;
+    char buffer[TX_DATA_LEN];
+
+    g_assert(wait_socket_readable(fd));
+
+    /* Read the descriptor back. */
+    emc_read_tx_desc(qts, desc_addr, &result_desc);
+    /* Descriptor should be owned by cpu now. */
+    g_assert((result_desc.flags & TX_DESC_FLAG_OWNER_MASK) == 0);
+    /* Test the status bits, ignoring the length field. */
+    expected_mask = 0xffff << 16;
+    expected_value = TX_DESC_STATUS_TXCP;
+    if (with_irq) {
+        expected_value |= TX_DESC_STATUS_TXINTR;
+    }
+    g_assert_cmphex((result_desc.status_and_length & expected_mask), ==,
+                    expected_value);
+
+    /* Check data sent to the backend. */
+    recv_len = ~0;
+    ret = qemu_recv(fd, &recv_len, sizeof(recv_len), MSG_DONTWAIT);
+    g_assert_cmpint(ret, == , sizeof(recv_len));
+
+    g_assert(wait_socket_readable(fd));
+    memset(buffer, 0xff, sizeof(buffer));
+    ret = qemu_recv(fd, buffer, test_size, MSG_DONTWAIT);
+    g_assert_cmpmem(buffer, ret, test_data, test_size);
+}
+
+static void emc_send_verify(QTestState *qts, const EMCModule *mod, int fd,
+                            bool with_irq)
+{
+    NPCM7xxEMCTxDesc desc[NUM_TX_DESCRIPTORS];
+    uint32_t desc_addr = DESC_ADDR;
+    static const char test1_data[] = "TEST1";
+    static const char test2_data[] = "Testing 1 2 3 ...";
+    uint32_t data1_addr = DATA_ADDR;
+    uint32_t data2_addr = data1_addr + sizeof(test1_data);
+    bool got_tdu;
+    uint32_t end_desc_addr;
+
+    /* Prepare test data buffer. */
+    qtest_memwrite(qts, data1_addr, test1_data, sizeof(test1_data));
+    qtest_memwrite(qts, data2_addr, test2_data, sizeof(test2_data));
+
+    init_tx_desc(&desc[0], NUM_TX_DESCRIPTORS, desc_addr);
+    desc[0].txbsa = data1_addr;
+    desc[0].status_and_length |= sizeof(test1_data);
+    desc[1].txbsa = data2_addr;
+    desc[1].status_and_length |= sizeof(test2_data);
+
+    enable_tx(qts, mod, &desc[0], NUM_TX_DESCRIPTORS, desc_addr,
+              with_irq ? REG_MIEN_ENTXINTR : 0);
+
+    /*
+     * It's problematic to observe the interrupt for each packet.
+     * Instead just wait until all the packets go out.
+     */
+    got_tdu = false;
+    while (!got_tdu) {
+        if (with_irq) {
+            g_assert_true(emc_wait_irq(qts, mod, TX_STEP_COUNT,
+                                       /*is_tx=*/true));
+        } else {
+            g_assert_true(emc_wait_mista(qts, mod, TX_STEP_COUNT,
+                                         REG_MISTA_TXINTR));
+        }
+        got_tdu = !!(emc_read(qts, mod, REG_MISTA) & REG_MISTA_TDU);
+        /* If we don't have TDU yet, reset the interrupt. */
+        if (!got_tdu) {
+            emc_write(qts, mod, REG_MISTA,
+                      emc_read(qts, mod, REG_MISTA) & 0xffff0000);
+        }
+    }
+
+    end_desc_addr = desc_addr + 2 * sizeof(desc[0]);
+    g_assert_cmphex(emc_read(qts, mod, REG_CTXDSA), ==, end_desc_addr);
+    g_assert_cmphex(emc_read(qts, mod, REG_MISTA), ==,
+                    REG_MISTA_TXCP | REG_MISTA_TXINTR | REG_MISTA_TDU);
+
+    emc_send_verify1(qts, mod, fd, with_irq,
+                     desc_addr, end_desc_addr,
+                     test1_data, sizeof(test1_data));
+    emc_send_verify1(qts, mod, fd, with_irq,
+                     desc_addr + sizeof(desc[0]), end_desc_addr,
+                     test2_data, sizeof(test2_data));
+}
+
+/* Initialize *desc (in host endian format). */
+static void init_rx_desc(NPCM7xxEMCRxDesc *desc, size_t count,
+                         uint32_t desc_addr, uint32_t data_addr)
+{
+    g_assert_true(count >= 2);
+    memset(desc, 0, sizeof(*desc) * count);
+    desc[0].rxbsa = data_addr;
+    desc[0].status_and_length =
+        (0b10 << RX_DESC_STATUS_OWNER_SHIFT | /* owner = 10: emc */
+         0 | /* RP = 0 */
+         0 | /* ALIE = 0 */
+         0 | /* RXGD = 0 */
+         0 | /* PTLE = 0 */
+         0 | /* CRCE = 0 */
+         0 | /* RXINTR = 0 */
+         0   /* length (filled in later) */);
+    /* Leave the last one alone, owned by the cpu -> stops reception. */
+    desc[0].nrxdsa = desc_addr + sizeof(*desc);
+}
+
+static void enable_rx(QTestState *qts, const EMCModule *mod,
+                      const NPCM7xxEMCRxDesc *desc, size_t count,
+                      uint32_t desc_addr, uint32_t mien_flags,
+                      uint32_t mcmdr_flags)
+{
+    /*
+     * Write the descriptor to guest memory.
+     * FWIW, it would be nice if the docs said the buffer needs to be at
+     * least DMARFC bytes.
+     */
+    for (size_t i = 0; i < count; ++i) {
+        emc_write_rx_desc(qts, desc + i, desc_addr + i * sizeof(*desc));
+    }
+
+    /* Trigger receiving the packet. */
+    /* The module must be reset before changing RXDLSA. */
+    g_assert(emc_soft_reset(qts, mod));
+    emc_write(qts, mod, REG_RXDLSA, desc_addr);
+    emc_write(qts, mod, REG_MIEN, REG_MIEN_ENRXGD | mien_flags);
+
+    /*
+     * We don't know what the device's macaddr is, so just accept all
+     * unicast packets (AUP).
+     */
+    emc_write(qts, mod, REG_CAMCMR, REG_CAMCMR_AUP);
+    emc_write(qts, mod, REG_CAMEN, 1 << 0);
+    {
+        uint32_t mcmdr = emc_read(qts, mod, REG_MCMDR);
+        mcmdr |= REG_MCMDR_RXON | mcmdr_flags;
+        emc_write(qts, mod, REG_MCMDR, mcmdr);
+    }
+
+    /* Prod the device to accept a packet. */
+    emc_write(qts, mod, REG_RSDR, 1);
+}
+
+static void emc_recv_verify(QTestState *qts, const EMCModule *mod, int fd,
+                            bool with_irq)
+{
+    NPCM7xxEMCRxDesc desc[NUM_RX_DESCRIPTORS];
+    uint32_t desc_addr = DESC_ADDR;
+    uint32_t data_addr = DATA_ADDR;
+    int ret;
+    uint32_t expected_mask, expected_value;
+    NPCM7xxEMCRxDesc result_desc;
+
+    /* Prepare test data buffer. */
+    const char test[RX_DATA_LEN] = "TEST";
+    int len = htonl(sizeof(test));
+    const struct iovec iov[] = {
+        {
+            .iov_base = &len,
+            .iov_len = sizeof(len),
+        },{
+            .iov_base = (char *) test,
+            .iov_len = sizeof(test),
+        },
+    };
+
+    /*
+     * Reset the device BEFORE sending a test packet, otherwise the packet
+     * may get swallowed by an active device of an earlier test.
+     */
+    init_rx_desc(&desc[0], NUM_RX_DESCRIPTORS, desc_addr, data_addr);
+    enable_rx(qts, mod, &desc[0], NUM_RX_DESCRIPTORS, desc_addr,
+              with_irq ? REG_MIEN_ENRXINTR : 0, 0);
+
+    /* Send test packet to device's socket. */
+    ret = iov_send(fd, iov, 2, 0, sizeof(len) + sizeof(test));
+    g_assert_cmpint(ret, == , sizeof(test) + sizeof(len));
+
+    /* Wait for RX interrupt. */
+    if (with_irq) {
+        g_assert_true(emc_wait_irq(qts, mod, RX_STEP_COUNT, /*is_tx=*/false));
+    } else {
+        g_assert_true(emc_wait_mista(qts, mod, RX_STEP_COUNT, REG_MISTA_RXGD));
+    }
+
+    g_assert_cmphex(emc_read(qts, mod, REG_CRXDSA), ==,
+                    desc_addr + sizeof(desc[0]));
+
+    expected_mask = 0xffff;
+    expected_value = (REG_MISTA_DENI |
+                      REG_MISTA_RXGD |
+                      REG_MISTA_RXINTR);
+    g_assert_cmphex((emc_read(qts, mod, REG_MISTA) & expected_mask),
+                    ==, expected_value);
+
+    /* Read the descriptor back. */
+    emc_read_rx_desc(qts, desc_addr, &result_desc);
+    /* Descriptor should be owned by cpu now. */
+    g_assert((result_desc.status_and_length & RX_DESC_STATUS_OWNER_MASK) == 0);
+    /* Test the status bits, ignoring the length field. */
+    expected_mask = 0xffff << 16;
+    expected_value = RX_DESC_STATUS_RXGD;
+    if (with_irq) {
+        expected_value |= RX_DESC_STATUS_RXINTR;
+    }
+    g_assert_cmphex((result_desc.status_and_length & expected_mask), ==,
+                    expected_value);
+    g_assert_cmpint(RX_DESC_PKT_LEN(result_desc.status_and_length), ==,
+                    RX_DATA_LEN + CRC_LENGTH);
+
+    {
+        char buffer[RX_DATA_LEN];
+        qtest_memread(qts, data_addr, buffer, sizeof(buffer));
+        g_assert_cmpstr(buffer, == , "TEST");
+    }
+}
+
+static void emc_test_ptle(QTestState *qts, const EMCModule *mod, int fd)
+{
+    NPCM7xxEMCRxDesc desc[NUM_RX_DESCRIPTORS];
+    uint32_t desc_addr = DESC_ADDR;
+    uint32_t data_addr = DATA_ADDR;
+    int ret;
+    NPCM7xxEMCRxDesc result_desc;
+    uint32_t expected_mask, expected_value;
+
+    /* Prepare test data buffer. */
+#define PTLE_DATA_LEN 1600
+    char test_data[PTLE_DATA_LEN];
+    int len = htonl(sizeof(test_data));
+    const struct iovec iov[] = {
+        {
+            .iov_base = &len,
+            .iov_len = sizeof(len),
+        },{
+            .iov_base = (char *) test_data,
+            .iov_len = sizeof(test_data),
+        },
+    };
+    memset(test_data, 42, sizeof(test_data));
+
+    /*
+     * Reset the device BEFORE sending a test packet, otherwise the packet
+     * may get swallowed by an active device of an earlier test.
+     */
+    init_rx_desc(&desc[0], NUM_RX_DESCRIPTORS, desc_addr, data_addr);
+    enable_rx(qts, mod, &desc[0], NUM_RX_DESCRIPTORS, desc_addr,
+              REG_MIEN_ENRXINTR, REG_MCMDR_ALP);
+
+    /* Send test packet to device's socket. */
+    ret = iov_send(fd, iov, 2, 0, sizeof(len) + sizeof(test_data));
+    g_assert_cmpint(ret, == , sizeof(test_data) + sizeof(len));
+
+    /* Wait for RX interrupt. */
+    g_assert_true(emc_wait_irq(qts, mod, RX_STEP_COUNT, /*is_tx=*/false));
+
+    /* Read the descriptor back. */
+    emc_read_rx_desc(qts, desc_addr, &result_desc);
+    /* Descriptor should be owned by cpu now. */
+    g_assert((result_desc.status_and_length & RX_DESC_STATUS_OWNER_MASK) == 0);
+    /* Test the status bits, ignoring the length field. */
+    expected_mask = 0xffff << 16;
+    expected_value = (RX_DESC_STATUS_RXGD |
+                      RX_DESC_STATUS_PTLE |
+                      RX_DESC_STATUS_RXINTR);
+    g_assert_cmphex((result_desc.status_and_length & expected_mask), ==,
+                    expected_value);
+    g_assert_cmpint(RX_DESC_PKT_LEN(result_desc.status_and_length), ==,
+                    PTLE_DATA_LEN + CRC_LENGTH);
+
+    {
+        char buffer[PTLE_DATA_LEN];
+        qtest_memread(qts, data_addr, buffer, sizeof(buffer));
+        g_assert(memcmp(buffer, test_data, PTLE_DATA_LEN) == 0);
+    }
+}
+
+static void test_tx(gconstpointer test_data)
+{
+    const TestData *td = test_data;
+    GString *cmd_line = g_string_new("-machine quanta-gsj");
+    int *test_sockets = packet_test_init(emc_module_index(td->module),
+                                         cmd_line);
+    QTestState *qts = qtest_init(cmd_line->str);
+
+    /*
+     * TODO: For pedantic correctness test_sockets[0] should be closed after
+     * the fork and before the exec, but that will require some harness
+     * improvements.
+     */
+    close(test_sockets[1]);
+    /* Defensive programming */
+    test_sockets[1] = -1;
+
+    qtest_irq_intercept_in(qts, "/machine/soc/a9mpcore/gic");
+
+    emc_send_verify(qts, td->module, test_sockets[0], /*with_irq=*/false);
+    emc_send_verify(qts, td->module, test_sockets[0], /*with_irq=*/true);
+
+    qtest_quit(qts);
+}
+
+static void test_rx(gconstpointer test_data)
+{
+    const TestData *td = test_data;
+    GString *cmd_line = g_string_new("-machine quanta-gsj");
+    int *test_sockets = packet_test_init(emc_module_index(td->module),
+                                         cmd_line);
+    QTestState *qts = qtest_init(cmd_line->str);
+
+    /*
+     * TODO: For pedantic correctness test_sockets[0] should be closed after
+     * the fork and before the exec, but that will require some harness
+     * improvements.
+     */
+    close(test_sockets[1]);
+    /* Defensive programming */
+    test_sockets[1] = -1;
+
+    qtest_irq_intercept_in(qts, "/machine/soc/a9mpcore/gic");
+
+    emc_recv_verify(qts, td->module, test_sockets[0], /*with_irq=*/false);
+    emc_recv_verify(qts, td->module, test_sockets[0], /*with_irq=*/true);
+    emc_test_ptle(qts, td->module, test_sockets[0]);
+
+    qtest_quit(qts);
+}
+
+static void emc_add_test(const char *name, const TestData *td,
+                         GTestDataFunc fn)
+{
+    g_autofree char *full_name = g_strdup_printf(
+            "npcm7xx_emc/emc[%d]/%s", emc_module_index(td->module), name);
+    qtest_add_data_func(full_name, td, fn);
+}
+#define add_test(name, td) emc_add_test(#name, td, test_##name)
+
+int main(int argc, char **argv)
+{
+    TestData test_data_list[ARRAY_SIZE(emc_module_list)];
+
+    g_test_init(&argc, &argv, NULL);
+
+    for (int i = 0; i < ARRAY_SIZE(emc_module_list); ++i) {
+        TestData *td = &test_data_list[i];
+
+        td->module = &emc_module_list[i];
+
+        add_test(init, td);
+        add_test(tx, td);
+        add_test(rx, td);
+    }
+
+    return g_test_run();
+}
diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -XXX,XX +XXX,XX @@ qtests_npcm7xx = \
    'npcm7xx_rng-test',
    'npcm7xx_smbus-test',
    'npcm7xx_timer-test',
-   'npcm7xx_watchdog_timer-test']
+   'npcm7xx_watchdog_timer-test'] + \
+   (slirp.found() ? ['npcm7xx_emc-test'] : [])
 qtests_arm = \
   (config_all_devices.has_key('CONFIG_CMSDK_APB_DUALTIMER') ? ['cmsdk-apb-dualtimer-test'] : []) + \
   (config_all_devices.has_key('CONFIG_CMSDK_APB_TIMER') ? ['cmsdk-apb-timer-test'] : []) + \
-- 
2.20.1
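
The RX checks above decode each descriptor's `status_and_length` word. The owner field sits in bits 31:30. It reads 0b00 once the descriptor has been handed back to the CPU and 0b10 while the EMC still owns it. The status flags occupy the rest of the upper half-word, and the low 16 bits hold the frame length. The following is a minimal standalone sketch of that decoding, reusing the masks defined in the test. The `RxStatus` struct and `decode_rx_status()` helper are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Masks copied from the qtest above. */
#define RX_DESC_STATUS_OWNER_SHIFT 30
#define RX_DESC_STATUS_OWNER_MASK 0xc0000000
#define RX_DESC_STATUS_RXGD (1 << 20)
#define RX_DESC_STATUS_PTLE (1 << 19)
#define RX_DESC_STATUS_RXINTR (1 << 16)
#define RX_DESC_PKT_LEN(word) ((uint32_t) (word) & 0xffff)
#define CRC_LENGTH 4

/* Hypothetical decoded view of an RX descriptor's status_and_length word. */
typedef struct RxStatus {
    bool cpu_owned;   /* owner field 0b00: handed back to the CPU */
    bool emc_owned;   /* owner field 0b10: still owned by the EMC */
    bool rx_good;     /* RXGD: frame reception complete */
    bool too_long;    /* PTLE: packet too long */
    bool rx_intr;     /* RXINTR: receive interrupt */
    uint32_t len;     /* reported frame length (CRC included) */
} RxStatus;

static RxStatus decode_rx_status(uint32_t word)
{
    uint32_t owner =
        (word & RX_DESC_STATUS_OWNER_MASK) >> RX_DESC_STATUS_OWNER_SHIFT;
    RxStatus s = {
        .cpu_owned = owner == 0,
        .emc_owned = owner == 2,
        .rx_good = (word & RX_DESC_STATUS_RXGD) != 0,
        .too_long = (word & RX_DESC_STATUS_PTLE) != 0,
        .rx_intr = (word & RX_DESC_STATUS_RXINTR) != 0,
        .len = RX_DESC_PKT_LEN(word),
    };
    return s;
}
```

The test expects the reported length to include the 4-byte CRC. A 64-byte RX_DATA_LEN frame therefore decodes with `len == 68`, and the 1600-byte PTLE frame decodes with `len == 1604`.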

From: Philippe Mathieu-Daudé <f4bug@amsat.org>

We have been hinting that the 'has_rpu' property is no longer required
since commit 6908ec448b4 ("xlnx-zynqmp: Properly support the smp command
line option"), which was released in QEMU v2.11.0.

Besides, this device is marked 'user_creatable = false', so the
only thing that could be setting the property is the board code
that creates the device.

Since the property is not user-facing, we can remove it without
going through the deprecation process.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20210219144350.1979905-1-f4bug@amsat.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/arm/xlnx-zynqmp.h | 2 --
 hw/arm/xlnx-zynqmp.c         | 6 ------
 2 files changed, 8 deletions(-)

diff --git a/include/hw/arm/xlnx-zynqmp.h b/include/hw/arm/xlnx-zynqmp.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/xlnx-zynqmp.h
+++ b/include/hw/arm/xlnx-zynqmp.h
@@ -XXX,XX +XXX,XX @@ struct XlnxZynqMPState {
     bool secure;
     /* Has the ARM Virtualization extensions?  */
     bool virt;
-    /* Has the RPU subsystem?  */
-    bool has_rpu;
 
     /* CAN bus. */
     CanBusState *canbus[XLNX_ZYNQMP_NUM_CAN];
diff --git a/hw/arm/xlnx-zynqmp.c b/hw/arm/xlnx-zynqmp.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/xlnx-zynqmp.c
+++ b/hw/arm/xlnx-zynqmp.c
@@ -XXX,XX +XXX,XX @@ static void xlnx_zynqmp_realize(DeviceState *dev, Error **errp)
         }
     }
 
-    if (s->has_rpu) {
-        info_report("The 'has_rpu' property is no longer required, to use the "
-                    "RPUs just use -smp 6.");
-    }
-
     xlnx_zynqmp_create_rpu(ms, s, boot_cpu, &err);
     if (err) {
         error_propagate(errp, err);
@@ -XXX,XX +XXX,XX @@ static Property xlnx_zynqmp_props[] = {
     DEFINE_PROP_STRING("boot-cpu", XlnxZynqMPState, boot_cpu),
     DEFINE_PROP_BOOL("secure", XlnxZynqMPState, secure, false),
     DEFINE_PROP_BOOL("virtualization", XlnxZynqMPState, virt, false),
-    DEFINE_PROP_BOOL("has_rpu", XlnxZynqMPState, has_rpu, false),
     DEFINE_PROP_LINK("ddr-ram", XlnxZynqMPState, ddr_ram, TYPE_MEMORY_REGION,
                      MemoryRegion *),
     DEFINE_PROP_LINK("canbus0", XlnxZynqMPState, canbus[0], TYPE_CAN_BUS,
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Always perform one call instead of two for 16-byte operands.
Use byte loads/stores directly into the vector register file
instead of extractions and deposits to a 64-bit local variable.
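For reference, the byte-wise TBL/TBX semantics that the new helper implements can be sketched in plain C as below. This is a minimal model, not QEMU code: the VRegs type and tblx_sketch() are hypothetical, and the sketch ignores the H1() host-endian byte swizzling the real helper applies. It builds the result in a temporary (in case the destination overlaps the table), wraps table registers from V31 to V0, and clears the high half for 8-byte operations.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the 32 AArch64 128-bit vector registers. */
typedef struct {
    uint8_t v[32][16];
} VRegs;

/*
 * Look up each of the oprsz index bytes in Vm in a table of nregs
 * consecutive registers starting at Vn (wrapping from V31 to V0).
 * Out-of-range indices give 0 for TBL and keep the old Vd byte for TBX.
 */
static void tblx_sketch(VRegs *r, int rd, int rn, int rm, int nregs,
                        size_t oprsz, bool is_tbx)
{
    uint8_t result[16];
    size_t table_len = (size_t)nregs * 16;

    /* Start from zero (TBL) or the current destination contents (TBX). */
    if (is_tbx) {
        memcpy(result, r->v[rd], 16);
    } else {
        memset(result, 0, 16);
    }
    for (size_t i = 0; i < oprsz; i++) {
        size_t index = r->v[rm][i];
        if (index < table_len) {
            result[i] = r->v[(rn + index / 16) % 32][index % 16];
        }
    }
    /* For an 8-byte operation the high half of Vd is cleared. */
    if (oprsz < 16) {
        memset(result + oprsz, 0, 16 - oprsz);
    }
    memcpy(r->v[rd], result, 16);
}
```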

In order to easily receive pointers into the vector register file,
convert the helper to the gvec out-of-line signature.  Move the
helper into vec_helper.c, where it can make use of H1 and clear_tail.
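With the gvec out-of-line signature, the translator packs rn, the TBX flag and the table length in bytes into the descriptor's data field as (len << 6) | (is_tbx << 5) | rn, and the helper unpacks them. A sketch of that round trip follows; SKETCH_DATA_SHIFT is a hypothetical stand-in for QEMU's SIMD_DATA_SHIFT (its real value is not assumed here), and the function names are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical position of the data field; stands in for SIMD_DATA_SHIFT. */
#define SKETCH_DATA_SHIFT 10

/* Table length in bytes from the insn's len field (bits [14:13]). */
static uint32_t tbl_len_from_insn(uint32_t insn)
{
    return (((insn >> 13) & 3) + 1) * 16;
}

/* rn in bits [4:0], is_tbx in bit 5, table length from bit 6 upwards. */
static uint32_t pack_tblx_data(uint32_t rn, bool is_tbx, uint32_t table_len)
{
    return (table_len << 6) | ((uint32_t)is_tbx << 5) | (rn & 31);
}

static void unpack_tblx_desc(uint32_t desc, uint32_t *rn, bool *is_tbx,
                             uint32_t *table_len)
{
    *rn = (desc >> SKETCH_DATA_SHIFT) & 31;
    *is_tbx = (desc >> (SKETCH_DATA_SHIFT + 5)) & 1;
    /* The length is the topmost field, so a plain shift extracts it. */
    *table_len = desc >> (SKETCH_DATA_SHIFT + 6);
}
```

Putting the length in the top bits is what lets the helper recover it with a single shift, as the patch's helper does.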

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20210224230532.276878-1-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-a64.h    |  2 +-
 target/arm/helper-a64.c    | 32 ---------------------
 target/arm/translate-a64.c | 58 +++++---------------------------------
 target/arm/vec_helper.c    | 48 +++++++++++++++++++++++++++++++
 4 files changed, 56 insertions(+), 84 deletions(-)

diff --git a/target/arm/helper-a64.h b/target/arm/helper-a64.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-a64.h
+++ b/target/arm/helper-a64.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(vfp_cmps_a64, i64, f32, f32, ptr)
 DEF_HELPER_3(vfp_cmpes_a64, i64, f32, f32, ptr)
 DEF_HELPER_3(vfp_cmpd_a64, i64, f64, f64, ptr)
 DEF_HELPER_3(vfp_cmped_a64, i64, f64, f64, ptr)
-DEF_HELPER_FLAGS_5(simd_tbl, TCG_CALL_NO_RWG_SE, i64, env, i64, i64, i32, i32)
+DEF_HELPER_FLAGS_4(simd_tblx, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(vfp_mulxs, TCG_CALL_NO_RWG, f32, f32, f32, ptr)
 DEF_HELPER_FLAGS_3(vfp_mulxd, TCG_CALL_NO_RWG, f64, f64, f64, ptr)
 DEF_HELPER_FLAGS_3(neon_ceq_f64, TCG_CALL_NO_RWG, i64, i64, i64, ptr)
diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-a64.c
+++ b/target/arm/helper-a64.c
@@ -XXX,XX +XXX,XX @@ float64 HELPER(vfp_mulxd)(float64 a, float64 b, void *fpstp)
     return float64_mul(a, b, fpst);
 }
 
-uint64_t HELPER(simd_tbl)(CPUARMState *env, uint64_t result, uint64_t indices,
-                          uint32_t rn, uint32_t numregs)
-{
-    /* Helper function for SIMD TBL and TBX. We have to do the table
-     * lookup part for the 64 bits worth of indices we're passed in.
-     * result is the initial results vector (either zeroes for TBL
-     * or some guest values for TBX), rn the register number where
-     * the table starts, and numregs the number of registers in the table.
-     * We return the results of the lookups.
-     */
-    int shift;
-
-    for (shift = 0; shift < 64; shift += 8) {
-        int index = extract64(indices, shift, 8);
-        if (index < 16 * numregs) {
-            /* Convert index (a byte offset into the virtual table
-             * which is a series of 128-bit vectors concatenated)
-             * into the correct register element plus a bit offset
-             * into that element, bearing in mind that the table
-             * can wrap around from V31 to V0.
-             */
-            int elt = (rn * 2 + (index >> 3)) % 64;
-            int bitidx = (index & 7) * 8;
-            uint64_t *q = aa64_vfp_qreg(env, elt >> 1);
-            uint64_t val = extract64(q[elt & 1], bitidx, 8);
-
-            result = deposit64(result, shift, 8, val);
-        }
-    }
-    return result;
-}
-
 /* 64bit/double versions of the neon float compare functions */
 uint64_t HELPER(neon_ceq_f64)(float64 a, float64 b, void *fpstp)
 {
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_simd_tb(DisasContext *s, uint32_t insn)
     int rm = extract32(insn, 16, 5);
     int rn = extract32(insn, 5, 5);
     int rd = extract32(insn, 0, 5);
-    int is_tblx = extract32(insn, 12, 1);
-    int len = extract32(insn, 13, 2);
-    TCGv_i64 tcg_resl, tcg_resh, tcg_idx;
-    TCGv_i32 tcg_regno, tcg_numregs;
+    int is_tbx = extract32(insn, 12, 1);
+    int len = (extract32(insn, 13, 2) + 1) * 16;
 
     if (op2 != 0) {
         unallocated_encoding(s);
@@ -XXX,XX +XXX,XX @@ static void disas_simd_tb(DisasContext *s, uint32_t insn)
         return;
     }
 
-    /* This does a table lookup: for every byte element in the input
-     * we index into a table formed from up to four vector registers,
-     * and then the output is the result of the lookups. Our helper
-     * function does the lookup operation for a single 64 bit part of
-     * the input.
-     */
-    tcg_resl = tcg_temp_new_i64();
-    tcg_resh = NULL;
-
-    if (is_tblx) {
-        read_vec_element(s, tcg_resl, rd, 0, MO_64);
-    } else {
-        tcg_gen_movi_i64(tcg_resl, 0);
-    }
-
-    if (is_q) {
-        tcg_resh = tcg_temp_new_i64();
-        if (is_tblx) {
-            read_vec_element(s, tcg_resh, rd, 1, MO_64);
-        } else {
-            tcg_gen_movi_i64(tcg_resh, 0);
-        }
-    }
-
-    tcg_idx = tcg_temp_new_i64();
-    tcg_regno = tcg_const_i32(rn);
-    tcg_numregs = tcg_const_i32(len + 1);
-    read_vec_element(s, tcg_idx, rm, 0, MO_64);
-    gen_helper_simd_tbl(tcg_resl, cpu_env, tcg_resl, tcg_idx,
-                        tcg_regno, tcg_numregs);
-    if (is_q) {
-        read_vec_element(s, tcg_idx, rm, 1, MO_64);
-        gen_helper_simd_tbl(tcg_resh, cpu_env, tcg_resh, tcg_idx,
-                            tcg_regno, tcg_numregs);
-    }
-    tcg_temp_free_i64(tcg_idx);
-    tcg_temp_free_i32(tcg_regno);
-    tcg_temp_free_i32(tcg_numregs);
-
-    write_vec_element(s, tcg_resl, rd, 0, MO_64);
-    tcg_temp_free_i64(tcg_resl);
-
-    if (is_q) {
-        write_vec_element(s, tcg_resh, rd, 1, MO_64);
-        tcg_temp_free_i64(tcg_resh);
-    }
-    clear_vec_high(s, is_q, rd);
+    tcg_gen_gvec_2_ptr(vec_full_reg_offset(s, rd),
+                       vec_full_reg_offset(s, rm), cpu_env,
+                       is_q ? 16 : 8, vec_full_reg_size(s),
+                       (len << 6) | (is_tbx << 5) | rn,
+                       gen_helper_simd_tblx);
 }
 
 /* ZIP/UZP/TRN
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ DO_VRINT_RMODE(gvec_vrint_rm_h, helper_rinth, uint16_t)
 DO_VRINT_RMODE(gvec_vrint_rm_s, helper_rints, uint32_t)
 
 #undef DO_VRINT_RMODE
+
+#ifdef TARGET_AARCH64
+void HELPER(simd_tblx)(void *vd, void *vm, void *venv, uint32_t desc)
+{
+    const uint8_t *indices = vm;
+    CPUARMState *env = venv;
+    size_t oprsz = simd_oprsz(desc);
+    uint32_t rn = extract32(desc, SIMD_DATA_SHIFT, 5);
+    bool is_tbx = extract32(desc, SIMD_DATA_SHIFT + 5, 1);
+    uint32_t table_len = desc >> (SIMD_DATA_SHIFT + 6);
+    union {
+        uint8_t b[16];
+        uint64_t d[2];
+    } result;
+
+    /*
+     * We must construct the final result in a temp, lest the output
+     * overlaps the input table.  For TBL, begin with zero; for TBX,
+     * begin with the original register contents.  Note that we always
+     * copy 16 bytes here to avoid an extra branch; clearing the high
+     * bits of the register for oprsz == 8 is handled below.
+     */
+    if (is_tbx) {
+        memcpy(&result, vd, 16);
+    } else {
+        memset(&result, 0, 16);
+    }
+
+    for (size_t i = 0; i < oprsz; ++i) {
+        uint32_t index = indices[H1(i)];
+
+        if (index < table_len) {
+            /*
+             * Convert index (a byte offset into the virtual table
+             * which is a series of 128-bit vectors concatenated)
+             * into the correct register element, bearing in mind
+             * that the table can wrap around from V31 to V0.
+             */
+            const uint8_t *table = (const uint8_t *)
+                aa64_vfp_qreg(env, (rn + (index >> 4)) % 32);
+            result.b[H1(i)] = table[H1(index % 16)];
+        }
+    }
+
+    memcpy(vd, &result, 16);
+    clear_tail(vd, oprsz, simd_maxsz(desc));
+}
+#endif
-- 
2.20.1

From: schspa <schspa@gmail.com>

At the moment the following QEMU command line triggers an assertion
failure on the xlnx-versal SoC:
  qemu-system-aarch64 \
      -machine xlnx-versal-virt -nographic -smp 2 -m 128 \
      -fsdev local,id=shareid,path=${HOME}/work,security_model=none \
      -device virtio-9p-device,fsdev=shareid,mount_tag=share \
      -fsdev local,id=shareid1,path=${HOME}/Music,security_model=none \
      -device virtio-9p-device,fsdev=shareid1,mount_tag=share1

qemu-system-aarch64: ../migration/savevm.c:860:
  vmstate_register_with_alias_id:
  Assertion `!se->compat || se->instance_id == 0' failed.

This problem was fixed on the arm virt platform in commit f58b39d2d5b
("virtio-mmio: format transport base address in BusClass.get_dev_path").

That fix works on the arm virt platform, but the problem still exists
on the xlnx-versal SoC.

The main difference between arm virt and xlnx-versal is that they
create the virtio-mmio qdev in different ways. On arm virt, the board
calls sysbus_create_simple("virtio-mmio", base, pic[irq]), which calls
sysbus_mmio_map() internally and so records the base address in the
device's sysbus mmio[] entry correctly. The xlnx-versal implementation
does not do this.

However, xlnx-versal can't switch to using sysbus_create_simple() to
create the virtio-mmio device, because the xlnx-versal CPUs use
VersalVirt.soc.fpd.apu.mr as their memory, which is a subregion of
system_memory. sysbus_create_simple() would add the virtio device to
system_memory, where the CPUs can't access it.

Nor can xlnx-versal simply add a sysbus_mmio_map() call, because that
would add the memory region to system_memory, and the same region can't
then be added to VersalVirt.soc.fpd.apu.mr as well.

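Instead, solve this by putting the correct base address offset in the
dev_path: look up the transport's iomem region with memory_region_find()
and use its offset_within_address_space.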
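The point of the fix is that each virtio-mmio transport gets a dev path that includes its own base address, so two transports no longer end up with the same path (which, presumably, is what lets the savevm instance IDs collide and trip the assertion above). A standalone sketch of the path format follows. It assumes TARGET_FMT_plx expands to a zero-padded 16-digit hex format, and it takes the base as a plain parameter instead of calling memory_region_find(); mmio_dev_path() and paths_collide() are hypothetical names.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format "<proxy_path>/virtio-mmio@<base>" or "virtio-mmio@<base>". */
static void mmio_dev_path(char *buf, size_t len, const char *proxy_path,
                          uint64_t base)
{
    if (proxy_path) {
        snprintf(buf, len, "%s/virtio-mmio@%016" PRIx64, proxy_path, base);
    } else {
        snprintf(buf, len, "virtio-mmio@%016" PRIx64, base);
    }
}

/* Two transports collide if they produce identical dev paths. */
static bool paths_collide(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```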

This patch was tested on the aarch64 virt and xlnx-versal platforms.

Signed-off-by: schspa <schspa@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/virtio/virtio-mmio.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/virtio-mmio.c b/hw/virtio/virtio-mmio.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/virtio/virtio-mmio.c
+++ b/hw/virtio/virtio-mmio.c
@@ -XXX,XX +XXX,XX @@ static char *virtio_mmio_bus_get_dev_path(DeviceState *dev)
     BusState *virtio_mmio_bus;
     VirtIOMMIOProxy *virtio_mmio_proxy;
     char *proxy_path;
-    SysBusDevice *proxy_sbd;
     char *path;
+    MemoryRegionSection section;
 
     virtio_mmio_bus = qdev_get_parent_bus(dev);
     virtio_mmio_proxy = VIRTIO_MMIO(virtio_mmio_bus->parent);
@@ -XXX,XX +XXX,XX @@ static char *virtio_mmio_bus_get_dev_path(DeviceState *dev)
     }
 
     /* Otherwise, we append the base address of the transport. */
-    proxy_sbd = SYS_BUS_DEVICE(virtio_mmio_proxy);
-    assert(proxy_sbd->num_mmio == 1);
-    assert(proxy_sbd->mmio[0].memory == &virtio_mmio_proxy->iomem);
+    section = memory_region_find(&virtio_mmio_proxy->iomem, 0, 0x200);
+    assert(section.mr);
 
     if (proxy_path) {
         path = g_strdup_printf("%s/virtio-mmio@" TARGET_FMT_plx, proxy_path,
-                               proxy_sbd->mmio[0].addr);
+                               section.offset_within_address_space);
     } else {
         path = g_strdup_printf("virtio-mmio@" TARGET_FMT_plx,
-                               proxy_sbd->mmio[0].addr);
+                               section.offset_within_address_space);
     }
+    memory_region_unref(section.mr);
+
     g_free(proxy_path);
     return path;
 }
-- 
2.20.1

From: Peter Collingbourne <pcc@google.com>

Section D6.7 of the ARM ARM states:

For the purpose of determining Tag Check Fault handling, unprivileged
load and store instructions are treated as if executed at EL0 when
executed at either:
- EL1, when the Effective value of PSTATE.UAO is 0.
- EL2, when both the Effective value of HCR_EL2.{E2H, TGE} is {1, 1}
  and the Effective value of PSTATE.UAO is 0.

ARM has confirmed a defect in the pseudocode function
AArch64.TagCheckFault that makes it inconsistent with the above
wording. The remedy is to adjust references to PSTATE.EL in that
function to instead refer to AArch64.AccessUsesEL(acctype), so
that unprivileged instructions use SCTLR_EL1.TCF0 and TFSRE0_EL1.
The exception type for synchronous tag check faults remains unchanged.

This patch implements the described change by partially reverting
commits 50244cc76abc and cc97b0019bb5.
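After this change the TCF field is chosen from the translation regime of the access rather than from the current EL: EL0-regime accesses (the patch checks for ARMMMUIdx_E10_0 and ARMMMUIdx_E20_0) read TCF0 from SCTLR bits [39:38], and everything else reads TCF from bits [41:40]. A minimal sketch of that selection, where access_is_el0 is a hypothetical flag standing in for the MMU index check and extract_bits() mimics QEMU's extract64():

```c
#include <stdbool.h>
#include <stdint.h>

/* Extract len bits starting at bit start, like QEMU's extract64(). */
static unsigned extract_bits(uint64_t value, int start, int len)
{
    return (unsigned)((value >> start) & ((1ull << len) - 1));
}

/*
 * access_is_el0: true when the access uses an EL0 translation regime,
 * which includes unprivileged loads/stores executed at EL1 (or at EL2
 * with HCR_EL2.{E2H,TGE} == {1,1}) when the effective PSTATE.UAO is 0.
 */
static unsigned select_tcf(uint64_t sctlr, bool access_is_el0)
{
    return access_is_el0 ? extract_bits(sctlr, 38, 2)   /* TCF0 */
                         : extract_bits(sctlr, 40, 2);  /* TCF */
}
```

Note that the syndrome's EL != 0 bit still comes from the current EL, since the exception type for synchronous tag check faults is unchanged.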

Signed-off-by: Peter Collingbourne <pcc@google.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210219201820.2672077-1-pcc@google.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c     |  2 +-
 target/arm/mte_helper.c | 13 +++++++++----
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
         if (FIELD_EX32(flags, TBFLAG_A64, UNPRIV)
             && tbid
             && !(env->pstate & PSTATE_TCO)
-            && (sctlr & SCTLR_TCF)
+            && (sctlr & SCTLR_TCF0)
             && allocation_tag_access_enabled(env, 0, sctlr)) {
             flags = FIELD_DP32(flags, TBFLAG_A64, MTE0_ACTIVE, 1);
         }
diff --git a/target/arm/mte_helper.c b/target/arm/mte_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mte_helper.c
+++ b/target/arm/mte_helper.c
@@ -XXX,XX +XXX,XX @@ static void mte_check_fail(CPUARMState *env, uint32_t desc,
     reg_el = regime_el(env, arm_mmu_idx);
     sctlr = env->cp15.sctlr_el[reg_el];
 
-    el = arm_current_el(env);
-    if (el == 0) {
+    switch (arm_mmu_idx) {
+    case ARMMMUIdx_E10_0:
+    case ARMMMUIdx_E20_0:
+        el = 0;
         tcf = extract64(sctlr, 38, 2);
-    } else {
+        break;
+    default:
+        el = reg_el;
         tcf = extract64(sctlr, 40, 2);
     }
 
@@ -XXX,XX +XXX,XX @@ static void mte_check_fail(CPUARMState *env, uint32_t desc,
         env->exception.vaddress = dirty_ptr;
 
         is_write = FIELD_EX32(desc, MTEDESC, WRITE);
-        syn = syn_data_abort_no_iss(el != 0, 0, 0, 0, 0, is_write, 0x11);
+        syn = syn_data_abort_no_iss(arm_current_el(env) != 0, 0, 0, 0, 0,
+                                    is_write, 0x11);
         raise_exception(env, EXCP_DATA_ABORT, syn, exception_target_el(env));
         /* noreturn, but fall through to the assert anyway */
 
-- 
2.20.1

From: Philippe Mathieu-Daudé <f4bug@amsat.org>

IDAU is specific to M-profile. KVM only supports A-profile.
Restrict this interface to TCG, as it is pointless (and
confusing) on a KVM-only build.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20210221222617.2579610-2-f4bug@amsat.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.c     | 7 -------
 target/arm/cpu_tcg.c | 8 ++++++++
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static const TypeInfo arm_cpu_type_info = {
     .class_init = arm_cpu_class_init,
 };
 
-static const TypeInfo idau_interface_type_info = {
-    .name = TYPE_IDAU_INTERFACE,
-    .parent = TYPE_INTERFACE,
-    .class_size = sizeof(IDAUInterfaceClass),
-};
-
 static void arm_cpu_register_types(void)
 {
     const size_t cpu_count = ARRAY_SIZE(arm_cpus);
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_register_types(void)
     if (cpu_count) {
         size_t i;
 
-        type_register_static(&idau_interface_type_info);
         for (i = 0; i < cpu_count; ++i) {
             arm_cpu_register(&arm_cpus[i]);
         }
diff --git a/target/arm/cpu_tcg.c b/target/arm/cpu_tcg.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu_tcg.c
+++ b/target/arm/cpu_tcg.c
@@ -XXX,XX +XXX,XX @@
 #include "hw/core/tcg-cpu-ops.h"
 #endif /* CONFIG_TCG */
 #include "internals.h"
+#include "target/arm/idau.h"
 
 /* CPU models. These are not needed for the AArch64 linux-user build. */
 #if !defined(CONFIG_USER_ONLY) || !defined(TARGET_AARCH64)
@@ -XXX,XX +XXX,XX @@ static const ARMCPUInfo arm_tcg_cpus[] = {
     { .name = "pxa270-c5",   .initfn = pxa270c5_initfn },
 };
 
+static const TypeInfo idau_interface_type_info = {
+    .name = TYPE_IDAU_INTERFACE,
+    .parent = TYPE_INTERFACE,
+    .class_size = sizeof(IDAUInterfaceClass),
+};
+
 static void arm_tcg_cpu_register_types(void)
 {
     size_t i;
 
+    type_register_static(&idau_interface_type_info);
     for (i = 0; i < ARRAY_SIZE(arm_tcg_cpus); ++i) {
         arm_cpu_register(&arm_tcg_cpus[i]);
     }
-- 
2.20.1

From: Philippe Mathieu-Daudé <f4bug@amsat.org>

We will move this code in the next commit. Clean it up
first to avoid checkpatch.pl errors.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20210221222617.2579610-3-f4bug@amsat.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void cortex_a8_initfn(Object *obj)
 }
 
 static const ARMCPRegInfo cortexa9_cp_reginfo[] = {
-    /* power_control should be set to maximum latency. Again,
+    /*
+     * power_control should be set to maximum latency. Again,
      * default to 0 and set by private hook
      */
     { .name = "A9_PWRCTL", .cp = 15, .crn = 15, .crm = 0, .opc1 = 0, .opc2 = 0,
@@ -XXX,XX +XXX,XX @@ static void cortex_a9_initfn(Object *obj)
     set_feature(&cpu->env, ARM_FEATURE_NEON);
     set_feature(&cpu->env, ARM_FEATURE_THUMB2EE);
     set_feature(&cpu->env, ARM_FEATURE_EL3);
-    /* Note that A9 supports the MP extensions even for
+    /*
+     * Note that A9 supports the MP extensions even for
      * A9UP and single-core A9MP (which are both different
      * and valid configurations; we don't model A9UP).
      */
@@ -XXX,XX +XXX,XX @@ static uint64_t a15_l2ctlr_read(CPUARMState *env, const ARMCPRegInfo *ri)
 {
     MachineState *ms = MACHINE(qdev_get_machine());
 
-    /* Linux wants the number of processors from here.
+    /*
+     * Linux wants the number of processors from here.
      * Might as well set the interrupt-controller bit too.
      */
     return ((ms->smp.cpus - 1) << 24) | (1 << 23);
@@ -XXX,XX +XXX,XX @@ static void cortex_a7_initfn(Object *obj)
     cpu->isar.id_mmfr1 = 0x40000000;
     cpu->isar.id_mmfr2 = 0x01240000;
     cpu->isar.id_mmfr3 = 0x02102211;
-    /* a7_mpcore_r0p5_trm, page 4-4 gives 0x01101110; but
+    /*
+     * a7_mpcore_r0p5_trm, page 4-4 gives 0x01101110; but
      * table 4-41 gives 0x02101110, which includes the arm div insns.
      */
     cpu->isar.id_isar0 = 0x02101110;
-- 
2.20.1

For a long time now the UI layer has guaranteed that the console
surface is always 32 bits per pixel RGB. Remove the legacy dead
code from the musicpal display device which was handling the
possibility that the console surface was some other format.
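With only the 32bpp path left, set_lcd_pixel32() draws each of the 128x64 LCD pixels as a 3x3 block on a surface that is 128 * 3 pixels wide. The pointer arithmetic can be checked in isolation with the sketch below; put_scaled_pixel() and the LCD_W/LCD_H/SCALE constants are hypothetical names for illustration, mirroring the patch's loop.

```c
#include <stdint.h>

#define LCD_W 128   /* LCD width in pixels, as in the patch's loops */
#define LCD_H 64    /* LCD height in pixels */
#define SCALE 3     /* each LCD pixel becomes a 3x3 block */

/*
 * Start at ((y * LCD_W * SCALE) + x) * SCALE, write SCALE pixels, then
 * skip (LCD_W - 1) * SCALE pixels to reach the next row of the block.
 */
static void put_scaled_pixel(uint32_t *surface, int x, int y, uint32_t col)
{
    uint32_t *pixel = &surface[(y * LCD_W * SCALE + x) * SCALE];

    for (int dy = 0; dy < SCALE; dy++, pixel += (LCD_W - 1) * SCALE) {
        for (int dx = 0; dx < SCALE; dx++, pixel++) {
            *pixel = col;
        }
    }
}
```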

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210215103215.4944-2-peter.maydell@linaro.org
---
 hw/arm/musicpal.c | 64 ++++++++++++++++++-----------------------------
 1 file changed, 24 insertions(+), 40 deletions(-)

diff --git a/hw/arm/musicpal.c b/hw/arm/musicpal.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/musicpal.c
+++ b/hw/arm/musicpal.c
@@ -XXX,XX +XXX,XX @@ static uint8_t scale_lcd_color(musicpal_lcd_state *s, uint8_t col)
     }
 }
 
-#define SET_LCD_PIXEL(depth, type) \
-static inline void glue(set_lcd_pixel, depth) \
-        (musicpal_lcd_state *s, int x, int y, type col) \
-{ \
-    int dx, dy; \
-    DisplaySurface *surface = qemu_console_surface(s->con); \
-    type *pixel = &((type *) surface_data(surface))[(y * 128 * 3 + x) * 3]; \
-\
-    for (dy = 0; dy < 3; dy++, pixel += 127 * 3) \
-        for (dx = 0; dx < 3; dx++, pixel++) \
-            *pixel = col; \
+static inline void set_lcd_pixel32(musicpal_lcd_state *s,
+                                   int x, int y, uint32_t col)
+{
+    int dx, dy;
+    DisplaySurface *surface = qemu_console_surface(s->con);
+    uint32_t *pixel =
+        &((uint32_t *) surface_data(surface))[(y * 128 * 3 + x) * 3];
+
+    for (dy = 0; dy < 3; dy++, pixel += 127 * 3) {
+        for (dx = 0; dx < 3; dx++, pixel++) {
+            *pixel = col;
+        }
+    }
 }
-SET_LCD_PIXEL(8, uint8_t)
-SET_LCD_PIXEL(16, uint16_t)
-SET_LCD_PIXEL(32, uint32_t)
 
 static void lcd_refresh(void *opaque)
 {
     musicpal_lcd_state *s = opaque;
-    DisplaySurface *surface = qemu_console_surface(s->con);
     int x, y, col;
 
-    switch (surface_bits_per_pixel(surface)) {
-    case 0:
-        return;
-#define LCD_REFRESH(depth, func) \
-    case depth: \
-        col = func(scale_lcd_color(s, (MP_LCD_TEXTCOLOR >> 16) & 0xff), \
-                   scale_lcd_color(s, (MP_LCD_TEXTCOLOR >> 8) & 0xff), \
-                   scale_lcd_color(s, MP_LCD_TEXTCOLOR & 0xff)); \
-        for (x = 0; x < 128; x++) { \
-            for (y = 0; y < 64; y++) { \
-                if (s->video_ram[x + (y/8)*128] & (1 << (y % 8))) { \
-                    glue(set_lcd_pixel, depth)(s, x, y, col); \
-                } else { \
-                    glue(set_lcd_pixel, depth)(s, x, y, 0); \
-                } \
-            } \
-        } \
-        break;
-    LCD_REFRESH(8, rgb_to_pixel8)
-    LCD_REFRESH(16, rgb_to_pixel16)
-    LCD_REFRESH(32, (is_surface_bgr(surface) ?
-                     rgb_to_pixel32bgr : rgb_to_pixel32))
-    default:
-        hw_error("unsupported colour depth %i\n",
-                 surface_bits_per_pixel(surface));
+    col = rgb_to_pixel32(scale_lcd_color(s, (MP_LCD_TEXTCOLOR >> 16) & 0xff),
+                         scale_lcd_color(s, (MP_LCD_TEXTCOLOR >> 8) & 0xff),
+                         scale_lcd_color(s, MP_LCD_TEXTCOLOR & 0xff));
+    for (x = 0; x < 128; x++) {
+        for (y = 0; y < 64; y++) {
+            if (s->video_ram[x + (y / 8) * 128] & (1 << (y % 8))) {
+                set_lcd_pixel32(s, x, y, col);
+            } else {
+                set_lcd_pixel32(s, x, y, 0);
+            }
+        }
     }
 
     dpy_gfx_update(s->con, 0, 0, 128*3, 64*3);
-- 
2.20.1

For a long time now the UI layer has guaranteed that the console
surface is always 32 bits per pixel RGB. Remove the legacy dead
code from the tc6393xb display device which was handling the
possibility that the console surface was some other format.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210215103215.4944-3-peter.maydell@linaro.org
---
 include/ui/console.h  | 10 ----------
 hw/display/tc6393xb.c | 33 +--------------------------------
 2 files changed, 1 insertion(+), 42 deletions(-)

diff --git a/include/ui/console.h b/include/ui/console.h
index XXXXXXX..XXXXXXX 100644
--- a/include/ui/console.h
+++ b/include/ui/console.h
@@ -XXX,XX +XXX,XX @@ PixelFormat qemu_default_pixelformat(int bpp);
 DisplaySurface *qemu_create_displaysurface(int width, int height);
 void qemu_free_displaysurface(DisplaySurface *surface);
 
-static inline int is_surface_bgr(DisplaySurface *surface)
-{
-    if (PIXMAN_FORMAT_BPP(surface->format) == 32 &&
-        PIXMAN_FORMAT_TYPE(surface->format) == PIXMAN_TYPE_ABGR) {
-        return 1;
-    } else {
-        return 0;
-    }
-}
-
 static inline int is_buffer_shared(DisplaySurface *surface)
 {
     return !(surface->flags & QEMU_ALLOCATED_FLAG);
diff --git a/hw/display/tc6393xb.c b/hw/display/tc6393xb.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/display/tc6393xb.c
+++ b/hw/display/tc6393xb.c
@@ -XXX,XX +XXX,XX @@ static void tc6393xb_nand_writeb(TC6393xbState *s, hwaddr addr, uint32_t value)
                                         (uint32_t) addr, value & 0xff);
 }
 
-#define BITS 8
-#include "tc6393xb_template.h"
-#define BITS 15
-#include "tc6393xb_template.h"
-#define BITS 16
-#include "tc6393xb_template.h"
-#define BITS 24
-#include "tc6393xb_template.h"
 #define BITS 32
 #include "tc6393xb_template.h"
 
 static void tc6393xb_draw_graphic(TC6393xbState *s, int full_update)
 {
-    DisplaySurface *surface = qemu_console_surface(s->con);
-
-    switch (surface_bits_per_pixel(surface)) {
-        case 8:
-            tc6393xb_draw_graphic8(s);
-            break;
-        case 15:
-            tc6393xb_draw_graphic15(s);
-            break;
-        case 16:
-            tc6393xb_draw_graphic16(s);
-            break;
-        case 24:
-            tc6393xb_draw_graphic24(s);
-            break;
-        case 32:
-            tc6393xb_draw_graphic32(s);
-            break;
-        default:
-            printf("tc6393xb: unknown depth %d\n",
-                   surface_bits_per_pixel(surface));
-            return;
-    }
-
+    tc6393xb_draw_graphic32(s);
     dpy_gfx_update_full(s->con);
 }
 
-- 
2.20.1

Now that the template header is included only for BITS==32, expand
out all the macros that depended on the BITS setting.
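For BITS==32 the macros reduce to converting each 16-bit RGB565 VRAM word into a 32-bit pixel with rgb_to_pixel32() (assumed here to produce 0x00RRGGBB) and storing it with a plain uint32_t write. The channel expansion can be sketched on its own as below; rgb565_to_xrgb8888() is a hypothetical name, and the per-channel expressions are a restatement written with each field extracted first, not the patch's literal expressions.

```c
#include <stdint.h>

/*
 * Expand an RGB565 pixel to 0x00RRGGBB by bit replication:
 * 5-bit channels map 0..31 onto 0..255, the 6-bit channel 0..63 onto 0..255.
 */
static uint32_t rgb565_to_xrgb8888(uint16_t color)
{
    uint32_t r5 = (color >> 11) & 0x1f;
    uint32_t g6 = (color >> 5) & 0x3f;
    uint32_t b5 = color & 0x1f;

    uint32_t r = (r5 * 0x21) >> 2;  /* same scaling the patch uses for blue */
    uint32_t g = (g6 * 0x41) >> 4;  /* equivalent to the patch's green term */
    uint32_t b = (b5 * 0x21) >> 2;

    return (r << 16) | (g << 8) | b;
}
```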

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210215103215.4944-4-peter.maydell@linaro.org
---
 hw/display/tc6393xb_template.h | 35 ++++------------------------------
 1 file changed, 4 insertions(+), 31 deletions(-)

diff --git a/hw/display/tc6393xb_template.h b/hw/display/tc6393xb_template.h
index XXXXXXX..XXXXXXX 100644
--- a/hw/display/tc6393xb_template.h
+++ b/hw/display/tc6393xb_template.h
@@ -XXX,XX +XXX,XX @@
  * with this program; if not, see <http://www.gnu.org/licenses/>.
  */
 
-#if BITS == 8
-# define SET_PIXEL(addr, color)  (*(uint8_t *)addr = color)
-#elif BITS == 15 || BITS == 16
-# define SET_PIXEL(addr, color)  (*(uint16_t *)addr = color)
-#elif BITS == 24
-# define SET_PIXEL(addr, color)  \
-    do {                         \
-        addr[0] = color;         \
-        addr[1] = (color) >> 8;  \
-        addr[2] = (color) >> 16; \
-    } while (0)
-#elif BITS == 32
-# define SET_PIXEL(addr, color)  (*(uint32_t *)addr = color)
-#else
-# error unknown bit depth
-#endif
-
-
-static void glue(tc6393xb_draw_graphic, BITS)(TC6393xbState *s)
+static void tc6393xb_draw_graphic32(TC6393xbState *s)
 {
     DisplaySurface *surface = qemu_console_surface(s->con);
     int i;
@@ -XXX,XX +XXX,XX @@ static void glue(tc6393xb_draw_graphic, BITS)(TC6393xbState *s)
     data_buffer = s->vram_ptr;
     data_display = surface_data(surface);
     for(i = 0; i < s->scr_height; i++) {
-#if (BITS == 16)
-        memcpy(data_display, data_buffer, s->scr_width * 2);
-        data_buffer += s->scr_width;
-        data_display += surface_stride(surface);
-#else
         int j;
-        for (j = 0; j < s->scr_width; j++, data_display += BITS / 8, data_buffer++) {
+        for (j = 0; j < s->scr_width; j++, data_display += 4, data_buffer++) {
             uint16_t color = *data_buffer;
-            uint32_t dest_color = glue(rgb_to_pixel, BITS)(
+            uint32_t dest_color = rgb_to_pixel32(
                            ((color & 0xf800) * 0x108) >> 11,
                            ((color & 0x7e0) * 0x41) >> 9,
                            ((color & 0x1f) * 0x21) >> 2
                            );
-            SET_PIXEL(data_display, dest_color);
+            *(uint32_t *)data_display = dest_color;
         }
-#endif
     }
 }
-
-#undef BITS
-#undef SET_PIXEL
-- 
2.20.1

The function tc6393xb_draw_graphic32() is called in exactly one place,
so just inline the function body at its callsite. This allows us to
drop the template header entirely.

The code move includes a single added space after 'for' to fix
the coding style.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20210215103215.4944-5-peter.maydell@linaro.org
---
 hw/display/tc6393xb_template.h | 45 ----------------------------------
 hw/display/tc6393xb.c          | 23 ++++++++++++++---
 2 files changed, 19 insertions(+), 49 deletions(-)
 delete mode 100644 hw/display/tc6393xb_template.h

diff --git a/hw/display/tc6393xb_template.h b/hw/display/tc6393xb_template.h
deleted file mode 100644
index XXXXXXX..XXXXXXX
--- a/hw/display/tc6393xb_template.h
+++ /dev/null
@@ -XXX,XX +XXX,XX @@
-/*
- * Toshiba TC6393XB I/O Controller.
- * Found in Sharp Zaurus SL-6000 (tosa) or some
- * Toshiba e-Series PDAs.
- *
- * FB support code. Based on G364 fb emulator
- *
- * Copyright (c) 2007 Hervé Poussineau
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License as
- * published by the Free Software Foundation; either version 2 of
- * the License, or (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program; if not, see <http://www.gnu.org/licenses/>.
- */
-
-static void tc6393xb_draw_graphic32(TC6393xbState *s)
-{
-    DisplaySurface *surface = qemu_console_surface(s->con);
-    int i;
-    uint16_t *data_buffer;
-    uint8_t *data_display;
-
-    data_buffer = s->vram_ptr;
-    data_display = surface_data(surface);
-    for(i = 0; i < s->scr_height; i++) {
-        int j;
-        for (j = 0; j < s->scr_width; j++, data_display += 4, data_buffer++) {
-            uint16_t color = *data_buffer;
-            uint32_t dest_color = rgb_to_pixel32(
-                           ((color & 0xf800) * 0x108) >> 11,
-                           ((color & 0x7e0) * 0x41) >> 9,
-                           ((color & 0x1f) * 0x21) >> 2
-                           );
-            *(uint32_t *)data_display = dest_color;
-        }
-    }
-}
diff --git a/hw/display/tc6393xb.c b/hw/display/tc6393xb.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/display/tc6393xb.c
+++ b/hw/display/tc6393xb.c
@@ -XXX,XX +XXX,XX @@ static void tc6393xb_nand_writeb(TC6393xbState *s, hwaddr addr, uint32_t value)
                                         (uint32_t) addr, value & 0xff);
 }
 
-#define BITS 32
-#include "tc6393xb_template.h"
-
 static void tc6393xb_draw_graphic(TC6393xbState *s, int full_update)
 {
-    tc6393xb_draw_graphic32(s);
+    DisplaySurface *surface = qemu_console_surface(s->con);
+    int i;
+    uint16_t *data_buffer;
+    uint8_t *data_display;
+
+    data_buffer = s->vram_ptr;
+    data_display = surface_data(surface);
+    for (i = 0; i < s->scr_height; i++) {
+        int j;
+        for (j = 0; j < s->scr_width; j++, data_display += 4, data_buffer++) {
+            uint16_t color = *data_buffer;
+            uint32_t dest_color = rgb_to_pixel32(
+                           ((color & 0xf800) * 0x108) >> 11,
+                           ((color & 0x7e0) * 0x41) >> 9,
+                           ((color & 0x1f) * 0x21) >> 2
+                           );
+            *(uint32_t *)data_display = dest_color;
+        }
+    }
     dpy_gfx_update_full(s->con);
 }
 
-- 
2.20.1

The omap_lcdc template header is already only included once, for
DEPTH==32, but it still has all the macro-driven parameterization
for other depths. Expand out all the macros in the header.
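The pattern being removed works through glue(), a two-level token-pasting macro. Because of the extra level, the preprocessor expands DEPTH to 32 before ## joins the pieces, so glue(draw_line8_, DEPTH) becomes draw_line8_32. That is the same identifier the patch now writes out by hand.

The standalone sketch below shows this. It makes a few assumptions, so treat it as an illustration rather than the QEMU code:

* It reproduces glue()/xglue() locally.
* It uses a local rgb_to_pixel32() that packs 0x00RRGGBB.
* It simplifies the signature: the palette is passed directly and there is no deststep.

```c
#include <assert.h>
#include <stdint.h>

/* Two-level token pasting, modeled on QEMU's glue()/xglue() helpers */
#define xglue(x, y) x ## y
#define glue(x, y) xglue(x, y)

/* Local stand-in for rgb_to_pixel32(): packs as 0x00RRGGBB */
static inline uint32_t rgb_to_pixel32(unsigned r, unsigned g, unsigned b)
{
    return (r << 16) | (g << 8) | b;
}

#define DEPTH 32

/*
 * "Template" form: the preprocessor rewrites this name to draw_line8_32.
 * Simplified, hypothetical signature: palette passed directly, no deststep.
 */
static void glue(draw_line8_, DEPTH)(const uint16_t *pal, uint32_t *d,
                                      const uint8_t *s, int width)
{
    do {
        uint16_t c = pal[*s];
        uint8_t r = (c >> 4) & 0xf0;
        uint8_t g = c & 0xf0;
        uint8_t b = (c << 4) & 0xf0;
        *d++ = rgb_to_pixel32(r, g, b);
        s++;
    } while (--width != 0);
}

#undef DEPTH
```

Callers can refer to draw_line8_32 directly, which is all the expanded header needs once DEPTH can only ever be 32.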

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20210215103215.4944-6-peter.maydell@linaro.org
---
 hw/display/omap_lcd_template.h | 67 ++++++++++++++--------------------
 1 file changed, 28 insertions(+), 39 deletions(-)

diff --git a/hw/display/omap_lcd_template.h b/hw/display/omap_lcd_template.h
index XXXXXXX..XXXXXXX 100644
--- a/hw/display/omap_lcd_template.h
+++ b/hw/display/omap_lcd_template.h
@@ -XXX,XX +XXX,XX @@
  * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
-#if DEPTH == 32
-# define BPP 4
-# define PIXEL_TYPE uint32_t
-#else
-# error unsupport depth
-#endif
-
 /*
  * 2-bit colour
  */
-static void glue(draw_line2_, DEPTH)(void *opaque,
-                uint8_t *d, const uint8_t *s, int width, int deststep)
+static void draw_line2_32(void *opaque, uint8_t *d, const uint8_t *s,
+                          int width, int deststep)
 {
     uint16_t *pal = opaque;
     uint8_t v, r, g, b;
@@ -XXX,XX +XXX,XX @@ static void glue(draw_line2_, DEPTH)(void *opaque,
         r = (pal[v & 3] >> 4) & 0xf0;
         g = pal[v & 3] & 0xf0;
         b = (pal[v & 3] << 4) & 0xf0;
-        ((PIXEL_TYPE *) d)[0] = glue(rgb_to_pixel, DEPTH)(r, g, b);
-        d += BPP;
+        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
+        d += 4;
         v >>= 2;
         r = (pal[v & 3] >> 4) & 0xf0;
         g = pal[v & 3] & 0xf0;
         b = (pal[v & 3] << 4) & 0xf0;
-        ((PIXEL_TYPE *) d)[0] = glue(rgb_to_pixel, DEPTH)(r, g, b);
-        d += BPP;
+        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
+        d += 4;
         v >>= 2;
         r = (pal[v & 3] >> 4) & 0xf0;
         g = pal[v & 3] & 0xf0;
         b = (pal[v & 3] << 4) & 0xf0;
-        ((PIXEL_TYPE *) d)[0] = glue(rgb_to_pixel, DEPTH)(r, g, b);
-        d += BPP;
+        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
+        d += 4;
         v >>= 2;
         r = (pal[v & 3] >> 4) & 0xf0;
         g = pal[v & 3] & 0xf0;
         b = (pal[v & 3] << 4) & 0xf0;
-        ((PIXEL_TYPE *) d)[0] = glue(rgb_to_pixel, DEPTH)(r, g, b);
-        d += BPP;
+        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
+        d += 4;
         s ++;
         width -= 4;
     } while (width > 0);
@@ -XXX,XX +XXX,XX @@ static void glue(draw_line2_, DEPTH)(void *opaque,
 /*
  * 4-bit colour
  */
-static void glue(draw_line4_, DEPTH)(void *opaque,
-                uint8_t *d, const uint8_t *s, int width, int deststep)
+static void draw_line4_32(void *opaque, uint8_t *d, const uint8_t *s,
+                          int width, int deststep)
 {
     uint16_t *pal = opaque;
     uint8_t v, r, g, b;
@@ -XXX,XX +XXX,XX @@ static void glue(draw_line4_, DEPTH)(void *opaque,
         r = (pal[v & 0xf] >> 4) & 0xf0;
         g = pal[v & 0xf] & 0xf0;
         b = (pal[v & 0xf] << 4) & 0xf0;
-        ((PIXEL_TYPE *) d)[0] = glue(rgb_to_pixel, DEPTH)(r, g, b);
-        d += BPP;
+        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
+        d += 4;
         v >>= 4;
         r = (pal[v & 0xf] >> 4) & 0xf0;
         g = pal[v & 0xf] & 0xf0;
         b = (pal[v & 0xf] << 4) & 0xf0;
-        ((PIXEL_TYPE *) d)[0] = glue(rgb_to_pixel, DEPTH)(r, g, b);
-        d += BPP;
+        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
+        d += 4;
         s ++;
         width -= 2;
     } while (width > 0);
@@ -XXX,XX +XXX,XX @@ static void glue(draw_line4_, DEPTH)(void *opaque,
 /*
  * 8-bit colour
  */
-static void glue(draw_line8_, DEPTH)(void *opaque,
-                uint8_t *d, const uint8_t *s, int width, int deststep)
+static void draw_line8_32(void *opaque, uint8_t *d, const uint8_t *s,
+                          int width, int deststep)
 {
     uint16_t *pal = opaque;
     uint8_t v, r, g, b;
@@ -XXX,XX +XXX,XX @@ static void glue(draw_line8_, DEPTH)(void *opaque,
         r = (pal[v] >> 4) & 0xf0;
         g = pal[v] & 0xf0;
         b = (pal[v] << 4) & 0xf0;
-        ((PIXEL_TYPE *) d)[0] = glue(rgb_to_pixel, DEPTH)(r, g, b);
+        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
         s ++;
-        d += BPP;
+        d += 4;
     } while (-- width != 0);
 }
 
 /*
  * 12-bit colour
  */
-static void glue(draw_line12_, DEPTH)(void *opaque,
-                uint8_t *d, const uint8_t *s, int width, int deststep)
+static void draw_line12_32(void *opaque, uint8_t *d, const uint8_t *s,
+                           int width, int deststep)
 {
     uint16_t v;
     uint8_t r, g, b;
@@ -XXX,XX +XXX,XX @@ static void glue(draw_line12_, DEPTH)(void *opaque,
         r = (v >> 4) & 0xf0;
         g = v & 0xf0;
         b = (v << 4) & 0xf0;
-        ((PIXEL_TYPE *) d)[0] = glue(rgb_to_pixel, DEPTH)(r, g, b);
+        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
         s += 2;
-        d += BPP;
+        d += 4;
     } while (-- width != 0);
 }
 
 /*
  * 16-bit colour
  */
-static void glue(draw_line16_, DEPTH)(void *opaque,
-                uint8_t *d, const uint8_t *s, int width, int deststep)
+static void draw_line16_32(void *opaque, uint8_t *d, const uint8_t *s,
+                           int width, int deststep)
 {
 #if defined(HOST_WORDS_BIGENDIAN) == defined(TARGET_WORDS_BIGENDIAN)
     memcpy(d, s, width * 2);
@@ -XXX,XX +XXX,XX @@ static void glue(draw_line16_, DEPTH)(void *opaque,
         r = (v >> 8) & 0xf8;
         g = (v >> 3) & 0xfc;
         b = (v << 3) & 0xf8;
-        ((PIXEL_TYPE *) d)[0] = glue(rgb_to_pixel, DEPTH)(r, g, b);
+        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
         s += 2;
-        d += BPP;
+        d += 4;
     } while (-- width != 0);
 #endif
 }
-
-#undef DEPTH
-#undef BPP
-#undef PIXEL_TYPE
-- 
2.20.1

The draw_line16_32() function in the omap_lcdc template header
includes an ifdef for the case where HOST_WORDS_BIGENDIAN matches
TARGET_WORDS_BIGENDIAN.  This is trying to optimise for "source
bitmap and destination bitmap format match", but it is broken,
because in this function the formats don't match: the source is
16-bit colour and the destination is 32-bit colour, so a memcpy()
will produce corrupted graphics output.  Drop the bogus ifdef.
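The memcpy() shortcut fails here because the source and destination pixels differ in size. Each source pixel is 2 bytes of RGB565, but each destination pixel is 4 bytes. Copying width * 2 bytes therefore fills only half of the destination line, and with unconverted data.

Below is a minimal standalone sketch comparing the two approaches. Assumptions:

* The two function names are hypothetical.
* rgb_to_pixel32() is a local stand-in that packs 0x00RRGGBB.
* The byte-wise little-endian load stands in for lduw_le_p().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in for rgb_to_pixel32(): packs as 0x00RRGGBB */
static inline uint32_t rgb_to_pixel32(unsigned r, unsigned g, unsigned b)
{
    return (r << 16) | (g << 8) | b;
}

/* Bogus fast path: copies only width * 2 bytes into a 32bpp line */
static void draw_line16_32_memcpy(uint8_t *d, const uint8_t *s, int width)
{
    memcpy(d, s, (size_t)width * 2);
}

/* Correct path: read each little-endian RGB565 pixel and widen it */
static void draw_line16_32_convert(uint8_t *d, const uint8_t *s, int width)
{
    do {
        uint16_t v = s[0] | (s[1] << 8);   /* stands in for lduw_le_p() */
        uint8_t r = (v >> 8) & 0xf8;
        uint8_t g = (v >> 3) & 0xfc;
        uint8_t b = (v << 3) & 0xf8;
        uint32_t pixel = rgb_to_pixel32(r, g, b);
        memcpy(d, &pixel, sizeof(pixel));  /* host-endian 32-bit store */
        s += 2;
        d += 4;
    } while (--width != 0);
}
```

Consider a two-pixel line. The converting version writes 0x00f80000 (pure red) and 0x00f8fcf8 (white). The memcpy() version puts raw 16-bit data into the first destination pixel and never touches the second.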

This bug was introduced in commit ea644cf343129, when we dropped
support for DEPTH values other than 32 from the template header.
The old #if line was
  #if DEPTH == 16 && defined(HOST_WORDS_BIGENDIAN) == defined(TARGET_WORDS_BIGENDIAN)
and this was mistakenly changed to
  #if defined(HOST_WORDS_BIGENDIAN) == defined(TARGET_WORDS_BIGENDIAN)
rather than the whole #if being deleted, since with DEPTH always 32
its condition could never be true.

Fixes: ea644cf343129
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20210215103215.4944-7-peter.maydell@linaro.org
---
 hw/display/omap_lcd_template.h | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/hw/display/omap_lcd_template.h b/hw/display/omap_lcd_template.h
index XXXXXXX..XXXXXXX 100644
--- a/hw/display/omap_lcd_template.h
+++ b/hw/display/omap_lcd_template.h
@@ -XXX,XX +XXX,XX @@ static void draw_line12_32(void *opaque, uint8_t *d, const uint8_t *s,
 static void draw_line16_32(void *opaque, uint8_t *d, const uint8_t *s,
                            int width, int deststep)
 {
-#if defined(HOST_WORDS_BIGENDIAN) == defined(TARGET_WORDS_BIGENDIAN)
-    memcpy(d, s, width * 2);
-#else
     uint16_t v;
     uint8_t r, g, b;
 
@@ -XXX,XX +XXX,XX @@ static void draw_line16_32(void *opaque, uint8_t *d, const uint8_t *s,
         s += 2;
         d += 4;
     } while (-- width != 0);
-#endif
 }
-- 
2.20.1

Fix some minor coding style issues in the template header,
so checkpatch doesn't complain when we move the code.

diff --git a/hw/display/omap_lcd_template.h b/hw/display/omap_lcd_template.h
index XXXXXXX..XXXXXXX 100644
--- a/hw/display/omap_lcd_template.h
+++ b/hw/display/omap_lcd_template.h
@@ -XXX,XX +XXX,XX @@ static void draw_line2_32(void *opaque, uint8_t *d, const uint8_t *s,
         b = (pal[v & 3] << 4) & 0xf0;
         ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
         d += 4;
-        s ++;
+        s++;
         width -= 4;
     } while (width > 0);
 }
@@ -XXX,XX +XXX,XX @@ static void draw_line4_32(void *opaque, uint8_t *d, const uint8_t *s,
         b = (pal[v & 0xf] << 4) & 0xf0;
         ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
         d += 4;
-        s ++;
+        s++;
         width -= 2;
     } while (width > 0);
 }
@@ -XXX,XX +XXX,XX @@ static void draw_line8_32(void *opaque, uint8_t *d, const uint8_t *s,
         g = pal[v] & 0xf0;
         b = (pal[v] << 4) & 0xf0;
         ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
-        s ++;
+        s++;
         d += 4;
     } while (-- width != 0);
 }
-- 
2.20.1

We only include the template header once, so just inline it into the
source file for the device.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20210215103215.4944-9-peter.maydell@linaro.org
---
 hw/display/omap_lcd_template.h | 154 ---------------------------------
 hw/display/omap_lcdc.c         | 127 ++++++++++++++++++++++++++-
 2 files changed, 125 insertions(+), 156 deletions(-)
 delete mode 100644 hw/display/omap_lcd_template.h

diff --git a/hw/display/omap_lcd_template.h b/hw/display/omap_lcd_template.h
deleted file mode 100644
index XXXXXXX..XXXXXXX
--- a/hw/display/omap_lcd_template.h
+++ /dev/null
@@ -XXX,XX +XXX,XX @@
-/*
- * QEMU OMAP LCD Emulator templates
- *
- * Copyright (c) 2006 Andrzej Zaborowski  <balrog@zabor.org>
- *
- * Redistribution and use in source and binary forms, with or without
- * modification, are permitted provided that the following conditions
- * are met:
- *
- * 1. Redistributions of source code must retain the above copyright
- *    notice, this list of conditions and the following disclaimer.
- * 2. Redistributions in binary form must reproduce the above copyright
- *    notice, this list of conditions and the following disclaimer in
- *    the documentation and/or other materials provided with the
- *    distribution.
- *
- * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS''
- * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
- * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
- * PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR
- * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
- * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
- * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
- * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
- * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-/*
- * 2-bit colour
- */
-static void draw_line2_32(void *opaque, uint8_t *d, const uint8_t *s,
-                          int width, int deststep)
-{
-    uint16_t *pal = opaque;
-    uint8_t v, r, g, b;
-
-    do {
-        v = ldub_p((void *) s);
-        r = (pal[v & 3] >> 4) & 0xf0;
-        g = pal[v & 3] & 0xf0;
-        b = (pal[v & 3] << 4) & 0xf0;
-        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
-        d += 4;
-        v >>= 2;
-        r = (pal[v & 3] >> 4) & 0xf0;
-        g = pal[v & 3] & 0xf0;
-        b = (pal[v & 3] << 4) & 0xf0;
-        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
-        d += 4;
-        v >>= 2;
-        r = (pal[v & 3] >> 4) & 0xf0;
-        g = pal[v & 3] & 0xf0;
-        b = (pal[v & 3] << 4) & 0xf0;
-        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
-        d += 4;
-        v >>= 2;
-        r = (pal[v & 3] >> 4) & 0xf0;
-        g = pal[v & 3] & 0xf0;
-        b = (pal[v & 3] << 4) & 0xf0;
-        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
-        d += 4;
-        s++;
-        width -= 4;
-    } while (width > 0);
-}
-
-/*
- * 4-bit colour
- */
-static void draw_line4_32(void *opaque, uint8_t *d, const uint8_t *s,
-                          int width, int deststep)
-{
-    uint16_t *pal = opaque;
-    uint8_t v, r, g, b;
-
-    do {
-        v = ldub_p((void *) s);
-        r = (pal[v & 0xf] >> 4) & 0xf0;
-        g = pal[v & 0xf] & 0xf0;
-        b = (pal[v & 0xf] << 4) & 0xf0;
-        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
-        d += 4;
-        v >>= 4;
-        r = (pal[v & 0xf] >> 4) & 0xf0;
-        g = pal[v & 0xf] & 0xf0;
-        b = (pal[v & 0xf] << 4) & 0xf0;
-        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
-        d += 4;
-        s++;
-        width -= 2;
-    } while (width > 0);
-}
-
-/*
- * 8-bit colour
- */
-static void draw_line8_32(void *opaque, uint8_t *d, const uint8_t *s,
-                          int width, int deststep)
-{
-    uint16_t *pal = opaque;
-    uint8_t v, r, g, b;
-
-    do {
-        v = ldub_p((void *) s);
-        r = (pal[v] >> 4) & 0xf0;
-        g = pal[v] & 0xf0;
-        b = (pal[v] << 4) & 0xf0;
-        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
-        s++;
-        d += 4;
-    } while (-- width != 0);
-}
-
-/*
- * 12-bit colour
- */
-static void draw_line12_32(void *opaque, uint8_t *d, const uint8_t *s,
-                           int width, int deststep)
-{
-    uint16_t v;
-    uint8_t r, g, b;
-
-    do {
-        v = lduw_le_p((void *) s);
-        r = (v >> 4) & 0xf0;
-        g = v & 0xf0;
-        b = (v << 4) & 0xf0;
-        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
-        s += 2;
-        d += 4;
-    } while (-- width != 0);
-}
-
-/*
- * 16-bit colour
- */
-static void draw_line16_32(void *opaque, uint8_t *d, const uint8_t *s,
-                           int width, int deststep)
-{
-    uint16_t v;
-    uint8_t r, g, b;
-
-    do {
-        v = lduw_le_p((void *) s);
-        r = (v >> 8) & 0xf8;
-        g = (v >> 3) & 0xfc;
-        b = (v << 3) & 0xf8;
-        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
-        s += 2;
-        d += 4;
-    } while (-- width != 0);
-}
diff --git a/hw/display/omap_lcdc.c b/hw/display/omap_lcdc.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/display/omap_lcdc.c
+++ b/hw/display/omap_lcdc.c
@@ -XXX,XX +XXX,XX @@ static void omap_lcd_interrupts(struct omap_lcd_panel_s *s)
 
 #define draw_line_func drawfn
 
-#define DEPTH 32
-#include "omap_lcd_template.h"
+/*
+ * 2-bit colour
+ */
+static void draw_line2_32(void *opaque, uint8_t *d, const uint8_t *s,
+                          int width, int deststep)
+{
+    uint16_t *pal = opaque;
+    uint8_t v, r, g, b;
+
+    do {
+        v = ldub_p((void *) s);
+        r = (pal[v & 3] >> 4) & 0xf0;
+        g = pal[v & 3] & 0xf0;
+        b = (pal[v & 3] << 4) & 0xf0;
+        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
+        d += 4;
+        v >>= 2;
+        r = (pal[v & 3] >> 4) & 0xf0;
+        g = pal[v & 3] & 0xf0;
+        b = (pal[v & 3] << 4) & 0xf0;
+        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
+        d += 4;
+        v >>= 2;
+        r = (pal[v & 3] >> 4) & 0xf0;
+        g = pal[v & 3] & 0xf0;
+        b = (pal[v & 3] << 4) & 0xf0;
+        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
+        d += 4;
+        v >>= 2;
+        r = (pal[v & 3] >> 4) & 0xf0;
+        g = pal[v & 3] & 0xf0;
+        b = (pal[v & 3] << 4) & 0xf0;
+        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
+        d += 4;
+        s++;
+        width -= 4;
+    } while (width > 0);
+}
+
+/*
+ * 4-bit colour
+ */
+static void draw_line4_32(void *opaque, uint8_t *d, const uint8_t *s,
+                          int width, int deststep)
+{
+    uint16_t *pal = opaque;
+    uint8_t v, r, g, b;
+
+    do {
+        v = ldub_p((void *) s);
+        r = (pal[v & 0xf] >> 4) & 0xf0;
+        g = pal[v & 0xf] & 0xf0;
+        b = (pal[v & 0xf] << 4) & 0xf0;
+        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
+        d += 4;
+        v >>= 4;
+        r = (pal[v & 0xf] >> 4) & 0xf0;
+        g = pal[v & 0xf] & 0xf0;
+        b = (pal[v & 0xf] << 4) & 0xf0;
+        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
+        d += 4;
+        s++;
+        width -= 2;
+    } while (width > 0);
+}
+
+/*
+ * 8-bit colour
+ */
+static void draw_line8_32(void *opaque, uint8_t *d, const uint8_t *s,
+                          int width, int deststep)
+{
+    uint16_t *pal = opaque;
+    uint8_t v, r, g, b;
+
+    do {
+        v = ldub_p((void *) s);
+        r = (pal[v] >> 4) & 0xf0;
+        g = pal[v] & 0xf0;
+        b = (pal[v] << 4) & 0xf0;
+        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
+        s++;
+        d += 4;
+    } while (-- width != 0);
+}
+
+/*
+ * 12-bit colour
+ */
+static void draw_line12_32(void *opaque, uint8_t *d, const uint8_t *s,
+                           int width, int deststep)
+{
+    uint16_t v;
+    uint8_t r, g, b;
+
+    do {
+        v = lduw_le_p((void *) s);
+        r = (v >> 4) & 0xf0;
+        g = v & 0xf0;
+        b = (v << 4) & 0xf0;
+        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
+        s += 2;
+        d += 4;
+    } while (-- width != 0);
+}
+
+/*
+ * 16-bit colour
+ */
+static void draw_line16_32(void *opaque, uint8_t *d, const uint8_t *s,
+                           int width, int deststep)
+{
+    uint16_t v;
+    uint8_t r, g, b;
+
+    do {
+        v = lduw_le_p((void *) s);
+        r = (v >> 8) & 0xf8;
+        g = (v >> 3) & 0xfc;
+        b = (v << 3) & 0xf8;
+        ((uint32_t *) d)[0] = rgb_to_pixel32(r, g, b);
+        s += 2;
+        d += 4;
+    } while (-- width != 0);
+}
 
 static void omap_update_display(void *opaque)
 {
-- 
2.20.1

The macro draw_line_func is used only once; just expand it.

diff --git a/hw/display/omap_lcdc.c b/hw/display/omap_lcdc.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/display/omap_lcdc.c
+++ b/hw/display/omap_lcdc.c
@@ -XXX,XX +XXX,XX @@ static void omap_lcd_interrupts(struct omap_lcd_panel_s *s)
     qemu_irq_lower(s->irq);
 }
 
-#define draw_line_func drawfn
-
 /*
  * 2-bit colour
  */
@@ -XXX,XX +XXX,XX @@ static void omap_update_display(void *opaque)
 {
     struct omap_lcd_panel_s *omap_lcd = (struct omap_lcd_panel_s *) opaque;
     DisplaySurface *surface;
-    draw_line_func draw_line;
+    drawfn draw_line;
     int size, height, first, last;
     int width, linesize, step, bpp, frame_offset;
     hwaddr frame_base;
-- 
2.20.1

For a long time now the UI layer has guaranteed that the console
surface is always 32 bits per pixel, RGB. The TCX code already
assumes 32bpp, but it still has some checks of is_surface_bgr()
in an attempt to support 32bpp BGR. is_surface_bgr() will always
return false for the qemu_console_surface(), unless the display
device itself has deliberately created an alternate-format
surface via a function like qemu_create_displaysurface_from().

Drop the never-used BGR-handling code, and assert that we have
a 32-bit surface rather than just doing nothing if it isn't.
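The two packing helpers differ only in which channel lands in bits 16 to 23 and which in bits 0 to 7. Once the console surface is guaranteed to be RGB, only rgb_to_pixel32() is ever needed.

Here is a minimal sketch under these assumptions:

* Both helpers are local stand-ins, written to pack 0x00RRGGBB and 0x00BBGGRR respectively.
* PaletteSketch and update_palette_entries_sketch() are hypothetical cut-down versions of the TCX palette state and its update function.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the pixel packers: 0x00RRGGBB and 0x00BBGGRR */
static inline uint32_t rgb_to_pixel32(unsigned r, unsigned g, unsigned b)
{
    return (r << 16) | (g << 8) | b;
}

static inline uint32_t rgb_to_pixel32bgr(unsigned r, unsigned g, unsigned b)
{
    return (b << 16) | (g << 8) | r;
}

/* Hypothetical cut-down palette state, loosely modeled on TCXState */
typedef struct {
    uint8_t r[256], g[256], b[256];
    uint32_t palette[256];
} PaletteSketch;

/* With a surface that is always 32bpp RGB, no BGR branch is needed */
static void update_palette_entries_sketch(PaletteSketch *s, int start, int end)
{
    for (int i = start; i < end; i++) {
        s->palette[i] = rgb_to_pixel32(s->r[i], s->g[i], s->b[i]);
    }
}
```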

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210215102149.20513-1-peter.maydell@linaro.org
---
 hw/display/tcx.c | 31 ++++++++-----------------------
 1 file changed, 8 insertions(+), 23 deletions(-)

diff --git a/hw/display/tcx.c b/hw/display/tcx.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/display/tcx.c
+++ b/hw/display/tcx.c
@@ -XXX,XX +XXX,XX @@ static int tcx_check_dirty(TCXState *s, DirtyBitmapSnapshot *snap,
 
 static void update_palette_entries(TCXState *s, int start, int end)
 {
-    DisplaySurface *surface = qemu_console_surface(s->con);
     int i;
 
     for (i = start; i < end; i++) {
-        if (is_surface_bgr(surface)) {
-            s->palette[i] = rgb_to_pixel32bgr(s->r[i], s->g[i], s->b[i]);
-        } else {
-            s->palette[i] = rgb_to_pixel32(s->r[i], s->g[i], s->b[i]);
-        }
+        s->palette[i] = rgb_to_pixel32(s->r[i], s->g[i], s->b[i]);
     }
     tcx_set_dirty(s, 0, memory_region_size(&s->vram_mem));
 }
@@ -XXX,XX +XXX,XX @@ static void tcx_draw_cursor32(TCXState *s1, uint8_t *d,
 }
 
 /*
-  XXX Could be much more optimal:
-  * detect if line/page/whole screen is in 24 bit mode
-  * if destination is also BGR, use memcpy
-  */
+ * XXX Could be much more optimal:
+ * detect if line/page/whole screen is in 24 bit mode
+ */
 static inline void tcx24_draw_line32(TCXState *s1, uint8_t *d,
                                      const uint8_t *s, int width,
                                      const uint32_t *cplane,
                                      const uint32_t *s24)
 {
-    DisplaySurface *surface = qemu_console_surface(s1->con);
-    int x, bgr, r, g, b;
+    int x, r, g, b;
     uint8_t val, *p8;
     uint32_t *p = (uint32_t *)d;
     uint32_t dval;
-    bgr = is_surface_bgr(surface);
     for(x = 0; x < width; x++, s++, s24++) {
         if (be32_to_cpu(*cplane) & 0x03000000) {
             /* 24-bit direct, BGR order */
@@ -XXX,XX +XXX,XX @@ static inline void tcx24_draw_line32(TCXState *s1, uint8_t *d,
             b = *p8++;
             g = *p8++;
             r = *p8;
-            if (bgr)
-                dval = rgb_to_pixel32bgr(r, g, b);
-            else
-                dval = rgb_to_pixel32(r, g, b);
+            dval = rgb_to_pixel32(r, g, b);
         } else {
             /* 8-bit pseudocolor */
             val = *s;
@@ -XXX,XX +XXX,XX @@ static void tcx_update_display(void *opaque)
     int y, y_start, dd, ds;
     uint8_t *d, *s;
 
-    if (surface_bits_per_pixel(surface) != 32) {
-        return;
-    }
+    assert(surface_bits_per_pixel(surface) == 32);
 
     page = 0;
     y_start = -1;
@@ -XXX,XX +XXX,XX @@ static void tcx24_update_display(void *opaque)
     uint8_t *d, *s;
     uint32_t *cptr, *s24;
 
-    if (surface_bits_per_pixel(surface) != 32) {
-            return;
-    }
+    assert(surface_bits_per_pixel(surface) == 32);
 
     page = 0;
     y_start = -1;
-- 
2.20.1

The AN524 has a different SYSCLK frequency from the AN505 and AN521;
make the SYSCLK frequency a field in the MPS2TZMachineClass rather
than a compile-time constant so we can support the AN524.
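The change moves a per-board constant into the machine class, so each board's class_init can supply its own value.

The sketch below flattens this into a plain lookup table for illustration, where QEMU itself uses the class pointer. BoardClassSketch and uart_pclk_frq() are hypothetical names. Only the AN505 and AN521 values from this patch are filled in.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for MPS2TZMachineClass */
typedef struct {
    const char *name;
    uint32_t scc_id;
    uint32_t sysclk_frq; /* Main SYSCLK frequency in Hz */
} BoardClassSketch;

/* Per-board values, as set in the an505/an521 class_init functions */
static const BoardClassSketch boards[] = {
    { "mps2-an505", 0x41045050, 20 * 1000 * 1000 },
    { "mps2-an521", 0x41045210, 20 * 1000 * 1000 },
};

/* Code that used SYSCLK_FRQ now reads the field from the board's class */
static uint32_t uart_pclk_frq(const char *board)
{
    for (size_t i = 0; i < sizeof(boards) / sizeof(boards[0]); i++) {
        if (strcmp(boards[i].name, board) == 0) {
            return boards[i].sysclk_frq;
        }
    }
    return 0; /* unknown board */
}
```

Supporting another board then means adding a table entry rather than changing a #define shared by all boards. In QEMU, the equivalent is setting mmc->sysclk_frq in that board's class_init.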

diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/mps2-tz.c
+++ b/hw/arm/mps2-tz.c
@@ -XXX,XX +XXX,XX @@ struct MPS2TZMachineClass {
     MachineClass parent;
     MPS2TZFPGAType fpga_type;
     uint32_t scc_id;
+    uint32_t sysclk_frq; /* Main SYSCLK frequency in Hz */
     const char *armsse_type;
 };
 
@@ -XXX,XX +XXX,XX @@ struct MPS2TZMachineState {
 
 OBJECT_DECLARE_TYPE(MPS2TZMachineState, MPS2TZMachineClass, MPS2TZ_MACHINE)
 
-/* Main SYSCLK frequency in Hz */
-#define SYSCLK_FRQ 20000000
 /* Slow 32Khz S32KCLK frequency in Hz */
 #define S32KCLK_FRQ (32 * 1000)
 
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_unimp_dev(MPS2TZMachineState *mms,
 static MemoryRegion *make_uart(MPS2TZMachineState *mms, void *opaque,
                                const char *name, hwaddr size)
 {
+    MPS2TZMachineClass *mmc = MPS2TZ_MACHINE_GET_CLASS(mms);
     CMSDKAPBUART *uart = opaque;
     int i = uart - &mms->uart[0];
     int rxirqno = i * 2;
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_uart(MPS2TZMachineState *mms, void *opaque,
 
     object_initialize_child(OBJECT(mms), name, uart, TYPE_CMSDK_APB_UART);
     qdev_prop_set_chr(DEVICE(uart), "chardev", serial_hd(i));
-    qdev_prop_set_uint32(DEVICE(uart), "pclk-frq", SYSCLK_FRQ);
+    qdev_prop_set_uint32(DEVICE(uart), "pclk-frq", mmc->sysclk_frq);
     sysbus_realize(SYS_BUS_DEVICE(uart), &error_fatal);
     s = SYS_BUS_DEVICE(uart);
     sysbus_connect_irq(s, 0, get_sse_irq_in(mms, txirqno));
@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
 
     /* These clocks don't need migration because they are fixed-frequency */
     mms->sysclk = clock_new(OBJECT(machine), "SYSCLK");
-    clock_set_hz(mms->sysclk, SYSCLK_FRQ);
+    clock_set_hz(mms->sysclk, mmc->sysclk_frq);
     mms->s32kclk = clock_new(OBJECT(machine), "S32KCLK");
     clock_set_hz(mms->s32kclk, S32KCLK_FRQ);
 
@@ -XXX,XX +XXX,XX @@ static void mps2tz_an505_class_init(ObjectClass *oc, void *data)
     mmc->fpga_type = FPGA_AN505;
     mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-m33");
     mmc->scc_id = 0x41045050;
+    mmc->sysclk_frq = 20 * 1000 * 1000; /* 20MHz */
     mmc->armsse_type = TYPE_IOTKIT;
 }
 
@@ -XXX,XX +XXX,XX @@ static void mps2tz_an521_class_init(ObjectClass *oc, void *data)
     mmc->fpga_type = FPGA_AN521;
     mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-m33");
     mmc->scc_id = 0x41045210;
+    mmc->sysclk_frq = 20 * 1000 * 1000; /* 20MHz */
     mmc->armsse_type = TYPE_SSE200;
 }
 
-- 
2.20.1

Currently the MPS2 SCC device implements a fixed number of OSCCLK
values (3).  The variant of this device in the MPS3 AN524 board has 6
OSCCLK values.  Switch to using a PROP_ARRAY, which allows board code
to specify how large the OSCCLK array should be as well as its
values.

With a variable-length property array, the SCC no longer specifies
default values for the OSCCLKs, so we must set them explicitly in the
board code.  These defaults are actually incorrect for the an521 and
an505; we will correct this bug in a following patch.

This is a migration compatibility break for all the mps boards.
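Below is a minimal sketch of the resulting shape. It assumes a hypothetical SCCSketch struct in place of MPS2SCC, and uses calloc() in place of g_new0(). The flow is:

1. The board sets the count and the reset values.
2. Realize allocates the live array.
3. Reset copies the reset values in.
4. Config accesses are bounds-checked against the per-instance count instead of a compile-time NUM_OSCCLK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cut-down stand-in for struct MPS2SCC */
typedef struct {
    uint32_t num_oscclk;     /* set by the board, like "len-oscclk" */
    uint32_t *oscclk_reset;  /* board-supplied reset values */
    uint32_t *oscclk;        /* live values, allocated at realize */
} SCCSketch;

/* Realize: allocate the live array to match the board-chosen length */
static bool scc_sketch_realize(SCCSketch *s)
{
    s->oscclk = calloc(s->num_oscclk, sizeof(uint32_t));
    return s->num_oscclk == 0 || s->oscclk != NULL;
}

/* Reset: copy every board-supplied reset value into the live array */
static void scc_sketch_reset(SCCSketch *s)
{
    for (uint32_t i = 0; i < s->num_oscclk; i++) {
        s->oscclk[i] = s->oscclk_reset[i];
    }
}

/* Config read: the bound is the per-instance count, not NUM_OSCCLK */
static bool scc_sketch_cfg_read(const SCCSketch *s, unsigned function,
                                unsigned device, uint32_t *value)
{
    if (function != 1 || device >= s->num_oscclk) {
        return false;
    }
    *value = s->oscclk[device];
    return true;
}
```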

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210215115138.20465-3-peter.maydell@linaro.org
---
 include/hw/misc/mps2-scc.h |  7 +++----
 hw/arm/mps2-tz.c           |  5 +++++
 hw/arm/mps2.c              |  5 +++++
 hw/misc/mps2-scc.c         | 24 +++++++++++++-----------
 4 files changed, 26 insertions(+), 15 deletions(-)

diff --git a/include/hw/misc/mps2-scc.h b/include/hw/misc/mps2-scc.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/misc/mps2-scc.h
+++ b/include/hw/misc/mps2-scc.h
@@ -XXX,XX +XXX,XX @@
 #define TYPE_MPS2_SCC "mps2-scc"
 OBJECT_DECLARE_SIMPLE_TYPE(MPS2SCC, MPS2_SCC)
 
-#define NUM_OSCCLK 3
-
 struct MPS2SCC {
     /*< private >*/
     SysBusDevice parent_obj;
@@ -XXX,XX +XXX,XX @@ struct MPS2SCC {
     uint32_t dll;
     uint32_t aid;
     uint32_t id;
-    uint32_t oscclk[NUM_OSCCLK];
-    uint32_t oscclk_reset[NUM_OSCCLK];
+    uint32_t num_oscclk;
+    uint32_t *oscclk;
+    uint32_t *oscclk_reset;
 };
 
 #endif
diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/mps2-tz.c
+++ b/hw/arm/mps2-tz.c
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_scc(MPS2TZMachineState *mms, void *opaque,
     qdev_prop_set_uint32(sccdev, "scc-cfg4", 0x2);
     qdev_prop_set_uint32(sccdev, "scc-aid", 0x00200008);
     qdev_prop_set_uint32(sccdev, "scc-id", mmc->scc_id);
+    /* This will need to be per-FPGA image eventually */
+    qdev_prop_set_uint32(sccdev, "len-oscclk", 3);
+    qdev_prop_set_uint32(sccdev, "oscclk[0]", 50000000);
+    qdev_prop_set_uint32(sccdev, "oscclk[1]", 24576000);
+    qdev_prop_set_uint32(sccdev, "oscclk[2]", 25000000);
     sysbus_realize(SYS_BUS_DEVICE(scc), &error_fatal);
     return sysbus_mmio_get_region(SYS_BUS_DEVICE(sccdev), 0);
 }
diff --git a/hw/arm/mps2.c b/hw/arm/mps2.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/mps2.c
+++ b/hw/arm/mps2.c
@@ -XXX,XX +XXX,XX @@ static void mps2_common_init(MachineState *machine)
     qdev_prop_set_uint32(sccdev, "scc-cfg4", 0x2);
     qdev_prop_set_uint32(sccdev, "scc-aid", 0x00200008);
     qdev_prop_set_uint32(sccdev, "scc-id", mmc->scc_id);
+    /* All these FPGA images have the same OSCCLK configuration */
+    qdev_prop_set_uint32(sccdev, "len-oscclk", 3);
+    qdev_prop_set_uint32(sccdev, "oscclk[0]", 50000000);
+    qdev_prop_set_uint32(sccdev, "oscclk[1]", 24576000);
+    qdev_prop_set_uint32(sccdev, "oscclk[2]", 25000000);
     sysbus_realize(SYS_BUS_DEVICE(&mms->scc), &error_fatal);
     sysbus_mmio_map(SYS_BUS_DEVICE(sccdev), 0, 0x4002f000);
     object_initialize_child(OBJECT(mms), "fpgaio",
diff --git a/hw/misc/mps2-scc.c b/hw/misc/mps2-scc.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/mps2-scc.c
+++ b/hw/misc/mps2-scc.c
@@ -XXX,XX +XXX,XX @@ static bool scc_cfg_write(MPS2SCC *s, unsigned function,
 {
     trace_mps2_scc_cfg_write(function, device, value);
 
-    if (function != 1 || device >= NUM_OSCCLK) {
+    if (function != 1 || device >= s->num_oscclk) {
         qemu_log_mask(LOG_GUEST_ERROR,
                       "MPS2 SCC config write: bad function %d device %d\n",
                       function, device);
@@ -XXX,XX +XXX,XX @@ static bool scc_cfg_write(MPS2SCC *s, unsigned function,
 static bool scc_cfg_read(MPS2SCC *s, unsigned function,
                          unsigned device, uint32_t *value)
 {
-    if (function != 1 || device >= NUM_OSCCLK) {
+    if (function != 1 || device >= s->num_oscclk) {
         qemu_log_mask(LOG_GUEST_ERROR,
                       "MPS2 SCC config read: bad function %d device %d\n",
                       function, device);
@@ -XXX,XX +XXX,XX @@ static void mps2_scc_reset(DeviceState *dev)
     s->cfgctrl = 0x100000;
     s->cfgstat = 0;
     s->dll = 0xffff0001;
-    for (i = 0; i < NUM_OSCCLK; i++) {
+    for (i = 0; i < s->num_oscclk; i++) {
         s->oscclk[i] = s->oscclk_reset[i];
     }
     for (i = 0; i < ARRAY_SIZE(s->led); i++) {
@@ -XXX,XX +XXX,XX @@ static void mps2_scc_realize(DeviceState *dev, Error **errp)
                                       LED_COLOR_GREEN, name);
         g_free(name);
     }
+
+    s->oscclk = g_new0(uint32_t, s->num_oscclk);
 }
 
 static const VMStateDescription mps2_scc_vmstate = {
     .name = "mps2-scc",
-    .version_id = 1,
-    .minimum_version_id = 1,
+    .version_id = 2,
+    .minimum_version_id = 2,
     .fields = (VMStateField[]) {
         VMSTATE_UINT32(cfg0, MPS2SCC),
         VMSTATE_UINT32(cfg1, MPS2SCC),
@@ -XXX,XX +XXX,XX @@ static const VMStateDescription mps2_scc_vmstate = {
         VMSTATE_UINT32(cfgctrl, MPS2SCC),
         VMSTATE_UINT32(cfgstat, MPS2SCC),
         VMSTATE_UINT32(dll, MPS2SCC),
-        VMSTATE_UINT32_ARRAY(oscclk, MPS2SCC, NUM_OSCCLK),
+        VMSTATE_VARRAY_UINT32(oscclk, MPS2SCC, num_oscclk,
+                              0, vmstate_info_uint32, uint32_t),
         VMSTATE_END_OF_LIST()
     }
 };
@@ -XXX,XX +XXX,XX @@ static Property mps2_scc_properties[] = {
     DEFINE_PROP_UINT32("scc-cfg4", MPS2SCC, cfg4, 0),
     DEFINE_PROP_UINT32("scc-aid", MPS2SCC, aid, 0),
     DEFINE_PROP_UINT32("scc-id", MPS2SCC, id, 0),
-    /* These are the initial settings for the source clocks on the board.
+    /*
+     * These are the initial settings for the source clocks on the board.
      * In hardware they can be configured via a config file read by the
      * motherboard configuration controller to suit the FPGA image.
-     * These default values are used by most of the standard FPGA images.
      */
-    DEFINE_PROP_UINT32("oscclk0", MPS2SCC, oscclk_reset[0], 50000000),
-    DEFINE_PROP_UINT32("oscclk1", MPS2SCC, oscclk_reset[1], 24576000),
-    DEFINE_PROP_UINT32("oscclk2", MPS2SCC, oscclk_reset[2], 25000000),
+    DEFINE_PROP_ARRAY("oscclk", MPS2SCC, num_oscclk, oscclk_reset,
+                      qdev_prop_uint32, uint32_t),
     DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.20.1

We were previously using the default OSCCLK settings, which are
correct for the older MPS2 boards (mps2-an385, mps2-an386,
mps2-an500, mps2-an511), but wrong for the mps2-an505 and mps2-an521
boards implemented in mps2-tz.c.  Now that we're setting the values
explicitly, we can fix them to be correct.

diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/mps2-tz.c
+++ b/hw/arm/mps2-tz.c
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_scc(MPS2TZMachineState *mms, void *opaque,
     qdev_prop_set_uint32(sccdev, "scc-id", mmc->scc_id);
     /* This will need to be per-FPGA image eventually */
     qdev_prop_set_uint32(sccdev, "len-oscclk", 3);
-    qdev_prop_set_uint32(sccdev, "oscclk[0]", 50000000);
-    qdev_prop_set_uint32(sccdev, "oscclk[1]", 24576000);
+    qdev_prop_set_uint32(sccdev, "oscclk[0]", 40000000);
+    qdev_prop_set_uint32(sccdev, "oscclk[1]", 24580000);
     qdev_prop_set_uint32(sccdev, "oscclk[2]", 25000000);
     sysbus_realize(SYS_BUS_DEVICE(scc), &error_fatal);
     return sysbus_mmio_get_region(SYS_BUS_DEVICE(sccdev), 0);
-- 
2.20.1

The AN505 and AN511 happen to share the same OSCCLK values, but the
AN524 will have a different set (and more of them), so split the
settings out to be per-board.
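A minimal, self-contained sketch of the per-board table idea, assuming a cut-down struct in place of MPS2TZMachineClass: BoardInfo, oscclk_propname() and print_scc_props() are hypothetical names, not QEMU APIs. It shows how a board-specific array turns into the "len-oscclk" and "oscclk[N]" property names that make_scc() sets in a loop.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical per-board description, modelled on the len_oscclk and
 * oscclk fields this patch adds to MPS2TZMachineClass. */
typedef struct BoardInfo {
    const char *name;
    uint32_t len_oscclk;
    const uint32_t *oscclk;
} BoardInfo;

static const uint32_t an505_oscclk[] = {
    40000000,
    24580000,
    25000000,
};

static const BoardInfo boards[] = {
    { "mps2-an505", ARRAY_SIZE(an505_oscclk), an505_oscclk },
    /* The AN521 shares the AN505 table; a new board gets its own array. */
    { "mps2-an521", ARRAY_SIZE(an505_oscclk), an505_oscclk },
};

/* Write the name of element i of the "oscclk" array property. */
static void oscclk_propname(char *buf, size_t len, uint32_t i)
{
    int n = snprintf(buf, len, "oscclk[%u]", (unsigned)i);
    assert(n > 0 && (size_t)n < len);
}

/* Print the property settings a make_scc()-style loop would apply. */
static void print_scc_props(const BoardInfo *b)
{
    char name[32];

    printf("%s: len-oscclk=%u\n", b->name, (unsigned)b->len_oscclk);
    for (uint32_t i = 0; i < b->len_oscclk; i++) {
        oscclk_propname(name, sizeof(name), i);
        printf("%s: %s=%u\n", b->name, name, (unsigned)b->oscclk[i]);
    }
}
```

Adding another board would then mean adding one more array and one more table entry, rather than touching the SCC setup code.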

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210215115138.20465-5-peter.maydell@linaro.org
---
 hw/arm/mps2-tz.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/mps2-tz.c
+++ b/hw/arm/mps2-tz.c
@@ -XXX,XX +XXX,XX @@ struct MPS2TZMachineClass {
     MPS2TZFPGAType fpga_type;
     uint32_t scc_id;
     uint32_t sysclk_frq; /* Main SYSCLK frequency in Hz */
+    uint32_t len_oscclk;
+    const uint32_t *oscclk;
     const char *armsse_type;
 };
 
@@ -XXX,XX +XXX,XX @@ OBJECT_DECLARE_TYPE(MPS2TZMachineState, MPS2TZMachineClass, MPS2TZ_MACHINE)
 /* Slow 32Khz S32KCLK frequency in Hz */
 #define S32KCLK_FRQ (32 * 1000)
 
+static const uint32_t an505_oscclk[] = {
+    40000000,
+    24580000,
+    25000000,
+};
+
 /* Create an alias of an entire original MemoryRegion @orig
  * located at @base in the memory map.
  */
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_scc(MPS2TZMachineState *mms, void *opaque,
     MPS2SCC *scc = opaque;
     DeviceState *sccdev;
     MPS2TZMachineClass *mmc = MPS2TZ_MACHINE_GET_CLASS(mms);
+    uint32_t i;
 
     object_initialize_child(OBJECT(mms), "scc", scc, TYPE_MPS2_SCC);
     sccdev = DEVICE(scc);
     qdev_prop_set_uint32(sccdev, "scc-cfg4", 0x2);
     qdev_prop_set_uint32(sccdev, "scc-aid", 0x00200008);
     qdev_prop_set_uint32(sccdev, "scc-id", mmc->scc_id);
-    /* This will need to be per-FPGA image eventually */
-    qdev_prop_set_uint32(sccdev, "len-oscclk", 3);
-    qdev_prop_set_uint32(sccdev, "oscclk[0]", 40000000);
-    qdev_prop_set_uint32(sccdev, "oscclk[1]", 24580000);
-    qdev_prop_set_uint32(sccdev, "oscclk[2]", 25000000);
+    qdev_prop_set_uint32(sccdev, "len-oscclk", mmc->len_oscclk);
+    for (i = 0; i < mmc->len_oscclk; i++) {
+        g_autofree char *propname = g_strdup_printf("oscclk[%u]", i);
+        qdev_prop_set_uint32(sccdev, propname, mmc->oscclk[i]);
+    }
     sysbus_realize(SYS_BUS_DEVICE(scc), &error_fatal);
     return sysbus_mmio_get_region(SYS_BUS_DEVICE(sccdev), 0);
 }
@@ -XXX,XX +XXX,XX @@ static void mps2tz_an505_class_init(ObjectClass *oc, void *data)
     mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-m33");
     mmc->scc_id = 0x41045050;
     mmc->sysclk_frq = 20 * 1000 * 1000; /* 20MHz */
+    mmc->oscclk = an505_oscclk;
+    mmc->len_oscclk = ARRAY_SIZE(an505_oscclk);
     mmc->armsse_type = TYPE_IOTKIT;
 }
 
@@ -XXX,XX +XXX,XX @@ static void mps2tz_an521_class_init(ObjectClass *oc, void *data)
     mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-m33");
     mmc->scc_id = 0x41045210;
     mmc->sysclk_frq = 20 * 1000 * 1000; /* 20MHz */
+    mmc->oscclk = an505_oscclk; /* AN521 is the same as AN505 here */
+    mmc->len_oscclk = ARRAY_SIZE(an505_oscclk);
     mmc->armsse_type = TYPE_SSE200;
 }
 
-- 
2.20.1

The MPS2 board has 2 LEDs, but the MPS3 board has 10 LEDs.  The
FPGAIO device is similar on both sets of boards, but the LED0
register has correspondingly more bits that have an effect.  Add a
device property for the number of LEDs.
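A rough model of the LED0 write path, assuming plain C rather than QEMU's LEDState API (led0_mask() and led0_write() are hypothetical helpers). The mask is built in 64 bits, as MAKE_64BIT_MASK does, so that num_leds == 32 stays well-defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_LEDS 32 /* same limit as MPS2FPGAIO_MAX_LEDS in the patch */

/* Mask with the low num_leds bits set. Shifting a 64-bit 1 avoids the
 * undefined behaviour of shifting a 32-bit value by 32. */
static uint32_t led0_mask(uint32_t num_leds)
{
    assert(num_leds <= MAX_LEDS);
    return (uint32_t)((UINT64_C(1) << num_leds) - 1);
}

/* Hypothetical LED0 write: keep only the implemented bits and report
 * each LED's on/off state in leds[]. Returns the stored value. */
static uint32_t led0_write(uint32_t value, uint32_t num_leds,
                           bool leds[MAX_LEDS])
{
    uint32_t stored = value & led0_mask(num_leds);

    for (uint32_t i = 0; i < num_leds; i++) {
        leds[i] = (stored >> i) & 1;
    }
    return stored;
}
```

Testing each bit as `(stored >> i) & 1` also avoids shifting a signed 1 into bit 31 when 32 LEDs are configured.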

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210215115138.20465-6-peter.maydell@linaro.org
---
 include/hw/misc/mps2-fpgaio.h |  5 ++++-
 hw/misc/mps2-fpgaio.c         | 31 +++++++++++++++++++++++--------
 2 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/include/hw/misc/mps2-fpgaio.h b/include/hw/misc/mps2-fpgaio.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/misc/mps2-fpgaio.h
+++ b/include/hw/misc/mps2-fpgaio.h
@@ -XXX,XX +XXX,XX @@
 #define TYPE_MPS2_FPGAIO "mps2-fpgaio"
 OBJECT_DECLARE_SIMPLE_TYPE(MPS2FPGAIO, MPS2_FPGAIO)
 
+#define MPS2FPGAIO_MAX_LEDS 32
+
 struct MPS2FPGAIO {
     /*< private >*/
     SysBusDevice parent_obj;
 
     /*< public >*/
     MemoryRegion iomem;
-    LEDState *led[2];
+    LEDState *led[MPS2FPGAIO_MAX_LEDS];
+    uint32_t num_leds;
 
     uint32_t led0;
     uint32_t prescale;
diff --git a/hw/misc/mps2-fpgaio.c b/hw/misc/mps2-fpgaio.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/mps2-fpgaio.c
+++ b/hw/misc/mps2-fpgaio.c
@@ -XXX,XX +XXX,XX @@ static void mps2_fpgaio_write(void *opaque, hwaddr offset, uint64_t value,
 
     switch (offset) {
     case A_LED0:
-        s->led0 = value & 0x3;
-        led_set_state(s->led[0], value & 0x01);
-        led_set_state(s->led[1], value & 0x02);
+        if (s->num_leds != 0) {
+            uint32_t i;
+
+            s->led0 = value & MAKE_64BIT_MASK(0, s->num_leds);
+            for (i = 0; i < s->num_leds; i++) {
+                led_set_state(s->led[i], value & (1 << i));
+            }
+        }
         break;
     case A_PRESCALE:
         resync_counter(s);
@@ -XXX,XX +XXX,XX @@ static void mps2_fpgaio_reset(DeviceState *dev)
     s->pscntr = 0;
     s->pscntr_sync_ticks = now;
 
-    for (size_t i = 0; i < ARRAY_SIZE(s->led); i++) {
+    for (size_t i = 0; i < s->num_leds; i++) {
         device_cold_reset(DEVICE(s->led[i]));
     }
 }
@@ -XXX,XX +XXX,XX @@ static void mps2_fpgaio_init(Object *obj)
 static void mps2_fpgaio_realize(DeviceState *dev, Error **errp)
 {
     MPS2FPGAIO *s = MPS2_FPGAIO(dev);
+    uint32_t i;
 
-    s->led[0] = led_create_simple(OBJECT(dev), GPIO_POLARITY_ACTIVE_HIGH,
-                                  LED_COLOR_GREEN, "USERLED0");
-    s->led[1] = led_create_simple(OBJECT(dev), GPIO_POLARITY_ACTIVE_HIGH,
-                                  LED_COLOR_GREEN, "USERLED1");
+    if (s->num_leds > MPS2FPGAIO_MAX_LEDS) {
+        error_setg(errp, "num-leds cannot be greater than %d",
+                   MPS2FPGAIO_MAX_LEDS);
+        return;
+    }
+
+    for (i = 0; i < s->num_leds; i++) {
+        g_autofree char *ledname = g_strdup_printf("USERLED%d", i);
+        s->led[i] = led_create_simple(OBJECT(dev), GPIO_POLARITY_ACTIVE_HIGH,
+                                      LED_COLOR_GREEN, ledname);
+    }
 }
 
 static bool mps2_fpgaio_counters_needed(void *opaque)
@@ -XXX,XX +XXX,XX @@ static const VMStateDescription mps2_fpgaio_vmstate = {
 static Property mps2_fpgaio_properties[] = {
     /* Frequency of the prescale counter */
     DEFINE_PROP_UINT32("prescale-clk", MPS2FPGAIO, prescale_clk, 20000000),
+    /* Number of LEDs controlled by LED0 register */
+    DEFINE_PROP_UINT32("num-leds", MPS2FPGAIO, num_leds, 2),
     DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.20.1

MPS3 boards have an extra SWITCH register in the FPGAIO block which
reports the value of some switches.  Implement this, governed by a
property the board code can use to specify whether it exists.
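A sketch of how the property gates the register, assuming a cut-down device struct (FpgaioModel and fpgaio_read() are hypothetical; only the 0x28 offset and the has-switches behaviour come from the patch). Boards without switches fall through to the same bad-offset path as any unimplemented register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define A_SWITCH 0x28 /* SWITCH register offset used by the patch */

/* Hypothetical cut-down device state; has_switches plays the role of
 * the "has-switches" property. */
typedef struct FpgaioModel {
    bool has_switches;
} FpgaioModel;

/* Read a register. Sets *bad when the offset is not implemented on
 * this board, in which case it logs and reads as zero. */
static uint32_t fpgaio_read(const FpgaioModel *s, uint32_t offset, bool *bad)
{
    assert(s && bad);
    *bad = false;

    switch (offset) {
    case A_SWITCH:
        if (!s->has_switches) {
            goto bad_offset;
        }
        /* User switches are not modelled, so this reads as 0. */
        return 0;
    default:
    bad_offset:
        fprintf(stderr, "FPGAIO read: bad offset 0x%x\n", (unsigned)offset);
        *bad = true;
        return 0;
    }
}
```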

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210215115138.20465-7-peter.maydell@linaro.org
---
 include/hw/misc/mps2-fpgaio.h |  1 +
 hw/misc/mps2-fpgaio.c         | 10 ++++++++++
 2 files changed, 11 insertions(+)

diff --git a/include/hw/misc/mps2-fpgaio.h b/include/hw/misc/mps2-fpgaio.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/misc/mps2-fpgaio.h
+++ b/include/hw/misc/mps2-fpgaio.h
@@ -XXX,XX +XXX,XX @@ struct MPS2FPGAIO {
     MemoryRegion iomem;
     LEDState *led[MPS2FPGAIO_MAX_LEDS];
     uint32_t num_leds;
+    bool has_switches;
 
     uint32_t led0;
     uint32_t prescale;
diff --git a/hw/misc/mps2-fpgaio.c b/hw/misc/mps2-fpgaio.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/mps2-fpgaio.c
+++ b/hw/misc/mps2-fpgaio.c
@@ -XXX,XX +XXX,XX @@ REG32(CLK100HZ, 0x14)
 REG32(COUNTER, 0x18)
 REG32(PRESCALE, 0x1c)
 REG32(PSCNTR, 0x20)
+REG32(SWITCH, 0x28)
 REG32(MISC, 0x4c)
 
 static uint32_t counter_from_tickoff(int64_t now, int64_t tick_offset, int frq)
@@ -XXX,XX +XXX,XX @@ static uint64_t mps2_fpgaio_read(void *opaque, hwaddr offset, unsigned size)
         resync_counter(s);
         r = s->pscntr;
         break;
+    case A_SWITCH:
+        if (!s->has_switches) {
+            goto bad_offset;
+        }
+        /* User-togglable board switches. We don't model that, so report 0. */
+        r = 0;
+        break;
     default:
+    bad_offset:
         qemu_log_mask(LOG_GUEST_ERROR,
                       "MPS2 FPGAIO read: bad offset %x\n", (int) offset);
         r = 0;
@@ -XXX,XX +XXX,XX @@ static Property mps2_fpgaio_properties[] = {
     DEFINE_PROP_UINT32("prescale-clk", MPS2FPGAIO, prescale_clk, 20000000),
     /* Number of LEDs controlled by LED0 register */
     DEFINE_PROP_UINT32("num-leds", MPS2FPGAIO, num_leds, 2),
+    DEFINE_PROP_BOOL("has-switches", MPS2FPGAIO, has_switches, false),
     DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.20.1

Set the FPGAIO num-leds and has-switches properties explicitly
per-board, rather than relying on the defaults.  The AN505 and AN521
both have the same settings as the default values, but the AN524 will
be different.

diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/mps2-tz.c
+++ b/hw/arm/mps2-tz.c
@@ -XXX,XX +XXX,XX @@ struct MPS2TZMachineClass {
     uint32_t sysclk_frq; /* Main SYSCLK frequency in Hz */
     uint32_t len_oscclk;
     const uint32_t *oscclk;
+    uint32_t fpgaio_num_leds; /* Number of LEDs in FPGAIO LED0 register */
+    bool fpgaio_has_switches; /* Does FPGAIO have SWITCH register? */
     const char *armsse_type;
 };
 
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_fpgaio(MPS2TZMachineState *mms, void *opaque,
                                  const char *name, hwaddr size)
 {
     MPS2FPGAIO *fpgaio = opaque;
+    MPS2TZMachineClass *mmc = MPS2TZ_MACHINE_GET_CLASS(mms);
 
     object_initialize_child(OBJECT(mms), "fpgaio", fpgaio, TYPE_MPS2_FPGAIO);
+    qdev_prop_set_uint32(DEVICE(fpgaio), "num-leds", mmc->fpgaio_num_leds);
+    qdev_prop_set_bit(DEVICE(fpgaio), "has-switches", mmc->fpgaio_has_switches);
     sysbus_realize(SYS_BUS_DEVICE(fpgaio), &error_fatal);
     return sysbus_mmio_get_region(SYS_BUS_DEVICE(fpgaio), 0);
 }
@@ -XXX,XX +XXX,XX @@ static void mps2tz_an505_class_init(ObjectClass *oc, void *data)
     mmc->sysclk_frq = 20 * 1000 * 1000; /* 20MHz */
     mmc->oscclk = an505_oscclk;
     mmc->len_oscclk = ARRAY_SIZE(an505_oscclk);
+    mmc->fpgaio_num_leds = 2;
+    mmc->fpgaio_has_switches = false;
     mmc->armsse_type = TYPE_IOTKIT;
 }
 
@@ -XXX,XX +XXX,XX @@ static void mps2tz_an521_class_init(ObjectClass *oc, void *data)
     mmc->sysclk_frq = 20 * 1000 * 1000; /* 20MHz */
     mmc->oscclk = an505_oscclk; /* AN521 is the same as AN505 here */
     mmc->len_oscclk = ARRAY_SIZE(an505_oscclk);
+    mmc->fpgaio_num_leds = 2;
+    mmc->fpgaio_has_switches = false;
     mmc->armsse_type = TYPE_SSE200;
 }
 
-- 
2.20.1

In the mps2-tz board code, we handle devices whose interrupt lines
must be wired to all CPUs by creating IRQ splitter devices for the
AN521, because it has 2 CPUs, but wiring the device IRQ directly to
the SSE/IoTKit input for the AN505, which has only 1 CPU.

We can avoid making an explicit check on the board type constant by
instead creating and using the IRQ splitters for any board with more
than 1 CPU.  This avoids having to add extra cases to the
conditionals every time we add new boards.
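A toy model of the routing rule, assuming per-CPU pending flags stand in for the real splitter and CPU interrupt inputs (SseModel and raise_ext_irq() are hypothetical). The decision depends only on max_cpus, so a new single-CPU or dual-CPU board needs no new case.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CPUS 2
#define NUMIRQ 92

/* Hypothetical model: per-CPU pending flags for each external IRQ. */
typedef struct SseModel {
    int max_cpus;
    bool pending[MAX_CPUS][NUMIRQ];
} SseModel;

/* Deliver external IRQ irqno. With one CPU the line can be wired
 * straight to that CPU; with several, a splitter must fan it out so
 * that every CPU sees the same line. */
static void raise_ext_irq(SseModel *s, int irqno)
{
    assert(s->max_cpus >= 1 && s->max_cpus <= MAX_CPUS);
    assert(irqno >= 0 && irqno < NUMIRQ);

    if (s->max_cpus > 1) {
        for (int cpu = 0; cpu < s->max_cpus; cpu++) {
            s->pending[cpu][irqno] = true; /* one splitter output per CPU */
        }
    } else {
        s->pending[0][irqno] = true;       /* direct connection */
    }
}
```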

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210215115138.20465-9-peter.maydell@linaro.org
---
 hw/arm/mps2-tz.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/mps2-tz.c
+++ b/hw/arm/mps2-tz.c
@@ -XXX,XX +XXX,XX @@ static void make_ram_alias(MemoryRegion *mr, const char *name,
 static qemu_irq get_sse_irq_in(MPS2TZMachineState *mms, int irqno)
 {
     /* Return a qemu_irq which will signal IRQ n to all CPUs in the SSE. */
-    MPS2TZMachineClass *mmc = MPS2TZ_MACHINE_GET_CLASS(mms);
+    MachineClass *mc = MACHINE_GET_CLASS(mms);
 
     assert(irqno < MPS2TZ_NUMIRQ);
 
-    switch (mmc->fpga_type) {
-    case FPGA_AN505:
-        return qdev_get_gpio_in_named(DEVICE(&mms->iotkit), "EXP_IRQ", irqno);
-    case FPGA_AN521:
+    if (mc->max_cpus > 1) {
         return qdev_get_gpio_in(DEVICE(&mms->cpu_irq_splitter[irqno]), 0);
-    default:
-        g_assert_not_reached();
+    } else {
+        return qdev_get_gpio_in_named(DEVICE(&mms->iotkit), "EXP_IRQ", irqno);
     }
 }
 
@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
     sysbus_realize(SYS_BUS_DEVICE(&mms->iotkit), &error_fatal);
 
     /*
-     * The AN521 needs us to create splitters to feed the IRQ inputs
-     * for each CPU in the SSE-200 from each device in the board.
+     * If this board has more than one CPU, then we need to create splitters
+     * to feed the IRQ inputs for each CPU in the SSE from each device in the
+     * board. If there is only one CPU, we can just wire the device IRQ
+     * directly to the SSE's IRQ input.
      */
-    if (mmc->fpga_type == FPGA_AN521) {
+    if (mc->max_cpus > 1) {
         for (i = 0; i < MPS2TZ_NUMIRQ; i++) {
             char *name = g_strdup_printf("mps2-irq-splitter%d", i);
             SplitIRQ *splitter = &mms->cpu_irq_splitter[i];
-- 
2.20.1

The AN524 has more interrupt lines than the AN505 and AN521; make
numirq board-specific rather than a compile-time constant.

Since the difference is small (92 on the current boards and 95 on the
new one) we don't dynamically allocate the cpu_irq_splitter[] array
but leave it as a fixed length array whose size is the maximum needed
for any of the boards.
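A sketch of the fixed-size-array approach, assuming NUMIRQ_MAX is the largest numirq of any board (95, per the figures above; the patch itself defines MPS2TZ_NUMIRQ_MAX as 92). Machine, Splitter and init_splitters() are hypothetical stand-ins for the real machine state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical maximum over all boards: 92 for the current boards and
 * 95 for the AN524, per the commit message. */
#define NUMIRQ_MAX 95

typedef struct Splitter {
    bool in_use;
    int irqno;
} Splitter;

typedef struct Machine {
    int numirq;                    /* board-specific, as in class init */
    Splitter splitter[NUMIRQ_MAX]; /* fixed size: no dynamic allocation */
} Machine;

/* Set up only the first numirq splitters; the remainder stay unused. */
static void init_splitters(Machine *m)
{
    assert(m->numirq > 0 && m->numirq <= NUMIRQ_MAX);
    for (int i = 0; i < m->numirq; i++) {
        m->splitter[i].in_use = true;
        m->splitter[i].irqno = i;
    }
}
```

The cost of this choice is a few unused array slots on the smaller boards, in exchange for not having to manage an allocation.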

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210215115138.20465-10-peter.maydell@linaro.org
---
 hw/arm/mps2-tz.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/mps2-tz.c
+++ b/hw/arm/mps2-tz.c
@@ -XXX,XX +XXX,XX @@
 #include "hw/qdev-clock.h"
 #include "qom/object.h"
 
-#define MPS2TZ_NUMIRQ 92
+#define MPS2TZ_NUMIRQ_MAX 92
 
 typedef enum MPS2TZFPGAType {
     FPGA_AN505,
@@ -XXX,XX +XXX,XX @@ struct MPS2TZMachineClass {
     const uint32_t *oscclk;
     uint32_t fpgaio_num_leds; /* Number of LEDs in FPGAIO LED0 register */
     bool fpgaio_has_switches; /* Does FPGAIO have SWITCH register? */
+    int numirq; /* Number of external interrupts */
     const char *armsse_type;
 };
 
@@ -XXX,XX +XXX,XX @@ struct MPS2TZMachineState {
     SplitIRQ sec_resp_splitter;
     qemu_or_irq uart_irq_orgate;
     DeviceState *lan9118;
-    SplitIRQ cpu_irq_splitter[MPS2TZ_NUMIRQ];
+    SplitIRQ cpu_irq_splitter[MPS2TZ_NUMIRQ_MAX];
     Clock *sysclk;
     Clock *s32kclk;
 };
@@ -XXX,XX +XXX,XX @@ static qemu_irq get_sse_irq_in(MPS2TZMachineState *mms, int irqno)
 {
     /* Return a qemu_irq which will signal IRQ n to all CPUs in the SSE. */
     MachineClass *mc = MACHINE_GET_CLASS(mms);
+    MPS2TZMachineClass *mmc = MPS2TZ_MACHINE_GET_CLASS(mms);
 
-    assert(irqno < MPS2TZ_NUMIRQ);
+    assert(irqno < mmc->numirq);
 
     if (mc->max_cpus > 1) {
         return qdev_get_gpio_in(DEVICE(&mms->cpu_irq_splitter[irqno]), 0);
@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
     iotkitdev = DEVICE(&mms->iotkit);
     object_property_set_link(OBJECT(&mms->iotkit), "memory",
                              OBJECT(system_memory), &error_abort);
-    qdev_prop_set_uint32(iotkitdev, "EXP_NUMIRQ", MPS2TZ_NUMIRQ);
+    qdev_prop_set_uint32(iotkitdev, "EXP_NUMIRQ", mmc->numirq);
     qdev_connect_clock_in(iotkitdev, "MAINCLK", mms->sysclk);
     qdev_connect_clock_in(iotkitdev, "S32KCLK", mms->s32kclk);
     sysbus_realize(SYS_BUS_DEVICE(&mms->iotkit), &error_fatal);
@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
      * board. If there is only one CPU, we can just wire the device IRQ
      * directly to the SSE's IRQ input.
      */
+    assert(mmc->numirq <= MPS2TZ_NUMIRQ_MAX);
     if (mc->max_cpus > 1) {
-        for (i = 0; i < MPS2TZ_NUMIRQ; i++) {
+        for (i = 0; i < mmc->numirq; i++) {
             char *name = g_strdup_printf("mps2-irq-splitter%d", i);
             SplitIRQ *splitter = &mms->cpu_irq_splitter[i];
 
@@ -XXX,XX +XXX,XX @@ static void mps2tz_an505_class_init(ObjectClass *oc, void *data)
     mmc->len_oscclk = ARRAY_SIZE(an505_oscclk);
     mmc->fpgaio_num_leds = 2;
     mmc->fpgaio_has_switches = false;
+    mmc->numirq = 92;
     mmc->armsse_type = TYPE_IOTKIT;
 }
 
@@ -XXX,XX +XXX,XX @@ static void mps2tz_an521_class_init(ObjectClass *oc, void *data)
     mmc->len_oscclk = ARRAY_SIZE(an505_oscclk);
     mmc->fpgaio_num_leds = 2;
     mmc->fpgaio_has_switches = false;
+    mmc->numirq = 92;
     mmc->armsse_type = TYPE_SSE200;
 }
 
-- 
2.20.1

The AN524 version of the SCC interface has different behaviour for
some of the CFG registers; implement it.

Each board in this family can have minor differences in the meaning
of the CFG registers, so rather than trying to specify all the
possible semantics via individual device properties, we make the
behaviour conditional on the part-number field of the SCC_ID register
which the board code already passes us.

For the AN524, the differences are:
 * CFG2 is a new register ("QSPI Select signal")
 * CFG3 is reserved rather than being board switches
 * CFG5 is a new register ("ACLK Frequency in Hz")
 * CFG6 is a new register ("Clock divider for BRAM")

We implement all three of the new registers as reads-as-written.
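A sketch of the part-number check, assuming the part number is the 12-bit field at bits [15:4] of SCC_ID, which is what the scc_id values in this series (0x41045050 for the AN505, 0x41045210 for the AN521) imply. extract32() here is a local reimplementation of the QEMU helper of the same name, scc_cfg_present() is hypothetical, and 0x41045240 is used only as an illustrative AN524-style ID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local version of extract32(): return the 'length'-bit field of
 * 'value' starting at bit 'start'. */
static uint32_t extract32(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (~0U >> (32 - length));
}

/* Part number: the 12-bit field at bits [15:4] of SCC_ID. */
static int scc_partno(uint32_t scc_id)
{
    return extract32(scc_id, 4, 12);
}

/* Does CFG<n> exist on a board with this part number? */
static bool scc_cfg_present(int partno, unsigned n)
{
    switch (n) {
    case 2:
    case 5:
    case 6:
        return partno == 0x524; /* AN524-only registers */
    case 3:
        return partno != 0x524; /* reserved on the AN524 */
    default:
        return n <= 6;          /* CFG0, CFG1, CFG4 exist everywhere */
    }
}
```

With an 8-bit field the AN524-style ID would decode as 0x24, so a comparison against 0x524 could never succeed; the field width has to cover all three hex digits.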

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210215115138.20465-11-peter.maydell@linaro.org
---
 include/hw/misc/mps2-scc.h |  3 ++
 hw/misc/mps2-scc.c         | 71 ++++++++++++++++++++++++++++++++++++--
 2 files changed, 72 insertions(+), 2 deletions(-)

diff --git a/include/hw/misc/mps2-scc.h b/include/hw/misc/mps2-scc.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/misc/mps2-scc.h
+++ b/include/hw/misc/mps2-scc.h
@@ -XXX,XX +XXX,XX @@ struct MPS2SCC {
 
     uint32_t cfg0;
     uint32_t cfg1;
+    uint32_t cfg2;
     uint32_t cfg4;
+    uint32_t cfg5;
+    uint32_t cfg6;
     uint32_t cfgdata_rtn;
     uint32_t cfgdata_out;
     uint32_t cfgctrl;
diff --git a/hw/misc/mps2-scc.c b/hw/misc/mps2-scc.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/mps2-scc.c
+++ b/hw/misc/mps2-scc.c
@@ -XXX,XX +XXX,XX @@
 
 REG32(CFG0, 0)
 REG32(CFG1, 4)
+REG32(CFG2, 8)
 REG32(CFG3, 0xc)
 REG32(CFG4, 0x10)
+REG32(CFG5, 0x14)
+REG32(CFG6, 0x18)
 REG32(CFGDATA_RTN, 0xa0)
 REG32(CFGDATA_OUT, 0xa4)
 REG32(CFGCTRL, 0xa8)
@@ -XXX,XX +XXX,XX @@ REG32(DLL, 0x100)
 REG32(AID, 0xFF8)
 REG32(ID, 0xFFC)
 
+static int scc_partno(MPS2SCC *s)
+{
+    /* Return the partno field of the SCC_ID (0x524, 0x511, etc) */
+    return extract32(s->id, 4, 12);
+}
+
 /* Handle a write via the SYS_CFG channel to the specified function/device.
  * Return false on error (reported to guest via SYS_CFGCTRL ERROR bit).
  */
@@ -XXX,XX +XXX,XX @@ static uint64_t mps2_scc_read(void *opaque, hwaddr offset, unsigned size)
     case A_CFG1:
         r = s->cfg1;
         break;
+    case A_CFG2:
+        if (scc_partno(s) != 0x524) {
+            /* CFG2 reserved on other boards */
+            goto bad_offset;
+        }
+        r = s->cfg2;
+        break;
     case A_CFG3:
+        if (scc_partno(s) == 0x524) {
+            /* CFG3 reserved on AN524 */
+            goto bad_offset;
+        }
         /* These are user-settable DIP switches on the board. We don't
          * model that, so just return zeroes.
          */
@@ -XXX,XX +XXX,XX @@ static uint64_t mps2_scc_read(void *opaque, hwaddr offset, unsigned size)
     case A_CFG4:
         r = s->cfg4;
         break;
+    case A_CFG5:
+        if (scc_partno(s) != 0x524) {
+            /* CFG5 reserved on other boards */
+            goto bad_offset;
+        }
+        r = s->cfg5;
+        break;
+    case A_CFG6:
+        if (scc_partno(s) != 0x524) {
+            /* CFG6 reserved on other boards */
+            goto bad_offset;
+        }
+        r = s->cfg6;
+        break;
     case A_CFGDATA_RTN:
         r = s->cfgdata_rtn;
         break;
@@ -XXX,XX +XXX,XX @@ static uint64_t mps2_scc_read(void *opaque, hwaddr offset, unsigned size)
         r = s->id;
         break;
     default:
+    bad_offset:
         qemu_log_mask(LOG_GUEST_ERROR,
                       "MPS2 SCC read: bad offset %x\n", (int) offset);
         r = 0;
@@ -XXX,XX +XXX,XX @@ static void mps2_scc_write(void *opaque, hwaddr offset, uint64_t value,
             led_set_state(s->led[i], extract32(value, i, 1));
         }
         break;
+    case A_CFG2:
+        if (scc_partno(s) != 0x524) {
+            /* CFG2 reserved on other boards */
+            goto bad_offset;
+        }
+        /* AN524: QSPI Select signal */
+        s->cfg2 = value;
+        break;
+    case A_CFG5:
+        if (scc_partno(s) != 0x524) {
+            /* CFG5 reserved on other boards */
+            goto bad_offset;
+        }
+        /* AN524: ACLK frequency in Hz */
+        s->cfg5 = value;
+        break;
+    case A_CFG6:
+        if (scc_partno(s) != 0x524) {
+            /* CFG6 reserved on other boards */
+            goto bad_offset;
+        }
+        /* AN524: Clock divider for BRAM */
+        s->cfg6 = value;
+        break;
     case A_CFGDATA_OUT:
         s->cfgdata_out = value;
         break;
@@ -XXX,XX +XXX,XX @@ static void mps2_scc_write(void *opaque, hwaddr offset, uint64_t value,
         s->dll = deposit32(s->dll, 24, 8, extract32(value, 24, 8));
         break;
     default:
+    bad_offset:
         qemu_log_mask(LOG_GUEST_ERROR,
                       "MPS2 SCC write: bad offset 0x%x\n", (int) offset);
         break;
@@ -XXX,XX +XXX,XX @@ static void mps2_scc_reset(DeviceState *dev)
     trace_mps2_scc_reset();
     s->cfg0 = 0;
     s->cfg1 = 0;
+    s->cfg2 = 0;
+    s->cfg5 = 0;
+    s->cfg6 = 0;
     s->cfgdata_rtn = 0;
     s->cfgdata_out = 0;
     s->cfgctrl = 0x100000;
@@ -XXX,XX +XXX,XX @@ static void mps2_scc_realize(DeviceState *dev, Error **errp)
 
 static const VMStateDescription mps2_scc_vmstate = {
     .name = "mps2-scc",
-    .version_id = 2,
-    .minimum_version_id = 2,
+    .version_id = 3,
+    .minimum_version_id = 3,
     .fields = (VMStateField[]) {
         VMSTATE_UINT32(cfg0, MPS2SCC),
         VMSTATE_UINT32(cfg1, MPS2SCC),
+        VMSTATE_UINT32(cfg2, MPS2SCC),
+        /* cfg3, cfg4 are read-only so need not be migrated */
+        VMSTATE_UINT32(cfg5, MPS2SCC),
+        VMSTATE_UINT32(cfg6, MPS2SCC),
         VMSTATE_UINT32(cfgdata_rtn, MPS2SCC),
         VMSTATE_UINT32(cfgdata_out, MPS2SCC),
         VMSTATE_UINT32(cfgctrl, MPS2SCC),
-- 
2.20.1

On the MPS2 boards, the first 32 interrupt lines are entirely
internal to the SSE; interrupt lines for devices outside the SSE
start at 32.  In the application notes that document each FPGA image,
the interrupt wiring is documented from the point of view of the CPU,
so '0' is the first of the SSE's interrupts and the devices in the
FPGA image itself are '32' and up.  For example, the UART 0 Receive
interrupt is 32, the SPI #0 interrupt is 51, and so on.

Within our implementation, because the external interrupts must be
connected to the EXP_IRQ[0...n] lines of the SSE object, we made the
get_sse_irq_in() function take an irqno whose values start at 0 for
the first FPGA device interrupt.  In this numbering scheme the UART 0
Receive interrupt is 0, the SPI #0 interrupt is 19, and so on.

The result of these two different numbering schemes has been that
half of the devices were wired up to the wrong IRQs: the UART IRQs
were wired up correctly, but the DMA and SPI devices were passing
start-at-32 values to get_sse_irq_in() and so were mis-connected.

Fix the bug by making get_sse_irq_in() take values specified with the
same scheme that the hardware manuals use, to avoid confusion.
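A minimal sketch of the conversion, assuming only what the text states: CPU IRQ numbers 0 to 31 are internal to the SSE and external devices start at 32. cpu_irq_to_exp_irq() and the uart_*_irq() helpers are hypothetical names; the UART formulas are the ones make_uart() uses after this patch.

```c
#include <assert.h>

/* CPU IRQ numbers 0..31 belong to the SSE itself. */
#define SSE_INTERNAL_IRQS 32

/* Convert a CPU IRQ number, as listed in the FPGA application notes,
 * into an index for the SSE's EXP_IRQ[] inputs. */
static int cpu_irq_to_exp_irq(int cpu_irqno, int numirq)
{
    assert(cpu_irqno >= SSE_INTERNAL_IRQS &&
           cpu_irqno < numirq + SSE_INTERNAL_IRQS);
    return cpu_irqno - SSE_INTERNAL_IRQS;
}

/* CPU IRQ numbers for UART n, following the make_uart() formulas. */
static int uart_rx_irq(int n)   { return n * 2 + 32; }
static int uart_tx_irq(int n)   { return n * 2 + 33; }
static int uart_comb_irq(int n) { return n + 42; }
```

Keeping the conversion inside one function means every caller can use the numbers exactly as printed in the application note.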

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210215115138.20465-12-peter.maydell@linaro.org
---
 hw/arm/mps2-tz.c | 24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/mps2-tz.c
+++ b/hw/arm/mps2-tz.c
@@ -XXX,XX +XXX,XX @@ static void make_ram_alias(MemoryRegion *mr, const char *name,
 
 static qemu_irq get_sse_irq_in(MPS2TZMachineState *mms, int irqno)
 {
-    /* Return a qemu_irq which will signal IRQ n to all CPUs in the SSE. */
+    /*
+     * Return a qemu_irq which will signal IRQ n to all CPUs in the
+     * SSE.  The irqno should be as the CPU sees it, so the first
+     * external-to-the-SSE interrupt is 32.
+     */
     MachineClass *mc = MACHINE_GET_CLASS(mms);
     MPS2TZMachineClass *mmc = MPS2TZ_MACHINE_GET_CLASS(mms);
 
-    assert(irqno < mmc->numirq);
+    assert(irqno >= 32 && irqno < (mmc->numirq + 32));
+
+    /*
+     * Convert from "CPU irq number" (as listed in the FPGA image
+     * documentation) to the SSE external-interrupt number.
+     */
+    irqno -= 32;
 
     if (mc->max_cpus > 1) {
         return qdev_get_gpio_in(DEVICE(&mms->cpu_irq_splitter[irqno]), 0);
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_uart(MPS2TZMachineState *mms, void *opaque,
     MPS2TZMachineClass *mmc = MPS2TZ_MACHINE_GET_CLASS(mms);
     CMSDKAPBUART *uart = opaque;
     int i = uart - &mms->uart[0];
-    int rxirqno = i * 2;
-    int txirqno = i * 2 + 1;
-    int combirqno = i + 10;
+    int rxirqno = i * 2 + 32;
+    int txirqno = i * 2 + 33;
+    int combirqno = i + 42;
     SysBusDevice *s;
     DeviceState *orgate_dev = DEVICE(&mms->uart_irq_orgate);
 
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_eth_dev(MPS2TZMachineState *mms, void *opaque,
 
     s = SYS_BUS_DEVICE(mms->lan9118);
     sysbus_realize_and_unref(s, &error_fatal);
-    sysbus_connect_irq(s, 0, get_sse_irq_in(mms, 16));
+    sysbus_connect_irq(s, 0, get_sse_irq_in(mms, 48));
     return sysbus_mmio_get_region(s, 0);
 }
 
@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
                             &error_fatal);
     qdev_realize(DEVICE(&mms->uart_irq_orgate), NULL, &error_fatal);
     qdev_connect_gpio_out(DEVICE(&mms->uart_irq_orgate), 0,
-                          get_sse_irq_in(mms, 15));
+                          get_sse_irq_in(mms, 47));
 
     /* Most of the devices in the FPGA are behind Peripheral Protection
      * Controllers. The required order for initializing things is:
-- 
2.20.1

The mps2-tz code uses PPCPortInfo data structures to define what
devices are present and how they are wired up.  Currently we use
these to specify device types and addresses, but hard-code the
interrupt line wiring in each make_* helper function.  This works for
the two boards we have at the moment, but the AN524 has some devices
with different interrupt assignments.

This commit adds the framework to allow PPCPortInfo structures to
specify interrupt numbers.  We add an array of interrupt numbers to
the PPCPortInfo struct, and pass it through to the make_* helpers.
The following commit will change the make_* helpers over to using the
framework.
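A self-contained sketch of the table-driven wiring, assuming simplified types: PortInfo, MakeDevFn, make_uart_stub() and make_all() are hypothetical, and the addresses are illustrative only. The IRQ numbers for uart0 and uart1 follow the make_uart() formulas from this series (rx = 2n + 32, tx = 2n + 33, combined = n + 42).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

typedef uint64_t hwaddr; /* stand-in for QEMU's hwaddr */

/* Device constructor: receives the port's IRQ list instead of
 * hard-coding interrupt numbers. Returns the first IRQ it wired. */
typedef int MakeDevFn(const char *name, hwaddr size, const int *irqs);

typedef struct PortInfo {
    const char *name;
    MakeDevFn *devfn;
    hwaddr addr;
    hwaddr size;
    int irqs[3]; /* CPU IRQ numbers; unused entries are 0 */
} PortInfo;

/* Hypothetical stub for a make_uart()-style helper. */
static int make_uart_stub(const char *name, hwaddr size, const int *irqs)
{
    printf("%s: size 0x%llx, rx %d, tx %d, combined %d\n", name,
           (unsigned long long)size, irqs[0], irqs[1], irqs[2]);
    return irqs[0];
}

/* Illustrative table; addresses are placeholders. */
static const PortInfo ports[] = {
    { "uart0", make_uart_stub, 0x40200000, 0x1000, { 32, 33, 42 } },
    { "uart1", make_uart_stub, 0x40201000, 0x1000, { 34, 35, 43 } },
};

/* Create every device in the table, passing each one its own IRQs.
 * Returns how many devices reported a wired IRQ. */
static int make_all(const PortInfo *p, size_t n)
{
    int wired = 0;

    for (size_t i = 0; i < n; i++) {
        if (p[i].devfn(p[i].name, p[i].size, p[i].irqs) > 0) {
            wired++;
        }
    }
    return wired;
}
```

Because each entry carries its own irqs[], a board with different interrupt assignments only needs a different table, not different helper code.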

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210215115138.20465-13-peter.maydell@linaro.org
---
 hw/arm/mps2-tz.c | 36 ++++++++++++++++++++++++------------
 1 file changed, 24 insertions(+), 12 deletions(-)

diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/mps2-tz.c
+++ b/hw/arm/mps2-tz.c
@@ -XXX,XX +XXX,XX @@ static qemu_irq get_sse_irq_in(MPS2TZMachineState *mms, int irqno)
  * needs to be plugged into the downstream end of the PPC port.
  */
 typedef MemoryRegion *MakeDevFn(MPS2TZMachineState *mms, void *opaque,
-                                const char *name, hwaddr size);
+                                const char *name, hwaddr size,
+                                const int *irqs);
 
 typedef struct PPCPortInfo {
     const char *name;
@@ -XXX,XX +XXX,XX @@ typedef struct PPCPortInfo {
     void *opaque;
     hwaddr addr;
     hwaddr size;
+    int irqs[3]; /* currently no device needs more IRQ lines than this */
 } PPCPortInfo;
 
 typedef struct PPCInfo {
@@ -XXX,XX +XXX,XX @@ typedef struct PPCInfo {
 } PPCInfo;
 
 static MemoryRegion *make_unimp_dev(MPS2TZMachineState *mms,
-                                       void *opaque,
-                                       const char *name, hwaddr size)
+                                    void *opaque,
+                                    const char *name, hwaddr size,
+                                    const int *irqs)
 {
     /* Initialize, configure and realize a TYPE_UNIMPLEMENTED_DEVICE,
      * and return a pointer to its MemoryRegion.
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_unimp_dev(MPS2TZMachineState *mms,
 }
 
 static MemoryRegion *make_uart(MPS2TZMachineState *mms, void *opaque,
-                               const char *name, hwaddr size)
+                               const char *name, hwaddr size,
+                               const int *irqs)
 {
     MPS2TZMachineClass *mmc = MPS2TZ_MACHINE_GET_CLASS(mms);
     CMSDKAPBUART *uart = opaque;
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_uart(MPS2TZMachineState *mms, void *opaque,
 }
 
 static MemoryRegion *make_scc(MPS2TZMachineState *mms, void *opaque,
-                              const char *name, hwaddr size)
+                              const char *name, hwaddr size,
+                              const int *irqs)
 {
     MPS2SCC *scc = opaque;
     DeviceState *sccdev;
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_scc(MPS2TZMachineState *mms, void *opaque,
 }
 
 static MemoryRegion *make_fpgaio(MPS2TZMachineState *mms, void *opaque,
-                                 const char *name, hwaddr size)
+                                 const char *name, hwaddr size,
+                                 const int *irqs)
 {
     MPS2FPGAIO *fpgaio = opaque;
     MPS2TZMachineClass *mmc = MPS2TZ_MACHINE_GET_CLASS(mms);
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_fpgaio(MPS2TZMachineState *mms, void *opaque,
 }
 
 static MemoryRegion *make_eth_dev(MPS2TZMachineState *mms, void *opaque,
-                                  const char *name, hwaddr size)
+                                  const char *name, hwaddr size,
+                                  const int *irqs)
 {
     SysBusDevice *s;
     NICInfo *nd = &nd_table[0];
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_eth_dev(MPS2TZMachineState *mms, void *opaque,
 }
 
 static MemoryRegion *make_mpc(MPS2TZMachineState *mms, void *opaque,
-                              const char *name, hwaddr size)
+                              const char *name, hwaddr size,
+                              const int *irqs)
 {
     TZMPC *mpc = opaque;
     int i = mpc - &mms->ssram_mpc[0];
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_mpc(MPS2TZMachineState *mms, void *opaque,
 }
 
 static MemoryRegion *make_dma(MPS2TZMachineState *mms, void *opaque,
-                              const char *name, hwaddr size)
+                              const char *name, hwaddr size,
+                              const int *irqs)
 {
     PL080State *dma = opaque;
     int i = dma - &mms->dma[0];
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_dma(MPS2TZMachineState *mms, void *opaque,
 }
 
 static MemoryRegion *make_spi(MPS2TZMachineState *mms, void *opaque,
-                              const char *name, hwaddr size)
+                              const char *name, hwaddr size,
+                              const int *irqs)
 {
     /*
      * The AN505 has five PL022 SPI controllers.
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_spi(MPS2TZMachineState *mms, void *opaque,
 }
 
 static MemoryRegion *make_i2c(MPS2TZMachineState *mms, void *opaque,
-                              const char *name, hwaddr size)
+                              const char *name, hwaddr size,
+                              const int *irqs)
 {
     ArmSbconI2CState *i2c = opaque;
     SysBusDevice *s;
@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
                 continue;
             }
 
-            mr = pinfo->devfn(mms, pinfo->opaque, pinfo->name, pinfo->size);
+            mr = pinfo->devfn(mms, pinfo->opaque, pinfo->name, pinfo->size,
+                              pinfo->irqs);
             portname = g_strdup_printf("port[%d]", port);
             object_property_set_link(OBJECT(ppc), portname, OBJECT(mr),
                                      &error_fatal);
-- 
2.20.1

Move the specification of the IRQ information for the uart, ethernet,
dma and spi devices into their PPCPortInfo entries.  (The other devices
handled by the PPCPortInfo structures don't have any interrupt lines
we need to wire up.)
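
Since this is meant to be a pure refactoring, each new table row should reproduce exactly the numbers the old code computed from the device index. A small standalone check, assuming the per-index formulas this patch removes from the make_* helpers (the helper names are hypothetical and not part of QEMU), tests a row in the order the helpers consume it: tx, rx, combined for a UART; DMACINTR, DMACINTERR, DMACINTTC for a PL080; a single line for a PL022.

```c
#include <assert.h>
#include <stdbool.h>

/* Old make_uart(): txirqno = i*2+33, rxirqno = i*2+32, combirqno = i+42.
 * irqs[0] is wired to TX, irqs[1] to RX and irqs[2] to "combined",
 * so a row must list them in that order. */
static bool uart_row_ok(int i, const int row[3])
{
    return row[0] == i * 2 + 33 && row[1] == i * 2 + 32 && row[2] == i + 42;
}

/* Old make_dma(): sysbus IRQs 0, 1, 2 were 58+i*3, 56+i*3, 57+i*3. */
static bool dma_row_ok(int i, const int row[3])
{
    return row[0] == 58 + i * 3 && row[1] == 56 + i * 3 &&
           row[2] == 57 + i * 3;
}

/* Old make_spi(): a single IRQ at 51+i. */
static bool spi_row_ok(int i, const int row[3])
{
    return row[0] == 51 + i;
}
```

Running a check like this over every row is a cheap way to catch a transposed pair of IRQ numbers.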

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210215115138.20465-14-peter.maydell@linaro.org
---
 hw/arm/mps2-tz.c | 52 +++++++++++++++++++++++-------------------------
 1 file changed, 25 insertions(+), 27 deletions(-)

diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/mps2-tz.c
+++ b/hw/arm/mps2-tz.c
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_uart(MPS2TZMachineState *mms, void *opaque,
                                const char *name, hwaddr size,
                                const int *irqs)
 {
+    /* The irqs[] array is tx, rx, combined, in that order */
     MPS2TZMachineClass *mmc = MPS2TZ_MACHINE_GET_CLASS(mms);
     CMSDKAPBUART *uart = opaque;
     int i = uart - &mms->uart[0];
-    int rxirqno = i * 2 + 32;
-    int txirqno = i * 2 + 33;
-    int combirqno = i + 42;
     SysBusDevice *s;
     DeviceState *orgate_dev = DEVICE(&mms->uart_irq_orgate);
 
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_uart(MPS2TZMachineState *mms, void *opaque,
     qdev_prop_set_uint32(DEVICE(uart), "pclk-frq", mmc->sysclk_frq);
     sysbus_realize(SYS_BUS_DEVICE(uart), &error_fatal);
     s = SYS_BUS_DEVICE(uart);
-    sysbus_connect_irq(s, 0, get_sse_irq_in(mms, txirqno));
-    sysbus_connect_irq(s, 1, get_sse_irq_in(mms, rxirqno));
+    sysbus_connect_irq(s, 0, get_sse_irq_in(mms, irqs[0]));
+    sysbus_connect_irq(s, 1, get_sse_irq_in(mms, irqs[1]));
     sysbus_connect_irq(s, 2, qdev_get_gpio_in(orgate_dev, i * 2));
     sysbus_connect_irq(s, 3, qdev_get_gpio_in(orgate_dev, i * 2 + 1));
-    sysbus_connect_irq(s, 4, get_sse_irq_in(mms, combirqno));
+    sysbus_connect_irq(s, 4, get_sse_irq_in(mms, irqs[2]));
     return sysbus_mmio_get_region(SYS_BUS_DEVICE(uart), 0);
 }
 
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_eth_dev(MPS2TZMachineState *mms, void *opaque,
 
     s = SYS_BUS_DEVICE(mms->lan9118);
     sysbus_realize_and_unref(s, &error_fatal);
-    sysbus_connect_irq(s, 0, get_sse_irq_in(mms, 48));
+    sysbus_connect_irq(s, 0, get_sse_irq_in(mms, irqs[0]));
     return sysbus_mmio_get_region(s, 0);
 }
 
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_dma(MPS2TZMachineState *mms, void *opaque,
                               const char *name, hwaddr size,
                               const int *irqs)
 {
+    /* The irqs[] array is DMACINTR, DMACINTERR, DMACINTTC, in that order */
     PL080State *dma = opaque;
     int i = dma - &mms->dma[0];
     SysBusDevice *s;
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_dma(MPS2TZMachineState *mms, void *opaque,
 
     s = SYS_BUS_DEVICE(dma);
     /* Wire up DMACINTR, DMACINTERR, DMACINTTC */
-    sysbus_connect_irq(s, 0, get_sse_irq_in(mms, 58 + i * 3));
-    sysbus_connect_irq(s, 1, get_sse_irq_in(mms, 56 + i * 3));
-    sysbus_connect_irq(s, 2, get_sse_irq_in(mms, 57 + i * 3));
+    sysbus_connect_irq(s, 0, get_sse_irq_in(mms, irqs[0]));
+    sysbus_connect_irq(s, 1, get_sse_irq_in(mms, irqs[1]));
+    sysbus_connect_irq(s, 2, get_sse_irq_in(mms, irqs[2]));
 
     g_free(mscname);
     return sysbus_mmio_get_region(s, 0);
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_spi(MPS2TZMachineState *mms, void *opaque,
      * lines are set via the "MISC" register in the MPS2 FPGAIO device.
      */
     PL022State *spi = opaque;
-    int i = spi - &mms->spi[0];
     SysBusDevice *s;
 
     object_initialize_child(OBJECT(mms), name, spi, TYPE_PL022);
     sysbus_realize(SYS_BUS_DEVICE(spi), &error_fatal);
     s = SYS_BUS_DEVICE(spi);
-    sysbus_connect_irq(s, 0, get_sse_irq_in(mms, 51 + i));
+    sysbus_connect_irq(s, 0, get_sse_irq_in(mms, irqs[0]));
     return sysbus_mmio_get_region(s, 0);
 }
 
@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
         }, {
             .name = "apb_ppcexp1",
             .ports = {
-                { "spi0", make_spi, &mms->spi[0], 0x40205000, 0x1000 },
-                { "spi1", make_spi, &mms->spi[1], 0x40206000, 0x1000 },
-                { "spi2", make_spi, &mms->spi[2], 0x40209000, 0x1000 },
-                { "spi3", make_spi, &mms->spi[3], 0x4020a000, 0x1000 },
-                { "spi4", make_spi, &mms->spi[4], 0x4020b000, 0x1000 },
-                { "uart0", make_uart, &mms->uart[0], 0x40200000, 0x1000 },
-                { "uart1", make_uart, &mms->uart[1], 0x40201000, 0x1000 },
-                { "uart2", make_uart, &mms->uart[2], 0x40202000, 0x1000 },
-                { "uart3", make_uart, &mms->uart[3], 0x40203000, 0x1000 },
-                { "uart4", make_uart, &mms->uart[4], 0x40204000, 0x1000 },
+                { "spi0", make_spi, &mms->spi[0], 0x40205000, 0x1000, { 51 } },
+                { "spi1", make_spi, &mms->spi[1], 0x40206000, 0x1000, { 52 } },
+                { "spi2", make_spi, &mms->spi[2], 0x40209000, 0x1000, { 53 } },
+                { "spi3", make_spi, &mms->spi[3], 0x4020a000, 0x1000, { 54 } },
+                { "spi4", make_spi, &mms->spi[4], 0x4020b000, 0x1000, { 55 } },
+                { "uart0", make_uart, &mms->uart[0], 0x40200000, 0x1000, { 33, 32, 42 } },
+                { "uart1", make_uart, &mms->uart[1], 0x40201000, 0x1000, { 35, 34, 43 } },
+                { "uart2", make_uart, &mms->uart[2], 0x40202000, 0x1000, { 37, 36, 44 } },
+                { "uart3", make_uart, &mms->uart[3], 0x40203000, 0x1000, { 39, 38, 45 } },
+                { "uart4", make_uart, &mms->uart[4], 0x40204000, 0x1000, { 41, 40, 46 } },
                 { "i2c0", make_i2c, &mms->i2c[0], 0x40207000, 0x1000 },
                 { "i2c1", make_i2c, &mms->i2c[1], 0x40208000, 0x1000 },
                 { "i2c2", make_i2c, &mms->i2c[2], 0x4020c000, 0x1000 },
@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
                 { "gpio1", make_unimp_dev, &mms->gpio[1], 0x40101000, 0x1000 },
                 { "gpio2", make_unimp_dev, &mms->gpio[2], 0x40102000, 0x1000 },
                 { "gpio3", make_unimp_dev, &mms->gpio[3], 0x40103000, 0x1000 },
-                { "eth", make_eth_dev, NULL, 0x42000000, 0x100000 },
+                { "eth", make_eth_dev, NULL, 0x42000000, 0x100000, { 48 } },
             },
         }, {
             .name = "ahb_ppcexp1",
             .ports = {
-                { "dma0", make_dma, &mms->dma[0], 0x40110000, 0x1000 },
-                { "dma1", make_dma, &mms->dma[1], 0x40111000, 0x1000 },
-                { "dma2", make_dma, &mms->dma[2], 0x40112000, 0x1000 },
-                { "dma3", make_dma, &mms->dma[3], 0x40113000, 0x1000 },
+                { "dma0", make_dma, &mms->dma[0], 0x40110000, 0x1000, { 58, 56, 57 } },
+                { "dma1", make_dma, &mms->dma[1], 0x40111000, 0x1000, { 61, 59, 60 } },
+                { "dma2", make_dma, &mms->dma[2], 0x40112000, 0x1000, { 64, 62, 63 } },
+                { "dma3", make_dma, &mms->dma[3], 0x40113000, 0x1000, { 67, 65, 66 } },
             },
         },
     };
-- 
2.20.1

We create an OR gate to wire together the overflow IRQs for all the
UARTs on the board; it needs twice as many inputs as there are UARTs,
since each UART feeds it a TX overflow and an RX overflow interrupt
line.  Replace the hardcoded '10' with a calculation based on the
size of the uart[] array in the MPS2TZMachineState.  (We rely on OR
gate inputs that are never wired up or asserted being treated as
always-zero.)
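
As a standalone illustration (the `DemoUART`, `orgate_in`, `set_overflow` and `orgate_out` names are hypothetical, and this models the gate's logic rather than QEMU's TYPE_OR_IRQ device), sizing the input array from `ARRAY_SIZE()` of the UART array means that growing the array, say from five to six UARTs, grows the gate with it. UART i drives inputs i*2 (TX overflow) and i*2+1 (RX overflow); inputs for slots a board does not populate are never driven and stay false, so they cannot affect the output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

/* Hypothetical stand-in for the uart[] member of the machine state. */
typedef struct { int unused; } DemoUART;

static DemoUART demo_uart[6];
/* Two OR-gate inputs per UART slot: TX overflow, then RX overflow. */
static bool orgate_in[2 * ARRAY_SIZE(demo_uart)];

/* OR of all inputs; never-driven inputs stay false (static storage). */
static bool orgate_out(void)
{
    for (size_t n = 0; n < ARRAY_SIZE(orgate_in); n++) {
        if (orgate_in[n]) {
            return true;
        }
    }
    return false;
}

/* Drive UART i's overflow lines and return the resulting gate output. */
static bool set_overflow(size_t i, bool tx, bool rx)
{
    orgate_in[i * 2] = tx;
    orgate_in[i * 2 + 1] = rx;
    return orgate_out();
}
```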

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210215115138.20465-15-peter.maydell@linaro.org
---
 hw/arm/mps2-tz.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/mps2-tz.c
+++ b/hw/arm/mps2-tz.c
@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
      */
     memory_region_add_subregion(system_memory, 0x80000000, machine->ram);
 
-    /* The overflow IRQs for all UARTs are ORed together.
+    /*
+     * The overflow IRQs for all UARTs are ORed together.
      * Tx, Rx and "combined" IRQs are sent to the NVIC separately.
-     * Create the OR gate for this.
+     * Create the OR gate for this: it has one input for the TX overflow
+     * and one for the RX overflow for each UART we might have.
+     * (If the board has fewer than the maximum possible number of UARTs
+     * those inputs are never wired up and are treated as always-zero.)
      */
     object_initialize_child(OBJECT(mms), "uart-irq-orgate",
                             &mms->uart_irq_orgate, TYPE_OR_IRQ);
-    object_property_set_int(OBJECT(&mms->uart_irq_orgate), "num-lines", 10,
+    object_property_set_int(OBJECT(&mms->uart_irq_orgate), "num-lines",
+                            2 * ARRAY_SIZE(mms->uart),
                             &error_fatal);
     qdev_realize(DEVICE(&mms->uart_irq_orgate), NULL, &error_fatal);
     qdev_connect_gpio_out(DEVICE(&mms->uart_irq_orgate), 0,
-- 
2.20.1

The AN505 and AN521 have the same device layout, but the AN524 is
somewhat different.  Allow for more than one PPCInfo array, which can
be selected based on the board type.
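
One detail worth spelling out: once the loop walks a `const PPCInfo *` rather than a named array, `ARRAY_SIZE(ppcs)` would measure the pointer, not the table, so the element count has to travel alongside the pointer (the patch's `num_ppcs`). A standalone sketch of that selection, with hypothetical simplified types and an abbreviated placeholder table rather than the real AN505 PPC list:

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

typedef enum { DEMO_AN505, DEMO_AN521 } DemoFPGAType;

/* Heavily simplified PPC description: just a name. */
typedef struct { const char *name; } DemoPPC;

/* Abbreviated placeholder table, not the full AN505 PPC list. */
static const DemoPPC an505_ppcs[] = {
    { "apb_ppcexp0" }, { "apb_ppcexp1" }, { "ahb_ppcexp1" },
};

/* Pointer plus count: the count must be taken while the array is in scope. */
typedef struct { const DemoPPC *ppcs; size_t num; } DemoPPCTable;

static DemoPPCTable ppcs_for_board(DemoFPGAType type)
{
    DemoPPCTable t = { NULL, 0 };

    switch (type) {
    case DEMO_AN505:
    case DEMO_AN521: /* same device layout, so they share a table */
        t.ppcs = an505_ppcs;
        t.num = ARRAY_SIZE(an505_ppcs);
        break;
    default:
        assert(!"no PPC table for this board"); /* cf. g_assert_not_reached() */
    }
    return t;
}
```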

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210215115138.20465-16-peter.maydell@linaro.org
---
 hw/arm/mps2-tz.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/mps2-tz.c
+++ b/hw/arm/mps2-tz.c
@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
     MemoryRegion *system_memory = get_system_memory();
     DeviceState *iotkitdev;
     DeviceState *dev_splitter;
+    const PPCInfo *ppcs;
+    int num_ppcs;
     int i;
 
     if (strcmp(machine->cpu_type, mc->default_cpu_type) != 0) {
@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
      *  + wire up the PPC's control lines to the IoTKit object
      */
 
-    const PPCInfo ppcs[] = { {
+    const PPCInfo an505_ppcs[] = { {
             .name = "apb_ppcexp0",
             .ports = {
                 { "ssram-0", make_mpc, &mms->ssram_mpc[0], 0x58007000, 0x1000 },
@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
         },
     };
 
-    for (i = 0; i < ARRAY_SIZE(ppcs); i++) {
+    switch (mmc->fpga_type) {
+    case FPGA_AN505:
+    case FPGA_AN521:
+        ppcs = an505_ppcs;
+        num_ppcs = ARRAY_SIZE(an505_ppcs);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    for (i = 0; i < num_ppcs; i++) {
         const PPCInfo *ppcinfo = &ppcs[i];
         TZPPC *ppc = &mms->ppc[i];
         DeviceState *ppcdev;
-- 
2.20.1

The AN505 and AN521 have the same layout of RAM; the AN524 does not.
Replace the current hard-coding of where the RAM is and which parts
of it are behind which MPCs with a data-driven approach.
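
A quick way to sanity-check a data-driven table like this is to confirm it reproduces the constants it replaces. The standalone sketch below uses a simplified copy of the RAMInfo idea (hypothetical `RamDesc`, no `mrindex`), the same NULL-name terminator, and a lookup that skips alias entries, then compares the result with the `ramsize[]`/`rambase[]` arrays that `make_mpc()` used to hard-code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IS_ALIAS 1

/* Simplified copy of RAMInfo for illustration only. */
typedef struct {
    const char *name;
    uint32_t base, size;
    int mpc;   /* -1: not behind an MPC */
    int flags;
} RamDesc;

static const RamDesc an505_ram[] = {
    { "ssram-0", 0x00000000, 0x00400000, 0, 0 },
    { "ssram-1", 0x28000000, 0x00200000, 1, 0 },
    { "ssram-2", 0x28200000, 0x00200000, 2, 0 },
    { "ssram-0-alias", 0x00400000, 0x00400000, 0, IS_ALIAS },
    { "mps.ram", 0x80000000, 16 * 1024 * 1024, -1, 0 },
    { NULL, 0, 0, 0, 0 }, /* terminator: lookups stop at name == NULL */
};

/* First non-alias entry behind MPC @mpc, or NULL if there is none. */
static const RamDesc *find_for_mpc(const RamDesc *tbl, int mpc)
{
    for (const RamDesc *p = tbl; p->name; p++) {
        if (p->mpc == mpc && !(p->flags & IS_ALIAS)) {
            return p;
        }
    }
    return NULL;
}

/* The values the old make_mpc() hard-coded, indexed by MPC number. */
static const uint32_t old_ramsize[] = { 0x00400000, 0x00200000, 0x00200000 };
static const uint32_t old_rambase[] = { 0x00000000, 0x28000000, 0x28200000 };

static int table_matches_old_code(void)
{
    for (int i = 0; i < 3; i++) {
        const RamDesc *p = find_for_mpc(an505_ram, i);
        if (!p || p->base != old_rambase[i] || p->size != old_ramsize[i]) {
            return 0;
        }
    }
    return 1;
}
```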

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210215115138.20465-17-peter.maydell@linaro.org
---
 hw/arm/mps2-tz.c | 175 +++++++++++++++++++++++++++++++++++++----------
 1 file changed, 138 insertions(+), 37 deletions(-)

diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/mps2-tz.c
+++ b/hw/arm/mps2-tz.c
@@ -XXX,XX +XXX,XX @@
 #include "qom/object.h"
 
 #define MPS2TZ_NUMIRQ_MAX 92
+#define MPS2TZ_RAM_MAX 4
 
 typedef enum MPS2TZFPGAType {
     FPGA_AN505,
     FPGA_AN521,
 } MPS2TZFPGAType;
 
+/*
+ * Define the layout of RAM in a board, including which parts are
+ * behind which MPCs.
+ * mrindex specifies the index into mms->ram[] to use for the backing RAM;
+ * -1 means "use the system RAM".
+ */
+typedef struct RAMInfo {
+    const char *name;
+    uint32_t base;
+    uint32_t size;
+    int mpc; /* MPC number, -1 for "not behind an MPC" */
+    int mrindex;
+    int flags;
+} RAMInfo;
+
+/*
+ * Flag values:
+ *  IS_ALIAS: this RAM area is an alias to the upstream end of the
+ *    MPC specified by its .mpc value
+ */
+#define IS_ALIAS 1
+
 struct MPS2TZMachineClass {
     MachineClass parent;
     MPS2TZFPGAType fpga_type;
@@ -XXX,XX +XXX,XX @@ struct MPS2TZMachineClass {
     uint32_t fpgaio_num_leds; /* Number of LEDs in FPGAIO LED0 register */
     bool fpgaio_has_switches; /* Does FPGAIO have SWITCH register? */
     int numirq; /* Number of external interrupts */
+    const RAMInfo *raminfo;
     const char *armsse_type;
 };
 
@@ -XXX,XX +XXX,XX @@ struct MPS2TZMachineState {
     MachineState parent;
 
     ARMSSE iotkit;
-    MemoryRegion ssram[3];
-    MemoryRegion ssram1_m;
+    MemoryRegion ram[MPS2TZ_RAM_MAX];
     MPS2SCC scc;
     MPS2FPGAIO fpgaio;
     TZPPC ppc[5];
-    TZMPC ssram_mpc[3];
+    TZMPC mpc[3];
     PL022State spi[5];
     ArmSbconI2CState i2c[4];
     UnimplementedDeviceState i2s_audio;
@@ -XXX,XX +XXX,XX @@ static const uint32_t an505_oscclk[] = {
     25000000,
 };
 
+static const RAMInfo an505_raminfo[] = { {
+        .name = "ssram-0",
+        .base = 0x00000000,
+        .size = 0x00400000,
+        .mpc = 0,
+        .mrindex = 0,
+    }, {
+        .name = "ssram-1",
+        .base = 0x28000000,
+        .size = 0x00200000,
+        .mpc = 1,
+        .mrindex = 1,
+    }, {
+        .name = "ssram-2",
+        .base = 0x28200000,
+        .size = 0x00200000,
+        .mpc = 2,
+        .mrindex = 2,
+    }, {
+        .name = "ssram-0-alias",
+        .base = 0x00400000,
+        .size = 0x00400000,
+        .mpc = 0,
+        .mrindex = 3,
+        .flags = IS_ALIAS,
+    }, {
+        /* Use the largest bit of contiguous RAM as our "system memory" */
+        .name = "mps.ram",
+        .base = 0x80000000,
+        .size = 16 * MiB,
+        .mpc = -1,
+        .mrindex = -1,
+    }, {
+        .name = NULL,
+    },
+};
+
+static const RAMInfo *find_raminfo_for_mpc(MPS2TZMachineState *mms, int mpc)
+{
+    MPS2TZMachineClass *mmc = MPS2TZ_MACHINE_GET_CLASS(mms);
+    const RAMInfo *p;
+
+    for (p = mmc->raminfo; p->name; p++) {
+        if (p->mpc == mpc && !(p->flags & IS_ALIAS)) {
+            return p;
+        }
+    }
+    /* if raminfo array doesn't have an entry for each MPC this is a bug */
+    g_assert_not_reached();
+}
+
+static MemoryRegion *mr_for_raminfo(MPS2TZMachineState *mms,
+                                    const RAMInfo *raminfo)
+{
+    /* Return an initialized MemoryRegion for the RAMInfo. */
+    MemoryRegion *ram;
+
+    if (raminfo->mrindex < 0) {
+        /* Means this RAMInfo is for QEMU's "system memory" */
+        MachineState *machine = MACHINE(mms);
+        return machine->ram;
+    }
+
+    assert(raminfo->mrindex < MPS2TZ_RAM_MAX);
+    ram = &mms->ram[raminfo->mrindex];
+
+    memory_region_init_ram(ram, NULL, raminfo->name,
+                           raminfo->size, &error_fatal);
+    return ram;
+}
+
 /* Create an alias of an entire original MemoryRegion @orig
  * located at @base in the memory map.
  */
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_mpc(MPS2TZMachineState *mms, void *opaque,
                               const int *irqs)
 {
     TZMPC *mpc = opaque;
-    int i = mpc - &mms->ssram_mpc[0];
-    MemoryRegion *ssram = &mms->ssram[i];
+    int i = mpc - &mms->mpc[0];
     MemoryRegion *upstream;
-    char *mpcname = g_strdup_printf("%s-mpc", name);
-    static uint32_t ramsize[] = { 0x00400000, 0x00200000, 0x00200000 };
-    static uint32_t rambase[] = { 0x00000000, 0x28000000, 0x28200000 };
+    const RAMInfo *raminfo = find_raminfo_for_mpc(mms, i);
+    MemoryRegion *ram = mr_for_raminfo(mms, raminfo);
 
-    memory_region_init_ram(ssram, NULL, name, ramsize[i], &error_fatal);
-
-    object_initialize_child(OBJECT(mms), mpcname, mpc, TYPE_TZ_MPC);
-    object_property_set_link(OBJECT(mpc), "downstream", OBJECT(ssram),
+    object_initialize_child(OBJECT(mms), name, mpc, TYPE_TZ_MPC);
+    object_property_set_link(OBJECT(mpc), "downstream", OBJECT(ram),
                              &error_fatal);
     sysbus_realize(SYS_BUS_DEVICE(mpc), &error_fatal);
     /* Map the upstream end of the MPC into system memory */
     upstream = sysbus_mmio_get_region(SYS_BUS_DEVICE(mpc), 1);
-    memory_region_add_subregion(get_system_memory(), rambase[i], upstream);
+    memory_region_add_subregion(get_system_memory(), raminfo->base, upstream);
     /* and connect its interrupt to the IoTKit */
     qdev_connect_gpio_out_named(DEVICE(mpc), "irq", 0,
                                 qdev_get_gpio_in_named(DEVICE(&mms->iotkit),
                                                        "mpcexp_status", i));
 
-    /* The first SSRAM is a special case as it has an alias; accesses to
-     * the alias region at 0x00400000 must also go to the MPC upstream.
-     */
-    if (i == 0) {
-        make_ram_alias(&mms->ssram1_m, "mps.ssram1_m", upstream, 0x00400000);
-    }
-
-    g_free(mpcname);
     /* Return the register interface MR for our caller to map behind the PPC */
     return sysbus_mmio_get_region(SYS_BUS_DEVICE(mpc), 0);
 }
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_i2c(MPS2TZMachineState *mms, void *opaque,
     return sysbus_mmio_get_region(s, 0);
 }
 
+static void create_non_mpc_ram(MPS2TZMachineState *mms)
+{
+    /*
+     * Handle the RAMs which are either not behind MPCs or which are
+     * aliases to another MPC.
+     */
+    const RAMInfo *p;
+    MPS2TZMachineClass *mmc = MPS2TZ_MACHINE_GET_CLASS(mms);
+
+    for (p = mmc->raminfo; p->name; p++) {
+        if (p->flags & IS_ALIAS) {
+            SysBusDevice *mpc_sbd = SYS_BUS_DEVICE(&mms->mpc[p->mpc]);
+            MemoryRegion *upstream = sysbus_mmio_get_region(mpc_sbd, 1);
+            make_ram_alias(&mms->ram[p->mrindex], p->name, upstream, p->base);
+        } else if (p->mpc == -1) {
+            /* RAM not behind an MPC */
+            MemoryRegion *mr = mr_for_raminfo(mms, p);
+            memory_region_add_subregion(get_system_memory(), p->base, mr);
+        }
+    }
+}
+
 static void mps2tz_common_init(MachineState *machine)
 {
     MPS2TZMachineState *mms = MPS2TZ_MACHINE(machine);
@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
     qdev_connect_gpio_out_named(iotkitdev, "sec_resp_cfg", 0,
                                 qdev_get_gpio_in(dev_splitter, 0));
 
-    /* The IoTKit sets up much of the memory layout, including
+    /*
+     * The IoTKit sets up much of the memory layout, including
      * the aliases between secure and non-secure regions in the
-     * address space. The FPGA itself contains:
-     *
-     * 0x00000000..0x003fffff  SSRAM1
-     * 0x00400000..0x007fffff  alias of SSRAM1
-     * 0x28000000..0x283fffff  4MB SSRAM2 + SSRAM3
-     * 0x40100000..0x4fffffff  AHB Master Expansion 1 interface devices
-     * 0x80000000..0x80ffffff  16MB PSRAM
-     */
-
-    /* The FPGA images have an odd combination of different RAMs,
+     * address space, and also most of the devices in the system.
+     * The FPGA itself contains various RAMs and some additional devices.
+     * The FPGA images have an odd combination of different RAMs,
      * because in hardware they are different implementations and
      * connected to different buses, giving varying performance/size
      * tradeoffs. For QEMU they're all just RAM, though. We arbitrarily
-     * call the 16MB our "system memory", as it's the largest lump.
+     * call the largest lump our "system memory".
      */
-    memory_region_add_subregion(system_memory, 0x80000000, machine->ram);
 
     /*
      * The overflow IRQs for all UARTs are ORed together.
@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
     const PPCInfo an505_ppcs[] = { {
             .name = "apb_ppcexp0",
             .ports = {
-                { "ssram-0", make_mpc, &mms->ssram_mpc[0], 0x58007000, 0x1000 },
-                { "ssram-1", make_mpc, &mms->ssram_mpc[1], 0x58008000, 0x1000 },
-                { "ssram-2", make_mpc, &mms->ssram_mpc[2], 0x58009000, 0x1000 },
+                { "ssram-0-mpc", make_mpc, &mms->mpc[0], 0x58007000, 0x1000 },
+                { "ssram-1-mpc", make_mpc, &mms->mpc[1], 0x58008000, 0x1000 },
+                { "ssram-2-mpc", make_mpc, &mms->mpc[2], 0x58009000, 0x1000 },
             },
         }, {
             .name = "apb_ppcexp1",
@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
 
     create_unimplemented_device("FPGA NS PC", 0x48007000, 0x1000);
 
+    create_non_mpc_ram(mms);
+
     armv7m_load_kernel(ARM_CPU(first_cpu), machine->kernel_filename, 0x400000);
 }
 
@@ -XXX,XX +XXX,XX @@ static void mps2tz_an505_class_init(ObjectClass *oc, void *data)
     mmc->fpgaio_num_leds = 2;
     mmc->fpgaio_has_switches = false;
     mmc->numirq = 92;
+    mmc->raminfo = an505_raminfo;
     mmc->armsse_type = TYPE_IOTKIT;
 }
 
@@ -XXX,XX +XXX,XX @@ static void mps2tz_an521_class_init(ObjectClass *oc, void *data)
     mmc->fpgaio_num_leds = 2;
     mmc->fpgaio_has_switches = false;
     mmc->numirq = 92;
+    mmc->raminfo = an505_raminfo; /* AN521 is the same as AN505 here */
     mmc->armsse_type = TYPE_SSE200;
 }
 
-- 
2.20.1

Instead of hardcoding the MachineClass default_ram_size and
default_ram_id fields, set them on class creation by finding the
entry in the RAMInfo array which is marked as being the QEMU system
RAM.
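
The effect is that the RAMInfo table becomes the single place that says how big the system-memory block is and what it is called. A standalone sketch of the lookup, with hypothetical simplified types standing in for RAMInfo and the two MachineClass fields:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified RAMInfo: mrindex < 0 marks the QEMU "system memory" entry. */
typedef struct {
    const char *name;
    uint32_t size;
    int mrindex;
} DemoRAMInfo;

/* Stand-in for mc->default_ram_size and mc->default_ram_id. */
typedef struct {
    uint64_t default_ram_size;
    const char *default_ram_id;
} DemoRAMDefaults;

static DemoRAMDefaults ram_defaults(const DemoRAMInfo *tbl)
{
    for (const DemoRAMInfo *p = tbl; p->name; p++) {
        if (p->mrindex < 0) {
            DemoRAMDefaults d = { p->size, p->name };
            return d;
        }
    }
    assert(!"table has no system-memory entry");
    DemoRAMDefaults none = { 0, NULL };
    return none;
}

static const DemoRAMInfo demo_raminfo[] = {
    { "ssram-0", 0x00400000, 0 },
    { "mps.ram", 16 * 1024 * 1024, -1 },
    { NULL, 0, 0 },
};
```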

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210215115138.20465-18-peter.maydell@linaro.org
---
 hw/arm/mps2-tz.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/mps2-tz.c
+++ b/hw/arm/mps2-tz.c
@@ -XXX,XX +XXX,XX @@ static void mps2tz_class_init(ObjectClass *oc, void *data)
 
     mc->init = mps2tz_common_init;
     iic->check = mps2_tz_idau_check;
-    mc->default_ram_size = 16 * MiB;
-    mc->default_ram_id = "mps.ram";
+}
+
+static void mps2tz_set_default_ram_info(MPS2TZMachineClass *mmc)
+{
+    /*
+     * Set mc->default_ram_size and default_ram_id from the
+     * information in mmc->raminfo.
+     */
+    MachineClass *mc = MACHINE_CLASS(mmc);
+    const RAMInfo *p;
+
+    for (p = mmc->raminfo; p->name; p++) {
+        if (p->mrindex < 0) {
+            /* Found the entry for "system memory" */
+            mc->default_ram_size = p->size;
+            mc->default_ram_id = p->name;
+            return;
+        }
+    }
+    g_assert_not_reached();
 }
 
 static void mps2tz_an505_class_init(ObjectClass *oc, void *data)
@@ -XXX,XX +XXX,XX @@ static void mps2tz_an505_class_init(ObjectClass *oc, void *data)
     mmc->numirq = 92;
     mmc->raminfo = an505_raminfo;
     mmc->armsse_type = TYPE_IOTKIT;
+    mps2tz_set_default_ram_info(mmc);
 }
 
 static void mps2tz_an521_class_init(ObjectClass *oc, void *data)
@@ -XXX,XX +XXX,XX @@ static void mps2tz_an521_class_init(ObjectClass *oc, void *data)
     mmc->numirq = 92;
     mmc->raminfo = an505_raminfo; /* AN521 is the same as AN505 here */
     mmc->armsse_type = TYPE_SSE200;
+    mps2tz_set_default_ram_info(mmc);
 }
 
 static const TypeInfo mps2tz_info = {
-- 
2.20.1

The AN505 and AN521 don't have any read-only memory, but the AN524
does; add a flag to RAMInfo to mark a region as ROM.
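
IS_ROM takes a separate bit from IS_ALIAS, so one entry's flags field can carry both. The toy model below (hypothetical names; it does not reproduce QEMU's memory API, where the patch calls memory_region_set_readonly()) simply drops writes to a region initialized with IS_ROM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IS_ALIAS 1
#define IS_ROM   2 /* distinct bit, so flags can be combined with | */

/* Toy memory region: a few bytes plus a read-only bit. */
typedef struct {
    uint8_t data[16];
    bool readonly;
} DemoRegion;

static DemoRegion *demo_region_init(DemoRegion *r, int flags)
{
    memset(r->data, 0, sizeof(r->data));
    r->readonly = (flags & IS_ROM) != 0;
    return r;
}

/* Store @val at @off unless the region is read-only or @off is out of
 * range; returns true if the write took effect. */
static bool demo_region_write(DemoRegion *r, size_t off, uint8_t val)
{
    if (r->readonly || off >= sizeof(r->data)) {
        return false;
    }
    r->data[off] = val;
    return true;
}

static DemoRegion demo_rom, demo_ram;
```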

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210215115138.20465-19-peter.maydell@linaro.org
---
 hw/arm/mps2-tz.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/mps2-tz.c
+++ b/hw/arm/mps2-tz.c
@@ -XXX,XX +XXX,XX @@ typedef struct RAMInfo {
  * Flag values:
  *  IS_ALIAS: this RAM area is an alias to the upstream end of the
  *    MPC specified by its .mpc value
+ *  IS_ROM: this RAM area is read-only
  */
 #define IS_ALIAS 1
+#define IS_ROM 2
 
 struct MPS2TZMachineClass {
     MachineClass parent;
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *mr_for_raminfo(MPS2TZMachineState *mms,
     if (raminfo->mrindex < 0) {
         /* Means this RAMInfo is for QEMU's "system memory" */
         MachineState *machine = MACHINE(mms);
+        assert(!(raminfo->flags & IS_ROM));
         return machine->ram;
     }
 
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *mr_for_raminfo(MPS2TZMachineState *mms,
 
     memory_region_init_ram(ram, NULL, raminfo->name,
                            raminfo->size, &error_fatal);
+    if (raminfo->flags & IS_ROM) {
+        memory_region_set_readonly(ram, true);
+    }
     return ram;
 }
 
-- 
2.20.1

The armv7m_load_kernel() function takes a mem_size argument which it
expects to be the size of the memory region at guest address 0.  (It
uses this argument only as a limit on how large a raw image file it
can load at address zero).

Instead of hardcoding this value, find the RAMInfo corresponding to
the 0 address and extract its size.
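
For the AN505/AN521 layout the lookup should give back exactly the constant it replaces. A standalone sketch with a hypothetical simplified entry type and an abbreviated copy of that layout (ssram-2 omitted) shows the base-0 search returning 0x400000, the value previously hard-coded in the armv7m_load_kernel() call; the alias at 0x00400000 does not match because only an entry whose base is exactly zero counts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint32_t base;
    uint32_t size;
} DemoRAMRange;

/* Abbreviated AN505-style layout (name, base, size). */
static const DemoRAMRange demo_layout[] = {
    { "ssram-0", 0x00000000, 0x00400000 },
    { "ssram-1", 0x28000000, 0x00200000 },
    { "ssram-0-alias", 0x00400000, 0x00400000 },
    { "mps.ram", 0x80000000, 0x01000000 },
    { NULL, 0, 0 },
};

/* Size of the block at guest address 0, i.e. the upper limit on a raw
 * image loaded there. */
static uint32_t demo_boot_ram_size(const DemoRAMRange *tbl)
{
    for (const DemoRAMRange *p = tbl; p->name; p++) {
        if (p->base == 0) {
            return p->size;
        }
    }
    assert(!"no RAM at address 0");
    return 0;
}
```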

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210215115138.20465-20-peter.maydell@linaro.org
---
 hw/arm/mps2-tz.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/mps2-tz.c
+++ b/hw/arm/mps2-tz.c
@@ -XXX,XX +XXX,XX @@ static void create_non_mpc_ram(MPS2TZMachineState *mms)
     }
 }
 
+static uint32_t boot_ram_size(MPS2TZMachineState *mms)
+{
+    /* Return the size of the RAM block at guest address zero */
+    const RAMInfo *p;
+    MPS2TZMachineClass *mmc = MPS2TZ_MACHINE_GET_CLASS(mms);
+
+    for (p = mmc->raminfo; p->name; p++) {
+        if (p->base == 0) {
+            return p->size;
+        }
+    }
+    g_assert_not_reached();
+}
+
 static void mps2tz_common_init(MachineState *machine)
 {
     MPS2TZMachineState *mms = MPS2TZ_MACHINE(machine);
@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
 
     create_non_mpc_ram(mms);
 
-    armv7m_load_kernel(ARM_CPU(first_cpu), machine->kernel_filename, 0x400000);
+    armv7m_load_kernel(ARM_CPU(first_cpu), machine->kernel_filename,
+                       boot_ram_size(mms));
 }
 
 static void mps2_tz_idau_check(IDAUInterface *ii, uint32_t address,
-- 
2.20.1

Add support for the mps3-an524 board; this is an SSE-200 based FPGA
image, like the existing mps2-an521.  It has a usefully larger amount
of RAM, and a PL031 RTC, as well as some more minor differences.

In real hardware this image runs on a newer generation of the FPGA
board, the MPS3 rather than the older MPS2.  Architecturally the two
boards are similar, so we implement the MPS3 boards in the mps2-tz.c
file as variations of the existing MPS2 boards.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210215115138.20465-21-peter.maydell@linaro.org
---
 hw/arm/mps2-tz.c | 139 +++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 135 insertions(+), 4 deletions(-)

diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/mps2-tz.c
+++ b/hw/arm/mps2-tz.c
@@ -XXX,XX +XXX,XX @@
  * This source file covers the following FPGA images, for TrustZone cores:
  *  "mps2-an505" -- Cortex-M33 as documented in ARM Application Note AN505
  *  "mps2-an521" -- Dual Cortex-M33 as documented in Application Note AN521
+ *  "mps3-an524" -- Dual Cortex-M33 as documented in Application Note AN524
  *
  * Links to the TRM for the board itself and to the various Application
  * Notes which document the FPGA images can be found here:
@@ -XXX,XX +XXX,XX @@
  * http://infocenter.arm.com/help/topic/com.arm.doc.dai0505b/index.html
  * Application Note AN521:
  * http://infocenter.arm.com/help/topic/com.arm.doc.dai0521c/index.html
+ * Application Note AN524:
+ * https://developer.arm.com/documentation/dai0524/latest/
  *
  * The AN505 defers to the Cortex-M33 processor ARMv8M IoT Kit FVP User Guide
  * (ARM ECM0601256) for the details of some of the device layout:
  *   http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ecm0601256/index.html
- * Similarly, the AN521 uses the SSE-200, and the SSE-200 TRM defines
+ * Similarly, the AN521 and AN524 use the SSE-200, and the SSE-200 TRM defines
  * most of the device layout:
  *  http://infocenter.arm.com/help/topic/com.arm.doc.101104_0100_00_en/corelink_sse200_subsystem_for_embedded_technical_reference_manual_101104_0100_00_en.pdf
  *
@@ -XXX,XX +XXX,XX @@
 #include "hw/qdev-clock.h"
 #include "qom/object.h"
 
-#define MPS2TZ_NUMIRQ_MAX 92
+#define MPS2TZ_NUMIRQ_MAX 95
 #define MPS2TZ_RAM_MAX 4
 
 typedef enum MPS2TZFPGAType {
     FPGA_AN505,
     FPGA_AN521,
+    FPGA_AN524,
 } MPS2TZFPGAType;
 
 /*
@@ -XXX,XX +XXX,XX @@ struct MPS2TZMachineState {
     TZPPC ppc[5];
     TZMPC mpc[3];
     PL022State spi[5];
-    ArmSbconI2CState i2c[4];
+    ArmSbconI2CState i2c[5];
     UnimplementedDeviceState i2s_audio;
     UnimplementedDeviceState gpio[4];
     UnimplementedDeviceState gfx;
+    UnimplementedDeviceState cldc;
+    UnimplementedDeviceState rtc;
     PL080State dma[4];
     TZMSC msc[4];
-    CMSDKAPBUART uart[5];
+    CMSDKAPBUART uart[6];
     SplitIRQ sec_resp_splitter;
     qemu_or_irq uart_irq_orgate;
     DeviceState *lan9118;
@@ -XXX,XX +XXX,XX @@ struct MPS2TZMachineState {
 #define TYPE_MPS2TZ_MACHINE "mps2tz"
 #define TYPE_MPS2TZ_AN505_MACHINE MACHINE_TYPE_NAME("mps2-an505")
 #define TYPE_MPS2TZ_AN521_MACHINE MACHINE_TYPE_NAME("mps2-an521")
+#define TYPE_MPS3TZ_AN524_MACHINE MACHINE_TYPE_NAME("mps3-an524")
 
 OBJECT_DECLARE_TYPE(MPS2TZMachineState, MPS2TZMachineClass, MPS2TZ_MACHINE)
 
@@ -XXX,XX +XXX,XX @@ static const uint32_t an505_oscclk[] = {
     25000000,
 };
 
+static const uint32_t an524_oscclk[] = {
+    24000000,
+    32000000,
+    50000000,
+    50000000,
+    24576000,
+    23750000,
+};
+
 static const RAMInfo an505_raminfo[] = { {
         .name = "ssram-0",
         .base = 0x00000000,
@@ -XXX,XX +XXX,XX @@ static const RAMInfo an505_raminfo[] = { {
     },
 };
 
+static const RAMInfo an524_raminfo[] = { {
+        .name = "bram",
+        .base = 0x00000000,
+        .size = 512 * KiB,
+        .mpc = 0,
+        .mrindex = 0,
+    }, {
+        .name = "sram",
+        .base = 0x20000000,
+        .size = 32 * 4 * KiB,
+        .mpc = 1,
+        .mrindex = 1,
+    }, {
+        /* We don't model QSPI flash yet; for now expose it as simple ROM */
+        .name = "QSPI",
+        .base = 0x28000000,
+        .size = 8 * MiB,
+        .mpc = 1,
+        .mrindex = 2,
+        .flags = IS_ROM,
+    }, {
+        .name = "DDR",
+        .base = 0x60000000,
+        .size = 2 * GiB,
+        .mpc = 2,
+        .mrindex = -1,
+    }, {
+        .name = NULL,
+    },
+};
+
 static const RAMInfo *find_raminfo_for_mpc(MPS2TZMachineState *mms, int mpc)
 {
     MPS2TZMachineClass *mmc = MPS2TZ_MACHINE_GET_CLASS(mms);
@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
         },
     };
 
+    const PPCInfo an524_ppcs[] = { {
+            .name = "apb_ppcexp0",
+            .ports = {
+                { "bram-mpc", make_mpc, &mms->mpc[0], 0x58007000, 0x1000 },
+                { "qspi-mpc", make_mpc, &mms->mpc[1], 0x58008000, 0x1000 },
+                { "ddr-mpc", make_mpc, &mms->mpc[2], 0x58009000, 0x1000 },
+            },
+        }, {
+            .name = "apb_ppcexp1",
+            .ports = {
+                { "i2c0", make_i2c, &mms->i2c[0], 0x41200000, 0x1000 },
+                { "i2c1", make_i2c, &mms->i2c[1], 0x41201000, 0x1000 },
+                { "spi0", make_spi, &mms->spi[0], 0x41202000, 0x1000, { 52 } },
+                { "spi1", make_spi, &mms->spi[1], 0x41203000, 0x1000, { 53 } },
+                { "spi2", make_spi, &mms->spi[2], 0x41204000, 0x1000, { 54 } },
+                { "i2c2", make_i2c, &mms->i2c[2], 0x41205000, 0x1000 },
+                { "i2c3", make_i2c, &mms->i2c[3], 0x41206000, 0x1000 },
+                { /* port 7 reserved */ },
+                { "i2c4", make_i2c, &mms->i2c[4], 0x41208000, 0x1000 },
+            },
+        }, {
+            .name = "apb_ppcexp2",
+            .ports = {
+                { "scc", make_scc, &mms->scc, 0x41300000, 0x1000 },
+                { "i2s-audio", make_unimp_dev, &mms->i2s_audio,
+                  0x41301000, 0x1000 },
+                { "fpgaio", make_fpgaio, &mms->fpgaio, 0x41302000, 0x1000 },
+                { "uart0", make_uart, &mms->uart[0], 0x41303000, 0x1000, { 32, 33, 42 } },
+                { "uart1", make_uart, &mms->uart[1], 0x41304000, 0x1000, { 34, 35, 43 } },
+                { "uart2", make_uart, &mms->uart[2], 0x41305000, 0x1000, { 36, 37, 44 } },
+                { "uart3", make_uart, &mms->uart[3], 0x41306000, 0x1000, { 38, 39, 45 } },
+                { "uart4", make_uart, &mms->uart[4], 0x41307000, 0x1000, { 40, 41, 46 } },
+                { "uart5", make_uart, &mms->uart[5], 0x41308000, 0x1000, { 124, 125, 126 } },
+
+                { /* port 9 reserved */ },
+                { "clcd", make_unimp_dev, &mms->cldc, 0x4130a000, 0x1000 },
+                { "rtc", make_unimp_dev, &mms->rtc, 0x4130b000, 0x1000 },
+            },
+        }, {
+            .name = "ahb_ppcexp0",
+            .ports = {
+                { "gpio0", make_unimp_dev, &mms->gpio[0], 0x41100000, 0x1000 },
+                { "gpio1", make_unimp_dev, &mms->gpio[1], 0x41101000, 0x1000 },
+                { "gpio2", make_unimp_dev, &mms->gpio[2], 0x41102000, 0x1000 },
+                { "gpio3", make_unimp_dev, &mms->gpio[3], 0x41103000, 0x1000 },
+                { "eth", make_eth_dev, NULL, 0x41400000, 0x100000, { 48 } },
+            },
+        },
+    };
+
     switch (mmc->fpga_type) {
     case FPGA_AN505:
     case FPGA_AN521:
         ppcs = an505_ppcs;
         num_ppcs = ARRAY_SIZE(an505_ppcs);
         break;
+    case FPGA_AN524:
+        ppcs = an524_ppcs;
+        num_ppcs = ARRAY_SIZE(an524_ppcs);
+        break;
     default:
         g_assert_not_reached();
     }
@@ -XXX,XX +XXX,XX @@ static void mps2tz_an521_class_init(ObjectClass *oc, void *data)
     mps2tz_set_default_ram_info(mmc);
 }
 
+static void mps3tz_an524_class_init(ObjectClass *oc, void *data)
+{
+    MachineClass *mc = MACHINE_CLASS(oc);
+    MPS2TZMachineClass *mmc = MPS2TZ_MACHINE_CLASS(oc);
+
+    mc->desc = "ARM MPS3 with AN524 FPGA image for dual Cortex-M33";
+    mc->default_cpus = 2;
+    mc->min_cpus = mc->default_cpus;
+    mc->max_cpus = mc->default_cpus;
+    mmc->fpga_type = FPGA_AN524;
+    mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-m33");
+    mmc->scc_id = 0x41045240;
+    mmc->sysclk_frq = 32 * 1000 * 1000; /* 32MHz */
+    mmc->oscclk = an524_oscclk;
+    mmc->len_oscclk = ARRAY_SIZE(an524_oscclk);
+    mmc->fpgaio_num_leds = 10;
+    mmc->fpgaio_has_switches = true;
+    mmc->numirq = 95;
+    mmc->raminfo = an524_raminfo;
+    mmc->armsse_type = TYPE_SSE200;
+    mps2tz_set_default_ram_info(mmc);
+}
+
 static const TypeInfo mps2tz_info = {
     .name = TYPE_MPS2TZ_MACHINE,
     .parent = TYPE_MACHINE,
@@ -XXX,XX +XXX,XX @@ static const TypeInfo mps2tz_an521_info = {
     .class_init = mps2tz_an521_class_init,
 };
 
+static const TypeInfo mps3tz_an524_info = {
+    .name = TYPE_MPS3TZ_AN524_MACHINE,
+    .parent = TYPE_MPS2TZ_MACHINE,
+    .class_init = mps3tz_an524_class_init,
+};
+
 static void mps2tz_machine_init(void)
 {
     type_register_static(&mps2tz_info);
     type_register_static(&mps2tz_an505_info);
     type_register_static(&mps2tz_an521_info);
+    type_register_static(&mps3tz_an524_info);
 }
 
 type_init(mps2tz_machine_init);
-- 
2.20.1

The AN524 has a USB controller (an ISP1763); we don't have a model of
it but we should provide a stub "unimplemented-device" for it.  This
is slightly complicated because the USB controller shares a PPC port
with the ethernet controller.

Implement a make_* function which creates a container MemoryRegion
holding both the ethernet controller and an unimplemented-device stub
for the USB controller.
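How accesses to the shared 2MiB PPC port are routed can be shown with a
simplified standalone model (this is not QEMU's memory API). The
container layout, with the ethernet controller at offset 0 and a 1MiB
USB stub at offset 0x100000, is taken from make_eth_usb() in this patch;
the SubRegion type, container_lookup() helper and the 0x100-byte size of
the LAN9118 register window are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one subregion inside the container. */
typedef struct {
    const char *name;
    uint32_t offset;    /* start of the subregion within the container */
    uint32_t size;
} SubRegion;

static const SubRegion eth_usb_subregions[] = {
    { "lan9118", 0x000000, 0x100 },     /* ASSUMED register-window size */
    { "usb-otg", 0x100000, 0x100000 },  /* stub size set in the patch */
};

#define N_SUBREGIONS (sizeof(eth_usb_subregions) / sizeof(eth_usb_subregions[0]))

/*
 * Return the subregion that decodes 'offset' (an offset into the 2MiB
 * container) and store the offset within that device in *dev_offset.
 * Returns NULL for holes, which in QEMU would be unassigned accesses.
 */
static const SubRegion *container_lookup(const SubRegion *subs, size_t n,
                                         uint32_t offset, uint32_t *dev_offset)
{
    for (size_t i = 0; i < n; i++) {
        if (offset >= subs[i].offset &&
            offset - subs[i].offset < subs[i].size) {
            *dev_offset = offset - subs[i].offset;
            return &subs[i];
        }
    }
    return NULL;
}
```

Because each device is added at a fixed offset within the container, the
board only has to hand a single MemoryRegion to the PPC port, which is
why the port size in the AN524 PPC table grows from 0x100000 to 0x200000.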

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210215115138.20465-22-peter.maydell@linaro.org
---
 hw/arm/mps2-tz.c | 48 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 47 insertions(+), 1 deletion(-)

diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/mps2-tz.c
+++ b/hw/arm/mps2-tz.c
@@ -XXX,XX +XXX,XX @@ struct MPS2TZMachineState {
 
     ARMSSE iotkit;
     MemoryRegion ram[MPS2TZ_RAM_MAX];
+    MemoryRegion eth_usb_container;
+
     MPS2SCC scc;
     MPS2FPGAIO fpgaio;
     TZPPC ppc[5];
@@ -XXX,XX +XXX,XX @@ struct MPS2TZMachineState {
     UnimplementedDeviceState gfx;
     UnimplementedDeviceState cldc;
     UnimplementedDeviceState rtc;
+    UnimplementedDeviceState usb;
     PL080State dma[4];
     TZMSC msc[4];
     CMSDKAPBUART uart[6];
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_eth_dev(MPS2TZMachineState *mms, void *opaque,
     return sysbus_mmio_get_region(s, 0);
 }
 
+static MemoryRegion *make_eth_usb(MPS2TZMachineState *mms, void *opaque,
+                                  const char *name, hwaddr size,
+                                  const int *irqs)
+{
+    /*
+     * The AN524 makes the ethernet and USB share a PPC port.
+     * irqs[] is the ethernet IRQ.
+     */
+    SysBusDevice *s;
+    NICInfo *nd = &nd_table[0];
+
+    memory_region_init(&mms->eth_usb_container, OBJECT(mms),
+                       "mps2-tz-eth-usb-container", 0x200000);
+
+    /*
+     * In hardware this is a LAN9220; the LAN9118 is software compatible
+     * except that it doesn't support the checksum-offload feature.
+     */
+    qemu_check_nic_model(nd, "lan9118");
+    mms->lan9118 = qdev_new(TYPE_LAN9118);
+    qdev_set_nic_properties(mms->lan9118, nd);
+
+    s = SYS_BUS_DEVICE(mms->lan9118);
+    sysbus_realize_and_unref(s, &error_fatal);
+    sysbus_connect_irq(s, 0, get_sse_irq_in(mms, irqs[0]));
+
+    memory_region_add_subregion(&mms->eth_usb_container,
+                                0, sysbus_mmio_get_region(s, 0));
+
+    /* The USB OTG controller is an ISP1763; we don't have a model of it. */
+    object_initialize_child(OBJECT(mms), "usb-otg",
+                            &mms->usb, TYPE_UNIMPLEMENTED_DEVICE);
+    qdev_prop_set_string(DEVICE(&mms->usb), "name", "usb-otg");
+    qdev_prop_set_uint64(DEVICE(&mms->usb), "size", 0x100000);
+    s = SYS_BUS_DEVICE(&mms->usb);
+    sysbus_realize(s, &error_fatal);
+
+    memory_region_add_subregion(&mms->eth_usb_container,
+                                0x100000, sysbus_mmio_get_region(s, 0));
+
+    return &mms->eth_usb_container;
+}
+
 static MemoryRegion *make_mpc(MPS2TZMachineState *mms, void *opaque,
                               const char *name, hwaddr size,
                               const int *irqs)
@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
                 { "gpio1", make_unimp_dev, &mms->gpio[1], 0x41101000, 0x1000 },
                 { "gpio2", make_unimp_dev, &mms->gpio[2], 0x41102000, 0x1000 },
                 { "gpio3", make_unimp_dev, &mms->gpio[3], 0x41103000, 0x1000 },
-                { "eth", make_eth_dev, NULL, 0x41400000, 0x100000, { 48 } },
+                { "eth-usb", make_eth_usb, NULL, 0x41400000, 0x200000, { 48 } },
             },
         },
     };
-- 
2.20.1

The AN524 has a PL031 RTC, which we have a model of; provide it
rather than an unimplemented-device stub.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210215115138.20465-23-peter.maydell@linaro.org
---
 hw/arm/mps2-tz.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/mps2-tz.c
+++ b/hw/arm/mps2-tz.c
@@ -XXX,XX +XXX,XX @@
 #include "hw/misc/tz-msc.h"
 #include "hw/arm/armsse.h"
 #include "hw/dma/pl080.h"
+#include "hw/rtc/pl031.h"
 #include "hw/ssi/pl022.h"
 #include "hw/i2c/arm_sbcon_i2c.h"
 #include "hw/net/lan9118.h"
@@ -XXX,XX +XXX,XX @@ struct MPS2TZMachineState {
     UnimplementedDeviceState gpio[4];
     UnimplementedDeviceState gfx;
     UnimplementedDeviceState cldc;
-    UnimplementedDeviceState rtc;
     UnimplementedDeviceState usb;
+    PL031State rtc;
     PL080State dma[4];
     TZMSC msc[4];
     CMSDKAPBUART uart[6];
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_i2c(MPS2TZMachineState *mms, void *opaque,
     return sysbus_mmio_get_region(s, 0);
 }
 
+static MemoryRegion *make_rtc(MPS2TZMachineState *mms, void *opaque,
+                              const char *name, hwaddr size,
+                              const int *irqs)
+{
+    PL031State *pl031 = opaque;
+    SysBusDevice *s;
+
+    object_initialize_child(OBJECT(mms), name, pl031, TYPE_PL031);
+    s = SYS_BUS_DEVICE(pl031);
+    sysbus_realize(s, &error_fatal);
+    /*
+     * The board docs don't give an IRQ number for the PL031, so
+     * presumably it is not connected.
+     */
+    return sysbus_mmio_get_region(s, 0);
+}
+
 static void create_non_mpc_ram(MPS2TZMachineState *mms)
 {
     /*
@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
 
                 { /* port 9 reserved */ },
                 { "clcd", make_unimp_dev, &mms->cldc, 0x4130a000, 0x1000 },
-                { "rtc", make_unimp_dev, &mms->rtc, 0x4130b000, 0x1000 },
+                { "rtc", make_rtc, &mms->rtc, 0x4130b000, 0x1000 },
             },
         }, {
             .name = "ahb_ppcexp0",
-- 
2.20.1

Add brief documentation of the new mps3-an524 board.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210215115138.20465-24-peter.maydell@linaro.org
---
 docs/system/arm/mps2.rst | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/docs/system/arm/mps2.rst b/docs/system/arm/mps2.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/arm/mps2.rst
+++ b/docs/system/arm/mps2.rst
@@ -XXX,XX +XXX,XX @@
-Arm MPS2 boards (``mps2-an385``, ``mps2-an386``, ``mps2-an500``, ``mps2-an505``, ``mps2-an511``, ``mps2-an521``)
-================================================================================================================
+Arm MPS2 and MPS3 boards (``mps2-an385``, ``mps2-an386``, ``mps2-an500``, ``mps2-an505``, ``mps2-an511``, ``mps2-an521``, ``mps3-an524``)
+=========================================================================================================================================
 
 These board models all use Arm M-profile CPUs.
 
-The Arm MPS2 and MPS2+ dev boards are FPGA based (the 2+ has a bigger
-FPGA but is otherwise the same as the 2). Since the CPU itself
-and most of the devices are in the FPGA, the details of the board
-as seen by the guest depend significantly on the FPGA image.
+The Arm MPS2, MPS2+ and MPS3 dev boards are FPGA based (the 2+ has a
+bigger FPGA but is otherwise the same as the 2; the 3 has a bigger
+FPGA again, can handle 4GB of RAM and has a USB controller and QSPI flash).
+
+Since the CPU itself and most of the devices are in the FPGA, the
+details of the board as seen by the guest depend significantly on the
+FPGA image.
 
 QEMU models the following FPGA images:
 
@@ -XXX,XX +XXX,XX @@ QEMU models the following FPGA images:
   Cortex-M3 'DesignStart' as documented in Arm Application Note AN511
 ``mps2-an521``
   Dual Cortex-M33 as documented in Arm Application Note AN521
+``mps3-an524``
+  Dual Cortex-M33 on an MPS3, as documented in Arm Application Note AN524
 
 Differences between QEMU and real hardware:
 
 - AN385/AN386 remapping of low 16K of memory to either ZBT SSRAM1 or to
   block RAM is unimplemented (QEMU always maps this to ZBT SSRAM1, as
   if zbt_boot_ctrl is always zero)
+- AN524 remapping of low memory to either BRAM or to QSPI flash is
+  unimplemented (QEMU always maps this to BRAM, ignoring the
+  SCC CFG_REG0 memory-remap bit)
 - QEMU provides a LAN9118 ethernet rather than LAN9220; the only guest
   visible difference is that the LAN9118 doesn't support checksum
   offloading
+- QEMU does not model the QSPI flash in MPS3 boards as real QSPI
+  flash, but only as simple ROM, so attempting to rewrite the flash
+  from the guest will fail
+- QEMU does not model the USB controller in MPS3 boards
-- 
2.20.1

Update old infocenter.arm.com URLs to the equivalent developer.arm.com
ones (the old URLs should redirect, but we might as well avoid the
redirection notice, and the new URLs are pleasantly shorter).

This commit covers the links to the MPS2 board TRM, the various
Application Notes, the IoTKit and SSE-200 documents.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210215115138.20465-25-peter.maydell@linaro.org
---
 include/hw/arm/armsse.h          |  4 ++--
 include/hw/misc/armsse-cpuid.h   |  2 +-
 include/hw/misc/armsse-mhu.h     |  2 +-
 include/hw/misc/iotkit-secctl.h  |  2 +-
 include/hw/misc/iotkit-sysctl.h  |  2 +-
 include/hw/misc/iotkit-sysinfo.h |  2 +-
 include/hw/misc/mps2-fpgaio.h    |  2 +-
 hw/arm/mps2-tz.c                 | 11 +++++------
 hw/misc/armsse-cpuid.c           |  2 +-
 hw/misc/armsse-mhu.c             |  2 +-
 hw/misc/iotkit-sysctl.c          |  2 +-
 hw/misc/iotkit-sysinfo.c         |  2 +-
 hw/misc/mps2-fpgaio.c            |  2 +-
 hw/misc/mps2-scc.c               |  2 +-
 14 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/include/hw/arm/armsse.h b/include/hw/arm/armsse.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/armsse.h
+++ b/include/hw/arm/armsse.h
@@ -XXX,XX +XXX,XX @@
  * hardware, which include the IoT Kit and the SSE-050, SSE-100 and
  * SSE-200. Currently we model:
  *  - the Arm IoT Kit which is documented in
- * http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ecm0601256/index.html
+ *    https://developer.arm.com/documentation/ecm0601256/latest
  *  - the SSE-200 which is documented in
- * http://infocenter.arm.com/help/topic/com.arm.doc.101104_0100_00_en/corelink_sse200_subsystem_for_embedded_technical_reference_manual_101104_0100_00_en.pdf
+ *    https://developer.arm.com/documentation/101104/latest/
  *
  * The IoTKit contains:
  *  a Cortex-M33
diff --git a/include/hw/misc/armsse-cpuid.h b/include/hw/misc/armsse-cpuid.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/misc/armsse-cpuid.h
+++ b/include/hw/misc/armsse-cpuid.h
@@ -XXX,XX +XXX,XX @@
 /*
  * This is a model of the "CPU_IDENTITY" register block which is part of the
  * Arm SSE-200 and documented in
- * http://infocenter.arm.com/help/topic/com.arm.doc.101104_0100_00_en/corelink_sse200_subsystem_for_embedded_technical_reference_manual_101104_0100_00_en.pdf
+ * https://developer.arm.com/documentation/101104/latest/
  *
  * QEMU interface:
  *  + QOM property "CPUID": the value to use for the CPUID register
diff --git a/include/hw/misc/armsse-mhu.h b/include/hw/misc/armsse-mhu.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/misc/armsse-mhu.h
+++ b/include/hw/misc/armsse-mhu.h
@@ -XXX,XX +XXX,XX @@
 /*
  * This is a model of the Message Handling Unit (MHU) which is part of the
  * Arm SSE-200 and documented in
- * http://infocenter.arm.com/help/topic/com.arm.doc.101104_0100_00_en/corelink_sse200_subsystem_for_embedded_technical_reference_manual_101104_0100_00_en.pdf
+ * https://developer.arm.com/documentation/101104/latest/
  *
  * QEMU interface:
  *  + sysbus MMIO region 0: the system information register bank
diff --git a/include/hw/misc/iotkit-secctl.h b/include/hw/misc/iotkit-secctl.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/misc/iotkit-secctl.h
+++ b/include/hw/misc/iotkit-secctl.h
@@ -XXX,XX +XXX,XX @@
 
 /* This is a model of the security controller which is part of the
  * Arm IoT Kit and documented in
- * http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ecm0601256/index.html
+ * https://developer.arm.com/documentation/ecm0601256/latest
  *
  * QEMU interface:
  *  + sysbus MMIO region 0 is the "secure privilege control block" registers
diff --git a/include/hw/misc/iotkit-sysctl.h b/include/hw/misc/iotkit-sysctl.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/misc/iotkit-sysctl.h
+++ b/include/hw/misc/iotkit-sysctl.h
@@ -XXX,XX +XXX,XX @@
 /*
  * This is a model of the "system control element" which is part of the
  * Arm IoTKit and documented in
- * http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ecm0601256/index.html
+ * https://developer.arm.com/documentation/ecm0601256/latest
  * Specifically, it implements the "system information block" and
  * "system control register" blocks.
  *
diff --git a/include/hw/misc/iotkit-sysinfo.h b/include/hw/misc/iotkit-sysinfo.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/misc/iotkit-sysinfo.h
+++ b/include/hw/misc/iotkit-sysinfo.h
@@ -XXX,XX +XXX,XX @@
 /*
  * This is a model of the "system information block" which is part of the
  * Arm IoTKit and documented in
- * http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ecm0601256/index.html
+ * https://developer.arm.com/documentation/ecm0601256/latest
  * QEMU interface:
  *  + QOM property "SYS_VERSION": value to use for SYS_VERSION register
  *  + QOM property "SYS_CONFIG": value to use for SYS_CONFIG register
diff --git a/include/hw/misc/mps2-fpgaio.h b/include/hw/misc/mps2-fpgaio.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/misc/mps2-fpgaio.h
+++ b/include/hw/misc/mps2-fpgaio.h
@@ -XXX,XX +XXX,XX @@
 /* This is a model of the FPGAIO register block in the AN505
  * FPGA image for the MPS2 dev board; it is documented in the
  * application note:
- * http://infocenter.arm.com/help/topic/com.arm.doc.dai0505b/index.html
+ * https://developer.arm.com/documentation/dai0505/latest/
  *
  * QEMU interface:
  *  + sysbus MMIO region 0: the register bank
diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/mps2-tz.c
+++ b/hw/arm/mps2-tz.c
@@ -XXX,XX +XXX,XX @@
  * https://developer.arm.com/products/system-design/development-boards/fpga-prototyping-boards/mps2
  *
  * Board TRM:
- * http://infocenter.arm.com/help/topic/com.arm.doc.100112_0200_06_en/versatile_express_cortex_m_prototyping_systems_v2m_mps2_and_v2m_mps2plus_technical_reference_100112_0200_06_en.pdf
+ * https://developer.arm.com/documentation/100112/latest/
  * Application Note AN505:
- * http://infocenter.arm.com/help/topic/com.arm.doc.dai0505b/index.html
+ * https://developer.arm.com/documentation/dai0505/latest/
  * Application Note AN521:
- * http://infocenter.arm.com/help/topic/com.arm.doc.dai0521c/index.html
+ * https://developer.arm.com/documentation/dai0521/latest/
  * Application Note AN524:
  * https://developer.arm.com/documentation/dai0524/latest/
  *
  * The AN505 defers to the Cortex-M33 processor ARMv8M IoT Kit FVP User Guide
  * (ARM ECM0601256) for the details of some of the device layout:
- *   http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ecm0601256/index.html
+ *  https://developer.arm.com/documentation/ecm0601256/latest
  * Similarly, the AN521 and AN524 use the SSE-200, and the SSE-200 TRM defines
  * most of the device layout:
- *  http://infocenter.arm.com/help/topic/com.arm.doc.101104_0100_00_en/corelink_sse200_subsystem_for_embedded_technical_reference_manual_101104_0100_00_en.pdf
- *
+ *  https://developer.arm.com/documentation/101104/latest/
  */
 
 #include "qemu/osdep.h"
diff --git a/hw/misc/armsse-cpuid.c b/hw/misc/armsse-cpuid.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/armsse-cpuid.c
+++ b/hw/misc/armsse-cpuid.c
@@ -XXX,XX +XXX,XX @@
 /*
  * This is a model of the "CPU_IDENTITY" register block which is part of the
  * Arm SSE-200 and documented in
- * http://infocenter.arm.com/help/topic/com.arm.doc.101104_0100_00_en/corelink_sse200_subsystem_for_embedded_technical_reference_manual_101104_0100_00_en.pdf
+ * https://developer.arm.com/documentation/101104/latest/
  *
  * It consists of one read-only CPUID register (set by QOM property), plus the
  * usual ID registers.
diff --git a/hw/misc/armsse-mhu.c b/hw/misc/armsse-mhu.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/armsse-mhu.c
+++ b/hw/misc/armsse-mhu.c
@@ -XXX,XX +XXX,XX @@
 /*
  * This is a model of the Message Handling Unit (MHU) which is part of the
  * Arm SSE-200 and documented in
- * http://infocenter.arm.com/help/topic/com.arm.doc.101104_0100_00_en/corelink_sse200_subsystem_for_embedded_technical_reference_manual_101104_0100_00_en.pdf
+ * https://developer.arm.com/documentation/101104/latest/
  */
 
 #include "qemu/osdep.h"
diff --git a/hw/misc/iotkit-sysctl.c b/hw/misc/iotkit-sysctl.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/iotkit-sysctl.c
+++ b/hw/misc/iotkit-sysctl.c
@@ -XXX,XX +XXX,XX @@
 /*
  * This is a model of the "system control element" which is part of the
  * Arm IoTKit and documented in
- * http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ecm0601256/index.html
+ * https://developer.arm.com/documentation/ecm0601256/latest
  * Specifically, it implements the "system control register" blocks.
  */
 
diff --git a/hw/misc/iotkit-sysinfo.c b/hw/misc/iotkit-sysinfo.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/iotkit-sysinfo.c
+++ b/hw/misc/iotkit-sysinfo.c
@@ -XXX,XX +XXX,XX @@
 /*
  * This is a model of the "system information block" which is part of the
  * Arm IoTKit and documented in
- * http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ecm0601256/index.html
+ * https://developer.arm.com/documentation/ecm0601256/latest
  * It consists of 2 read-only version/config registers, plus the
  * usual ID registers.
  */
diff --git a/hw/misc/mps2-fpgaio.c b/hw/misc/mps2-fpgaio.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/mps2-fpgaio.c
+++ b/hw/misc/mps2-fpgaio.c
@@ -XXX,XX +XXX,XX @@
 /* This is a model of the "FPGA system control and I/O" block found
  * in the AN505 FPGA image for the MPS2 devboard.
  * It is documented in AN505:
- * http://infocenter.arm.com/help/topic/com.arm.doc.dai0505b/index.html
+ * https://developer.arm.com/documentation/dai0505/latest/
  */
 
 #include "qemu/osdep.h"
diff --git a/hw/misc/mps2-scc.c b/hw/misc/mps2-scc.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/mps2-scc.c
+++ b/hw/misc/mps2-scc.c
@@ -XXX,XX +XXX,XX @@
  * found in the FPGA images of MPS2 development boards.
  *
  * Documentation of it can be found in the MPS2 TRM:
- * http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.100112_0100_03_en/index.html
+ * https://developer.arm.com/documentation/100112/latest/
  * and also in the Application Notes documenting individual FPGA images.
  */
 
-- 
2.20.1

The following changes since commit 53f306f316549d20c76886903181413d20842423:

  Merge remote-tracking branch 'remotes/ehabkost-gl/tags/x86-next-pull-request' into staging (2021-06-21 11:26:04 +0100)

are available in the Git repository at:

  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20210621

for you to fetch changes up to a83f1d9263d281f938a3984cda7104d55affd43a:

  docs/system: arm: Add nRF boards description (2021-06-21 17:24:33 +0100)

----------------------------------------------------------------
target-arm queue:
 * Don't require 'virt' board to be compiled in for ACPI GHES code
 * docs: Document which architecture extensions we emulate
 * Fix bugs in M-profile FPCXT_NS accesses
 * First slice of MVE patches
 * Implement MTE3
 * docs/system: arm: Add nRF boards description

----------------------------------------------------------------
Alexandre Iooss (1):
      docs/system: arm: Add nRF boards description

Peter Collingbourne (1):
      target/arm: Implement MTE3

Peter Maydell (55):
      hw/acpi: Provide stub version of acpi_ghes_record_errors()
      hw/acpi: Provide function acpi_ghes_present()
      target/arm: Use acpi_ghes_present() to see if we report ACPI memory errors
      docs/system/arm: Document which architecture extensions we emulate
      target/arm/translate-vfp.c: Whitespace fixes
      target/arm: Handle FPU being disabled in FPCXT_NS accesses
      target/arm: Don't NOCP fault for FPCXT_NS accesses
      target/arm: Handle writeback in VLDR/VSTR sysreg with no memory access
      target/arm: Factor FP context update code out into helper function
      target/arm: Split vfp_access_check() into A and M versions
      target/arm: Handle FPU check for FPCXT_NS insns via vfp_access_check_m()
      target/arm: Implement MVE VLDR/VSTR (non-widening forms)
      target/arm: Implement widening/narrowing MVE VLDR/VSTR insns
      target/arm: Implement MVE VCLZ
      target/arm: Implement MVE VCLS
      target/arm: Implement MVE VREV16, VREV32, VREV64
      target/arm: Implement MVE VMVN (register)
      target/arm: Implement MVE VABS
      target/arm: Implement MVE VNEG
      tcg: Make gen_dup_i32/i64() public as tcg_gen_dup_i32/i64
      target/arm: Implement MVE VDUP
      target/arm: Implement MVE VAND, VBIC, VORR, VORN, VEOR
      target/arm: Implement MVE VADD, VSUB, VMUL
      target/arm: Implement MVE VMULH
      target/arm: Implement MVE VRMULH
      target/arm: Implement MVE VMAX, VMIN
      target/arm: Implement MVE VABD
      target/arm: Implement MVE VHADD, VHSUB
      target/arm: Implement MVE VMULL
      target/arm: Implement MVE VMLALDAV
      target/arm: Implement MVE VMLSLDAV
      target/arm: Implement MVE VRMLALDAVH, VRMLSLDAVH
      target/arm: Implement MVE VADD (scalar)
      target/arm: Implement MVE VSUB, VMUL (scalar)
      target/arm: Implement MVE VHADD, VHSUB (scalar)
      target/arm: Implement MVE VBRSR
      target/arm: Implement MVE VPST
      target/arm: Implement MVE VQADD and VQSUB
      target/arm: Implement MVE VQDMULH and VQRDMULH (scalar)
      target/arm: Implement MVE VQDMULL scalar
      target/arm: Implement MVE VQDMULH, VQRDMULH (vector)
      target/arm: Implement MVE VQADD, VQSUB (vector)
      target/arm: Implement MVE VQSHL (vector)
      target/arm: Implement MVE VQRSHL
      target/arm: Implement MVE VSHL insn
      target/arm: Implement MVE VRSHL
      target/arm: Implement MVE VQDMLADH and VQRDMLADH
      target/arm: Implement MVE VQDMLSDH and VQRDMLSDH
      target/arm: Implement MVE VQDMULL (vector)
      target/arm: Implement MVE VRHADD
      target/arm: Implement MVE VADC, VSBC
      target/arm: Implement MVE VCADD
      target/arm: Implement MVE VHCADD
      target/arm: Implement MVE VADDV
      target/arm: Make VMOV scalar <-> gpreg beatwise for MVE

 docs/system/arm/emulation.rst |  103 ++++
 docs/system/arm/nrf.rst       |   51 ++
 docs/system/target-arm.rst    |    7 +
 include/hw/acpi/ghes.h        |    9 +
 include/tcg/tcg-op.h          |    8 +
 include/tcg/tcg.h             |    1 -
 target/arm/helper-mve.h       |  357 +++++++++++++
 target/arm/helper.h           |    2 +
 target/arm/internals.h        |   11 +
 target/arm/translate-a32.h    |    3 +
 target/arm/translate.h        |   10 +
 target/arm/m-nocp.decode      |   24 +
 target/arm/mve.decode         |  240 +++++++++
 target/arm/vfp.decode         |   14 -
 hw/acpi/ghes-stub.c           |   22 +
 hw/acpi/ghes.c                |   17 +
 target/arm/cpu64.c            |    2 +-
 target/arm/kvm64.c            |    6 +-
 target/arm/mte_helper.c       |   82 +--
 target/arm/mve_helper.c       | 1160 +++++++++++++++++++++++++++++++++++++++++
 target/arm/translate-m-nocp.c |  550 +++++++++++++++++++
 target/arm/translate-mve.c    |  759 +++++++++++++++++++++++++++
 target/arm/translate-vfp.c    |  741 +++++++-------------------
 tcg/tcg-op-gvec.c             |   20 +-
 MAINTAINERS                   |    1 +
 hw/acpi/meson.build           |    6 +-
 target/arm/meson.build        |    1 +
 27 files changed, 3578 insertions(+), 629 deletions(-)
 create mode 100644 docs/system/arm/emulation.rst
 create mode 100644 docs/system/arm/nrf.rst
 create mode 100644 target/arm/helper-mve.h
 create mode 100644 hw/acpi/ghes-stub.c
 create mode 100644 target/arm/mve_helper.c

Generic code in target/arm wants to call acpi_ghes_record_errors();
provide a stub version so that we don't fail to link when
CONFIG_ACPI_APEI is not set. This requires us to add a new
ghes-stub.c file to contain it and the meson.build mechanics
to use it when appropriate.
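
This is an illustration of the stub pattern only. The build links exactly one of two definitions of acpi_ghes_record_errors(), so generic callers always resolve the symbol and only have to tolerate a failure return. The sketch below folds that into one file and uses a hypothetical HAVE_APEI macro, whereas the meson rules in this patch choose between ghes.c and ghes-stub.c. report_memory_error() is likewise a made-up caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the CONFIG_ACPI_APEI build switch; in QEMU
 * meson selects ghes.c or ghes-stub.c instead of using an #if.
 */
#ifndef HAVE_APEI
#define HAVE_APEI 0
#endif

#if HAVE_APEI
/* Placeholder for the real implementation in ghes.c. */
int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
{
    (void)source_id;
    (void)physical_address;
    return 0;
}
#else
/* Stub, as in ghes-stub.c: the symbol exists but always fails. */
int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
{
    (void)source_id;
    (void)physical_address;
    return -1;
}
#endif

/*
 * Hypothetical generic caller: it links in either configuration and
 * only needs to cope with the -1 failure return.
 */
bool report_memory_error(uint64_t paddr)
{
    int ret = acpi_ghes_record_errors(0, paddr);
    assert(ret == 0 || ret == -1);
    return ret == 0;
}
```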

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Dongjiu Geng <gengdongjiu1@gmail.com>
Message-id: 20210603171259.27962-2-peter.maydell@linaro.org
---
 hw/acpi/ghes-stub.c | 17 +++++++++++++++++
 hw/acpi/meson.build |  6 +++---
 2 files changed, 20 insertions(+), 3 deletions(-)
 create mode 100644 hw/acpi/ghes-stub.c

diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/hw/acpi/ghes-stub.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * Support for generating APEI tables and recording CPER for Guests:
+ * stub functions.
+ *
+ * Copyright (c) 2021 Linaro, Ltd
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/acpi/ghes.h"
+
+int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
+{
+    return -1;
+}
diff --git a/hw/acpi/meson.build b/hw/acpi/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/hw/acpi/meson.build
+++ b/hw/acpi/meson.build
@@ -XXX,XX +XXX,XX @@ acpi_ss.add(when: 'CONFIG_ACPI_PCI', if_true: files('pci.c'))
 acpi_ss.add(when: 'CONFIG_ACPI_VMGENID', if_true: files('vmgenid.c'))
 acpi_ss.add(when: 'CONFIG_ACPI_HW_REDUCED', if_true: files('generic_event_device.c'))
 acpi_ss.add(when: 'CONFIG_ACPI_HMAT', if_true: files('hmat.c'))
-acpi_ss.add(when: 'CONFIG_ACPI_APEI', if_true: files('ghes.c'))
+acpi_ss.add(when: 'CONFIG_ACPI_APEI', if_true: files('ghes.c'), if_false: files('ghes-stub.c'))
 acpi_ss.add(when: 'CONFIG_ACPI_X86', if_true: files('core.c', 'piix4.c', 'pcihp.c'), if_false: files('acpi-stub.c'))
 acpi_ss.add(when: 'CONFIG_ACPI_X86_ICH', if_true: files('ich9.c', 'tco.c'))
 acpi_ss.add(when: 'CONFIG_IPMI', if_true: files('ipmi.c'), if_false: files('ipmi-stub.c'))
 acpi_ss.add(when: 'CONFIG_PC', if_false: files('acpi-x86-stub.c'))
 acpi_ss.add(when: 'CONFIG_TPM', if_true: files('tpm.c'))
-softmmu_ss.add(when: 'CONFIG_ACPI', if_false: files('acpi-stub.c', 'aml-build-stub.c'))
+softmmu_ss.add(when: 'CONFIG_ACPI', if_false: files('acpi-stub.c', 'aml-build-stub.c', 'ghes-stub.c'))
 softmmu_ss.add_all(when: 'CONFIG_ACPI', if_true: acpi_ss)
 softmmu_ss.add(when: 'CONFIG_ALL', if_true: files('acpi-stub.c', 'aml-build-stub.c',
-                                                  'acpi-x86-stub.c', 'ipmi-stub.c'))
+                                                  'acpi-x86-stub.c', 'ipmi-stub.c', 'ghes-stub.c'))
-- 
2.20.1

Allow code elsewhere in the system to check whether the ACPI GHES
table is present, so it can determine whether it is OK to try to
record an error by calling acpi_ghes_record_errors().

(We don't need to migrate the new 'present' field in AcpiGhesState,
because it is set once at system initialization and doesn't change.)
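
Below is a simplified model of the new query. It assumes a single global pointer in place of the object_resolve_path_type() lookup of the TYPE_ACPI_GED device. sketch_ghes_add_fw_cfg() and board_set_ged() are hypothetical names; the former stands in only for the final step of acpi_ghes_add_fw_cfg(), which sets 'present'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct AcpiGhesState {
    uint64_t ghes_addr_le;
    bool present; /* set once during board init, never cleared */
} AcpiGhesState;

typedef struct AcpiGedState {
    AcpiGhesState ghes_state;
} AcpiGedState;

/*
 * Hypothetical stand-in for resolving the (single) GED device;
 * NULL models a board that has no GED at all.
 */
static AcpiGedState *ged_device;

/* Hypothetical hook for the sketch: the board registers its GED. */
void board_set_ged(AcpiGedState *ged)
{
    ged_device = ged;
}

/* Models the tail of acpi_ghes_add_fw_cfg(): mark GHES as present. */
void sketch_ghes_add_fw_cfg(AcpiGhesState *ags)
{
    assert(ags != NULL);
    ags->present = true;
}

/* Same logic as acpi_ghes_present(): no GED means no GHES. */
bool acpi_ghes_present(void)
{
    if (!ged_device) {
        return false;
    }
    return ged_device->ghes_state.present;
}
```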

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Dongjiu Geng <gengdongjiu1@gmail.com>
Message-id: 20210603171259.27962-3-peter.maydell@linaro.org
---
 include/hw/acpi/ghes.h |  9 +++++++++
 hw/acpi/ghes-stub.c    |  5 +++++
 hw/acpi/ghes.c         | 17 +++++++++++++++++
 3 files changed, 31 insertions(+)

diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -XXX,XX +XXX,XX @@ enum {
 
 typedef struct AcpiGhesState {
     uint64_t ghes_addr_le;
+    bool present; /* True if GHES is present at all on this board */
 } AcpiGhesState;
 
 void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker);
@@ -XXX,XX +XXX,XX @@ void acpi_build_hest(GArray *table_data, BIOSLinker *linker,
 void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
                           GArray *hardware_errors);
 int acpi_ghes_record_errors(uint8_t notify, uint64_t error_physical_addr);
+
+/**
+ * acpi_ghes_present: Report whether ACPI GHES table is present
+ *
+ * Returns: true if the system has an ACPI GHES table and it is
+ * safe to call acpi_ghes_record_errors() to record a memory error.
+ */
+bool acpi_ghes_present(void);
 #endif
diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/acpi/ghes-stub.c
+++ b/hw/acpi/ghes-stub.c
@@ -XXX,XX +XXX,XX @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
 {
     return -1;
 }
+
+bool acpi_ghes_present(void)
+{
+    return false;
+}
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -XXX,XX +XXX,XX @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
     /* Create a read-write fw_cfg file for Address */
     fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
         NULL, &(ags->ghes_addr_le), sizeof(ags->ghes_addr_le), false);
+
+    ags->present = true;
 }
 
 int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
@@ -XXX,XX +XXX,XX @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
 
     return ret;
 }
+
+bool acpi_ghes_present(void)
+{
+    AcpiGedState *acpi_ged_state;
+    AcpiGhesState *ags;
+
+    acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
+                                                       NULL));
+
+    if (!acpi_ged_state) {
+        return false;
+    }
+    ags = &acpi_ged_state->ghes_state;
+    return ags->present;
+}
-- 
2.20.1

The virt_is_acpi_enabled() function is specific to the virt board, as
is the check for its 'ras' property.  Use the new acpi_ghes_present()
function to check whether we should report memory errors via
acpi_ghes_record_errors().

This avoids a link error if QEMU was built without support for the
virt board, and provides a mechanism that can be used by any future
board models that want to add ACPI memory error reporting support
(they only need to call acpi_ghes_add_fw_cfg()).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Dongjiu Geng <gengdongjiu1@gmail.com>
Message-id: 20210603171259.27962-4-peter.maydell@linaro.org
---
 target/arm/kvm64.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -XXX,XX +XXX,XX @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
 {
     ram_addr_t ram_addr;
     hwaddr paddr;
-    Object *obj = qdev_get_machine();
-    VirtMachineState *vms = VIRT_MACHINE(obj);
-    bool acpi_enabled = virt_is_acpi_enabled(vms);
 
     assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
 
-    if (acpi_enabled && addr &&
-            object_property_get_bool(obj, "ras", NULL)) {
+    if (acpi_ghes_present() && addr) {
         ram_addr = qemu_ram_addr_from_host(addr);
         if (ram_addr != RAM_ADDR_INVALID &&
             kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
-- 
2.20.1

These days the Arm architecture has a wide range of fine-grained
optional extra architectural features. We implement quite a lot
of these but by no means all of them. Document what we do implement,
so that users can find out without having to dig through back-issues
of our Changelog on the wiki.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20210617140328.28622-1-peter.maydell@linaro.org
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 docs/system/arm/emulation.rst | 102 ++++++++++++++++++++++++++++++++++
 docs/system/target-arm.rst    |   6 ++
 2 files changed, 108 insertions(+)
 create mode 100644 docs/system/arm/emulation.rst

diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/docs/system/arm/emulation.rst
@@ -XXX,XX +XXX,XX @@
+A-profile CPU architecture support
+==================================
+
+QEMU's TCG emulation includes support for the Armv5, Armv6, Armv7 and
+Armv8 versions of the A-profile architecture. It also has support for
+the following architecture extensions:
+
+- FEAT_AA32BF16 (AArch32 BFloat16 instructions)
+- FEAT_AA32HPD (AArch32 hierarchical permission disables)
+- FEAT_AA32I8MM (AArch32 Int8 matrix multiplication instructions)
+- FEAT_AES (AESD and AESE instructions)
+- FEAT_BF16 (AArch64 BFloat16 instructions)
+- FEAT_BTI (Branch Target Identification)
+- FEAT_DIT (Data Independent Timing instructions)
+- FEAT_DPB (DC CVAP instruction)
+- FEAT_DotProd (Advanced SIMD dot product instructions)
+- FEAT_FCMA (Floating-point complex number instructions)
+- FEAT_FHM (Floating-point half-precision multiplication instructions)
+- FEAT_FP16 (Half-precision floating-point data processing)
+- FEAT_FRINTTS (Floating-point to integer instructions)
+- FEAT_FlagM (Flag manipulation instructions v2)
+- FEAT_FlagM2 (Enhancements to flag manipulation instructions)
+- FEAT_HPDS (Hierarchical permission disables)
+- FEAT_I8MM (AArch64 Int8 matrix multiplication instructions)
+- FEAT_JSCVT (JavaScript conversion instructions)
+- FEAT_LOR (Limited ordering regions)
+- FEAT_LRCPC (Load-acquire RCpc instructions)
+- FEAT_LRCPC2 (Load-acquire RCpc instructions v2)
+- FEAT_LSE (Large System Extensions)
+- FEAT_MTE (Memory Tagging Extension)
+- FEAT_MTE2 (Memory Tagging Extension)
+- FEAT_PAN (Privileged access never)
+- FEAT_PAN2 (AT S1E1R and AT S1E1W instruction variants affected by PSTATE.PAN)
+- FEAT_PAuth (Pointer authentication)
+- FEAT_PMULL (PMULL, PMULL2 instructions)
+- FEAT_PMUv3p1 (PMU Extensions v3.1)
+- FEAT_PMUv3p4 (PMU Extensions v3.4)
+- FEAT_RDM (Advanced SIMD rounding double multiply accumulate instructions)
+- FEAT_RNG (Random number generator)
+- FEAT_SB (Speculation Barrier)
+- FEAT_SEL2 (Secure EL2)
+- FEAT_SHA1 (SHA1 instructions)
+- FEAT_SHA256 (SHA256 instructions)
+- FEAT_SHA3 (Advanced SIMD SHA3 instructions)
+- FEAT_SHA512 (Advanced SIMD SHA512 instructions)
+- FEAT_SM3 (Advanced SIMD SM3 instructions)
+- FEAT_SM4 (Advanced SIMD SM4 instructions)
+- FEAT_SPECRES (Speculation restriction instructions)
+- FEAT_SSBS (Speculative Store Bypass Safe)
+- FEAT_TLBIOS (TLB invalidate instructions in Outer Shareable domain)
+- FEAT_TLBIRANGE (TLB invalidate range instructions)
+- FEAT_TTCNP (Translation table Common not private translations)
+- FEAT_TTST (Small translation tables)
+- FEAT_UAO (Unprivileged Access Override control)
+- FEAT_VHE (Virtualization Host Extensions)
+- FEAT_VMID16 (16-bit VMID)
+- FEAT_XNX (Translation table stage 2 Unprivileged Execute-never)
+- SVE (The Scalable Vector Extension)
+- SVE2 (The Scalable Vector Extension v2)
+
+For information on the specifics of these extensions, please refer
+to the `Armv8-A Arm Architecture Reference Manual
+<https://developer.arm.com/documentation/ddi0487/latest>`_.
+
+When a specific named CPU is being emulated, only those features which
+are present in hardware for that CPU are emulated. (If a feature is
+not in the list above then it is not supported, even if the real
+hardware should have it.) The ``max`` CPU enables all features.
+
+R-profile CPU architecture support
+==================================
+
+QEMU's TCG emulation support for R-profile CPUs is currently limited.
+We emulate only the Cortex-R5 and Cortex-R5F CPUs.
+
+M-profile CPU architecture support
+==================================
+
+QEMU's TCG emulation includes support for Armv6-M, Armv7-M, Armv8-M, and
+Armv8.1-M versions of the M-profile architecture.  It also has support
+for the following architecture extensions:
+
+- FP (Floating-point Extension)
+- FPCXT (FPCXT access instructions)
+- HP (Half-precision floating-point instructions)
+- LOB (Low Overhead loops and Branch future)
+- M (Main Extension)
+- MPU (Memory Protection Unit Extension)
+- PXN (Privileged Execute Never)
+- RAS (Reliability, Serviceability and Availability): "minimum RAS Extension" only
+- S (Security Extension)
+- ST (System Timer Extension)
+
+For information on the specifics of these extensions, please refer
+to the `Armv8-M Arm Architecture Reference Manual
+<https://developer.arm.com/documentation/ddi0553/latest>`_.
+
+When a specific named CPU is being emulated, only those features which
+are present in hardware for that CPU are emulated. (If a feature is
+not in the list above then it is not supported, even if the real
+hardware should have it.) There is no equivalent of the ``max`` CPU for
+M-profile.
diff --git a/docs/system/target-arm.rst b/docs/system/target-arm.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/target-arm.rst
+++ b/docs/system/target-arm.rst
@@ -XXX,XX +XXX,XX @@ undocumented; you can get a complete list by running
    arm/virt
    arm/xlnx-versal-virt
 
+Emulated CPU architecture support
+=================================
+
+.. toctree::
+   arm/emulation
+
 Arm CPU features
 ================
 
-- 
2.20.1

In the code for handling VFP system register accesses there is some
stray whitespace after a unary '-' operator, and also some incorrect
indentation in a couple of function prototypes.  We're about to move this
code to another file, so fix the code style issues first so that
checkpatch doesn't complain about the code-movement patch.

diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c
+++ b/target/arm/translate-vfp.c
@@ -XXX,XX +XXX,XX @@ static void gen_branch_fpInactive(DisasContext *s, TCGCond cond,
 }
 
 static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
-
                                   fp_sysreg_loadfn *loadfn,
-                                 void *opaque)
+                                  void *opaque)
 {
     /* Do a write to an M-profile floating point system register */
     TCGv_i32 tmp;
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
 }
 
 static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
-                                fp_sysreg_storefn *storefn,
-                                void *opaque)
+                                 fp_sysreg_storefn *storefn,
+                                 void *opaque)
 {
     /* Do a read from an M-profile floating point system register */
     TCGv_i32 tmp;
@@ -XXX,XX +XXX,XX @@ static void fp_sysreg_to_memory(DisasContext *s, void *opaque, TCGv_i32 value)
     TCGv_i32 addr;
 
     if (!a->a) {
-        offset = - offset;
+        offset = -offset;
     }
 
     addr = load_reg(s, a->rn);
@@ -XXX,XX +XXX,XX @@ static TCGv_i32 memory_to_fp_sysreg(DisasContext *s, void *opaque)
     TCGv_i32 value = tcg_temp_new_i32();
 
     if (!a->a) {
-        offset = - offset;
+        offset = -offset;
     }
 
     addr = load_reg(s, a->rn);
-- 
2.20.1

If the guest makes an FPCXT_NS access when the FPU is disabled,
one of two things happens:
 * if there is no active FP context, then the insn behaves the
   same way as if the FPU were enabled: writes are ignored, and
   reads return the same value as FPDSCR_NS
 * if there is an active FP context, then we take a NOCP
   exception

Add code to the sysreg read/write functions which emits
code to take the NOCP exception in the latter case.

At the moment this will never be used, because the NOCP checks in
m-nocp.decode happen first, and so the trans functions are never
called when the FPU is disabled.  The code will be needed when we
move the sysreg access insns to before the NOCP patterns in the
following commit.

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210618141019.10671-3-peter.maydell@linaro.org
---
 target/arm/translate-vfp.c | 32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c
+++ b/target/arm/translate-vfp.c
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
         lab_end = gen_new_label();
         /* fpInactive case: write is a NOP, so branch to end */
         gen_branch_fpInactive(s, TCG_COND_NE, lab_end);
-        /* !fpInactive: PreserveFPState(), and reads same as FPCXT_S */
+        /*
+         * !fpInactive: if FPU disabled, take NOCP exception;
+         * otherwise PreserveFPState(), and then FPCXT_NS writes
+         * behave the same as FPCXT_S writes.
+         */
+        if (s->fp_excp_el) {
+            gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
+                               syn_uncategorized(), s->fp_excp_el);
+            /*
+             * This was only a conditional exception, so override
+             * gen_exception_insn()'s default to DISAS_NORETURN
+             */
+            s->base.is_jmp = DISAS_NEXT;
+            break;
+        }
         gen_preserve_fp_state(s);
         /* fall through */
     case ARM_VFP_FPCXT_S:
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
         tcg_gen_br(lab_end);
 
         gen_set_label(lab_active);
-        /* !fpInactive: Reads the same as FPCXT_S, but side effects differ */
+        /*
+         * !fpInactive: if FPU disabled, take NOCP exception;
+         * otherwise PreserveFPState(), and then FPCXT_NS
+         * reads the same as FPCXT_S.
+         */
+        if (s->fp_excp_el) {
+            gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
+                               syn_uncategorized(), s->fp_excp_el);
+            /*
+             * This was only a conditional exception, so override
+             * gen_exception_insn()'s default to DISAS_NORETURN
+             */
+            s->base.is_jmp = DISAS_NEXT;
+            break;
+        }
         gen_preserve_fp_state(s);
         tmp = tcg_temp_new_i32();
         sfpa = tcg_temp_new_i32();
-- 
2.20.1

The M-profile architecture requires that accesses to FPCXT_NS when
there is no active FP state must not take a NOCP fault even if the
FPU is disabled. We were not implementing this correctly, because
in our decode we catch the NOCP faults early in m-nocp.decode.

Fix this bug by moving all the handling of M-profile FP system
register accesses from vfp.decode into m-nocp.decode and putting
it above the NOCP blocks. This provides the correct behaviour:
 * for accesses other than FPCXT_NS the trans functions call
   vfp_access_check(), which will check for FPU disabled and
   raise a NOCP exception if necessary
 * for FPCXT_NS we have the special case code that doesn't
   call vfp_access_check()
 * when these trans functions want to raise an UNDEF they return
   false, so the decoder will fall through into the NOCP blocks.
   This means that NOCP correctly takes precedence over UNDEF
   for these insns. (This is a difference from the other insns
   handled by m-nocp.decode, where UNDEF takes precedence and
   which we implement by having those trans functions call
   unallocated_encoding() in the appropriate places.)
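
The precedence argument above relies on ordered fall-through: within a decodetree overlap group, patterns are tried in order, and a trans function that returns false lets the next matching pattern run. The sketch below is a minimal model of that, assuming made-up encodings and a simplified context; none of these names are QEMU's. trans_nocp() mirrors the NOCP handler's behaviour of returning false when the FPU is enabled so that later decoders (and ultimately UNDEF) get a chance.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { OUT_NONE, OUT_ACCESS, OUT_NOCP, OUT_UNDEF } Outcome;

typedef struct {
    bool fpu_enabled;  /* hypothetical: FPU enabled for this access */
    bool sysreg_valid; /* hypothetical: false means this encoding UNDEFs */
    Outcome out;       /* what the "translator" decided to emit */
} Ctx;

typedef bool TransFn(Ctx *ctx);

typedef struct {
    uint32_t mask;
    uint32_t value;
    TransFn *trans;
} Pattern;

static bool trans_sysreg(Ctx *ctx)
{
    if (!ctx->sysreg_valid) {
        return false; /* UNDEF: fall through to the NOCP pattern */
    }
    /* Stands in for vfp_access_check(), which raises NOCP if disabled */
    ctx->out = ctx->fpu_enabled ? OUT_ACCESS : OUT_NOCP;
    return true;
}

static bool trans_nocp(Ctx *ctx)
{
    if (!ctx->fpu_enabled) {
        ctx->out = OUT_NOCP;
        return true;
    }
    return false; /* FPU enabled: let later decoders try */
}

/* Hypothetical encodings: both patterns match 0xEExx instructions. */
static const Pattern group[] = {
    { 0xFF00u, 0xEE00u, trans_sysreg }, /* listed first, tried first */
    { 0xF000u, 0xE000u, trans_nocp },
};

Outcome decode(Ctx *ctx, uint32_t insn)
{
    assert(ctx != NULL);
    ctx->out = OUT_NONE;
    for (size_t i = 0; i < sizeof(group) / sizeof(group[0]); i++) {
        if ((insn & group[i].mask) == group[i].value &&
            group[i].trans(ctx)) {
            return ctx->out;
        }
    }
    return OUT_UNDEF; /* no pattern claimed the insn */
}
```

With the FPU disabled, an UNDEF-ing sysreg encoding still ends up as NOCP, because trans_sysreg() declines it and trans_nocp() then claims it.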

[Note for backport to stable: this commit has a semantic dependency
on commit 9a486856e9173af, which was not marked as cc-stable because
we didn't know we'd need it for a for-stable bugfix.]

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210618141019.10671-4-peter.maydell@linaro.org
---
 target/arm/translate-a32.h    |   1 +
 target/arm/m-nocp.decode      |  24 ++
 target/arm/vfp.decode         |  14 -
 target/arm/translate-m-nocp.c | 514 +++++++++++++++++++++++++++++++++
 target/arm/translate-vfp.c    | 517 +---------------------------------
 5 files changed, 542 insertions(+), 528 deletions(-)

diff --git a/target/arm/translate-a32.h b/target/arm/translate-a32.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a32.h
+++ b/target/arm/translate-a32.h
@@ -XXX,XX +XXX,XX @@ bool disas_neon_shared(DisasContext *s, uint32_t insn);
 void load_reg_var(DisasContext *s, TCGv_i32 var, int reg);
 void arm_gen_condlabel(DisasContext *s);
 bool vfp_access_check(DisasContext *s);
+void gen_preserve_fp_state(DisasContext *s);
 void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp memop);
 void read_neon_element64(TCGv_i64 dest, int reg, int ele, MemOp memop);
 void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop);
diff --git a/target/arm/m-nocp.decode b/target/arm/m-nocp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/m-nocp.decode
+++ b/target/arm/m-nocp.decode
@@ -XXX,XX +XXX,XX @@
 
 &nocp cp
 
+# M-profile VLDR/VSTR to sysreg
+%vldr_sysreg 22:1 13:3
+%imm7_0x4 0:7 !function=times_4
+
+&vldr_sysreg rn reg imm a w p
+@vldr_sysreg .... ... . a:1 . . . rn:4 ... . ... .. ....... \
+             reg=%vldr_sysreg imm=%imm7_0x4 &vldr_sysreg
+
 {
   # Special cases which do not take an early NOCP: VLLDM and VLSTM
   VLLDM_VLSTM  1110 1100 001 l:1 rn:4 0000 1010 op:1 000 0000
@@ -XXX,XX +XXX,XX @@
   VSCCLRM      1110 1100 1.01 1111 .... 1011 imm:7 0   vd=%vd_dp size=3
   VSCCLRM      1110 1100 1.01 1111 .... 1010 imm:8     vd=%vd_sp size=2
 
+  # FP system register accesses: these are a special case because accesses
+  # to FPCXT_NS succeed even if the FPU is disabled. We therefore need
+  # to handle them before the big NOCP blocks. Note that within these
+  # insns NOCP still has higher priority than UNDEFs; this is implemented
+  # by their returning 'false' for UNDEF so as to fall through into the
+  # NOCP check (in contrast to VLLDM etc, which call unallocated_encoding()
+  # for the UNDEFs there that must take precedence over NOCP.)
+
+  VMSR_VMRS    ---- 1110 111 l:1 reg:4 rt:4 1010 0001 0000
+
+  # P=0 W=0 is SEE "Related encodings", so split into two patterns
+  VLDR_sysreg  ---- 110 1 . . w:1 1 .... ... 0 111 11 ....... @vldr_sysreg p=1
+  VLDR_sysreg  ---- 110 0 . . 1   1 .... ... 0 111 11 ....... @vldr_sysreg p=0 w=1
+  VSTR_sysreg  ---- 110 1 . . w:1 0 .... ... 0 111 11 ....... @vldr_sysreg p=1
+  VSTR_sysreg  ---- 110 0 . . 1   0 .... ... 0 111 11 ....... @vldr_sysreg p=0 w=1
+
   NOCP         111- 1110 ---- ---- ---- cp:4 ---- ---- &nocp
   NOCP         111- 110- ---- ---- ---- cp:4 ---- ---- &nocp
   # From v8.1M onwards this range will also NOCP:
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VLDR_VSTR_hp ---- 1101 u:1 .0 l:1 rn:4 .... 1001 imm:8      vd=%vd_sp
 VLDR_VSTR_sp ---- 1101 u:1 .0 l:1 rn:4 .... 1010 imm:8      vd=%vd_sp
 VLDR_VSTR_dp ---- 1101 u:1 .0 l:1 rn:4 .... 1011 imm:8      vd=%vd_dp
 
-# M-profile VLDR/VSTR to sysreg
-%vldr_sysreg 22:1 13:3
-%imm7_0x4 0:7 !function=times_4
-
-&vldr_sysreg rn reg imm a w p
-@vldr_sysreg .... ... . a:1 . . . rn:4 ... . ... .. ....... \
-             reg=%vldr_sysreg imm=%imm7_0x4 &vldr_sysreg
-
-# P=0 W=0 is SEE "Related encodings", so split into two patterns
-VLDR_sysreg  ---- 110 1 . . w:1 1 .... ... 0 111 11 ....... @vldr_sysreg p=1
-VLDR_sysreg  ---- 110 0 . . 1   1 .... ... 0 111 11 ....... @vldr_sysreg p=0 w=1
-VSTR_sysreg  ---- 110 1 . . w:1 0 .... ... 0 111 11 ....... @vldr_sysreg p=1
-VSTR_sysreg  ---- 110 0 . . 1   0 .... ... 0 111 11 ....... @vldr_sysreg p=0 w=1
-
 # We split the load/store multiple up into two patterns to avoid
 # overlap with other insns in the "Advanced SIMD load/store and 64-bit move"
 # grouping:
diff --git a/target/arm/translate-m-nocp.c b/target/arm/translate-m-nocp.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-m-nocp.c
+++ b/target/arm/translate-m-nocp.c
@@ -XXX,XX +XXX,XX @@
 
 #include "qemu/osdep.h"
 #include "tcg/tcg-op.h"
+#include "tcg/tcg-op-gvec.h"
 #include "translate.h"
 #include "translate-a32.h"
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VSCCLRM(DisasContext *s, arg_VSCCLRM *a)
     return true;
 }
 
+/*
+ * M-profile provides two different sets of instructions that can
+ * access floating point system registers: VMSR/VMRS (which move
+ * to/from a general purpose register) and VLDR/VSTR sysreg (which
+ * move directly to/from memory). In some cases there are also side
+ * effects which must happen after any write to memory (which could
+ * cause an exception). So we implement the common logic for the
+ * sysreg access in gen_M_fp_sysreg_write() and gen_M_fp_sysreg_read(),
+ * which take pointers to callback functions which will perform the
+ * actual "read/write general purpose register" and "read/write
+ * memory" operations.
+ */
+
+/*
+ * Emit code to store the sysreg to its final destination; frees the
+ * TCG temp 'value' it is passed.
+ */
+typedef void fp_sysreg_storefn(DisasContext *s, void *opaque, TCGv_i32 value);
+/*
+ * Emit code to load the value to be copied to the sysreg; returns
+ * a new TCG temporary
+ */
+typedef TCGv_i32 fp_sysreg_loadfn(DisasContext *s, void *opaque);
+
+/* Common decode/access checks for fp sysreg read/write */
+typedef enum FPSysRegCheckResult {
+    FPSysRegCheckFailed, /* caller should return false */
+    FPSysRegCheckDone, /* caller should return true */
+    FPSysRegCheckContinue, /* caller should continue generating code */
+} FPSysRegCheckResult;
+
+static FPSysRegCheckResult fp_sysreg_checks(DisasContext *s, int regno)
+{
+    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
+        return FPSysRegCheckFailed;
+    }
+
+    switch (regno) {
+    case ARM_VFP_FPSCR:
+    case QEMU_VFP_FPSCR_NZCV:
+        break;
+    case ARM_VFP_FPSCR_NZCVQC:
+        if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
+            return FPSysRegCheckFailed;
+        }
+        break;
+    case ARM_VFP_FPCXT_S:
+    case ARM_VFP_FPCXT_NS:
+        if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
+            return FPSysRegCheckFailed;
+        }
+        if (!s->v8m_secure) {
+            return FPSysRegCheckFailed;
+        }
+        break;
+    case ARM_VFP_VPR:
+    case ARM_VFP_P0:
+        if (!dc_isar_feature(aa32_mve, s)) {
+            return FPSysRegCheckFailed;
+        }
+        break;
+    default:
+        return FPSysRegCheckFailed;
+    }
+
+    /*
+     * FPCXT_NS is a special case: it has specific handling for
+     * "current FP state is inactive", and must do the PreserveFPState()
+     * but not the usual full set of actions done by ExecuteFPCheck().
+     * So we don't call vfp_access_check() and the callers must handle this.
+     */
+    if (regno != ARM_VFP_FPCXT_NS && !vfp_access_check(s)) {
+        return FPSysRegCheckDone;
+    }
+    return FPSysRegCheckContinue;
+}
+
+static void gen_branch_fpInactive(DisasContext *s, TCGCond cond,
+                                  TCGLabel *label)
+{
+    /*
+     * FPCXT_NS is a special case: it has specific handling for
+     * "current FP state is inactive", and must do the PreserveFPState()
+     * but not the usual full set of actions done by ExecuteFPCheck().
+     * We don't have a TB flag that matches the fpInactive check, so we
+     * do it at runtime as we don't expect FPCXT_NS accesses to be frequent.
+     *
+     * Emit code that checks fpInactive and does a conditional
+     * branch to label based on it:
+     *  if cond is TCG_COND_NE then branch if fpInactive != 0 (ie if inactive)
+     *  if cond is TCG_COND_EQ then branch if fpInactive == 0 (ie if active)
+     */
+    assert(cond == TCG_COND_EQ || cond == TCG_COND_NE);
+
+    /* fpInactive = FPCCR_NS.ASPEN == 1 && CONTROL.FPCA == 0 */
+    TCGv_i32 aspen, fpca;
+    aspen = load_cpu_field(v7m.fpccr[M_REG_NS]);
+    fpca = load_cpu_field(v7m.control[M_REG_S]);
+    tcg_gen_andi_i32(aspen, aspen, R_V7M_FPCCR_ASPEN_MASK);
+    tcg_gen_xori_i32(aspen, aspen, R_V7M_FPCCR_ASPEN_MASK);
+    tcg_gen_andi_i32(fpca, fpca, R_V7M_CONTROL_FPCA_MASK);
+    tcg_gen_or_i32(fpca, fpca, aspen);
+    tcg_gen_brcondi_i32(tcg_invert_cond(cond), fpca, 0, label);
+    tcg_temp_free_i32(aspen);
+    tcg_temp_free_i32(fpca);
+}
+
+static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
+                                  fp_sysreg_loadfn *loadfn,
+                                  void *opaque)
+{
+    /* Do a write to an M-profile floating point system register */
+    TCGv_i32 tmp;
+    TCGLabel *lab_end = NULL;
+
+    switch (fp_sysreg_checks(s, regno)) {
+    case FPSysRegCheckFailed:
+        return false;
+    case FPSysRegCheckDone:
+        return true;
+    case FPSysRegCheckContinue:
+        break;
+    }
+
+    switch (regno) {
+    case ARM_VFP_FPSCR:
+        tmp = loadfn(s, opaque);
+        gen_helper_vfp_set_fpscr(cpu_env, tmp);
+        tcg_temp_free_i32(tmp);
+        gen_lookup_tb(s);
+        break;
+    case ARM_VFP_FPSCR_NZCVQC:
+    {
+        TCGv_i32 fpscr;
+        tmp = loadfn(s, opaque);
+        if (dc_isar_feature(aa32_mve, s)) {
+            /* QC is only present for MVE; otherwise RES0 */
+            TCGv_i32 qc = tcg_temp_new_i32();
+            tcg_gen_andi_i32(qc, tmp, FPCR_QC);
+            /*
+             * The 4 vfp.qc[] fields need only be "zero" vs "non-zero";
+             * here writing the same value into all elements is simplest.
+             */
+            tcg_gen_gvec_dup_i32(MO_32, offsetof(CPUARMState, vfp.qc),
+                                 16, 16, qc);
+        }
+        tcg_gen_andi_i32(tmp, tmp, FPCR_NZCV_MASK);
+        fpscr = load_cpu_field(vfp.xregs[ARM_VFP_FPSCR]);
+        tcg_gen_andi_i32(fpscr, fpscr, ~FPCR_NZCV_MASK);
+        tcg_gen_or_i32(fpscr, fpscr, tmp);
+        store_cpu_field(fpscr, vfp.xregs[ARM_VFP_FPSCR]);
+        tcg_temp_free_i32(tmp);
+        break;
+    }
+    case ARM_VFP_FPCXT_NS:
+        lab_end = gen_new_label();
+        /* fpInactive case: write is a NOP, so branch to end */
+        gen_branch_fpInactive(s, TCG_COND_NE, lab_end);
+        /*
+         * !fpInactive: if FPU disabled, take NOCP exception;
+         * otherwise PreserveFPState(), and then FPCXT_NS writes
+         * behave the same as FPCXT_S writes.
+         */
+        if (s->fp_excp_el) {
+            gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
+                               syn_uncategorized(), s->fp_excp_el);
+            /*
+             * This was only a conditional exception, so override
+             * gen_exception_insn()'s default to DISAS_NORETURN
+             */
+            s->base.is_jmp = DISAS_NEXT;
+            break;
+        }
+        gen_preserve_fp_state(s);
+        /* fall through */
+    case ARM_VFP_FPCXT_S:
+    {
+        TCGv_i32 sfpa, control;
+        /*
+         * Set FPSCR and CONTROL.SFPA from value; the new FPSCR takes
+         * bits [27:0] from value and zeroes bits [31:28].
+         */
+        tmp = loadfn(s, opaque);
+        sfpa = tcg_temp_new_i32();
+        tcg_gen_shri_i32(sfpa, tmp, 31);
+        control = load_cpu_field(v7m.control[M_REG_S]);
+        tcg_gen_deposit_i32(control, control, sfpa,
+                            R_V7M_CONTROL_SFPA_SHIFT, 1);
+        store_cpu_field(control, v7m.control[M_REG_S]);
+        tcg_gen_andi_i32(tmp, tmp, ~FPCR_NZCV_MASK);
+        gen_helper_vfp_set_fpscr(cpu_env, tmp);
+        tcg_temp_free_i32(tmp);
+        tcg_temp_free_i32(sfpa);
+        break;
+    }
+    case ARM_VFP_VPR:
+        /* Behaves as NOP if not privileged */
+        if (IS_USER(s)) {
+            break;
+        }
+        tmp = loadfn(s, opaque);
+        store_cpu_field(tmp, v7m.vpr);
+        break;
+    case ARM_VFP_P0:
+    {
+        TCGv_i32 vpr;
+        tmp = loadfn(s, opaque);
+        vpr = load_cpu_field(v7m.vpr);
+        tcg_gen_deposit_i32(vpr, vpr, tmp,
+                            R_V7M_VPR_P0_SHIFT, R_V7M_VPR_P0_LENGTH);
+        store_cpu_field(vpr, v7m.vpr);
+        tcg_temp_free_i32(tmp);
+        break;
+    }
+    default:
+        g_assert_not_reached();
+    }
+    if (lab_end) {
+        gen_set_label(lab_end);
+    }
+    return true;
+}
+
+static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
+                                 fp_sysreg_storefn *storefn,
+                                 void *opaque)
+{
+    /* Do a read from an M-profile floating point system register */
+    TCGv_i32 tmp;
+    TCGLabel *lab_end = NULL;
+    bool lookup_tb = false;
+
+    switch (fp_sysreg_checks(s, regno)) {
+    case FPSysRegCheckFailed:
+        return false;
+    case FPSysRegCheckDone:
+        return true;
+    case FPSysRegCheckContinue:
+        break;
+    }
+
+    if (regno == ARM_VFP_FPSCR_NZCVQC && !dc_isar_feature(aa32_mve, s)) {
+        /* QC is RES0 without MVE, so NZCVQC simplifies to NZCV */
+        regno = QEMU_VFP_FPSCR_NZCV;
+    }
+
+    switch (regno) {
+    case ARM_VFP_FPSCR:
+        tmp = tcg_temp_new_i32();
+        gen_helper_vfp_get_fpscr(tmp, cpu_env);
+        storefn(s, opaque, tmp);
+        break;
+    case ARM_VFP_FPSCR_NZCVQC:
+        tmp = tcg_temp_new_i32();
+        gen_helper_vfp_get_fpscr(tmp, cpu_env);
+        tcg_gen_andi_i32(tmp, tmp, FPCR_NZCVQC_MASK);
+        storefn(s, opaque, tmp);
+        break;
+    case QEMU_VFP_FPSCR_NZCV:
+        /*
+         * Read just NZCV; this is a special case to avoid the
+         * helper call for the "VMRS to CPSR.NZCV" insn.
+         */
+        tmp = load_cpu_field(vfp.xregs[ARM_VFP_FPSCR]);
+        tcg_gen_andi_i32(tmp, tmp, FPCR_NZCV_MASK);
+        storefn(s, opaque, tmp);
+        break;
+    case ARM_VFP_FPCXT_S:
+    {
+        TCGv_i32 control, sfpa, fpscr;
+        /* Bits [27:0] from FPSCR, bit [31] from CONTROL.SFPA */
+        tmp = tcg_temp_new_i32();
+        sfpa = tcg_temp_new_i32();
+        gen_helper_vfp_get_fpscr(tmp, cpu_env);
+        tcg_gen_andi_i32(tmp, tmp, ~FPCR_NZCV_MASK);
+        control = load_cpu_field(v7m.control[M_REG_S]);
+        tcg_gen_andi_i32(sfpa, control, R_V7M_CONTROL_SFPA_MASK);
+        tcg_gen_shli_i32(sfpa, sfpa, 31 - R_V7M_CONTROL_SFPA_SHIFT);
+        tcg_gen_or_i32(tmp, tmp, sfpa);
+        tcg_temp_free_i32(sfpa);
+        /*
+         * Store result before updating FPSCR etc, in case
+         * it is a memory write which causes an exception.
+         */
+        storefn(s, opaque, tmp);
+        /*
+         * Now we must reset FPSCR from FPDSCR_NS, and clear
+         * CONTROL.SFPA; so we'll end the TB here.
+         */
+        tcg_gen_andi_i32(control, control, ~R_V7M_CONTROL_SFPA_MASK);
+        store_cpu_field(control, v7m.control[M_REG_S]);
+        fpscr = load_cpu_field(v7m.fpdscr[M_REG_NS]);
+        gen_helper_vfp_set_fpscr(cpu_env, fpscr);
+        tcg_temp_free_i32(fpscr);
+        lookup_tb = true;
+        break;
+    }
+    case ARM_VFP_FPCXT_NS:
+    {
+        TCGv_i32 control, sfpa, fpscr, fpdscr, zero;
+        TCGLabel *lab_active = gen_new_label();
+
+        lookup_tb = true;
+
+        gen_branch_fpInactive(s, TCG_COND_EQ, lab_active);
+        /* fpInactive case: reads as FPDSCR_NS */
+        TCGv_i32 tmp = load_cpu_field(v7m.fpdscr[M_REG_NS]);
+        storefn(s, opaque, tmp);
+        lab_end = gen_new_label();
+        tcg_gen_br(lab_end);
+
+        gen_set_label(lab_active);
+        /*
+         * !fpInactive: if FPU disabled, take NOCP exception;
+         * otherwise PreserveFPState(), and then FPCXT_NS
+         * reads the same as FPCXT_S.
+         */
+        if (s->fp_excp_el) {
+            gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
+                               syn_uncategorized(), s->fp_excp_el);
+            /*
+             * This was only a conditional exception, so override
+             * gen_exception_insn()'s default to DISAS_NORETURN
+             */
+            s->base.is_jmp = DISAS_NEXT;
+            break;
+        }
+        gen_preserve_fp_state(s);
+        tmp = tcg_temp_new_i32();
+        sfpa = tcg_temp_new_i32();
+        fpscr = tcg_temp_new_i32();
+        gen_helper_vfp_get_fpscr(fpscr, cpu_env);
+        tcg_gen_andi_i32(tmp, fpscr, ~FPCR_NZCV_MASK);
+        control = load_cpu_field(v7m.control[M_REG_S]);
+        tcg_gen_andi_i32(sfpa, control, R_V7M_CONTROL_SFPA_MASK);
+        tcg_gen_shli_i32(sfpa, sfpa, 31 - R_V7M_CONTROL_SFPA_SHIFT);
+        tcg_gen_or_i32(tmp, tmp, sfpa);
+        tcg_temp_free_i32(control);
+        /* Store result before updating FPSCR, in case it faults */
+        storefn(s, opaque, tmp);
+        /* If SFPA is zero then set FPSCR from FPDSCR_NS */
+        fpdscr = load_cpu_field(v7m.fpdscr[M_REG_NS]);
+        zero = tcg_const_i32(0);
+        tcg_gen_movcond_i32(TCG_COND_EQ, fpscr, sfpa, zero, fpdscr, fpscr);
+        gen_helper_vfp_set_fpscr(cpu_env, fpscr);
+        tcg_temp_free_i32(zero);
+        tcg_temp_free_i32(sfpa);
+        tcg_temp_free_i32(fpdscr);
+        tcg_temp_free_i32(fpscr);
+        break;
+    }
+    case ARM_VFP_VPR:
+        /* Behaves as NOP if not privileged */
+        if (IS_USER(s)) {
+            break;
+        }
+        tmp = load_cpu_field(v7m.vpr);
+        storefn(s, opaque, tmp);
+        break;
+    case ARM_VFP_P0:
+        tmp = load_cpu_field(v7m.vpr);
+        tcg_gen_extract_i32(tmp, tmp, R_V7M_VPR_P0_SHIFT, R_V7M_VPR_P0_LENGTH);
+        storefn(s, opaque, tmp);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    if (lab_end) {
+        gen_set_label(lab_end);
+    }
+    if (lookup_tb) {
+        gen_lookup_tb(s);
+    }
+    return true;
+}
+
+static void fp_sysreg_to_gpr(DisasContext *s, void *opaque, TCGv_i32 value)
+{
+    arg_VMSR_VMRS *a = opaque;
+
+    if (a->rt == 15) {
+        /* Set the 4 flag bits in the CPSR */
+        gen_set_nzcv(value);
+        tcg_temp_free_i32(value);
+    } else {
+        store_reg(s, a->rt, value);
+    }
+}
+
+static TCGv_i32 gpr_to_fp_sysreg(DisasContext *s, void *opaque)
+{
+    arg_VMSR_VMRS *a = opaque;
+
+    return load_reg(s, a->rt);
+}
+
+static bool trans_VMSR_VMRS(DisasContext *s, arg_VMSR_VMRS *a)
+{
+    /*
+     * Accesses to R15 are UNPREDICTABLE; we choose to undef.
+     * FPSCR -> r15 is a special case which writes to the PSR flags;
+     * set a->reg to a special value to tell gen_M_fp_sysreg_read()
+     * we only care about the top 4 bits of FPSCR there.
+     */
+    if (a->rt == 15) {
+        if (a->l && a->reg == ARM_VFP_FPSCR) {
+            a->reg = QEMU_VFP_FPSCR_NZCV;
+        } else {
+            return false;
+        }
+    }
+
+    if (a->l) {
+        /* VMRS, move FP system register to gp register */
+        return gen_M_fp_sysreg_read(s, a->reg, fp_sysreg_to_gpr, a);
+    } else {
+        /* VMSR, move gp register to FP system register */
+        return gen_M_fp_sysreg_write(s, a->reg, gpr_to_fp_sysreg, a);
+    }
+}
+
+static void fp_sysreg_to_memory(DisasContext *s, void *opaque, TCGv_i32 value)
+{
+    arg_vldr_sysreg *a = opaque;
+    uint32_t offset = a->imm;
+    TCGv_i32 addr;
+
+    if (!a->a) {
+        offset = -offset;
+    }
+
+    addr = load_reg(s, a->rn);
+    if (a->p) {
+        tcg_gen_addi_i32(addr, addr, offset);
+    }
+
+    if (s->v8m_stackcheck && a->rn == 13 && a->w) {
+        gen_helper_v8m_stackcheck(cpu_env, addr);
+    }
+
+    gen_aa32_st_i32(s, value, addr, get_mem_index(s),
+                    MO_UL | MO_ALIGN | s->be_data);
+    tcg_temp_free_i32(value);
+
+    if (a->w) {
+        /* writeback */
+        if (!a->p) {
+            tcg_gen_addi_i32(addr, addr, offset);
+        }
+        store_reg(s, a->rn, addr);
+    } else {
+        tcg_temp_free_i32(addr);
+    }
+}
+
+static TCGv_i32 memory_to_fp_sysreg(DisasContext *s, void *opaque)
+{
+    arg_vldr_sysreg *a = opaque;
+    uint32_t offset = a->imm;
+    TCGv_i32 addr;
+    TCGv_i32 value = tcg_temp_new_i32();
+
+    if (!a->a) {
+        offset = -offset;
+    }
+
+    addr = load_reg(s, a->rn);
+    if (a->p) {
+        tcg_gen_addi_i32(addr, addr, offset);
+    }
+
+    if (s->v8m_stackcheck && a->rn == 13 && a->w) {
+        gen_helper_v8m_stackcheck(cpu_env, addr);
+    }
+
+    gen_aa32_ld_i32(s, value, addr, get_mem_index(s),
+                    MO_UL | MO_ALIGN | s->be_data);
+
+    if (a->w) {
+        /* writeback */
+        if (!a->p) {
+            tcg_gen_addi_i32(addr, addr, offset);
+        }
+        store_reg(s, a->rn, addr);
+    } else {
+        tcg_temp_free_i32(addr);
+    }
+    return value;
+}
+
+static bool trans_VLDR_sysreg(DisasContext *s, arg_vldr_sysreg *a)
+{
+    if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
+        return false;
+    }
+    if (a->rn == 15) {
+        return false;
+    }
+    return gen_M_fp_sysreg_write(s, a->reg, memory_to_fp_sysreg, a);
+}
+
+static bool trans_VSTR_sysreg(DisasContext *s, arg_vldr_sysreg *a)
+{
+    if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
+        return false;
+    }
+    if (a->rn == 15) {
+        return false;
+    }
+    return gen_M_fp_sysreg_read(s, a->reg, fp_sysreg_to_memory, a);
+}
+
 static bool trans_NOCP(DisasContext *s, arg_nocp *a)
 {
     /*
diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c
+++ b/target/arm/translate-vfp.c
@@ -XXX,XX +XXX,XX @@ static inline long vfp_f16_offset(unsigned reg, bool top)
  * Generate code for M-profile lazy FP state preservation if needed;
  * this corresponds to the pseudocode PreserveFPState() function.
  */
-static void gen_preserve_fp_state(DisasContext *s)
+void gen_preserve_fp_state(DisasContext *s)
 {
     if (s->v7m_lspact) {
         /*
@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
     return true;
 }
 
-/*
- * M-profile provides two different sets of instructions that can
- * access floating point system registers: VMSR/VMRS (which move
- * to/from a general purpose register) and VLDR/VSTR sysreg (which
- * move directly to/from memory). In some cases there are also side
- * effects which must happen after any write to memory (which could
- * cause an exception). So we implement the common logic for the
- * sysreg access in gen_M_fp_sysreg_write() and gen_M_fp_sysreg_read(),
- * which take pointers to callback functions which will perform the
- * actual "read/write general purpose register" and "read/write
- * memory" operations.
- */
-
-/*
- * Emit code to store the sysreg to its final destination; frees the
- * TCG temp 'value' it is passed.
- */
-typedef void fp_sysreg_storefn(DisasContext *s, void *opaque, TCGv_i32 value);
-/*
- * Emit code to load the value to be copied to the sysreg; returns
- * a new TCG temporary
- */
-typedef TCGv_i32 fp_sysreg_loadfn(DisasContext *s, void *opaque);
-
-/* Common decode/access checks for fp sysreg read/write */
-typedef enum FPSysRegCheckResult {
-    FPSysRegCheckFailed, /* caller should return false */
-    FPSysRegCheckDone, /* caller should return true */
-    FPSysRegCheckContinue, /* caller should continue generating code */
-} FPSysRegCheckResult;
-
-static FPSysRegCheckResult fp_sysreg_checks(DisasContext *s, int regno)
-{
-    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
-        return FPSysRegCheckFailed;
-    }
-
-    switch (regno) {
-    case ARM_VFP_FPSCR:
-    case QEMU_VFP_FPSCR_NZCV:
-        break;
-    case ARM_VFP_FPSCR_NZCVQC:
-        if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
-            return FPSysRegCheckFailed;
-        }
-        break;
-    case ARM_VFP_FPCXT_S:
-    case ARM_VFP_FPCXT_NS:
-        if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
-            return FPSysRegCheckFailed;
-        }
-        if (!s->v8m_secure) {
-            return FPSysRegCheckFailed;
-        }
-        break;
-    case ARM_VFP_VPR:
-    case ARM_VFP_P0:
-        if (!dc_isar_feature(aa32_mve, s)) {
-            return FPSysRegCheckFailed;
-        }
-        break;
-    default:
-        return FPSysRegCheckFailed;
-    }
-
-    /*
-     * FPCXT_NS is a special case: it has specific handling for
-     * "current FP state is inactive", and must do the PreserveFPState()
-     * but not the usual full set of actions done by ExecuteFPCheck().
-     * So we don't call vfp_access_check() and the callers must handle this.
-     */
-    if (regno != ARM_VFP_FPCXT_NS && !vfp_access_check(s)) {
-        return FPSysRegCheckDone;
-    }
-    return FPSysRegCheckContinue;
-}
-
-static void gen_branch_fpInactive(DisasContext *s, TCGCond cond,
-                                  TCGLabel *label)
-{
-    /*
-     * FPCXT_NS is a special case: it has specific handling for
-     * "current FP state is inactive", and must do the PreserveFPState()
-     * but not the usual full set of actions done by ExecuteFPCheck().
-     * We don't have a TB flag that matches the fpInactive check, so we
-     * do it at runtime as we don't expect FPCXT_NS accesses to be frequent.
-     *
-     * Emit code that checks fpInactive and does a conditional
-     * branch to label based on it:
-     *  if cond is TCG_COND_NE then branch if fpInactive != 0 (ie if inactive)
-     *  if cond is TCG_COND_EQ then branch if fpInactive == 0 (ie if active)
-     */
-    assert(cond == TCG_COND_EQ || cond == TCG_COND_NE);
-
-    /* fpInactive = FPCCR_NS.ASPEN == 1 && CONTROL.FPCA == 0 */
-    TCGv_i32 aspen, fpca;
-    aspen = load_cpu_field(v7m.fpccr[M_REG_NS]);
-    fpca = load_cpu_field(v7m.control[M_REG_S]);
-    tcg_gen_andi_i32(aspen, aspen, R_V7M_FPCCR_ASPEN_MASK);
-    tcg_gen_xori_i32(aspen, aspen, R_V7M_FPCCR_ASPEN_MASK);
-    tcg_gen_andi_i32(fpca, fpca, R_V7M_CONTROL_FPCA_MASK);
-    tcg_gen_or_i32(fpca, fpca, aspen);
-    tcg_gen_brcondi_i32(tcg_invert_cond(cond), fpca, 0, label);
-    tcg_temp_free_i32(aspen);
-    tcg_temp_free_i32(fpca);
-}
-
-static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
-                                  fp_sysreg_loadfn *loadfn,
-                                  void *opaque)
-{
-    /* Do a write to an M-profile floating point system register */
-    TCGv_i32 tmp;
-    TCGLabel *lab_end = NULL;
-
-    switch (fp_sysreg_checks(s, regno)) {
-    case FPSysRegCheckFailed:
-        return false;
-    case FPSysRegCheckDone:
-        return true;
-    case FPSysRegCheckContinue:
-        break;
-    }
-
-    switch (regno) {
-    case ARM_VFP_FPSCR:
-        tmp = loadfn(s, opaque);
-        gen_helper_vfp_set_fpscr(cpu_env, tmp);
-        tcg_temp_free_i32(tmp);
-        gen_lookup_tb(s);
-        break;
-    case ARM_VFP_FPSCR_NZCVQC:
-    {
-        TCGv_i32 fpscr;
-        tmp = loadfn(s, opaque);
-        if (dc_isar_feature(aa32_mve, s)) {
-            /* QC is only present for MVE; otherwise RES0 */
-            TCGv_i32 qc = tcg_temp_new_i32();
-            tcg_gen_andi_i32(qc, tmp, FPCR_QC);
-            /*
-             * The 4 vfp.qc[] fields need only be "zero" vs "non-zero";
-             * here writing the same value into all elements is simplest.
-             */
-            tcg_gen_gvec_dup_i32(MO_32, offsetof(CPUARMState, vfp.qc),
-                                 16, 16, qc);
-        }
-        tcg_gen_andi_i32(tmp, tmp, FPCR_NZCV_MASK);
-        fpscr = load_cpu_field(vfp.xregs[ARM_VFP_FPSCR]);
-        tcg_gen_andi_i32(fpscr, fpscr, ~FPCR_NZCV_MASK);
-        tcg_gen_or_i32(fpscr, fpscr, tmp);
-        store_cpu_field(fpscr, vfp.xregs[ARM_VFP_FPSCR]);
-        tcg_temp_free_i32(tmp);
-        break;
-    }
-    case ARM_VFP_FPCXT_NS:
-        lab_end = gen_new_label();
-        /* fpInactive case: write is a NOP, so branch to end */
-        gen_branch_fpInactive(s, TCG_COND_NE, lab_end);
-        /*
-         * !fpInactive: if FPU disabled, take NOCP exception;
-         * otherwise PreserveFPState(), and then FPCXT_NS writes
-         * behave the same as FPCXT_S writes.
-         */
-        if (s->fp_excp_el) {
-            gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
-                               syn_uncategorized(), s->fp_excp_el);
-            /*
-             * This was only a conditional exception, so override
-             * gen_exception_insn()'s default to DISAS_NORETURN
-             */
-            s->base.is_jmp = DISAS_NEXT;
-            break;
-        }
-        gen_preserve_fp_state(s);
-        /* fall through */
-    case ARM_VFP_FPCXT_S:
-    {
-        TCGv_i32 sfpa, control;
-        /*
-         * Set FPSCR and CONTROL.SFPA from value; the new FPSCR takes
-         * bits [27:0] from value and zeroes bits [31:28].
-         */
-        tmp = loadfn(s, opaque);
-        sfpa = tcg_temp_new_i32();
-        tcg_gen_shri_i32(sfpa, tmp, 31);
-        control = load_cpu_field(v7m.control[M_REG_S]);
-        tcg_gen_deposit_i32(control, control, sfpa,
-                            R_V7M_CONTROL_SFPA_SHIFT, 1);
-        store_cpu_field(control, v7m.control[M_REG_S]);
-        tcg_gen_andi_i32(tmp, tmp, ~FPCR_NZCV_MASK);
-        gen_helper_vfp_set_fpscr(cpu_env, tmp);
-        tcg_temp_free_i32(tmp);
-        tcg_temp_free_i32(sfpa);
-        break;
-    }
-    case ARM_VFP_VPR:
-        /* Behaves as NOP if not privileged */
-        if (IS_USER(s)) {
-            break;
-        }
-        tmp = loadfn(s, opaque);
-        store_cpu_field(tmp, v7m.vpr);
-        break;
-    case ARM_VFP_P0:
-    {
-        TCGv_i32 vpr;
-        tmp = loadfn(s, opaque);
-        vpr = load_cpu_field(v7m.vpr);
-        tcg_gen_deposit_i32(vpr, vpr, tmp,
-                            R_V7M_VPR_P0_SHIFT, R_V7M_VPR_P0_LENGTH);
-        store_cpu_field(vpr, v7m.vpr);
-        tcg_temp_free_i32(tmp);
-        break;
-    }
-    default:
-        g_assert_not_reached();
-    }
-    if (lab_end) {
-        gen_set_label(lab_end);
-    }
-    return true;
-}
-
-static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
-                                 fp_sysreg_storefn *storefn,
-                                 void *opaque)
-{
-    /* Do a read from an M-profile floating point system register */
-    TCGv_i32 tmp;
-    TCGLabel *lab_end = NULL;
-    bool lookup_tb = false;
-
-    switch (fp_sysreg_checks(s, regno)) {
-    case FPSysRegCheckFailed:
-        return false;
-    case FPSysRegCheckDone:
-        return true;
-    case FPSysRegCheckContinue:
-        break;
-    }
-
-    if (regno == ARM_VFP_FPSCR_NZCVQC && !dc_isar_feature(aa32_mve, s)) {
-        /* QC is RES0 without MVE, so NZCVQC simplifies to NZCV */
-        regno = QEMU_VFP_FPSCR_NZCV;
-    }
-
-    switch (regno) {
-    case ARM_VFP_FPSCR:
-        tmp = tcg_temp_new_i32();
-        gen_helper_vfp_get_fpscr(tmp, cpu_env);
-        storefn(s, opaque, tmp);
-        break;
-    case ARM_VFP_FPSCR_NZCVQC:
-        tmp = tcg_temp_new_i32();
-        gen_helper_vfp_get_fpscr(tmp, cpu_env);
-        tcg_gen_andi_i32(tmp, tmp, FPCR_NZCVQC_MASK);
-        storefn(s, opaque, tmp);
-        break;
-    case QEMU_VFP_FPSCR_NZCV:
-        /*
-         * Read just NZCV; this is a special case to avoid the
-         * helper call for the "VMRS to CPSR.NZCV" insn.
-         */
-        tmp = load_cpu_field(vfp.xregs[ARM_VFP_FPSCR]);
-        tcg_gen_andi_i32(tmp, tmp, FPCR_NZCV_MASK);
-        storefn(s, opaque, tmp);
-        break;
-    case ARM_VFP_FPCXT_S:
-    {
-        TCGv_i32 control, sfpa, fpscr;
-        /* Bits [27:0] from FPSCR, bit [31] from CONTROL.SFPA */
-        tmp = tcg_temp_new_i32();
-        sfpa = tcg_temp_new_i32();
-        gen_helper_vfp_get_fpscr(tmp, cpu_env);
-        tcg_gen_andi_i32(tmp, tmp, ~FPCR_NZCV_MASK);
-        control = load_cpu_field(v7m.control[M_REG_S]);
-        tcg_gen_andi_i32(sfpa, control, R_V7M_CONTROL_SFPA_MASK);
-        tcg_gen_shli_i32(sfpa, sfpa, 31 - R_V7M_CONTROL_SFPA_SHIFT);
-        tcg_gen_or_i32(tmp, tmp, sfpa);
-        tcg_temp_free_i32(sfpa);
-        /*
-         * Store result before updating FPSCR etc, in case
-         * it is a memory write which causes an exception.
-         */
-        storefn(s, opaque, tmp);
-        /*
-         * Now we must reset FPSCR from FPDSCR_NS, and clear
-         * CONTROL.SFPA; so we'll end the TB here.
-         */
-        tcg_gen_andi_i32(control, control, ~R_V7M_CONTROL_SFPA_MASK);
-        store_cpu_field(control, v7m.control[M_REG_S]);
-        fpscr = load_cpu_field(v7m.fpdscr[M_REG_NS]);
-        gen_helper_vfp_set_fpscr(cpu_env, fpscr);
-        tcg_temp_free_i32(fpscr);
-        lookup_tb = true;
-        break;
-    }
-    case ARM_VFP_FPCXT_NS:
-    {
-        TCGv_i32 control, sfpa, fpscr, fpdscr, zero;
-        TCGLabel *lab_active = gen_new_label();
-
-        lookup_tb = true;
-
-        gen_branch_fpInactive(s, TCG_COND_EQ, lab_active);
-        /* fpInactive case: reads as FPDSCR_NS */
-        TCGv_i32 tmp = load_cpu_field(v7m.fpdscr[M_REG_NS]);
-        storefn(s, opaque, tmp);
-        lab_end = gen_new_label();
-        tcg_gen_br(lab_end);
-
-        gen_set_label(lab_active);
-        /*
-         * !fpInactive: if FPU disabled, take NOCP exception;
-         * otherwise PreserveFPState(), and then FPCXT_NS
-         * reads the same as FPCXT_S.
-         */
-        if (s->fp_excp_el) {
-            gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
-                               syn_uncategorized(), s->fp_excp_el);
-            /*
-             * This was only a conditional exception, so override
-             * gen_exception_insn()'s default to DISAS_NORETURN
-             */
-            s->base.is_jmp = DISAS_NEXT;
-            break;
-        }
-        gen_preserve_fp_state(s);
-        tmp = tcg_temp_new_i32();
-        sfpa = tcg_temp_new_i32();
-        fpscr = tcg_temp_new_i32();
-        gen_helper_vfp_get_fpscr(fpscr, cpu_env);
-        tcg_gen_andi_i32(tmp, fpscr, ~FPCR_NZCV_MASK);
-        control = load_cpu_field(v7m.control[M_REG_S]);
-        tcg_gen_andi_i32(sfpa, control, R_V7M_CONTROL_SFPA_MASK);
-        tcg_gen_shli_i32(sfpa, sfpa, 31 - R_V7M_CONTROL_SFPA_SHIFT);
-        tcg_gen_or_i32(tmp, tmp, sfpa);
-        tcg_temp_free_i32(control);
-        /* Store result before updating FPSCR, in case it faults */
-        storefn(s, opaque, tmp);
-        /* If SFPA is zero then set FPSCR from FPDSCR_NS */
-        fpdscr = load_cpu_field(v7m.fpdscr[M_REG_NS]);
-        zero = tcg_const_i32(0);
-        tcg_gen_movcond_i32(TCG_COND_EQ, fpscr, sfpa, zero, fpdscr, fpscr);
-        gen_helper_vfp_set_fpscr(cpu_env, fpscr);
-        tcg_temp_free_i32(zero);
-        tcg_temp_free_i32(sfpa);
-        tcg_temp_free_i32(fpdscr);
-        tcg_temp_free_i32(fpscr);
-        break;
-    }
-    case ARM_VFP_VPR:
-        /* Behaves as NOP if not privileged */
-        if (IS_USER(s)) {
-            break;
-        }
-        tmp = load_cpu_field(v7m.vpr);
-        storefn(s, opaque, tmp);
-        break;
-    case ARM_VFP_P0:
-        tmp = load_cpu_field(v7m.vpr);
-        tcg_gen_extract_i32(tmp, tmp, R_V7M_VPR_P0_SHIFT, R_V7M_VPR_P0_LENGTH);
-        storefn(s, opaque, tmp);
-        break;
-    default:
-        g_assert_not_reached();
-    }
-
-    if (lab_end) {
-        gen_set_label(lab_end);
-    }
-    if (lookup_tb) {
-        gen_lookup_tb(s);
-    }
-    return true;
-}
-
-static void fp_sysreg_to_gpr(DisasContext *s, void *opaque, TCGv_i32 value)
-{
-    arg_VMSR_VMRS *a = opaque;
-
-    if (a->rt == 15) {
-        /* Set the 4 flag bits in the CPSR */
-        gen_set_nzcv(value);
-        tcg_temp_free_i32(value);
-    } else {
-        store_reg(s, a->rt, value);
-    }
-}
-
-static TCGv_i32 gpr_to_fp_sysreg(DisasContext *s, void *opaque)
-{
-    arg_VMSR_VMRS *a = opaque;
-
-    return load_reg(s, a->rt);
-}
-
-static bool gen_M_VMSR_VMRS(DisasContext *s, arg_VMSR_VMRS *a)
-{
-    /*
-     * Accesses to R15 are UNPREDICTABLE; we choose to undef.
-     * FPSCR -> r15 is a special case which writes to the PSR flags;
-     * set a->reg to a special value to tell gen_M_fp_sysreg_read()
-     * we only care about the top 4 bits of FPSCR there.
-     */
-    if (a->rt == 15) {
-        if (a->l && a->reg == ARM_VFP_FPSCR) {
-            a->reg = QEMU_VFP_FPSCR_NZCV;
-        } else {
-            return false;
-        }
-    }
-
-    if (a->l) {
-        /* VMRS, move FP system register to gp register */
-        return gen_M_fp_sysreg_read(s, a->reg, fp_sysreg_to_gpr, a);
-    } else {
-        /* VMSR, move gp register to FP system register */
-        return gen_M_fp_sysreg_write(s, a->reg, gpr_to_fp_sysreg, a);
-    }
-}
-
 static bool trans_VMSR_VMRS(DisasContext *s, arg_VMSR_VMRS *a)
 {
     TCGv_i32 tmp;
     bool ignore_vfp_enabled = false;
 
     if (arm_dc_feature(s, ARM_FEATURE_M)) {
-        return gen_M_VMSR_VMRS(s, a);
+        /* M profile version was already handled in m-nocp.decode */
+        return false;
     }
 
     if (!dc_isar_feature(aa32_fpsp_v2, s)) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VMSR_VMRS(DisasContext *s, arg_VMSR_VMRS *a)
     return true;
 }
 
-static void fp_sysreg_to_memory(DisasContext *s, void *opaque, TCGv_i32 value)
-{
-    arg_vldr_sysreg *a = opaque;
-    uint32_t offset = a->imm;
-    TCGv_i32 addr;
-
-    if (!a->a) {
-        offset = -offset;
-    }
-
-    addr = load_reg(s, a->rn);
-    if (a->p) {
-        tcg_gen_addi_i32(addr, addr, offset);
-    }
-
-    if (s->v8m_stackcheck && a->rn == 13 && a->w) {
-        gen_helper_v8m_stackcheck(cpu_env, addr);
-    }
-
-    gen_aa32_st_i32(s, value, addr, get_mem_index(s),
-                    MO_UL | MO_ALIGN | s->be_data);
-    tcg_temp_free_i32(value);
-
-    if (a->w) {
-        /* writeback */
-        if (!a->p) {
-            tcg_gen_addi_i32(addr, addr, offset);
-        }
-        store_reg(s, a->rn, addr);
-    } else {
-        tcg_temp_free_i32(addr);
-    }
-}
-
-static TCGv_i32 memory_to_fp_sysreg(DisasContext *s, void *opaque)
-{
-    arg_vldr_sysreg *a = opaque;
-    uint32_t offset = a->imm;
-    TCGv_i32 addr;
-    TCGv_i32 value = tcg_temp_new_i32();
-
-    if (!a->a) {
-        offset = -offset;
-    }
-
-    addr = load_reg(s, a->rn);
-    if (a->p) {
-        tcg_gen_addi_i32(addr, addr, offset);
-    }
-
-    if (s->v8m_stackcheck && a->rn == 13 && a->w) {
-        gen_helper_v8m_stackcheck(cpu_env, addr);
-    }
-
-    gen_aa32_ld_i32(s, value, addr, get_mem_index(s),
-                    MO_UL | MO_ALIGN | s->be_data);
-
-    if (a->w) {
-        /* writeback */
-        if (!a->p) {
-            tcg_gen_addi_i32(addr, addr, offset);
-        }
-        store_reg(s, a->rn, addr);
-    } else {
-        tcg_temp_free_i32(addr);
-    }
-    return value;
-}
-
-static bool trans_VLDR_sysreg(DisasContext *s, arg_vldr_sysreg *a)
-{
-    if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
-        return false;
-    }
-    if (a->rn == 15) {
-        return false;
-    }
-    return gen_M_fp_sysreg_write(s, a->reg, memory_to_fp_sysreg, a);
-}
-
-static bool trans_VSTR_sysreg(DisasContext *s, arg_vldr_sysreg *a)
-{
-    if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
-        return false;
-    }
-    if (a->rn == 15) {
-        return false;
-    }
-    return gen_M_fp_sysreg_read(s, a->reg, fp_sysreg_to_memory, a);
-}
 
 static bool trans_VMOV_half(DisasContext *s, arg_VMOV_single *a)
 {
-- 
2.20.1

A few subcases of VLDR/VSTR sysreg succeed but do not perform a
memory access:
 * VSTR of VPR when unprivileged
 * VLDR to VPR when unprivileged
 * VLDR to FPCXT_NS when fpInactive

In these cases, even though we don't do the memory access, we should
still update the base register and perform the stack limit check if
the insn's addressing mode specifies writeback.  Our implementation
failed to do this because we handle these side effects inside the
memory_to_fp_sysreg() and fp_sysreg_to_memory() callback functions,
which are only called if there is something to load or store.
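
The addressing-mode side effects at stake can be sketched in plain C,
outside TCG. This is a hypothetical model, not QEMU code: the struct
and function names are invented, and the field meanings (a = add the
offset, p = pre-index, w = writeback) are assumed to follow the
arg_vldr_sysreg fields used by memory_to_fp_sysreg() and
fp_sysreg_to_memory(). It computes what must happen to Rn whether or
not the memory access itself is performed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded fields of a VLDR/VSTR sysreg insn (not QEMU's arg struct) */
struct sysreg_ldst {
    int rn;         /* base register number */
    uint32_t imm;   /* byte offset magnitude */
    bool a;         /* true: add the offset, false: subtract it */
    bool p;         /* true: pre-indexed, false: post-indexed */
    bool w;         /* true: write the updated address back to Rn */
};

/* Side effects of the addressing mode, independent of any memory access */
struct addr_result {
    uint32_t access_addr;  /* address a real load/store would use */
    bool writeback;        /* whether Rn is updated */
    uint32_t new_base;     /* value of Rn after the insn */
    bool stackcheck;       /* whether a stack limit check applies */
};

static struct addr_result compute_addr(const struct sysreg_ldst *insn,
                                       uint32_t rn_val)
{
    struct addr_result r;
    /* Negation is modulo 2^32, like the uint32_t offset in the patch */
    uint32_t offset = insn->a ? insn->imm : (uint32_t)0 - insn->imm;
    uint32_t addr = rn_val;

    if (insn->p) {
        addr += offset;            /* pre-index: apply offset before access */
    }
    r.access_addr = addr;
    r.writeback = insn->w;
    r.new_base = rn_val;
    if (insn->w) {
        /* post-index applies the offset only after the access */
        r.new_base = insn->p ? addr : addr + offset;
    }
    /* Assumes stack limit checking is enabled: SP base with writeback */
    r.stackcheck = insn->rn == 13 && insn->w;
    return r;
}
```

For example, an insn such as VSTR FPSCR, [sp, #-4]! with SP =
0x20001000 gives an access address and new SP of 0x20000ffc and needs
the stack check, even when the store itself is skipped.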

Fix this by adding an extra do_access argument to the callbacks: true
means perform the access, false means perform only side effects such
as writeback. Call the callback with do_access = false for the three
cases listed above.
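
A toy model of the fix, assuming a simplified CPU state instead of
TCG-generated code: the hypothetical toy_loadfn type plays the role of
fp_sysreg_loadfn, and toy_write_vpr() mimics the unprivileged VPR case
in gen_M_fp_sysreg_write(). None of these names exist in QEMU. With
do_access = false the callback skips the load but still applies
writeback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy CPU state; not QEMU's CPUARMState */
struct toy_cpu {
    uint32_t regs[16];
    uint32_t vpr;
    bool privileged;
    uint8_t mem[64];        /* tiny flat memory, words in host byte order */
};

/* Hypothetical decoded VLDR sysreg fields: base, offset, add, pre-index, writeback */
struct toy_insn {
    int rn;
    uint32_t imm;
    bool a, p, w;
};

/* Plays the role of fp_sysreg_loadfn: do_access == false means side effects only */
typedef uint32_t toy_loadfn(struct toy_cpu *cpu, const struct toy_insn *insn,
                            bool do_access);

static uint32_t toy_memory_to_sysreg(struct toy_cpu *cpu,
                                     const struct toy_insn *insn,
                                     bool do_access)
{
    uint32_t offset = insn->a ? insn->imm : (uint32_t)0 - insn->imm;
    uint32_t addr = cpu->regs[insn->rn];
    uint32_t value = 0;

    if (insn->p) {
        addr += offset;
    }
    if (do_access) {
        /* Only the memory access itself is conditional */
        assert(addr <= sizeof(cpu->mem) - sizeof(value));
        memcpy(&value, &cpu->mem[addr], sizeof(value));
    }
    if (insn->w) {
        /* Writeback happens regardless of do_access */
        cpu->regs[insn->rn] = insn->p ? addr : addr + offset;
    }
    return value;
}

/* VLDR to VPR: a NOP when unprivileged, but side effects must still occur */
static void toy_write_vpr(struct toy_cpu *cpu, const struct toy_insn *insn,
                          toy_loadfn *loadfn)
{
    if (!cpu->privileged) {
        loadfn(cpu, insn, false);   /* writeback only, value discarded */
        return;
    }
    cpu->vpr = loadfn(cpu, insn, true);
}
```

The caller decides whether the access happens, while the callback
keeps ownership of the addressing-mode side effects, so the caller
never needs to know whether the insn had writeback.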

This produces slightly suboptimal code for a write to FPCXT_NS when
the FPU is inactive and the insn has no side effects (i.e. no
writeback, or the access is via VMSR): in that case we generate a
conditional branch over an unconditional branch. This does not seem
important enough to require the callback to report back whether it
generated any code.

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210618141019.10671-5-peter.maydell@linaro.org
---
 target/arm/translate-m-nocp.c | 102 ++++++++++++++++++++++++----------
 1 file changed, 72 insertions(+), 30 deletions(-)

diff --git a/target/arm/translate-m-nocp.c b/target/arm/translate-m-nocp.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-m-nocp.c
+++ b/target/arm/translate-m-nocp.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VSCCLRM(DisasContext *s, arg_VSCCLRM *a)
 
 /*
  * Emit code to store the sysreg to its final destination; frees the
- * TCG temp 'value' it is passed.
+ * TCG temp 'value' it is passed. do_access is true to do the store,
+ * and false to skip it and only perform side-effects like base
+ * register writeback.
  */
-typedef void fp_sysreg_storefn(DisasContext *s, void *opaque, TCGv_i32 value);
+typedef void fp_sysreg_storefn(DisasContext *s, void *opaque, TCGv_i32 value,
+                               bool do_access);
 /*
  * Emit code to load the value to be copied to the sysreg; returns
- * a new TCG temporary
+ * a new TCG temporary. do_access is true to do the load,
+ * and false to skip it and only perform side-effects like base
+ * register writeback.
  */
-typedef TCGv_i32 fp_sysreg_loadfn(DisasContext *s, void *opaque);
+typedef TCGv_i32 fp_sysreg_loadfn(DisasContext *s, void *opaque,
+                                  bool do_access);
 
 /* Common decode/access checks for fp sysreg read/write */
 typedef enum FPSysRegCheckResult {
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
 
     switch (regno) {
     case ARM_VFP_FPSCR:
-        tmp = loadfn(s, opaque);
+        tmp = loadfn(s, opaque, true);
         gen_helper_vfp_set_fpscr(cpu_env, tmp);
         tcg_temp_free_i32(tmp);
         gen_lookup_tb(s);
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
     case ARM_VFP_FPSCR_NZCVQC:
     {
         TCGv_i32 fpscr;
-        tmp = loadfn(s, opaque);
+        tmp = loadfn(s, opaque, true);
         if (dc_isar_feature(aa32_mve, s)) {
             /* QC is only present for MVE; otherwise RES0 */
             TCGv_i32 qc = tcg_temp_new_i32();
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
         break;
     }
     case ARM_VFP_FPCXT_NS:
+    {
+        TCGLabel *lab_active = gen_new_label();
+
         lab_end = gen_new_label();
-        /* fpInactive case: write is a NOP, so branch to end */
-        gen_branch_fpInactive(s, TCG_COND_NE, lab_end);
+        gen_branch_fpInactive(s, TCG_COND_EQ, lab_active);
+        /*
+         * fpInactive case: write is a NOP, so only do side effects
+         * like register writeback before we branch to end
+         */
+        loadfn(s, opaque, false);
+        tcg_gen_br(lab_end);
+
+        gen_set_label(lab_active);
         /*
          * !fpInactive: if FPU disabled, take NOCP exception;
          * otherwise PreserveFPState(), and then FPCXT_NS writes
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
             break;
         }
         gen_preserve_fp_state(s);
-        /* fall through */
+    }
+    /* fall through */
     case ARM_VFP_FPCXT_S:
     {
         TCGv_i32 sfpa, control;
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
          * Set FPSCR and CONTROL.SFPA from value; the new FPSCR takes
          * bits [27:0] from value and zeroes bits [31:28].
          */
-        tmp = loadfn(s, opaque);
+        tmp = loadfn(s, opaque, true);
         sfpa = tcg_temp_new_i32();
         tcg_gen_shri_i32(sfpa, tmp, 31);
         control = load_cpu_field(v7m.control[M_REG_S]);
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
     case ARM_VFP_VPR:
         /* Behaves as NOP if not privileged */
         if (IS_USER(s)) {
+            loadfn(s, opaque, false);
             break;
         }
-        tmp = loadfn(s, opaque);
+        tmp = loadfn(s, opaque, true);
         store_cpu_field(tmp, v7m.vpr);
         break;
     case ARM_VFP_P0:
     {
         TCGv_i32 vpr;
-        tmp = loadfn(s, opaque);
+        tmp = loadfn(s, opaque, true);
         vpr = load_cpu_field(v7m.vpr);
         tcg_gen_deposit_i32(vpr, vpr, tmp,
                             R_V7M_VPR_P0_SHIFT, R_V7M_VPR_P0_LENGTH);
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
     case ARM_VFP_FPSCR:
         tmp = tcg_temp_new_i32();
         gen_helper_vfp_get_fpscr(tmp, cpu_env);
-        storefn(s, opaque, tmp);
+        storefn(s, opaque, tmp, true);
         break;
     case ARM_VFP_FPSCR_NZCVQC:
         tmp = tcg_temp_new_i32();
         gen_helper_vfp_get_fpscr(tmp, cpu_env);
         tcg_gen_andi_i32(tmp, tmp, FPCR_NZCVQC_MASK);
-        storefn(s, opaque, tmp);
+        storefn(s, opaque, tmp, true);
         break;
     case QEMU_VFP_FPSCR_NZCV:
         /*
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
          */
         tmp = load_cpu_field(vfp.xregs[ARM_VFP_FPSCR]);
         tcg_gen_andi_i32(tmp, tmp, FPCR_NZCV_MASK);
-        storefn(s, opaque, tmp);
+        storefn(s, opaque, tmp, true);
         break;
     case ARM_VFP_FPCXT_S:
     {
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
          * Store result before updating FPSCR etc, in case
          * it is a memory write which causes an exception.
          */
-        storefn(s, opaque, tmp);
+        storefn(s, opaque, tmp, true);
         /*
          * Now we must reset FPSCR from FPDSCR_NS, and clear
          * CONTROL.SFPA; so we'll end the TB here.
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
         gen_branch_fpInactive(s, TCG_COND_EQ, lab_active);
         /* fpInactive case: reads as FPDSCR_NS */
         TCGv_i32 tmp = load_cpu_field(v7m.fpdscr[M_REG_NS]);
-        storefn(s, opaque, tmp);
+        storefn(s, opaque, tmp, true);
         lab_end = gen_new_label();
         tcg_gen_br(lab_end);
 
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
         tcg_gen_or_i32(tmp, tmp, sfpa);
         tcg_temp_free_i32(control);
         /* Store result before updating FPSCR, in case it faults */
-        storefn(s, opaque, tmp);
+        storefn(s, opaque, tmp, true);
         /* If SFPA is zero then set FPSCR from FPDSCR_NS */
         fpdscr = load_cpu_field(v7m.fpdscr[M_REG_NS]);
         zero = tcg_const_i32(0);
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
     case ARM_VFP_VPR:
         /* Behaves as NOP if not privileged */
         if (IS_USER(s)) {
+            storefn(s, opaque, NULL, false);
             break;
         }
         tmp = load_cpu_field(v7m.vpr);
-        storefn(s, opaque, tmp);
+        storefn(s, opaque, tmp, true);
         break;
     case ARM_VFP_P0:
         tmp = load_cpu_field(v7m.vpr);
         tcg_gen_extract_i32(tmp, tmp, R_V7M_VPR_P0_SHIFT, R_V7M_VPR_P0_LENGTH);
-        storefn(s, opaque, tmp);
+        storefn(s, opaque, tmp, true);
         break;
     default:
         g_assert_not_reached();
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
     return true;
 }
 
-static void fp_sysreg_to_gpr(DisasContext *s, void *opaque, TCGv_i32 value)
+static void fp_sysreg_to_gpr(DisasContext *s, void *opaque, TCGv_i32 value,
+                             bool do_access)
 {
     arg_VMSR_VMRS *a = opaque;
 
+    if (!do_access) {
+        return;
+    }
+
     if (a->rt == 15) {
         /* Set the 4 flag bits in the CPSR */
         gen_set_nzcv(value);
@@ -XXX,XX +XXX,XX @@ static void fp_sysreg_to_gpr(DisasContext *s, void *opaque, TCGv_i32 value)
     }
 }
 
-static TCGv_i32 gpr_to_fp_sysreg(DisasContext *s, void *opaque)
+static TCGv_i32 gpr_to_fp_sysreg(DisasContext *s, void *opaque, bool do_access)
 {
     arg_VMSR_VMRS *a = opaque;
 
+    if (!do_access) {
+        return NULL;
+    }
     return load_reg(s, a->rt);
 }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VMSR_VMRS(DisasContext *s, arg_VMSR_VMRS *a)
     }
 }
 
-static void fp_sysreg_to_memory(DisasContext *s, void *opaque, TCGv_i32 value)
+static void fp_sysreg_to_memory(DisasContext *s, void *opaque, TCGv_i32 value,
+                                bool do_access)
 {
     arg_vldr_sysreg *a = opaque;
     uint32_t offset = a->imm;
@@ -XXX,XX +XXX,XX @@ static void fp_sysreg_to_memory(DisasContext *s, void *opaque, TCGv_i32 value)
         offset = -offset;
     }
 
+    if (!do_access && !a->w) {
+        return;
+    }
+
     addr = load_reg(s, a->rn);
     if (a->p) {
         tcg_gen_addi_i32(addr, addr, offset);
@@ -XXX,XX +XXX,XX @@ static void fp_sysreg_to_memory(DisasContext *s, void *opaque, TCGv_i32 value)
         gen_helper_v8m_stackcheck(cpu_env, addr);
     }
 
-    gen_aa32_st_i32(s, value, addr, get_mem_index(s),
-                    MO_UL | MO_ALIGN | s->be_data);
-    tcg_temp_free_i32(value);
+    if (do_access) {
+        gen_aa32_st_i32(s, value, addr, get_mem_index(s),
+                        MO_UL | MO_ALIGN | s->be_data);
+        tcg_temp_free_i32(value);
+    }
 
     if (a->w) {
         /* writeback */
@@ -XXX,XX +XXX,XX @@ static void fp_sysreg_to_memory(DisasContext *s, void *opaque, TCGv_i32 value)
     }
 }
 
-static TCGv_i32 memory_to_fp_sysreg(DisasContext *s, void *opaque)
+static TCGv_i32 memory_to_fp_sysreg(DisasContext *s, void *opaque,
+                                    bool do_access)
 {
     arg_vldr_sysreg *a = opaque;
     uint32_t offset = a->imm;
     TCGv_i32 addr;
-    TCGv_i32 value = tcg_temp_new_i32();
+    TCGv_i32 value = NULL;
 
     if (!a->a) {
         offset = -offset;
     }
 
+    if (!do_access && !a->w) {
+        return NULL;
+    }
+
     addr = load_reg(s, a->rn);
     if (a->p) {
         tcg_gen_addi_i32(addr, addr, offset);
@@ -XXX,XX +XXX,XX @@ static TCGv_i32 memory_to_fp_sysreg(DisasContext *s, void *opaque)
         gen_helper_v8m_stackcheck(cpu_env, addr);
     }
 
-    gen_aa32_ld_i32(s, value, addr, get_mem_index(s),
-                    MO_UL | MO_ALIGN | s->be_data);
+    if (do_access) {
+        value = tcg_temp_new_i32();
+        gen_aa32_ld_i32(s, value, addr, get_mem_index(s),
+                        MO_UL | MO_ALIGN | s->be_data);
+    }
 
     if (a->w) {
         /* writeback */
-- 
2.20.1

Factor the code in full_vfp_access_check() which updates the
ownership of the FP context and creates a new FP context
out into its own function.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210618141019.10671-6-peter.maydell@linaro.org
---
 target/arm/translate-vfp.c | 104 +++++++++++++++++++++----------------
 1 file changed, 58 insertions(+), 46 deletions(-)

diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c
+++ b/target/arm/translate-vfp.c
@@ -XXX,XX +XXX,XX @@ void gen_preserve_fp_state(DisasContext *s)
     }
 }
 
+/*
+ * Generate code for M-profile FP context handling: update the
+ * ownership of the FP context, and create a new context if
+ * necessary. This corresponds to the parts of the pseudocode
+ * ExecuteFPCheck() after the initial PreserveFPState() call.
+ */
+static void gen_update_fp_context(DisasContext *s)
+{
+    /* Update ownership of FP context: set FPCCR.S to match current state */
+    if (s->v8m_fpccr_s_wrong) {
+        TCGv_i32 tmp;
+
+        tmp = load_cpu_field(v7m.fpccr[M_REG_S]);
+        if (s->v8m_secure) {
+            tcg_gen_ori_i32(tmp, tmp, R_V7M_FPCCR_S_MASK);
+        } else {
+            tcg_gen_andi_i32(tmp, tmp, ~R_V7M_FPCCR_S_MASK);
+        }
+        store_cpu_field(tmp, v7m.fpccr[M_REG_S]);
+        /* Don't need to do this for any further FP insns in this TB */
+        s->v8m_fpccr_s_wrong = false;
+    }
+
+    if (s->v7m_new_fp_ctxt_needed) {
+        /*
+         * Create new FP context by updating CONTROL.FPCA, CONTROL.SFPA,
+         * the FPSCR, and VPR.
+         */
+        TCGv_i32 control, fpscr;
+        uint32_t bits = R_V7M_CONTROL_FPCA_MASK;
+
+        fpscr = load_cpu_field(v7m.fpdscr[s->v8m_secure]);
+        gen_helper_vfp_set_fpscr(cpu_env, fpscr);
+        tcg_temp_free_i32(fpscr);
+        if (dc_isar_feature(aa32_mve, s)) {
+            TCGv_i32 z32 = tcg_const_i32(0);
+            store_cpu_field(z32, v7m.vpr);
+        }
+
+        /*
+         * We don't need to arrange to end the TB, because the only
+         * parts of FPSCR which we cache in the TB flags are the VECLEN
+         * and VECSTRIDE, and those don't exist for M-profile.
+         */
+
+        if (s->v8m_secure) {
+            bits |= R_V7M_CONTROL_SFPA_MASK;
+        }
+        control = load_cpu_field(v7m.control[M_REG_S]);
+        tcg_gen_ori_i32(control, control, bits);
+        store_cpu_field(control, v7m.control[M_REG_S]);
+        /* Don't need to do this for any further FP insns in this TB */
+        s->v7m_new_fp_ctxt_needed = false;
+    }
+}
+
 /*
  * Check that VFP access is enabled. If it is, do the necessary
  * M-profile lazy-FP handling and then return true.
@@ -XXX,XX +XXX,XX @@ static bool full_vfp_access_check(DisasContext *s, bool ignore_vfp_enabled)
         /* Trigger lazy-state preservation if necessary */
         gen_preserve_fp_state(s);
 
-        /* Update ownership of FP context: set FPCCR.S to match current state */
-        if (s->v8m_fpccr_s_wrong) {
-            TCGv_i32 tmp;
-
-            tmp = load_cpu_field(v7m.fpccr[M_REG_S]);
-            if (s->v8m_secure) {
-                tcg_gen_ori_i32(tmp, tmp, R_V7M_FPCCR_S_MASK);
-            } else {
-                tcg_gen_andi_i32(tmp, tmp, ~R_V7M_FPCCR_S_MASK);
-            }
-            store_cpu_field(tmp, v7m.fpccr[M_REG_S]);
-            /* Don't need to do this for any further FP insns in this TB */
-            s->v8m_fpccr_s_wrong = false;
-        }
-
-        if (s->v7m_new_fp_ctxt_needed) {
-            /*
-             * Create new FP context by updating CONTROL.FPCA, CONTROL.SFPA,
-             * the FPSCR, and VPR.
-             */
-            TCGv_i32 control, fpscr;
-            uint32_t bits = R_V7M_CONTROL_FPCA_MASK;
-
-            fpscr = load_cpu_field(v7m.fpdscr[s->v8m_secure]);
-            gen_helper_vfp_set_fpscr(cpu_env, fpscr);
-            tcg_temp_free_i32(fpscr);
-            if (dc_isar_feature(aa32_mve, s)) {
-                TCGv_i32 z32 = tcg_const_i32(0);
-                store_cpu_field(z32, v7m.vpr);
-            }
-
-            /*
-             * We don't need to arrange to end the TB, because the only
-             * parts of FPSCR which we cache in the TB flags are the VECLEN
-             * and VECSTRIDE, and those don't exist for M-profile.
-             */
-
-            if (s->v8m_secure) {
-                bits |= R_V7M_CONTROL_SFPA_MASK;
-            }
-            control = load_cpu_field(v7m.control[M_REG_S]);
-            tcg_gen_ori_i32(control, control, bits);
-            store_cpu_field(control, v7m.control[M_REG_S]);
-            /* Don't need to do this for any further FP insns in this TB */
-            s->v7m_new_fp_ctxt_needed = false;
-        }
+        /* Update ownership of FP context and create new FP context if needed */
+        gen_update_fp_context(s);
     }
 
     return true;
-- 
2.20.1

vfp_access_check() and its helper routine full_vfp_access_check() have
gradually grown and are now an awkward mix of A-profile-only and
M-profile-only pieces.  Refactor them into an A-profile-only and an
M-profile-only version, taking advantage of the fact that now the
only direct call to full_vfp_access_check() is in A-profile-only
code.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210618141019.10671-7-peter.maydell@linaro.org
---
 target/arm/translate-vfp.c | 79 +++++++++++++++++++++++---------------
 1 file changed, 48 insertions(+), 31 deletions(-)

diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c
+++ b/target/arm/translate-vfp.c
@@ -XXX,XX +XXX,XX @@ static void gen_update_fp_context(DisasContext *s)
 }
 
 /*
- * Check that VFP access is enabled. If it is, do the necessary
- * M-profile lazy-FP handling and then return true.
- * If not, emit code to generate an appropriate exception and
- * return false.
+ * Check that VFP access is enabled, A-profile specific version.
+ *
+ * If VFP is enabled, return true. If not, emit code to generate an
+ * appropriate exception and return false.
  * The ignore_vfp_enabled argument specifies that we should ignore
- * whether VFP is enabled via FPEXC[EN]: this should be true for FMXR/FMRX
+ * whether VFP is enabled via FPEXC.EN: this should be true for FMXR/FMRX
  * accesses to FPSID, FPEXC, MVFR0, MVFR1, MVFR2, and false for all other insns.
  */
-static bool full_vfp_access_check(DisasContext *s, bool ignore_vfp_enabled)
+static bool vfp_access_check_a(DisasContext *s, bool ignore_vfp_enabled)
 {
     if (s->fp_excp_el) {
-        if (arm_dc_feature(s, ARM_FEATURE_M)) {
-            /*
-             * M-profile mostly catches the "FPU disabled" case early, in
-             * disas_m_nocp(), but a few insns (eg LCTP, WLSTP, DLSTP)
-             * which do coprocessor-checks are outside the large ranges of
-             * the encoding space handled by the patterns in m-nocp.decode,
-             * and for them we may need to raise NOCP here.
-             */
-            gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
-                               syn_uncategorized(), s->fp_excp_el);
-        } else {
-            gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
-                               syn_fp_access_trap(1, 0xe, false),
-                               s->fp_excp_el);
-        }
+        gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
+                           syn_fp_access_trap(1, 0xe, false), s->fp_excp_el);
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool full_vfp_access_check(DisasContext *s, bool ignore_vfp_enabled)
         unallocated_encoding(s);
         return false;
     }
+    return true;
+}
 
-    if (arm_dc_feature(s, ARM_FEATURE_M)) {
-        /* Handle M-profile lazy FP state mechanics */
-
-        /* Trigger lazy-state preservation if necessary */
-        gen_preserve_fp_state(s);
-
-        /* Update ownership of FP context and create new FP context if needed */
-        gen_update_fp_context(s);
+/*
+ * Check that VFP access is enabled, M-profile specific version.
+ *
+ * If VFP is enabled, do the necessary M-profile lazy-FP handling and then
+ * return true. If not, emit code to generate an appropriate exception and
+ * return false.
+ */
+static bool vfp_access_check_m(DisasContext *s)
+{
+    if (s->fp_excp_el) {
+        /*
+         * M-profile mostly catches the "FPU disabled" case early, in
+         * disas_m_nocp(), but a few insns (eg LCTP, WLSTP, DLSTP)
+         * which do coprocessor-checks are outside the large ranges of
+         * the encoding space handled by the patterns in m-nocp.decode,
+         * and for them we may need to raise NOCP here.
+         */
+        gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
+                           syn_uncategorized(), s->fp_excp_el);
+        return false;
     }
 
+    /* Handle M-profile lazy FP state mechanics */
+
+    /* Trigger lazy-state preservation if necessary */
+    gen_preserve_fp_state(s);
+
+    /* Update ownership of FP context and create new FP context if needed */
+    gen_update_fp_context(s);
+
     return true;
 }
 
@@ -XXX,XX +XXX,XX @@ static bool full_vfp_access_check(DisasContext *s, bool ignore_vfp_enabled)
  */
 bool vfp_access_check(DisasContext *s)
 {
-    return full_vfp_access_check(s, false);
+    if (arm_dc_feature(s, ARM_FEATURE_M)) {
+        return vfp_access_check_m(s);
+    } else {
+        return vfp_access_check_a(s, false);
+    }
 }
 
 static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
@@ -XXX,XX +XXX,XX @@ static bool trans_VMSR_VMRS(DisasContext *s, arg_VMSR_VMRS *a)
         return false;
     }
 
-    if (!full_vfp_access_check(s, ignore_vfp_enabled)) {
+    /*
+     * Call vfp_access_check_a() directly, because we need to tell
+     * it to ignore FPEXC.EN for some register accesses.
+     */
+    if (!vfp_access_check_a(s, ignore_vfp_enabled)) {
         return true;
     }
 
-- 
2.20.1

Instead of open-coding the "take NOCP exception if FPU disabled,
otherwise call gen_preserve_fp_state()" code in the accessors for
FPCXT_NS, add an argument to vfp_access_check_m() which tells it to
skip the gen_update_fp_context() call, so we can use it for the
FPCXT_NS case.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210618141019.10671-8-peter.maydell@linaro.org
---
 target/arm/translate-a32.h    |  2 +-
 target/arm/translate-m-nocp.c | 10 ++--------
 target/arm/translate-vfp.c    | 13 ++++++++-----
 3 files changed, 11 insertions(+), 14 deletions(-)

Implement the forms of the MVE VLDR and VSTR insns which perform
non-widening loads of bytes, halfwords or words from memory into
vector elements of the same width (encodings T5, T6, T7).
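
As a rough illustration of what "non-widening" means here, the
following plain-C model (not QEMU code; the name
model_vldr_nonwidening() is hypothetical) copies esize bytes per
active lane from a byte buffer into a 16-byte image of a Q register.
It assumes little-endian memory and register layout, and that lane e
is predicated by mask bit (e * esize), as described for
mve_element_mask() in this patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Illustrative model (not QEMU code) of a non-widening MVE VLDR.
 * The Q register is a 16-byte image; lane e of size esize bytes sits
 * at byte offset e * esize and is loaded from the same offset in
 * memory. Lane e is written only if predicate bit (e * esize) is set.
 * Assumes little-endian memory and register layout.
 */
static void model_vldr_nonwidening(uint8_t qd[16], const uint8_t mem[16],
                                   unsigned esize, uint16_t mask)
{
    assert(esize == 1 || esize == 2 || esize == 4);
    for (unsigned b = 0; b < 16; b += esize) {
        if (mask & (1u << b)) {
            /* Copy the whole lane; inactive lanes keep old contents */
            memcpy(&qd[b], &mem[b], esize);
        }
    }
}
```

A 32-bit load tests only mask bits 0, 4, 8 and 12, so with esize 4
and mask 0x00ff lanes 0 and 1 are loaded and lanes 2 and 3 keep their
previous contents.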

(At the moment we know for MVE and M-profile in general that
vfp_access_check() can never return false, but we include the
conventional return-true-on-failure check for consistency
with non-M-profile translation code.)
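
The new mve_element_mask() below combines several sources of
predication. The sketch that follows (plain C, not QEMU code; the
names model_apply_eci(), model_apply_tail() and the MODEL_ECI_*
constants are hypothetical) isolates two of them, ECI beat skipping
and low-overhead-loop tail predication, so their effect on the 16-bit
mask can be checked in isolation. It deliberately omits the VPR.P0 and
VPT MASK01/MASK23 handling.

```c
#include <assert.h>
#include <stdint.h>

/* Same values as enum MVEECIState in this patch; 3 is reserved */
enum {
    MODEL_ECI_NONE = 0,
    MODEL_ECI_A0 = 1,
    MODEL_ECI_A0A1 = 2,
    MODEL_ECI_A0A1A2 = 4,
    MODEL_ECI_A0A1A2B0 = 5,
};

/*
 * Drop predicate bits for beats already completed according to ECI.
 * Each beat covers 4 of the 16 mask bits (one 32-bit slice of Q).
 * Beat B0 belongs to the next insn, so A0A1A2B0 masks like A0A1A2.
 */
static uint16_t model_apply_eci(uint16_t mask, int eci)
{
    switch (eci) {
    case MODEL_ECI_NONE:
        return mask;
    case MODEL_ECI_A0:
        return mask & 0xfff0;
    case MODEL_ECI_A0A1:
        return mask & 0xff00;
    case MODEL_ECI_A0A1A2:
    case MODEL_ECI_A0A1A2B0:
        return mask & 0xf000;
    default:
        assert(!"reserved ECI value");
        return 0;
    }
}

/*
 * Tail predication: on the last loop iteration (loopcount no larger
 * than the number of elements per vector), keep only the low
 * (loopcount << ltpsize) bits. ltpsize is log2 of the element size
 * in bytes; a value of 4 means tail predication is not active.
 */
static uint16_t model_apply_tail(uint16_t mask, unsigned ltpsize,
                                 uint32_t loopcount)
{
    if (ltpsize < 4 && loopcount <= (1u << (4 - ltpsize))) {
        unsigned masklen = loopcount << ltpsize;
        mask &= (uint16_t)((1u << masklen) - 1);
    }
    return mask;
}
```

For example, a 32-bit-element loop (ltpsize 2) with 3 elements left
keeps 12 mask bits (0x0fff), while a loop count of 20 for byte
elements is not the last iteration and leaves the mask unchanged.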

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-2-peter.maydell@linaro.org
---
 target/arm/{translate-mve.c => helper-mve.h} |  19 +-
 target/arm/helper.h                          |   2 +
 target/arm/internals.h                       |  11 ++
 target/arm/mve.decode                        |  22 +++
 target/arm/mve_helper.c                      | 172 +++++++++++++++++++
 target/arm/translate-mve.c                   | 119 +++++++++++++
 target/arm/meson.build                       |   1 +
 7 files changed, 334 insertions(+), 12 deletions(-)
 copy target/arm/{translate-mve.c => helper-mve.h} (61%)
 create mode 100644 target/arm/mve_helper.c

diff --git a/target/arm/translate-mve.c b/target/arm/helper-mve.h
similarity index 61%
copy from target/arm/translate-mve.c
copy to target/arm/helper-mve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/helper-mve.h
@@ -XXX,XX +XXX,XX @@
 /*
- *  ARM translation: M-profile MVE instructions
+ *  M-profile MVE specific helper definitions
  *
  *  Copyright (c) 2021 Linaro, Ltd.
  *
@@ -XXX,XX +XXX,XX @@
  * You should have received a copy of the GNU Lesser General Public
  * License along with this library; if not, see <http://www.gnu.org/licenses/>.
  */
-
-#include "qemu/osdep.h"
-#include "tcg/tcg-op.h"
-#include "tcg/tcg-op-gvec.h"
-#include "exec/exec-all.h"
-#include "exec/gen-icount.h"
-#include "translate.h"
-#include "translate-a32.h"
-
-/* Include the generated decoder */
-#include "decode-mve.c.inc"
+DEF_HELPER_FLAGS_3(mve_vldrb, TCG_CALL_NO_WG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vldrh, TCG_CALL_NO_WG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vldrw, TCG_CALL_NO_WG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vstrb, TCG_CALL_NO_WG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vstrh, TCG_CALL_NO_WG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vstrw, TCG_CALL_NO_WG, void, env, ptr, i32)
diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(gvec_bfmlal_idx, TCG_CALL_NO_RWG,
 #include "helper-a64.h"
 #include "helper-sve.h"
 #endif
+
+#include "helper-mve.h"
diff --git a/target/arm/internals.h b/target/arm/internals.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ static inline uint64_t useronly_maybe_clean_ptr(uint32_t desc, uint64_t ptr)
     return ptr;
 }
 
+/* Values for M-profile PSR.ECI for MVE insns */
+enum MVEECIState {
+    ECI_NONE = 0, /* No completed beats */
+    ECI_A0 = 1, /* Completed: A0 */
+    ECI_A0A1 = 2, /* Completed: A0, A1 */
+    /* 3 is reserved */
+    ECI_A0A1A2 = 4, /* Completed: A0, A1, A2 */
+    ECI_A0A1A2B0 = 5, /* Completed: A0, A1, A2, B0 */
+    /* All other values reserved */
+};
+
 #endif
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@
 #
 # This file is processed by scripts/decodetree.py
 #
+
+%qd 22:1 13:3
+
+&vldr_vstr rn qd imm p a w size l
+
+@vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd
+
+# Vector loads and stores
+
+# Non-widening loads/stores (P=0 W=0 is 'related encoding')
+VLDR_VSTR        1110110 0 a:1 . 1   . .... ... 111100 .......   @vldr_vstr \
+                 size=0 p=0 w=1
+VLDR_VSTR        1110110 0 a:1 . 1   . .... ... 111101 .......   @vldr_vstr \
+                 size=1 p=0 w=1
+VLDR_VSTR        1110110 0 a:1 . 1   . .... ... 111110 .......   @vldr_vstr \
+                 size=2 p=0 w=1
+VLDR_VSTR        1110110 1 a:1 . w:1 . .... ... 111100 .......   @vldr_vstr \
+                 size=0 p=1
+VLDR_VSTR        1110110 1 a:1 . w:1 . .... ... 111101 .......   @vldr_vstr \
+                 size=1 p=1
+VLDR_VSTR        1110110 1 a:1 . w:1 . .... ... 111110 .......   @vldr_vstr \
+                 size=2 p=1
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * M-profile MVE Operations
+ *
+ * Copyright (c) 2021 Linaro, Ltd.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "internals.h"
+#include "vec_internal.h"
+#include "exec/helper-proto.h"
+#include "exec/cpu_ldst.h"
+#include "exec/exec-all.h"
+
+static uint16_t mve_element_mask(CPUARMState *env)
+{
+    /*
+     * Return the mask of which elements in the MVE vector should be
+     * updated. This is a combination of multiple things:
+     *  (1) by default, we update every lane in the vector
+     *  (2) VPT predication stores its state in the VPR register;
+     *  (3) low-overhead-branch tail predication will mask out part
+     *      of the vector on the final iteration of the loop
+     *  (4) if EPSR.ECI is set then we must execute only some beats
+     *      of the insn
+     * We combine all these into a 16-bit result with the same semantics
+     * as VPR.P0: 0 to mask the lane, 1 if it is active.
+     * 8-bit vector ops will look at all bits of the result;
+     * 16-bit ops will look at bits 0, 2, 4, ...;
+     * 32-bit ops will look at bits 0, 4, 8 and 12.
+     * Compare pseudocode GetCurInstrBeat(), though that only returns
+     * the 4-bit slice of the mask corresponding to a single beat.
+     */
+    uint16_t mask = FIELD_EX32(env->v7m.vpr, V7M_VPR, P0);
+
+    if (!(env->v7m.vpr & R_V7M_VPR_MASK01_MASK)) {
+        mask |= 0xff;
+    }
+    if (!(env->v7m.vpr & R_V7M_VPR_MASK23_MASK)) {
+        mask |= 0xff00;
+    }
+
+    if (env->v7m.ltpsize < 4 &&
+        env->regs[14] <= (1 << (4 - env->v7m.ltpsize))) {
+        /*
+         * Tail predication active, and this is the last loop iteration.
+         * The element size is (1 << ltpsize), and we only want to process
+         * loopcount elements, so we want to retain the least significant
+         * (loopcount * esize) predicate bits and zero out bits above that.
+         */
+        int masklen = env->regs[14] << env->v7m.ltpsize;
+        assert(masklen <= 16);
+        mask &= MAKE_64BIT_MASK(0, masklen);
+    }
+
+    if ((env->condexec_bits & 0xf) == 0) {
+        /*
+         * ECI bits indicate which beats are already executed;
+         * we handle this by effectively predicating them out.
+         */
+        int eci = env->condexec_bits >> 4;
+        switch (eci) {
+        case ECI_NONE:
+            break;
+        case ECI_A0:
+            mask &= 0xfff0;
+            break;
+        case ECI_A0A1:
+            mask &= 0xff00;
+            break;
+        case ECI_A0A1A2:
+        case ECI_A0A1A2B0:
+            mask &= 0xf000;
+            break;
+        default:
+            g_assert_not_reached();
+        }
+    }
+
+    return mask;
+}
+
+static void mve_advance_vpt(CPUARMState *env)
+{
+    /* Advance the VPT and ECI state if necessary */
+    uint32_t vpr = env->v7m.vpr;
+    unsigned mask01, mask23;
+
+    if ((env->condexec_bits & 0xf) == 0) {
+        env->condexec_bits = (env->condexec_bits == (ECI_A0A1A2B0 << 4)) ?
+            (ECI_A0 << 4) : (ECI_NONE << 4);
+    }
+
+    if (!(vpr & (R_V7M_VPR_MASK01_MASK | R_V7M_VPR_MASK23_MASK))) {
+        /* VPT not enabled, nothing to do */
+        return;
+    }
+
+    mask01 = FIELD_EX32(vpr, V7M_VPR, MASK01);
+    mask23 = FIELD_EX32(vpr, V7M_VPR, MASK23);
+    if (mask01 > 8) {
+        /* high bit set, but not 0b1000: invert the relevant half of P0 */
+        vpr ^= 0xff;
+    }
+    if (mask23 > 8) {
+        /* high bit set, but not 0b1000: invert the relevant half of P0 */
+        vpr ^= 0xff00;
+    }
+    vpr = FIELD_DP32(vpr, V7M_VPR, MASK01, mask01 << 1);
+    vpr = FIELD_DP32(vpr, V7M_VPR, MASK23, mask23 << 1);
+    env->v7m.vpr = vpr;
+}
+
+
+#define DO_VLDR(OP, MSIZE, LDTYPE, ESIZE, TYPE)                         \
+    void HELPER(mve_##OP)(CPUARMState *env, void *vd, uint32_t addr)    \
+    {                                                                   \
+        TYPE *d = vd;                                                   \
+        uint16_t mask = mve_element_mask(env);                          \
+        unsigned b, e;                                                  \
+        /*                                                              \
+         * R_SXTM allows the dest reg to become UNKNOWN for abandoned   \
+         * beats so we don't care if we update part of the dest and     \
+         * then take an exception.                                      \
+         */                                                             \
+        for (b = 0, e = 0; b < 16; b += ESIZE, e++) {                   \
+            if (mask & (1 << b)) {                                      \
+                d[H##ESIZE(e)] = cpu_##LDTYPE##_data_ra(env, addr, GETPC()); \
+            }                                                           \
+            addr += MSIZE;                                              \
+        }                                                               \
+        mve_advance_vpt(env);                                           \
+    }
+
+#define DO_VSTR(OP, MSIZE, STTYPE, ESIZE, TYPE)                         \
+    void HELPER(mve_##OP)(CPUARMState *env, void *vd, uint32_t addr)    \
+    {                                                                   \
+        TYPE *d = vd;                                                   \
+        uint16_t mask = mve_element_mask(env);                          \
+        unsigned b, e;                                                  \
+        for (b = 0, e = 0; b < 16; b += ESIZE, e++) {                   \
+            if (mask & (1 << b)) {                                      \
+                cpu_##STTYPE##_data_ra(env, addr, d[H##ESIZE(e)], GETPC()); \
+            }                                                           \
+            addr += MSIZE;                                              \
+        }                                                               \
+        mve_advance_vpt(env);                                           \
+    }
+
+DO_VLDR(vldrb, 1, ldub, 1, uint8_t)
+DO_VLDR(vldrh, 2, lduw, 2, uint16_t)
+DO_VLDR(vldrw, 4, ldl, 4, uint32_t)
+
+DO_VSTR(vstrb, 1, stb, 1, uint8_t)
+DO_VSTR(vstrh, 2, stw, 2, uint16_t)
+DO_VSTR(vstrw, 4, stl, 4, uint32_t)
+
+#undef DO_VLDR
+#undef DO_VSTR
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@
 
 /* Include the generated decoder */
 #include "decode-mve.c.inc"
+
+typedef void MVEGenLdStFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
+
+/* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
+static inline long mve_qreg_offset(unsigned reg)
+{
+    return offsetof(CPUARMState, vfp.zregs[reg].d[0]);
+}
+
+static TCGv_ptr mve_qreg_ptr(unsigned reg)
+{
+    TCGv_ptr ret = tcg_temp_new_ptr();
+    tcg_gen_addi_ptr(ret, cpu_env, mve_qreg_offset(reg));
+    return ret;
+}
+
+static bool mve_check_qreg_bank(DisasContext *s, int qmask)
+{
+    /*
+     * Check whether Qregs are in range. For v8.1M only Q0..Q7
+     * are supported, see VFPSmallRegisterBank().
+     */
+    return qmask < 8;
+}
+
+static bool mve_eci_check(DisasContext *s)
+{
+    /*
+     * This is a beatwise insn: check that ECI is valid (not a
+     * reserved value) and note that we are handling it.
+     * Return true if OK, false if we generated an exception.
+     */
+    s->eci_handled = true;
+    switch (s->eci) {
+    case ECI_NONE:
+    case ECI_A0:
+    case ECI_A0A1:
+    case ECI_A0A1A2:
+    case ECI_A0A1A2B0:
+        return true;
+    default:
+        /* Reserved value: INVSTATE UsageFault */
+        gen_exception_insn(s, s->pc_curr, EXCP_INVSTATE, syn_uncategorized(),
+                           default_exception_el(s));
+        return false;
+    }
+}
+
+static void mve_update_eci(DisasContext *s)
+{
+    /*
+     * The helper function will always update the CPUState field,
+     * so we only need to update the DisasContext field.
+     */
+    if (s->eci) {
+        s->eci = (s->eci == ECI_A0A1A2B0) ? ECI_A0 : ECI_NONE;
+    }
+}
+
+static bool do_ldst(DisasContext *s, arg_VLDR_VSTR *a, MVEGenLdStFn *fn)
+{
+    TCGv_i32 addr;
+    uint32_t offset;
+    TCGv_ptr qreg;
+
+    if (!dc_isar_feature(aa32_mve, s) ||
+        !mve_check_qreg_bank(s, a->qd) ||
+        !fn) {
+        return false;
+    }
+
+    /* CONSTRAINED UNPREDICTABLE: we choose to UNDEF */
+    if (a->rn == 15 || (a->rn == 13 && a->w)) {
+        return false;
+    }
+
+    if (!mve_eci_check(s) || !vfp_access_check(s)) {
+        return true;
+    }
+
+    offset = a->imm << a->size;
+    if (!a->a) {
+        offset = -offset;
+    }
+    addr = load_reg(s, a->rn);
+    if (a->p) {
+        tcg_gen_addi_i32(addr, addr, offset);
+    }
+
+    qreg = mve_qreg_ptr(a->qd);
+    fn(cpu_env, qreg, addr);
+    tcg_temp_free_ptr(qreg);
+
+    /*
+     * Writeback always happens after the last beat of the insn,
+     * regardless of predication
+     */
+    if (a->w) {
+        if (!a->p) {
+            tcg_gen_addi_i32(addr, addr, offset);
+        }
+        store_reg(s, a->rn, addr);
+    } else {
+        tcg_temp_free_i32(addr);
+    }
+    mve_update_eci(s);
+    return true;
+}
+
+static bool trans_VLDR_VSTR(DisasContext *s, arg_VLDR_VSTR *a)
+{
+    static MVEGenLdStFn * const ldstfns[4][2] = {
+        { gen_helper_mve_vstrb, gen_helper_mve_vldrb },
+        { gen_helper_mve_vstrh, gen_helper_mve_vldrh },
+        { gen_helper_mve_vstrw, gen_helper_mve_vldrw },
+        { NULL, NULL }
+    };
+    return do_ldst(s, a, ldstfns[a->size][a->l]);
+}
diff --git a/target/arm/meson.build b/target/arm/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/meson.build
+++ b/target/arm/meson.build
@@ -XXX,XX +XXX,XX @@ arm_ss.add(files(
   'helper.c',
   'iwmmxt_helper.c',
   'm_helper.c',
+  'mve_helper.c',
   'neon_helper.c',
   'op_helper.c',
   'tlb_helper.c',
-- 
2.20.1

Implement the variants of MVE VLDR (encodings T1, T2) which perform
"widening" loads where bytes or halfwords are loaded from memory and
zero or sign-extended into halfword or word length vector elements,
and the narrowing MVE VSTR (encodings T1, T2) where bytes or
halfwords are stored from halfword or word elements.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-3-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    | 10 ++++++++++
 target/arm/mve.decode      | 25 +++++++++++++++++++++++--
 target/arm/mve_helper.c    | 11 +++++++++++
 target/arm/translate-mve.c | 14 ++++++++++++++
 4 files changed, 58 insertions(+), 2 deletions(-)
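A minimal plain-C sketch of the element semantics described above. The function names are hypothetical and this is not the QEMU helper code. Predication, beat-wise execution and host byte order are ignored. A widening byte-to-halfword load zero- or sign-extends each of eight loaded bytes, and the matching narrowing store keeps only the low byte of each halfword element.

```c
#include <stdint.h>

/* Hypothetical sketch of VLDRB.U16: eight bytes fill eight 16-bit
 * elements, each zero-extended. */
static void widen_load_u8_to_u16(uint16_t d[8], const uint8_t *mem)
{
    for (int e = 0; e < 8; e++) {
        d[e] = mem[e];              /* zero-extend */
    }
}

/* Hypothetical sketch of VLDRB.S16: as above, but sign-extended. */
static void widen_load_s8_to_s16(int16_t d[8], const uint8_t *mem)
{
    for (int e = 0; e < 8; e++) {
        d[e] = (int8_t)mem[e];      /* sign-extend */
    }
}

/* Hypothetical sketch of the narrowing VSTRB.16: only the low byte of
 * each 16-bit element is written to memory. */
static void narrow_store_u16_to_u8(uint8_t *mem, const uint16_t d[8])
{
    for (int e = 0; e < 8; e++) {
        mem[e] = (uint8_t)d[e];
    }
}
```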

Implement the MVE VCLZ insn (and the necessary machinery
for MVE 1-input vector ops).

Note that for non-load instructions predication is always performed
at byte-level granularity regardless of element size (R_ZLSJ),
and so the masking logic here differs from that used in the VLDR
and VSTR helpers.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-4-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    |  4 ++
 target/arm/mve.decode      |  8 ++++
 target/arm/mve_helper.c    | 82 ++++++++++++++++++++++++++++++++++++++
 target/arm/translate-mve.c | 38 ++++++++++++++++++
 4 files changed, 132 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vldrh_uw, TCG_CALL_NO_WG, void, env, ptr, i32)
 DEF_HELPER_FLAGS_3(mve_vstrb_h, TCG_CALL_NO_WG, void, env, ptr, i32)
 DEF_HELPER_FLAGS_3(mve_vstrb_w, TCG_CALL_NO_WG, void, env, ptr, i32)
 DEF_HELPER_FLAGS_3(mve_vstrh_w, TCG_CALL_NO_WG, void, env, ptr, i32)
+
+DEF_HELPER_FLAGS_3(mve_vclzb, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vclzh, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vclzw, TCG_CALL_NO_WG, void, env, ptr, ptr)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@
 #
 
 %qd 22:1 13:3
+%qm 5:1 1:3
 
 &vldr_vstr rn qd imm p a w size l u
+&1op qd qm size
 
 @vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
 # Note that both Rn and Qd are 3 bits only (no D bit)
 @vldst_wn ... u:1 ... . . . . l:1 . rn:3 qd:3 . ... .. imm:7 &vldr_vstr
 
+@1op .... .... .... size:2 .. .... .... .... .... &1op qd=%qd qm=%qm
+
 # Vector loads and stores
 
 # Widening loads and narrowing stores:
@@ -XXX,XX +XXX,XX @@ VLDR_VSTR        1110110 1 a:1 . w:1 . .... ... 111101 .......   @vldr_vstr \
                  size=1 p=1
 VLDR_VSTR        1110110 1 a:1 . w:1 . .... ... 111110 .......   @vldr_vstr \
                  size=2 p=1
+
+# Vector miscellaneous
+
+VCLZ             1111 1111 1 . 11 .. 00 ... 0 0100 11 . 0 ... 0 @1op
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, 2, stw, 4, int32_t)
 
 #undef DO_VLDR
 #undef DO_VSTR
+
+/*
+ * The mergemask(D, R, M) macro performs the operation "*D = R" but
+ * storing only the bytes which correspond to 1 bits in M,
+ * leaving other bytes in *D unchanged. We use _Generic
+ * to select the correct implementation based on the type of D.
+ */
+
+static void mergemask_ub(uint8_t *d, uint8_t r, uint16_t mask)
+{
+    if (mask & 1) {
+        *d = r;
+    }
+}
+
+static void mergemask_sb(int8_t *d, int8_t r, uint16_t mask)
+{
+    mergemask_ub((uint8_t *)d, r, mask);
+}
+
+static void mergemask_uh(uint16_t *d, uint16_t r, uint16_t mask)
+{
+    uint16_t bmask = expand_pred_b_data[mask & 3];
+    *d = (*d & ~bmask) | (r & bmask);
+}
+
+static void mergemask_sh(int16_t *d, int16_t r, uint16_t mask)
+{
+    mergemask_uh((uint16_t *)d, r, mask);
+}
+
+static void mergemask_uw(uint32_t *d, uint32_t r, uint16_t mask)
+{
+    uint32_t bmask = expand_pred_b_data[mask & 0xf];
+    *d = (*d & ~bmask) | (r & bmask);
+}
+
+static void mergemask_sw(int32_t *d, int32_t r, uint16_t mask)
+{
+    mergemask_uw((uint32_t *)d, r, mask);
+}
+
+static void mergemask_uq(uint64_t *d, uint64_t r, uint16_t mask)
+{
+    uint64_t bmask = expand_pred_b_data[mask & 0xff];
+    *d = (*d & ~bmask) | (r & bmask);
+}
+
+static void mergemask_sq(int64_t *d, int64_t r, uint16_t mask)
+{
+    mergemask_uq((uint64_t *)d, r, mask);
+}
+
+#define mergemask(D, R, M)                      \
+    _Generic(D,                                 \
+             uint8_t *: mergemask_ub,           \
+             int8_t *:  mergemask_sb,           \
+             uint16_t *: mergemask_uh,          \
+             int16_t *:  mergemask_sh,          \
+             uint32_t *: mergemask_uw,          \
+             int32_t *:  mergemask_sw,          \
+             uint64_t *: mergemask_uq,          \
+             int64_t *:  mergemask_sq)(D, R, M)
+
+#define DO_1OP(OP, ESIZE, TYPE, FN)                                     \
+    void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm)         \
+    {                                                                   \
+        TYPE *d = vd, *m = vm;                                          \
+        uint16_t mask = mve_element_mask(env);                          \
+        unsigned e;                                                     \
+        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
+            mergemask(&d[H##ESIZE(e)], FN(m[H##ESIZE(e)]), mask);       \
+        }                                                               \
+        mve_advance_vpt(env);                                           \
+    }
+
+#define DO_CLZ_B(N)   (clz32(N) - 24)
+#define DO_CLZ_H(N)   (clz32(N) - 16)
+
+DO_1OP(vclzb, 1, uint8_t, DO_CLZ_B)
+DO_1OP(vclzh, 2, uint16_t, DO_CLZ_H)
+DO_1OP(vclzw, 4, uint32_t, clz32)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@
 #include "decode-mve.c.inc"
 
 typedef void MVEGenLdStFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
+typedef void MVEGenOneOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
 
 /* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
 static inline long mve_qreg_offset(unsigned reg)
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR(DisasContext *s, arg_VLDR_VSTR *a)
 DO_VLDST_WIDE_NARROW(VLDSTB_H, vldrb_sh, vldrb_uh, vstrb_h)
 DO_VLDST_WIDE_NARROW(VLDSTB_W, vldrb_sw, vldrb_uw, vstrb_w)
 DO_VLDST_WIDE_NARROW(VLDSTH_W, vldrh_sw, vldrh_uw, vstrh_w)
+
+static bool do_1op(DisasContext *s, arg_1op *a, MVEGenOneOpFn fn)
+{
+    TCGv_ptr qd, qm;
+
+    if (!dc_isar_feature(aa32_mve, s) ||
+        !mve_check_qreg_bank(s, a->qd | a->qm) ||
+        !fn) {
+        return false;
+    }
+
+    if (!mve_eci_check(s) || !vfp_access_check(s)) {
+        return true;
+    }
+
+    qd = mve_qreg_ptr(a->qd);
+    qm = mve_qreg_ptr(a->qm);
+    fn(cpu_env, qd, qm);
+    tcg_temp_free_ptr(qd);
+    tcg_temp_free_ptr(qm);
+    mve_update_eci(s);
+    return true;
+}
+
+#define DO_1OP(INSN, FN)                                        \
+    static bool trans_##INSN(DisasContext *s, arg_1op *a)       \
+    {                                                           \
+        static MVEGenOneOpFn * const fns[] = {                  \
+            gen_helper_mve_##FN##b,                             \
+            gen_helper_mve_##FN##h,                             \
+            gen_helper_mve_##FN##w,                             \
+            NULL,                                               \
+        };                                                      \
+        return do_1op(s, a, fns[a->size]);                      \
+    }
+
+DO_1OP(VCLZ, vclz)
-- 
2.20.1
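To illustrate the byte-granular predication used by mergemask() and DO_1OP above, here is a standalone sketch for 16-bit elements. The names are hypothetical, the two-bit byte mask is computed inline rather than via QEMU's expand_pred_b_data[] table, and host byte order is ignored. The predicate mask carries one bit per byte, so it is shifted by the element size after each element, and a partially predicated element has only its active bytes replaced.

```c
#include <stdint.h>

/* Count leading zeros of a 16-bit value; returns 16 for zero. */
static uint16_t clz16(uint16_t v)
{
    uint16_t n = 0;
    for (uint16_t bit = 0x8000; bit && !(v & bit); bit >>= 1) {
        n++;
    }
    return n;
}

/* Merge a 16-bit result r into *d, writing only the bytes whose
 * predicate bits (bits 0 and 1 of mask) are set. */
static void merge_u16(uint16_t *d, uint16_t r, uint16_t mask)
{
    uint16_t bmask = (uint16_t)(((mask & 1) ? 0x00ff : 0) |
                                ((mask & 2) ? 0xff00 : 0));
    *d = (uint16_t)((*d & ~bmask) | (r & bmask));
}

/* Hypothetical model of VCLZ.I16 over one 128-bit vector. */
static void vclz16(uint16_t d[8], const uint16_t m[8], uint16_t mask)
{
    for (unsigned e = 0; e < 8; e++, mask >>= 2) {
        merge_u16(&d[e], clz16(m[e]), mask);
    }
}
```

With a mask of 0x0001, only the low byte of element 0 is updated, which is exactly the case that element-granular masking would get wrong.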

Implement the MVE VCLS insn.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-5-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    | 4 ++++
 target/arm/mve.decode      | 1 +
 target/arm/mve_helper.c    | 7 +++++++
 target/arm/translate-mve.c | 1 +
 4 files changed, 13 insertions(+)
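VCLS counts the leading sign bits of each element, that is, how many bits after the sign bit are copies of it. A hypothetical plain-C sketch for one signed byte lane follows. Complementing negative values turns the run of sign bits into a run of zeros, so CLS(x) equals the leading-zero count of x (or ~x) minus one.

```c
#include <stdint.h>

/* Count leading zeros of an 8-bit value; returns 8 for zero. */
static int clz8(uint8_t v)
{
    int n = 0;
    for (int bit = 7; bit >= 0 && !(v & (1u << bit)); bit--) {
        n++;
    }
    return n;
}

/* Hypothetical model of VCLS.S8 on one lane. */
static int cls8(int8_t x)
{
    uint8_t v = (uint8_t)(x < 0 ? ~x : x);
    return clz8(v) - 1;
}
```

For example, 0 and -1 both give 7 (all bits after the sign bit match it), while 127 and -128 give 0.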

Implement the MVE instructions VREV16, VREV32 and VREV64.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-6-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    |  7 +++++++
 target/arm/mve.decode      |  4 ++++
 target/arm/mve_helper.c    |  7 +++++++
 target/arm/translate-mve.c | 33 +++++++++++++++++++++++++++++++++
 4 files changed, 51 insertions(+)
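The three VREV forms differ only in the container size inside which elements are reversed. VREV16 uses 2-byte containers, VREV32 uses 4-byte containers and VREV64 uses 8-byte containers. A hypothetical sketch over a 16-byte vector, assuming a little-endian byte layout and ignoring predication:

```c
#include <stdint.h>
#include <string.h>

/* Reverse the order of esize-byte elements within every csize-byte
 * container of a 16-byte vector.  esize must divide csize.  A temporary
 * buffer is used so that d and m may be the same vector. */
static void vrev(uint8_t d[16], const uint8_t m[16],
                 unsigned esize, unsigned csize)
{
    uint8_t tmp[16];
    unsigned per = csize / esize;       /* elements per container */

    for (unsigned c = 0; c < 16; c += csize) {
        for (unsigned i = 0; i < per; i++) {
            memcpy(&tmp[c + i * esize],
                   &m[c + (per - 1 - i) * esize], esize);
        }
    }
    memcpy(d, tmp, 16);
}
```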

Implement the MVE VMVN (register) operation.  Note that this
operation is predicated byte-by-byte.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-7-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    | 2 ++
 target/arm/mve.decode      | 3 +++
 target/arm/mve_helper.c    | 4 ++++
 target/arm/translate-mve.c | 5 +++++
 4 files changed, 14 insertions(+)

Implement the MVE VABS functions (both integer and floating point).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-8-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    |  6 ++++++
 target/arm/mve.decode      |  3 +++
 target/arm/mve_helper.c    | 13 +++++++++++++
 target/arm/translate-mve.c | 15 +++++++++++++++
 4 files changed, 37 insertions(+)
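A hypothetical per-lane sketch of the two VABS flavours. The integer form is non-saturating (saturation is the job of VQABS), so the most negative value wraps back to itself. The floating-point form only needs to clear the sign bit of the raw encoding, which also leaves NaN payloads intact.

```c
#include <stdint.h>

/* Integer VABS.S8 on one lane.  The negation is done in int to avoid
 * signed overflow; truncating back to int8_t maps 128 to -128. */
static int8_t vabs_s8(int8_t x)
{
    return (int8_t)(x < 0 ? -(int)x : x);
}

/* Floating-point VABS.F32 on one lane's raw bits: clear the sign bit. */
static uint32_t vabs_f32_bits(uint32_t bits)
{
    return bits & 0x7fffffffu;
}
```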

Implement the MVE VNEG insn (both integer and floating point forms).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-9-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    |  6 ++++++
 target/arm/mve.decode      |  2 ++
 target/arm/mve_helper.c    | 12 ++++++++++++
 target/arm/translate-mve.c | 15 +++++++++++++++
 4 files changed, 35 insertions(+)

The Arm MVE VDUP implementation would like to be able to emit code to
duplicate a byte or halfword value into an i32.  We have code to do
this already in tcg-op-gvec.c, so all we need to do is make the
functions global.

For consistency with other functions made available to the frontends:
 * we rename to tcg_gen_dup_*
 * we expose both the _i32 and _i64 forms
 * we provide the #define for a _tl form

Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20210617121628.20116-10-peter.maydell@linaro.org
---
 include/tcg/tcg-op.h |  8 ++++++++
 include/tcg/tcg.h    |  1 -
 tcg/tcg-op-gvec.c    | 20 ++++++++++----------
 3 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/include/tcg/tcg-op.h b/include/tcg/tcg-op.h
index XXXXXXX..XXXXXXX 100644
--- a/include/tcg/tcg-op.h
+++ b/include/tcg/tcg-op.h
@@ -XXX,XX +XXX,XX @@ void tcg_gen_umin_i32(TCGv_i32, TCGv_i32 arg1, TCGv_i32 arg2);
 void tcg_gen_umax_i32(TCGv_i32, TCGv_i32 arg1, TCGv_i32 arg2);
 void tcg_gen_abs_i32(TCGv_i32, TCGv_i32);
 
+/* Replicate a value of size @vece from @in to all the lanes in @out */
+void tcg_gen_dup_i32(unsigned vece, TCGv_i32 out, TCGv_i32 in);
+
 static inline void tcg_gen_discard_i32(TCGv_i32 arg)
 {
     tcg_gen_op1_i32(INDEX_op_discard, arg);
@@ -XXX,XX +XXX,XX @@ void tcg_gen_umin_i64(TCGv_i64, TCGv_i64 arg1, TCGv_i64 arg2);
 void tcg_gen_umax_i64(TCGv_i64, TCGv_i64 arg1, TCGv_i64 arg2);
 void tcg_gen_abs_i64(TCGv_i64, TCGv_i64);
 
+/* Replicate a value of size @vece from @in to all the lanes in @out */
+void tcg_gen_dup_i64(unsigned vece, TCGv_i64 out, TCGv_i64 in);
+
 #if TCG_TARGET_REG_BITS == 64
 static inline void tcg_gen_discard_i64(TCGv_i64 arg)
 {
@@ -XXX,XX +XXX,XX @@ void tcg_gen_stl_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset, TCGType t);
 #define tcg_gen_atomic_smax_fetch_tl tcg_gen_atomic_smax_fetch_i64
 #define tcg_gen_atomic_umax_fetch_tl tcg_gen_atomic_umax_fetch_i64
 #define tcg_gen_dup_tl_vec  tcg_gen_dup_i64_vec
+#define tcg_gen_dup_tl tcg_gen_dup_i64
 #else
 #define tcg_gen_movi_tl tcg_gen_movi_i32
 #define tcg_gen_mov_tl tcg_gen_mov_i32
@@ -XXX,XX +XXX,XX @@ void tcg_gen_stl_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset, TCGType t);
 #define tcg_gen_atomic_smax_fetch_tl tcg_gen_atomic_smax_fetch_i32
 #define tcg_gen_atomic_umax_fetch_tl tcg_gen_atomic_umax_fetch_i32
 #define tcg_gen_dup_tl_vec  tcg_gen_dup_i32_vec
+#define tcg_gen_dup_tl tcg_gen_dup_i32
 #endif
 
 #if UINTPTR_MAX == UINT32_MAX
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index XXXXXXX..XXXXXXX 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -XXX,XX +XXX,XX @@ uint64_t dup_const(unsigned vece, uint64_t c);
         : (qemu_build_not_reached_always(), 0))                    \
      : dup_const(VECE, C))
 
-
 /*
  * Memory helpers that will be used by TCG generated code.
  */
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -XXX,XX +XXX,XX @@ uint64_t (dup_const)(unsigned vece, uint64_t c)
 }
 
 /* Duplicate IN into OUT as per VECE.  */
-static void gen_dup_i32(unsigned vece, TCGv_i32 out, TCGv_i32 in)
+void tcg_gen_dup_i32(unsigned vece, TCGv_i32 out, TCGv_i32 in)
 {
     switch (vece) {
     case MO_8:
@@ -XXX,XX +XXX,XX @@ static void gen_dup_i32(unsigned vece, TCGv_i32 out, TCGv_i32 in)
     }
 }
 
-static void gen_dup_i64(unsigned vece, TCGv_i64 out, TCGv_i64 in)
+void tcg_gen_dup_i64(unsigned vece, TCGv_i64 out, TCGv_i64 in)
 {
     switch (vece) {
     case MO_8:
@@ -XXX,XX +XXX,XX @@ static void do_dup(unsigned vece, uint32_t dofs, uint32_t oprsz,
                 && (vece != MO_32 || !check_size_impl(oprsz, 4))) {
                 t_64 = tcg_temp_new_i64();
                 tcg_gen_extu_i32_i64(t_64, in_32);
-                gen_dup_i64(vece, t_64, t_64);
+                tcg_gen_dup_i64(vece, t_64, t_64);
             } else {
                 t_32 = tcg_temp_new_i32();
-                gen_dup_i32(vece, t_32, in_32);
+                tcg_gen_dup_i32(vece, t_32, in_32);
             }
         } else if (in_64) {
             /* We are given a 64-bit variable input.  */
             t_64 = tcg_temp_new_i64();
-            gen_dup_i64(vece, t_64, in_64);
+            tcg_gen_dup_i64(vece, t_64, in_64);
         } else {
             /* We are given a constant input.  */
             /* For 64-bit hosts, use 64-bit constants for "simple" constants
@@ -XXX,XX +XXX,XX @@ void tcg_gen_gvec_2s(uint32_t dofs, uint32_t aofs, uint32_t oprsz,
     } else if (g->fni8 && check_size_impl(oprsz, 8)) {
         TCGv_i64 t64 = tcg_temp_new_i64();
 
-        gen_dup_i64(g->vece, t64, c);
+        tcg_gen_dup_i64(g->vece, t64, c);
         expand_2s_i64(dofs, aofs, oprsz, t64, g->scalar_first, g->fni8);
         tcg_temp_free_i64(t64);
     } else if (g->fni4 && check_size_impl(oprsz, 4)) {
         TCGv_i32 t32 = tcg_temp_new_i32();
 
         tcg_gen_extrl_i64_i32(t32, c);
-        gen_dup_i32(g->vece, t32, t32);
+        tcg_gen_dup_i32(g->vece, t32, t32);
         expand_2s_i32(dofs, aofs, oprsz, t32, g->scalar_first, g->fni4);
         tcg_temp_free_i32(t32);
     } else {
@@ -XXX,XX +XXX,XX @@ void tcg_gen_gvec_ands(unsigned vece, uint32_t dofs, uint32_t aofs,
                        TCGv_i64 c, uint32_t oprsz, uint32_t maxsz)
 {
     TCGv_i64 tmp = tcg_temp_new_i64();
-    gen_dup_i64(vece, tmp, c);
+    tcg_gen_dup_i64(vece, tmp, c);
     tcg_gen_gvec_2s(dofs, aofs, oprsz, maxsz, tmp, &gop_ands);
     tcg_temp_free_i64(tmp);
 }
@@ -XXX,XX +XXX,XX @@ void tcg_gen_gvec_xors(unsigned vece, uint32_t dofs, uint32_t aofs,
                        TCGv_i64 c, uint32_t oprsz, uint32_t maxsz)
 {
     TCGv_i64 tmp = tcg_temp_new_i64();
-    gen_dup_i64(vece, tmp, c);
+    tcg_gen_dup_i64(vece, tmp, c);
     tcg_gen_gvec_2s(dofs, aofs, oprsz, maxsz, tmp, &gop_xors);
     tcg_temp_free_i64(tmp);
 }
@@ -XXX,XX +XXX,XX @@ void tcg_gen_gvec_ors(unsigned vece, uint32_t dofs, uint32_t aofs,
                       TCGv_i64 c, uint32_t oprsz, uint32_t maxsz)
 {
     TCGv_i64 tmp = tcg_temp_new_i64();
-    gen_dup_i64(vece, tmp, c);
+    tcg_gen_dup_i64(vece, tmp, c);
     tcg_gen_gvec_2s(dofs, aofs, oprsz, maxsz, tmp, &gop_ors);
     tcg_temp_free_i64(tmp);
 }
-- 
2.20.1
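tcg_gen_dup_i32() emits TCG ops and does not compute a value itself. The following is only a host-side C model of the value the generated code should produce, under a hypothetical name. It assumes TCG's MO_8 = 0, MO_16 = 1, MO_32 = 2 numbering for @vece. Multiplying by a repeating 0x01 or 0x0001 pattern is one way to replicate a narrow value across a 32-bit lane.

```c
#include <stdint.h>

/* Replicate the low (8 << vece) bits of 'in' across a 32-bit value. */
static uint32_t dup32(unsigned vece, uint32_t in)
{
    switch (vece) {
    case 0:                             /* MO_8 */
        return (uint8_t)in * 0x01010101u;
    case 1:                             /* MO_16 */
        return (uint16_t)in * 0x00010001u;
    default:                            /* MO_32: nothing to replicate */
        return in;
    }
}
```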

Implement the MVE VDUP insn, which duplicates a value from
a general-purpose register into every lane of a vector
register (subject to predication).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-11-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    |  2 ++
 target/arm/mve.decode      | 10 ++++++++++
 target/arm/mve_helper.c    | 16 ++++++++++++++++
 target/arm/translate-mve.c | 27 +++++++++++++++++++++++++++
 4 files changed, 55 insertions(+)

Implement the MVE vector logical operations operating
on two registers.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-12-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    |  6 ++++++
 target/arm/mve.decode      |  9 +++++++++
 target/arm/mve_helper.c    | 26 ++++++++++++++++++++++++++
 target/arm/translate-mve.c | 37 +++++++++++++++++++++++++++++++++++++
 4 files changed, 78 insertions(+)

Implement the MVE VADD, VSUB and VMUL insns.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-13-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    | 12 ++++++++++++
 target/arm/mve.decode      |  5 +++++
 target/arm/mve_helper.c    | 14 ++++++++++++++
 target/arm/translate-mve.c | 16 ++++++++++++++++
 4 files changed, 47 insertions(+)

Implement the MVE VMULH insn, which performs a vector
multiply and returns the high half of the result.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-14-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    |  7 +++++++
 target/arm/mve.decode      |  3 +++
 target/arm/mve_helper.c    | 26 ++++++++++++++++++++++++++
 target/arm/translate-mve.c |  2 ++
 4 files changed, 38 insertions(+)

Implement the MVE VRMULH insn, which performs a rounding multiply
and then returns the high half.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-15-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    |  7 +++++++
 target/arm/mve.decode      |  3 +++
 target/arm/mve_helper.c    | 22 ++++++++++++++++++++++
 target/arm/translate-mve.c |  2 ++
 4 files changed, 34 insertions(+)
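For a single signed 32-bit lane, VMULH and VRMULH can be sketched as follows (the function names are hypothetical). The product is formed in 64 bits. The rounding form adds half of the discarded range before shifting, which rounds to nearest with ties toward plus infinity. The sketch relies on >> of a negative int64_t being an arithmetic shift, as it is on GCC and Clang.

```c
#include <stdint.h>

/* VMULH.S32 on one lane: high half of the 64-bit product. */
static int32_t vmulh_s32(int32_t n, int32_t m)
{
    return (int32_t)(((int64_t)n * m) >> 32);
}

/* VRMULH.S32 on one lane: rounded high half of the 64-bit product. */
static int32_t vrmulh_s32(int32_t n, int32_t m)
{
    return (int32_t)(((int64_t)n * m + (INT64_C(1) << 31)) >> 32);
}
```

For instance, 0x40000000 * 2 = 2^31 gives 0 from VMULH but 1 from VRMULH.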

Implement the MVE VMAX and VMIN insns.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-16-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    | 14 ++++++++++++++
 target/arm/mve.decode      |  5 +++++
 target/arm/mve_helper.c    | 14 ++++++++++++++
 target/arm/translate-mve.c |  4 ++++
 4 files changed, 37 insertions(+)

Implement the MVE VABD insn.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-17-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    | 7 +++++++
 target/arm/mve.decode      | 3 +++
 target/arm/mve_helper.c    | 5 +++++
 target/arm/translate-mve.c | 2 ++
 4 files changed, 17 insertions(+)

Implement the MVE VHADD and VHSUB insns, which perform an addition
or subtraction and then halve the result.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-18-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    | 14 ++++++++++++++
 target/arm/mve.decode      |  5 +++++
 target/arm/mve_helper.c    | 25 +++++++++++++++++++++++++
 target/arm/translate-mve.c |  4 ++++
 4 files changed, 48 insertions(+)
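The point of the halving ops is that the intermediate sum or difference never overflows. A hypothetical per-lane sketch that widens to 64 bits and then shifts right by one (an arithmetic shift for the signed forms, which rounds toward minus infinity on GCC and Clang):

```c
#include <stdint.h>

static int32_t vhadd_s32(int32_t n, int32_t m)
{
    return (int32_t)(((int64_t)n + m) >> 1);
}

static int32_t vhsub_s32(int32_t n, int32_t m)
{
    return (int32_t)(((int64_t)n - m) >> 1);
}

static uint32_t vhadd_u32(uint32_t n, uint32_t m)
{
    return (uint32_t)(((uint64_t)n + m) >> 1);
}
```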

Implement the MVE VMULL insn, which multiplies two single-width
integer elements to produce a double-width result.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-19-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    | 14 ++++++++++++++
 target/arm/mve.decode      |  5 +++++
 target/arm/mve_helper.c    | 34 ++++++++++++++++++++++++++++++++++
 target/arm/translate-mve.c |  4 ++++
 4 files changed, 57 insertions(+)
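VMULL comes in bottom (B) and top (T) forms. The bottom form multiplies the even-numbered input elements and the top form multiplies the odd-numbered ones, and each pair produces one double-width result. A hypothetical unpredicated sketch for signed 16-bit inputs:

```c
#include <stdint.h>

/* VMULLB.S16 (top == 0) or VMULLT.S16 (top != 0): four 32-bit results. */
static void vmull_s16(int32_t d[4], const int16_t n[8],
                      const int16_t m[8], int top)
{
    for (int e = 0; e < 4; e++) {
        int i = 2 * e + (top ? 1 : 0);
        d[e] = (int32_t)n[i] * m[i];    /* cannot overflow 32 bits */
    }
}
```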

Implement the MVE VMLALDAV insn, which multiplies pairs of integer
elements, accumulating them into a 64-bit result in a pair of
general-purpose registers.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-20-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    |  8 ++++
 target/arm/translate.h     | 10 ++++
 target/arm/mve.decode      | 15 ++++++
 target/arm/mve_helper.c    | 34 ++++++++++++++
 target/arm/translate-mve.c | 96 ++++++++++++++++++++++++++++++++++++++
 5 files changed, 163 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vmulltsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vmulltub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vmulltuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vmulltuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vmlaldavsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
+DEF_HELPER_FLAGS_4(mve_vmlaldavsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
+DEF_HELPER_FLAGS_4(mve_vmlaldavxsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
+DEF_HELPER_FLAGS_4(mve_vmlaldavxsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
+
+DEF_HELPER_FLAGS_4(mve_vmlaldavuh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
+DEF_HELPER_FLAGS_4(mve_vmlaldavuw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
diff --git a/target/arm/translate.h b/target/arm/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ static inline int negate(DisasContext *s, int x)
     return -x;
 }
 
+static inline int plus_1(DisasContext *s, int x)
+{
+    return x + 1;
+}
+
 static inline int plus_2(DisasContext *s, int x)
 {
     return x + 2;
@@ -XXX,XX +XXX,XX @@ static inline int times_4(DisasContext *s, int x)
     return x * 4;
 }
 
+static inline int times_2_plus_1(DisasContext *s, int x)
+{
+    return x * 2 + 1;
+}
+
 static inline int arm_dc_feature(DisasContext *dc, int feature)
 {
     return (dc->features & (1ULL << feature)) != 0;
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VNEG_fp          1111 1111 1 . 11 .. 01 ... 0 0111 11 . 0 ... 0 @1op
 VDUP             1110 1110 1 1 10 ... 0 .... 1011 . 0 0 1 0000 @vdup size=0
 VDUP             1110 1110 1 0 10 ... 0 .... 1011 . 0 1 1 0000 @vdup size=1
 VDUP             1110 1110 1 0 10 ... 0 .... 1011 . 0 0 1 0000 @vdup size=2
+
+# multiply-add long dual accumulate
+# rdahi: bits [3:1] from insn, bit 0 is 1
+# rdalo: bits [3:1] from insn, bit 0 is 0
+%rdahi 20:3 !function=times_2_plus_1
+%rdalo 13:3 !function=times_2
+# size bit is 0 for 16 bit, 1 for 32 bit
+%size_16 16:1 !function=plus_1
+
+&vmlaldav rdahi rdalo size qn qm x a
+
+@vmlaldav        .... .... . ... ... . ... . .... .... qm:3 . \
+                 qn=%qn rdahi=%rdahi rdalo=%rdalo size=%size_16 &vmlaldav
+VMLALDAV_S       1110 1110 1 ... ... . ... x:1 1110 . 0 a:1 0 ... 0 @vmlaldav
+VMLALDAV_U       1111 1110 1 ... ... . ... x:1 1110 . 0 a:1 0 ... 0 @vmlaldav
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2OP_S(vhadds, do_vhadd_s)
 DO_2OP_U(vhaddu, do_vhadd_u)
 DO_2OP_S(vhsubs, do_vhsub_s)
 DO_2OP_U(vhsubu, do_vhsub_u)
+
+
+/*
+ * Multiply add long dual accumulate ops.
+ */
+#define DO_LDAV(OP, ESIZE, TYPE, XCHG, EVENACC, ODDACC)                 \
+    uint64_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vn,         \
+                                    void *vm, uint64_t a)               \
+    {                                                                   \
+        uint16_t mask = mve_element_mask(env);                          \
+        unsigned e;                                                     \
+        TYPE *n = vn, *m = vm;                                          \
+        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
+            if (mask & 1) {                                             \
+                if (e & 1) {                                            \
+                    a ODDACC                                            \
+                        (int64_t)n[H##ESIZE(e - 1 * XCHG)] * m[H##ESIZE(e)]; \
+                } else {                                                \
+                    a EVENACC                                           \
+                        (int64_t)n[H##ESIZE(e + 1 * XCHG)] * m[H##ESIZE(e)]; \
+                }                                                       \
+            }                                                           \
+        }                                                               \
+        mve_advance_vpt(env);                                           \
+        return a;                                                       \
+    }
+
+DO_LDAV(vmlaldavsh, 2, int16_t, false, +=, +=)
+DO_LDAV(vmlaldavxsh, 2, int16_t, true, +=, +=)
+DO_LDAV(vmlaldavsw, 4, int32_t, false, +=, +=)
+DO_LDAV(vmlaldavxsw, 4, int32_t, true, +=, +=)
+
+DO_LDAV(vmlaldavuh, 2, uint16_t, false, +=, +=)
+DO_LDAV(vmlaldavuw, 4, uint32_t, false, +=, +=)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@
 typedef void MVEGenLdStFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
 typedef void MVEGenOneOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
 typedef void MVEGenTwoOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr);
+typedef void MVEGenDualAccOpFn(TCGv_i64, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i64);
 
 /* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
 static inline long mve_qreg_offset(unsigned reg)
@@ -XXX,XX +XXX,XX @@ static void mve_update_eci(DisasContext *s)
     }
 }
 
+static bool mve_skip_first_beat(DisasContext *s)
+{
+    /* Return true if PSR.ECI says we must skip the first beat of this insn */
+    switch (s->eci) {
+    case ECI_NONE:
+        return false;
+    case ECI_A0:
+    case ECI_A0A1:
+    case ECI_A0A1A2:
+    case ECI_A0A1A2B0:
+        return true;
+    default:
+        g_assert_not_reached();
+    }
+}
+
 static bool do_ldst(DisasContext *s, arg_VLDR_VSTR *a, MVEGenLdStFn *fn)
 {
     TCGv_i32 addr;
@@ -XXX,XX +XXX,XX @@ DO_2OP(VMULL_BS, vmullbs)
 DO_2OP(VMULL_BU, vmullbu)
 DO_2OP(VMULL_TS, vmullts)
 DO_2OP(VMULL_TU, vmulltu)
+
+static bool do_long_dual_acc(DisasContext *s, arg_vmlaldav *a,
+                             MVEGenDualAccOpFn *fn)
+{
+    TCGv_ptr qn, qm;
+    TCGv_i64 rda;
+    TCGv_i32 rdalo, rdahi;
+
+    if (!dc_isar_feature(aa32_mve, s) ||
+        !mve_check_qreg_bank(s, a->qn | a->qm) ||
+        !fn) {
+        return false;
+    }
+    /*
+     * rdahi == 13 is UNPREDICTABLE; rdahi == 15 is a related
+     * encoding; rdalo always has bit 0 clear so cannot be 13 or 15.
+     */
+    if (a->rdahi == 13 || a->rdahi == 15) {
+        return false;
+    }
+    if (!mve_eci_check(s) || !vfp_access_check(s)) {
+        return true;
+    }
+
+    qn = mve_qreg_ptr(a->qn);
+    qm = mve_qreg_ptr(a->qm);
+
+    /*
+     * This insn is subject to beat-wise execution. Partial execution
+     * of an A=0 (no-accumulate) insn which does not execute the first
+     * beat must start with the current rda value, not 0.
+     */
+    if (a->a || mve_skip_first_beat(s)) {
+        rda = tcg_temp_new_i64();
+        rdalo = load_reg(s, a->rdalo);
+        rdahi = load_reg(s, a->rdahi);
+        tcg_gen_concat_i32_i64(rda, rdalo, rdahi);
+        tcg_temp_free_i32(rdalo);
+        tcg_temp_free_i32(rdahi);
+    } else {
+        rda = tcg_const_i64(0);
+    }
+
+    fn(rda, cpu_env, qn, qm, rda);
+    tcg_temp_free_ptr(qn);
+    tcg_temp_free_ptr(qm);
+
+    rdalo = tcg_temp_new_i32();
+    rdahi = tcg_temp_new_i32();
+    tcg_gen_extrl_i64_i32(rdalo, rda);
+    tcg_gen_extrh_i64_i32(rdahi, rda);
+    store_reg(s, a->rdalo, rdalo);
+    store_reg(s, a->rdahi, rdahi);
+    tcg_temp_free_i64(rda);
+    mve_update_eci(s);
+    return true;
+}
+
+static bool trans_VMLALDAV_S(DisasContext *s, arg_vmlaldav *a)
+{
+    static MVEGenDualAccOpFn * const fns[4][2] = {
+        { NULL, NULL },
+        { gen_helper_mve_vmlaldavsh, gen_helper_mve_vmlaldavxsh },
+        { gen_helper_mve_vmlaldavsw, gen_helper_mve_vmlaldavxsw },
+        { NULL, NULL },
+    };
+    return do_long_dual_acc(s, a, fns[a->size][a->x]);
+}
+
+static bool trans_VMLALDAV_U(DisasContext *s, arg_vmlaldav *a)
+{
+    static MVEGenDualAccOpFn * const fns[4][2] = {
+        { NULL, NULL },
+        { gen_helper_mve_vmlaldavuh, NULL },
+        { gen_helper_mve_vmlaldavuw, NULL },
+        { NULL, NULL },
+    };
+    return do_long_dual_acc(s, a, fns[a->size][a->x]);
+}
-- 
2.20.1
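Ignoring predication and beat-wise execution, the DO_LDAV helper above reduces to the following hypothetical sketch for signed 16-bit elements. With exchange (X), element e of Qm is paired with the adjacent element of Qn (e ^ 1, which is e + 1 for even e and e - 1 for odd e). The 64-bit result is then split into the RdaLo/RdaHi pair, which mirrors what do_long_dual_acc() does with tcg_gen_extrl_i64_i32() and tcg_gen_extrh_i64_i32().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of VMLALDAV{X}.S16.  'a' is RdaHi:RdaLo for the
 * accumulating form and 0 otherwise. */
static int64_t vmlaldav_s16(const int16_t n[8], const int16_t m[8],
                            bool xchg, int64_t a)
{
    for (int e = 0; e < 8; e++) {
        int ne = xchg ? (e ^ 1) : e;
        a += (int64_t)n[ne] * m[e];
    }
    return a;
}

/* Split a 64-bit accumulator into the low and high register halves. */
static void split_rda(int64_t acc, uint32_t *rdalo, uint32_t *rdahi)
{
    *rdalo = (uint32_t)acc;
    *rdahi = (uint32_t)((uint64_t)acc >> 32);
}
```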

Implement the MVE insn VMLSLDAV, which multiplies source elements,
alternately adding and subtracting them, and accumulates into a
64-bit result in a pair of general-purpose registers.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-21-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    |  5 +++++
 target/arm/mve.decode      |  2 ++
 target/arm/mve_helper.c    |  5 +++++
 target/arm/translate-mve.c | 11 +++++++++++
 4 files changed, 23 insertions(+)

Implement the MVE VRMLALDAVH and VRMLSLDAVH insns, which accumulate
the results of a rounded multiply of pairs of elements into a 72-bit
accumulator, returning the top 64 bits in a pair of general purpose
registers.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-22-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    |  8 ++++++++
 target/arm/mve.decode      |  7 +++++++
 target/arm/mve_helper.c    | 37 +++++++++++++++++++++++++++++++++++++
 target/arm/translate-mve.c | 24 ++++++++++++++++++++++++
 4 files changed, 76 insertions(+)

Implement the scalar form of the MVE VADD insn. This takes the
scalar operand from a general purpose register.
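
A rough sketch of what the predicated scalar form means for 32-bit
elements, assuming the byte-granular predicate used by the helpers
(one mask bit per byte of the 16-byte Q register, as mergemask()
consumes it). The function name is hypothetical; this is not the QEMU
helper itself.

```c
#include <stdint.h>

/*
 * Hypothetical model of predicated VADD (scalar) on 32-bit lanes.
 * Each lane is governed by four predicate bits, one per byte; a result
 * byte is written only when its predicate bit is set, otherwise the
 * old destination byte is kept.
 */
void model_vadd_scalar_u32(uint32_t d[4], const uint32_t n[4], uint32_t rm,
                           uint16_t mask)
{
    for (int e = 0; e < 4; e++, mask >>= 4) {
        uint32_t r = n[e] + rm;        /* wraps modulo 2^32 */
        uint32_t bytemask = 0;
        for (int b = 0; b < 4; b++) {
            if (mask & (1u << b)) {
                bytemask |= 0xffu << (8 * b);
            }
        }
        d[e] = (d[e] & ~bytemask) | (r & bytemask);
    }
}
```

With mask 0x0001 only the lowest byte of lane 0 is updated, which is
why partial predication can leave a lane holding a mix of old and new
bytes.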

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-23-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    |  4 ++++
 target/arm/mve.decode      |  7 ++++++
 target/arm/mve_helper.c    | 22 +++++++++++++++++++
 target/arm/translate-mve.c | 45 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 78 insertions(+)

Implement the scalar forms of the MVE VSUB and VMUL insns.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-24-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    | 8 ++++++++
 target/arm/mve.decode      | 2 ++
 target/arm/mve_helper.c    | 2 ++
 target/arm/translate-mve.c | 2 ++
 4 files changed, 14 insertions(+)

Implement the scalar variants of the MVE VHADD and VHSUB insns.
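
The halving operations can be modelled per element by forming the sum
or difference in a wider type, so it cannot overflow, and then shifting
right by one. The names below are hypothetical; the signed versions
assume the compiler implements right shift of negative values as an
arithmetic shift (as GCC and Clang do).

```c
#include <stdint.h>

/* Hypothetical per-element models of VHADD/VHSUB on 32-bit lanes. */
int32_t model_vhadd_s32(int32_t n, int32_t m)
{
    return (int32_t)(((int64_t)n + m) >> 1);   /* widen, add, halve */
}

int32_t model_vhsub_s32(int32_t n, int32_t m)
{
    return (int32_t)(((int64_t)n - m) >> 1);   /* widen, subtract, halve */
}

uint32_t model_vhadd_u32(uint32_t n, uint32_t m)
{
    return (uint32_t)(((uint64_t)n + m) >> 1);
}
```

Because of the widening, VHADD of INT32_MAX with itself returns
INT32_MAX rather than wrapping.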

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-25-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    | 16 ++++++++++++++++
 target/arm/mve.decode      |  4 ++++
 target/arm/mve_helper.c    |  8 ++++++++
 target/arm/translate-mve.c |  4 ++++
 4 files changed, 32 insertions(+)

Implement the MVE VBRSR insn, which reverses a specified
number of bits in each element, setting the rest to zero.
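
For a byte element, the operation amounts to reversing the whole byte
and then shifting right so that only the reversed low m bits remain.
This sketch assumes the count is taken from the bottom byte of the
scalar register and that counts of 8 or more reverse the whole element;
both function names are hypothetical.

```c
#include <stdint.h>

/* Reverse all 8 bits of a byte with a simple loop (hypothetical helper). */
static uint8_t rev8(uint8_t x)
{
    uint8_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (uint8_t)((r << 1) | ((x >> i) & 1));
    }
    return r;
}

/*
 * Hypothetical model of VBRSR on one byte element: reverse the low m
 * bits of n into the low m bits of the result and clear the rest.
 */
uint8_t model_vbrsr_u8(uint8_t n, uint32_t m)
{
    m &= 0xff;                 /* only the bottom byte of Rm is used */
    if (m == 0) {
        return 0;
    }
    uint8_t r = rev8(n);
    if (m < 8) {
        r >>= 8 - m;           /* discard bits that came from above bit m-1 */
    }
    return r;
}
```

For example, with n = 0xf1 and m = 4 the low nibble 0001 becomes 1000,
so the result is 0x08 and the upper nibble of n is ignored.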

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-26-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    |  4 ++++
 target/arm/mve.decode      |  1 +
 target/arm/mve_helper.c    | 43 ++++++++++++++++++++++++++++++++++++++
 target/arm/translate-mve.c |  1 +
 4 files changed, 49 insertions(+)

Implement the MVE VPST insn, which sets the predicate mask
fields in the VPR to the immediate value encoded in the insn.
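
The VPR update can be pictured as one or two bitfield deposits. This
sketch assumes the Armv8.1-M layout of P0 in bits [15:0], MASK01 in
bits [19:16] and MASK23 in bits [23:20]; the macro and function names
are hypothetical stand-ins for the R_V7M_VPR_* constants and
tcg_gen_deposit_i32() used in the patch.

```c
#include <stdint.h>

/* Assumed VPR field layout (hypothetical names). */
#define VPR_MASK01_SHIFT 16
#define VPR_MASK23_SHIFT 20
#define VPR_MASK_LEN 4

/* Replace bits [start, start+length) of value with the low bits of field. */
static uint32_t deposit32_model(uint32_t value, int start, int length,
                                uint32_t field)
{
    uint32_t mask = ((1u << length) - 1) << start;
    return (value & ~mask) | ((field << start) & mask);
}

/*
 * Hypothetical model of VPST: if beat 1 has already executed (per
 * PSR.ECI), only MASK23 is written; otherwise both mask fields are.
 */
uint32_t model_vpst(uint32_t vpr, uint32_t insn_mask, int skip_beat1)
{
    if (skip_beat1) {
        return deposit32_model(vpr, VPR_MASK23_SHIFT, VPR_MASK_LEN, insn_mask);
    }
    /* MASK01 and MASK23 are adjacent, so one 8-bit deposit sets both. */
    return deposit32_model(vpr, VPR_MASK01_SHIFT, 2 * VPR_MASK_LEN,
                           insn_mask | (insn_mask << 4));
}
```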

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-27-peter.maydell@linaro.org
---
 target/arm/mve.decode      |  4 +++
 target/arm/translate-mve.c | 59 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 63 insertions(+)

diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VHADD_U_scalar   1111 1110 0 . .. ... 0 ... 0 1111 . 100 .... @2scalar
 VHSUB_S_scalar   1110 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
 VHSUB_U_scalar   1111 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
 VBRSR            1111 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
+
+# Predicate operations
+%mask_22_13      22:1 13:3
+VPST             1111 1110 0 . 11 000 1 ... 0 1111 0100 1101 mask=%mask_22_13
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ static void mve_update_eci(DisasContext *s)
     }
 }
 
+static void mve_update_and_store_eci(DisasContext *s)
+{
+    /*
+     * For insns which don't call a helper function that will call
+     * mve_advance_vpt(), this version updates s->eci and also stores
+     * it out to the CPUState field.
+     */
+    if (s->eci) {
+        mve_update_eci(s);
+        store_cpu_field(tcg_constant_i32(s->eci << 4), condexec_bits);
+    }
+}
+
 static bool mve_skip_first_beat(DisasContext *s)
 {
     /* Return true if PSR.ECI says we must skip the first beat of this insn */
@@ -XXX,XX +XXX,XX @@ static bool trans_VRMLSLDAVH(DisasContext *s, arg_vmlaldav *a)
     };
     return do_long_dual_acc(s, a, fns[a->x]);
 }
+
+static bool trans_VPST(DisasContext *s, arg_VPST *a)
+{
+    TCGv_i32 vpr;
+
+    /* mask == 0 is a "related encoding" */
+    if (!dc_isar_feature(aa32_mve, s) || !a->mask) {
+        return false;
+    }
+    if (!mve_eci_check(s) || !vfp_access_check(s)) {
+        return true;
+    }
+    /*
+     * Set the VPR mask fields. We take advantage of MASK01 and MASK23
+     * being adjacent fields in the register.
+     *
+     * This insn is not predicated, but it is subject to beat-wise
+     * execution, and the mask is updated on the odd-numbered beats.
+     * So if PSR.ECI says we should skip beat 1, we mustn't update the
+     * 01 mask field.
+     */
+    vpr = load_cpu_field(v7m.vpr);
+    switch (s->eci) {
+    case ECI_NONE:
+    case ECI_A0:
+        /* Update both 01 and 23 fields */
+        tcg_gen_deposit_i32(vpr, vpr,
+                            tcg_constant_i32(a->mask | (a->mask << 4)),
+                            R_V7M_VPR_MASK01_SHIFT,
+                            R_V7M_VPR_MASK01_LENGTH + R_V7M_VPR_MASK23_LENGTH);
+        break;
+    case ECI_A0A1:
+    case ECI_A0A1A2:
+    case ECI_A0A1A2B0:
+        /* Update only the 23 mask field */
+        tcg_gen_deposit_i32(vpr, vpr,
+                            tcg_constant_i32(a->mask),
+                            R_V7M_VPR_MASK23_SHIFT, R_V7M_VPR_MASK23_LENGTH);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    store_cpu_field(vpr, v7m.vpr);
+    mve_update_and_store_eci(s);
+    return true;
+}
-- 
2.20.1

Implement the MVE VQADD and VQSUB insns, which perform saturating
addition of a scalar to each element.  Note that individual bytes of
each result element are used or discarded according to the predicate
mask, but FPSCR.QC is only set if the predicate mask for the lowest
byte of the element is set.
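
To make the QC rule concrete, here is a sketch of VQADD.S16 (scalar)
with a byte-granular predicate: each 16-bit lane has two predicate
bits, bytes are merged individually, but saturation only reaches QC
when the lane's lowest-byte bit is set. The function names are
hypothetical and the return value stands in for the FPSCR.QC update.

```c
#include <stdbool.h>
#include <stdint.h>

/* Clamp a wide value into [min, max], recording whether it saturated. */
static int64_t sat_clamp(int64_t val, int64_t min, int64_t max, bool *sat)
{
    if (val > max) {
        *sat = true;
        return max;
    }
    if (val < min) {
        *sat = true;
        return min;
    }
    return val;
}

/* Hypothetical model of VQADD.S16 (scalar); returns the QC contribution. */
bool model_vqadd_scalar_s16(int16_t d[8], const int16_t n[8], int16_t m,
                            uint16_t mask)
{
    bool qc = false;
    for (int e = 0; e < 8; e++, mask >>= 2) {
        bool sat = false;
        uint16_t r = (uint16_t)sat_clamp((int64_t)n[e] + m,
                                         INT16_MIN, INT16_MAX, &sat);
        uint16_t old = (uint16_t)d[e];
        uint16_t write = 0;            /* which bytes of the lane to update */
        if (mask & 1) {
            write |= 0x00ff;
        }
        if (mask & 2) {
            write |= 0xff00;
        }
        d[e] = (int16_t)((old & ~write) | (r & write));
        qc |= sat && (mask & 1);       /* only the lowest byte's bit counts */
    }
    return qc;
}
```

With mask 0x0002 the high byte of a saturated lane 0 is still written,
but QC is left clear.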

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-28-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    | 16 ++++++++++
 target/arm/mve.decode      |  5 +++
 target/arm/mve_helper.c    | 62 ++++++++++++++++++++++++++++++++++++++
 target/arm/translate-mve.c |  4 +++
 4 files changed, 87 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vhsubu_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vhsubu_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vhsubu_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(mve_vqadds_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqadds_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqadds_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vqaddu_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqaddu_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqaddu_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vqsubs_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqsubs_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqsubs_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vqsubu_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqsubu_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqsubu_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_4(mve_vbrsrb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vbrsrh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vbrsrw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VHADD_S_scalar   1110 1110 0 . .. ... 0 ... 0 1111 . 100 .... @2scalar
 VHADD_U_scalar   1111 1110 0 . .. ... 0 ... 0 1111 . 100 .... @2scalar
 VHSUB_S_scalar   1110 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
 VHSUB_U_scalar   1111 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
+
+VQADD_S_scalar   1110 1110 0 . .. ... 0 ... 0 1111 . 110 .... @2scalar
+VQADD_U_scalar   1111 1110 0 . .. ... 0 ... 0 1111 . 110 .... @2scalar
+VQSUB_S_scalar   1110 1110 0 . .. ... 0 ... 1 1111 . 110 .... @2scalar
+VQSUB_U_scalar   1111 1110 0 . .. ... 0 ... 1 1111 . 110 .... @2scalar
 VBRSR            1111 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
 
 # Predicate operations
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2OP_U(vhaddu, do_vhadd_u)
 DO_2OP_S(vhsubs, do_vhsub_s)
 DO_2OP_U(vhsubu, do_vhsub_u)
 
+static inline int32_t do_sat_bhw(int64_t val, int64_t min, int64_t max, bool *s)
+{
+    if (val > max) {
+        *s = true;
+        return max;
+    } else if (val < min) {
+        *s = true;
+        return min;
+    }
+    return val;
+}
+
+#define DO_SQADD_B(n, m, s) do_sat_bhw((int64_t)n + m, INT8_MIN, INT8_MAX, s)
+#define DO_SQADD_H(n, m, s) do_sat_bhw((int64_t)n + m, INT16_MIN, INT16_MAX, s)
+#define DO_SQADD_W(n, m, s) do_sat_bhw((int64_t)n + m, INT32_MIN, INT32_MAX, s)
+
+#define DO_UQADD_B(n, m, s) do_sat_bhw((int64_t)n + m, 0, UINT8_MAX, s)
+#define DO_UQADD_H(n, m, s) do_sat_bhw((int64_t)n + m, 0, UINT16_MAX, s)
+#define DO_UQADD_W(n, m, s) do_sat_bhw((int64_t)n + m, 0, UINT32_MAX, s)
+
+#define DO_SQSUB_B(n, m, s) do_sat_bhw((int64_t)n - m, INT8_MIN, INT8_MAX, s)
+#define DO_SQSUB_H(n, m, s) do_sat_bhw((int64_t)n - m, INT16_MIN, INT16_MAX, s)
+#define DO_SQSUB_W(n, m, s) do_sat_bhw((int64_t)n - m, INT32_MIN, INT32_MAX, s)
+
+#define DO_UQSUB_B(n, m, s) do_sat_bhw((int64_t)n - m, 0, UINT8_MAX, s)
+#define DO_UQSUB_H(n, m, s) do_sat_bhw((int64_t)n - m, 0, UINT16_MAX, s)
+#define DO_UQSUB_W(n, m, s) do_sat_bhw((int64_t)n - m, 0, UINT32_MAX, s)
 
 #define DO_2OP_SCALAR(OP, ESIZE, TYPE, FN)                              \
     void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn,   \
@@ -XXX,XX +XXX,XX @@ DO_2OP_U(vhsubu, do_vhsub_u)
         mve_advance_vpt(env);                                           \
     }
 
+#define DO_2OP_SAT_SCALAR(OP, ESIZE, TYPE, FN)                          \
+    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn,   \
+                                uint32_t rm)                            \
+    {                                                                   \
+        TYPE *d = vd, *n = vn;                                          \
+        TYPE m = rm;                                                    \
+        uint16_t mask = mve_element_mask(env);                          \
+        unsigned e;                                                     \
+        bool qc = false;                                                \
+        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
+            bool sat = false;                                           \
+            mergemask(&d[H##ESIZE(e)], FN(n[H##ESIZE(e)], m, &sat),     \
+                      mask);                                            \
+            qc |= sat & mask & 1;                                       \
+        }                                                               \
+        if (qc) {                                                       \
+            env->vfp.qc[0] = qc;                                        \
+        }                                                               \
+        mve_advance_vpt(env);                                           \
+    }
+
 /* provide unsigned 2-op scalar helpers for all sizes */
 #define DO_2OP_SCALAR_U(OP, FN)                 \
     DO_2OP_SCALAR(OP##b, 1, uint8_t, FN)        \
@@ -XXX,XX +XXX,XX @@ DO_2OP_SCALAR_U(vhaddu_scalar, do_vhadd_u)
 DO_2OP_SCALAR_S(vhsubs_scalar, do_vhsub_s)
 DO_2OP_SCALAR_U(vhsubu_scalar, do_vhsub_u)
 
+DO_2OP_SAT_SCALAR(vqaddu_scalarb, 1, uint8_t, DO_UQADD_B)
+DO_2OP_SAT_SCALAR(vqaddu_scalarh, 2, uint16_t, DO_UQADD_H)
+DO_2OP_SAT_SCALAR(vqaddu_scalarw, 4, uint32_t, DO_UQADD_W)
+DO_2OP_SAT_SCALAR(vqadds_scalarb, 1, int8_t, DO_SQADD_B)
+DO_2OP_SAT_SCALAR(vqadds_scalarh, 2, int16_t, DO_SQADD_H)
+DO_2OP_SAT_SCALAR(vqadds_scalarw, 4, int32_t, DO_SQADD_W)
+
+DO_2OP_SAT_SCALAR(vqsubu_scalarb, 1, uint8_t, DO_UQSUB_B)
+DO_2OP_SAT_SCALAR(vqsubu_scalarh, 2, uint16_t, DO_UQSUB_H)
+DO_2OP_SAT_SCALAR(vqsubu_scalarw, 4, uint32_t, DO_UQSUB_W)
+DO_2OP_SAT_SCALAR(vqsubs_scalarb, 1, int8_t, DO_SQSUB_B)
+DO_2OP_SAT_SCALAR(vqsubs_scalarh, 2, int16_t, DO_SQSUB_H)
+DO_2OP_SAT_SCALAR(vqsubs_scalarw, 4, int32_t, DO_SQSUB_W)
+
 static inline uint32_t do_vbrsrb(uint32_t n, uint32_t m)
 {
     m &= 0xff;
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_2OP_SCALAR(VHADD_S_scalar, vhadds_scalar)
 DO_2OP_SCALAR(VHADD_U_scalar, vhaddu_scalar)
 DO_2OP_SCALAR(VHSUB_S_scalar, vhsubs_scalar)
 DO_2OP_SCALAR(VHSUB_U_scalar, vhsubu_scalar)
+DO_2OP_SCALAR(VQADD_S_scalar, vqadds_scalar)
+DO_2OP_SCALAR(VQADD_U_scalar, vqaddu_scalar)
+DO_2OP_SCALAR(VQSUB_S_scalar, vqsubs_scalar)
+DO_2OP_SCALAR(VQSUB_U_scalar, vqsubu_scalar)
 DO_2OP_SCALAR(VBRSR, vbrsr)
 
 static bool do_long_dual_acc(DisasContext *s, arg_vmlaldav *a,
-- 
2.20.1

Implement the MVE VQDMULH and VQRDMULH scalar insns, which multiply
elements by the scalar, double, possibly round, take the high half
and saturate.
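
Per 16-bit lane, the computation can be sketched as below: multiply,
double, optionally add the rounding constant 1 << 15, saturate to
32 bits and keep the high half. The only input pair that saturates is
INT16_MIN * INT16_MIN. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of one 16-bit lane of VQDMULH (round == false)
 * or VQRDMULH (round == true). Sets *sat if the doubled product
 * (plus rounding constant) does not fit in 32 bits.
 */
int16_t model_vqdmulh_s16(int16_t n, int16_t m, bool round, bool *sat)
{
    int64_t r = (int64_t)n * m * 2 + (round ? (1 << 15) : 0);
    if (r > INT32_MAX) {
        *sat = true;
        r = INT32_MAX;
    } else if (r < INT32_MIN) {
        *sat = true;
        r = INT32_MIN;
    }
    return (int16_t)(r >> 16);     /* keep the high half */
}
```

For n = 1 and m = 0x4000 the doubled product is exactly 1 << 15, so the
truncating form returns 0 and the rounding form returns 1.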

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-29-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    |  8 ++++++++
 target/arm/mve.decode      |  3 +++
 target/arm/mve_helper.c    | 25 +++++++++++++++++++++++++
 target/arm/translate-mve.c |  2 ++
 4 files changed, 38 insertions(+)

Implement the MVE VQDMULL scalar insn. This multiplies the top or
bottom half of each element by the scalar, doubles and saturates
to a double-width result.

Note that this encoding overlaps with VQADD and VQSUB; it uses
what in VQADD and VQSUB would be the 'size=0b11' encoding.
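
The overlap can be illustrated with a small classifier. It assumes the
field positions from the mve.decode patterns: size in bits [21:20],
bit 28 as the U bit for VQADD/VQSUB (and the 16/32-bit selector for
VQDMULL), and bit 12 choosing VQADD versus VQSUB (or B versus T). The
enum and function are hypothetical and do not check the other fixed
bits; decodetree resolves the same overlap with the { } groups in the
patch.

```c
#include <stdint.h>

enum qop { QOP_VQADD_S, QOP_VQADD_U, QOP_VQSUB_S, QOP_VQSUB_U,
           QOP_VQDMULLB, QOP_VQDMULLT };

/* Hypothetical classifier; *size uses MemOp numbering (MO_16 == 1). */
enum qop classify_q_scalar(uint32_t insn, int *size)
{
    unsigned sz = (insn >> 20) & 3;
    unsigned b28 = (insn >> 28) & 1;
    unsigned b12 = (insn >> 12) & 1;

    if (sz == 3) {
        /* size=0b11 is reused: bit 28 gives 0 -> 16-bit, 1 -> 32-bit */
        *size = (int)b28 + 1;
        return b12 ? QOP_VQDMULLT : QOP_VQDMULLB;
    }
    *size = (int)sz;
    if (b12) {
        return b28 ? QOP_VQSUB_U : QOP_VQSUB_S;
    }
    return b28 ? QOP_VQADD_U : QOP_VQADD_S;
}
```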

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-30-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    |  5 +++
 target/arm/mve.decode      | 23 +++++++++++---
 target/arm/mve_helper.c    | 65 ++++++++++++++++++++++++++++++++++++++
 target/arm/translate-mve.c | 30 ++++++++++++++++++
 4 files changed, 119 insertions(+), 4 deletions(-)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vbrsrb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vbrsrh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vbrsrw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(mve_vqdmullb_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqdmullb_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqdmullt_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqdmullt_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_4(mve_vmlaldavsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
 DEF_HELPER_FLAGS_4(mve_vmlaldavsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
 DEF_HELPER_FLAGS_4(mve_vmlaldavxsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@
 %qm 5:1 1:3
 %qn 7:1 17:3
 
+# VQDMULL has size in bit 28: 0 for 16 bit, 1 for 32 bit
+%size_28 28:1 !function=plus_1
+
 &vldr_vstr rn qd imm p a w size l u
 &1op qd qm size
 &2op qd qm qn size
@@ -XXX,XX +XXX,XX @@
 @2op_nosz .... .... .... .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn size=0
 
 @2scalar .... .... .. size:2 .... .... .... .... rm:4 &2scalar qd=%qd qn=%qn
+@2scalar_nosz .... .... .... .... .... .... .... rm:4 &2scalar qd=%qd qn=%qn
 
 # Vector loads and stores
 
@@ -XXX,XX +XXX,XX @@ VHADD_U_scalar   1111 1110 0 . .. ... 0 ... 0 1111 . 100 .... @2scalar
 VHSUB_S_scalar   1110 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
 VHSUB_U_scalar   1111 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
 
-VQADD_S_scalar   1110 1110 0 . .. ... 0 ... 0 1111 . 110 .... @2scalar
-VQADD_U_scalar   1111 1110 0 . .. ... 0 ... 0 1111 . 110 .... @2scalar
-VQSUB_S_scalar   1110 1110 0 . .. ... 0 ... 1 1111 . 110 .... @2scalar
-VQSUB_U_scalar   1111 1110 0 . .. ... 0 ... 1 1111 . 110 .... @2scalar
+{
+  VQADD_S_scalar  1110  1110 0 . .. ... 0 ... 0 1111 . 110 .... @2scalar
+  VQADD_U_scalar  1111  1110 0 . .. ... 0 ... 0 1111 . 110 .... @2scalar
+  VQDMULLB_scalar 111 . 1110 0 . 11 ... 0 ... 0 1111 . 110 .... @2scalar_nosz \
+                  size=%size_28
+}
+
+{
+  VQSUB_S_scalar  1110  1110 0 . .. ... 0 ... 1 1111 . 110 .... @2scalar
+  VQSUB_U_scalar  1111  1110 0 . .. ... 0 ... 1 1111 . 110 .... @2scalar
+  VQDMULLT_scalar 111 . 1110 0 . 11 ... 0 ... 1 1111 . 110 .... @2scalar_nosz \
+                  size=%size_28
+}
+
 VBRSR            1111 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
 
 VQDMULH_scalar   1110 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
 VQRDMULH_scalar  1111 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
 
+
 # Predicate operations
 %mask_22_13      22:1 13:3
 VPST             1111 1110 0 . 11 000 1 ... 0 1111 0100 1101 mask=%mask_22_13
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2OP_SAT_SCALAR(vqrdmulh_scalarb, 1, int8_t, DO_QRDMULH_B)
 DO_2OP_SAT_SCALAR(vqrdmulh_scalarh, 2, int16_t, DO_QRDMULH_H)
 DO_2OP_SAT_SCALAR(vqrdmulh_scalarw, 4, int32_t, DO_QRDMULH_W)
 
+/*
+ * Long saturating scalar ops. As with DO_2OP_L, TYPE and H are for the
+ * input (smaller) type and LESIZE, LTYPE, LH for the output (long) type.
+ * SATMASK specifies which bits of the predicate mask matter for determining
+ * whether to propagate a saturation indication into FPSCR.QC -- for
+ * the 16x16->32 case we must check only the bit corresponding to the T or B
+ * half that we used, but for the 32x32->64 case we propagate if the mask
+ * bit is set for either half.
+ */
+#define DO_2OP_SAT_SCALAR_L(OP, TOP, ESIZE, TYPE, LESIZE, LTYPE, FN, SATMASK) \
+    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn,   \
+                                uint32_t rm)                            \
+    {                                                                   \
+        LTYPE *d = vd;                                                  \
+        TYPE *n = vn;                                                   \
+        TYPE m = rm;                                                    \
+        uint16_t mask = mve_element_mask(env);                          \
+        unsigned le;                                                    \
+        bool qc = false;                                                \
+        for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) {         \
+            bool sat = false;                                           \
+            LTYPE r = FN((LTYPE)n[H##ESIZE(le * 2 + TOP)], m, &sat);    \
+            mergemask(&d[H##LESIZE(le)], r, mask);                      \
+            qc |= sat && (mask & SATMASK);                              \
+        }                                                               \
+        if (qc) {                                                       \
+            env->vfp.qc[0] = qc;                                        \
+        }                                                               \
+        mve_advance_vpt(env);                                           \
+    }
+
+static inline int32_t do_qdmullh(int16_t n, int16_t m, bool *sat)
+{
+    int64_t r = ((int64_t)n * m) * 2;
+    return do_sat_bhw(r, INT32_MIN, INT32_MAX, sat);
+}
+
+static inline int64_t do_qdmullw(int32_t n, int32_t m, bool *sat)
+{
+    /* The multiply can't overflow, but the doubling might */
+    int64_t r = (int64_t)n * m;
+    if (r > INT64_MAX / 2) {
+        *sat = true;
+        return INT64_MAX;
+    } else if (r < INT64_MIN / 2) {
+        *sat = true;
+        return INT64_MIN;
+    } else {
+        return r * 2;
+    }
+}
+
+#define SATMASK16B 1
+#define SATMASK16T (1 << 2)
+#define SATMASK32 ((1 << 4) | 1)
+
+DO_2OP_SAT_SCALAR_L(vqdmullb_scalarh, 0, 2, int16_t, 4, int32_t, \
+                    do_qdmullh, SATMASK16B)
+DO_2OP_SAT_SCALAR_L(vqdmullb_scalarw, 0, 4, int32_t, 8, int64_t, \
+                    do_qdmullw, SATMASK32)
+DO_2OP_SAT_SCALAR_L(vqdmullt_scalarh, 1, 2, int16_t, 4, int32_t, \
+                    do_qdmullh, SATMASK16T)
+DO_2OP_SAT_SCALAR_L(vqdmullt_scalarw, 1, 4, int32_t, 8, int64_t, \
+                    do_qdmullw, SATMASK32)
+
 static inline uint32_t do_vbrsrb(uint32_t n, uint32_t m)
 {
     m &= 0xff;
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_2OP_SCALAR(VQDMULH_scalar, vqdmulh_scalar)
 DO_2OP_SCALAR(VQRDMULH_scalar, vqrdmulh_scalar)
 DO_2OP_SCALAR(VBRSR, vbrsr)
 
+static bool trans_VQDMULLB_scalar(DisasContext *s, arg_2scalar *a)
+{
+    static MVEGenTwoOpScalarFn * const fns[] = {
+        NULL,
+        gen_helper_mve_vqdmullb_scalarh,
+        gen_helper_mve_vqdmullb_scalarw,
+        NULL,
+    };
+    if (a->qd == a->qn && a->size == MO_32) {
+        /* UNPREDICTABLE; we choose to undef */
+        return false;
+    }
+    return do_2op_scalar(s, a, fns[a->size]);
+}
+
+static bool trans_VQDMULLT_scalar(DisasContext *s, arg_2scalar *a)
+{
+    static MVEGenTwoOpScalarFn * const fns[] = {
+        NULL,
+        gen_helper_mve_vqdmullt_scalarh,
+        gen_helper_mve_vqdmullt_scalarw,
+        NULL,
+    };
+    if (a->qd == a->qn && a->size == MO_32) {
+        /* UNPREDICTABLE; we choose to undef */
+        return false;
+    }
+    return do_2op_scalar(s, a, fns[a->size]);
+}
+
 static bool do_long_dual_acc(DisasContext *s, arg_vmlaldav *a,
                              MVEGenDualAccOpFn *fn)
 {
-- 
2.20.1

Implement the vector forms of the MVE VQDMULH and VQRDMULH insns.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-31-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    |  8 ++++++++
 target/arm/mve.decode      |  3 +++
 target/arm/mve_helper.c    | 27 +++++++++++++++++++++++++++
 target/arm/translate-mve.c |  2 ++
 4 files changed, 40 insertions(+)

Implement the vector forms of the MVE VQADD and VQSUB insns.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-32-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    | 16 ++++++++++++++++
 target/arm/mve.decode      |  5 +++++
 target/arm/mve_helper.c    | 14 ++++++++++++++
 target/arm/translate-mve.c |  4 ++++
 4 files changed, 39 insertions(+)

Implement the MVE VQSHL insn (encoding T4, which is the
vector-shift-by-vector version).

The DO_SQSHL_OP and DO_UQSHL_OP macros here are derived from
the neon_helper.c code for qshl_u{8,16,32} and qshl_s{8,16,32}.
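
As a rough illustration of the unsigned case, here is a per-lane sketch
in the spirit of the neon qshl_u8 logic: the shift count is the signed
bottom byte of the corresponding element of the shift vector; positive
counts shift left and saturate, negative counts shift right and cannot
saturate. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one unsigned 8-bit lane of VQSHL (vector). */
uint8_t model_vqshl_u8(uint8_t n, int8_t shift, bool *sat)
{
    if (shift >= 8) {
        if (n) {
            *sat = true;       /* any nonzero bit is shifted out */
            return UINT8_MAX;
        }
        return 0;
    }
    if (shift <= -8) {
        return 0;              /* everything shifted out to the right */
    }
    if (shift < 0) {
        return (uint8_t)(n >> -shift);
    }
    uint32_t r = (uint32_t)n << shift;
    if (r > UINT8_MAX) {
        *sat = true;
        return UINT8_MAX;
    }
    return (uint8_t)r;
}
```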

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-33-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    |  8 ++++++++
 target/arm/mve.decode      | 12 ++++++++++++
 target/arm/mve_helper.c    | 34 ++++++++++++++++++++++++++++++++++
 target/arm/translate-mve.c |  2 ++
 4 files changed, 56 insertions(+)

Implement the MVE VQRSHL (vector) insn.  Again, the code to perform
the actual shifts is borrowed from neon_helper.c.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-34-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    | 8 ++++++++
 target/arm/mve.decode      | 3 +++
 target/arm/mve_helper.c    | 6 ++++++
 target/arm/translate-mve.c | 2 ++
 4 files changed, 19 insertions(+)

Implement the MVE VSHL insn (vector form).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-35-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    | 8 ++++++++
 target/arm/mve.decode      | 3 +++
 target/arm/mve_helper.c    | 6 ++++++
 target/arm/translate-mve.c | 2 ++
 4 files changed, 19 insertions(+)

Implement the MVE VRSHL insn (vector form).
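
A per-lane sketch of the rounding shift: the signed bottom byte of the
shift element selects a left shift (positive count) or a right shift
(negative count) that first adds half of the discarded weight. The
function name is hypothetical; it assumes arithmetic right shift of
negative values and that counts of magnitude 32 or more give zero.

```c
#include <stdint.h>

/* Hypothetical model of one signed 32-bit lane of VRSHL (vector). */
int32_t model_vrshl_s32(int32_t n, int8_t shift)
{
    if (shift >= 32 || shift <= -32) {
        return 0;
    }
    if (shift >= 0) {
        return (int32_t)((uint32_t)n << shift);   /* plain left shift */
    }
    /* Add the rounding constant 1 << (count - 1) in 64 bits, then shift. */
    int64_t wide = (int64_t)n + ((int64_t)1 << (-shift - 1));
    return (int32_t)(wide >> -shift);
}
```

So 5 shifted right by one rounds to 3, while -5 rounds to -2 (halves
round towards plus infinity).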

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-36-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    | 8 ++++++++
 target/arm/mve.decode      | 3 +++
 target/arm/mve_helper.c    | 4 ++++
 target/arm/translate-mve.c | 2 ++
 4 files changed, 17 insertions(+)

Implement the MVE VQDMLADH and VQRDMLADH insns.  These multiply
elements, and then add pairs of products, double, possibly round,
saturate and return the high half of the result.
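
To show which lanes pair up, here is a whole-vector sketch for 16-bit
lanes, following the indexing of the DO_VQDMLADH_OP macro below but
ignoring predication. The non-exchanging form writes only even lanes
from n[e]*m[e] + n[e+1]*m[e+1]; the X form writes only odd lanes from
n[e]*m[e-1] + n[e-1]*m[e]. Other lanes keep their old value. The
function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Saturate a wide value to the int32 range, flagging saturation. */
static int64_t sat32(int64_t v, bool *sat)
{
    if (v > INT32_MAX) {
        *sat = true;
        return INT32_MAX;
    }
    if (v < INT32_MIN) {
        *sat = true;
        return INT32_MIN;
    }
    return v;
}

/* Hypothetical model of VQ[R]DMLADH[X] on 16-bit lanes; returns QC. */
bool model_vqdmladh_s16(int16_t d[8], const int16_t n[8], const int16_t m[8],
                        bool xchg, bool round)
{
    bool qc = false;
    for (int e = 0; e < 8; e++) {
        if ((e & 1) != (int)xchg) {
            continue;              /* this lane is not written */
        }
        int64_t sum;
        if (!xchg) {
            sum = (int64_t)n[e] * m[e] + (int64_t)n[e + 1] * m[e + 1];
        } else {
            sum = (int64_t)n[e] * m[e - 1] + (int64_t)n[e - 1] * m[e];
        }
        bool sat = false;
        int64_t r = sat32(sum * 2 + (round ? (1 << 15) : 0), &sat);
        d[e] = (int16_t)(r >> 16);  /* high half of the 32-bit result */
        qc |= sat;
    }
    return qc;
}
```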

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-37-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    | 16 +++++++
 target/arm/mve.decode      |  5 +++
 target/arm/mve_helper.c    | 89 ++++++++++++++++++++++++++++++++++++++
 target/arm/translate-mve.c |  4 ++
 4 files changed, 114 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vqrshlub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vqrshluh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vqrshluw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 
+DEF_HELPER_FLAGS_4(mve_vqdmladhb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqdmladhh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqdmladhw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vqdmladhxb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqdmladhxh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqdmladhxw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vqrdmladhb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqrdmladhh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqrdmladhw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vqrdmladhxb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqrdmladhxh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqrdmladhxw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
 DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VQSHL_U          111 1 1111 0 . .. ... 0 ... 0 0100 . 1 . 1 ... 0 @2op_rev
 VQRSHL_S         111 0 1111 0 . .. ... 0 ... 0 0101 . 1 . 1 ... 0 @2op_rev
 VQRSHL_U         111 1 1111 0 . .. ... 0 ... 0 0101 . 1 . 1 ... 0 @2op_rev
 
+VQDMLADH         1110 1110 0 . .. ... 0 ... 0 1110 . 0 . 0 ... 0 @2op
+VQDMLADHX        1110 1110 0 . .. ... 0 ... 1 1110 . 0 . 0 ... 0 @2op
+VQRDMLADH        1110 1110 0 . .. ... 0 ... 0 1110 . 0 . 0 ... 1 @2op
+VQRDMLADHX       1110 1110 0 . .. ... 0 ... 1 1110 . 0 . 0 ... 1 @2op
+
 # Vector miscellaneous
 
 VCLS             1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2OP_SAT_U(vqshlu, DO_UQSHL_OP)
 DO_2OP_SAT_S(vqrshls, DO_SQRSHL_OP)
 DO_2OP_SAT_U(vqrshlu, DO_UQRSHL_OP)
 
+/*
+ * Multiply add dual returning high half
+ * The 'FN' here takes four inputs A, B, C, D, a 0/1 indicator of
+ * whether to add the rounding constant, and the pointer to the
+ * saturation flag, and should do "(A * B + C * D) * 2 + rounding constant",
+ * saturate to twice the input size and return the high half; or
+ * (A * B - C * D) etc for VQDMLSDH.
+ */
+#define DO_VQDMLADH_OP(OP, ESIZE, TYPE, XCHG, ROUND, FN)                \
+    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn,   \
+                                void *vm)                               \
+    {                                                                   \
+        TYPE *d = vd, *n = vn, *m = vm;                                 \
+        uint16_t mask = mve_element_mask(env);                          \
+        unsigned e;                                                     \
+        bool qc = false;                                                \
+        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
+            bool sat = false;                                           \
+            if ((e & 1) == XCHG) {                                      \
+                TYPE r = FN(n[H##ESIZE(e)],                             \
+                            m[H##ESIZE(e - XCHG)],                      \
+                            n[H##ESIZE(e + (1 - 2 * XCHG))],            \
+                            m[H##ESIZE(e + (1 - XCHG))],                \
+                            ROUND, &sat);                               \
+                mergemask(&d[H##ESIZE(e)], r, mask);                    \
+                qc |= sat & mask & 1;                                   \
+            }                                                           \
+        }                                                               \
+        if (qc) {                                                       \
+            env->vfp.qc[0] = qc;                                        \
+        }                                                               \
+        mve_advance_vpt(env);                                           \
+    }
+
+static int8_t do_vqdmladh_b(int8_t a, int8_t b, int8_t c, int8_t d,
+                            int round, bool *sat)
+{
+    int64_t r = ((int64_t)a * b + (int64_t)c * d) * 2 + (round << 7);
+    return do_sat_bhw(r, INT16_MIN, INT16_MAX, sat) >> 8;
+}
+
+static int16_t do_vqdmladh_h(int16_t a, int16_t b, int16_t c, int16_t d,
+                             int round, bool *sat)
+{
+    int64_t r = ((int64_t)a * b + (int64_t)c * d) * 2 + (round << 15);
+    return do_sat_bhw(r, INT32_MIN, INT32_MAX, sat) >> 16;
+}
+
+static int32_t do_vqdmladh_w(int32_t a, int32_t b, int32_t c, int32_t d,
+                             int round, bool *sat)
+{
+    int64_t m1 = (int64_t)a * b;
+    int64_t m2 = (int64_t)c * d;
+    int64_t r;
+    /*
+     * Architecturally we should do the entire add, double, round
+     * and then check for saturation. We do three saturating adds,
+     * but we need to be careful about the order. If the first
+     * m1 + m2 saturates then it's impossible for the *2+rc to
+     * bring it back into the non-saturated range. However, if
+     * m1 + m2 is negative then it's possible that doing the doubling
+     * would take the intermediate result below INT64_MIN and the
+     * addition of the rounding constant then brings it back in range.
+     * So we add half the rounding constant before doubling rather
+     * than adding the rounding constant after the doubling.
+     */
+    if (sadd64_overflow(m1, m2, &r) ||
+        sadd64_overflow(r, (round << 30), &r) ||
+        sadd64_overflow(r, r, &r)) {
+        *sat = true;
+        return r < 0 ? INT32_MAX : INT32_MIN;
+    }
+    return r >> 32;
+}
+
+DO_VQDMLADH_OP(vqdmladhb, 1, int8_t, 0, 0, do_vqdmladh_b)
+DO_VQDMLADH_OP(vqdmladhh, 2, int16_t, 0, 0, do_vqdmladh_h)
+DO_VQDMLADH_OP(vqdmladhw, 4, int32_t, 0, 0, do_vqdmladh_w)
+DO_VQDMLADH_OP(vqdmladhxb, 1, int8_t, 1, 0, do_vqdmladh_b)
+DO_VQDMLADH_OP(vqdmladhxh, 2, int16_t, 1, 0, do_vqdmladh_h)
+DO_VQDMLADH_OP(vqdmladhxw, 4, int32_t, 1, 0, do_vqdmladh_w)
+
+DO_VQDMLADH_OP(vqrdmladhb, 1, int8_t, 0, 1, do_vqdmladh_b)
+DO_VQDMLADH_OP(vqrdmladhh, 2, int16_t, 0, 1, do_vqdmladh_h)
+DO_VQDMLADH_OP(vqrdmladhw, 4, int32_t, 0, 1, do_vqdmladh_w)
+DO_VQDMLADH_OP(vqrdmladhxb, 1, int8_t, 1, 1, do_vqdmladh_b)
+DO_VQDMLADH_OP(vqrdmladhxh, 2, int16_t, 1, 1, do_vqdmladh_h)
+DO_VQDMLADH_OP(vqrdmladhxw, 4, int32_t, 1, 1, do_vqdmladh_w)
+
 #define DO_2OP_SCALAR(OP, ESIZE, TYPE, FN)                              \
     void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn,   \
                                 uint32_t rm)                            \
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_2OP(VQSHL_S, vqshls)
 DO_2OP(VQSHL_U, vqshlu)
 DO_2OP(VQRSHL_S, vqrshls)
 DO_2OP(VQRSHL_U, vqrshlu)
+DO_2OP(VQDMLADH, vqdmladh)
+DO_2OP(VQDMLADHX, vqdmladhx)
+DO_2OP(VQRDMLADH, vqrdmladh)
+DO_2OP(VQRDMLADHX, vqrdmladhx)
 
 static bool do_2op_scalar(DisasContext *s, arg_2scalar *a,
                           MVEGenTwoOpScalarFn fn)
-- 
2.20.1

Implement the MVE VQDMLSDH and VQRDMLSDH insns, which are
like VQDMLADH and VQRDMLADH except that products are subtracted
rather than added.
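
As a minimal standalone sketch (not the QEMU helper), one 16-bit result of VQDMLSDH/VQRDMLSDH can be modelled on do_vqdmladh_h() from the previous patch with the sum of products replaced by a difference. The names vqdmlsdh_h() and sat_clamp() are hypothetical, and sat_clamp() is assumed to behave like a plain clamp in place of do_sat_bhw().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for do_sat_bhw(): clamp v to [min, max]. */
static int64_t sat_clamp(int64_t v, int64_t min, int64_t max, bool *sat)
{
    if (v < min) {
        *sat = true;
        return min;
    }
    if (v > max) {
        *sat = true;
        return max;
    }
    return v;
}

/*
 * One 16-bit result of VQDMLSDH (round == 0) or VQRDMLSDH (round == 1):
 * subtract the second product from the first, double, add the rounding
 * constant, saturate to 32 bits and return the high half.
 */
static int16_t vqdmlsdh_h(int16_t a, int16_t b, int16_t c, int16_t d,
                          int round, bool *sat)
{
    int64_t r = ((int64_t)a * b - (int64_t)c * d) * 2
                + ((int64_t)round << 15);
    return (int16_t)(sat_clamp(r, INT32_MIN, INT32_MAX, sat) >> 16);
}
```

The only difference from the VQDMLADH lane function is the sign of the second product; the doubling, rounding and saturation steps are unchanged.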

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-38-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    | 16 ++++++++++++++
 target/arm/mve.decode      |  5 +++++
 target/arm/mve_helper.c    | 44 ++++++++++++++++++++++++++++++++++++++
 target/arm/translate-mve.c |  4 ++++
 4 files changed, 69 insertions(+)

Implement the vector form of the MVE VQDMULL insn.
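
A rough sketch of what the 16-bit vector form computes, assuming the bottom/top element selection of VQDMULLB/VQDMULLT: each result is the doubled 32-bit product of one even (bottom) or odd (top) input element pair. The function names are hypothetical, and predication and host-endian element ordering are ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Saturating doubling multiply long for one 16x16->32 lane. The only
 * input pair that can overflow is INT16_MIN * INT16_MIN, whose doubled
 * product is 2^31.
 */
static int32_t qdmull_h(int16_t a, int16_t b, bool *sat)
{
    int64_t r = (int64_t)a * b * 2;
    if (r > INT32_MAX) {
        *sat = true;
        return INT32_MAX;
    }
    return (int32_t)r;
}

/*
 * Consume the even ("bottom") or odd ("top") 16-bit elements of n and m
 * and write four 32-bit results. Returns true if any lane saturated,
 * which the real insn would record in FPSCR.QC.
 */
static bool vqdmull_h(int32_t d[4], const int16_t n[8], const int16_t m[8],
                      bool top)
{
    bool qc = false;
    for (int i = 0; i < 4; i++) {
        int e = 2 * i + (top ? 1 : 0);
        d[i] = qdmull_h(n[e], m[e], &qc);
    }
    return qc;
}
```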

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-39-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    |  5 +++++
 target/arm/mve.decode      |  5 +++++
 target/arm/mve_helper.c    | 30 ++++++++++++++++++++++++++++++
 target/arm/translate-mve.c | 30 ++++++++++++++++++++++++++++++
 4 files changed, 70 insertions(+)

Implement the MVE VRHADD insn, which performs a rounded halving
addition.
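
A minimal sketch of a rounded halving add on 8-bit lanes: the operands are widened before computing (a + b + 1) >> 1 so the intermediate sum cannot overflow the element type. The function names are hypothetical; the signed version assumes an arithmetic right shift of negative values, as GCC and Clang provide.

```c
#include <stdint.h>

/* Signed rounded halving add: widen, add with rounding bit, halve. */
static int8_t rhadd_s8(int8_t a, int8_t b)
{
    return (int8_t)(((int32_t)a + b + 1) >> 1);
}

/* Unsigned rounded halving add. */
static uint8_t rhadd_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(((uint32_t)a + b + 1) >> 1);
}
```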

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-40-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    | 8 ++++++++
 target/arm/mve.decode      | 3 +++
 target/arm/mve_helper.c    | 6 ++++++
 target/arm/translate-mve.c | 2 ++
 4 files changed, 19 insertions(+)

Implement the MVE VADC and VSBC insns.  These perform an
add-with-carry or subtract-with-carry of the 32-bit elements in each
lane of the input vectors, where the carry-out of each add is the
carry-in of the next.  The initial carry input is either a fixed value
(for the VADCI and VSBCI forms) or is taken from FPSCR.C; the carry out
at the end is written back to FPSCR.C.
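
The carry chain can be sketched as a 128-bit add or subtract split into four 32-bit lanes. This is a minimal illustration, not the QEMU helper: vadc_chain() is a hypothetical name, subtraction is modelled as n + ~m + carry, the caller supplies the initial carry, and predication is ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Add (or, with subtract set, compute n + ~m + carry) across four 32-bit
 * lanes, feeding the carry out of each lane into the next. Returns the
 * final carry, which the real insn writes back to FPSCR.C.
 */
static bool vadc_chain(uint32_t d[4], const uint32_t n[4],
                       const uint32_t m[4], bool carry, bool subtract)
{
    for (int i = 0; i < 4; i++) {
        uint32_t mv = subtract ? ~m[i] : m[i];
        uint64_t sum = (uint64_t)n[i] + mv + carry;
        d[i] = (uint32_t)sum;
        carry = (sum >> 32) != 0;
    }
    return carry;
}
```

With subtract set and an initial carry of 1 this is an ordinary two's-complement subtraction, where a final carry of 0 indicates a borrow.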

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-41-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    |  5 ++++
 target/arm/mve.decode      |  5 ++++
 target/arm/mve_helper.c    | 52 ++++++++++++++++++++++++++++++++++++++
 target/arm/translate-mve.c | 37 +++++++++++++++++++++++++++
 4 files changed, 99 insertions(+)

Implement the MVE VCADD insn, which performs a complex add with
rotate.  Note that the size=0b11 encoding is VSBC.

The architecture grants some leeway in the case where the destination
overlaps the Vm source and the element size is MO_32, but we choose
not to make use of it: we always calculate all 16 bytes' worth of
results before writing the destination register.
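
A minimal sketch of the complex add with rotate on 32-bit lanes, assuming each pair of lanes holds one (real, imaginary) complex number: rotating m by 90 degrees gives (n.re - m.im, n.im + m.re), and by 270 degrees gives (n.re + m.im, n.im - m.re). The function name is hypothetical. Results are built in a temporary first, mirroring the choice above, so d may alias n or m.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Non-saturating complex add with rotate; unsigned types give wrapping. */
static void vcadd32(uint32_t d[4], const uint32_t n[4], const uint32_t m[4],
                    bool rot90)
{
    uint32_t r[4];
    for (int e = 0; e < 4; e += 2) {
        if (rot90) {
            r[e] = n[e] - m[e + 1];     /* real: n.re - m.im */
            r[e + 1] = n[e + 1] + m[e]; /* imag: n.im + m.re */
        } else {
            r[e] = n[e] + m[e + 1];     /* real: n.re + m.im */
            r[e + 1] = n[e + 1] - m[e]; /* imag: n.im - m.re */
        }
    }
    memcpy(d, r, sizeof(r)); /* write the destination only at the end */
}
```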

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-42-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    |  8 ++++++++
 target/arm/mve.decode      |  9 +++++++--
 target/arm/mve_helper.c    | 29 +++++++++++++++++++++++++++++
 target/arm/translate-mve.c |  7 +++++++
 4 files changed, 51 insertions(+), 2 deletions(-)

Implement the MVE VHCADD insn, which is similar to VCADD
but halves each result. Its size=0b11 encoding is VADC.
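
As a rough sketch, assuming signed 32-bit elements, the halving variant performs the same rotated complex add, but each sum or difference is formed at 64-bit width and then shifted right by one, so it cannot overflow. All names are hypothetical, and the shift assumes the arithmetic right shift of negative values that GCC and Clang provide.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Widen to 64 bits so the sum or difference cannot overflow before halving. */
static int32_t hadd32(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a + b) >> 1);
}

static int32_t hsub32(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a - b) >> 1);
}

/* Halving complex add with rotate on (real, imaginary) lane pairs. */
static void vhcadd32(int32_t d[4], const int32_t n[4], const int32_t m[4],
                     bool rot90)
{
    int32_t r[4];
    for (int e = 0; e < 4; e += 2) {
        if (rot90) {
            r[e] = hsub32(n[e], m[e + 1]);
            r[e + 1] = hadd32(n[e + 1], m[e]);
        } else {
            r[e] = hadd32(n[e], m[e + 1]);
            r[e + 1] = hsub32(n[e + 1], m[e]);
        }
    }
    memcpy(d, r, sizeof(r));
}
```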

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-43-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    | 8 ++++++++
 target/arm/mve.decode      | 8 ++++++--
 target/arm/mve_helper.c    | 2 ++
 target/arm/translate-mve.c | 4 +++-
 4 files changed, 19 insertions(+), 3 deletions(-)

Implement the MVE VADDV insn, which performs an addition
across vector lanes.
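
A minimal sketch of an add across lanes for signed 16-bit elements. It assumes a predicate mask with one bit per byte of the vector, consumed the same way as the mve_element_mask() loops in the helpers above (test the low bit, then shift by the element size). vaddv_s16() is a hypothetical name, and the 32-bit accumulator wraps modulo 2^32.

```c
#include <stdint.h>

/* Sum the active signed 16-bit lanes of m into a 32-bit accumulator. */
static uint32_t vaddv_s16(const int16_t m[8], uint16_t mask, uint32_t acc)
{
    for (int e = 0; e < 8; e++, mask >>= 2) {
        if (mask & 1) {
            /* sign-extend the lane, then add with 32-bit wraparound */
            acc += (uint32_t)(int32_t)m[e];
        }
    }
    return acc;
}
```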

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-44-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    |  7 +++++++
 target/arm/mve.decode      |  2 ++
 target/arm/mve_helper.c    | 24 +++++++++++++++++++++
 target/arm/translate-mve.c | 43 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 76 insertions(+)

In a CPU with MVE, the VMOV (vector lane to general-purpose register)
and VMOV (general-purpose register to vector lane) insns are not
predicated, but they are subject to beatwise execution if they
are not in an IT block.

Since our implementation always executes all 4 beats in one tick,
this means only that we need to handle PSR.ECI:
 * we must do the usual check for bad ECI state
 * we must advance ECI state if the insn succeeds
 * if ECI says we should not be executing the beat corresponding
   to the lane of the vector register being accessed then we
   should skip performing the move

Note that if PSR.ECI is non-zero then we cannot be in an IT block.
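
The skip decision can be sketched as a comparison between the beat that holds the accessed bytes and the number of beats that ECI says have already completed. This is a hypothetical standalone model of the mve_skip_vmov() logic below: each beat covers 4 bytes of the Q register, and beats_done is 0 for ECI_NONE, 1 for ECI_A0, 2 for ECI_A0A1, and 3 for ECI_A0A1A2 and ECI_A0A1A2B0.

```c
#include <stdbool.h>

/* Byte offset of element 'index' (of 1 << size bytes) of D register vn,
 * measured within the Q register that contains it. */
static int lane_byte_offset(int vn, int index, int size)
{
    return (index << size) + (vn & 1) * 8;
}

/* Skip the move if the beat covering byte_ofs has already executed. */
static bool skip_lane_move(int byte_ofs, int beats_done)
{
    return byte_ofs / 4 < beats_done;
}
```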

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-45-peter.maydell@linaro.org
---
 target/arm/translate-a32.h |  2 +
 target/arm/translate-mve.c |  4 +-
 target/arm/translate-vfp.c | 77 +++++++++++++++++++++++++++++++++++---
 3 files changed, 75 insertions(+), 8 deletions(-)

diff --git a/target/arm/translate-a32.h b/target/arm/translate-a32.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a32.h
+++ b/target/arm/translate-a32.h
@@ -XXX,XX +XXX,XX @@ long neon_full_reg_offset(unsigned reg);
 long neon_element_offset(int reg, int element, MemOp memop);
 void gen_rev16(TCGv_i32 dest, TCGv_i32 var);
 void clear_eci_state(DisasContext *s);
+bool mve_eci_check(DisasContext *s);
+void mve_update_and_store_eci(DisasContext *s);
 
 static inline TCGv_i32 load_cpu_offset(int offset)
 {
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ static bool mve_check_qreg_bank(DisasContext *s, int qmask)
     return qmask < 8;
 }
 
-static bool mve_eci_check(DisasContext *s)
+bool mve_eci_check(DisasContext *s)
 {
     /*
      * This is a beatwise insn: check that ECI is valid (not a
@@ -XXX,XX +XXX,XX @@ static void mve_update_eci(DisasContext *s)
     }
 }
 
-static void mve_update_and_store_eci(DisasContext *s)
+void mve_update_and_store_eci(DisasContext *s)
 {
     /*
      * For insns which don't call a helper function that will call
diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c
+++ b/target/arm/translate-vfp.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
     return true;
 }
 
+static bool mve_skip_vmov(DisasContext *s, int vn, int index, int size)
+{
+    /*
+     * In a CPU with MVE, the VMOV (vector lane to general-purpose register)
+     * and VMOV (general-purpose register to vector lane) insns are not
+     * predicated, but they are subject to beatwise execution if they are
+     * not in an IT block.
+     *
+     * Since our implementation always executes all 4 beats in one tick,
+     * this means only that if PSR.ECI says we should not be executing
+     * the beat corresponding to the lane of the vector register being
+     * accessed then we should skip performing the move, and that we need
+     * to do the usual check for bad ECI state and advance of ECI state.
+     *
+     * Note that if PSR.ECI is non-zero then we cannot be in an IT block.
+     *
+     * Return true if this VMOV scalar <-> gpreg should be skipped because
+     * the MVE PSR.ECI state says we skip the beat where the store happens.
+     */
+
+    /* Calculate the byte offset into Qn which we're going to access */
+    int ofs = (index << size) + ((vn & 1) * 8);
+
+    if (!dc_isar_feature(aa32_mve, s)) {
+        return false;
+    }
+
+    switch (s->eci) {
+    case ECI_NONE:
+        return false;
+    case ECI_A0:
+        return ofs < 4;
+    case ECI_A0A1:
+        return ofs < 8;
+    case ECI_A0A1A2:
+    case ECI_A0A1A2B0:
+        return ofs < 12;
+    default:
+        g_assert_not_reached();
+    }
+}
+
 static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
 {
     /* VMOV scalar to general purpose register */
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
         return false;
     }
 
+    if (dc_isar_feature(aa32_mve, s)) {
+        if (!mve_eci_check(s)) {
+            return true;
+        }
+    }
+
     if (!vfp_access_check(s)) {
         return true;
     }
 
-    tmp = tcg_temp_new_i32();
-    read_neon_element32(tmp, a->vn, a->index, a->size | (a->u ? 0 : MO_SIGN));
-    store_reg(s, a->rt, tmp);
+    if (!mve_skip_vmov(s, a->vn, a->index, a->size)) {
+        tmp = tcg_temp_new_i32();
+        read_neon_element32(tmp, a->vn, a->index,
+                            a->size | (a->u ? 0 : MO_SIGN));
+        store_reg(s, a->rt, tmp);
+    }
 
+    if (dc_isar_feature(aa32_mve, s)) {
+        mve_update_and_store_eci(s);
+    }
     return true;
 }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
         return false;
     }
 
+    if (dc_isar_feature(aa32_mve, s)) {
+        if (!mve_eci_check(s)) {
+            return true;
+        }
+    }
+
     if (!vfp_access_check(s)) {
         return true;
     }
 
-    tmp = load_reg(s, a->rt);
-    write_neon_element32(tmp, a->vn, a->index, a->size);
-    tcg_temp_free_i32(tmp);
+    if (!mve_skip_vmov(s, a->vn, a->index, a->size)) {
+        tmp = load_reg(s, a->rt);
+        write_neon_element32(tmp, a->vn, a->index, a->size);
+        tcg_temp_free_i32(tmp);
+    }
 
+    if (dc_isar_feature(aa32_mve, s)) {
+        mve_update_and_store_eci(s);
+    }
     return true;
 }
 
-- 
2.20.1

From: Peter Collingbourne <pcc@google.com>

MTE3 introduces an asymmetric tag checking mode, in which loads are
checked synchronously and stores are checked asynchronously. Add
support for it.
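
The resulting TCF handling can be summarised as a small decision function mirroring the switch in mte_check_fail() below. This is a hypothetical sketch, not QEMU code: value 0 is modelled as "no effect on the PE", and value 3, reserved before MTE3, selects synchronous checks for loads and asynchronous checks for stores.

```c
#include <stdbool.h>

enum tag_fail_action { TAG_FAIL_IGNORE, TAG_FAIL_SYNC, TAG_FAIL_ASYNC };

/* Map an SCTLR_ELx.TCF/TCF0 value and access direction to an action. */
static enum tag_fail_action tcf_action(int tcf, bool is_write)
{
    switch (tcf) {
    case 1:
        return TAG_FAIL_SYNC;
    case 2:
        return TAG_FAIL_ASYNC;
    case 3:
        /* MTE3 asymmetric mode */
        return is_write ? TAG_FAIL_ASYNC : TAG_FAIL_SYNC;
    default:
        return TAG_FAIL_IGNORE;
    }
}
```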

Signed-off-by: Peter Collingbourne <pcc@google.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210616195614.11785-1-pcc@google.com
[PMM: Add line to emulation.rst]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 docs/system/arm/emulation.rst |  1 +
 target/arm/cpu64.c            |  2 +-
 target/arm/mte_helper.c       | 82 ++++++++++++++++++++++-------------
 3 files changed, 53 insertions(+), 32 deletions(-)

diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/arm/emulation.rst
+++ b/docs/system/arm/emulation.rst
@@ -XXX,XX +XXX,XX @@ the following architecture extensions:
 - FEAT_LSE (Large System Extensions)
 - FEAT_MTE (Memory Tagging Extension)
 - FEAT_MTE2 (Memory Tagging Extension)
+- FEAT_MTE3 (MTE Asymmetric Fault Handling)
 - FEAT_PAN (Privileged access never)
 - FEAT_PAN2 (AT S1E1R and AT S1E1W instruction variants affected by PSTATE.PAN)
 - FEAT_PAuth (Pointer authentication)
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
          * during realize if the board provides no tag memory, much like
          * we do for EL2 with the virtualization=on property.
          */
-        t = FIELD_DP64(t, ID_AA64PFR1, MTE, 2);
+        t = FIELD_DP64(t, ID_AA64PFR1, MTE, 3);
         cpu->isar.id_aa64pfr1 = t;
 
         t = cpu->isar.id_aa64mmfr0;
diff --git a/target/arm/mte_helper.c b/target/arm/mte_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mte_helper.c
+++ b/target/arm/mte_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(stzgm_tags)(CPUARMState *env, uint64_t ptr, uint64_t val)
     }
 }
 
+static void mte_sync_check_fail(CPUARMState *env, uint32_t desc,
+                                uint64_t dirty_ptr, uintptr_t ra)
+{
+    int is_write, syn;
+
+    env->exception.vaddress = dirty_ptr;
+
+    is_write = FIELD_EX32(desc, MTEDESC, WRITE);
+    syn = syn_data_abort_no_iss(arm_current_el(env) != 0, 0, 0, 0, 0, is_write,
+                                0x11);
+    raise_exception_ra(env, EXCP_DATA_ABORT, syn, exception_target_el(env), ra);
+    g_assert_not_reached();
+}
+
+static void mte_async_check_fail(CPUARMState *env, uint64_t dirty_ptr,
+                                 uintptr_t ra, ARMMMUIdx arm_mmu_idx, int el)
+{
+    int select;
+
+    if (regime_has_2_ranges(arm_mmu_idx)) {
+        select = extract64(dirty_ptr, 55, 1);
+    } else {
+        select = 0;
+    }
+    env->cp15.tfsr_el[el] |= 1 << select;
+#ifdef CONFIG_USER_ONLY
+    /*
+     * Stand in for a timer irq, setting _TIF_MTE_ASYNC_FAULT,
+     * which then sends a SIGSEGV when the thread is next scheduled.
+     * This cpu will return to the main loop at the end of the TB,
+     * which is rather sooner than "normal".  But the alternative
+     * is waiting until the next syscall.
+     */
+    qemu_cpu_kick(env_cpu(env));
+#endif
+}
+
 /* Record a tag check failure.  */
 static void mte_check_fail(CPUARMState *env, uint32_t desc,
                            uint64_t dirty_ptr, uintptr_t ra)
 {
     int mmu_idx = FIELD_EX32(desc, MTEDESC, MIDX);
     ARMMMUIdx arm_mmu_idx = core_to_aa64_mmu_idx(mmu_idx);
-    int el, reg_el, tcf, select, is_write, syn;
+    int el, reg_el, tcf;
     uint64_t sctlr;
 
     reg_el = regime_el(env, arm_mmu_idx);
@@ -XXX,XX +XXX,XX @@ static void mte_check_fail(CPUARMState *env, uint32_t desc,
     switch (tcf) {
     case 1:
         /* Tag check fail causes a synchronous exception. */
-        env->exception.vaddress = dirty_ptr;
-
-        is_write = FIELD_EX32(desc, MTEDESC, WRITE);
-        syn = syn_data_abort_no_iss(arm_current_el(env) != 0, 0, 0, 0, 0,
-                                    is_write, 0x11);
-        raise_exception_ra(env, EXCP_DATA_ABORT, syn,
-                           exception_target_el(env), ra);
-        /* noreturn, but fall through to the assert anyway */
+        mte_sync_check_fail(env, desc, dirty_ptr, ra);
+        break;
 
     case 0:
         /*
@@ -XXX,XX +XXX,XX @@ static void mte_check_fail(CPUARMState *env, uint32_t desc,
 
     case 2:
         /* Tag check fail causes asynchronous flag set.  */
-        if (regime_has_2_ranges(arm_mmu_idx)) {
-            select = extract64(dirty_ptr, 55, 1);
-        } else {
-            select = 0;
-        }
-        env->cp15.tfsr_el[el] |= 1 << select;
-#ifdef CONFIG_USER_ONLY
-        /*
-         * Stand in for a timer irq, setting _TIF_MTE_ASYNC_FAULT,
-         * which then sends a SIGSEGV when the thread is next scheduled.
-         * This cpu will return to the main loop at the end of the TB,
-         * which is rather sooner than "normal".  But the alternative
-         * is waiting until the next syscall.
-         */
-        qemu_cpu_kick(env_cpu(env));
-#endif
+        mte_async_check_fail(env, dirty_ptr, ra, arm_mmu_idx, el);
         break;
 
-    default:
-        /* Case 3: Reserved. */
-        qemu_log_mask(LOG_GUEST_ERROR,
-                      "Tag check failure with SCTLR_EL%d.TCF%s "
-                      "set to reserved value %d\n",
-                      reg_el, el ? "" : "0", tcf);
+    case 3:
+        /*
+         * Tag check fail causes asynchronous flag set for stores, or
+         * a synchronous exception for loads.
+         */
+        if (FIELD_EX32(desc, MTEDESC, WRITE)) {
+            mte_async_check_fail(env, dirty_ptr, ra, arm_mmu_idx, el);
+        } else {
+            mte_sync_check_fail(env, desc, dirty_ptr, ra);
+        }
         break;
     }
 }
-- 
2.20.1

From: Alexandre Iooss <erdnaxe@crans.org>

This adds the target guide for BBC Micro:bit.

Information is taken from https://wiki.qemu.org/Features/MicroBit
and from hw/arm/nrf51_soc.c.

Signed-off-by: Alexandre Iooss <erdnaxe@crans.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Message-id: 20210621075625.540471-1-erdnaxe@crans.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 docs/system/arm/nrf.rst    | 51 ++++++++++++++++++++++++++++++++++++++
 docs/system/target-arm.rst |  1 +
 MAINTAINERS                |  1 +
 3 files changed, 53 insertions(+)
 create mode 100644 docs/system/arm/nrf.rst

diff --git a/docs/system/arm/nrf.rst b/docs/system/arm/nrf.rst
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/docs/system/arm/nrf.rst
@@ -XXX,XX +XXX,XX @@
+Nordic nRF boards (``microbit``)
+================================
+
+The `Nordic nRF`_ chips are a family of ARM-based Systems-on-Chip
+designed for low-power, short-range wireless solutions.
+
+.. _Nordic nRF: https://www.nordicsemi.com/Products
+
+The nRF51 series is the first series for short-range wireless applications.
+It has since been superseded by the nRF52 series.
+The following machines are based on this chip:
+
+- ``microbit``       BBC micro:bit board with nRF51822 SoC
+
+Other series, such as nRF52, nRF53 and nRF91, are not currently
+supported by QEMU.
+
+Supported devices
+-----------------
+
+ * ARM Cortex-M0 (ARMv6-M)
+ * Serial ports (UART)
+ * Clock controller
+ * Timers
+ * Random Number Generator (RNG)
+ * GPIO controller
+ * Non-Volatile Memory Controller (NVMC)
+ * Software Interrupts (SWI)
+
+Missing devices
+---------------
+
+ * Watchdog
+ * Real-Time Clock (RTC) controller
+ * TWI (I2C)
+ * SPI controller
+ * Analog to Digital Converter (ADC)
+ * Quadrature decoder
+ * Radio
+
+Boot options
+------------
+
+The ``microbit`` machine can be started by using the ``-device loader`` option
+to load firmware in `ihex format`_. For example:
+
+.. _ihex format: https://en.wikipedia.org/wiki/Intel_HEX
+
+.. code-block:: bash
+
+  $ qemu-system-arm -M microbit -device loader,file=test.hex
diff --git a/docs/system/target-arm.rst b/docs/system/target-arm.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/target-arm.rst
+++ b/docs/system/target-arm.rst
@@ -XXX,XX +XXX,XX @@ undocumented; you can get a complete list by running
    arm/digic
    arm/musicpal
    arm/gumstix
+   arm/nrf
    arm/nseries
    arm/nuvoton
    arm/orangepi
diff --git a/MAINTAINERS b/MAINTAINERS
index XXXXXXX..XXXXXXX 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -XXX,XX +XXX,XX @@ F: hw/*/microbit*.c
 F: include/hw/*/nrf51*.h
 F: include/hw/*/microbit*.h
 F: tests/qtest/microbit-test.c
+F: docs/system/arm/nrf.rst
 
 AVR Machines
 -------------
-- 
2.20.1