Series comparison

-[PULL 00/28] target-arm queue
+[PULL 00/30] target-arm queue
-The following changes since commit 1ea06abceec61b6f3ab33dadb0510b6e09fb61e2:
+Hi; this is the latest target-arm queue. Most of the patches
 here are RTH's FEAT_HAFDBS finally landing. I've also included
 the RNG-seed randomization patches from Jason, as well as a few
 more minor things. The patches include a couple of regression
 fixes:
  * the resettable patch fixes a SCSI reset regression
  * the 'do not re-randomize on snapshot load' patches fix
    record-and-replay regressions
-  Merge remote-tracking branch 'remotes/berrange-gitlab/tags/misc-fixes-pull-request' into staging (2021-06-14 15:59:13 +0100)
+thanks
 -- PMM
 The following changes since commit e750a7ace492f0b450653d4ad368a77d6f660fb8:
   Merge tag 'pull-9p-20221024' of https://github.com/cschoenebeck/qemu into staging (2022-10-24 14:27:12 -0400)
 are available in the Git repository at:
-  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20210615
+  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20221025
-for you to fetch changes up to c611c956c7fdce651e30687b1f5d19b4cab78b6a:
+for you to fetch changes up to e2114f701c78f76246e4b1872639dad94a6bdd21:
-  include/qemu/int128.h: Add function to create Int128 from int64_t (2021-06-15 16:18:50 +0100)
+  rx: re-randomize rng-seed on reboot (2022-10-25 17:32:24 +0100)
 ----------------------------------------------------------------
 target-arm queue:
- * hw/intc/arm_gicv3_cpuif: Tolerate spurious EOIR writes
+ * Implement FEAT_E0PD
- * handle some UNALLOCATED decode cases correctly rather
+ * Implement FEAT_HAFDBS
-   than asserting
+ * honor HCR_E2H and HCR_TGE in arm_excp_unmasked()
- * hw: virt: consider hw_compat_6_0
+ * hw/arm/virt: Fix devicetree warnings about the virtio-iommu node
- * hw/arm: add quanta-gbs-bmc machine
+ * hw/core/resettable: fix reset level counting
- * hw/intc/armv7m_nvic: Remove stale comment
+ * hw/hyperv/hyperv.c: Use device_cold_reset() instead of device_legacy_reset()
- * arm, acpi: Remove dependency on presence of 'virt' board
+ * imx: reload cmp timer outside of the reload ptimer transaction
- * target/arm: Fix mte page crossing test
+ * x86: do not re-randomize RNG seed on snapshot load
- * hw/arm: quanta-q71l add pca954x muxes
+ * m68k/virt: do not re-randomize RNG seed on snapshot load
- * target/arm: First few parts of MVE support
+ * m68k/q800: do not re-randomize RNG seed on snapshot load
  * arm: re-randomize rng-seed on reboot
  * riscv: re-randomize rng-seed on reboot
  * mips/boston: re-randomize rng-seed on reboot
  * openrisc: re-randomize rng-seed on reboot
  * rx: re-randomize rng-seed on reboot
 ----------------------------------------------------------------
-Heinrich Schuchardt (1):
+Ake Koomsin (1):
-      hw: virt: consider hw_compat_6_0
+      target/arm: honor HCR_E2H and HCR_TGE in arm_excp_unmasked()
 Axel Heider (1):
       target/imx: reload cmp timer outside of the reload ptimer transaction
 Damien Hedde (1):
       hw/core/resettable: fix reset level counting
 Jason A. Donenfeld (10):
       reset: allow registering handlers that aren't called by snapshot loading
       device-tree: add re-randomization helper function
       x86: do not re-randomize RNG seed on snapshot load
       arm: re-randomize rng-seed on reboot
       riscv: re-randomize rng-seed on reboot
       m68k/virt: do not re-randomize RNG seed on snapshot load
       m68k/q800: do not re-randomize RNG seed on snapshot load
       mips/boston: re-randomize rng-seed on reboot
       openrisc: re-randomize rng-seed on reboot
       rx: re-randomize rng-seed on reboot
 Jean-Philippe Brucker (1):
-      hw/intc/arm_gicv3_cpuif: Tolerate spurious EOIR writes
+      hw/arm/virt: Fix devicetree warnings about the virtio-iommu node
-Patrick Venture (5):
+Peter Maydell (2):
-      hw/arm: add quanta-gbs-bmc machine
+      target/arm: Implement FEAT_E0PD
-      hw/arm: quanta-gbs-bmc add i2c comments
+      hw/hyperv/hyperv.c: Use device_cold_reset() instead of device_legacy_reset()
       hw/arm: gsj add i2c comments
       hw/arm: gsj add pca9548
       hw/arm: quanta-q71l add pca954x muxes
-Peter Maydell (17):
+Richard Henderson (14):
-      hw/intc/armv7m_nvic: Remove stale comment
+      target/arm: Introduce regime_is_stage2
-      hw/acpi: Provide stub version of acpi_ghes_record_errors()
+      target/arm: Add ptw_idx to S1Translate
-      hw/acpi: Provide function acpi_ghes_present()
+      target/arm: Add isar predicates for FEAT_HAFDBS
-      target/arm: Use acpi_ghes_present() to see if we report ACPI memory errors
+      target/arm: Extract HA and HD in aa64_va_parameters
-      target/arm: Provide and use H8 and H1_8 macros
+      target/arm: Move S1_ptw_translate outside arm_ld[lq]_ptw
-      target/arm: Enable FPSCR.QC bit for MVE
+      target/arm: Add ARMFault_UnsuppAtomicUpdate
-      target/arm: Handle VPR semantics in existing code
+      target/arm: Remove loop from get_phys_addr_lpae
-      target/arm: Add handling for PSR.ECI/ICI
+      target/arm: Fix fault reporting in get_phys_addr_lpae
-      target/arm: Let vfp_access_check() handle late NOCP checks
+      target/arm: Don't shift attrs in get_phys_addr_lpae
-      target/arm: Implement MVE LCTP
+      target/arm: Consider GP an attribute in get_phys_addr_lpae
-      target/arm: Implement MVE WLSTP insn
+      target/arm: Tidy merging of attributes from descriptor and table
-      target/arm: Implement MVE DLSTP
+      target/arm: Implement FEAT_HAFDBS, access flag portion
-      target/arm: Implement MVE LETP insn
+      target/arm: Implement FEAT_HAFDBS, dirty bit portion
-      target/arm: Add framework for MVE decode
+      target/arm: Use the max page size in a 2-stage ptw
       target/arm: Move expand_pred_b() data to vec_helper.c
       bitops.h: Provide hswap32(), hswap64(), wswap64() swapping operations
       include/qemu/int128.h: Add function to create Int128 from int64_t
-Richard Henderson (4):
+ docs/devel/reset.rst          |   8 +-
-      target/arm: Diagnose UNALLOCATED in disas_simd_two_reg_misc_fp16
+ docs/system/arm/emulation.rst |   2 +
-      target/arm: Remove fprintf from disas_simd_mod_imm
+ qapi/run-state.json           |   6 +-
-      target/arm: Diagnose UNALLOCATED in disas_simd_three_reg_same_fp16
+ include/hw/boards.h           |   2 +-
-      target/arm: Fix mte page crossing test
+ include/sysemu/device_tree.h  |   9 +
+ include/sysemu/reset.h        |   5 +-
- include/hw/acpi/ghes.h            |   9 +
+ target/arm/cpu.h              |  15 ++
- include/qemu/bitops.h             |  29 +++
+ target/arm/internals.h        |  30 +++
- include/qemu/int128.h             |  10 +
+ hw/arm/aspeed.c               |   4 +-
- target/arm/translate-a32.h        |   2 +
+ hw/arm/boot.c                 |   2 +
- target/arm/translate.h            |   9 +
+ hw/arm/mps2-tz.c              |   4 +-
- target/arm/vec_internal.h         |   9 +
+ hw/arm/virt.c                 |   5 +-
- target/arm/mve.decode             |  20 ++
+ hw/core/reset.c               |  17 +-
- target/arm/t32.decode             |  15 +-
+ hw/core/resettable.c          |   3 +-
- hw/acpi/ghes-stub.c               |  22 +++
+ hw/hppa/machine.c             |   4 +-
- hw/acpi/ghes.c                    |  17 ++
+ hw/hyperv/hyperv.c            |   2 +-
- hw/arm/aspeed.c                   |  11 +-
+ hw/i386/microvm.c             |   4 +-
- hw/arm/npcm7xx_boards.c           | 107 ++++++++++-
+ hw/i386/pc.c                  |   6 +-
- hw/arm/virt.c                     |   2 +
+ hw/i386/x86.c                 |   2 +-
- hw/intc/arm_gicv3_cpuif.c         |   5 +-
+ hw/m68k/q800.c                |  33 ++-
- hw/intc/armv7m_nvic.c             |   6 -
+ hw/m68k/virt.c                |  20 +-
- target/arm/kvm64.c                |   6 +-
+ hw/mips/boston.c              |   3 +
- target/arm/m_helper.c             |  54 +++++-
+ hw/openrisc/boot.c            |   3 +
- target/arm/mte_helper.c           |   2 +-
+ hw/ppc/pegasos2.c             |   4 +-
- target/arm/sve_helper.c           | 381 +++++++++++++-------------------------
+ hw/ppc/pnv.c                  |   4 +-
- target/arm/translate-a64.c        |  87 +++++----
+ hw/ppc/spapr.c                |   4 +-
- target/arm/translate-m-nocp.c     |  16 +-
+ hw/riscv/boot.c               |   3 +
- target/arm/translate-mve.c        |  29 +++
+ hw/rx/rx-gdbsim.c             |   3 +
- target/arm/translate-vfp.c        |  65 +++++--
+ hw/s390x/s390-virtio-ccw.c    |   4 +-
- target/arm/translate.c            | 300 ++++++++++++++++++++++++++++--
+ hw/timer/imx_epit.c           |   9 +-
- target/arm/vec_helper.c           | 116 +++++++++++-
+ migration/savevm.c            |   2 +-
- target/arm/vfp_helper.c           |   3 +-
+ softmmu/device_tree.c         |  21 ++
- tests/tcg/aarch64/mte-7.c         |  31 ++++
+ softmmu/runstate.c            |  11 +-
- hw/acpi/meson.build               |   6 +-
+ target/arm/cpu.c              |  24 +-
- hw/arm/Kconfig                    |   2 +
+ target/arm/cpu64.c            |   2 +
- target/arm/meson.build            |   2 +
+ target/arm/helper.c           |  31 ++-
- tests/tcg/aarch64/Makefile.target |   2 +-
+ target/arm/ptw.c              | 524 +++++++++++++++++++++++++++---------------
-files changed, 1019 insertions(+), 356 deletions(-)
+files changed, 572 insertions(+), 263 deletions(-)
  create mode 100644 target/arm/mve.decode
  create mode 100644 hw/acpi/ghes-stub.c
  create mode 100644 target/arm/translate-mve.c
  create mode 100644 tests/tcg/aarch64/mte-7.c

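The cover letter's "do not re-randomize on snapshot load" idea can be pictured with a small sketch. This is not QEMU's reset API: `ResetReason`, `register_reset()`, `run_resets()` and `reseed()` are hypothetical names invented for illustration. The only thing taken from the series is the concept that a reset handler can be registered so that it is skipped when the reset is caused by loading a snapshot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reset reasons; not QEMU's actual enum. */
typedef enum {
    RESET_REASON_REBOOT,
    RESET_REASON_SNAPSHOT_LOAD,
} ResetReason;

typedef void ResetFn(void *opaque);

typedef struct {
    ResetFn *fn;
    void *opaque;
    bool skip_on_snapshot_load;   /* true for handlers that re-seed the RNG */
} ResetEntry;

#define MAX_RESET_ENTRIES 8
static ResetEntry reset_entries[MAX_RESET_ENTRIES];
static size_t reset_count;

/* Register a handler, optionally marking it as "not on snapshot load". */
static void register_reset(ResetFn *fn, void *opaque,
                           bool skip_on_snapshot_load)
{
    assert(reset_count < MAX_RESET_ENTRIES);
    reset_entries[reset_count++] =
        (ResetEntry){ fn, opaque, skip_on_snapshot_load };
}

/* Run every handler, except flagged ones when a snapshot is being loaded. */
static void run_resets(ResetReason reason)
{
    for (size_t i = 0; i < reset_count; i++) {
        const ResetEntry *e = &reset_entries[i];
        if (reason == RESET_REASON_SNAPSHOT_LOAD && e->skip_on_snapshot_load) {
            continue;
        }
        e->fn(e->opaque);
    }
}

/* Stand-in for rewriting the rng-seed property: just counts invocations. */
static void reseed(void *opaque)
{
    ++*(int *)opaque;
}
```

With a handler registered as `register_reset(reseed, &counter, true)`, a reboot bumps the counter while a snapshot load leaves it alone, so a replayed execution sees the same seed that was recorded.
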
-[PULL 27/28] bitops.h: Provide hswap32(), hswap64(), wswap64() swapping operations
+[PULL 01/30] target/arm: Implement FEAT_E0PD
-Currently the ARM SVE helper code defines locally some utility
+FEAT_E0PD adds new bits E0PD0 and E0PD1 to TCR_EL1, which allow the
-functions for swapping 16-bit halfwords within 32-bit or 64-bit
+OS to forbid EL0 access to half of the address space.  Since this is
-values and for swapping 32-bit words within 64-bit values,
+an EL0-specific variation on the existing TCR_ELx.{EPD0,EPD1}, we can
-parallel to the byte-swapping bswap16/32/64 functions.
+implement it entirely in aa64_va_parameters().
-We want these also for the ARM MVE code, and they're potentially
+This requires moving the existing regime_is_user() to internals.h
-generally useful for other targets, so move them to bitops.h.
+so that the code in helper.c can get at it.
 (We don't put them in bswap.h with the bswap* functions because
 they are implemented in terms of the rotate operations also
 defined in bitops.h, and including bitops.h from bswap.h seems
 better avoided.)
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20221021160131.3531787-1-peter.maydell@linaro.org
 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Message-id: 20210614151007.4545-17-peter.maydell@linaro.org
 ---
- include/qemu/bitops.h   | 29 +++++++++++++++++++++++++++++
+ docs/system/arm/emulation.rst |  1 +
- target/arm/sve_helper.c | 20 --------------------
+ target/arm/cpu.h              |  5 +++++
-files changed, 29 insertions(+), 20 deletions(-)
+ target/arm/internals.h        | 19 +++++++++++++++++++
  target/arm/cpu64.c            |  1 +
  target/arm/helper.c           |  9 +++++++++
  target/arm/ptw.c              | 19 -------------------
 files changed, 35 insertions(+), 19 deletions(-)
-diff --git a/include/qemu/bitops.h b/include/qemu/bitops.h
+diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
 index XXXXXXX..XXXXXXX 100644
---- a/include/qemu/bitops.h
+--- a/docs/system/arm/emulation.rst
-+++ b/include/qemu/bitops.h
++++ b/docs/system/arm/emulation.rst
-@@ -XXX,XX +XXX,XX @@ static inline uint64_t ror64(uint64_t word, unsigned int shift)
+@@ -XXX,XX +XXX,XX @@ the following architecture extensions:
-     return (word >> shift) | (word << ((64 - shift) & 63));
+ - FEAT_Debugv8p4 (Debug changes for v8.4)
  - FEAT_DotProd (Advanced SIMD dot product instructions)
  - FEAT_DoubleFault (Double Fault Extension)
 +- FEAT_E0PD (Preventing EL0 access to halves of address maps)
  - FEAT_ETS (Enhanced Translation Synchronization)
  - FEAT_FCMA (Floating-point complex number instructions)
  - FEAT_FHM (Floating-point half-precision multiplication instructions)
 diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.h
 +++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_lva(const ARMISARegisters *id)
      return FIELD_EX64(id->id_aa64mmfr2, ID_AA64MMFR2, VARANGE) != 0;
  }
-+/**
++static inline bool isar_feature_aa64_e0pd(const ARMISARegisters *id)
 + * hswap32 - swap 16-bit halfwords within a 32-bit value
 + * @h: value to swap
 + */
 +static inline uint32_t hswap32(uint32_t h)
 +{
-+    return rol32(h, 16);
++    return FIELD_EX64(id->id_aa64mmfr2, ID_AA64MMFR2, E0PD) != 0;
 +}
 +
-+/**
+ static inline bool isar_feature_aa64_tts2uxn(const ARMISARegisters *id)
-+ * hswap64 - swap 16-bit halfwords within a 64-bit value
+ {
-+ * @h: value to swap
+     return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, XNX) != 0;
-+ */
+diff --git a/target/arm/internals.h b/target/arm/internals.h
-+static inline uint64_t hswap64(uint64_t h)
+index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/internals.h
 +++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ static inline uint32_t regime_el(CPUARMState *env, ARMMMUIdx mmu_idx)
      }
  }
 +static inline bool regime_is_user(CPUARMState *env, ARMMMUIdx mmu_idx)
 +{
-+    uint64_t m = 0x0000ffff0000ffffull;
++    switch (mmu_idx) {
-+    h = rol64(h, 32);
++    case ARMMMUIdx_E20_0:
-+    return ((h & m) << 16) | ((h >> 16) & m);
++    case ARMMMUIdx_Stage1_E0:
 +    case ARMMMUIdx_MUser:
 +    case ARMMMUIdx_MSUser:
 +    case ARMMMUIdx_MUserNegPri:
 +    case ARMMMUIdx_MSUserNegPri:
 +        return true;
 +    default:
 +        return false;
 +    case ARMMMUIdx_E10_0:
 +    case ARMMMUIdx_E10_1:
 +    case ARMMMUIdx_E10_1_PAN:
 +        g_assert_not_reached();
 +    }
 +}
 +
-+/**
+ /* Return the SCTLR value which controls this address translation regime */
-+ * wswap64 - swap 32-bit words within a 64-bit value
+ static inline uint64_t regime_sctlr(CPUARMState *env, ARMMMUIdx mmu_idx)
-+ * @h: value to swap
+ {
-+ */
+diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
-+static inline uint64_t wswap64(uint64_t h)
+index XXXXXXX..XXXXXXX 100644
-+{
+--- a/target/arm/cpu64.c
-+    return rol64(h, 32);
++++ b/target/arm/cpu64.c
-+}
+@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
      t = FIELD_DP64(t, ID_AA64MMFR2, FWB, 1);      /* FEAT_S2FWB */
      t = FIELD_DP64(t, ID_AA64MMFR2, TTL, 1);      /* FEAT_TTL */
      t = FIELD_DP64(t, ID_AA64MMFR2, BBM, 2);      /* FEAT_BBM at level 2 */
 +    t = FIELD_DP64(t, ID_AA64MMFR2, E0PD, 1);     /* FEAT_E0PD */
      cpu->isar.id_aa64mmfr2 = t;
      t = cpu->isar.id_aa64zfr0;
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
          ps = extract32(tcr, 16, 3);
          ds = extract64(tcr, 32, 1);
      } else {
 +        bool e0pd;
 +
- /**
+         /*
-  * extract32:
+          * Bit 55 is always between the two regions, and is canonical for
-  * @value: the value to extract the bit field from
+          * determining if address tagging is enabled.
-diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
+@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
              epd = extract32(tcr, 7, 1);
              sh = extract32(tcr, 12, 2);
              hpd = extract64(tcr, 41, 1);
 +            e0pd = extract64(tcr, 55, 1);
          } else {
              tsz = extract32(tcr, 16, 6);
              gran = tg1_to_gran_size(extract32(tcr, 30, 2));
              epd = extract32(tcr, 23, 1);
              sh = extract32(tcr, 28, 2);
              hpd = extract64(tcr, 42, 1);
 +            e0pd = extract64(tcr, 56, 1);
          }
          ps = extract64(tcr, 32, 3);
          ds = extract64(tcr, 59, 1);
 +
 +        if (e0pd && cpu_isar_feature(aa64_e0pd, cpu) &&
 +            regime_is_user(env, mmu_idx)) {
 +            epd = true;
 +        }
      }
      gran = sanitize_gran_size(cpu, gran, stage2);
 diff --git a/target/arm/ptw.c b/target/arm/ptw.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/sve_helper.c
+--- a/target/arm/ptw.c
-+++ b/target/arm/sve_helper.c
++++ b/target/arm/ptw.c
-@@ -XXX,XX +XXX,XX @@ static inline uint64_t expand_pred_s(uint8_t byte)
+@@ -XXX,XX +XXX,XX @@ static bool regime_translation_big_endian(CPUARMState *env, ARMMMUIdx mmu_idx)
-     return word[byte & 0x11];
+     return (regime_sctlr(env, mmu_idx) & SCTLR_EE) != 0;
  }
--/* Swap 16-bit words within a 32-bit word.  */
+-static bool regime_is_user(CPUARMState *env, ARMMMUIdx mmu_idx)
 -static inline uint32_t hswap32(uint32_t h)
 -{
--    return rol32(h, 16);
+-    switch (mmu_idx) {
 -    case ARMMMUIdx_E20_0:
 -    case ARMMMUIdx_Stage1_E0:
 -    case ARMMMUIdx_MUser:
 -    case ARMMMUIdx_MSUser:
 -    case ARMMMUIdx_MUserNegPri:
 -    case ARMMMUIdx_MSUserNegPri:
 -        return true;
 -    default:
 -        return false;
 -    case ARMMMUIdx_E10_0:
 -    case ARMMMUIdx_E10_1:
 -    case ARMMMUIdx_E10_1_PAN:
 -        g_assert_not_reached();
 -    }
 -}
 -
--/* Swap 16-bit words within a 64-bit word.  */
+ /* Return the TTBR associated with this translation regime */
--static inline uint64_t hswap64(uint64_t h)
+ static uint64_t regime_ttbr(CPUARMState *env, ARMMMUIdx mmu_idx, int ttbrn)
--{
+ {
 -    uint64_t m = 0x0000ffff0000ffffull;
 -    h = rol64(h, 32);
 -    return ((h & m) << 16) | ((h >> 16) & m);
 -}
 -
 -/* Swap 32-bit words within a 64-bit word.  */
 -static inline uint64_t wswap64(uint64_t h)
 -{
 -    return rol64(h, 32);
 -}
 -
  #define LOGICAL_PPPP(NAME, FUNC) \
  void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg, uint32_t desc)  \
  {                                                                         \
 --
-.20.1
+.25.1

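As a reading aid for the FEAT_E0PD hunks above, here is a minimal standalone sketch of the decision they add to aa64_va_parameters(). The bit positions (EPD0 = 7, EPD1 = 23, E0PD0 = 55, E0PD1 = 56) are the ones extracted in the patch. The helper names `bit64()` and `walk_disabled()` are hypothetical, and choosing the half of the address space by VA bit 55 is an assumption based on the comment visible in the hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Extract a single bit from a 64-bit value. */
static inline bool bit64(uint64_t value, unsigned pos)
{
    assert(pos < 64);
    return (value >> pos) & 1;
}

/*
 * Return true if translation table walks are disabled for this access.
 * tcr:       TCR_EL1-style control register value
 * va:        virtual address being translated
 * is_user:   the access comes from an EL0 (user) translation regime
 * have_e0pd: the CPU advertises FEAT_E0PD
 */
static bool walk_disabled(uint64_t tcr, uint64_t va,
                          bool is_user, bool have_e0pd)
{
    bool upper = bit64(va, 55);              /* which half of the VA space */
    bool epd = bit64(tcr, upper ? 23 : 7);   /* EPD1 : EPD0 */
    bool e0pd = bit64(tcr, upper ? 56 : 55); /* E0PD1 : E0PD0 */

    /* E0PD behaves like EPD, but only for EL0 accesses. */
    if (e0pd && have_e0pd && is_user) {
        epd = true;
    }
    return epd;
}
```

For example, with only E0PD0 set, `walk_disabled(1ull << 55, 0x1000, true, true)` is true for a user access to the lower half, while the same access from a privileged regime, or on a CPU without the feature, is unaffected.
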
-[PULL 05/28] hw: virt: consider hw_compat_6_0
+[PULL 02/30] hw/arm/virt: Fix devicetree warnings about the virtio-iommu node
-From: Heinrich Schuchardt <xypron.glpk@gmx.de>
+From: Jean-Philippe Brucker <jean-philippe@linaro.org>
-virt-6.0 must consider hw_compat_6_0.
+The "PCI Bus Binding to: IEEE Std 1275-1994" defines the compatible
 string for a PCIe bus or endpoint as "pci<vendorid>,<deviceid>" or
 similar. Since the initial binding for PCI virtio-iommu didn't follow
 this rule, it was modified to accept both strings and ensure backward
 compatibility. Also, the unit-name for the node should be
 "device,function".
-Fixes: da7e13c00b59 ("hw: add compat machines for 6.1")
+Fix corresponding dt-validate and dtc warnings:
-Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
-Reviewed-by: Cornelia Huck <cohuck@redhat.com>
+  pcie@10000000: virtio_iommu@16:compatible: ['virtio,pci-iommu'] does not contain items matching the given schema
-Message-id: 20210610183500.54207-1-xypron.glpk@gmx.de
+  pcie@10000000: Unevaluated properties are not allowed (... 'virtio_iommu@16' were unexpected)
   From schema: linux/Documentation/devicetree/bindings/pci/host-generic-pci.yaml
   virtio_iommu@16: compatible: 'oneOf' conditional failed, one must be fixed:
         ['virtio,pci-iommu'] is too short
         'pci1af4,1057' was expected
   From schema: dtschema/schemas/pci/pci-bus.yaml
   Warning (pci_device_reg): /pcie@10000000/virtio_iommu@16: PCI unit address format error, expected "2,0"
 Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/arm/virt.c | 2 ++
+ hw/arm/virt.c | 5 +++--
-file changed, 2 insertions(+)
+file changed, 3 insertions(+), 2 deletions(-)
 diff --git a/hw/arm/virt.c b/hw/arm/virt.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/virt.c
 +++ b/hw/arm/virt.c
-@@ -XXX,XX +XXX,XX @@ DEFINE_VIRT_MACHINE_AS_LATEST(6, 1)
+@@ -XXX,XX +XXX,XX @@ static void create_smmu(const VirtMachineState *vms,
- static void virt_machine_6_0_options(MachineClass *mc)
+ static void create_virtio_iommu_dt_bindings(VirtMachineState *vms)
  {
-+    virt_machine_6_1_options(mc);
+-    const char compat[] = "virtio,pci-iommu";
-+    compat_props_add(mc->compat_props, hw_compat_6_0, hw_compat_6_0_len);
++    const char compat[] = "virtio,pci-iommu\0pci1af4,1057";
- }
+     uint16_t bdf = vms->virtio_iommu_bdf;
- DEFINE_VIRT_MACHINE(6, 0)
+     MachineState *ms = MACHINE(vms);
+     char *node;
      vms->iommu_phandle = qemu_fdt_alloc_phandle(ms->fdt);
 -    node = g_strdup_printf("%s/virtio_iommu@%d", vms->pciehb_nodename, bdf);
 +    node = g_strdup_printf("%s/virtio_iommu@%x,%x", vms->pciehb_nodename,
 +                           PCI_SLOT(bdf), PCI_FUNC(bdf));
      qemu_fdt_add_subnode(ms->fdt, node);
      qemu_fdt_setprop(ms->fdt, node, "compatible", compat, sizeof(compat));
      qemu_fdt_setprop_sized_cells(ms->fdt, node, "reg",
 --
-.20.1
+.25.1

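The two changes in the virtio-iommu patch can be checked in isolation with a small sketch. It assumes the usual PCI devfn layout (slot in bits [7:3], function in bits [2:0]). `SLOT_OF`, `FUNC_OF`, `format_unit_name()` and `count_strings()` are hypothetical stand-ins, not QEMU functions. The sketch shows why passing `sizeof(compat)` stores both NUL-separated compatible strings, and how bdf 16 becomes the unit address "2,0" that dtc expects.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed PCI devfn encoding: slot in bits [7:3], function in bits [2:0]. */
#define SLOT_OF(devfn) (((devfn) >> 3) & 0x1f)
#define FUNC_OF(devfn) ((devfn) & 0x07)

/*
 * A devicetree string list is a run of NUL-terminated strings.
 * sizeof() counts the embedded NUL and the implicit final terminator,
 * so the whole list is covered when the size is passed along.
 */
static const char compat[] = "virtio,pci-iommu\0pci1af4,1057";

/* Build the node name with a "device,function" unit address in hex. */
static int format_unit_name(char *buf, size_t len, uint16_t bdf)
{
    return snprintf(buf, len, "virtio_iommu@%x,%x",
                    (unsigned)SLOT_OF(bdf), (unsigned)FUNC_OF(bdf));
}

/* Count the entries in a NUL-separated string list of the given size. */
static size_t count_strings(const char *list, size_t size)
{
    size_t n = 0;

    assert(size > 0 && list[size - 1] == '\0');
    for (size_t off = 0; off < size; off += strlen(list + off) + 1) {
        n++;
    }
    return n;
}
```

Using `strlen(compat) + 1` instead of `sizeof(compat)` would silently drop the second entry, which is why the property is set from the array size.
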
-[PULL 26/28] target/arm: Move expand_pred_b() data to vec_helper.c
+[PULL 03/30] target/arm: honor HCR_E2H and HCR_TGE in arm_excp_unmasked()
-For MVE, we want to re-use the large data table from expand_pred_b().
+From: Ake Koomsin <ake@igel.co.jp>
 Move the data table to vec_helper.c so it is no longer in an SVE
 specific source file.
+An exception targeting EL2 from lower EL is actually maskable when
+HCR_E2H and HCR_TGE are both set. This applies to both secure and
+non-secure Security state.
+We can remove the conditions that try to suppress masking of
+interrupts when we are Secure and the exception targets EL2 and
+Secure EL2 is disabled.  This is OK because in that situation
+arm_phys_excp_target_el() will never return 2 as the target EL.  The
+'not if secure' check in this function was originally written before
+arm_hcr_el2_eff(), and back then the target EL returned by
+arm_phys_excp_target_el() could be 2 even if we were in Secure
+EL0/EL1; but it is no longer needed.
+Signed-off-by: Ake Koomsin <ake@igel.co.jp>
+Message-id: 20221017092432.546881-1-ake@igel.co.jp
+[PMM: Add commit message paragraph explaining why it's OK to
+ remove the checks on secure and SCR_EEL2]
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210614151007.4545-14-peter.maydell@linaro.org
 ---
- target/arm/vec_internal.h |   3 ++
+ target/arm/cpu.c | 24 +++++++++++++++++-------
- target/arm/sve_helper.c   | 103 ++------------------------------------
+file changed, 17 insertions(+), 7 deletions(-)
  target/arm/vec_helper.c   | 102 +++++++++++++++++++++++++++++++++++++
 files changed, 109 insertions(+), 99 deletions(-)
-diff --git a/target/arm/vec_internal.h b/target/arm/vec_internal.h
+diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/vec_internal.h
+--- a/target/arm/cpu.c
-+++ b/target/arm/vec_internal.h
++++ b/target/arm/cpu.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static inline bool arm_excp_unmasked(CPUState *cs, unsigned int excp_idx,
- #define H8(x)   (x)
+     if ((target_el > cur_el) && (target_el != 1)) {
- #define H1_8(x) (x)
+         /* Exceptions targeting a higher EL may not be maskable */
+         if (arm_feature(env, ARM_FEATURE_AARCH64)) {
-+/* Data for expanding active predicate bits to bytes, for byte elements. */
+-            /*
-+extern const uint64_t expand_pred_b_data[256];
+-             * 64-bit masking rules are simple: exceptions to EL3
-+
+-             * can't be masked, and exceptions to EL2 can only be
- static inline void clear_tail(void *vd, uintptr_t opr_sz, uintptr_t max_sz)
+-             * masked from Secure state. The HCR and SCR settings
- {
+-             * don't affect the masking logic, only the interrupt routing.
-     uint64_t *d = vd + opr_sz;
+-             */
-diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
+-            if (target_el == 3 || !secure || (env->cp15.scr_el3 & SCR_EEL2)) {
-index XXXXXXX..XXXXXXX 100644
++            switch (target_el) {
---- a/target/arm/sve_helper.c
++            case 2:
-+++ b/target/arm/sve_helper.c
++                /*
-@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_predtest)(void *vd, void *vg, uint32_t words)
++                 * According to ARM DDI 0487H.a, an interrupt can be masked
-     return flags;
++                 * when HCR_E2H and HCR_TGE are both set regardless of the
- }
++                 * current Security state. Note that we need to revisit this
++                 * part again once we need to support NMI.
--/* Expand active predicate bits to bytes, for byte elements.
++                 */
-- *  for (i = 0; i < 256; ++i) {
++                if ((hcr_el2 & (HCR_E2H | HCR_TGE)) != (HCR_E2H | HCR_TGE)) {
-- *      unsigned long m = 0;
++                        unmasked = true;
-- *      for (j = 0; j < 8; j++) {
++                }
-- *          if ((i >> j) & 1) {
++                break;
-- *              m |= 0xfful << (j << 3);
++            case 3:
-- *          }
++                /* Interrupt cannot be masked when the target EL is 3 */
-- *      }
+                 unmasked = true;
-- *      printf("0x%016lx,\n", m);
++                break;
-- *  }
++            default:
-+/*
++                g_assert_not_reached();
-+ * Expand active predicate bits to bytes, for byte elements.
+             }
-+ * (The data table itself is in vec_helper.c as MVE also needs it.)
+         } else {
-  */
+             /*
  static inline uint64_t expand_pred_b(uint8_t byte)
  {
 -    static const uint64_t word[256] = {
 -        0x0000000000000000, 0x00000000000000ff, 0x000000000000ff00,
 -        0x000000000000ffff, 0x0000000000ff0000, 0x0000000000ff00ff,
 -        0x0000000000ffff00, 0x0000000000ffffff, 0x00000000ff000000,
 -        0x00000000ff0000ff, 0x00000000ff00ff00, 0x00000000ff00ffff,
 -        0x00000000ffff0000, 0x00000000ffff00ff, 0x00000000ffffff00,
 -        0x00000000ffffffff, 0x000000ff00000000, 0x000000ff000000ff,
 -        0x000000ff0000ff00, 0x000000ff0000ffff, 0x000000ff00ff0000,
 -        0x000000ff00ff00ff, 0x000000ff00ffff00, 0x000000ff00ffffff,
 -        0x000000ffff000000, 0x000000ffff0000ff, 0x000000ffff00ff00,
 -        0x000000ffff00ffff, 0x000000ffffff0000, 0x000000ffffff00ff,
 -        0x000000ffffffff00, 0x000000ffffffffff, 0x0000ff0000000000,
 -        0x0000ff00000000ff, 0x0000ff000000ff00, 0x0000ff000000ffff,
 -        0x0000ff0000ff0000, 0x0000ff0000ff00ff, 0x0000ff0000ffff00,
 -        0x0000ff0000ffffff, 0x0000ff00ff000000, 0x0000ff00ff0000ff,
 -        0x0000ff00ff00ff00, 0x0000ff00ff00ffff, 0x0000ff00ffff0000,
 -        0x0000ff00ffff00ff, 0x0000ff00ffffff00, 0x0000ff00ffffffff,
 -        0x0000ffff00000000, 0x0000ffff000000ff, 0x0000ffff0000ff00,
 -        0x0000ffff0000ffff, 0x0000ffff00ff0000, 0x0000ffff00ff00ff,
 -        0x0000ffff00ffff00, 0x0000ffff00ffffff, 0x0000ffffff000000,
 -        0x0000ffffff0000ff, 0x0000ffffff00ff00, 0x0000ffffff00ffff,
 -        0x0000ffffffff0000, 0x0000ffffffff00ff, 0x0000ffffffffff00,
 -        0x0000ffffffffffff, 0x00ff000000000000, 0x00ff0000000000ff,
 -        0x00ff00000000ff00, 0x00ff00000000ffff, 0x00ff000000ff0000,
 -        0x00ff000000ff00ff, 0x00ff000000ffff00, 0x00ff000000ffffff,
 -        0x00ff0000ff000000, 0x00ff0000ff0000ff, 0x00ff0000ff00ff00,
 -        0x00ff0000ff00ffff, 0x00ff0000ffff0000, 0x00ff0000ffff00ff,
 -        0x00ff0000ffffff00, 0x00ff0000ffffffff, 0x00ff00ff00000000,
 -        0x00ff00ff000000ff, 0x00ff00ff0000ff00, 0x00ff00ff0000ffff,
 -        0x00ff00ff00ff0000, 0x00ff00ff00ff00ff, 0x00ff00ff00ffff00,
 -        0x00ff00ff00ffffff, 0x00ff00ffff000000, 0x00ff00ffff0000ff,
 -        0x00ff00ffff00ff00, 0x00ff00ffff00ffff, 0x00ff00ffffff0000,
 -        0x00ff00ffffff00ff, 0x00ff00ffffffff00, 0x00ff00ffffffffff,
 -        0x00ffff0000000000, 0x00ffff00000000ff, 0x00ffff000000ff00,
 -        0x00ffff000000ffff, 0x00ffff0000ff0000, 0x00ffff0000ff00ff,
 -        0x00ffff0000ffff00, 0x00ffff0000ffffff, 0x00ffff00ff000000,
 -        0x00ffff00ff0000ff, 0x00ffff00ff00ff00, 0x00ffff00ff00ffff,
 -        0x00ffff00ffff0000, 0x00ffff00ffff00ff, 0x00ffff00ffffff00,
 -        0x00ffff00ffffffff, 0x00ffffff00000000, 0x00ffffff000000ff,
 -        0x00ffffff0000ff00, 0x00ffffff0000ffff, 0x00ffffff00ff0000,
 -        0x00ffffff00ff00ff, 0x00ffffff00ffff00, 0x00ffffff00ffffff,
 -        0x00ffffffff000000, 0x00ffffffff0000ff, 0x00ffffffff00ff00,
 -        0x00ffffffff00ffff, 0x00ffffffffff0000, 0x00ffffffffff00ff,
 -        0x00ffffffffffff00, 0x00ffffffffffffff, 0xff00000000000000,
 -        0xff000000000000ff, 0xff0000000000ff00, 0xff0000000000ffff,
 -        0xff00000000ff0000, 0xff00000000ff00ff, 0xff00000000ffff00,
 -        0xff00000000ffffff, 0xff000000ff000000, 0xff000000ff0000ff,
 -        0xff000000ff00ff00, 0xff000000ff00ffff, 0xff000000ffff0000,
 -        0xff000000ffff00ff, 0xff000000ffffff00, 0xff000000ffffffff,
 -        0xff0000ff00000000, 0xff0000ff000000ff, 0xff0000ff0000ff00,
 -        0xff0000ff0000ffff, 0xff0000ff00ff0000, 0xff0000ff00ff00ff,
 -        0xff0000ff00ffff00, 0xff0000ff00ffffff, 0xff0000ffff000000,
 -        0xff0000ffff0000ff, 0xff0000ffff00ff00, 0xff0000ffff00ffff,
 -        0xff0000ffffff0000, 0xff0000ffffff00ff, 0xff0000ffffffff00,
 -        0xff0000ffffffffff, 0xff00ff0000000000, 0xff00ff00000000ff,
 -        0xff00ff000000ff00, 0xff00ff000000ffff, 0xff00ff0000ff0000,
 -        0xff00ff0000ff00ff, 0xff00ff0000ffff00, 0xff00ff0000ffffff,
 -        0xff00ff00ff000000, 0xff00ff00ff0000ff, 0xff00ff00ff00ff00,
 -        0xff00ff00ff00ffff, 0xff00ff00ffff0000, 0xff00ff00ffff00ff,
 -        0xff00ff00ffffff00, 0xff00ff00ffffffff, 0xff00ffff00000000,
 -        0xff00ffff000000ff, 0xff00ffff0000ff00, 0xff00ffff0000ffff,
 -        0xff00ffff00ff0000, 0xff00ffff00ff00ff, 0xff00ffff00ffff00,
 -        0xff00ffff00ffffff, 0xff00ffffff000000, 0xff00ffffff0000ff,
 -        0xff00ffffff00ff00, 0xff00ffffff00ffff, 0xff00ffffffff0000,
 -        0xff00ffffffff00ff, 0xff00ffffffffff00, 0xff00ffffffffffff,
 -        0xffff000000000000, 0xffff0000000000ff, 0xffff00000000ff00,
 -        0xffff00000000ffff, 0xffff000000ff0000, 0xffff000000ff00ff,
 -        0xffff000000ffff00, 0xffff000000ffffff, 0xffff0000ff000000,
 -        0xffff0000ff0000ff, 0xffff0000ff00ff00, 0xffff0000ff00ffff,
 -        0xffff0000ffff0000, 0xffff0000ffff00ff, 0xffff0000ffffff00,
 -        0xffff0000ffffffff, 0xffff00ff00000000, 0xffff00ff000000ff,
 -        0xffff00ff0000ff00, 0xffff00ff0000ffff, 0xffff00ff00ff0000,
 -        0xffff00ff00ff00ff, 0xffff00ff00ffff00, 0xffff00ff00ffffff,
 -        0xffff00ffff000000, 0xffff00ffff0000ff, 0xffff00ffff00ff00,
 -        0xffff00ffff00ffff, 0xffff00ffffff0000, 0xffff00ffffff00ff,
 -        0xffff00ffffffff00, 0xffff00ffffffffff, 0xffffff0000000000,
 -        0xffffff00000000ff, 0xffffff000000ff00, 0xffffff000000ffff,
 -        0xffffff0000ff0000, 0xffffff0000ff00ff, 0xffffff0000ffff00,
 -        0xffffff0000ffffff, 0xffffff00ff000000, 0xffffff00ff0000ff,
 -        0xffffff00ff00ff00, 0xffffff00ff00ffff, 0xffffff00ffff0000,
 -        0xffffff00ffff00ff, 0xffffff00ffffff00, 0xffffff00ffffffff,
 -        0xffffffff00000000, 0xffffffff000000ff, 0xffffffff0000ff00,
 -        0xffffffff0000ffff, 0xffffffff00ff0000, 0xffffffff00ff00ff,
 -        0xffffffff00ffff00, 0xffffffff00ffffff, 0xffffffffff000000,
 -        0xffffffffff0000ff, 0xffffffffff00ff00, 0xffffffffff00ffff,
 -        0xffffffffffff0000, 0xffffffffffff00ff, 0xffffffffffffff00,
 -        0xffffffffffffffff,
 -    };
 -    return word[byte];
 +    return expand_pred_b_data[byte];
  }
  /* Similarly for half-word elements.
 diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vec_helper.c
 +++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@
  #include "qemu/int128.h"
  #include "vec_internal.h"
 +/*
 + * Data for expanding active predicate bits to bytes, for byte elements.
 + *
 + *  for (i = 0; i < 256; ++i) {
 + *      unsigned long m = 0;
 + *      for (j = 0; j < 8; j++) {
 + *          if ((i >> j) & 1) {
 + *              m |= 0xfful << (j << 3);
 + *          }
 + *      }
 + *      printf("0x%016lx,\n", m);
 + *  }
 + */
 +const uint64_t expand_pred_b_data[256] = {
 +    0x0000000000000000, 0x00000000000000ff, 0x000000000000ff00,
 +    0x000000000000ffff, 0x0000000000ff0000, 0x0000000000ff00ff,
 +    0x0000000000ffff00, 0x0000000000ffffff, 0x00000000ff000000,
 +    0x00000000ff0000ff, 0x00000000ff00ff00, 0x00000000ff00ffff,
 +    0x00000000ffff0000, 0x00000000ffff00ff, 0x00000000ffffff00,
 +    0x00000000ffffffff, 0x000000ff00000000, 0x000000ff000000ff,
 +    0x000000ff0000ff00, 0x000000ff0000ffff, 0x000000ff00ff0000,
 +    0x000000ff00ff00ff, 0x000000ff00ffff00, 0x000000ff00ffffff,
 +    0x000000ffff000000, 0x000000ffff0000ff, 0x000000ffff00ff00,
 +    0x000000ffff00ffff, 0x000000ffffff0000, 0x000000ffffff00ff,
 +    0x000000ffffffff00, 0x000000ffffffffff, 0x0000ff0000000000,
 +    0x0000ff00000000ff, 0x0000ff000000ff00, 0x0000ff000000ffff,
 +    0x0000ff0000ff0000, 0x0000ff0000ff00ff, 0x0000ff0000ffff00,
 +    0x0000ff0000ffffff, 0x0000ff00ff000000, 0x0000ff00ff0000ff,
 +    0x0000ff00ff00ff00, 0x0000ff00ff00ffff, 0x0000ff00ffff0000,
 +    0x0000ff00ffff00ff, 0x0000ff00ffffff00, 0x0000ff00ffffffff,
 +    0x0000ffff00000000, 0x0000ffff000000ff, 0x0000ffff0000ff00,
 +    0x0000ffff0000ffff, 0x0000ffff00ff0000, 0x0000ffff00ff00ff,
 +    0x0000ffff00ffff00, 0x0000ffff00ffffff, 0x0000ffffff000000,
 +    0x0000ffffff0000ff, 0x0000ffffff00ff00, 0x0000ffffff00ffff,
 +    0x0000ffffffff0000, 0x0000ffffffff00ff, 0x0000ffffffffff00,
 +    0x0000ffffffffffff, 0x00ff000000000000, 0x00ff0000000000ff,
 +    0x00ff00000000ff00, 0x00ff00000000ffff, 0x00ff000000ff0000,
 +    0x00ff000000ff00ff, 0x00ff000000ffff00, 0x00ff000000ffffff,
 +    0x00ff0000ff000000, 0x00ff0000ff0000ff, 0x00ff0000ff00ff00,
 +    0x00ff0000ff00ffff, 0x00ff0000ffff0000, 0x00ff0000ffff00ff,
 +    0x00ff0000ffffff00, 0x00ff0000ffffffff, 0x00ff00ff00000000,
 +    0x00ff00ff000000ff, 0x00ff00ff0000ff00, 0x00ff00ff0000ffff,
 +    0x00ff00ff00ff0000, 0x00ff00ff00ff00ff, 0x00ff00ff00ffff00,
 +    0x00ff00ff00ffffff, 0x00ff00ffff000000, 0x00ff00ffff0000ff,
 +    0x00ff00ffff00ff00, 0x00ff00ffff00ffff, 0x00ff00ffffff0000,
 +    0x00ff00ffffff00ff, 0x00ff00ffffffff00, 0x00ff00ffffffffff,
 +    0x00ffff0000000000, 0x00ffff00000000ff, 0x00ffff000000ff00,
 +    0x00ffff000000ffff, 0x00ffff0000ff0000, 0x00ffff0000ff00ff,
 +    0x00ffff0000ffff00, 0x00ffff0000ffffff, 0x00ffff00ff000000,
 +    0x00ffff00ff0000ff, 0x00ffff00ff00ff00, 0x00ffff00ff00ffff,
 +    0x00ffff00ffff0000, 0x00ffff00ffff00ff, 0x00ffff00ffffff00,
 +    0x00ffff00ffffffff, 0x00ffffff00000000, 0x00ffffff000000ff,
 +    0x00ffffff0000ff00, 0x00ffffff0000ffff, 0x00ffffff00ff0000,
 +    0x00ffffff00ff00ff, 0x00ffffff00ffff00, 0x00ffffff00ffffff,
 +    0x00ffffffff000000, 0x00ffffffff0000ff, 0x00ffffffff00ff00,
 +    0x00ffffffff00ffff, 0x00ffffffffff0000, 0x00ffffffffff00ff,
 +    0x00ffffffffffff00, 0x00ffffffffffffff, 0xff00000000000000,
 +    0xff000000000000ff, 0xff0000000000ff00, 0xff0000000000ffff,
 +    0xff00000000ff0000, 0xff00000000ff00ff, 0xff00000000ffff00,
 +    0xff00000000ffffff, 0xff000000ff000000, 0xff000000ff0000ff,
 +    0xff000000ff00ff00, 0xff000000ff00ffff, 0xff000000ffff0000,
 +    0xff000000ffff00ff, 0xff000000ffffff00, 0xff000000ffffffff,
 +    0xff0000ff00000000, 0xff0000ff000000ff, 0xff0000ff0000ff00,
 +    0xff0000ff0000ffff, 0xff0000ff00ff0000, 0xff0000ff00ff00ff,
 +    0xff0000ff00ffff00, 0xff0000ff00ffffff, 0xff0000ffff000000,
 +    0xff0000ffff0000ff, 0xff0000ffff00ff00, 0xff0000ffff00ffff,
 +    0xff0000ffffff0000, 0xff0000ffffff00ff, 0xff0000ffffffff00,
 +    0xff0000ffffffffff, 0xff00ff0000000000, 0xff00ff00000000ff,
 +    0xff00ff000000ff00, 0xff00ff000000ffff, 0xff00ff0000ff0000,
 +    0xff00ff0000ff00ff, 0xff00ff0000ffff00, 0xff00ff0000ffffff,
 +    0xff00ff00ff000000, 0xff00ff00ff0000ff, 0xff00ff00ff00ff00,
 +    0xff00ff00ff00ffff, 0xff00ff00ffff0000, 0xff00ff00ffff00ff,
 +    0xff00ff00ffffff00, 0xff00ff00ffffffff, 0xff00ffff00000000,
 +    0xff00ffff000000ff, 0xff00ffff0000ff00, 0xff00ffff0000ffff,
 +    0xff00ffff00ff0000, 0xff00ffff00ff00ff, 0xff00ffff00ffff00,
 +    0xff00ffff00ffffff, 0xff00ffffff000000, 0xff00ffffff0000ff,
 +    0xff00ffffff00ff00, 0xff00ffffff00ffff, 0xff00ffffffff0000,
 +    0xff00ffffffff00ff, 0xff00ffffffffff00, 0xff00ffffffffffff,
 +    0xffff000000000000, 0xffff0000000000ff, 0xffff00000000ff00,
 +    0xffff00000000ffff, 0xffff000000ff0000, 0xffff000000ff00ff,
 +    0xffff000000ffff00, 0xffff000000ffffff, 0xffff0000ff000000,
 +    0xffff0000ff0000ff, 0xffff0000ff00ff00, 0xffff0000ff00ffff,
 +    0xffff0000ffff0000, 0xffff0000ffff00ff, 0xffff0000ffffff00,
 +    0xffff0000ffffffff, 0xffff00ff00000000, 0xffff00ff000000ff,
 +    0xffff00ff0000ff00, 0xffff00ff0000ffff, 0xffff00ff00ff0000,
 +    0xffff00ff00ff00ff, 0xffff00ff00ffff00, 0xffff00ff00ffffff,
 +    0xffff00ffff000000, 0xffff00ffff0000ff, 0xffff00ffff00ff00,
 +    0xffff00ffff00ffff, 0xffff00ffffff0000, 0xffff00ffffff00ff,
 +    0xffff00ffffffff00, 0xffff00ffffffffff, 0xffffff0000000000,
 +    0xffffff00000000ff, 0xffffff000000ff00, 0xffffff000000ffff,
 +    0xffffff0000ff0000, 0xffffff0000ff00ff, 0xffffff0000ffff00,
 +    0xffffff0000ffffff, 0xffffff00ff000000, 0xffffff00ff0000ff,
 +    0xffffff00ff00ff00, 0xffffff00ff00ffff, 0xffffff00ffff0000,
 +    0xffffff00ffff00ff, 0xffffff00ffffff00, 0xffffff00ffffffff,
 +    0xffffffff00000000, 0xffffffff000000ff, 0xffffffff0000ff00,
 +    0xffffffff0000ffff, 0xffffffff00ff0000, 0xffffffff00ff00ff,
 +    0xffffffff00ffff00, 0xffffffff00ffffff, 0xffffffffff000000,
 +    0xffffffffff0000ff, 0xffffffffff00ff00, 0xffffffffff00ffff,
 +    0xffffffffffff0000, 0xffffffffffff00ff, 0xffffffffffffff00,
 +    0xffffffffffffffff,
 +};
 +
  /* Signed saturating rounding doubling multiply-accumulate high half, 8-bit */
  int8_t do_sqrdmlah_b(int8_t src1, int8_t src2, int8_t src3,
                       bool neg, bool round)
 --
-.20.1
+.25.1
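The generator loop quoted in the comment above expand_pred_b_data can be
turned into a small standalone function, which is handy for spot-checking
individual table entries. This is only a sketch and not QEMU code: the name
expand_pred_b_compute is hypothetical, and it uses uint64_t where the
comment's loop uses unsigned long.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of QEMU): compute on the fly the value
 * that the table holds at index 'byte'. Each set bit j of the predicate
 * byte becomes an all-ones byte at byte position j of the result.
 */
static uint64_t expand_pred_b_compute(unsigned byte)
{
    uint64_t m = 0;

    assert(byte < 256);
    for (int j = 0; j < 8; j++) {
        if ((byte >> j) & 1) {
            m |= UINT64_C(0xff) << (j << 3);
        }
    }
    return m;
}
```

For example, predicate byte 5 (bits 0 and 2 set) expands to
0x0000000000ff00ff, which matches the sixth entry of the table.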

-[PULL 25/28] target/arm: Add framework for MVE decode
+[PULL 04/30] hw/core/resettable: fix reset level counting
-Add the framework for decoding MVE insns, with the necessary new
+From: Damien Hedde <damien.hedde@greensocs.com>
 files and the meson.build rules, but no actual content yet.
+The code for handling the reset level count in the Resettable code
+has two issues:
+The reset count is only decremented for the 1->0 case.  This means
+that if there's ever a nested reset that takes the count to 2 then it
+will never again be decremented.  Eventually the count will exceed
+the '50' limit in resettable_phase_enter() and QEMU will trip over
+the assertion failure.  The repro case in issue 1266 is an example of
+this that happens now the SCSI subsystem uses three-phase reset.
+Secondly, the count is decremented only after the exit phase handler
+is called.  Moving the reset count decrement from "just after" to
+"just before" calling the exit phase handler allows
+resettable_is_in_reset() to return false during the handler
+execution.
+This simplifies reset handling in resettable devices.  Typically, a
+function that updates the device state will just need to read the
+current reset state and will no longer need to treat the "in a
+reset-exit transition" state as a special case.
+Note that the semantics change to the *_is_in_reset() functions
+will have no effect on the current codebase, because only two
+devices (hw/char/cadence_uart.c and hw/misc/zynq_sclr.c) currently
+call those functions, and in neither case do they do it from the
+device's exit phase method.
+Fixes: 4a5fc890 ("scsi: Use device_cold_reset() and bus_cold_reset()")
+Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1266
+Signed-off-by: Damien Hedde <damien.hedde@greensocs.com>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reported-by: Michael Peter <michael.peter@hensoldt-cyber.com>
-Message-id: 20210614151007.4545-11-peter.maydell@linaro.org
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Message-id: 20221020142749.3357951-1-peter.maydell@linaro.org
 Buglink: https://bugs.launchpad.net/qemu/+bug/1905297
 Reported-by: Michael Peter <michael.peter@hensoldt-cyber.com>
 [PMM: adjust the docs paragraph changed to get the name of the
  'enter' phase right and to clarify exactly when the count is
  adjusted; rewrite the commit message]
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate-a32.h |  1 +
+ docs/devel/reset.rst | 8 +++++---
- target/arm/mve.decode      | 20 ++++++++++++++++++++
+ hw/core/resettable.c | 3 +--
- target/arm/translate-mve.c | 29 +++++++++++++++++++++++++++++
+files changed, 6 insertions(+), 5 deletions(-)
  target/arm/translate.c     |  1 +
  target/arm/meson.build     |  2 ++
 files changed, 53 insertions(+)
  create mode 100644 target/arm/mve.decode
  create mode 100644 target/arm/translate-mve.c
-diff --git a/target/arm/translate-a32.h b/target/arm/translate-a32.h
+diff --git a/docs/devel/reset.rst b/docs/devel/reset.rst
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-a32.h
+--- a/docs/devel/reset.rst
-+++ b/target/arm/translate-a32.h
++++ b/docs/devel/reset.rst
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ Polling the reset state
+ Resettable interface provides the ``resettable_is_in_reset()`` function.
- /* Prototypes for autogenerated disassembler functions */
+ This function returns true if the object parameter is currently under reset.
- bool disas_m_nocp(DisasContext *dc, uint32_t insn);
-+bool disas_mve(DisasContext *dc, uint32_t insn);
+-An object is under reset from the beginning of the *init* phase to the end of
- bool disas_vfp(DisasContext *s, uint32_t insn);
+-the *exit* phase. During all three phases, the function will return that the
- bool disas_vfp_uncond(DisasContext *s, uint32_t insn);
+-object is in reset.
- bool disas_neon_dp(DisasContext *s, uint32_t insn);
++An object is under reset from the beginning of the *enter* phase (before
-diff --git a/target/arm/mve.decode b/target/arm/mve.decode
++either its children or its own enter method is called) to the *exit*
-new file mode 100644
++phase. During *enter* and *hold* phase only, the function will return that the
-index XXXXXXX..XXXXXXX
++object is in reset. The state is changed after the *exit* is propagated to
---- /dev/null
++its children and just before calling the object's own *exit* method.
-+++ b/target/arm/mve.decode
-@@ -XXX,XX +XXX,XX @@
+ This function may be used if the object behavior has to be adapted
-+# M-profile MVE instruction descriptions
+ while in reset state. For example if a device has an irq input,
-+#
+diff --git a/hw/core/resettable.c b/hw/core/resettable.c
 +#  Copyright (c) 2021 Linaro, Ltd
 +#
 +# This library is free software; you can redistribute it and/or
 +# modify it under the terms of the GNU Lesser General Public
 +# License as published by the Free Software Foundation; either
 +# version 2.1 of the License, or (at your option) any later version.
 +#
 +# This library is distributed in the hope that it will be useful,
 +# but WITHOUT ANY WARRANTY; without even the implied warranty of
 +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 +# Lesser General Public License for more details.
 +#
 +# You should have received a copy of the GNU Lesser General Public
 +# License along with this library; if not, see <http://www.gnu.org/licenses/>.
 +
 +#
 +# This file is processed by scripts/decodetree.py
 +#
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@
 +/*
 + *  ARM translation: M-profile MVE instructions
 + *
 + *  Copyright (c) 2021 Linaro, Ltd.
 + *
 + * This library is free software; you can redistribute it and/or
 + * modify it under the terms of the GNU Lesser General Public
 + * License as published by the Free Software Foundation; either
 + * version 2.1 of the License, or (at your option) any later version.
 + *
 + * This library is distributed in the hope that it will be useful,
 + * but WITHOUT ANY WARRANTY; without even the implied warranty of
 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 + * Lesser General Public License for more details.
 + *
 + * You should have received a copy of the GNU Lesser General Public
 + * License along with this library; if not, see <http://www.gnu.org/licenses/>.
 + */
 +
 +#include "qemu/osdep.h"
 +#include "tcg/tcg-op.h"
 +#include "tcg/tcg-op-gvec.h"
 +#include "exec/exec-all.h"
 +#include "exec/gen-icount.h"
 +#include "translate.h"
 +#include "translate-a32.h"
 +
 +/* Include the generated decoder */
 +#include "decode-mve.c.inc"
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/hw/core/resettable.c
-+++ b/target/arm/translate.c
++++ b/hw/core/resettable.c
-@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ static void resettable_phase_exit(Object *obj, void *opaque, ResetType type)
-     if (disas_t32(s, insn) ||
+     resettable_child_foreach(rc, obj, resettable_phase_exit, NULL, type);
-         disas_vfp_uncond(s, insn) ||
-         disas_neon_shared(s, insn) ||
+     assert(s->count > 0);
-+        disas_mve(s, insn) ||
+-    if (s->count == 1) {
-         ((insn >> 28) == 0xe && disas_vfp(s, insn))) {
++    if (--s->count == 0) {
-         return;
+         trace_resettable_phase_exit_exec(obj, obj_typename, !!rc->phases.exit);
          if (rc->phases.exit && !resettable_get_tr_func(rc, obj)) {
              rc->phases.exit(obj);
          }
 -        s->count = 0;
      }
-diff --git a/target/arm/meson.build b/target/arm/meson.build
+     s->exit_phase_in_progress = false;
-index XXXXXXX..XXXXXXX 100644
+     trace_resettable_phase_exit_end(obj, obj_typename, s->count);
 --- a/target/arm/meson.build
 +++ b/target/arm/meson.build
@@ -XXX,XX +XXX,XX @@ gen = [
    decodetree.process('vfp.decode', extra_args: '--decode=disas_vfp'),
    decodetree.process('vfp-uncond.decode', extra_args: '--decode=disas_vfp_uncond'),
    decodetree.process('m-nocp.decode', extra_args: '--decode=disas_m_nocp'),
 +  decodetree.process('mve.decode', extra_args: '--decode=disas_mve'),
    decodetree.process('a32.decode', extra_args: '--static-decode=disas_a32'),
    decodetree.process('a32-uncond.decode', extra_args: '--static-decode=disas_a32_uncond'),
    decodetree.process('t32.decode', extra_args: '--static-decode=disas_t32'),
@@ -XXX,XX +XXX,XX @@ arm_ss.add(files(
    'tlb_helper.c',
    'translate.c',
    'translate-m-nocp.c',
 +  'translate-mve.c',
    'translate-neon.c',
    'translate-vfp.c',
    'vec_helper.c',
 --
-.20.1
+.25.1
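The two issues described in the resettable commit message (PULL 04/30) can
be illustrated with a cut-down model of the counter. This is a sketch under
assumptions, not the QEMU implementation: every type and function name
below is hypothetical, and only the counting logic from the
resettable_phase_exit() hunk is modelled (no children, no trace points).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-object reset state; not QEMU's type. */
typedef struct {
    unsigned count;             /* nested resets currently in progress */
    unsigned exit_calls;        /* how many times the exit handler ran */
    bool in_reset_seen_by_exit; /* what the exit handler observed */
} SketchResetState;

static bool sketch_is_in_reset(const SketchResetState *s)
{
    return s->count > 0;
}

/* Stand-in for a device's exit phase method. */
static void sketch_exit_handler(SketchResetState *s)
{
    s->exit_calls++;
    s->in_reset_seen_by_exit = sketch_is_in_reset(s);
}

static void sketch_phase_enter(SketchResetState *s)
{
    s->count++;
}

/* Old logic: only the 1->0 transition ever lowers the count. */
static void sketch_phase_exit_old(SketchResetState *s)
{
    assert(s->count > 0);
    if (s->count == 1) {
        sketch_exit_handler(s);
        s->count = 0;
    }
}

/* New logic: always decrement, and do so before calling the handler. */
static void sketch_phase_exit_new(SketchResetState *s)
{
    assert(s->count > 0);
    if (--s->count == 0) {
        sketch_exit_handler(s);
    }
}
```

With two nested enters followed by two exits, the old logic leaves the
count stuck at 2 and never runs the exit handler, while the new logic
returns the count to 0 and the handler sees sketch_is_in_reset() as false.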

-[PULL 24/28] target/arm: Implement MVE LETP insn
+[PULL 05/30] hw/hyperv/hyperv.c: Use device_cold_reset() instead of device_legacy_reset()
-Implement the MVE LETP insn.  This is like the existing LE loop-end
+The semantic difference between the deprecated device_legacy_reset()
-insn, but it must perform an FPU-enabled check, and on loop-exit it
+function and the newer device_cold_reset() function is that the new
-resets LTPSIZE to 4.
+function resets both the device itself and any qbuses it owns,
+whereas the legacy function resets just the device itself and nothing
-To accommodate the requirement to do something on loop-exit, we drop
+else.  In hyperv_synic_reset() we reset a SynICState, which has no
-the use of condlabel and instead manage both the TB exits manually,
+qbuses, so for this purpose the two functions behave identically and
-in the same way we already do in trans_WLS().
+we can stop using the deprecated one.
 The other MVE-specific change to the LE insn is that we must raise an
 INVSTATE UsageFault insn if LTPSIZE is not 4.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
-Message-id: 20210614151007.4545-10-peter.maydell@linaro.org
+Message-id: 20221013171817.1447562-1-peter.maydell@linaro.org
 ---
- target/arm/t32.decode  |   2 +-
+ hw/hyperv/hyperv.c | 2 +-
- target/arm/translate.c | 104 +++++++++++++++++++++++++++++++++++++----
+file changed, 1 insertion(+), 1 deletion(-)
 files changed, 97 insertions(+), 9 deletions(-)
-diff --git a/target/arm/t32.decode b/target/arm/t32.decode
+diff --git a/hw/hyperv/hyperv.c b/hw/hyperv/hyperv.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/t32.decode
+--- a/hw/hyperv/hyperv.c
-+++ b/target/arm/t32.decode
++++ b/hw/hyperv/hyperv.c
-@@ -XXX,XX +XXX,XX @@ BL               1111 0. .......... 11.1 ............         @branch24
+@@ -XXX,XX +XXX,XX @@ void hyperv_synic_reset(CPUState *cs)
-     DLS          1111 0 0000 100     rn:4 1110 0000 0000 0001 size=4
+     SynICState *synic = get_synic(cs);
-     WLS          1111 0 0000 100     rn:4 1100 . .......... 1 imm=%lob_imm size=4
-     {
+     if (synic) {
--      LE         1111 0 0000 0 f:1 0 1111 1100 . .......... 1 imm=%lob_imm
+-        device_legacy_reset(DEVICE(synic));
-+      LE         1111 0 0000 0 f:1 tp:1 1111 1100 . .......... 1 imm=%lob_imm
++        device_cold_reset(DEVICE(synic));
        # This is WLSTP
        WLS        1111 0 0000 0 size:2 rn:4 1100 . .......... 1 imm=%lob_imm
      }
-diff --git a/target/arm/translate.c b/target/arm/translate.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
-+++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_LE(DisasContext *s, arg_LE *a)
-      * any faster.
-      */
-     TCGv_i32 tmp;
-+    TCGLabel *loopend;
-+    bool fpu_active;
-     if (!dc_isar_feature(aa32_lob, s)) {
-         return false;
-     }
-+    if (a->f && a->tp) {
-+        return false;
-+    }
-+    if (s->condexec_mask) {
-+        /*
-+         * LE in an IT block is CONSTRAINED UNPREDICTABLE;
-+         * we choose to UNDEF, because otherwise our use of
-+         * gen_goto_tb(1) would clash with the use of TB exit 1
-+         * in the dc->condjmp condition-failed codepath in
-+         * arm_tr_tb_stop() and we'd get an assertion.
-+         */
-+        return false;
-+    }
-+    if (a->tp) {
-+        /* LETP */
-+        if (!dc_isar_feature(aa32_mve, s)) {
-+            return false;
-+        }
-+        if (!vfp_access_check(s)) {
-+            s->eci_handled = true;
-+            return true;
-+        }
-+    }
-     /* LE/LETP is OK with ECI set and leaves it untouched */
-     s->eci_handled = true;
--    if (!a->f) {
--        /* Not loop-forever. If LR <= 1 this is the last loop: do nothing. */
--        arm_gen_condlabel(s);
--        tcg_gen_brcondi_i32(TCG_COND_LEU, cpu_R[14], 1, s->condlabel);
--        /* Decrement LR */
--        tmp = load_reg(s, 14);
--        tcg_gen_addi_i32(tmp, tmp, -1);
--        store_reg(s, 14, tmp);
-+    /*
-+     * With MVE, LTPSIZE might not be 4, and we must emit an INVSTATE
-+     * UsageFault exception for the LE insn in that case. Note that we
-+     * are not directly checking FPSCR.LTPSIZE but instead check the
-+     * pseudocode LTPSIZE() function, which returns 4 if the FPU is
-+     * not currently active (ie ActiveFPState() returns false). We
-+     * can identify not-active purely from our TB state flags, as the
-+     * FPU is active only if:
-+     *  the FPU is enabled
-+     *  AND lazy state preservation is not active
-+     *  AND we do not need a new fp context (this is the ASPEN/FPCA check)
-+     *
-+     * Usually we don't need to care about this distinction between
-+     * LTPSIZE and FPSCR.LTPSIZE, because the code in vfp_access_check()
-+     * will either take an exception or clear the conditions that make
-+     * the FPU not active. But LE is an unusual case of a non-FP insn
-+     * that looks at LTPSIZE.
-+     */
-+    fpu_active = !s->fp_excp_el && !s->v7m_lspact && !s->v7m_new_fp_ctxt_needed;
-+
-+    if (!a->tp && dc_isar_feature(aa32_mve, s) && fpu_active) {
-+        /* Need to do a runtime check for LTPSIZE != 4 */
-+        TCGLabel *skipexc = gen_new_label();
-+        tmp = load_cpu_field(v7m.ltpsize);
-+        tcg_gen_brcondi_i32(TCG_COND_EQ, tmp, 4, skipexc);
-+        tcg_temp_free_i32(tmp);
-+        gen_exception_insn(s, s->pc_curr, EXCP_INVSTATE, syn_uncategorized(),
-+                           default_exception_el(s));
-+        gen_set_label(skipexc);
-+    }
-+
-+    if (a->f) {
-+        /* Loop-forever: just jump back to the loop start */
-+        gen_jmp(s, read_pc(s) - a->imm);
-+        return true;
-+    }
-+
-+    /*
-+     * Not loop-forever. If LR <= loop-decrement-value this is the last loop.
-+     * For LE, we know at this point that LTPSIZE must be 4 and the
-+     * loop decrement value is 1. For LETP we need to calculate the decrement
-+     * value from LTPSIZE.
-+     */
-+    loopend = gen_new_label();
-+    if (!a->tp) {
-+        tcg_gen_brcondi_i32(TCG_COND_LEU, cpu_R[14], 1, loopend);
-+        tcg_gen_addi_i32(cpu_R[14], cpu_R[14], -1);
-+    } else {
-+        /*
-+         * Decrement by 1 << (4 - LTPSIZE). We need to use a TCG local
-+         * so that decr stays live after the brcondi.
-+         */
-+        TCGv_i32 decr = tcg_temp_local_new_i32();
-+        TCGv_i32 ltpsize = load_cpu_field(v7m.ltpsize);
-+        tcg_gen_sub_i32(decr, tcg_constant_i32(4), ltpsize);
-+        tcg_gen_shl_i32(decr, tcg_constant_i32(1), decr);
-+        tcg_temp_free_i32(ltpsize);
-+
-+        tcg_gen_brcond_i32(TCG_COND_LEU, cpu_R[14], decr, loopend);
-+
-+        tcg_gen_sub_i32(cpu_R[14], cpu_R[14], decr);
-+        tcg_temp_free_i32(decr);
-     }
-     /* Jump back to the loop start */
-     gen_jmp(s, read_pc(s) - a->imm);
-+
-+    gen_set_label(loopend);
-+    if (a->tp) {
-+        /* Exits from tail-pred loops must reset LTPSIZE to 4 */
-+        tmp = tcg_const_i32(4);
-+        store_cpu_field(tmp, v7m.ltpsize);
-+    }
-+    /* End TB, continuing to following insn */
-+    gen_jmp_tb(s, s->base.pc_next, 1);
-     return true;
  }
 --
-.20.1
+.25.1

-New patch
+[PULL 06/30] target/imx: reload cmp timer outside of the reload ptimer transaction
+From: Axel Heider <axel.heider@hensoldt.net>
+When running seL4 tests (https://docs.sel4.systems/projects/sel4test)
+on the sabrelight platform, the timer tests fail. The arm/imx6 EPIT
+timer interrupt does not fire properly: instead of e.g. a second, it
+can take up to a minute to finally see the interrupt.
+Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1263
+Signed-off-by: Axel Heider <axel.heider@hensoldt.net>
+Message-id: 166663118138.13362.1229967229046092876-0@git.sr.ht
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ hw/timer/imx_epit.c | 9 +++++++--
+file changed, 7 insertions(+), 2 deletions(-)
+diff --git a/hw/timer/imx_epit.c b/hw/timer/imx_epit.c
+index XXXXXXX..XXXXXXX 100644
+--- a/hw/timer/imx_epit.c
++++ b/hw/timer/imx_epit.c
+@@ -XXX,XX +XXX,XX @@ static void imx_epit_write(void *opaque, hwaddr offset, uint64_t value,
+             /* If IOVW bit is set then set the timer value */
+             ptimer_set_count(s->timer_reload, s->lr);
+         }
+-
++        /*
++         * Commit the change to s->timer_reload, so it can propagate. Otherwise
++         * the timer interrupt may not fire properly. The commit must happen
++         * before calling imx_epit_reload_compare_timer(), which reads
++         * s->timer_reload internally again.
++         */
++        ptimer_transaction_commit(s->timer_reload);
+         imx_epit_reload_compare_timer(s);
+         ptimer_transaction_commit(s->timer_cmp);
+-        ptimer_transaction_commit(s->timer_reload);
+         break;
+     case 3: /* CMP */
+--
+.25.1

-[PULL 23/28] target/arm: Implement MVE DLSTP
+[PULL 07/30] target/arm: Introduce regime_is_stage2
-Implement the MVE DLSTP insn; this is like the existing DLS
+From: Richard Henderson <richard.henderson@linaro.org>
 insn, except that it must do an FPU access check and it
 sets LTPSIZE to the value specified in the insn.
+Reduce the amount of typing required for this check.
+Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20221024051851.3074715-2-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210614151007.4545-9-peter.maydell@linaro.org
 ---
- target/arm/t32.decode  |  9 ++++++---
+ target/arm/internals.h |  5 +++++
- target/arm/translate.c | 23 +++++++++++++++++++++--
+ target/arm/helper.c    | 14 +++++---------
-files changed, 27 insertions(+), 5 deletions(-)
+ target/arm/ptw.c       | 14 ++++++--------
 files changed, 16 insertions(+), 17 deletions(-)
-diff --git a/target/arm/t32.decode b/target/arm/t32.decode
+diff --git a/target/arm/internals.h b/target/arm/internals.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/t32.decode
+--- a/target/arm/internals.h
-+++ b/target/arm/t32.decode
++++ b/target/arm/internals.h
-@@ -XXX,XX +XXX,XX @@ BL               1111 0. .......... 11.1 ............         @branch24
+@@ -XXX,XX +XXX,XX @@ static inline bool regime_is_pan(CPUARMState *env, ARMMMUIdx mmu_idx)
      # LE and WLS immediate
      %lob_imm 1:10 11:1 !function=times_2
 -    DLS          1111 0 0000 100     rn:4 1110 0000 0000 0001
 +    DLS          1111 0 0000 100     rn:4 1110 0000 0000 0001 size=4
      WLS          1111 0 0000 100     rn:4 1100 . .......... 1 imm=%lob_imm size=4
      {
        LE         1111 0 0000 0 f:1 0 1111 1100 . .......... 1 imm=%lob_imm
        # This is WLSTP
        WLS        1111 0 0000 0 size:2 rn:4 1100 . .......... 1 imm=%lob_imm
      }
--
--    LCTP         1111 0 0000 000     1111 1110 0000 0000 0001
-+    {
-+      LCTP       1111 0 0000 000     1111 1110 0000 0000 0001
-+      # This is DLSTP
-+      DLS        1111 0 0000 0 size:2 rn:4 1110 0000 0000 0001
-+    }
-   ]
  }
-diff --git a/target/arm/translate.c b/target/arm/translate.c
 +static inline bool regime_is_stage2(ARMMMUIdx mmu_idx)
 +{
 +    return mmu_idx == ARMMMUIdx_Stage2 || mmu_idx == ARMMMUIdx_Stage2_S;
 +}
 +
  /* Return the exception level which controls this address translation regime */
  static inline uint32_t regime_el(CPUARMState *env, ARMMMUIdx mmu_idx)
  {
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/helper.c
-+++ b/target/arm/translate.c
++++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_DLS(DisasContext *s, arg_DLS *a)
+@@ -XXX,XX +XXX,XX @@ int aa64_va_parameter_tbi(uint64_t tcr, ARMMMUIdx mmu_idx)
-         return false;
+ {
      if (regime_has_2_ranges(mmu_idx)) {
          return extract64(tcr, 37, 2);
 -    } else if (mmu_idx == ARMMMUIdx_Stage2 || mmu_idx == ARMMMUIdx_Stage2_S) {
 +    } else if (regime_is_stage2(mmu_idx)) {
          return 0; /* VTCR_EL2 */
      } else {
          /* Replicate the single TBI bit so we always have 2 bits.  */
@@ -XXX,XX +XXX,XX @@ int aa64_va_parameter_tbid(uint64_t tcr, ARMMMUIdx mmu_idx)
  {
      if (regime_has_2_ranges(mmu_idx)) {
          return extract64(tcr, 51, 2);
 -    } else if (mmu_idx == ARMMMUIdx_Stage2 || mmu_idx == ARMMMUIdx_Stage2_S) {
 +    } else if (regime_is_stage2(mmu_idx)) {
          return 0; /* VTCR_EL2 */
      } else {
          /* Replicate the single TBID bit so we always have 2 bits.  */
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
      int select, tsz, tbi, max_tsz, min_tsz, ps, sh;
      ARMGranuleSize gran;
      ARMCPU *cpu = env_archcpu(env);
 -    bool stage2 = mmu_idx == ARMMMUIdx_Stage2 || mmu_idx == ARMMMUIdx_Stage2_S;
 +    bool stage2 = regime_is_stage2(mmu_idx);
      if (!regime_has_2_ranges(mmu_idx)) {
          select = 0;
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
          }
          ds = false;
      } else if (ds) {
 -        switch (mmu_idx) {
 -        case ARMMMUIdx_Stage2:
 -        case ARMMMUIdx_Stage2_S:
 +        if (regime_is_stage2(mmu_idx)) {
              if (gran == Gran16K) {
                  ds = cpu_isar_feature(aa64_tgran16_2_lpa2, cpu);
              } else {
                  ds = cpu_isar_feature(aa64_tgran4_2_lpa2, cpu);
              }
 -            break;
 -        default:
 +        } else {
              if (gran == Gran16K) {
                  ds = cpu_isar_feature(aa64_tgran16_lpa2, cpu);
              } else {
                  ds = cpu_isar_feature(aa64_tgran4_lpa2, cpu);
              }
 -            break;
          }
          if (ds) {
              min_tsz = 12;
 diff --git a/target/arm/ptw.c b/target/arm/ptw.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/ptw.c
 +++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static int get_S1prot(CPUARMState *env, ARMMMUIdx mmu_idx, bool is_aa64,
      bool have_wxn;
      int wxn = 0;
 -    assert(mmu_idx != ARMMMUIdx_Stage2);
 -    assert(mmu_idx != ARMMMUIdx_Stage2_S);
 +    assert(!regime_is_stage2(mmu_idx));
      user_rw = simple_ap_to_rw_prot_is_user(ap, true);
      if (is_user) {
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
          goto do_fault;
      }
-     if (a->rn == 13 || a->rn == 15) {
--        /* CONSTRAINED UNPREDICTABLE: we choose to UNDEF */
+-    if (mmu_idx != ARMMMUIdx_Stage2 && mmu_idx != ARMMMUIdx_Stage2_S) {
-+        /*
++    if (!regime_is_stage2(mmu_idx)) {
-+         * For DLSTP rn == 15 is a related encoding (LCTP); the
+         /*
-+         * other cases caught by this condition are all
+          * The starting level depends on the virtual address size (which can
-+         * CONSTRAINED UNPREDICTABLE: we choose to UNDEF
+          * be up to 48 bits) and the translation granule size. It indicates
-+         */
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
-         return false;
+         attrs = extract64(descriptor, 2, 10)
              | (extract64(descriptor, 52, 12) << 10);
 -        if (mmu_idx == ARMMMUIdx_Stage2 || mmu_idx == ARMMMUIdx_Stage2_S) {
 +        if (regime_is_stage2(mmu_idx)) {
              /* Stage 2 table descriptors do not include any attribute fields */
              break;
          }
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
      ap = extract32(attrs, 4, 2);
 -    if (mmu_idx == ARMMMUIdx_Stage2 || mmu_idx == ARMMMUIdx_Stage2_S) {
 +    if (regime_is_stage2(mmu_idx)) {
          ns = mmu_idx == ARMMMUIdx_Stage2;
          xn = extract32(attrs, 11, 2);
          result->f.prot = get_S2prot(env, ap, xn, s1_is_el0);
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
          result->f.guarded = guarded;
      }
--    /* Not a while loop, no tail predication: just set LR to the count */
+-    if (mmu_idx == ARMMMUIdx_Stage2 || mmu_idx == ARMMMUIdx_Stage2_S) {
-+    if (a->size != 4) {
++    if (regime_is_stage2(mmu_idx)) {
-+        /* DLSTP */
+         result->cacheattrs.is_s2_format = true;
-+        if (!dc_isar_feature(aa32_mve, s)) {
+         result->cacheattrs.attrs = extract32(attrs, 0, 4);
-+            return false;
+     } else {
-+        }
+@@ -XXX,XX +XXX,XX @@ do_fault:
-+        if (!vfp_access_check(s)) {
+     fi->type = fault_type;
-+            return true;
+     fi->level = level;
-+        }
+     /* Tag the error as S2 for failed S1 PTW at S2 or ordinary S2.  */
-+    }
+-    fi->stage2 = fi->s1ptw || (mmu_idx == ARMMMUIdx_Stage2 ||
-+
+-                               mmu_idx == ARMMMUIdx_Stage2_S);
-+    /* Not a while loop: set LR to the count, and set LTPSIZE for DLSTP */
++    fi->stage2 = fi->s1ptw || regime_is_stage2(mmu_idx);
-     tmp = load_reg(s, a->rn);
+     fi->s1ns = mmu_idx == ARMMMUIdx_Stage2;
      store_reg(s, 14, tmp);
 +    if (a->size != 4) {
 +        /* DLSTP: set FPSCR.LTPSIZE */
 +        tmp = tcg_const_i32(a->size);
 +        store_cpu_field(tmp, v7m.ltpsize);
 +    }
      return true;
  }
 --
-.20.1
+.25.1

-[PULL 01/28] hw/intc/arm_gicv3_cpuif: Tolerate spurious EOIR writes
+[PULL 08/30] target/arm: Add ptw_idx to S1Translate
-From: Jean-Philippe Brucker <jean-philippe@linaro.org>
+From: Richard Henderson <richard.henderson@linaro.org>
-Commit 382c7160d1cd ("hw/intc/arm_gicv3_cpuif: Fix EOIR write access
+Hoist the computation of the mmu_idx for the ptw up to
-check logic") added an assert_not_reached() if the guest writes the EOIR
+get_phys_addr_with_struct and get_phys_addr_twostage.
-register while no interrupt is active.
+This removes the duplicate check for stage2 disabled
 from the middle of the walk, performing it only once.
-It turns out some software does this: EDK2, in
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 GicV3ExitBootServicesEvent(), unconditionally write EOIR for all
 interrupts that it manages. This now causes QEMU to abort when running
 UEFI on a VM with GICv3. Although it is UNPREDICTABLE behavior and EDK2
 does need fixing, the punishment seems a little harsh, especially since
 icc_eoir_write() already tolerates writes of nonexistent interrupt
 numbers. Display a guest error and tolerate spurious EOIR writes.
 Fixes: 382c7160d1cd ("hw/intc/arm_gicv3_cpuif: Fix EOIR write access check logic")
 Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
 Tested-by: Alex Bennée <alex.bennee@linaro.org>
-Message-id: 20210604130352.1887560-1-jean-philippe@linaro.org
+Message-id: 20221024051851.3074715-3-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/intc/arm_gicv3_cpuif.c | 5 ++++-
+ target/arm/ptw.c | 71 ++++++++++++++++++++++++++++++++++++------------
-file changed, 4 insertions(+), 1 deletion(-)
+file changed, 54 insertions(+), 17 deletions(-)
-diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
+diff --git a/target/arm/ptw.c b/target/arm/ptw.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/intc/arm_gicv3_cpuif.c
+--- a/target/arm/ptw.c
-+++ b/hw/intc/arm_gicv3_cpuif.c
++++ b/target/arm/ptw.c
 @@ -XXX,XX +XXX,XX @@
- #include "qemu/osdep.h"
+ typedef struct S1Translate {
- #include "qemu/bitops.h"
+     ARMMMUIdx in_mmu_idx;
-+#include "qemu/log.h"
++    ARMMMUIdx in_ptw_idx;
- #include "qemu/main-loop.h"
+     bool in_secure;
- #include "trace.h"
+     bool in_debug;
- #include "gicv3_internal.h"
+     bool out_secure;
-@@ -XXX,XX +XXX,XX @@ static void icc_eoir_write(CPUARMState *env, const ARMCPRegInfo *ri,
+@@ -XXX,XX +XXX,XX @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
  {
      bool is_secure = ptw->in_secure;
      ARMMMUIdx mmu_idx = ptw->in_mmu_idx;
 -    ARMMMUIdx s2_mmu_idx = is_secure ? ARMMMUIdx_Stage2_S : ARMMMUIdx_Stage2;
 -    bool s2_phys = false;
 +    ARMMMUIdx s2_mmu_idx = ptw->in_ptw_idx;
      uint8_t pte_attrs;
      bool pte_secure;
 -    if (!arm_mmu_idx_is_stage1_of_2(mmu_idx)
 -        || regime_translation_disabled(env, s2_mmu_idx, is_secure)) {
 -        s2_mmu_idx = is_secure ? ARMMMUIdx_Phys_S : ARMMMUIdx_Phys_NS;
 -        s2_phys = true;
 -    }
 -
      if (unlikely(ptw->in_debug)) {
          /*
           * From gdbstub, do not use softmmu so that we don't modify the
           * state of the cpu at all, including softmmu tlb contents.
           */
 -        if (s2_phys) {
 -            ptw->out_phys = addr;
 -            pte_attrs = 0;
 -            pte_secure = is_secure;
 -        } else {
 +        if (regime_is_stage2(s2_mmu_idx)) {
              S1Translate s2ptw = {
                  .in_mmu_idx = s2_mmu_idx,
 +                .in_ptw_idx = is_secure ? ARMMMUIdx_Phys_S : ARMMMUIdx_Phys_NS,
                  .in_secure = is_secure,
                  .in_debug = true,
              };
              GetPhysAddrResult s2 = { };
 +
              if (!get_phys_addr_lpae(env, &s2ptw, addr, MMU_DATA_LOAD,
                                      false, &s2, fi)) {
                  goto fail;
@@ -XXX,XX +XXX,XX @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
              ptw->out_phys = s2.f.phys_addr;
              pte_attrs = s2.cacheattrs.attrs;
              pte_secure = s2.f.attrs.secure;
 +        } else {
 +            /* Regime is physical. */
 +            ptw->out_phys = addr;
 +            pte_attrs = 0;
 +            pte_secure = is_secure;
          }
-         break;
+         ptw->out_host = NULL;
-     default:
+     } else {
--        g_assert_not_reached();
+@@ -XXX,XX +XXX,XX @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
-+        qemu_log_mask(LOG_GUEST_ERROR,
+         pte_secure = full->attrs.secure;
 +                      "%s: IRQ %d isn't active\n", __func__, irq);
 +        return;
      }
-     icc_drop_prio(cs, grp);
+-    if (!s2_phys) {
 +    if (regime_is_stage2(s2_mmu_idx)) {
          uint64_t hcr = arm_hcr_el2_eff_secstate(env, is_secure);
          if ((hcr & HCR_PTW) && S2_attrs_are_device(hcr, pte_attrs)) {
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
          descaddr |= (address >> (stride * (4 - level))) & indexmask;
          descaddr &= ~7ULL;
          nstable = extract32(tableattrs, 4, 1);
 -        ptw->in_secure = !nstable;
 +        if (!nstable) {
 +            /*
 +             * Stage2_S -> Stage2 or Phys_S -> Phys_NS
 +             * Assert that the non-secure idx are even, and relative order.
 +             */
 +            QEMU_BUILD_BUG_ON((ARMMMUIdx_Phys_NS & 1) != 0);
 +            QEMU_BUILD_BUG_ON((ARMMMUIdx_Stage2 & 1) != 0);
 +            QEMU_BUILD_BUG_ON(ARMMMUIdx_Phys_NS + 1 != ARMMMUIdx_Phys_S);
 +            QEMU_BUILD_BUG_ON(ARMMMUIdx_Stage2 + 1 != ARMMMUIdx_Stage2_S);
 +            ptw->in_ptw_idx &= ~1;
 +            ptw->in_secure = false;
 +        }
          descriptor = arm_ldq_ptw(env, ptw, descaddr, fi);
          if (fi->type != ARMFault_None) {
              goto do_fault;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_twostage(CPUARMState *env, S1Translate *ptw,
      is_el0 = ptw->in_mmu_idx == ARMMMUIdx_Stage1_E0;
      ptw->in_mmu_idx = s2walk_secure ? ARMMMUIdx_Stage2_S : ARMMMUIdx_Stage2;
 +    ptw->in_ptw_idx = s2walk_secure ? ARMMMUIdx_Phys_S : ARMMMUIdx_Phys_NS;
      ptw->in_secure = s2walk_secure;
      /*
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_with_struct(CPUARMState *env, S1Translate *ptw,
                                        ARMMMUFaultInfo *fi)
  {
      ARMMMUIdx mmu_idx = ptw->in_mmu_idx;
 -    ARMMMUIdx s1_mmu_idx = stage_1_mmu_idx(mmu_idx);
      bool is_secure = ptw->in_secure;
 +    ARMMMUIdx s1_mmu_idx;
 -    if (mmu_idx != s1_mmu_idx) {
 +    switch (mmu_idx) {
 +    case ARMMMUIdx_Phys_S:
 +    case ARMMMUIdx_Phys_NS:
 +        /* Checking Phys early avoids special casing later vs regime_el. */
 +        return get_phys_addr_disabled(env, address, access_type, mmu_idx,
 +                                      is_secure, result, fi);
 +
 +    case ARMMMUIdx_Stage1_E0:
 +    case ARMMMUIdx_Stage1_E1:
 +    case ARMMMUIdx_Stage1_E1_PAN:
 +        /* First stage lookup uses second stage for ptw. */
 +        ptw->in_ptw_idx = is_secure ? ARMMMUIdx_Stage2_S : ARMMMUIdx_Stage2;
 +        break;
 +
 +    case ARMMMUIdx_E10_0:
 +        s1_mmu_idx = ARMMMUIdx_Stage1_E0;
 +        goto do_twostage;
 +    case ARMMMUIdx_E10_1:
 +        s1_mmu_idx = ARMMMUIdx_Stage1_E1;
 +        goto do_twostage;
 +    case ARMMMUIdx_E10_1_PAN:
 +        s1_mmu_idx = ARMMMUIdx_Stage1_E1_PAN;
 +    do_twostage:
          /*
           * Call ourselves recursively to do the stage 1 and then stage 2
           * translations if mmu_idx is a two-stage regime, and EL2 present.
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_with_struct(CPUARMState *env, S1Translate *ptw,
              return get_phys_addr_twostage(env, ptw, address, access_type,
                                            result, fi);
          }
 +        /* fall through */
 +
 +    default:
 +        /* Single stage and second stage uses physical for ptw. */
 +        ptw->in_ptw_idx = is_secure ? ARMMMUIdx_Phys_S : ARMMMUIdx_Phys_NS;
 +        break;
      }
      /*
 --
-.20.1
+.25.1
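
For readers of patch 08: the "in_ptw_idx &= ~1" step relies on each non-secure index being even and its secure twin being the next odd value. The following is a minimal standalone sketch of that pairing, not QEMU code. The enum values and the sketch_* names are hypothetical; only the even/odd layout mirrors what the QEMU_BUILD_BUG_ON lines assert.

```c
#include <assert.h>

/* Hypothetical index values: only the even/odd pairing matters here. */
enum SketchMMUIdx {
    SKETCH_PHYS_NS  = 8,
    SKETCH_PHYS_S   = 9,
    SKETCH_STAGE2   = 10,
    SKETCH_STAGE2_S = 11,
};

/* Compile-time checks in the spirit of the QEMU_BUILD_BUG_ON lines. */
static_assert((SKETCH_PHYS_NS & 1) == 0, "non-secure Phys index must be even");
static_assert((SKETCH_STAGE2 & 1) == 0, "non-secure Stage2 index must be even");
static_assert(SKETCH_PHYS_NS + 1 == SKETCH_PHYS_S, "secure twin follows");
static_assert(SKETCH_STAGE2 + 1 == SKETCH_STAGE2_S, "secure twin follows");

/* Clearing bit 0 maps a secure index to its non-secure twin. */
static inline int sketch_to_nonsecure(int idx)
{
    return idx & ~1;
}
```

Applying sketch_to_nonsecure() to an index that is already non-secure leaves it unchanged, so the operation is idempotent.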

-[PULL 28/28] include/qemu/int128.h: Add function to create Int128 from int64_t
+[PULL 09/30] target/arm: Add isar predicates for FEAT_HAFDBS
-int128_make64() creates an Int128 from an unsigned 64 bit value; add
+From: Richard Henderson <richard.henderson@linaro.org>
 a function int128_makes64() creating an Int128 from a signed 64 bit
 value.
+The MMFR1 field may indicate support for hardware update of
+access flag alone, or access flag and dirty bit.
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20221024051851.3074715-4-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Message-id: 20210614151007.4545-34-peter.maydell@linaro.org
 ---
- include/qemu/int128.h | 10 ++++++++++
+ target/arm/cpu.h | 10 ++++++++++
 file changed, 10 insertions(+)
-diff --git a/include/qemu/int128.h b/include/qemu/int128.h
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/qemu/int128.h
+--- a/target/arm/cpu.h
-+++ b/include/qemu/int128.h
++++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ static inline Int128 int128_make64(uint64_t a)
+@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_e0pd(const ARMISARegisters *id)
-     return a;
+     return FIELD_EX64(id->id_aa64mmfr2, ID_AA64MMFR2, E0PD) != 0;
  }
-+static inline Int128 int128_makes64(int64_t a)
++static inline bool isar_feature_aa64_hafs(const ARMISARegisters *id)
 +{
-+    return a;
++    return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, HAFDBS) != 0;
 +}
 +
- static inline Int128 int128_make128(uint64_t lo, uint64_t hi)
++static inline bool isar_feature_aa64_hdbs(const ARMISARegisters *id)
  {
      return (__uint128_t)hi << 64 | lo;
@@ -XXX,XX +XXX,XX @@ static inline Int128 int128_make64(uint64_t a)
      return (Int128) { a, 0 };
  }
 +static inline Int128 int128_makes64(int64_t a)
 +{
-+    return (Int128) { a, a >> 63 };
++    return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, HAFDBS) >= 2;
 +}
 +
- static inline Int128 int128_make128(uint64_t lo, uint64_t hi)
+ static inline bool isar_feature_aa64_tts2uxn(const ARMISARegisters *id)
  {
-     return (Int128) { lo, hi };
+     return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, XNX) != 0;
 --
-.20.1
+.25.1
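
The two predicates in patch 09 read the same ID field at different thresholds: any non-zero value means hardware access flag update, and a value of 2 or more adds hardware dirty state update. A minimal standalone sketch follows. It assumes HAFDBS sits in bits [3:0] of ID_AA64MMFR1_EL1, and the sketch_* names are hypothetical stand-ins for the FIELD_EX64-based helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: HAFDBS is bits [3:0] of ID_AA64MMFR1_EL1. */
static inline unsigned sketch_hafdbs(uint64_t id_aa64mmfr1)
{
    return (unsigned)(id_aa64mmfr1 & 0xf);
}

/* Hardware update of the access flag: any non-zero field value. */
static inline bool sketch_has_hafs(uint64_t id_aa64mmfr1)
{
    return sketch_hafdbs(id_aa64mmfr1) != 0;
}

/* Hardware update of the dirty state as well: field value >= 2. */
static inline bool sketch_has_hdbs(uint64_t id_aa64mmfr1)
{
    return sketch_hafdbs(id_aa64mmfr1) >= 2;
}
```

With this layout, the dirty-state predicate implies the access-flag predicate, matching the commit message's "access flag alone, or access flag and dirty bit".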

-[PULL 22/28] target/arm: Implement MVE WLSTP insn
+[PULL 10/30] target/arm: Extract HA and HD in aa64_va_parameters
-Implement the MVE WLSTP insn; this is like the existing WLS insn,
+From: Richard Henderson <richard.henderson@linaro.org>
 except that it specifies a size value which is used to set
 FPSCR.LTPSIZE.
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
+Message-id: 20221024051851.3074715-5-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210614151007.4545-8-peter.maydell@linaro.org
 ---
- target/arm/t32.decode  |  8 ++++++--
+ target/arm/internals.h | 2 ++
- target/arm/translate.c | 37 ++++++++++++++++++++++++++++++++++++-
+ target/arm/helper.c    | 8 +++++++-
-files changed, 42 insertions(+), 3 deletions(-)
+files changed, 9 insertions(+), 1 deletion(-)
-diff --git a/target/arm/t32.decode b/target/arm/t32.decode
+diff --git a/target/arm/internals.h b/target/arm/internals.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/t32.decode
+--- a/target/arm/internals.h
-+++ b/target/arm/t32.decode
++++ b/target/arm/internals.h
-@@ -XXX,XX +XXX,XX @@ BL               1111 0. .......... 11.1 ............         @branch24
+@@ -XXX,XX +XXX,XX @@ typedef struct ARMVAParameters {
-     %lob_imm 1:10 11:1 !function=times_2
+     bool hpd        : 1;
+     bool tsz_oob    : 1;  /* tsz has been clamped to legal range */
-     DLS          1111 0 0000 100     rn:4 1110 0000 0000 0001
+     bool ds         : 1;
--    WLS          1111 0 0000 100     rn:4 1100 . .......... 1 imm=%lob_imm
++    bool ha         : 1;
--    LE           1111 0 0000 0 f:1 0 1111 1100 . .......... 1 imm=%lob_imm
++    bool hd         : 1;
-+    WLS          1111 0 0000 100     rn:4 1100 . .......... 1 imm=%lob_imm size=4
+     ARMGranuleSize gran : 2;
-+    {
+ } ARMVAParameters;
-+      LE         1111 0 0000 0 f:1 0 1111 1100 . .......... 1 imm=%lob_imm
-+      # This is WLSTP
+diff --git a/target/arm/helper.c b/target/arm/helper.c
 +      WLS        1111 0 0000 0 size:2 rn:4 1100 . .......... 1 imm=%lob_imm
 +    }
      LCTP         1111 0 0000 000     1111 1110 0000 0000 0001
    ]
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/helper.c
-+++ b/target/arm/translate.c
++++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_WLS(DisasContext *s, arg_WLS *a)
+@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
-         return false;
+                                    ARMMMUIdx mmu_idx, bool data)
-     }
+ {
-     if (a->rn == 13 || a->rn == 15) {
+     uint64_t tcr = regime_tcr(env, mmu_idx);
--        /* CONSTRAINED UNPREDICTABLE: we choose to UNDEF */
+-    bool epd, hpd, tsz_oob, ds;
-+        /*
++    bool epd, hpd, tsz_oob, ds, ha, hd;
-+         * For WLSTP rn == 15 is a related encoding (LE); the
+     int select, tsz, tbi, max_tsz, min_tsz, ps, sh;
-+         * other cases caught by this condition are all
+     ARMGranuleSize gran;
-+         * CONSTRAINED UNPREDICTABLE: we choose to UNDEF
+     ARMCPU *cpu = env_archcpu(env);
-+         */
+@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
-         return false;
+         epd = false;
-     }
+         sh = extract32(tcr, 12, 2);
-     if (s->condexec_mask) {
+         ps = extract32(tcr, 16, 3);
-@@ -XXX,XX +XXX,XX @@ static bool trans_WLS(DisasContext *s, arg_WLS *a)
++        ha = extract32(tcr, 21, 1) && cpu_isar_feature(aa64_hafs, cpu);
-          */
++        hd = extract32(tcr, 22, 1) && cpu_isar_feature(aa64_hdbs, cpu);
-         return false;
+         ds = extract64(tcr, 32, 1);
-     }
+     } else {
-+    if (a->size != 4) {
+         bool e0pd;
-+        /* WLSTP */
+@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
-+        if (!dc_isar_feature(aa32_mve, s)) {
+             e0pd = extract64(tcr, 56, 1);
-+            return false;
+         }
-+        }
+         ps = extract64(tcr, 32, 3);
-+        /*
++        ha = extract64(tcr, 39, 1) && cpu_isar_feature(aa64_hafs, cpu);
-+         * We need to check that the FPU is enabled here, but mustn't
++        hd = extract64(tcr, 40, 1) && cpu_isar_feature(aa64_hdbs, cpu);
-+         * call vfp_access_check() to do that because we don't want to
+         ds = extract64(tcr, 59, 1);
-+         * do the lazy state preservation in the "loop count is zero" case.
-+         * Do the check-and-raise-exception by hand.
+         if (e0pd && cpu_isar_feature(aa64_e0pd, cpu) &&
-+         */
+@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
-+        if (s->fp_excp_el) {
+         .hpd = hpd,
-+            gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
+         .tsz_oob = tsz_oob,
-+                               syn_uncategorized(), s->fp_excp_el);
+         .ds = ds,
-+            return true;
++        .ha = ha,
-+        }
++        .hd = ha && hd,
-+    }
+         .gran = gran,
-+
+     };
-     nextlabel = gen_new_label();
+ }
      tcg_gen_brcondi_i32(TCG_COND_EQ, cpu_R[a->rn], 0, nextlabel);
      tmp = load_reg(s, a->rn);
      store_reg(s, 14, tmp);
 +    if (a->size != 4) {
 +        /*
 +         * WLSTP: set FPSCR.LTPSIZE. This requires that we do the
 +         * lazy state preservation, new FP context creation, etc,
 +         * that vfp_access_check() does. We know that the actual
 +         * access check will succeed (ie it won't generate code that
 +         * throws an exception) because we did that check by hand earlier.
 +         */
 +        bool ok = vfp_access_check(s);
 +        assert(ok);
 +        tmp = tcg_const_i32(a->size);
 +        store_cpu_field(tmp, v7m.ltpsize);
 +    }
      gen_jmp_tb(s, s->base.pc_next, 1);
      gen_set_label(nextlabel);
 --
-.20.1
+.25.1
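
Patch 10 has no commit message body, so here is a minimal standalone sketch of what the added lines compute. The bit positions are taken from the diff (21 and 22 in the single-range branch, 39 and 40 in the two-range branch). The struct, the function name and the boolean feature parameters are hypothetical simplifications of cpu_isar_feature().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct SketchHwUpdate {
    bool ha;   /* hardware access flag update enabled */
    bool hd;   /* hardware dirty state update enabled */
} SketchHwUpdate;

/*
 * two_ranges selects the TCR layout: HA/HD at bits 39/40 when the
 * regime has two VA ranges, bits 21/22 otherwise (as in the diff).
 * feat_hafs/feat_hdbs stand in for the isar feature checks.
 */
static inline SketchHwUpdate sketch_hw_update(uint64_t tcr, bool two_ranges,
                                              bool feat_hafs, bool feat_hdbs)
{
    unsigned ha_bit = two_ranges ? 39 : 21;
    unsigned hd_bit = two_ranges ? 40 : 22;
    bool ha = ((tcr >> ha_bit) & 1) && feat_hafs;
    bool hd = ((tcr >> hd_bit) & 1) && feat_hdbs;

    /* As in the patch, HD only takes effect when HA is also enabled. */
    return (SketchHwUpdate){ .ha = ha, .hd = ha && hd };
}
```

The final "ha && hd" mirrors the ".hd = ha && hd" initializer in the patch: setting HD without HA yields no dirty state update.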

-[PULL 21/28] target/arm: Implement MVE LCTP
+[PULL 11/30] target/arm: Move S1_ptw_translate outside arm_ld[lq]_ptw
-Implement the MVE LCTP instruction.
+From: Richard Henderson <richard.henderson@linaro.org>
-We put its decode and implementation with the other
+Separate S1 translation from the actual lookup.
-low-overhead-branch insns because although it is only present if MVE
+Will enable lpae hardware updates.
 is implemented it is logically in the same group as the other LOB
 insns.
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20221024051851.3074715-6-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210614151007.4545-7-peter.maydell@linaro.org
 ---
- target/arm/t32.decode  |  2 ++
+ target/arm/ptw.c | 41 ++++++++++++++++++++++-------------------
- target/arm/translate.c | 24 ++++++++++++++++++++++++
+file changed, 22 insertions(+), 19 deletions(-)
 files changed, 26 insertions(+)
-diff --git a/target/arm/t32.decode b/target/arm/t32.decode
+diff --git a/target/arm/ptw.c b/target/arm/ptw.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/t32.decode
+--- a/target/arm/ptw.c
-+++ b/target/arm/t32.decode
++++ b/target/arm/ptw.c
-@@ -XXX,XX +XXX,XX @@ BL               1111 0. .......... 11.1 ............         @branch24
+@@ -XXX,XX +XXX,XX @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
      DLS          1111 0 0000 100     rn:4 1110 0000 0000 0001
      WLS          1111 0 0000 100     rn:4 1100 . .......... 1 imm=%lob_imm
      LE           1111 0 0000 0 f:1 0 1111 1100 . .......... 1 imm=%lob_imm
 +
 +    LCTP         1111 0 0000 000     1111 1110 0000 0000 0001
    ]
  }
-diff --git a/target/arm/translate.c b/target/arm/translate.c
-index XXXXXXX..XXXXXXX 100644
+ /* All loads done in the course of a page table walk go through here. */
---- a/target/arm/translate.c
+-static uint32_t arm_ldl_ptw(CPUARMState *env, S1Translate *ptw, hwaddr addr,
-+++ b/target/arm/translate.c
++static uint32_t arm_ldl_ptw(CPUARMState *env, S1Translate *ptw,
-@@ -XXX,XX +XXX,XX @@ static bool trans_LE(DisasContext *s, arg_LE *a)
+                             ARMMMUFaultInfo *fi)
-     return true;
+ {
      CPUState *cs = env_cpu(env);
      uint32_t data;
 -    if (!S1_ptw_translate(env, ptw, addr, fi)) {
 -        /* Failure. */
 -        assert(fi->s1ptw);
 -        return 0;
 -    }
 -
      if (likely(ptw->out_host)) {
          /* Page tables are in RAM, and we have the host address. */
          if (ptw->out_be) {
@@ -XXX,XX +XXX,XX @@ static uint32_t arm_ldl_ptw(CPUARMState *env, S1Translate *ptw, hwaddr addr,
      return data;
  }
-+static bool trans_LCTP(DisasContext *s, arg_LCTP *a)
+-static uint64_t arm_ldq_ptw(CPUARMState *env, S1Translate *ptw, hwaddr addr,
-+{
++static uint64_t arm_ldq_ptw(CPUARMState *env, S1Translate *ptw,
-+    /*
+                             ARMMMUFaultInfo *fi)
-+     * M-profile Loop Clear with Tail Predication. Since our implementation
+ {
-+     * doesn't cache branch information, all we need to do is reset
+     CPUState *cs = env_cpu(env);
-+     * FPSCR.LTPSIZE to 4.
+     uint64_t data;
-+     */
-+    TCGv_i32 ltpsize;
+-    if (!S1_ptw_translate(env, ptw, addr, fi)) {
-+
+-        /* Failure. */
-+    if (!dc_isar_feature(aa32_lob, s) ||
+-        assert(fi->s1ptw);
-+        !dc_isar_feature(aa32_mve, s)) {
+-        return 0;
-+        return false;
+-    }
 -
      if (likely(ptw->out_host)) {
          /* Page tables are in RAM, and we have the host address. */
          if (ptw->out_be) {
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_v5(CPUARMState *env, S1Translate *ptw,
          fi->type = ARMFault_Translation;
          goto do_fault;
      }
 -    desc = arm_ldl_ptw(env, ptw, table, fi);
 +    if (!S1_ptw_translate(env, ptw, table, fi)) {
 +        goto do_fault;
 +    }
-+
++    desc = arm_ldl_ptw(env, ptw, fi);
-+    if (!vfp_access_check(s)) {
+     if (fi->type != ARMFault_None) {
-+        return true;
+         goto do_fault;
      }
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_v5(CPUARMState *env, S1Translate *ptw,
              /* Fine pagetable.  */
              table = (desc & 0xfffff000) | ((address >> 8) & 0xffc);
          }
 -        desc = arm_ldl_ptw(env, ptw, table, fi);
 +        if (!S1_ptw_translate(env, ptw, table, fi)) {
 +            goto do_fault;
 +        }
 +        desc = arm_ldl_ptw(env, ptw, fi);
          if (fi->type != ARMFault_None) {
              goto do_fault;
          }
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_v6(CPUARMState *env, S1Translate *ptw,
          fi->type = ARMFault_Translation;
          goto do_fault;
      }
 -    desc = arm_ldl_ptw(env, ptw, table, fi);
 +    if (!S1_ptw_translate(env, ptw, table, fi)) {
 +        goto do_fault;
 +    }
-+
++    desc = arm_ldl_ptw(env, ptw, fi);
-+    ltpsize = tcg_const_i32(4);
+     if (fi->type != ARMFault_None) {
-+    store_cpu_field(ltpsize, v7m.ltpsize);
+         goto do_fault;
-+    return true;
+     }
-+}
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_v6(CPUARMState *env, S1Translate *ptw,
-+
+         ns = extract32(desc, 3, 1);
-+
+         /* Lookup l2 entry.  */
- static bool op_tbranch(DisasContext *s, arg_tbranch *a, bool half)
+         table = (desc & 0xfffffc00) | ((address >> 10) & 0x3fc);
- {
+-        desc = arm_ldl_ptw(env, ptw, table, fi);
-     TCGv_i32 addr, tmp;
++        if (!S1_ptw_translate(env, ptw, table, fi)) {
 +            goto do_fault;
 +        }
 +        desc = arm_ldl_ptw(env, ptw, fi);
          if (fi->type != ARMFault_None) {
              goto do_fault;
          }
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
              ptw->in_ptw_idx &= ~1;
              ptw->in_secure = false;
          }
 -        descriptor = arm_ldq_ptw(env, ptw, descaddr, fi);
 +        if (!S1_ptw_translate(env, ptw, descaddr, fi)) {
 +            goto do_fault;
 +        }
 +        descriptor = arm_ldq_ptw(env, ptw, fi);
          if (fi->type != ARMFault_None) {
              goto do_fault;
          }
 --
-.20.1
+.25.1
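
The call-site pattern introduced by patch 11 is "translate first, then load from the location recorded in the walk state". The sketch below illustrates that shape in isolation. It is not the QEMU implementation: SketchWalk, sketch_translate() and sketch_load() are hypothetical, and a plain array stands in for guest memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical walk state: remembers where the descriptor lives. */
typedef struct SketchWalk {
    const uint32_t *out_host;
} SketchWalk;

/* Step 1: resolve the descriptor location; report failure to the caller. */
static inline bool sketch_translate(SketchWalk *w, const uint32_t *table,
                                    size_t nents, size_t index)
{
    if (index >= nents) {
        return false;           /* caller would take its fault path */
    }
    w->out_host = &table[index];
    return true;
}

/* Step 2: load from the location recorded by the translate step. */
static inline uint32_t sketch_load(const SketchWalk *w)
{
    return *w->out_host;
}
```

Keeping the resolved location in the walk state means a later step can revisit the same descriptor without translating again, which is presumably what the commit message means by enabling hardware updates.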

-[PULL 20/28] target/arm: Let vfp_access_check() handle late NOCP checks
+[PULL 12/30] target/arm: Add ARMFault_UnsuppAtomicUpdate
-In commit a3494d4671797c we reworked the M-profile handling of its
+From: Richard Henderson <richard.henderson@linaro.org>
 checks for when the NOCP exception should be raised because the FPU
 is disabled, so that (in line with the architecture) the NOCP check
 is done early over a large range of the encoding space, and takes
 precedence over UNDEF exceptions.  As part of this, we removed the
 code from full_vfp_access_check() which raised an exception there for
 M-profile with the FPU disabled, because it was no longer reachable.
-For MVE, some instructions which are outside the "coprocessor space"
+This fault type is to be used with FEAT_HAFDBS when
-region of the encoding space must nonetheless do "is the FPU enabled"
+the guest enables hw updates, but places the tables
-checks and possibly raise a NOCP exception.  (In particular this
+in memory where atomic updates are unsupported.
 covers the MVE-specific low-overhead branch insns LCTP, DLSTP and
 WLSTP.) To support these insns, reinstate the code in
 full_vfp_access_check(), so that their trans functions can call
 vfp_access_check() and get the correct behaviour.
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
+Message-id: 20221024051851.3074715-7-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210614151007.4545-6-peter.maydell@linaro.org
 ---
- target/arm/translate-vfp.c | 20 +++++++++++++++-----
+ target/arm/internals.h | 4 ++++
-file changed, 15 insertions(+), 5 deletions(-)
+file changed, 4 insertions(+)
-diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
+diff --git a/target/arm/internals.h b/target/arm/internals.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.c
+--- a/target/arm/internals.h
-+++ b/target/arm/translate-vfp.c
++++ b/target/arm/internals.h
-@@ -XXX,XX +XXX,XX @@ static void gen_preserve_fp_state(DisasContext *s)
+@@ -XXX,XX +XXX,XX @@ typedef enum ARMFaultType {
- static bool full_vfp_access_check(DisasContext *s, bool ignore_vfp_enabled)
+     ARMFault_AsyncExternal,
- {
+     ARMFault_Debug,
-     if (s->fp_excp_el) {
+     ARMFault_TLBConflict,
--        /* M-profile handled this earlier, in disas_m_nocp() */
++    ARMFault_UnsuppAtomicUpdate,
--        assert (!arm_dc_feature(s, ARM_FEATURE_M));
+     ARMFault_Lockdown,
--        gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
+     ARMFault_Exclusive,
--                           syn_fp_access_trap(1, 0xe, false),
+     ARMFault_ICacheMaint,
--                           s->fp_excp_el);
+@@ -XXX,XX +XXX,XX @@ static inline uint32_t arm_fi_to_lfsc(ARMMMUFaultInfo *fi)
-+        if (arm_dc_feature(s, ARM_FEATURE_M)) {
+     case ARMFault_TLBConflict:
-+            /*
+         fsc = 0x30;
-+             * M-profile mostly catches the "FPU disabled" case early, in
+         break;
-+             * disas_m_nocp(), but a few insns (eg LCTP, WLSTP, DLSTP)
++    case ARMFault_UnsuppAtomicUpdate:
-+             * which do coprocessor-checks are outside the large ranges of
++        fsc = 0x31;
-+             * the encoding space handled by the patterns in m-nocp.decode,
++        break;
-+             * and for them we may need to raise NOCP here.
+     case ARMFault_Lockdown:
-+             */
+         fsc = 0x34;
-+            gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
+         break;
 +                               syn_uncategorized(), s->fp_excp_el);
 +        } else {
 +            gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
 +                               syn_fp_access_trap(1, 0xe, false),
 +                               s->fp_excp_el);
 +        }
          return false;
      }
 --
-.20.1
+.25.1
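
Patch 12 slots the new fault type between TLBConflict and Lockdown in the long-format fault status mapping. A minimal standalone sketch of just those three cases follows. The status codes are the ones in the diff; the enum and function names are hypothetical, and the real arm_fi_to_lfsc() handles many more fault types.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the fault types touched by the diff. */
typedef enum SketchFault {
    SKETCH_FAULT_TLB_CONFLICT,
    SKETCH_FAULT_UNSUPP_ATOMIC_UPDATE,
    SKETCH_FAULT_LOCKDOWN,
} SketchFault;

/* Map a fault type to its long-format fault status code. */
static inline uint32_t sketch_fault_to_fsc(SketchFault f)
{
    switch (f) {
    case SKETCH_FAULT_TLB_CONFLICT:
        return 0x30;
    case SKETCH_FAULT_UNSUPP_ATOMIC_UPDATE:
        return 0x31;
    case SKETCH_FAULT_LOCKDOWN:
        return 0x34;
    }
    return 0; /* not reached for the three values modelled here */
}
```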

-[PULL 12/28] target/arm: Fix mte page crossing test
+[PULL 13/30] target/arm: Remove loop from get_phys_addr_lpae
 From: Richard Henderson <richard.henderson@linaro.org>
-The test was off-by-one, because tag_last points to the
+The unconditional loop was used both to iterate over levels
-last byte of the tag to check, thus tag_last - prev_page
+and to control parsing of attributes.  Use an explicit goto
-will equal TARGET_PAGE_SIZE when we use the first byte
+in both cases.
-of the next page.
+While this appears less clean for iterating over levels, we
-Resolves: https://gitlab.com/qemu-project/qemu/-/issues/403
+will need to jump back into the middle of this loop for
-Reported-by: Peter Collingbourne <pcc@google.com>
+atomic updates, which is even uglier.
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210612195707.840217-1-richard.henderson@linaro.org
+Message-id: 20221024051851.3074715-8-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/mte_helper.c           |  2 +-
+ target/arm/ptw.c | 192 +++++++++++++++++++++++------------------------
- tests/tcg/aarch64/mte-7.c         | 31 +++++++++++++++++++++++++++++++
+file changed, 96 insertions(+), 96 deletions(-)
- tests/tcg/aarch64/Makefile.target |  2 +-
-files changed, 33 insertions(+), 2 deletions(-)
+diff --git a/target/arm/ptw.c b/target/arm/ptw.c
  create mode 100644 tests/tcg/aarch64/mte-7.c
 diff --git a/target/arm/mte_helper.c b/target/arm/mte_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/mte_helper.c
+--- a/target/arm/ptw.c
-+++ b/target/arm/mte_helper.c
++++ b/target/arm/ptw.c
-@@ -XXX,XX +XXX,XX @@ static int mte_probe_int(CPUARMState *env, uint32_t desc, uint64_t ptr,
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
-     prev_page = ptr & TARGET_PAGE_MASK;
+     uint64_t descaddrmask;
-     next_page = prev_page + TARGET_PAGE_SIZE;
+     bool aarch64 = arm_el_is_aa64(env, el);
+     bool guarded = false;
--    if (likely(tag_last - prev_page <= TARGET_PAGE_SIZE)) {
++    uint64_t descriptor;
-+    if (likely(tag_last - prev_page < TARGET_PAGE_SIZE)) {
++    bool nstable;
-         /* Memory access stays on one page. */
-         tag_size = ((tag_byte_last - tag_byte_first) / (2 * TAG_GRANULE)) + 1;
+     /* TODO: This code does not support shareability levels. */
-         mem1 = allocation_tag_mem(env, mmu_idx, ptr, type, sizem1 + 1,
+     if (aarch64) {
-diff --git a/tests/tcg/aarch64/mte-7.c b/tests/tcg/aarch64/mte-7.c
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
-new file mode 100644
+      * bits at each step.
-index XXXXXXX..XXXXXXX
+      */
---- /dev/null
+     tableattrs = is_secure ? 0 : (1 << 4);
-+++ b/tests/tcg/aarch64/mte-7.c
+-    for (;;) {
-@@ -XXX,XX +XXX,XX @@
+-        uint64_t descriptor;
-+/*
+-        bool nstable;
-+ * Memory tagging, unaligned access crossing pages.
+-
-+ * https://gitlab.com/qemu-project/qemu/-/issues/403
+-        descaddr |= (address >> (stride * (4 - level))) & indexmask;
-+ *
+-        descaddr &= ~7ULL;
-+ * Copyright (c) 2021 Linaro Ltd
+-        nstable = extract32(tableattrs, 4, 1);
-+ * SPDX-License-Identifier: GPL-2.0-or-later
+-        if (!nstable) {
-+ */
+-            /*
-+
+-             * Stage2_S -> Stage2 or Phys_S -> Phys_NS
-+#include "mte.h"
+-             * Assert that the non-secure idx are even, and relative order.
-+
+-             */
-+int main(int ac, char **av)
+-            QEMU_BUILD_BUG_ON((ARMMMUIdx_Phys_NS & 1) != 0);
-+{
+-            QEMU_BUILD_BUG_ON((ARMMMUIdx_Stage2 & 1) != 0);
-+    void *p;
+-            QEMU_BUILD_BUG_ON(ARMMMUIdx_Phys_NS + 1 != ARMMMUIdx_Phys_S);
-+
+-            QEMU_BUILD_BUG_ON(ARMMMUIdx_Stage2 + 1 != ARMMMUIdx_Stage2_S);
-+    enable_mte(PR_MTE_TCF_SYNC);
+-            ptw->in_ptw_idx &= ~1;
-+    p = alloc_mte_mem(2 * 0x1000);
+-            ptw->in_secure = false;
-+
+-        }
-+    /* Tag the pointer. */
+-        if (!S1_ptw_translate(env, ptw, descaddr, fi)) {
-+    p = (void *)((unsigned long)p | (1ul << 56));
+-            goto do_fault;
-+
+-        }
-+    /* Store tag in sequential granules. */
+-        descriptor = arm_ldq_ptw(env, ptw, fi);
-+    asm("stg %0, [%0]" : : "r"(p + 0x0ff0));
+-        if (fi->type != ARMFault_None) {
-+    asm("stg %0, [%0]" : : "r"(p + 0x1000));
+-            goto do_fault;
 -        }
 -
 -        if (!(descriptor & 1) ||
 -            (!(descriptor & 2) && (level == 3))) {
 -            /* Invalid, or the Reserved level 3 encoding */
 -            goto do_fault;
 -        }
 -
 -        descaddr = descriptor & descaddrmask;
 + next_level:
 +    descaddr |= (address >> (stride * (4 - level))) & indexmask;
 +    descaddr &= ~7ULL;
 +    nstable = extract32(tableattrs, 4, 1);
 +    if (!nstable) {
          /*
 -         * For FEAT_LPA and PS=6, bits [51:48] of descaddr are in [15:12]
 -         * of descriptor.  For FEAT_LPA2 and effective DS, bits [51:50] of
 -         * descaddr are in [9:8].  Otherwise, if descaddr is out of range,
 -         * raise AddressSizeFault.
 +         * Stage2_S -> Stage2 or Phys_S -> Phys_NS
 +         * Assert that the non-secure idx are even, and relative order.
           */
 -        if (outputsize > 48) {
 -            if (param.ds) {
 -                descaddr |= extract64(descriptor, 8, 2) << 50;
 -            } else {
 -                descaddr |= extract64(descriptor, 12, 4) << 48;
 -            }
 -        } else if (descaddr >> outputsize) {
 -            fault_type = ARMFault_AddressSize;
 -            goto do_fault;
 -        }
 -
 -        if ((descriptor & 2) && (level < 3)) {
 -            /*
 -             * Table entry. The top five bits are attributes which may
 -             * propagate down through lower levels of the table (and
 -             * which are all arranged so that 0 means "no effect", so
 -             * we can gather them up by ORing in the bits at each level).
 -             */
 -            tableattrs |= extract64(descriptor, 59, 5);
 -            level++;
 -            indexmask = indexmask_grainsize;
 -            continue;
 -        }
 -        /*
 -         * Block entry at level 1 or 2, or page entry at level 3.
 -         * These are basically the same thing, although the number
 -         * of bits we pull in from the vaddr varies. Note that although
 -         * descaddrmask masks enough of the low bits of the descriptor
 -         * to give a correct page or table address, the address field
 -         * in a block descriptor is smaller; so we need to explicitly
 -         * clear the lower bits here before ORing in the low vaddr bits.
 -         */
 -        page_size = (1ULL << ((stride * (4 - level)) + 3));
 -        descaddr &= ~(hwaddr)(page_size - 1);
 -        descaddr |= (address & (page_size - 1));
 -        /* Extract attributes from the descriptor */
 -        attrs = extract64(descriptor, 2, 10)
 -            | (extract64(descriptor, 52, 12) << 10);
 -
 -        if (regime_is_stage2(mmu_idx)) {
 -            /* Stage 2 table descriptors do not include any attribute fields */
 -            break;
 -        }
 -        /* Merge in attributes from table descriptors */
 -        attrs |= nstable << 3; /* NS */
 -        guarded = extract64(descriptor, 50, 1);  /* GP */
 -        if (param.hpd) {
 -            /* HPD disables all the table attributes except NSTable.  */
 -            break;
 -        }
 -        attrs |= extract32(tableattrs, 0, 2) << 11;     /* XN, PXN */
 -        /*
 -         * The sense of AP[1] vs APTable[0] is reversed, as APTable[0] == 1
 -         * means "force PL1 access only", which means forcing AP[1] to 0.
 -         */
 -        attrs &= ~(extract32(tableattrs, 2, 1) << 4);   /* !APT[0] => AP[1] */
 -        attrs |= extract32(tableattrs, 3, 1) << 5;      /* APT[1] => AP[2] */
 -        break;
 +        QEMU_BUILD_BUG_ON((ARMMMUIdx_Phys_NS & 1) != 0);
 +        QEMU_BUILD_BUG_ON((ARMMMUIdx_Stage2 & 1) != 0);
 +        QEMU_BUILD_BUG_ON(ARMMMUIdx_Phys_NS + 1 != ARMMMUIdx_Phys_S);
 +        QEMU_BUILD_BUG_ON(ARMMMUIdx_Stage2 + 1 != ARMMMUIdx_Stage2_S);
 +        ptw->in_ptw_idx &= ~1;
 +        ptw->in_secure = false;
      }
 +    if (!S1_ptw_translate(env, ptw, descaddr, fi)) {
 +        goto do_fault;
 +    }
 +    descriptor = arm_ldq_ptw(env, ptw, fi);
 +    if (fi->type != ARMFault_None) {
 +        goto do_fault;
 +    }
 +
 +    if (!(descriptor & 1) || (!(descriptor & 2) && (level == 3))) {
 +        /* Invalid, or the Reserved level 3 encoding */
 +        goto do_fault;
 +    }
 +
 +    descaddr = descriptor & descaddrmask;
 +
 +    /*
-+     * Perform an unaligned store with tag 1 crossing the pages.
++     * For FEAT_LPA and PS=6, bits [51:48] of descaddr are in [15:12]
-+     * Failure dies with SIGSEGV.
++     * of descriptor.  For FEAT_LPA2 and effective DS, bits [51:50] of
 +     * descaddr are in [9:8].  Otherwise, if descaddr is out of range,
 +     * raise AddressSizeFault.
 +     */
-+    asm("str %0, [%0]" : : "r"(p + 0x0ffc));
++    if (outputsize > 48) {
-+    return 0;
++        if (param.ds) {
-+}
++            descaddr |= extract64(descriptor, 8, 2) << 50;
-diff --git a/tests/tcg/aarch64/Makefile.target b/tests/tcg/aarch64/Makefile.target
++        } else {
-index XXXXXXX..XXXXXXX 100644
++            descaddr |= extract64(descriptor, 12, 4) << 48;
---- a/tests/tcg/aarch64/Makefile.target
++        }
-+++ b/tests/tcg/aarch64/Makefile.target
++    } else if (descaddr >> outputsize) {
-@@ -XXX,XX +XXX,XX @@ AARCH64_TESTS += bti-2
++        fault_type = ARMFault_AddressSize;
++        goto do_fault;
- # MTE Tests
++    }
- ifneq ($(DOCKER_IMAGE)$(CROSS_CC_HAS_ARMV8_MTE),)
++
--AARCH64_TESTS += mte-1 mte-2 mte-3 mte-4 mte-5 mte-6
++    if ((descriptor & 2) && (level < 3)) {
-+AARCH64_TESTS += mte-1 mte-2 mte-3 mte-4 mte-5 mte-6 mte-7
++        /*
- mte-%: CFLAGS += -march=armv8.5-a+memtag
++         * Table entry. The top five bits are attributes which may
- endif
++         * propagate down through lower levels of the table (and
++         * which are all arranged so that 0 means "no effect", so
 +         * we can gather them up by ORing in the bits at each level).
 +         */
 +        tableattrs |= extract64(descriptor, 59, 5);
 +        level++;
 +        indexmask = indexmask_grainsize;
 +        goto next_level;
 +    }
 +
 +    /*
 +     * Block entry at level 1 or 2, or page entry at level 3.
 +     * These are basically the same thing, although the number
 +     * of bits we pull in from the vaddr varies. Note that although
 +     * descaddrmask masks enough of the low bits of the descriptor
 +     * to give a correct page or table address, the address field
 +     * in a block descriptor is smaller; so we need to explicitly
 +     * clear the lower bits here before ORing in the low vaddr bits.
 +     */
 +    page_size = (1ULL << ((stride * (4 - level)) + 3));
 +    descaddr &= ~(hwaddr)(page_size - 1);
 +    descaddr |= (address & (page_size - 1));
 +    /* Extract attributes from the descriptor */
 +    attrs = extract64(descriptor, 2, 10)
 +        | (extract64(descriptor, 52, 12) << 10);
 +
 +    if (regime_is_stage2(mmu_idx)) {
 +        /* Stage 2 table descriptors do not include any attribute fields */
 +        goto skip_attrs;
 +    }
 +    /* Merge in attributes from table descriptors */
 +    attrs |= nstable << 3; /* NS */
 +    guarded = extract64(descriptor, 50, 1);  /* GP */
 +    if (param.hpd) {
 +        /* HPD disables all the table attributes except NSTable.  */
 +        goto skip_attrs;
 +    }
 +    attrs |= extract32(tableattrs, 0, 2) << 11;     /* XN, PXN */
 +    /*
 +     * The sense of AP[1] vs APTable[0] is reversed, as APTable[0] == 1
 +     * means "force PL1 access only", which means forcing AP[1] to 0.
 +     */
 +    attrs &= ~(extract32(tableattrs, 2, 1) << 4);   /* !APT[0] => AP[1] */
 +    attrs |= extract32(tableattrs, 3, 1) << 5;      /* APT[1] => AP[2] */
 + skip_attrs:
 +
      /*
       * Here descaddr is the final physical address, and attributes
       * are all in attrs.
 --
-.20.1
+.25.1

-[PULL 03/28] target/arm: Remove fprintf from disas_simd_mod_imm
+[PULL 14/30] target/arm: Fix fault reporting in get_phys_addr_lpae
 From: Richard Henderson <richard.henderson@linaro.org>
-The default of this switch is truly unreachable.
+Always overriding fi->type was incorrect, as we would not properly
-The switch selector is 3 bits, and all 8 cases are present.
+propagate the fault type from S1_ptw_translate, or arm_ldq_ptw.
 Simplify things by providing a new label for a translation fault.
 For other faults, store into fi directly.
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
-Message-id: 20210604183506.916654-3-richard.henderson@linaro.org
+Message-id: 20221024051851.3074715-9-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate-a64.c | 1 -
+ target/arm/ptw.c | 31 +++++++++++++------------------
-file changed, 1 deletion(-)
+file changed, 13 insertions(+), 18 deletions(-)
-diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
+diff --git a/target/arm/ptw.c b/target/arm/ptw.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-a64.c
+--- a/target/arm/ptw.c
-+++ b/target/arm/translate-a64.c
++++ b/target/arm/ptw.c
-@@ -XXX,XX +XXX,XX @@ static void disas_simd_mod_imm(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
      ARMCPU *cpu = env_archcpu(env);
      ARMMMUIdx mmu_idx = ptw->in_mmu_idx;
      bool is_secure = ptw->in_secure;
 -    /* Read an LPAE long-descriptor translation table. */
 -    ARMFaultType fault_type = ARMFault_Translation;
      uint32_t level;
      ARMVAParameters param;
      uint64_t ttbr;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
           * so our choice is to always raise the fault.
           */
          if (param.tsz_oob) {
 -            fault_type = ARMFault_Translation;
 -            goto do_fault;
 +            goto do_translation_fault;
          }
-         break;
-     default:
+         addrsize = 64 - 8 * param.tbi;
--        fprintf(stderr, "%s: cmode_3_1: %x\n", __func__, cmode_3_1);
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
-         g_assert_not_reached();
+                                            addrsize - inputsize);
          if (-top_bits != param.select) {
              /* The gap between the two regions is a Translation fault */
 -            fault_type = ARMFault_Translation;
 -            goto do_fault;
 +            goto do_translation_fault;
          }
      }
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
+          * Translation table walk disabled => Translation fault on TLB miss
+          * Note: This is always 0 on 64-bit EL2 and EL3.
+          */
+-        goto do_fault;
++        goto do_translation_fault;
+     }
+     if (!regime_is_stage2(mmu_idx)) {
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
+         if (param.ds && stride == 9 && sl2) {
+             if (sl0 != 0) {
+                 level = 0;
+-                fault_type = ARMFault_Translation;
+-                goto do_fault;
++                goto do_translation_fault;
+             }
+             startlevel = -1;
+         } else if (!aarch64 || stride == 9) {
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
+         ok = check_s2_mmu_setup(cpu, aarch64, startlevel,
+                                 inputsize, stride, outputsize);
+         if (!ok) {
+-            fault_type = ARMFault_Translation;
+-            goto do_fault;
++            goto do_translation_fault;
+         }
+         level = startlevel;
+     }
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
+         descaddr |= extract64(ttbr, 2, 4) << 48;
+     } else if (descaddr >> outputsize) {
+         level = 0;
+-        fault_type = ARMFault_AddressSize;
++        fi->type = ARMFault_AddressSize;
+         goto do_fault;
+     }
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
+     if (!(descriptor & 1) || (!(descriptor & 2) && (level == 3))) {
+         /* Invalid, or the Reserved level 3 encoding */
+-        goto do_fault;
++        goto do_translation_fault;
+     }
+     descaddr = descriptor & descaddrmask;
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
+             descaddr |= extract64(descriptor, 12, 4) << 48;
+         }
+     } else if (descaddr >> outputsize) {
+-        fault_type = ARMFault_AddressSize;
++        fi->type = ARMFault_AddressSize;
+         goto do_fault;
+     }
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
+      * Here descaddr is the final physical address, and attributes
+      * are all in attrs.
+      */
+-    fault_type = ARMFault_AccessFlag;
+     if ((attrs & (1 << 8)) == 0) {
+         /* Access flag */
++        fi->type = ARMFault_AccessFlag;
+         goto do_fault;
+     }
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
+         result->f.prot = get_S1prot(env, mmu_idx, aarch64, ap, ns, xn, pxn);
+     }
+-    fault_type = ARMFault_Permission;
+     if (!(result->f.prot & (1 << access_type))) {
++        fi->type = ARMFault_Permission;
+         goto do_fault;
+     }
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
+     result->f.lg_page_size = ctz64(page_size);
+     return false;
+-do_fault:
+-    fi->type = fault_type;
++ do_translation_fault:
++    fi->type = ARMFault_Translation;
++ do_fault:
+     fi->level = level;
+     /* Tag the error as S2 for failed S1 PTW at S2 or ordinary S2.  */
+     fi->stage2 = fi->s1ptw || regime_is_stage2(mmu_idx);
 --
-.20.1
+.25.1
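
The fault-reporting change in patch 14/30 above can be illustrated in isolation. The following is a minimal sketch, not QEMU code: FaultType, FaultInfo, load_descriptor() and walk() are hypothetical stand-ins for ARMFaultType, ARMMMUFaultInfo, arm_ldq_ptw() and get_phys_addr_lpae(), and the "true means fault" return convention is an assumption. It shows why a dedicated do_translation_fault label that falls through into do_fault preserves a fault type already recorded by a helper, where an unconditional store at do_fault would overwrite it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for QEMU's ARMFaultType and ARMMMUFaultInfo. */
typedef enum {
    FAULT_NONE,
    FAULT_TRANSLATION,
    FAULT_ADDRESS_SIZE,
    FAULT_EXTERNAL,     /* a type reported by a nested helper */
} FaultType;

typedef struct {
    FaultType type;
    int level;
} FaultInfo;

/* Hypothetical helper that records its own fault type on failure. */
static bool load_descriptor(bool fail, FaultInfo *fi)
{
    if (fail) {
        fi->type = FAULT_EXTERNAL;
        return false;
    }
    return true;
}

/* In this sketch, a return value of true means "fault". */
static bool walk(bool helper_fails, bool invalid_desc, bool addr_oob,
                 FaultInfo *fi)
{
    int level = 1;

    if (!load_descriptor(helper_fails, fi)) {
        goto do_fault;                  /* keep the helper's fault type */
    }
    if (invalid_desc) {
        goto do_translation_fault;      /* common case has its own label */
    }
    if (addr_oob) {
        fi->type = FAULT_ADDRESS_SIZE;  /* other faults: store directly */
        goto do_fault;
    }
    return false;

 do_translation_fault:
    fi->type = FAULT_TRANSLATION;
 do_fault:
    fi->level = level;
    return true;
}
```

With the old pattern (a local fault_type copied into fi->type at do_fault), the first case would have reported a translation fault instead of the type set by the helper.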

-[PULL 19/28] target/arm: Add handling for PSR.ECI/ICI
+[PULL 15/30] target/arm: Don't shift attrs in get_phys_addr_lpae
-On A-profile, PSR bits [15:10][26:25] are always the IT state bits.
+From: Richard Henderson <richard.henderson@linaro.org>
 On M-profile, some of the reserved encodings of the IT state are used
 to instead indicate partial progress through instructions that were
 interrupted partway through by an exception and can be resumed.
-These resumable instructions fall into two categories:
+Leave the upper and lower attributes in the place they originate
 from in the descriptor.  Shifting them around is confusing, since
 one cannot read the bit numbers out of the manual.  Also, new
 attributes have been added which would alter the shifts.
-(1) load/store multiple instructions, where these bits are called
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-"ICI" and specify the register in the ldm/stm list where execution
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-should resume.  (Specifically: LDM, STM, VLDM, VSTM, VLLDM, VLSTM,
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-CLRM, VSCCLRM.)
+Message-id: 20221024051851.3074715-10-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  target/arm/ptw.c | 31 +++++++++++++++----------------
 file changed, 15 insertions(+), 16 deletions(-)
-(2) MVE instructions subject to beatwise execution, where these bits
+diff --git a/target/arm/ptw.c b/target/arm/ptw.c
 are called "ECI" and specify which beats in this and possibly also
 the following MVE insn have been executed.
 There are also a few insns (LE, LETP, and BKPT) which do not use the
 ICI/ECI bits but must leave them alone.
 Otherwise, we should raise an INVSTATE UsageFault for any attempt to
 execute an insn with non-zero ICI/ECI bits.
 So far we have been able to ignore ECI/ICI, because the architecture
 allows the IMPDEF choice of "always restart load/store multiple from
 the beginning regardless of ICI state", so the only thing we have
 been missing is that we don't raise the INVSTATE fault for bad guest
 code.  However, MVE requires that we honour ECI bits and do not
 re-execute beats of an insn that have already been executed.
 Add the support in the decoder for handling ECI/ICI:
  * identify the ECI/ICI case in the CONDEXEC TB flags
  * when a load/store multiple insn succeeds, it updates the ECI/ICI
    state (both in DisasContext and in the CPU state), and sets a flag
    to say that the ECI/ICI state was handled
  * if we find that the insn we just decoded did not handle the
    ECI/ICI state, we delete all the code that we just generated for
    it and instead emit the code to raise the INVSTATE UsageFault.  This allows
    us to avoid having to update every non-MVE non-LDM/STM insn to
    make it check for "is ECI/ICI set?".
 We continue with our existing IMPDEF choice of not caring about the
 ICI state for the load/store multiples and simply restarting them
 from the beginning.  Because we don't allow interrupts in the middle
 of an insn, the only way we would see this state is if the guest set
 ICI manually on return from an exception handler, so it's a corner
 case which doesn't merit optimisation.
 ICI update for LDM/STM is simple -- it always zeroes the state.  ECI
 update for MVE beatwise insns will be a little more complex, since
 the ECI state may include information for the following insn.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20210614151007.4545-5-peter.maydell@linaro.org
 ---
  target/arm/translate-a32.h    |   1 +
  target/arm/translate.h        |   9 +++
  target/arm/translate-m-nocp.c |  11 ++++
  target/arm/translate-vfp.c    |   6 ++
  target/arm/translate.c        | 111 ++++++++++++++++++++++++++++++++--
 files changed, 133 insertions(+), 5 deletions(-)
 diff --git a/target/arm/translate-a32.h b/target/arm/translate-a32.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-a32.h
+--- a/target/arm/ptw.c
-+++ b/target/arm/translate-a32.h
++++ b/target/arm/ptw.c
-@@ -XXX,XX +XXX,XX @@ long vfp_reg_offset(bool dp, unsigned reg);
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
- long neon_full_reg_offset(unsigned reg);
+     hwaddr descaddr, indexmask, indexmask_grainsize;
- long neon_element_offset(int reg, int element, MemOp memop);
+     uint32_t tableattrs;
- void gen_rev16(TCGv_i32 dest, TCGv_i32 var);
+     target_ulong page_size;
-+void clear_eci_state(DisasContext *s);
+-    uint32_t attrs;
++    uint64_t attrs;
- static inline TCGv_i32 load_cpu_offset(int offset)
+     int32_t stride;
- {
+     int addrsize, inputsize, outputsize;
-diff --git a/target/arm/translate.h b/target/arm/translate.h
+     uint64_t tcr = regime_tcr(env, mmu_idx);
-index XXXXXXX..XXXXXXX 100644
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
---- a/target/arm/translate.h
+     descaddr &= ~(hwaddr)(page_size - 1);
-+++ b/target/arm/translate.h
+     descaddr |= (address & (page_size - 1));
-@@ -XXX,XX +XXX,XX @@ typedef struct DisasContext {
+     /* Extract attributes from the descriptor */
-     /* Thumb-2 conditional execution bits.  */
+-    attrs = extract64(descriptor, 2, 10)
-     int condexec_mask;
+-        | (extract64(descriptor, 52, 12) << 10);
-     int condexec_cond;
++    attrs = descriptor & (MAKE_64BIT_MASK(2, 10) | MAKE_64BIT_MASK(52, 12));
-+    /* M-profile ECI/ICI exception-continuable instruction state */
-+    int eci;
+     if (regime_is_stage2(mmu_idx)) {
-+    /*
+         /* Stage 2 table descriptors do not include any attribute fields */
-+     * trans_ functions for insns which are continuable should set this true
+         goto skip_attrs;
 +     * after decode (ie after any UNDEF checks)
 +     */
 +    bool eci_handled;
 +    /* TCG op to rewind to if this turns out to be an invalid ECI state */
 +    TCGOp *insn_eci_rewind;
      int thumb;
      int sctlr_b;
      MemOp be_data;
 diff --git a/target/arm/translate-m-nocp.c b/target/arm/translate-m-nocp.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-m-nocp.c
 +++ b/target/arm/translate-m-nocp.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VLLDM_VLSTM(DisasContext *s, arg_VLLDM_VLSTM *a)
          unallocated_encoding(s);
          return true;
      }
-+
+     /* Merge in attributes from table descriptors */
-+    s->eci_handled = true;
+-    attrs |= nstable << 3; /* NS */
-+
++    attrs |= nstable << 5; /* NS */
-     /* If no fpu, NOP. */
+     guarded = extract64(descriptor, 50, 1);  /* GP */
-     if (!dc_isar_feature(aa32_vfp, s)) {
+     if (param.hpd) {
-+        clear_eci_state(s);
+         /* HPD disables all the table attributes except NSTable.  */
-         return true;
+         goto skip_attrs;
      }
+-    attrs |= extract32(tableattrs, 0, 2) << 11;     /* XN, PXN */
-@@ -XXX,XX +XXX,XX @@ static bool trans_VLLDM_VLSTM(DisasContext *s, arg_VLLDM_VLSTM *a)
++    attrs |= extract64(tableattrs, 0, 2) << 53;     /* XN, PXN */
      /*
       * The sense of AP[1] vs APTable[0] is reversed, as APTable[0] == 1
       * means "force PL1 access only", which means forcing AP[1] to 0.
       */
 -    attrs &= ~(extract32(tableattrs, 2, 1) << 4);   /* !APT[0] => AP[1] */
 -    attrs |= extract32(tableattrs, 3, 1) << 5;      /* APT[1] => AP[2] */
 +    attrs &= ~(extract64(tableattrs, 2, 1) << 6);   /* !APT[0] => AP[1] */
 +    attrs |= extract32(tableattrs, 3, 1) << 7;      /* APT[1] => AP[2] */
   skip_attrs:
      /*
       * Here descaddr is the final physical address, and attributes
       * are all in attrs.
       */
 -    if ((attrs & (1 << 8)) == 0) {
 +    if ((attrs & (1 << 10)) == 0) {
          /* Access flag */
          fi->type = ARMFault_AccessFlag;
          goto do_fault;
      }
-     tcg_temp_free_i32(fptr);
+-    ap = extract32(attrs, 4, 2);
-+    clear_eci_state(s);
++    ap = extract32(attrs, 6, 2);
-+
-     /* End the TB, because we have updated FP control bits */
+     if (regime_is_stage2(mmu_idx)) {
-     s->base.is_jmp = DISAS_UPDATE_EXIT;
+         ns = mmu_idx == ARMMMUIdx_Stage2;
-     return true;
+-        xn = extract32(attrs, 11, 2);
-@@ -XXX,XX +XXX,XX @@ static bool trans_VSCCLRM(DisasContext *s, arg_VSCCLRM *a)
++        xn = extract64(attrs, 53, 2);
-         return true;
+         result->f.prot = get_S2prot(env, ap, xn, s1_is_el0);
      } else {
 -        ns = extract32(attrs, 3, 1);
 -        xn = extract32(attrs, 12, 1);
 -        pxn = extract32(attrs, 11, 1);
 +        ns = extract32(attrs, 5, 1);
 +        xn = extract64(attrs, 54, 1);
 +        pxn = extract64(attrs, 53, 1);
          result->f.prot = get_S1prot(env, mmu_idx, aarch64, ap, ns, xn, pxn);
      }
-+    s->eci_handled = true;
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
-+
-     if (!dc_isar_feature(aa32_vfp_simd, s)) {
+     if (regime_is_stage2(mmu_idx)) {
-         /* NOP if we have neither FP nor MVE */
+         result->cacheattrs.is_s2_format = true;
-+        clear_eci_state(s);
+-        result->cacheattrs.attrs = extract32(attrs, 0, 4);
-         return true;
++        result->cacheattrs.attrs = extract32(attrs, 2, 4);
      } else {
          /* Index into MAIR registers for cache attributes */
 -        uint8_t attrindx = extract32(attrs, 0, 3);
 +        uint8_t attrindx = extract32(attrs, 2, 3);
          uint64_t mair = env->cp15.mair_el[regime_el(env, mmu_idx)];
          assert(attrindx <= 7);
          result->cacheattrs.is_s2_format = false;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
      if (param.ds) {
          result->cacheattrs.shareability = param.sh;
      } else {
 -        result->cacheattrs.shareability = extract32(attrs, 6, 2);
 +        result->cacheattrs.shareability = extract32(attrs, 8, 2);
      }
-@@ -XXX,XX +XXX,XX @@ static bool trans_VSCCLRM(DisasContext *s, arg_VSCCLRM *a)
+     result->f.phys_addr = descaddr;
          TCGv_i32 z32 = tcg_const_i32(0);
          store_cpu_field(z32, v7m.vpr);
      }
 +
 +    clear_eci_state(s);
      return true;
  }
 diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-vfp.c
 +++ b/target/arm/translate-vfp.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_sp(DisasContext *s, arg_VLDM_VSTM_sp *a)
          return false;
      }
 +    s->eci_handled = true;
 +
      if (!vfp_access_check(s)) {
          return true;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_sp(DisasContext *s, arg_VLDM_VSTM_sp *a)
          tcg_temp_free_i32(addr);
      }
 +    clear_eci_state(s);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
          return false;
      }
 +    s->eci_handled = true;
 +
      if (!vfp_access_check(s)) {
          return true;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
          tcg_temp_free_i32(addr);
      }
 +    clear_eci_state(s);
      return true;
  }
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline bool is_singlestepping(DisasContext *s)
      return s->base.singlestep_enabled || s->ss_active;
  }
 +void clear_eci_state(DisasContext *s)
 +{
 +    /*
 +     * Clear any ECI/ICI state: used when a load multiple/store
 +     * multiple insn executes.
 +     */
 +    if (s->eci) {
 +        TCGv_i32 tmp = tcg_const_i32(0);
 +        store_cpu_field(tmp, condexec_bits);
 +        s->eci = 0;
 +    }
 +}
 +
  static void gen_smul_dual(TCGv_i32 a, TCGv_i32 b)
  {
      TCGv_i32 tmp1 = tcg_temp_new_i32();
@@ -XXX,XX +XXX,XX @@ static bool trans_BKPT(DisasContext *s, arg_BKPT *a)
      if (!ENABLE_ARCH_5) {
          return false;
      }
 +    /* BKPT is OK with ECI set and leaves it untouched */
 +    s->eci_handled = true;
      if (arm_dc_feature(s, ARM_FEATURE_M) &&
          semihosting_enabled() &&
  #ifndef CONFIG_USER_ONLY
@@ -XXX,XX +XXX,XX @@ static bool op_stm(DisasContext *s, arg_ldst_block *a, int min_n)
          return true;
      }
 +    s->eci_handled = true;
 +
      addr = op_addr_block_pre(s, a, n);
      mem_idx = get_mem_index(s);
@@ -XXX,XX +XXX,XX @@ static bool op_stm(DisasContext *s, arg_ldst_block *a, int min_n)
      }
      op_addr_block_post(s, a, addr, n);
 +    clear_eci_state(s);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_ldm(DisasContext *s, arg_ldst_block *a, int min_n)
          return true;
      }
 +    s->eci_handled = true;
 +
      addr = op_addr_block_pre(s, a, n);
      mem_idx = get_mem_index(s);
      loaded_base = false;
@@ -XXX,XX +XXX,XX @@ static bool do_ldm(DisasContext *s, arg_ldst_block *a, int min_n)
          /* Must exit loop to check un-masked IRQs */
          s->base.is_jmp = DISAS_EXIT;
      }
 +    clear_eci_state(s);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool trans_CLRM(DisasContext *s, arg_CLRM *a)
          return false;
      }
 +    s->eci_handled = true;
 +
      zero = tcg_const_i32(0);
      for (i = 0; i < 15; i++) {
          if (extract32(a->list, i, 1)) {
@@ -XXX,XX +XXX,XX @@ static bool trans_CLRM(DisasContext *s, arg_CLRM *a)
          tcg_temp_free_i32(maskreg);
      }
      tcg_temp_free_i32(zero);
 +    clear_eci_state(s);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool trans_LE(DisasContext *s, arg_LE *a)
          return false;
      }
 +    /* LE/LETP is OK with ECI set and leaves it untouched */
 +    s->eci_handled = true;
 +
      if (!a->f) {
          /* Not loop-forever. If LR <= 1 this is the last loop: do nothing. */
          arm_gen_condlabel(s);
@@ -XXX,XX +XXX,XX @@ static void arm_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
      dc->thumb = EX_TBFLAG_AM32(tb_flags, THUMB);
      dc->be_data = EX_TBFLAG_ANY(tb_flags, BE_DATA) ? MO_BE : MO_LE;
      condexec = EX_TBFLAG_AM32(tb_flags, CONDEXEC);
 -    dc->condexec_mask = (condexec & 0xf) << 1;
 -    dc->condexec_cond = condexec >> 4;
 +    /*
 +     * the CONDEXEC TB flags are CPSR bits [15:10][26:25]. On A-profile this
 +     * is always the IT bits. On M-profile, some of the reserved encodings
 +     * of IT are used instead to indicate either ICI or ECI, which
 +     * indicate partial progress of a restartable insn that was interrupted
 +     * partway through by an exception:
 +     *  * if CONDEXEC[3:0] != 0b0000 : CONDEXEC is IT bits
 +     *  * if CONDEXEC[3:0] == 0b0000 : CONDEXEC is ICI or ECI bits
 +     * In all cases CONDEXEC == 0 means "not in IT block or restartable
 +     * insn, behave normally".
 +     */
 +    dc->eci = dc->condexec_mask = dc->condexec_cond = 0;
 +    dc->eci_handled = false;
 +    dc->insn_eci_rewind = NULL;
 +    if (condexec & 0xf) {
 +        dc->condexec_mask = (condexec & 0xf) << 1;
 +        dc->condexec_cond = condexec >> 4;
 +    } else {
 +        if (arm_feature(env, ARM_FEATURE_M)) {
 +            dc->eci = condexec >> 4;
 +        }
 +    }
      core_mmu_idx = EX_TBFLAG_ANY(tb_flags, MMUIDX);
      dc->mmu_idx = core_to_arm_mmu_idx(env, core_mmu_idx);
@@ -XXX,XX +XXX,XX @@ static void arm_tr_tb_start(DisasContextBase *dcbase, CPUState *cpu)
  static void arm_tr_insn_start(DisasContextBase *dcbase, CPUState *cpu)
  {
      DisasContext *dc = container_of(dcbase, DisasContext, base);
 +    /*
 +     * The ECI/ICI bits share PSR bits with the IT bits, so we
 +     * need to reconstitute the bits from the split-out DisasContext
 +     * fields here.
 +     */
 +    uint32_t condexec_bits;
 -    tcg_gen_insn_start(dc->base.pc_next,
 -                       (dc->condexec_cond << 4) | (dc->condexec_mask >> 1),
 -                       0);
 +    if (dc->eci) {
 +        condexec_bits = dc->eci << 4;
 +    } else {
 +        condexec_bits = (dc->condexec_cond << 4) | (dc->condexec_mask >> 1);
 +    }
 +    tcg_gen_insn_start(dc->base.pc_next, condexec_bits, 0);
      dc->insn_start = tcg_last_op();
  }
@@ -XXX,XX +XXX,XX @@ static void thumb_tr_translate_insn(DisasContextBase *dcbase, CPUState *cpu)
      }
      dc->insn = insn;
 +    if (dc->eci) {
 +        /*
 +         * For M-profile continuable instructions, ECI/ICI handling
 +         * falls into these cases:
 +         *  - interrupt-continuable instructions
 +         *     These are the various load/store multiple insns (both
 +         *     integer and fp). The ICI bits indicate the register
 +         *     where the load/store can resume. We make the IMPDEF
 +         *     choice to always do "instruction restart", ie ignore
 +         *     the ICI value and always execute the ldm/stm from the
 +         *     start. So all we need to do is zero PSR.ICI if the
 +         *     insn executes.
 +         *  - MVE instructions subject to beat-wise execution
 +         *     Here the ECI bits indicate which beats have already been
 +         *     executed, and we must honour this. Each insn of this
 +         *     type will handle it correctly. We will update PSR.ECI
 +         *     in the helper function for the insn (some ECI values
 +         *     mean that the following insn also has been partially
 +         *     executed).
 +         *  - Special cases which don't advance ECI
 +         *     The insns LE, LETP and BKPT leave the ECI/ICI state
 +         *     bits untouched.
 +         *  - all other insns (the common case)
 +         *     Non-zero ECI/ICI means an INVSTATE UsageFault.
 +         *     We place a rewind-marker here. Insns in the previous
 +         *     three categories will set a flag in the DisasContext.
 +         *     If the flag isn't set after we call disas_thumb_insn()
 +         *     or disas_thumb2_insn() then we know we have a "some other
 +         *     insn" case. We will rewind to the marker (ie throwing away
 +         *     all the generated code) and instead emit "take exception".
 +         */
 +        dc->insn_eci_rewind = tcg_last_op();
 +    }
 +
      if (dc->condexec_mask && !thumb_insn_is_unconditional(dc, insn)) {
          uint32_t cond = dc->condexec_cond;
@@ -XXX,XX +XXX,XX @@ static void thumb_tr_translate_insn(DisasContextBase *dcbase, CPUState *cpu)
          }
      }
 +    if (dc->eci && !dc->eci_handled) {
 +        /*
 +         * Insn wasn't valid for ECI/ICI at all: undo what we
 +         * just generated and instead emit an exception
 +         */
 +        tcg_remove_ops_after(dc->insn_eci_rewind);
 +        dc->condjmp = 0;
 +        gen_exception_insn(dc, dc->pc_curr, EXCP_INVSTATE, syn_uncategorized(),
 +                           default_exception_el(dc));
 +    }
 +
      arm_post_translate_insn(dc);
      /* Thumb is a variable-length ISA.  Stop translation when the next insn
 --
-.20.1
+.25.1

-[PULL 17/28] target/arm: Enable FPSCR.QC bit for MVE
+[PULL 16/30] target/arm: Consider GP an attribute in get_phys_addr_lpae
-MVE has an FPSCR.QC bit similar to the A-profile Neon one; when MVE
+From: Richard Henderson <richard.henderson@linaro.org>
 is implemented make the bit writeable, both in the generic "load and
 store FPSCR" helper functions and in the code for handling the NZCVQC
 sysreg which we had previously left as "TODO when we implement MVE".
+Both GP and DBM are in the upper attribute block.
+Extend the computation of attrs to include them,
+then simplify the setting of guarded.
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
+Message-id: 20221024051851.3074715-11-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210614151007.4545-3-peter.maydell@linaro.org
 ---
- target/arm/translate-vfp.c | 30 +++++++++++++++++++++---------
+ target/arm/ptw.c | 6 ++----
- target/arm/vfp_helper.c    |  3 ++-
+file changed, 2 insertions(+), 4 deletions(-)
 files changed, 23 insertions(+), 10 deletions(-)
-diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
+diff --git a/target/arm/ptw.c b/target/arm/ptw.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.c
+--- a/target/arm/ptw.c
-+++ b/target/arm/translate-vfp.c
++++ b/target/arm/ptw.c
-@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
-     {
+     uint32_t el = regime_el(env, mmu_idx);
-         TCGv_i32 fpscr;
+     uint64_t descaddrmask;
-         tmp = loadfn(s, opaque);
+     bool aarch64 = arm_el_is_aa64(env, el);
--        /*
+-    bool guarded = false;
--         * TODO: when we implement MVE, write the QC bit.
+     uint64_t descriptor;
--         * For non-MVE, QC is RES0.
+     bool nstable;
--         */
-+        if (dc_isar_feature(aa32_mve, s)) {
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
-+            /* QC is only present for MVE; otherwise RES0 */
+     descaddr &= ~(hwaddr)(page_size - 1);
-+            TCGv_i32 qc = tcg_temp_new_i32();
+     descaddr |= (address & (page_size - 1));
-+            tcg_gen_andi_i32(qc, tmp, FPCR_QC);
+     /* Extract attributes from the descriptor */
-+            /*
+-    attrs = descriptor & (MAKE_64BIT_MASK(2, 10) | MAKE_64BIT_MASK(52, 12));
-+             * The 4 vfp.qc[] fields need only be "zero" vs "non-zero";
++    attrs = descriptor & (MAKE_64BIT_MASK(2, 10) | MAKE_64BIT_MASK(50, 14));
-+             * here writing the same value into all elements is simplest.
-+             */
+     if (regime_is_stage2(mmu_idx)) {
-+            tcg_gen_gvec_dup_i32(MO_32, offsetof(CPUARMState, vfp.qc),
+         /* Stage 2 table descriptors do not include any attribute fields */
-+                                 16, 16, qc);
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
 +        }
          tcg_gen_andi_i32(tmp, tmp, FPCR_NZCV_MASK);
          fpscr = load_cpu_field(vfp.xregs[ARM_VFP_FPSCR]);
          tcg_gen_andi_i32(fpscr, fpscr, ~FPCR_NZCV_MASK);
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
          break;
      }
+     /* Merge in attributes from table descriptors */
-+    if (regno == ARM_VFP_FPSCR_NZCVQC && !dc_isar_feature(aa32_mve, s)) {
+     attrs |= nstable << 5; /* NS */
-+        /* QC is RES0 without MVE, so NZCVQC simplifies to NZCV */
+-    guarded = extract64(descriptor, 50, 1);  /* GP */
-+        regno = QEMU_VFP_FPSCR_NZCV;
+     if (param.hpd) {
-+    }
+         /* HPD disables all the table attributes except NSTable.  */
-+
+         goto skip_attrs;
-     switch (regno) {
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
-     case ARM_VFP_FPSCR:
-         tmp = tcg_temp_new_i32();
+     /* When in aarch64 mode, and BTI is enabled, remember GP in the TLB.  */
-@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
+     if (aarch64 && cpu_isar_feature(aa64_bti, cpu)) {
-         storefn(s, opaque, tmp);
+-        result->f.guarded = guarded;
-         break;
++        result->f.guarded = extract64(attrs, 50, 1); /* GP */
      case ARM_VFP_FPSCR_NZCVQC:
 -        /*
 -         * TODO: MVE has a QC bit, which we probably won't store
 -         * in the xregs[] field. For non-MVE, where QC is RES0,
 -         * we can just fall through to the FPSCR_NZCV case.
 -         */
 +        tmp = tcg_temp_new_i32();
 +        gen_helper_vfp_get_fpscr(tmp, cpu_env);
 +        tcg_gen_andi_i32(tmp, tmp, FPCR_NZCVQC_MASK);
 +        storefn(s, opaque, tmp);
 +        break;
      case QEMU_VFP_FPSCR_NZCV:
          /*
           * Read just NZCV; this is a special case to avoid the
 diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp_helper.c
 +++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
                                       FPCR_LTPSIZE_LENGTH);
      }
--    if (arm_feature(env, ARM_FEATURE_NEON)) {
+     if (regime_is_stage2(mmu_idx)) {
 +    if (arm_feature(env, ARM_FEATURE_NEON) ||
 +        cpu_isar_feature(aa32_mve, cpu)) {
          /*
           * The bit we set within fpscr_q is arbitrary; the register as a
           * whole being zero/non-zero is what counts.
 --
-.20.1
+.25.1
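
To see why widening the upper attribute mask from (52, 12) to (50, 14) lets
GP be read back out of attrs, here is a small standalone sketch. SKETCH_MASK64
is a local stand-in assumed to behave like QEMU's MAKE_64BIT_MASK(shift,
length); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in, assumed equivalent to MAKE_64BIT_MASK(shift, length). */
#define SKETCH_MASK64(shift, length) \
    (((~0ULL) >> (64 - (length))) << (shift))

/* Attribute extraction before the patch: upper block starts at bit 52. */
static uint64_t sketch_attrs_old(uint64_t descriptor)
{
    return descriptor & (SKETCH_MASK64(2, 10) | SKETCH_MASK64(52, 12));
}

/* Attribute extraction after the patch: bits 50 (GP) and 51 (DBM) kept. */
static uint64_t sketch_attrs_new(uint64_t descriptor)
{
    return descriptor & (SKETCH_MASK64(2, 10) | SKETCH_MASK64(50, 14));
}

/* Return a single bit of a 64-bit value. */
static int sketch_bit(uint64_t value, unsigned pos)
{
    assert(pos < 64);
    return (int)((value >> pos) & 1);
}
```

With the old mask, bit 50 is always zero in attrs, which is why the original
code had to extract GP from the raw descriptor into a separate variable.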

-[PULL 04/28] target/arm: Diagnose UNALLOCATED in disas_simd_three_reg_same_fp16
+[PULL 17/30] target/arm: Tidy merging of attributes from descriptor and table
 From: Richard Henderson <richard.henderson@linaro.org>
-This fprintf+assert has been in place since the beginning.
+Replace some gotos with some nested if statements.
 It is after the fp_access_check, so we need to move the
 check up.  Fold that into the pairwise filter.
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
-Message-id: 20210604183506.916654-4-richard.henderson@linaro.org
+Message-id: 20221024051851.3074715-12-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate-a64.c | 82 +++++++++++++++++++++++---------------
+ target/arm/ptw.c | 34 ++++++++++++++++------------------
-file changed, 50 insertions(+), 32 deletions(-)
+file changed, 16 insertions(+), 18 deletions(-)
-diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
+diff --git a/target/arm/ptw.c b/target/arm/ptw.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-a64.c
+--- a/target/arm/ptw.c
-+++ b/target/arm/translate-a64.c
++++ b/target/arm/ptw.c
-@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
-  */
+     page_size = (1ULL << ((stride * (4 - level)) + 3));
- static void disas_simd_three_reg_same_fp16(DisasContext *s, uint32_t insn)
+     descaddr &= ~(hwaddr)(page_size - 1);
- {
+     descaddr |= (address & (page_size - 1));
--    int opcode, fpopcode;
+-    /* Extract attributes from the descriptor */
--    int is_q, u, a, rm, rn, rd;
+-    attrs = descriptor & (MAKE_64BIT_MASK(2, 10) | MAKE_64BIT_MASK(50, 14));
--    int datasize, elements;
--    int pass;
+-    if (regime_is_stage2(mmu_idx)) {
-+    int opcode = extract32(insn, 11, 3);
+-        /* Stage 2 table descriptors do not include any attribute fields */
-+    int u = extract32(insn, 29, 1);
+-        goto skip_attrs;
-+    int a = extract32(insn, 23, 1);
+-    }
-+    int is_q = extract32(insn, 30, 1);
+-    /* Merge in attributes from table descriptors */
-+    int rm = extract32(insn, 16, 5);
+-    attrs |= nstable << 5; /* NS */
-+    int rn = extract32(insn, 5, 5);
+-    if (param.hpd) {
-+    int rd = extract32(insn, 0, 5);
+-        /* HPD disables all the table attributes except NSTable.  */
-+    /*
+-        goto skip_attrs;
-+     * For these floating point ops, the U, a and opcode bits
+-    }
-+     * together indicate the operation.
+-    attrs |= extract64(tableattrs, 0, 2) << 53;     /* XN, PXN */
-+     */
+     /*
-+    int fpopcode = opcode | (a << 3) | (u << 4);
+-     * The sense of AP[1] vs APTable[0] is reversed, as APTable[0] == 1
-+    int datasize = is_q ? 128 : 64;
+-     * means "force PL1 access only", which means forcing AP[1] to 0.
-+    int elements = datasize / 16;
++     * Extract attributes from the descriptor, and apply table descriptors.
-+    bool pairwise;
++     * Stage 2 table descriptors do not include any attribute fields.
-     TCGv_ptr fpst;
++     * HPD disables all the table attributes except NSTable.
--    bool pairwise = false;
+      */
-+    int pass;
+-    attrs &= ~(extract64(tableattrs, 2, 1) << 6);   /* !APT[0] => AP[1] */
-+
+-    attrs |= extract32(tableattrs, 3, 1) << 7;      /* APT[1] => AP[2] */
-+    switch (fpopcode) {
+- skip_attrs:
-+    case 0x0: /* FMAXNM */
++    attrs = descriptor & (MAKE_64BIT_MASK(2, 10) | MAKE_64BIT_MASK(50, 14));
-+    case 0x1: /* FMLA */
++    if (!regime_is_stage2(mmu_idx)) {
-+    case 0x2: /* FADD */
++        attrs |= nstable << 5; /* NS */
-+    case 0x3: /* FMULX */
++        if (!param.hpd) {
-+    case 0x4: /* FCMEQ */
++            attrs |= extract64(tableattrs, 0, 2) << 53;     /* XN, PXN */
-+    case 0x6: /* FMAX */
++            /*
-+    case 0x7: /* FRECPS */
++             * The sense of AP[1] vs APTable[0] is reversed, as APTable[0] == 1
-+    case 0x8: /* FMINNM */
++             * means "force PL1 access only", which means forcing AP[1] to 0.
-+    case 0x9: /* FMLS */
++             */
-+    case 0xa: /* FSUB */
++            attrs &= ~(extract64(tableattrs, 2, 1) << 6); /* !APT[0] => AP[1] */
-+    case 0xe: /* FMIN */
++            attrs |= extract32(tableattrs, 3, 1) << 7;    /* APT[1] => AP[2] */
-+    case 0xf: /* FRSQRTS */
++        }
 +    case 0x13: /* FMUL */
 +    case 0x14: /* FCMGE */
 +    case 0x15: /* FACGE */
 +    case 0x17: /* FDIV */
 +    case 0x1a: /* FABD */
 +    case 0x1c: /* FCMGT */
 +    case 0x1d: /* FACGT */
 +        pairwise = false;
 +        break;
 +    case 0x10: /* FMAXNMP */
 +    case 0x12: /* FADDP */
 +    case 0x16: /* FMAXP */
 +    case 0x18: /* FMINNMP */
 +    case 0x1e: /* FMINP */
 +        pairwise = true;
 +        break;
 +    default:
 +        unallocated_encoding(s);
 +        return;
 +    }
-     if (!dc_isar_feature(aa64_fp16, s)) {
+     /*
-         unallocated_encoding(s);
+      * Here descaddr is the final physical address, and attributes
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_fp16(DisasContext *s, uint32_t insn)
          return;
      }
 -    /* For these floating point ops, the U, a and opcode bits
 -     * together indicate the operation.
 -     */
 -    opcode = extract32(insn, 11, 3);
 -    u = extract32(insn, 29, 1);
 -    a = extract32(insn, 23, 1);
 -    is_q = extract32(insn, 30, 1);
 -    rm = extract32(insn, 16, 5);
 -    rn = extract32(insn, 5, 5);
 -    rd = extract32(insn, 0, 5);
 -
 -    fpopcode = opcode | (a << 3) |  (u << 4);
 -    datasize = is_q ? 128 : 64;
 -    elements = datasize / 16;
 -
 -    switch (fpopcode) {
 -    case 0x10: /* FMAXNMP */
 -    case 0x12: /* FADDP */
 -    case 0x16: /* FMAXP */
 -    case 0x18: /* FMINNMP */
 -    case 0x1e: /* FMINP */
 -        pairwise = true;
 -        break;
 -    }
 -
      fpst = fpstatus_ptr(FPST_FPCR_F16);
      if (pairwise) {
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_fp16(DisasContext *s, uint32_t insn)
                  gen_helper_advsimd_acgt_f16(tcg_res, tcg_op1, tcg_op2, fpst);
                  break;
              default:
 -                fprintf(stderr, "%s: insn 0x%04x, fpop 0x%2x @ 0x%" PRIx64 "\n",
 -                        __func__, insn, fpopcode, s->pc_curr);
                  g_assert_not_reached();
              }
 --
-.20.1
+.25.1

-[PULL 14/28] hw/arm: gsj add pca9548
+[PULL 18/30] target/arm: Implement FEAT_HAFDBS, access flag portion
-From: Patrick Venture <venture@google.com>
+From: Richard Henderson <richard.henderson@linaro.org>
-Tested: Quanta-gsj firmware booted.
+Perform the atomic update for hardware management of the access flag.
-i2c /dev entries driver
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-I2C init bus 1 freq 100000
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-I2C init bus 2 freq 100000
+Message-id: 20221024051851.3074715-13-richard.henderson@linaro.org
 I2C init bus 3 freq 100000
 I2C init bus 4 freq 100000
 I2C init bus 8 freq 100000
 I2C init bus 9 freq 100000
 at24 9-0055: 8192 byte 24c64 EEPROM, writable, 1 bytes/write
 I2C init bus 10 freq 100000
 at24 10-0055: 8192 byte 24c64 EEPROM, writable, 1 bytes/write
 I2C init bus 12 freq 100000
 I2C init bus 15 freq 100000
 i2c i2c-15: Added multiplexed i2c bus 16
 i2c i2c-15: Added multiplexed i2c bus 17
 i2c i2c-15: Added multiplexed i2c bus 18
 i2c i2c-15: Added multiplexed i2c bus 19
 i2c i2c-15: Added multiplexed i2c bus 20
 i2c i2c-15: Added multiplexed i2c bus 21
 i2c i2c-15: Added multiplexed i2c bus 22
 i2c i2c-15: Added multiplexed i2c bus 23
 pca954x 15-0075: registered 8 multiplexed busses for I2C switch pca9548
 Signed-off-by: Patrick Venture <venture@google.com>
 Reviewed-by: Hao Wu <wuhaotsh@google.com>
 Reviewed-by: Joel Stanley <joel@jms.id.au>
 Message-id: 20210608202522.2677850-3-venture@google.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/arm/npcm7xx_boards.c | 6 ++----
+ docs/system/arm/emulation.rst |   1 +
- hw/arm/Kconfig          | 1 +
+ target/arm/cpu64.c            |   1 +
-files changed, 3 insertions(+), 4 deletions(-)
+ target/arm/ptw.c              | 176 +++++++++++++++++++++++++++++-----
 files changed, 156 insertions(+), 22 deletions(-)
-diff --git a/hw/arm/npcm7xx_boards.c b/hw/arm/npcm7xx_boards.c
+diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/npcm7xx_boards.c
+--- a/docs/system/arm/emulation.rst
-+++ b/hw/arm/npcm7xx_boards.c
++++ b/docs/system/arm/emulation.rst
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ the following architecture extensions:
+ - FEAT_FlagM (Flag manipulation instructions v2)
- #include "hw/arm/npcm7xx.h"
+ - FEAT_FlagM2 (Enhancements to flag manipulation instructions)
- #include "hw/core/cpu.h"
+ - FEAT_GTG (Guest translation granule size)
-+#include "hw/i2c/i2c_mux_pca954x.h"
++- FEAT_HAFDBS (Hardware management of the access flag and dirty bit state)
- #include "hw/i2c/smbus_eeprom.h"
+ - FEAT_HCX (Support for the HCRX_EL2 register)
- #include "hw/loader.h"
+ - FEAT_HPDS (Hierarchical permission disables)
- #include "hw/qdev-core.h"
+ - FEAT_I8MM (AArch64 Int8 matrix multiplication instructions)
-@@ -XXX,XX +XXX,XX @@ static void quanta_gsj_i2c_init(NPCM7xxState *soc)
+diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
-      * - ucd90160@6b
+index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu64.c
 +++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
      cpu->isar.id_aa64mmfr0 = t;
      t = cpu->isar.id_aa64mmfr1;
 +    t = FIELD_DP64(t, ID_AA64MMFR1, HAFDBS, 1);   /* FEAT_HAFDBS, AF only */
      t = FIELD_DP64(t, ID_AA64MMFR1, VMIDBITS, 2); /* FEAT_VMID16 */
      t = FIELD_DP64(t, ID_AA64MMFR1, VH, 1);       /* FEAT_VHE */
      t = FIELD_DP64(t, ID_AA64MMFR1, HPDS, 1);     /* FEAT_HPDS */
 diff --git a/target/arm/ptw.c b/target/arm/ptw.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/ptw.c
 +++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ typedef struct S1Translate {
      bool in_secure;
      bool in_debug;
      bool out_secure;
 +    bool out_rw;
      bool out_be;
 +    hwaddr out_virt;
      hwaddr out_phys;
      void *out_host;
  } S1Translate;
@@ -XXX,XX +XXX,XX @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
      uint8_t pte_attrs;
      bool pte_secure;
 +    ptw->out_virt = addr;
 +
      if (unlikely(ptw->in_debug)) {
          /*
           * From gdbstub, do not use softmmu so that we don't modify the
@@ -XXX,XX +XXX,XX @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
              pte_secure = is_secure;
          }
          ptw->out_host = NULL;
 +        ptw->out_rw = false;
      } else {
          CPUTLBEntryFull *full;
          int flags;
@@ -XXX,XX +XXX,XX @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
              goto fail;
          }
          ptw->out_phys = full->phys_addr;
 +        ptw->out_rw = full->prot & PROT_WRITE;
          pte_attrs = full->pte_attrs;
          pte_secure = full->attrs.secure;
      }
@@ -XXX,XX +XXX,XX @@ static uint32_t arm_ldl_ptw(CPUARMState *env, S1Translate *ptw,
                              ARMMMUFaultInfo *fi)
  {
      CPUState *cs = env_cpu(env);
 +    void *host = ptw->out_host;
      uint32_t data;
 -    if (likely(ptw->out_host)) {
 +    if (likely(host)) {
          /* Page tables are in RAM, and we have the host address. */
 +        data = qatomic_read((uint32_t *)host);
          if (ptw->out_be) {
 -            data = ldl_be_p(ptw->out_host);
 +            data = be32_to_cpu(data);
          } else {
 -            data = ldl_le_p(ptw->out_host);
 +            data = le32_to_cpu(data);
          }
      } else {
          /* Page tables are in MMIO. */
@@ -XXX,XX +XXX,XX @@ static uint64_t arm_ldq_ptw(CPUARMState *env, S1Translate *ptw,
                              ARMMMUFaultInfo *fi)
  {
      CPUState *cs = env_cpu(env);
 +    void *host = ptw->out_host;
      uint64_t data;
 -    if (likely(ptw->out_host)) {
 +    if (likely(host)) {
          /* Page tables are in RAM, and we have the host address. */
 +#ifdef CONFIG_ATOMIC64
 +        data = qatomic_read__nocheck((uint64_t *)host);
          if (ptw->out_be) {
 -            data = ldq_be_p(ptw->out_host);
 +            data = be64_to_cpu(data);
          } else {
 -            data = ldq_le_p(ptw->out_host);
 +            data = le64_to_cpu(data);
          }
 +#else
 +        if (ptw->out_be) {
 +            data = ldq_be_p(host);
 +        } else {
 +            data = ldq_le_p(host);
 +        }
 +#endif
      } else {
          /* Page tables are in MMIO. */
          MemTxAttrs attrs = { .secure = ptw->out_secure };
@@ -XXX,XX +XXX,XX @@ static uint64_t arm_ldq_ptw(CPUARMState *env, S1Translate *ptw,
      return data;
  }
 +static uint64_t arm_casq_ptw(CPUARMState *env, uint64_t old_val,
 +                             uint64_t new_val, S1Translate *ptw,
 +                             ARMMMUFaultInfo *fi)
 +{
 +    uint64_t cur_val;
 +    void *host = ptw->out_host;
 +
 +    if (unlikely(!host)) {
 +        fi->type = ARMFault_UnsuppAtomicUpdate;
 +        fi->s1ptw = true;
 +        return 0;
 +    }
 +
 +    /*
 +     * Raising a stage2 Protection fault for an atomic update to a read-only
 +     * page is delayed until it is certain that there is a change to make.
 +     */
 +    if (unlikely(!ptw->out_rw)) {
 +        int flags;
 +        void *discard;
 +
 +        env->tlb_fi = fi;
 +        flags = probe_access_flags(env, ptw->out_virt, MMU_DATA_STORE,
 +                                   arm_to_core_mmu_idx(ptw->in_ptw_idx),
 +                                   true, &discard, 0);
 +        env->tlb_fi = NULL;
 +
 +        if (unlikely(flags & TLB_INVALID_MASK)) {
 +            assert(fi->type != ARMFault_None);
 +            fi->s2addr = ptw->out_virt;
 +            fi->stage2 = true;
 +            fi->s1ptw = true;
 +            fi->s1ns = !ptw->in_secure;
 +            return 0;
 +        }
 +
 +        /* In case CAS mismatches and we loop, remember writability. */
 +        ptw->out_rw = true;
 +    }
 +
 +#ifdef CONFIG_ATOMIC64
 +    if (ptw->out_be) {
 +        old_val = cpu_to_be64(old_val);
 +        new_val = cpu_to_be64(new_val);
 +        cur_val = qatomic_cmpxchg__nocheck((uint64_t *)host, old_val, new_val);
 +        cur_val = be64_to_cpu(cur_val);
 +    } else {
 +        old_val = cpu_to_le64(old_val);
 +        new_val = cpu_to_le64(new_val);
 +        cur_val = qatomic_cmpxchg__nocheck((uint64_t *)host, old_val, new_val);
 +        cur_val = le64_to_cpu(cur_val);
 +    }
 +#else
 +    /*
 +     * We can't support the full 64-bit atomic cmpxchg on the host.
 +     * Because this is only used for FEAT_HAFDBS, which is only for AA64,
 +     * we know that TCG_OVERSIZED_GUEST is set, which means that we are
 +     * running in round-robin mode and could only race with dma i/o.
 +     */
 +#ifndef TCG_OVERSIZED_GUEST
 +# error "Unexpected configuration"
 +#endif
 +    bool locked = qemu_mutex_iothread_locked();
 +    if (!locked) {
 +        qemu_mutex_lock_iothread();
 +    }
 +    if (ptw->out_be) {
 +        cur_val = ldq_be_p(host);
 +        if (cur_val == old_val) {
 +            stq_be_p(host, new_val);
 +        }
 +    } else {
 +        cur_val = ldq_le_p(host);
 +        if (cur_val == old_val) {
 +            stq_le_p(host, new_val);
 +        }
 +    }
 +    if (!locked) {
 +        qemu_mutex_unlock_iothread();
 +    }
 +#endif
 +
 +    return cur_val;
 +}
 +
  static bool get_level1_table_address(CPUARMState *env, ARMMMUIdx mmu_idx,
                                       uint32_t *table, uint32_t address)
  {
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
      uint32_t el = regime_el(env, mmu_idx);
      uint64_t descaddrmask;
      bool aarch64 = arm_el_is_aa64(env, el);
 -    uint64_t descriptor;
 +    uint64_t descriptor, new_descriptor;
      bool nstable;
      /* TODO: This code does not support shareability levels. */
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
      if (fi->type != ARMFault_None) {
          goto do_fault;
      }
 +    new_descriptor = descriptor;
 + restart_atomic_update:
      if (!(descriptor & 1) || (!(descriptor & 2) && (level == 3))) {
          /* Invalid, or the Reserved level 3 encoding */
          goto do_translation_fault;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
       * to give a correct page or table address, the address field
       * in a block descriptor is smaller; so we need to explicitly
       * clear the lower bits here before ORing in the low vaddr bits.
 +     *
 +     * Afterward, descaddr is the final physical address.
       */
+     page_size = (1ULL << ((stride * (4 - level)) + 3));
+     descaddr &= ~(hwaddr)(page_size - 1);
+     descaddr |= (address & (page_size - 1));
++    if (likely(!ptw->in_debug)) {
++        /*
++         * Access flag.
++         * If HA is enabled, prepare to update the descriptor below.
++         * Otherwise, pass the access fault on to software.
++         */
++        if (!(descriptor & (1 << 10))) {
++            if (param.ha) {
++                new_descriptor |= 1 << 10; /* AF */
++            } else {
++                fi->type = ARMFault_AccessFlag;
++                goto do_fault;
++            }
++        }
++    }
++
+     /*
+-     * Extract attributes from the descriptor, and apply table descriptors.
+-     * Stage 2 table descriptors do not include any attribute fields.
+-     * HPD disables all the table attributes except NSTable.
++     * Extract attributes from the (modified) descriptor, and apply
++     * table descriptors. Stage 2 table descriptors do not include
++     * any attribute fields. HPD disables all the table attributes
++     * except NSTable.
+      */
+-    attrs = descriptor & (MAKE_64BIT_MASK(2, 10) | MAKE_64BIT_MASK(50, 14));
++    attrs = new_descriptor & (MAKE_64BIT_MASK(2, 10) | MAKE_64BIT_MASK(50, 14));
+     if (!regime_is_stage2(mmu_idx)) {
+         attrs |= nstable << 5; /* NS */
+         if (!param.hpd) {
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
+         }
+     }
 -    /*
--     * i2c-15:
+-     * Here descaddr is the final physical address, and attributes
--     * - pca9548@75
+-     * are all in attrs.
 -     */
-+    i2c_slave_create_simple(npcm7xx_i2c_get_bus(soc, 15), "pca9548", 0x75);
+-    if ((attrs & (1 << 10)) == 0) {
- }
+-        /* Access flag */
+-        fi->type = ARMFault_AccessFlag;
- static void quanta_gsj_fan_init(NPCM7xxMachine *machine, NPCM7xxState *soc)
+-        goto do_fault;
-diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
+-    }
-index XXXXXXX..XXXXXXX 100644
+-
---- a/hw/arm/Kconfig
+     ap = extract32(attrs, 6, 2);
-+++ b/hw/arm/Kconfig
+-
-@@ -XXX,XX +XXX,XX @@ config NPCM7XX
+     if (regime_is_stage2(mmu_idx)) {
-     select SERIAL
+         ns = mmu_idx == ARMMMUIdx_Stage2;
-     select SSI
+         xn = extract64(attrs, 53, 2);
-     select UNIMP
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
-+    select PCA954X
+         goto do_fault;
+     }
- config FSL_IMX25
-     bool
++    /* If FEAT_HAFDBS has made changes, update the PTE. */
 +    if (new_descriptor != descriptor) {
 +        new_descriptor = arm_casq_ptw(env, descriptor, new_descriptor, ptw, fi);
 +        if (fi->type != ARMFault_None) {
 +            goto do_fault;
 +        }
 +        /*
 +         * I_YZSVV says that if the in-memory descriptor has changed,
 +         * then we must use the information in that new value
 +         * (which might include a different output address, different
 +         * attributes, or generate a fault).
 +         * Restart the handling of the descriptor value from scratch.
 +         */
 +        if (new_descriptor != descriptor) {
 +            descriptor = new_descriptor;
 +            goto restart_atomic_update;
 +        }
 +    }
 +
      if (ns) {
          /*
           * The NS bit will (as required by the architecture) have no effect if
 --
-.20.1
+.25.1

-[PULL 18/28] target/arm: Handle VPR semantics in existing code
+[PULL 19/30] target/arm: Implement FEAT_HAFDBS, dirty bit portion
-When MVE is supported, the VPR register has a place on the exception
+From: Richard Henderson <richard.henderson@linaro.org>
 stack frame in a previously reserved slot just above the FPSCR.
 It must also be zeroed in various situations when we invalidate
 FPU context.
-Update the code which handles the stack frames (exception entry and
+Perform the atomic update for hardware management of the dirty bit.
 exit code, VLLDM, and VLSTM) to save/restore VPR.
-Update code which invalidates FP registers (mostly also exception
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-entry and exit code, but also VSCCLRM and the code in
+Message-id: 20221024051851.3074715-14-richard.henderson@linaro.org
-full_vfp_access_check() that corresponds to the ExecuteFPCheck()
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-pseudocode) to zero VPR.
+---
  target/arm/cpu64.c |  2 +-
  target/arm/ptw.c   | 16 ++++++++++++++++
 files changed, 17 insertions(+), 1 deletion(-)
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20210614151007.4545-4-peter.maydell@linaro.org
 ---
  target/arm/m_helper.c         | 54 +++++++++++++++++++++++++++++------
  target/arm/translate-m-nocp.c |  5 +++-
  target/arm/translate-vfp.c    |  9 ++++--
 files changed, 57 insertions(+), 11 deletions(-)
 diff --git a/target/arm/m_helper.c b/target/arm/m_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/m_helper.c
+--- a/target/arm/cpu64.c
-+++ b/target/arm/m_helper.c
++++ b/target/arm/cpu64.c
-@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_preserve_fp_state)(CPUARMState *env)
+@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
-             uint32_t shi = extract64(dn, 32, 32);
+     cpu->isar.id_aa64mmfr0 = t;
-             if (i >= 16) {
+     t = cpu->isar.id_aa64mmfr1;
--                faddr += 8; /* skip the slot for the FPSCR */
+-    t = FIELD_DP64(t, ID_AA64MMFR1, HAFDBS, 1);   /* FEAT_HAFDBS, AF only */
-+                faddr += 8; /* skip the slot for the FPSCR/VPR */
++    t = FIELD_DP64(t, ID_AA64MMFR1, HAFDBS, 2);   /* FEAT_HAFDBS */
      t = FIELD_DP64(t, ID_AA64MMFR1, VMIDBITS, 2); /* FEAT_VMID16 */
      t = FIELD_DP64(t, ID_AA64MMFR1, VH, 1);       /* FEAT_VHE */
      t = FIELD_DP64(t, ID_AA64MMFR1, HPDS, 1);     /* FEAT_HPDS */
 diff --git a/target/arm/ptw.c b/target/arm/ptw.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/ptw.c
 +++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
                  goto do_fault;
              }
-             stacked_ok = stacked_ok &&
+         }
-                 v7m_stack_write(cpu, faddr, slo, mmu_idx, STACK_LAZYFP) &&
++
-@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_preserve_fp_state)(CPUARMState *env)
++        /*
-         stacked_ok = stacked_ok &&
++         * Dirty Bit.
-             v7m_stack_write(cpu, fpcar + 0x40,
++         * If HD is enabled, pre-emptively set/clear the appropriate AP/S2AP
-                             vfp_get_fpscr(env), mmu_idx, STACK_LAZYFP);
++         * bit for writeback. The actual write protection test may still be
-+        if (cpu_isar_feature(aa32_mve, cpu)) {
++         * overridden by tableattrs, to be merged below.
-+            stacked_ok = stacked_ok &&
++         */
-+                v7m_stack_write(cpu, fpcar + 0x44,
++        if (param.hd
-+                                env->v7m.vpr, mmu_idx, STACK_LAZYFP);
++            && extract64(descriptor, 51, 1)  /* DBM */
 +            && access_type == MMU_DATA_STORE) {
 +            if (regime_is_stage2(mmu_idx)) {
 +                new_descriptor |= 1ull << 7;    /* set S2AP[1] */
 +            } else {
 +                new_descriptor &= ~(1ull << 7); /* clear AP[2] */
 +            }
 +        }
      }
      /*
-@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_preserve_fp_state)(CPUARMState *env)
-     env->v7m.fpccr[is_secure] &= ~R_V7M_FPCCR_LSPACT_MASK;
-     if (ts) {
--        /* Clear s0 to s31 and the FPSCR */
-+        /* Clear s0 to s31 and the FPSCR and VPR */
-         int i;
-         for (i = 0; i < 32; i += 2) {
-             *aa32_vfp_dreg(env, i / 2) = 0;
-         }
-         vfp_set_fpscr(env, 0);
-+        if (cpu_isar_feature(aa32_mve, cpu)) {
-+            env->v7m.vpr = 0;
-+        }
-     }
-     /*
--     * Otherwise s0 to s15 and FPSCR are UNKNOWN; we choose to leave them
-+     * Otherwise s0 to s15, FPSCR and VPR are UNKNOWN; we choose to leave them
-      * unchanged.
-      */
- }
-@@ -XXX,XX +XXX,XX @@ static void v7m_update_fpccr(CPUARMState *env, uint32_t frameptr,
- void HELPER(v7m_vlstm)(CPUARMState *env, uint32_t fptr)
- {
-     /* fptr is the value of Rn, the frame pointer we store the FP regs to */
-+    ARMCPU *cpu = env_archcpu(env);
-     bool s = env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_S_MASK;
-     bool lspact = env->v7m.fpccr[s] & R_V7M_FPCCR_LSPACT_MASK;
-     uintptr_t ra = GETPC();
-@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_vlstm)(CPUARMState *env, uint32_t fptr)
-             cpu_stl_data_ra(env, faddr + 4, shi, ra);
-         }
-         cpu_stl_data_ra(env, fptr + 0x40, vfp_get_fpscr(env), ra);
-+        if (cpu_isar_feature(aa32_mve, cpu)) {
-+            cpu_stl_data_ra(env, fptr + 0x44, env->v7m.vpr, ra);
-+        }
-         /*
--         * If TS is 0 then s0 to s15 and FPSCR are UNKNOWN; we choose to
-+         * If TS is 0 then s0 to s15, FPSCR and VPR are UNKNOWN; we choose to
-          * leave them unchanged, matching our choice in v7m_preserve_fp_state.
-          */
-         if (ts) {
-@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_vlstm)(CPUARMState *env, uint32_t fptr)
-                 *aa32_vfp_dreg(env, i / 2) = 0;
-             }
-             vfp_set_fpscr(env, 0);
-+            if (cpu_isar_feature(aa32_mve, cpu)) {
-+                env->v7m.vpr = 0;
-+            }
-         }
-     } else {
-         v7m_update_fpccr(env, fptr, false);
-@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_vlstm)(CPUARMState *env, uint32_t fptr)
- void HELPER(v7m_vlldm)(CPUARMState *env, uint32_t fptr)
- {
-+    ARMCPU *cpu = env_archcpu(env);
-     uintptr_t ra = GETPC();
-     /* fptr is the value of Rn, the frame pointer we load the FP regs from */
-@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_vlldm)(CPUARMState *env, uint32_t fptr)
-             uint32_t faddr = fptr + 4 * i;
-             if (i >= 16) {
--                faddr += 8; /* skip the slot for the FPSCR */
-+                faddr += 8; /* skip the slot for the FPSCR and VPR */
-             }
-             slo = cpu_ldl_data_ra(env, faddr, ra);
-@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_vlldm)(CPUARMState *env, uint32_t fptr)
-         }
-         fpscr = cpu_ldl_data_ra(env, fptr + 0x40, ra);
-         vfp_set_fpscr(env, fpscr);
-+        if (cpu_isar_feature(aa32_mve, cpu)) {
-+            env->v7m.vpr = cpu_ldl_data_ra(env, fptr + 0x44, ra);
-+        }
-     }
-     env->v7m.control[M_REG_S] |= R_V7M_CONTROL_FPCA_MASK;
-@@ -XXX,XX +XXX,XX @@ static bool v7m_push_stack(ARMCPU *cpu)
-                     uint32_t shi = extract64(dn, 32, 32);
-                     if (i >= 16) {
--                        faddr += 8; /* skip the slot for the FPSCR */
-+                        faddr += 8; /* skip the slot for the FPSCR and VPR */
-                     }
-                     stacked_ok = stacked_ok &&
-                         v7m_stack_write(cpu, faddr, slo,
-@@ -XXX,XX +XXX,XX @@ static bool v7m_push_stack(ARMCPU *cpu)
-                 stacked_ok = stacked_ok &&
-                     v7m_stack_write(cpu, frameptr + 0x60,
-                                     vfp_get_fpscr(env), mmu_idx, STACK_NORMAL);
-+                if (cpu_isar_feature(aa32_mve, cpu)) {
-+                    stacked_ok = stacked_ok &&
-+                        v7m_stack_write(cpu, frameptr + 0x64,
-+                                        env->v7m.vpr, mmu_idx, STACK_NORMAL);
-+                }
-                 if (cpacr_pass) {
-                     for (i = 0; i < ((framesize == 0xa8) ? 32 : 16); i += 2) {
-                         *aa32_vfp_dreg(env, i / 2) = 0;
-                     }
-                     vfp_set_fpscr(env, 0);
-+                    if (cpu_isar_feature(aa32_mve, cpu)) {
-+                        env->v7m.vpr = 0;
-+                    }
-                 }
-             } else {
-                 /* Lazy stacking enabled, save necessary info to stack later */
-@@ -XXX,XX +XXX,XX @@ static void do_v7m_exception_exit(ARMCPU *cpu)
-                     v7m_exception_taken(cpu, excret, true, false);
-                 }
-             }
--            /* Clear s0..s15 and FPSCR; TODO also VPR when MVE is implemented */
-+            /* Clear s0..s15, FPSCR and VPR */
-             int i;
-             for (i = 0; i < 16; i += 2) {
-                 *aa32_vfp_dreg(env, i / 2) = 0;
-             }
-             vfp_set_fpscr(env, 0);
-+            if (cpu_isar_feature(aa32_mve, cpu)) {
-+                env->v7m.vpr = 0;
-+            }
-         }
-     }
-@@ -XXX,XX +XXX,XX @@ static void do_v7m_exception_exit(ARMCPU *cpu)
-                     uint32_t faddr = frameptr + 0x20 + 4 * i;
-                     if (i >= 16) {
--                        faddr += 8; /* Skip the slot for the FPSCR */
-+                        faddr += 8; /* Skip the slot for the FPSCR and VPR */
-                     }
-                     pop_ok = pop_ok &&
-@@ -XXX,XX +XXX,XX @@ static void do_v7m_exception_exit(ARMCPU *cpu)
-                 if (pop_ok) {
-                     vfp_set_fpscr(env, fpscr);
-                 }
-+                if (cpu_isar_feature(aa32_mve, cpu)) {
-+                    pop_ok = pop_ok &&
-+                        v7m_stack_read(cpu, &env->v7m.vpr,
-+                                       frameptr + 0x64, mmu_idx);
-+                }
-                 if (!pop_ok) {
-                     /*
-                      * These regs are 0 if security extension present;
-@@ -XXX,XX +XXX,XX @@ static void do_v7m_exception_exit(ARMCPU *cpu)
-                         *aa32_vfp_dreg(env, i / 2) = 0;
-                     }
-                     vfp_set_fpscr(env, 0);
-+                    if (cpu_isar_feature(aa32_mve, cpu)) {
-+                        env->v7m.vpr = 0;
-+                    }
-                 }
-             }
-         }
-diff --git a/target/arm/translate-m-nocp.c b/target/arm/translate-m-nocp.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-m-nocp.c
-+++ b/target/arm/translate-m-nocp.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VSCCLRM(DisasContext *s, arg_VSCCLRM *a)
-         btmreg++;
-     }
-     assert(btmreg == topreg + 1);
--    /* TODO: when MVE is implemented, zero VPR here */
-+    if (dc_isar_feature(aa32_mve, s)) {
-+        TCGv_i32 z32 = tcg_const_i32(0);
-+        store_cpu_field(z32, v7m.vpr);
-+    }
-     return true;
- }
-diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.c
-+++ b/target/arm/translate-vfp.c
-@@ -XXX,XX +XXX,XX @@ static bool full_vfp_access_check(DisasContext *s, bool ignore_vfp_enabled)
-         if (s->v7m_new_fp_ctxt_needed) {
-             /*
--             * Create new FP context by updating CONTROL.FPCA, CONTROL.SFPA
--             * and the FPSCR.
-+             * Create new FP context by updating CONTROL.FPCA, CONTROL.SFPA,
-+             * the FPSCR, and VPR.
-              */
-             TCGv_i32 control, fpscr;
-             uint32_t bits = R_V7M_CONTROL_FPCA_MASK;
-@@ -XXX,XX +XXX,XX @@ static bool full_vfp_access_check(DisasContext *s, bool ignore_vfp_enabled)
-             fpscr = load_cpu_field(v7m.fpdscr[s->v8m_secure]);
-             gen_helper_vfp_set_fpscr(cpu_env, fpscr);
-             tcg_temp_free_i32(fpscr);
-+            if (dc_isar_feature(aa32_mve, s)) {
-+                TCGv_i32 z32 = tcg_const_i32(0);
-+                store_cpu_field(z32, v7m.vpr);
-+            }
-+
-             /*
-              * We don't need to arrange to end the TB, because the only
-              * parts of FPSCR which we cache in the TB flags are the VECLEN
 --
-.20.1
+.25.1
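
For readers following the FEAT_HAFDBS hunk above, the dirty-bit update can be sketched in isolation. This is only an illustration of the logic shown in the diff, not QEMU code: the helper name update_dirty_bit and its boolean parameters are hypothetical stand-ins for param.hd, access_type == MMU_DATA_STORE and regime_is_stage2(mmu_idx).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the hunk above; not a QEMU API.
 * Returns the descriptor value that would be written back.
 */
static uint64_t update_dirty_bit(uint64_t descriptor, bool hd,
                                 bool is_store, bool is_stage2)
{
    uint64_t new_descriptor = descriptor;
    bool dbm = (descriptor >> 51) & 1;      /* DBM is descriptor bit 51 */

    /* Only a store through a DBM-marked entry with HD enabled dirties it. */
    if (hd && dbm && is_store) {
        if (is_stage2) {
            new_descriptor |= 1ull << 7;    /* set S2AP[1] */
        } else {
            new_descriptor &= ~(1ull << 7); /* clear AP[2] */
        }
    }
    return new_descriptor;
}
```

The stage 1 and stage 2 cases move bit 7 in opposite directions because AP[2] set means read-only while S2AP[1] set means write permitted.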

-[PULL 02/28] target/arm: Diagnose UNALLOCATED in disas_simd_two_reg_misc_fp16
+[PULL 20/30] target/arm: Use the max page size in a 2-stage ptw
 From: Richard Henderson <richard.henderson@linaro.org>
-This fprintf+assert has been in place since the beginning.
+We had only been reporting the stage2 page size.  This causes
-It is prior to the fp_access_check, so we're still good to
+problems if stage1 is using a larger page size (16k, 2M, etc),
-raise sigill here.
+but stage2 is using a smaller page size, because cputlb does
 not set large_page_{addr,mask} properly.
-Resolves: https://gitlab.com/qemu-project/qemu/-/issues/381
+Fix by using the max of the two page sizes.
 Reported-by: Marc Zyngier <maz@kernel.org>
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Message-id: 20221024051851.3074715-15-richard.henderson@linaro.org
 Message-id: 20210604183506.916654-2-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate-a64.c | 4 ++--
+ target/arm/ptw.c | 11 ++++++++++-
-file changed, 2 insertions(+), 2 deletions(-)
+file changed, 10 insertions(+), 1 deletion(-)
-diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
+diff --git a/target/arm/ptw.c b/target/arm/ptw.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-a64.c
+--- a/target/arm/ptw.c
-+++ b/target/arm/translate-a64.c
++++ b/target/arm/ptw.c
-@@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc_fp16(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_twostage(CPUARMState *env, S1Translate *ptw,
-     case 0x7f: /* FSQRT (vector) */
+                                    ARMMMUFaultInfo *fi)
-         break;
+ {
-     default:
+     hwaddr ipa;
--        fprintf(stderr, "%s: insn 0x%04x fpop 0x%2x\n", __func__, insn, fpop);
+-    int s1_prot;
--        g_assert_not_reached();
++    int s1_prot, s1_lgpgsz;
-+        unallocated_encoding(s);
+     bool is_secure = ptw->in_secure;
-+        return;
+     bool ret, ipa_secure, s2walk_secure;
      ARMCacheAttrs cacheattrs1;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_twostage(CPUARMState *env, S1Translate *ptw,
       * Save the stage1 results so that we may merge prot and cacheattrs later.
       */
      s1_prot = result->f.prot;
 +    s1_lgpgsz = result->f.lg_page_size;
      cacheattrs1 = result->cacheattrs;
      memset(result, 0, sizeof(*result));
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_twostage(CPUARMState *env, S1Translate *ptw,
          return ret;
      }
++    /*
 +     * Use the maximum of the S1 & S2 page size, so that invalidation
 +     * of pages > TARGET_PAGE_SIZE works correctly.
 +     */
 +    if (result->f.lg_page_size < s1_lgpgsz) {
 +        result->f.lg_page_size = s1_lgpgsz;
 +    }
 +
      /* Combine the S1 and S2 cache attributes. */
      hcr = arm_hcr_el2_eff_secstate(env, is_secure);
      if (hcr & HCR_DC) {
 --
-.20.1
+.25.1
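
The fix in this patch amounts to taking the larger of two log2 page sizes. A minimal sketch, assuming a hypothetical helper merge_lg_page_size that is not part of QEMU; the arguments correspond to s1_lgpgsz and result->f.lg_page_size in the hunk above.

```c
/*
 * Hypothetical helper: both arguments are log2 of a page size
 * (12 for 4K, 14 for 16K, 21 for 2M). The combined two-stage
 * translation reports the larger of the two.
 */
static int merge_lg_page_size(int s1_lgpgsz, int s2_lgpgsz)
{
    return s2_lgpgsz < s1_lgpgsz ? s1_lgpgsz : s2_lgpgsz;
}
```

With a 2M stage 1 block mapped through 4K stage 2 pages, merge_lg_page_size(21, 12) yields 21, which is the case the commit message describes as previously being reported as 12.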

-[PULL 15/28] hw/arm: quanta-q71l add pca954x muxes
+[PULL 21/30] reset: allow registering handlers that aren't called by snapshot loading
-From: Patrick Venture <venture@google.com>
+From: "Jason A. Donenfeld" <Jason@zx2c4.com>
-Adds the pca954x muxes expected.
+Snapshot loading only expects to call deterministic handlers, not
 non-deterministic ones. So introduce a way of registering handlers that
 won't be called when resetting for snapshots.
-Tested: Booted quanta-q71l image to userspace.
+Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
-Signed-off-by: Patrick Venture <venture@google.com>
+Message-id: 20221025004327.568476-2-Jason@zx2c4.com
-Reviewed-by: Hao Wu <wuhaotsh@google.com>
+[PMM: updated json doc comment with Markus' text; fixed
-Reviewed-by: Joel Stanley <joel@jms.id.au>
+ checkpatch style nit]
-Reviewed-by: Cédric Le Goater <clg@kaod.org>
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Message-id: 20210608202522.2677850-4-venture@google.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/arm/aspeed.c | 11 ++++++++---
+ qapi/run-state.json        |  6 +++++-
- hw/arm/Kconfig  |  1 +
+ include/hw/boards.h        |  2 +-
-files changed, 9 insertions(+), 3 deletions(-)
+ include/sysemu/reset.h     |  5 ++++-
  hw/arm/aspeed.c            |  4 ++--
  hw/arm/mps2-tz.c           |  4 ++--
  hw/core/reset.c            | 17 ++++++++++++++++-
  hw/hppa/machine.c          |  4 ++--
  hw/i386/microvm.c          |  4 ++--
  hw/i386/pc.c               |  6 +++---
  hw/ppc/pegasos2.c          |  4 ++--
  hw/ppc/pnv.c               |  4 ++--
  hw/ppc/spapr.c             |  4 ++--
  hw/s390x/s390-virtio-ccw.c |  4 ++--
  migration/savevm.c         |  2 +-
  softmmu/runstate.c         | 11 ++++++++---
 files changed, 54 insertions(+), 27 deletions(-)
+diff --git a/qapi/run-state.json b/qapi/run-state.json
+index XXXXXXX..XXXXXXX 100644
+--- a/qapi/run-state.json
++++ b/qapi/run-state.json
+@@ -XXX,XX +XXX,XX @@
+ #                   ignores --no-reboot. This is useful for sanitizing
+ #                   hypercalls on s390 that are used during kexec/kdump/boot
+ #
++# @snapshot-load: A snapshot is being loaded by the record & replay
++#                 subsystem. This value is used only within QEMU.  It
++#                 doesn't occur in QMP. (since 7.2)
++#
+ ##
+ { 'enum': 'ShutdownCause',
+   # Beware, shutdown_caused_by_guest() depends on enumeration order
+   'data': [ 'none', 'host-error', 'host-qmp-quit', 'host-qmp-system-reset',
+             'host-signal', 'host-ui', 'guest-shutdown', 'guest-reset',
+-            'guest-panic', 'subsystem-reset'] }
++            'guest-panic', 'subsystem-reset', 'snapshot-load'] }
+ ##
+ # @StatusInfo:
+diff --git a/include/hw/boards.h b/include/hw/boards.h
+index XXXXXXX..XXXXXXX 100644
+--- a/include/hw/boards.h
++++ b/include/hw/boards.h
+@@ -XXX,XX +XXX,XX @@ struct MachineClass {
+     const char *deprecation_reason;
+     void (*init)(MachineState *state);
+-    void (*reset)(MachineState *state);
++    void (*reset)(MachineState *state, ShutdownCause reason);
+     void (*wakeup)(MachineState *state);
+     int (*kvm_type)(MachineState *machine, const char *arg);
+diff --git a/include/sysemu/reset.h b/include/sysemu/reset.h
+index XXXXXXX..XXXXXXX 100644
+--- a/include/sysemu/reset.h
++++ b/include/sysemu/reset.h
+@@ -XXX,XX +XXX,XX @@
+ #ifndef QEMU_SYSEMU_RESET_H
+ #define QEMU_SYSEMU_RESET_H
++#include "qapi/qapi-events-run-state.h"
++
+ typedef void QEMUResetHandler(void *opaque);
+ void qemu_register_reset(QEMUResetHandler *func, void *opaque);
++void qemu_register_reset_nosnapshotload(QEMUResetHandler *func, void *opaque);
+ void qemu_unregister_reset(QEMUResetHandler *func, void *opaque);
+-void qemu_devices_reset(void);
++void qemu_devices_reset(ShutdownCause reason);
+ #endif
 diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/aspeed.c
 +++ b/hw/arm/aspeed.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static void aspeed_machine_bletchley_class_init(ObjectClass *oc, void *data)
- #include "hw/arm/boot.h"
+         aspeed_soc_num_cpus(amc->soc_name);
- #include "hw/arm/aspeed.h"
+ }
- #include "hw/arm/aspeed_soc.h"
-+#include "hw/i2c/i2c_mux_pca954x.h"
+-static void fby35_reset(MachineState *state)
- #include "hw/i2c/smbus_eeprom.h"
++static void fby35_reset(MachineState *state, ShutdownCause reason)
- #include "hw/misc/pca9552.h"
+ {
- #include "hw/misc/tmp105.h"
+     AspeedMachineState *bmc = ASPEED_MACHINE(state);
-@@ -XXX,XX +XXX,XX @@ static void quanta_q71l_bmc_i2c_init(AspeedMachineState *bmc)
+     AspeedGPIOState *gpio = &bmc->soc.gpio;
-     /* TODO: i2c-1: Add Frontpanel FRU eeprom@57 24c64 */
-     /* TODO: Add Memory Riser i2c mux and eeproms. */
+-    qemu_devices_reset();
++    qemu_devices_reset(reason);
--    /* TODO: i2c-2: pca9546@74 */
--    /* TODO: i2c-2: pca9548@77 */
+     /* Board ID: 7 (Class-1, 4 slots) */
-+    i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 2), "pca9546", 0x74);
+     object_property_set_bool(OBJECT(gpio), "gpioV4", true, &error_fatal);
-+    i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 2), "pca9548", 0x77);
+diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/mps2-tz.c
 +++ b/hw/arm/mps2-tz.c
@@ -XXX,XX +XXX,XX @@ static void mps2_set_remap(Object *obj, const char *value, Error **errp)
      }
  }
 -static void mps2_machine_reset(MachineState *machine)
 +static void mps2_machine_reset(MachineState *machine, ShutdownCause reason)
  {
      MPS2TZMachineState *mms = MPS2TZ_MACHINE(machine);
@@ -XXX,XX +XXX,XX @@ static void mps2_machine_reset(MachineState *machine)
       * reset see the correct mapping.
       */
      remap_memory(mms, mms->remap);
 -    qemu_devices_reset();
 +    qemu_devices_reset(reason);
  }
  static void mps2tz_class_init(ObjectClass *oc, void *data)
 diff --git a/hw/core/reset.c b/hw/core/reset.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/core/reset.c
 +++ b/hw/core/reset.c
@@ -XXX,XX +XXX,XX @@ typedef struct QEMUResetEntry {
      QTAILQ_ENTRY(QEMUResetEntry) entry;
      QEMUResetHandler *func;
      void *opaque;
 +    bool skip_on_snapshot_load;
  } QEMUResetEntry;
  static QTAILQ_HEAD(, QEMUResetEntry) reset_handlers =
@@ -XXX,XX +XXX,XX @@ void qemu_register_reset(QEMUResetHandler *func, void *opaque)
      QTAILQ_INSERT_TAIL(&reset_handlers, re, entry);
  }
 +void qemu_register_reset_nosnapshotload(QEMUResetHandler *func, void *opaque)
 +{
 +    QEMUResetEntry *re = g_new0(QEMUResetEntry, 1);
 +
-     /* TODO: i2c-3: Add BIOS FRU eeprom@56 24c64 */
++    re->func = func;
--    /* TODO: i2c-7: Add pca9546@70 */
++    re->opaque = opaque;
 +    re->skip_on_snapshot_load = true;
 +    QTAILQ_INSERT_TAIL(&reset_handlers, re, entry);
 +}
 +
-+    /* i2c-7 */
+ void qemu_unregister_reset(QEMUResetHandler *func, void *opaque)
-+    i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 7), "pca9546", 0x70);
+ {
-     /*        - i2c@0: pmbus@59 */
+     QEMUResetEntry *re;
-     /*        - i2c@1: pmbus@58 */
+@@ -XXX,XX +XXX,XX @@ void qemu_unregister_reset(QEMUResetHandler *func, void *opaque)
-     /*        - i2c@2: pmbus@58 */
+     }
-     /*        - i2c@3: pmbus@59 */
+ }
-+
-     /* TODO: i2c-7: Add PDB FRU eeprom@52 */
+-void qemu_devices_reset(void)
-     /* TODO: i2c-8: Add BMC FRU eeprom@50 */
++void qemu_devices_reset(ShutdownCause reason)
- }
+ {
-diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
+     QEMUResetEntry *re, *nre;
-index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/Kconfig
+     /* reset all devices */
-+++ b/hw/arm/Kconfig
+     QTAILQ_FOREACH_SAFE(re, &reset_handlers, entry, nre) {
-@@ -XXX,XX +XXX,XX @@ config ASPEED_SOC
++        if (reason == SHUTDOWN_CAUSE_SNAPSHOT_LOAD &&
-     select PCA9552
++            re->skip_on_snapshot_load) {
-     select SERIAL
++            continue;
-     select SMBUS_EEPROM
++        }
-+    select PCA954X
+         re->func(re->opaque);
-     select SSI
+     }
-     select SSI_M25P80
+ }
-     select TMP105
+diff --git a/hw/hppa/machine.c b/hw/hppa/machine.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/hppa/machine.c
 +++ b/hw/hppa/machine.c
@@ -XXX,XX +XXX,XX @@ static void machine_hppa_init(MachineState *machine)
      cpu[0]->env.gr[19] = FW_CFG_IO_BASE;
  }
 -static void hppa_machine_reset(MachineState *ms)
 +static void hppa_machine_reset(MachineState *ms, ShutdownCause reason)
  {
      unsigned int smp_cpus = ms->smp.cpus;
      int i;
 -    qemu_devices_reset();
 +    qemu_devices_reset(reason);
      /* Start all CPUs at the firmware entry point.
       *  Monarch CPU will initialize firmware, secondary CPUs
 diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/i386/microvm.c
 +++ b/hw/i386/microvm.c
@@ -XXX,XX +XXX,XX @@ static void microvm_machine_state_init(MachineState *machine)
      microvm_devices_init(mms);
  }
 -static void microvm_machine_reset(MachineState *machine)
 +static void microvm_machine_reset(MachineState *machine, ShutdownCause reason)
  {
      MicrovmMachineState *mms = MICROVM_MACHINE(machine);
      CPUState *cs;
@@ -XXX,XX +XXX,XX @@ static void microvm_machine_reset(MachineState *machine)
          mms->kernel_cmdline_fixed = true;
      }
 -    qemu_devices_reset();
 +    qemu_devices_reset(reason);
      CPU_FOREACH(cs) {
          cpu = X86_CPU(cs);
 diff --git a/hw/i386/pc.c b/hw/i386/pc.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/i386/pc.c
 +++ b/hw/i386/pc.c
@@ -XXX,XX +XXX,XX @@ static void pc_machine_initfn(Object *obj)
      cxl_machine_init(obj, &pcms->cxl_devices_state);
  }
 -static void pc_machine_reset(MachineState *machine)
 +static void pc_machine_reset(MachineState *machine, ShutdownCause reason)
  {
      CPUState *cs;
      X86CPU *cpu;
 -    qemu_devices_reset();
 +    qemu_devices_reset(reason);
      /* Reset APIC after devices have been reset to cancel
       * any changes that qemu_devices_reset() might have done.
@@ -XXX,XX +XXX,XX @@ static void pc_machine_reset(MachineState *machine)
  static void pc_machine_wakeup(MachineState *machine)
  {
      cpu_synchronize_all_states();
 -    pc_machine_reset(machine);
 +    pc_machine_reset(machine, SHUTDOWN_CAUSE_NONE);
      cpu_synchronize_all_post_reset();
  }
 diff --git a/hw/ppc/pegasos2.c b/hw/ppc/pegasos2.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/ppc/pegasos2.c
 +++ b/hw/ppc/pegasos2.c
@@ -XXX,XX +XXX,XX @@ static void pegasos2_pci_config_write(Pegasos2MachineState *pm, int bus,
      pegasos2_mv_reg_write(pm, pcicfg + 4, len, val);
  }
 -static void pegasos2_machine_reset(MachineState *machine)
 +static void pegasos2_machine_reset(MachineState *machine, ShutdownCause reason)
  {
      Pegasos2MachineState *pm = PEGASOS2_MACHINE(machine);
      void *fdt;
      uint64_t d[2];
      int sz;
 -    qemu_devices_reset();
 +    qemu_devices_reset(reason);
      if (!pm->vof) {
          return; /* Firmware should set up machine so nothing to do */
      }
 diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/ppc/pnv.c
 +++ b/hw/ppc/pnv.c
@@ -XXX,XX +XXX,XX @@ static void pnv_powerdown_notify(Notifier *n, void *opaque)
      }
  }
 -static void pnv_reset(MachineState *machine)
 +static void pnv_reset(MachineState *machine, ShutdownCause reason)
  {
      PnvMachineState *pnv = PNV_MACHINE(machine);
      IPMIBmc *bmc;
      void *fdt;
 -    qemu_devices_reset();
 +    qemu_devices_reset(reason);
      /*
       * The machine should provide by default an internal BMC simulator.
 diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/ppc/spapr.c
 +++ b/hw/ppc/spapr.c
@@ -XXX,XX +XXX,XX @@ void spapr_check_mmu_mode(bool guest_radix)
      }
  }
 -static void spapr_machine_reset(MachineState *machine)
 +static void spapr_machine_reset(MachineState *machine, ShutdownCause reason)
  {
      SpaprMachineState *spapr = SPAPR_MACHINE(machine);
      PowerPCCPU *first_ppc_cpu;
@@ -XXX,XX +XXX,XX @@ static void spapr_machine_reset(MachineState *machine)
          spapr_setup_hpt(spapr);
      }
 -    qemu_devices_reset();
 +    qemu_devices_reset(reason);
      spapr_ovec_cleanup(spapr->ov5_cas);
      spapr->ov5_cas = spapr_ovec_new();
 diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/s390x/s390-virtio-ccw.c
 +++ b/hw/s390x/s390-virtio-ccw.c
@@ -XXX,XX +XXX,XX @@ static void s390_pv_prepare_reset(S390CcwMachineState *ms)
      s390_pv_prep_reset();
  }
 -static void s390_machine_reset(MachineState *machine)
 +static void s390_machine_reset(MachineState *machine, ShutdownCause reason)
  {
      S390CcwMachineState *ms = S390_CCW_MACHINE(machine);
      enum s390_reset reset_type;
@@ -XXX,XX +XXX,XX @@ static void s390_machine_reset(MachineState *machine)
              s390_machine_unprotect(ms);
          }
 -        qemu_devices_reset();
 +        qemu_devices_reset(reason);
          s390_crypto_reset();
          /* configure and start the ipl CPU only */
 diff --git a/migration/savevm.c b/migration/savevm.c
 index XXXXXXX..XXXXXXX 100644
 --- a/migration/savevm.c
 +++ b/migration/savevm.c
@@ -XXX,XX +XXX,XX @@ bool load_snapshot(const char *name, const char *vmstate,
          goto err_drain;
      }
 -    qemu_system_reset(SHUTDOWN_CAUSE_NONE);
 +    qemu_system_reset(SHUTDOWN_CAUSE_SNAPSHOT_LOAD);
      mis->from_src_file = f;
      if (!yank_register_instance(MIGRATION_YANK_INSTANCE, errp)) {
 diff --git a/softmmu/runstate.c b/softmmu/runstate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/softmmu/runstate.c
 +++ b/softmmu/runstate.c
@@ -XXX,XX +XXX,XX @@ void qemu_system_reset(ShutdownCause reason)
      cpu_synchronize_all_states();
      if (mc && mc->reset) {
 -        mc->reset(current_machine);
 +        mc->reset(current_machine, reason);
      } else {
 -        qemu_devices_reset();
 +        qemu_devices_reset(reason);
      }
 -    if (reason && reason != SHUTDOWN_CAUSE_SUBSYSTEM_RESET) {
 +    switch (reason) {
 +    case SHUTDOWN_CAUSE_NONE:
 +    case SHUTDOWN_CAUSE_SUBSYSTEM_RESET:
 +    case SHUTDOWN_CAUSE_SNAPSHOT_LOAD:
 +        break;
 +    default:
          qapi_event_send_reset(shutdown_caused_by_guest(reason), reason);
      }
      cpu_synchronize_all_post_reset();
 --
-.20.1
+.25.1
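
The registration scheme this patch adds to hw/core/reset.c can be sketched standalone. This is a simplified illustration under stated assumptions, not the QEMU implementation: it uses a plain singly linked list instead of QTAILQ and calloc instead of g_new0, and the names Cause, register_reset, register_reset_nosnapshotload, devices_reset and count_reset are hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef void ResetHandler(void *opaque);

/* Stand-in for ShutdownCause; only the values needed here. */
typedef enum { CAUSE_NONE, CAUSE_GUEST_RESET, CAUSE_SNAPSHOT_LOAD } Cause;

typedef struct ResetEntry {
    struct ResetEntry *next;
    ResetHandler *func;
    void *opaque;
    bool skip_on_snapshot_load;
} ResetEntry;

static ResetEntry *reset_head;
static ResetEntry **reset_tail = &reset_head;

/* Append a handler to the list, preserving registration order. */
static void register_entry(ResetHandler *func, void *opaque, bool skip)
{
    ResetEntry *re = calloc(1, sizeof(*re));

    if (!re) {
        abort();
    }
    re->func = func;
    re->opaque = opaque;
    re->skip_on_snapshot_load = skip;
    *reset_tail = re;
    reset_tail = &re->next;
}

static void register_reset(ResetHandler *func, void *opaque)
{
    register_entry(func, opaque, false);
}

static void register_reset_nosnapshotload(ResetHandler *func, void *opaque)
{
    register_entry(func, opaque, true);
}

/* Run every handler, except flagged ones when loading a snapshot. */
static void devices_reset(Cause reason)
{
    for (ResetEntry *re = reset_head; re; re = re->next) {
        if (reason == CAUSE_SNAPSHOT_LOAD && re->skip_on_snapshot_load) {
            continue;
        }
        re->func(re->opaque);
    }
}

/* Example handler: counts how many times it has been invoked. */
static void count_reset(void *opaque)
{
    ++*(int *)opaque;
}
```

A non-deterministic handler, such as one that re-randomizes an RNG seed, would be registered with the nosnapshotload variant so that a reset performed as part of a snapshot load leaves its state untouched.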

-[PULL 10/28] hw/acpi: Provide function acpi_ghes_present()
+[PULL 22/30] device-tree: add re-randomization helper function
-Allow code elsewhere in the system to check whether the ACPI GHES
+From: "Jason A. Donenfeld" <Jason@zx2c4.com>
 table is present, so it can determine whether it is OK to try to
 record an error by calling acpi_ghes_record_errors().
-(We don't need to migrate the new 'present' field in AcpiGhesState,
+When the system reboots, the rng-seed that the FDT has should be
-because it is set once at system initialization and doesn't change.)
+re-randomized, so that the new boot gets a new seed. Several
 architectures require this functionality, so export a function for
 injecting a new seed into the given FDT.
+Cc: Alistair Francis <alistair.francis@wdc.com>
+Cc: David Gibson <david@gibson.dropbear.id.au>
+Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
+Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
+Message-id: 20221025004327.568476-3-Jason@zx2c4.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Dongjiu Geng <gengdongjiu1@gmail.com>
-Message-id: 20210603171259.27962-3-peter.maydell@linaro.org
 ---
- include/hw/acpi/ghes.h |  9 +++++++++
+ include/sysemu/device_tree.h |  9 +++++++++
- hw/acpi/ghes-stub.c    |  5 +++++
+ softmmu/device_tree.c        | 21 +++++++++++++++++++++
- hw/acpi/ghes.c         | 17 +++++++++++++++++
+files changed, 30 insertions(+)
 files changed, 31 insertions(+)
-diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
+diff --git a/include/sysemu/device_tree.h b/include/sysemu/device_tree.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/acpi/ghes.h
+--- a/include/sysemu/device_tree.h
-+++ b/include/hw/acpi/ghes.h
++++ b/include/sysemu/device_tree.h
-@@ -XXX,XX +XXX,XX @@ enum {
+@@ -XXX,XX +XXX,XX @@ int qemu_fdt_setprop_sized_cells_from_array(void *fdt,
+                                                 qdt_tmp);                 \
- typedef struct AcpiGhesState {
+     })
-     uint64_t ghes_addr_le;
 +    bool present; /* True if GHES is present at all on this board */
  } AcpiGhesState;
  void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker);
@@ -XXX,XX +XXX,XX @@ void acpi_build_hest(GArray *table_data, BIOSLinker *linker,
  void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
                            GArray *hardware_errors);
  int acpi_ghes_record_errors(uint8_t notify, uint64_t error_physical_addr);
 +
 +/**
-+ * acpi_ghes_present: Report whether ACPI GHES table is present
++ * qemu_fdt_randomize_seeds:
 + * @fdt: device tree blob
 + *
-+ * Returns: true if the system has an ACPI GHES table and it is
++ * Re-randomize all "rng-seed" properties with new seeds.
 + * safe to call acpi_ghes_record_errors() to record a memory error.
 + */
-+bool acpi_ghes_present(void);
++void qemu_fdt_randomize_seeds(void *fdt);
- #endif
++
-diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c
+ #define FDT_PCI_RANGE_RELOCATABLE          0x80000000
  #define FDT_PCI_RANGE_PREFETCHABLE         0x40000000
  #define FDT_PCI_RANGE_ALIASED              0x20000000
 diff --git a/softmmu/device_tree.c b/softmmu/device_tree.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/acpi/ghes-stub.c
+--- a/softmmu/device_tree.c
-+++ b/hw/acpi/ghes-stub.c
++++ b/softmmu/device_tree.c
-@@ -XXX,XX +XXX,XX @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
+@@ -XXX,XX +XXX,XX @@
- {
+ #include "qemu/option.h"
-     return -1;
+ #include "qemu/bswap.h"
  #include "qemu/cutils.h"
 +#include "qemu/guest-random.h"
  #include "sysemu/device_tree.h"
  #include "hw/loader.h"
  #include "hw/boards.h"
@@ -XXX,XX +XXX,XX @@ void hmp_dumpdtb(Monitor *mon, const QDict *qdict)
      info_report("dtb dumped to %s", filename);
  }
 +
-+bool acpi_ghes_present(void)
++void qemu_fdt_randomize_seeds(void *fdt)
 +{
-+    return false;
++    int noffset, poffset, len;
-+}
++    const char *name;
-diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
++    uint8_t *data;
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/acpi/ghes.c
 +++ b/hw/acpi/ghes.c
@@ -XXX,XX +XXX,XX @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
      /* Create a read-write fw_cfg file for Address */
      fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
          NULL, &(ags->ghes_addr_le), sizeof(ags->ghes_addr_le), false);
 +
-+    ags->present = true;
++    for (noffset = fdt_next_node(fdt, 0, NULL);
- }
++         noffset >= 0;
++         noffset = fdt_next_node(fdt, noffset, NULL)) {
- int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
++        for (poffset = fdt_first_property_offset(fdt, noffset);
-@@ -XXX,XX +XXX,XX @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
++             poffset >= 0;
++             poffset = fdt_next_property_offset(fdt, poffset)) {
-     return ret;
++            data = (uint8_t *)fdt_getprop_by_offset(fdt, poffset, &name, &len);
- }
++            if (!data || strcmp(name, "rng-seed"))
-+
++                continue;
-+bool acpi_ghes_present(void)
++            qemu_guest_getrandom_nofail(data, len);
-+{
++        }
 +    AcpiGedState *acpi_ged_state;
 +    AcpiGhesState *ags;
 +
 +    acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
 +                                                       NULL));
 +
 +    if (!acpi_ged_state) {
 +        return false;
 +    }
-+    ags = &acpi_ged_state->ghes_state;
-+    return ags->present;
 +}
 --
-.20.1
+.25.1

-New patch
+[PULL 23/30] x86: do not re-randomize RNG seed on snapshot load
+From: "Jason A. Donenfeld" <Jason@zx2c4.com>
+Snapshot loading is supposed to be deterministic, so we shouldn't
+re-randomize the various seeds used.
+Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
+Message-id: 20221025004327.568476-4-Jason@zx2c4.com
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ hw/i386/x86.c | 2 +-
+file changed, 1 insertion(+), 1 deletion(-)
+diff --git a/hw/i386/x86.c b/hw/i386/x86.c
+index XXXXXXX..XXXXXXX 100644
+--- a/hw/i386/x86.c
++++ b/hw/i386/x86.c
+@@ -XXX,XX +XXX,XX @@ void x86_load_linux(X86MachineState *x86ms,
+         setup_data->type = cpu_to_le32(SETUP_RNG_SEED);
+         setup_data->len = cpu_to_le32(RNG_SEED_LENGTH);
+         qemu_guest_getrandom_nofail(setup_data->data, RNG_SEED_LENGTH);
+-        qemu_register_reset(reset_rng_seed, setup_data);
++        qemu_register_reset_nosnapshotload(reset_rng_seed, setup_data);
+         fw_cfg_add_bytes_callback(fw_cfg, FW_CFG_KERNEL_DATA, reset_rng_seed, NULL,
+                                   setup_data, kernel, kernel_size, true);
+     } else {
+--
+.25.1

-[PULL 16/28] target/arm: Provide and use H8 and H1_8 macros
+[PULL 24/30] arm: re-randomize rng-seed on reboot
-Currently we provide Hn and H1_n macros for accessing the correct
+From: "Jason A. Donenfeld" <Jason@zx2c4.com>
 data within arrays of vector elements of size 1, 2 and 4, accounting
 for host endianness.  We don't provide any macros for elements of
 size 8 because there the host endianness doesn't matter.  However,
 this does result in awkwardness where we need to pass empty arguments
 to macros, because checkpatch complains about them.  The empty
 argument is a little confusing for humans to read as well.
-Add H8() and H1_8() macros and use them where we were previously
+When the system reboots, the rng-seed that the FDT has should be
-passing empty arguments to macros.
+re-randomized, so that the new boot gets a new seed. Since the FDT is in
 the ROM region at this point, we add a hook right after the ROM has been
 added, so that we have a pointer to that copy of the FDT.
-Suggested-by: Richard Henderson <richard.henderson@linaro.org>
+Cc: Peter Maydell <peter.maydell@linaro.org>
 Cc: qemu-arm@nongnu.org
 Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
 Message-id: 20221025004327.568476-5-Jason@zx2c4.com
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210614151007.4545-2-peter.maydell@linaro.org
-Message-id: 20210610132505.5827-1-peter.maydell@linaro.org
 ---
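Reviewer note (not part of either patch): the older commit message on the `-` side argues for identity macros `H8`/`H1_8` so that macro-generated helpers never need an empty macro argument. The sketch below illustrates that pattern in isolation. `DO_SUM`, `sum_h`, `sum_s`, `sum_d` and the `SKETCH_HOST_BIG_ENDIAN` switch are hypothetical names invented for this example. The big-endian index adjustments are written in the style of the existing `H2`/`H4` macros and should be read as an assumption, not a copy of the header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Element-index adjustments for vectors stored as host-endian 64-bit
 * words. SKETCH_HOST_BIG_ENDIAN is a hypothetical switch for this sketch.
 */
#ifdef SKETCH_HOST_BIG_ENDIAN
#define H2(x)   ((x) ^ 3)   /* 16-bit elements: four per 64-bit word */
#define H4(x)   ((x) ^ 1)   /* 32-bit elements: two per 64-bit word */
#else
#define H2(x)   (x)
#define H4(x)   (x)
#endif
/* 64-bit elements need no adjustment; the macro exists for readability. */
#define H8(x)   (x)

/*
 * Generate a function summing COUNT elements of TYPE, indexing through H.
 * The data is assumed to occupy a whole number of 64-bit words.
 */
#define DO_SUM(NAME, TYPE, H)                               \
uint64_t NAME(const void *vn, size_t count)                 \
{                                                           \
    const TYPE *n = vn;                                     \
    uint64_t sum = 0;                                       \
    assert((count * sizeof(TYPE)) % 8 == 0);                \
    for (size_t i = 0; i < count; i++) {                    \
        sum += n[H(i)];                                     \
    }                                                       \
    return sum;                                             \
}

DO_SUM(sum_h, uint16_t, H2)
DO_SUM(sum_s, uint32_t, H4)
/* Without H8 this line would read DO_SUM(sum_d, uint64_t, ) instead. */
DO_SUM(sum_d, uint64_t, H8)
```

Both spellings of the 64-bit instantiation expand to the same indexing expression, so the change is purely about readability and keeping checkpatch quiet, as the commit message says.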
- target/arm/vec_internal.h |   8 +-
+ hw/arm/boot.c | 2 ++
- target/arm/sve_helper.c   | 258 +++++++++++++++++++-------------------
+file changed, 2 insertions(+)
  target/arm/vec_helper.c   |  14 +--
 files changed, 143 insertions(+), 137 deletions(-)
-diff --git a/target/arm/vec_internal.h b/target/arm/vec_internal.h
+diff --git a/hw/arm/boot.c b/hw/arm/boot.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/vec_internal.h
+--- a/hw/arm/boot.c
-+++ b/target/arm/vec_internal.h
++++ b/hw/arm/boot.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
- #define H2(x)   (x)
+      * the DTB is copied again upon reset, even if addr points into RAM.
- #define H4(x)   (x)
+      */
- #endif
+     rom_add_blob_fixed_as("dtb", fdt, size, addr, as);
--
++    qemu_register_reset_nosnapshotload(qemu_fdt_randomize_seeds,
-+/*
++                                       rom_ptr_for_as(as, addr, size));
-+ * Access to 64-bit elements isn't host-endian dependent; we provide H8
-+ * and H1_8 so that when a function is being generated from a macro we
+     g_free(fdt);
 + * can pass these rather than an empty macro argument, for clarity.
 + */
 +#define H8(x)   (x)
 +#define H1_8(x) (x)
  static inline void clear_tail(void *vd, uintptr_t opr_sz, uintptr_t max_sz)
  {
 diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/sve_helper.c
 +++ b/target/arm/sve_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg,               \
  DO_ZPZZ_PAIR_FP(sve2_faddp_zpzz_h, float16, H1_2, float16_add)
  DO_ZPZZ_PAIR_FP(sve2_faddp_zpzz_s, float32, H1_4, float32_add)
 -DO_ZPZZ_PAIR_FP(sve2_faddp_zpzz_d, float64,     , float64_add)
 +DO_ZPZZ_PAIR_FP(sve2_faddp_zpzz_d, float64, H1_8, float64_add)
  DO_ZPZZ_PAIR_FP(sve2_fmaxnmp_zpzz_h, float16, H1_2, float16_maxnum)
  DO_ZPZZ_PAIR_FP(sve2_fmaxnmp_zpzz_s, float32, H1_4, float32_maxnum)
 -DO_ZPZZ_PAIR_FP(sve2_fmaxnmp_zpzz_d, float64,     , float64_maxnum)
 +DO_ZPZZ_PAIR_FP(sve2_fmaxnmp_zpzz_d, float64, H1_8, float64_maxnum)
  DO_ZPZZ_PAIR_FP(sve2_fminnmp_zpzz_h, float16, H1_2, float16_minnum)
  DO_ZPZZ_PAIR_FP(sve2_fminnmp_zpzz_s, float32, H1_4, float32_minnum)
 -DO_ZPZZ_PAIR_FP(sve2_fminnmp_zpzz_d, float64,     , float64_minnum)
 +DO_ZPZZ_PAIR_FP(sve2_fminnmp_zpzz_d, float64, H1_8, float64_minnum)
  DO_ZPZZ_PAIR_FP(sve2_fmaxp_zpzz_h, float16, H1_2, float16_max)
  DO_ZPZZ_PAIR_FP(sve2_fmaxp_zpzz_s, float32, H1_4, float32_max)
 -DO_ZPZZ_PAIR_FP(sve2_fmaxp_zpzz_d, float64,     , float64_max)
 +DO_ZPZZ_PAIR_FP(sve2_fmaxp_zpzz_d, float64, H1_8, float64_max)
  DO_ZPZZ_PAIR_FP(sve2_fminp_zpzz_h, float16, H1_2, float16_min)
  DO_ZPZZ_PAIR_FP(sve2_fminp_zpzz_s, float32, H1_4, float32_min)
 -DO_ZPZZ_PAIR_FP(sve2_fminp_zpzz_d, float64,     , float64_min)
 +DO_ZPZZ_PAIR_FP(sve2_fminp_zpzz_d, float64, H1_8, float64_min)
  #undef DO_ZPZZ_PAIR_FP
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc)          \
  DO_ZZZ_TB(sve2_saddl_h, int16_t, int8_t, H1_2, H1, DO_ADD)
  DO_ZZZ_TB(sve2_saddl_s, int32_t, int16_t, H1_4, H1_2, DO_ADD)
 -DO_ZZZ_TB(sve2_saddl_d, int64_t, int32_t,     , H1_4, DO_ADD)
 +DO_ZZZ_TB(sve2_saddl_d, int64_t, int32_t, H1_8, H1_4, DO_ADD)
  DO_ZZZ_TB(sve2_ssubl_h, int16_t, int8_t, H1_2, H1, DO_SUB)
  DO_ZZZ_TB(sve2_ssubl_s, int32_t, int16_t, H1_4, H1_2, DO_SUB)
 -DO_ZZZ_TB(sve2_ssubl_d, int64_t, int32_t,     , H1_4, DO_SUB)
 +DO_ZZZ_TB(sve2_ssubl_d, int64_t, int32_t, H1_8, H1_4, DO_SUB)
  DO_ZZZ_TB(sve2_sabdl_h, int16_t, int8_t, H1_2, H1, DO_ABD)
  DO_ZZZ_TB(sve2_sabdl_s, int32_t, int16_t, H1_4, H1_2, DO_ABD)
 -DO_ZZZ_TB(sve2_sabdl_d, int64_t, int32_t,     , H1_4, DO_ABD)
 +DO_ZZZ_TB(sve2_sabdl_d, int64_t, int32_t, H1_8, H1_4, DO_ABD)
  DO_ZZZ_TB(sve2_uaddl_h, uint16_t, uint8_t, H1_2, H1, DO_ADD)
  DO_ZZZ_TB(sve2_uaddl_s, uint32_t, uint16_t, H1_4, H1_2, DO_ADD)
 -DO_ZZZ_TB(sve2_uaddl_d, uint64_t, uint32_t,     , H1_4, DO_ADD)
 +DO_ZZZ_TB(sve2_uaddl_d, uint64_t, uint32_t, H1_8, H1_4, DO_ADD)
  DO_ZZZ_TB(sve2_usubl_h, uint16_t, uint8_t, H1_2, H1, DO_SUB)
  DO_ZZZ_TB(sve2_usubl_s, uint32_t, uint16_t, H1_4, H1_2, DO_SUB)
 -DO_ZZZ_TB(sve2_usubl_d, uint64_t, uint32_t,     , H1_4, DO_SUB)
 +DO_ZZZ_TB(sve2_usubl_d, uint64_t, uint32_t, H1_8, H1_4, DO_SUB)
  DO_ZZZ_TB(sve2_uabdl_h, uint16_t, uint8_t, H1_2, H1, DO_ABD)
  DO_ZZZ_TB(sve2_uabdl_s, uint32_t, uint16_t, H1_4, H1_2, DO_ABD)
 -DO_ZZZ_TB(sve2_uabdl_d, uint64_t, uint32_t,     , H1_4, DO_ABD)
 +DO_ZZZ_TB(sve2_uabdl_d, uint64_t, uint32_t, H1_8, H1_4, DO_ABD)
  DO_ZZZ_TB(sve2_smull_zzz_h, int16_t, int8_t, H1_2, H1, DO_MUL)
  DO_ZZZ_TB(sve2_smull_zzz_s, int32_t, int16_t, H1_4, H1_2, DO_MUL)
 -DO_ZZZ_TB(sve2_smull_zzz_d, int64_t, int32_t,     , H1_4, DO_MUL)
 +DO_ZZZ_TB(sve2_smull_zzz_d, int64_t, int32_t, H1_8, H1_4, DO_MUL)
  DO_ZZZ_TB(sve2_umull_zzz_h, uint16_t, uint8_t, H1_2, H1, DO_MUL)
  DO_ZZZ_TB(sve2_umull_zzz_s, uint32_t, uint16_t, H1_4, H1_2, DO_MUL)
 -DO_ZZZ_TB(sve2_umull_zzz_d, uint64_t, uint32_t,     , H1_4, DO_MUL)
 +DO_ZZZ_TB(sve2_umull_zzz_d, uint64_t, uint32_t, H1_8, H1_4, DO_MUL)
  /* Note that the multiply cannot overflow, but the doubling can. */
  static inline int16_t do_sqdmull_h(int16_t n, int16_t m)
@@ -XXX,XX +XXX,XX @@ static inline int64_t do_sqdmull_d(int64_t n, int64_t m)
  DO_ZZZ_TB(sve2_sqdmull_zzz_h, int16_t, int8_t, H1_2, H1, do_sqdmull_h)
  DO_ZZZ_TB(sve2_sqdmull_zzz_s, int32_t, int16_t, H1_4, H1_2, do_sqdmull_s)
 -DO_ZZZ_TB(sve2_sqdmull_zzz_d, int64_t, int32_t,     , H1_4, do_sqdmull_d)
 +DO_ZZZ_TB(sve2_sqdmull_zzz_d, int64_t, int32_t, H1_8, H1_4, do_sqdmull_d)
  #undef DO_ZZZ_TB
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \
  DO_ZZZ_WTB(sve2_saddw_h, int16_t, int8_t, H1_2, H1, DO_ADD)
  DO_ZZZ_WTB(sve2_saddw_s, int32_t, int16_t, H1_4, H1_2, DO_ADD)
 -DO_ZZZ_WTB(sve2_saddw_d, int64_t, int32_t,     , H1_4, DO_ADD)
 +DO_ZZZ_WTB(sve2_saddw_d, int64_t, int32_t, H1_8, H1_4, DO_ADD)
  DO_ZZZ_WTB(sve2_ssubw_h, int16_t, int8_t, H1_2, H1, DO_SUB)
  DO_ZZZ_WTB(sve2_ssubw_s, int32_t, int16_t, H1_4, H1_2, DO_SUB)
 -DO_ZZZ_WTB(sve2_ssubw_d, int64_t, int32_t,     , H1_4, DO_SUB)
 +DO_ZZZ_WTB(sve2_ssubw_d, int64_t, int32_t, H1_8, H1_4, DO_SUB)
  DO_ZZZ_WTB(sve2_uaddw_h, uint16_t, uint8_t, H1_2, H1, DO_ADD)
  DO_ZZZ_WTB(sve2_uaddw_s, uint32_t, uint16_t, H1_4, H1_2, DO_ADD)
 -DO_ZZZ_WTB(sve2_uaddw_d, uint64_t, uint32_t,     , H1_4, DO_ADD)
 +DO_ZZZ_WTB(sve2_uaddw_d, uint64_t, uint32_t, H1_8, H1_4, DO_ADD)
  DO_ZZZ_WTB(sve2_usubw_h, uint16_t, uint8_t, H1_2, H1, DO_SUB)
  DO_ZZZ_WTB(sve2_usubw_s, uint32_t, uint16_t, H1_4, H1_2, DO_SUB)
 -DO_ZZZ_WTB(sve2_usubw_d, uint64_t, uint32_t,     , H1_4, DO_SUB)
 +DO_ZZZ_WTB(sve2_usubw_d, uint64_t, uint32_t, H1_8, H1_4, DO_SUB)
  #undef DO_ZZZ_WTB
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc)          \
  DO_ZZZ_NTB(sve2_eoril_b, uint8_t, H1, DO_EOR)
  DO_ZZZ_NTB(sve2_eoril_h, uint16_t, H1_2, DO_EOR)
  DO_ZZZ_NTB(sve2_eoril_s, uint32_t, H1_4, DO_EOR)
 -DO_ZZZ_NTB(sve2_eoril_d, uint64_t,     , DO_EOR)
 +DO_ZZZ_NTB(sve2_eoril_d, uint64_t, H1_8, DO_EOR)
  #undef DO_ZZZ_NTB
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \
  DO_ZZZW_ACC(sve2_sabal_h, int16_t, int8_t, H1_2, H1, DO_ABD)
  DO_ZZZW_ACC(sve2_sabal_s, int32_t, int16_t, H1_4, H1_2, DO_ABD)
 -DO_ZZZW_ACC(sve2_sabal_d, int64_t, int32_t,     , H1_4, DO_ABD)
 +DO_ZZZW_ACC(sve2_sabal_d, int64_t, int32_t, H1_8, H1_4, DO_ABD)
  DO_ZZZW_ACC(sve2_uabal_h, uint16_t, uint8_t, H1_2, H1, DO_ABD)
  DO_ZZZW_ACC(sve2_uabal_s, uint32_t, uint16_t, H1_4, H1_2, DO_ABD)
 -DO_ZZZW_ACC(sve2_uabal_d, uint64_t, uint32_t,     , H1_4, DO_ABD)
 +DO_ZZZW_ACC(sve2_uabal_d, uint64_t, uint32_t, H1_8, H1_4, DO_ABD)
  DO_ZZZW_ACC(sve2_smlal_zzzw_h, int16_t, int8_t, H1_2, H1, DO_MUL)
  DO_ZZZW_ACC(sve2_smlal_zzzw_s, int32_t, int16_t, H1_4, H1_2, DO_MUL)
 -DO_ZZZW_ACC(sve2_smlal_zzzw_d, int64_t, int32_t,     , H1_4, DO_MUL)
 +DO_ZZZW_ACC(sve2_smlal_zzzw_d, int64_t, int32_t, H1_8, H1_4, DO_MUL)
  DO_ZZZW_ACC(sve2_umlal_zzzw_h, uint16_t, uint8_t, H1_2, H1, DO_MUL)
  DO_ZZZW_ACC(sve2_umlal_zzzw_s, uint32_t, uint16_t, H1_4, H1_2, DO_MUL)
 -DO_ZZZW_ACC(sve2_umlal_zzzw_d, uint64_t, uint32_t,     , H1_4, DO_MUL)
 +DO_ZZZW_ACC(sve2_umlal_zzzw_d, uint64_t, uint32_t, H1_8, H1_4, DO_MUL)
  #define DO_NMUL(N, M)  -(N * M)
  DO_ZZZW_ACC(sve2_smlsl_zzzw_h, int16_t, int8_t, H1_2, H1, DO_NMUL)
  DO_ZZZW_ACC(sve2_smlsl_zzzw_s, int32_t, int16_t, H1_4, H1_2, DO_NMUL)
 -DO_ZZZW_ACC(sve2_smlsl_zzzw_d, int64_t, int32_t,     , H1_4, DO_NMUL)
 +DO_ZZZW_ACC(sve2_smlsl_zzzw_d, int64_t, int32_t, H1_8, H1_4, DO_NMUL)
  DO_ZZZW_ACC(sve2_umlsl_zzzw_h, uint16_t, uint8_t, H1_2, H1, DO_NMUL)
  DO_ZZZW_ACC(sve2_umlsl_zzzw_s, uint32_t, uint16_t, H1_4, H1_2, DO_NMUL)
 -DO_ZZZW_ACC(sve2_umlsl_zzzw_d, uint64_t, uint32_t,     , H1_4, DO_NMUL)
 +DO_ZZZW_ACC(sve2_umlsl_zzzw_d, uint64_t, uint32_t, H1_8, H1_4, DO_NMUL)
  #undef DO_ZZZW_ACC
@@ -XXX,XX +XXX,XX @@ DO_SQDMLAL(sve2_sqdmlal_zzzw_h, int16_t, int8_t, H1_2, H1,
             do_sqdmull_h, DO_SQADD_H)
  DO_SQDMLAL(sve2_sqdmlal_zzzw_s, int32_t, int16_t, H1_4, H1_2,
             do_sqdmull_s, DO_SQADD_S)
 -DO_SQDMLAL(sve2_sqdmlal_zzzw_d, int64_t, int32_t,     , H1_4,
 +DO_SQDMLAL(sve2_sqdmlal_zzzw_d, int64_t, int32_t, H1_8, H1_4,
             do_sqdmull_d, do_sqadd_d)
  DO_SQDMLAL(sve2_sqdmlsl_zzzw_h, int16_t, int8_t, H1_2, H1,
             do_sqdmull_h, DO_SQSUB_H)
  DO_SQDMLAL(sve2_sqdmlsl_zzzw_s, int32_t, int16_t, H1_4, H1_2,
             do_sqdmull_s, DO_SQSUB_S)
 -DO_SQDMLAL(sve2_sqdmlsl_zzzw_d, int64_t, int32_t,     , H1_4,
 +DO_SQDMLAL(sve2_sqdmlsl_zzzw_d, int64_t, int32_t, H1_8, H1_4,
             do_sqdmull_d, do_sqsub_d)
  #undef DO_SQDMLAL
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \
  DO_CMLA_FUNC(sve2_cmla_zzzz_b, uint8_t, H1, DO_CMLA)
  DO_CMLA_FUNC(sve2_cmla_zzzz_h, uint16_t, H2, DO_CMLA)
  DO_CMLA_FUNC(sve2_cmla_zzzz_s, uint32_t, H4, DO_CMLA)
 -DO_CMLA_FUNC(sve2_cmla_zzzz_d, uint64_t,   , DO_CMLA)
 +DO_CMLA_FUNC(sve2_cmla_zzzz_d, uint64_t, H8, DO_CMLA)
  #define DO_SQRDMLAH_B(N, M, A, S) \
      do_sqrdmlah_b(N, M, A, S, true)
@@ -XXX,XX +XXX,XX @@ DO_CMLA_FUNC(sve2_cmla_zzzz_d, uint64_t,   , DO_CMLA)
  DO_CMLA_FUNC(sve2_sqrdcmlah_zzzz_b, int8_t, H1, DO_SQRDMLAH_B)
  DO_CMLA_FUNC(sve2_sqrdcmlah_zzzz_h, int16_t, H2, DO_SQRDMLAH_H)
  DO_CMLA_FUNC(sve2_sqrdcmlah_zzzz_s, int32_t, H4, DO_SQRDMLAH_S)
 -DO_CMLA_FUNC(sve2_sqrdcmlah_zzzz_d, int64_t,   , DO_SQRDMLAH_D)
 +DO_CMLA_FUNC(sve2_sqrdcmlah_zzzz_d, int64_t, H8, DO_SQRDMLAH_D)
  #define DO_CMLA_IDX_FUNC(NAME, TYPE, H, OP) \
  void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc)    \
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \
  DO_ZZXZ(sve2_sqrdmlah_idx_h, int16_t, H2, DO_SQRDMLAH_H)
  DO_ZZXZ(sve2_sqrdmlah_idx_s, int32_t, H4, DO_SQRDMLAH_S)
 -DO_ZZXZ(sve2_sqrdmlah_idx_d, int64_t,   , DO_SQRDMLAH_D)
 +DO_ZZXZ(sve2_sqrdmlah_idx_d, int64_t, H8, DO_SQRDMLAH_D)
  #define DO_SQRDMLSH_H(N, M, A) \
      ({ uint32_t discard; do_sqrdmlah_h(N, M, A, true, true, &discard); })
@@ -XXX,XX +XXX,XX @@ DO_ZZXZ(sve2_sqrdmlah_idx_d, int64_t,   , DO_SQRDMLAH_D)
  DO_ZZXZ(sve2_sqrdmlsh_idx_h, int16_t, H2, DO_SQRDMLSH_H)
  DO_ZZXZ(sve2_sqrdmlsh_idx_s, int32_t, H4, DO_SQRDMLSH_S)
 -DO_ZZXZ(sve2_sqrdmlsh_idx_d, int64_t,   , DO_SQRDMLSH_D)
 +DO_ZZXZ(sve2_sqrdmlsh_idx_d, int64_t, H8, DO_SQRDMLSH_D)
  #undef DO_ZZXZ
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc)  \
  #define DO_MLA(N, M, A)  (A + N * M)
  DO_ZZXW(sve2_smlal_idx_s, int32_t, int16_t, H1_4, H1_2, DO_MLA)
 -DO_ZZXW(sve2_smlal_idx_d, int64_t, int32_t,     , H1_4, DO_MLA)
 +DO_ZZXW(sve2_smlal_idx_d, int64_t, int32_t, H1_8, H1_4, DO_MLA)
  DO_ZZXW(sve2_umlal_idx_s, uint32_t, uint16_t, H1_4, H1_2, DO_MLA)
 -DO_ZZXW(sve2_umlal_idx_d, uint64_t, uint32_t,     , H1_4, DO_MLA)
 +DO_ZZXW(sve2_umlal_idx_d, uint64_t, uint32_t, H1_8, H1_4, DO_MLA)
  #define DO_MLS(N, M, A)  (A - N * M)
  DO_ZZXW(sve2_smlsl_idx_s, int32_t, int16_t, H1_4, H1_2, DO_MLS)
 -DO_ZZXW(sve2_smlsl_idx_d, int64_t, int32_t,     , H1_4, DO_MLS)
 +DO_ZZXW(sve2_smlsl_idx_d, int64_t, int32_t, H1_8, H1_4, DO_MLS)
  DO_ZZXW(sve2_umlsl_idx_s, uint32_t, uint16_t, H1_4, H1_2, DO_MLS)
 -DO_ZZXW(sve2_umlsl_idx_d, uint64_t, uint32_t,     , H1_4, DO_MLS)
 +DO_ZZXW(sve2_umlsl_idx_d, uint64_t, uint32_t, H1_8, H1_4, DO_MLS)
  #define DO_SQDMLAL_S(N, M, A)  DO_SQADD_S(A, do_sqdmull_s(N, M))
  #define DO_SQDMLAL_D(N, M, A)  do_sqadd_d(A, do_sqdmull_d(N, M))
  DO_ZZXW(sve2_sqdmlal_idx_s, int32_t, int16_t, H1_4, H1_2, DO_SQDMLAL_S)
 -DO_ZZXW(sve2_sqdmlal_idx_d, int64_t, int32_t,     , H1_4, DO_SQDMLAL_D)
 +DO_ZZXW(sve2_sqdmlal_idx_d, int64_t, int32_t, H1_8, H1_4, DO_SQDMLAL_D)
  #define DO_SQDMLSL_S(N, M, A)  DO_SQSUB_S(A, do_sqdmull_s(N, M))
  #define DO_SQDMLSL_D(N, M, A)  do_sqsub_d(A, do_sqdmull_d(N, M))
  DO_ZZXW(sve2_sqdmlsl_idx_s, int32_t, int16_t, H1_4, H1_2, DO_SQDMLSL_S)
 -DO_ZZXW(sve2_sqdmlsl_idx_d, int64_t, int32_t,     , H1_4, DO_SQDMLSL_D)
 +DO_ZZXW(sve2_sqdmlsl_idx_d, int64_t, int32_t, H1_8, H1_4, DO_SQDMLSL_D)
  #undef DO_MLA
  #undef DO_MLS
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc)            \
  }
  DO_ZZX(sve2_sqdmull_idx_s, int32_t, int16_t, H1_4, H1_2, do_sqdmull_s)
 -DO_ZZX(sve2_sqdmull_idx_d, int64_t, int32_t,     , H1_4, do_sqdmull_d)
 +DO_ZZX(sve2_sqdmull_idx_d, int64_t, int32_t, H1_8, H1_4, do_sqdmull_d)
  DO_ZZX(sve2_smull_idx_s, int32_t, int16_t, H1_4, H1_2, DO_MUL)
 -DO_ZZX(sve2_smull_idx_d, int64_t, int32_t,     , H1_4, DO_MUL)
 +DO_ZZX(sve2_smull_idx_d, int64_t, int32_t, H1_8, H1_4, DO_MUL)
  DO_ZZX(sve2_umull_idx_s, uint32_t, uint16_t, H1_4, H1_2, DO_MUL)
 -DO_ZZX(sve2_umull_idx_d, uint64_t, uint32_t,     , H1_4, DO_MUL)
 +DO_ZZX(sve2_umull_idx_d, uint64_t, uint32_t, H1_8, H1_4, DO_MUL)
  #undef DO_ZZX
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc)  \
  DO_CADD(sve2_cadd_b, int8_t, H1, DO_ADD, DO_SUB)
  DO_CADD(sve2_cadd_h, int16_t, H1_2, DO_ADD, DO_SUB)
  DO_CADD(sve2_cadd_s, int32_t, H1_4, DO_ADD, DO_SUB)
 -DO_CADD(sve2_cadd_d, int64_t,     , DO_ADD, DO_SUB)
 +DO_CADD(sve2_cadd_d, int64_t, H1_8, DO_ADD, DO_SUB)
  DO_CADD(sve2_sqcadd_b, int8_t, H1, DO_SQADD_B, DO_SQSUB_B)
  DO_CADD(sve2_sqcadd_h, int16_t, H1_2, DO_SQADD_H, DO_SQSUB_H)
  DO_CADD(sve2_sqcadd_s, int32_t, H1_4, DO_SQADD_S, DO_SQSUB_S)
 -DO_CADD(sve2_sqcadd_d, int64_t,     , do_sqadd_d, do_sqsub_d)
 +DO_CADD(sve2_sqcadd_d, int64_t, H1_8, do_sqadd_d, do_sqsub_d)
  #undef DO_CADD
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, uint32_t desc)           \
  DO_ZZI_SHLL(sve2_sshll_h, int16_t, int8_t, H1_2, H1)
  DO_ZZI_SHLL(sve2_sshll_s, int32_t, int16_t, H1_4, H1_2)
 -DO_ZZI_SHLL(sve2_sshll_d, int64_t, int32_t,     , H1_4)
 +DO_ZZI_SHLL(sve2_sshll_d, int64_t, int32_t, H1_8, H1_4)
  DO_ZZI_SHLL(sve2_ushll_h, uint16_t, uint8_t, H1_2, H1)
  DO_ZZI_SHLL(sve2_ushll_s, uint32_t, uint16_t, H1_4, H1_2)
 -DO_ZZI_SHLL(sve2_ushll_d, uint64_t, uint32_t,     , H1_4)
 +DO_ZZI_SHLL(sve2_ushll_d, uint64_t, uint32_t, H1_8, H1_4)
  #undef DO_ZZI_SHLL
@@ -XXX,XX +XXX,XX @@ DO_SHRNB(sve2_shrnb_d, uint64_t, uint32_t, DO_SHR)
  DO_SHRNT(sve2_shrnt_h, uint16_t, uint8_t, H1_2, H1, DO_SHR)
  DO_SHRNT(sve2_shrnt_s, uint32_t, uint16_t, H1_4, H1_2, DO_SHR)
 -DO_SHRNT(sve2_shrnt_d, uint64_t, uint32_t,     , H1_4, DO_SHR)
 +DO_SHRNT(sve2_shrnt_d, uint64_t, uint32_t, H1_8, H1_4, DO_SHR)
  DO_SHRNB(sve2_rshrnb_h, uint16_t, uint8_t, do_urshr)
  DO_SHRNB(sve2_rshrnb_s, uint32_t, uint16_t, do_urshr)
@@ -XXX,XX +XXX,XX @@ DO_SHRNB(sve2_rshrnb_d, uint64_t, uint32_t, do_urshr)
  DO_SHRNT(sve2_rshrnt_h, uint16_t, uint8_t, H1_2, H1, do_urshr)
  DO_SHRNT(sve2_rshrnt_s, uint32_t, uint16_t, H1_4, H1_2, do_urshr)
 -DO_SHRNT(sve2_rshrnt_d, uint64_t, uint32_t,     , H1_4, do_urshr)
 +DO_SHRNT(sve2_rshrnt_d, uint64_t, uint32_t, H1_8, H1_4, do_urshr)
  #define DO_SQSHRUN_H(x, sh) do_sat_bhs((int64_t)(x) >> sh, 0, UINT8_MAX)
  #define DO_SQSHRUN_S(x, sh) do_sat_bhs((int64_t)(x) >> sh, 0, UINT16_MAX)
@@ -XXX,XX +XXX,XX @@ DO_SHRNB(sve2_sqshrunb_d, int64_t, uint32_t, DO_SQSHRUN_D)
  DO_SHRNT(sve2_sqshrunt_h, int16_t, uint8_t, H1_2, H1, DO_SQSHRUN_H)
  DO_SHRNT(sve2_sqshrunt_s, int32_t, uint16_t, H1_4, H1_2, DO_SQSHRUN_S)
 -DO_SHRNT(sve2_sqshrunt_d, int64_t, uint32_t,     , H1_4, DO_SQSHRUN_D)
 +DO_SHRNT(sve2_sqshrunt_d, int64_t, uint32_t, H1_8, H1_4, DO_SQSHRUN_D)
  #define DO_SQRSHRUN_H(x, sh) do_sat_bhs(do_srshr(x, sh), 0, UINT8_MAX)
  #define DO_SQRSHRUN_S(x, sh) do_sat_bhs(do_srshr(x, sh), 0, UINT16_MAX)
@@ -XXX,XX +XXX,XX @@ DO_SHRNB(sve2_sqrshrunb_d, int64_t, uint32_t, DO_SQRSHRUN_D)
  DO_SHRNT(sve2_sqrshrunt_h, int16_t, uint8_t, H1_2, H1, DO_SQRSHRUN_H)
  DO_SHRNT(sve2_sqrshrunt_s, int32_t, uint16_t, H1_4, H1_2, DO_SQRSHRUN_S)
 -DO_SHRNT(sve2_sqrshrunt_d, int64_t, uint32_t,     , H1_4, DO_SQRSHRUN_D)
 +DO_SHRNT(sve2_sqrshrunt_d, int64_t, uint32_t, H1_8, H1_4, DO_SQRSHRUN_D)
  #define DO_SQSHRN_H(x, sh) do_sat_bhs(x >> sh, INT8_MIN, INT8_MAX)
  #define DO_SQSHRN_S(x, sh) do_sat_bhs(x >> sh, INT16_MIN, INT16_MAX)
@@ -XXX,XX +XXX,XX @@ DO_SHRNB(sve2_sqshrnb_d, int64_t, uint32_t, DO_SQSHRN_D)
  DO_SHRNT(sve2_sqshrnt_h, int16_t, uint8_t, H1_2, H1, DO_SQSHRN_H)
  DO_SHRNT(sve2_sqshrnt_s, int32_t, uint16_t, H1_4, H1_2, DO_SQSHRN_S)
 -DO_SHRNT(sve2_sqshrnt_d, int64_t, uint32_t,     , H1_4, DO_SQSHRN_D)
 +DO_SHRNT(sve2_sqshrnt_d, int64_t, uint32_t, H1_8, H1_4, DO_SQSHRN_D)
  #define DO_SQRSHRN_H(x, sh) do_sat_bhs(do_srshr(x, sh), INT8_MIN, INT8_MAX)
  #define DO_SQRSHRN_S(x, sh) do_sat_bhs(do_srshr(x, sh), INT16_MIN, INT16_MAX)
@@ -XXX,XX +XXX,XX @@ DO_SHRNB(sve2_sqrshrnb_d, int64_t, uint32_t, DO_SQRSHRN_D)
  DO_SHRNT(sve2_sqrshrnt_h, int16_t, uint8_t, H1_2, H1, DO_SQRSHRN_H)
  DO_SHRNT(sve2_sqrshrnt_s, int32_t, uint16_t, H1_4, H1_2, DO_SQRSHRN_S)
 -DO_SHRNT(sve2_sqrshrnt_d, int64_t, uint32_t,     , H1_4, DO_SQRSHRN_D)
 +DO_SHRNT(sve2_sqrshrnt_d, int64_t, uint32_t, H1_8, H1_4, DO_SQRSHRN_D)
  #define DO_UQSHRN_H(x, sh) MIN(x >> sh, UINT8_MAX)
  #define DO_UQSHRN_S(x, sh) MIN(x >> sh, UINT16_MAX)
@@ -XXX,XX +XXX,XX @@ DO_SHRNB(sve2_uqshrnb_d, uint64_t, uint32_t, DO_UQSHRN_D)
  DO_SHRNT(sve2_uqshrnt_h, uint16_t, uint8_t, H1_2, H1, DO_UQSHRN_H)
  DO_SHRNT(sve2_uqshrnt_s, uint32_t, uint16_t, H1_4, H1_2, DO_UQSHRN_S)
 -DO_SHRNT(sve2_uqshrnt_d, uint64_t, uint32_t,     , H1_4, DO_UQSHRN_D)
 +DO_SHRNT(sve2_uqshrnt_d, uint64_t, uint32_t, H1_8, H1_4, DO_UQSHRN_D)
  #define DO_UQRSHRN_H(x, sh) MIN(do_urshr(x, sh), UINT8_MAX)
  #define DO_UQRSHRN_S(x, sh) MIN(do_urshr(x, sh), UINT16_MAX)
@@ -XXX,XX +XXX,XX @@ DO_SHRNB(sve2_uqrshrnb_d, uint64_t, uint32_t, DO_UQRSHRN_D)
  DO_SHRNT(sve2_uqrshrnt_h, uint16_t, uint8_t, H1_2, H1, DO_UQRSHRN_H)
  DO_SHRNT(sve2_uqrshrnt_s, uint32_t, uint16_t, H1_4, H1_2, DO_UQRSHRN_S)
 -DO_SHRNT(sve2_uqrshrnt_d, uint64_t, uint32_t,     , H1_4, DO_UQRSHRN_D)
 +DO_SHRNT(sve2_uqrshrnt_d, uint64_t, uint32_t, H1_8, H1_4, DO_UQRSHRN_D)
  #undef DO_SHRNB
  #undef DO_SHRNT
@@ -XXX,XX +XXX,XX @@ DO_BINOPNB(sve2_addhnb_d, uint64_t, uint32_t, 32, DO_ADDHN)
  DO_BINOPNT(sve2_addhnt_h, uint16_t, uint8_t, 8, H1_2, H1, DO_ADDHN)
  DO_BINOPNT(sve2_addhnt_s, uint32_t, uint16_t, 16, H1_4, H1_2, DO_ADDHN)
 -DO_BINOPNT(sve2_addhnt_d, uint64_t, uint32_t, 32,     , H1_4, DO_ADDHN)
 +DO_BINOPNT(sve2_addhnt_d, uint64_t, uint32_t, 32, H1_8, H1_4, DO_ADDHN)
  DO_BINOPNB(sve2_raddhnb_h, uint16_t, uint8_t, 8, DO_RADDHN)
  DO_BINOPNB(sve2_raddhnb_s, uint32_t, uint16_t, 16, DO_RADDHN)
@@ -XXX,XX +XXX,XX @@ DO_BINOPNB(sve2_raddhnb_d, uint64_t, uint32_t, 32, DO_RADDHN)
  DO_BINOPNT(sve2_raddhnt_h, uint16_t, uint8_t, 8, H1_2, H1, DO_RADDHN)
  DO_BINOPNT(sve2_raddhnt_s, uint32_t, uint16_t, 16, H1_4, H1_2, DO_RADDHN)
 -DO_BINOPNT(sve2_raddhnt_d, uint64_t, uint32_t, 32,     , H1_4, DO_RADDHN)
 +DO_BINOPNT(sve2_raddhnt_d, uint64_t, uint32_t, 32, H1_8, H1_4, DO_RADDHN)
  DO_BINOPNB(sve2_subhnb_h, uint16_t, uint8_t, 8, DO_SUBHN)
  DO_BINOPNB(sve2_subhnb_s, uint32_t, uint16_t, 16, DO_SUBHN)
@@ -XXX,XX +XXX,XX @@ DO_BINOPNB(sve2_subhnb_d, uint64_t, uint32_t, 32, DO_SUBHN)
  DO_BINOPNT(sve2_subhnt_h, uint16_t, uint8_t, 8, H1_2, H1, DO_SUBHN)
  DO_BINOPNT(sve2_subhnt_s, uint32_t, uint16_t, 16, H1_4, H1_2, DO_SUBHN)
 -DO_BINOPNT(sve2_subhnt_d, uint64_t, uint32_t, 32,     , H1_4, DO_SUBHN)
 +DO_BINOPNT(sve2_subhnt_d, uint64_t, uint32_t, 32, H1_8, H1_4, DO_SUBHN)
  DO_BINOPNB(sve2_rsubhnb_h, uint16_t, uint8_t, 8, DO_RSUBHN)
  DO_BINOPNB(sve2_rsubhnb_s, uint32_t, uint16_t, 16, DO_RSUBHN)
@@ -XXX,XX +XXX,XX @@ DO_BINOPNB(sve2_rsubhnb_d, uint64_t, uint32_t, 32, DO_RSUBHN)
  DO_BINOPNT(sve2_rsubhnt_h, uint16_t, uint8_t, 8, H1_2, H1, DO_RSUBHN)
  DO_BINOPNT(sve2_rsubhnt_s, uint32_t, uint16_t, 16, H1_4, H1_2, DO_RSUBHN)
 -DO_BINOPNT(sve2_rsubhnt_d, uint64_t, uint32_t, 32,     , H1_4, DO_RSUBHN)
 +DO_BINOPNT(sve2_rsubhnt_d, uint64_t, uint32_t, 32, H1_8, H1_4, DO_RSUBHN)
  #undef DO_RSUBHN
  #undef DO_SUBHN
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, uint64_t val, uint32_t desc) \
  DO_INSR(sve_insr_b, uint8_t, H1)
  DO_INSR(sve_insr_h, uint16_t, H1_2)
  DO_INSR(sve_insr_s, uint32_t, H1_4)
 -DO_INSR(sve_insr_d, uint64_t, )
 +DO_INSR(sve_insr_d, uint64_t, H1_8)
  #undef DO_INSR
@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_tbx_##SUFF)(void *vd, void *vn, void *vm, uint32_t desc) \
  DO_TB(b, uint8_t, H1)
  DO_TB(h, uint16_t, H2)
  DO_TB(s, uint32_t, H4)
 -DO_TB(d, uint64_t,   )
 +DO_TB(d, uint64_t, H8)
  #undef DO_TB
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, uint32_t desc)           \
  DO_UNPK(sve_sunpk_h, int16_t, int8_t, H2, H1)
  DO_UNPK(sve_sunpk_s, int32_t, int16_t, H4, H2)
 -DO_UNPK(sve_sunpk_d, int64_t, int32_t, , H4)
 +DO_UNPK(sve_sunpk_d, int64_t, int32_t, H8, H4)
  DO_UNPK(sve_uunpk_h, uint16_t, uint8_t, H2, H1)
  DO_UNPK(sve_uunpk_s, uint32_t, uint16_t, H4, H2)
 -DO_UNPK(sve_uunpk_d, uint64_t, uint32_t, , H4)
 +DO_UNPK(sve_uunpk_d, uint64_t, uint32_t, H8, H4)
  #undef DO_UNPK
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc)       \
  DO_ZIP(sve_zip_b, uint8_t, H1)
  DO_ZIP(sve_zip_h, uint16_t, H1_2)
  DO_ZIP(sve_zip_s, uint32_t, H1_4)
 -DO_ZIP(sve_zip_d, uint64_t, )
 +DO_ZIP(sve_zip_d, uint64_t, H1_8)
  DO_ZIP(sve2_zip_q, Int128, )
  #define DO_UZP(NAME, TYPE, H) \
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc)         \
  DO_UZP(sve_uzp_b, uint8_t, H1)
  DO_UZP(sve_uzp_h, uint16_t, H1_2)
  DO_UZP(sve_uzp_s, uint32_t, H1_4)
 -DO_UZP(sve_uzp_d, uint64_t, )
 +DO_UZP(sve_uzp_d, uint64_t, H1_8)
  DO_UZP(sve2_uzp_q, Int128, )
  #define DO_TRN(NAME, TYPE, H) \
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc)         \
  DO_TRN(sve_trn_b, uint8_t, H1)
  DO_TRN(sve_trn_h, uint16_t, H1_2)
  DO_TRN(sve_trn_s, uint32_t, H1_4)
 -DO_TRN(sve_trn_d, uint64_t, )
 +DO_TRN(sve_trn_d, uint64_t, H1_8)
  DO_TRN(sve2_trn_q, Int128, )
  #undef DO_ZIP
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(NAME)(void *vd, void *vn, void *vm, void *vg, uint32_t desc) \
  #define DO_CMP_PPZZ_S(NAME, TYPE, OP) \
      DO_CMP_PPZZ(NAME, TYPE, OP, H1_4, 0x1111111111111111ull)
  #define DO_CMP_PPZZ_D(NAME, TYPE, OP) \
 -    DO_CMP_PPZZ(NAME, TYPE, OP,     , 0x0101010101010101ull)
 +    DO_CMP_PPZZ(NAME, TYPE, OP, H1_8, 0x0101010101010101ull)
  DO_CMP_PPZZ_B(sve_cmpeq_ppzz_b, uint8_t,  ==)
  DO_CMP_PPZZ_H(sve_cmpeq_ppzz_h, uint16_t, ==)
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(NAME)(void *vd, void *vn, void *vg, uint32_t desc)   \
  #define DO_CMP_PPZI_S(NAME, TYPE, OP) \
      DO_CMP_PPZI(NAME, TYPE, OP, H1_4, 0x1111111111111111ull)
  #define DO_CMP_PPZI_D(NAME, TYPE, OP) \
 -    DO_CMP_PPZI(NAME, TYPE, OP,     , 0x0101010101010101ull)
 +    DO_CMP_PPZI(NAME, TYPE, OP, H1_8, 0x0101010101010101ull)
  DO_CMP_PPZI_B(sve_cmpeq_ppzi_b, uint8_t,  ==)
  DO_CMP_PPZI_H(sve_cmpeq_ppzi_h, uint16_t, ==)
@@ -XXX,XX +XXX,XX @@ uint64_t HELPER(NAME)(void *vn, void *vg, void *vs, uint32_t desc)    \
  DO_REDUCE(sve_faddv_h, float16, H1_2, add, float16_zero)
  DO_REDUCE(sve_faddv_s, float32, H1_4, add, float32_zero)
 -DO_REDUCE(sve_faddv_d, float64,     , add, float64_zero)
 +DO_REDUCE(sve_faddv_d, float64, H1_8, add, float64_zero)
  /* Identity is floatN_default_nan, without the function call.  */
  DO_REDUCE(sve_fminnmv_h, float16, H1_2, minnum, 0x7E00)
  DO_REDUCE(sve_fminnmv_s, float32, H1_4, minnum, 0x7FC00000)
 -DO_REDUCE(sve_fminnmv_d, float64,     , minnum, 0x7FF8000000000000ULL)
 +DO_REDUCE(sve_fminnmv_d, float64, H1_8, minnum, 0x7FF8000000000000ULL)
  DO_REDUCE(sve_fmaxnmv_h, float16, H1_2, maxnum, 0x7E00)
  DO_REDUCE(sve_fmaxnmv_s, float32, H1_4, maxnum, 0x7FC00000)
 -DO_REDUCE(sve_fmaxnmv_d, float64,     , maxnum, 0x7FF8000000000000ULL)
 +DO_REDUCE(sve_fmaxnmv_d, float64, H1_8, maxnum, 0x7FF8000000000000ULL)
  DO_REDUCE(sve_fminv_h, float16, H1_2, min, float16_infinity)
  DO_REDUCE(sve_fminv_s, float32, H1_4, min, float32_infinity)
 -DO_REDUCE(sve_fminv_d, float64,     , min, float64_infinity)
 +DO_REDUCE(sve_fminv_d, float64, H1_8, min, float64_infinity)
  DO_REDUCE(sve_fmaxv_h, float16, H1_2, max, float16_chs(float16_infinity))
  DO_REDUCE(sve_fmaxv_s, float32, H1_4, max, float32_chs(float32_infinity))
 -DO_REDUCE(sve_fmaxv_d, float64,     , max, float64_chs(float64_infinity))
 +DO_REDUCE(sve_fmaxv_d, float64, H1_8, max, float64_chs(float64_infinity))
  #undef DO_REDUCE
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg,       \
  DO_ZPZZ_FP(sve_fadd_h, uint16_t, H1_2, float16_add)
  DO_ZPZZ_FP(sve_fadd_s, uint32_t, H1_4, float32_add)
 -DO_ZPZZ_FP(sve_fadd_d, uint64_t,     , float64_add)
 +DO_ZPZZ_FP(sve_fadd_d, uint64_t, H1_8, float64_add)
  DO_ZPZZ_FP(sve_fsub_h, uint16_t, H1_2, float16_sub)
  DO_ZPZZ_FP(sve_fsub_s, uint32_t, H1_4, float32_sub)
 -DO_ZPZZ_FP(sve_fsub_d, uint64_t,     , float64_sub)
 +DO_ZPZZ_FP(sve_fsub_d, uint64_t, H1_8, float64_sub)
  DO_ZPZZ_FP(sve_fmul_h, uint16_t, H1_2, float16_mul)
  DO_ZPZZ_FP(sve_fmul_s, uint32_t, H1_4, float32_mul)
 -DO_ZPZZ_FP(sve_fmul_d, uint64_t,     , float64_mul)
 +DO_ZPZZ_FP(sve_fmul_d, uint64_t, H1_8, float64_mul)
  DO_ZPZZ_FP(sve_fdiv_h, uint16_t, H1_2, float16_div)
  DO_ZPZZ_FP(sve_fdiv_s, uint32_t, H1_4, float32_div)
 -DO_ZPZZ_FP(sve_fdiv_d, uint64_t,     , float64_div)
 +DO_ZPZZ_FP(sve_fdiv_d, uint64_t, H1_8, float64_div)
  DO_ZPZZ_FP(sve_fmin_h, uint16_t, H1_2, float16_min)
  DO_ZPZZ_FP(sve_fmin_s, uint32_t, H1_4, float32_min)
 -DO_ZPZZ_FP(sve_fmin_d, uint64_t,     , float64_min)
 +DO_ZPZZ_FP(sve_fmin_d, uint64_t, H1_8, float64_min)
  DO_ZPZZ_FP(sve_fmax_h, uint16_t, H1_2, float16_max)
  DO_ZPZZ_FP(sve_fmax_s, uint32_t, H1_4, float32_max)
 -DO_ZPZZ_FP(sve_fmax_d, uint64_t,     , float64_max)
 +DO_ZPZZ_FP(sve_fmax_d, uint64_t, H1_8, float64_max)
  DO_ZPZZ_FP(sve_fminnum_h, uint16_t, H1_2, float16_minnum)
  DO_ZPZZ_FP(sve_fminnum_s, uint32_t, H1_4, float32_minnum)
 -DO_ZPZZ_FP(sve_fminnum_d, uint64_t,     , float64_minnum)
 +DO_ZPZZ_FP(sve_fminnum_d, uint64_t, H1_8, float64_minnum)
  DO_ZPZZ_FP(sve_fmaxnum_h, uint16_t, H1_2, float16_maxnum)
  DO_ZPZZ_FP(sve_fmaxnum_s, uint32_t, H1_4, float32_maxnum)
 -DO_ZPZZ_FP(sve_fmaxnum_d, uint64_t,     , float64_maxnum)
 +DO_ZPZZ_FP(sve_fmaxnum_d, uint64_t, H1_8, float64_maxnum)
  static inline float16 abd_h(float16 a, float16 b, float_status *s)
  {
@@ -XXX,XX +XXX,XX @@ static inline float64 abd_d(float64 a, float64 b, float_status *s)
  DO_ZPZZ_FP(sve_fabd_h, uint16_t, H1_2, abd_h)
  DO_ZPZZ_FP(sve_fabd_s, uint32_t, H1_4, abd_s)
 -DO_ZPZZ_FP(sve_fabd_d, uint64_t,     , abd_d)
 +DO_ZPZZ_FP(sve_fabd_d, uint64_t, H1_8, abd_d)
  static inline float64 scalbn_d(float64 a, int64_t b, float_status *s)
  {
@@ -XXX,XX +XXX,XX @@ static inline float64 scalbn_d(float64 a, int64_t b, float_status *s)
  DO_ZPZZ_FP(sve_fscalbn_h, int16_t, H1_2, float16_scalbn)
  DO_ZPZZ_FP(sve_fscalbn_s, int32_t, H1_4, float32_scalbn)
 -DO_ZPZZ_FP(sve_fscalbn_d, int64_t,     , scalbn_d)
 +DO_ZPZZ_FP(sve_fscalbn_d, int64_t, H1_8, scalbn_d)
  DO_ZPZZ_FP(sve_fmulx_h, uint16_t, H1_2, helper_advsimd_mulxh)
  DO_ZPZZ_FP(sve_fmulx_s, uint32_t, H1_4, helper_vfp_mulxs)
 -DO_ZPZZ_FP(sve_fmulx_d, uint64_t,     , helper_vfp_mulxd)
 +DO_ZPZZ_FP(sve_fmulx_d, uint64_t, H1_8, helper_vfp_mulxd)
  #undef DO_ZPZZ_FP
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vg, uint64_t scalar,  \
  DO_ZPZS_FP(sve_fadds_h, float16, H1_2, float16_add)
  DO_ZPZS_FP(sve_fadds_s, float32, H1_4, float32_add)
 -DO_ZPZS_FP(sve_fadds_d, float64,     , float64_add)
 +DO_ZPZS_FP(sve_fadds_d, float64, H1_8, float64_add)
  DO_ZPZS_FP(sve_fsubs_h, float16, H1_2, float16_sub)
  DO_ZPZS_FP(sve_fsubs_s, float32, H1_4, float32_sub)
 -DO_ZPZS_FP(sve_fsubs_d, float64,     , float64_sub)
 +DO_ZPZS_FP(sve_fsubs_d, float64, H1_8, float64_sub)
  DO_ZPZS_FP(sve_fmuls_h, float16, H1_2, float16_mul)
  DO_ZPZS_FP(sve_fmuls_s, float32, H1_4, float32_mul)
 -DO_ZPZS_FP(sve_fmuls_d, float64,     , float64_mul)
 +DO_ZPZS_FP(sve_fmuls_d, float64, H1_8, float64_mul)
  static inline float16 subr_h(float16 a, float16 b, float_status *s)
  {
@@ -XXX,XX +XXX,XX @@ static inline float64 subr_d(float64 a, float64 b, float_status *s)
  DO_ZPZS_FP(sve_fsubrs_h, float16, H1_2, subr_h)
  DO_ZPZS_FP(sve_fsubrs_s, float32, H1_4, subr_s)
 -DO_ZPZS_FP(sve_fsubrs_d, float64,     , subr_d)
 +DO_ZPZS_FP(sve_fsubrs_d, float64, H1_8, subr_d)
  DO_ZPZS_FP(sve_fmaxnms_h, float16, H1_2, float16_maxnum)
  DO_ZPZS_FP(sve_fmaxnms_s, float32, H1_4, float32_maxnum)
 -DO_ZPZS_FP(sve_fmaxnms_d, float64,     , float64_maxnum)
 +DO_ZPZS_FP(sve_fmaxnms_d, float64, H1_8, float64_maxnum)
  DO_ZPZS_FP(sve_fminnms_h, float16, H1_2, float16_minnum)
  DO_ZPZS_FP(sve_fminnms_s, float32, H1_4, float32_minnum)
 -DO_ZPZS_FP(sve_fminnms_d, float64,     , float64_minnum)
 +DO_ZPZS_FP(sve_fminnms_d, float64, H1_8, float64_minnum)
  DO_ZPZS_FP(sve_fmaxs_h, float16, H1_2, float16_max)
  DO_ZPZS_FP(sve_fmaxs_s, float32, H1_4, float32_max)
 -DO_ZPZS_FP(sve_fmaxs_d, float64,     , float64_max)
 +DO_ZPZS_FP(sve_fmaxs_d, float64, H1_8, float64_max)
  DO_ZPZS_FP(sve_fmins_h, float16, H1_2, float16_min)
  DO_ZPZS_FP(sve_fmins_s, float32, H1_4, float32_min)
 -DO_ZPZS_FP(sve_fmins_d, float64,     , float64_min)
 +DO_ZPZS_FP(sve_fmins_d, float64, H1_8, float64_min)
  /* Fully general two-operand expander, controlled by a predicate,
   * With the extra float_status parameter.
@@ -XXX,XX +XXX,XX @@ static inline uint64_t vfp_float64_to_uint64_rtz(float64 f, float_status *s)
  DO_ZPZ_FP(sve_fcvt_sh, uint32_t, H1_4, sve_f32_to_f16)
  DO_ZPZ_FP(sve_fcvt_hs, uint32_t, H1_4, sve_f16_to_f32)
  DO_ZPZ_FP(sve_bfcvt,   uint32_t, H1_4, float32_to_bfloat16)
 -DO_ZPZ_FP(sve_fcvt_dh, uint64_t,     , sve_f64_to_f16)
 -DO_ZPZ_FP(sve_fcvt_hd, uint64_t,     , sve_f16_to_f64)
 -DO_ZPZ_FP(sve_fcvt_ds, uint64_t,     , float64_to_float32)
 -DO_ZPZ_FP(sve_fcvt_sd, uint64_t,     , float32_to_float64)
 +DO_ZPZ_FP(sve_fcvt_dh, uint64_t, H1_8, sve_f64_to_f16)
 +DO_ZPZ_FP(sve_fcvt_hd, uint64_t, H1_8, sve_f16_to_f64)
 +DO_ZPZ_FP(sve_fcvt_ds, uint64_t, H1_8, float64_to_float32)
 +DO_ZPZ_FP(sve_fcvt_sd, uint64_t, H1_8, float32_to_float64)
  DO_ZPZ_FP(sve_fcvtzs_hh, uint16_t, H1_2, vfp_float16_to_int16_rtz)
  DO_ZPZ_FP(sve_fcvtzs_hs, uint32_t, H1_4, helper_vfp_tosizh)
  DO_ZPZ_FP(sve_fcvtzs_ss, uint32_t, H1_4, helper_vfp_tosizs)
 -DO_ZPZ_FP(sve_fcvtzs_hd, uint64_t,     , vfp_float16_to_int64_rtz)
 -DO_ZPZ_FP(sve_fcvtzs_sd, uint64_t,     , vfp_float32_to_int64_rtz)
 -DO_ZPZ_FP(sve_fcvtzs_ds, uint64_t,     , helper_vfp_tosizd)
 -DO_ZPZ_FP(sve_fcvtzs_dd, uint64_t,     , vfp_float64_to_int64_rtz)
 +DO_ZPZ_FP(sve_fcvtzs_hd, uint64_t, H1_8, vfp_float16_to_int64_rtz)
 +DO_ZPZ_FP(sve_fcvtzs_sd, uint64_t, H1_8, vfp_float32_to_int64_rtz)
 +DO_ZPZ_FP(sve_fcvtzs_ds, uint64_t, H1_8, helper_vfp_tosizd)
 +DO_ZPZ_FP(sve_fcvtzs_dd, uint64_t, H1_8, vfp_float64_to_int64_rtz)
  DO_ZPZ_FP(sve_fcvtzu_hh, uint16_t, H1_2, vfp_float16_to_uint16_rtz)
  DO_ZPZ_FP(sve_fcvtzu_hs, uint32_t, H1_4, helper_vfp_touizh)
  DO_ZPZ_FP(sve_fcvtzu_ss, uint32_t, H1_4, helper_vfp_touizs)
 -DO_ZPZ_FP(sve_fcvtzu_hd, uint64_t,     , vfp_float16_to_uint64_rtz)
 -DO_ZPZ_FP(sve_fcvtzu_sd, uint64_t,     , vfp_float32_to_uint64_rtz)
 -DO_ZPZ_FP(sve_fcvtzu_ds, uint64_t,     , helper_vfp_touizd)
 -DO_ZPZ_FP(sve_fcvtzu_dd, uint64_t,     , vfp_float64_to_uint64_rtz)
 +DO_ZPZ_FP(sve_fcvtzu_hd, uint64_t, H1_8, vfp_float16_to_uint64_rtz)
 +DO_ZPZ_FP(sve_fcvtzu_sd, uint64_t, H1_8, vfp_float32_to_uint64_rtz)
 +DO_ZPZ_FP(sve_fcvtzu_ds, uint64_t, H1_8, helper_vfp_touizd)
 +DO_ZPZ_FP(sve_fcvtzu_dd, uint64_t, H1_8, vfp_float64_to_uint64_rtz)
  DO_ZPZ_FP(sve_frint_h, uint16_t, H1_2, helper_advsimd_rinth)
  DO_ZPZ_FP(sve_frint_s, uint32_t, H1_4, helper_rints)
 -DO_ZPZ_FP(sve_frint_d, uint64_t,     , helper_rintd)
 +DO_ZPZ_FP(sve_frint_d, uint64_t, H1_8, helper_rintd)
  DO_ZPZ_FP(sve_frintx_h, uint16_t, H1_2, float16_round_to_int)
  DO_ZPZ_FP(sve_frintx_s, uint32_t, H1_4, float32_round_to_int)
 -DO_ZPZ_FP(sve_frintx_d, uint64_t,     , float64_round_to_int)
 +DO_ZPZ_FP(sve_frintx_d, uint64_t, H1_8, float64_round_to_int)
  DO_ZPZ_FP(sve_frecpx_h, uint16_t, H1_2, helper_frecpx_f16)
  DO_ZPZ_FP(sve_frecpx_s, uint32_t, H1_4, helper_frecpx_f32)
 -DO_ZPZ_FP(sve_frecpx_d, uint64_t,     , helper_frecpx_f64)
 +DO_ZPZ_FP(sve_frecpx_d, uint64_t, H1_8, helper_frecpx_f64)
  DO_ZPZ_FP(sve_fsqrt_h, uint16_t, H1_2, float16_sqrt)
  DO_ZPZ_FP(sve_fsqrt_s, uint32_t, H1_4, float32_sqrt)
 -DO_ZPZ_FP(sve_fsqrt_d, uint64_t,     , float64_sqrt)
 +DO_ZPZ_FP(sve_fsqrt_d, uint64_t, H1_8, float64_sqrt)
  DO_ZPZ_FP(sve_scvt_hh, uint16_t, H1_2, int16_to_float16)
  DO_ZPZ_FP(sve_scvt_sh, uint32_t, H1_4, int32_to_float16)
  DO_ZPZ_FP(sve_scvt_ss, uint32_t, H1_4, int32_to_float32)
 -DO_ZPZ_FP(sve_scvt_sd, uint64_t,     , int32_to_float64)
 -DO_ZPZ_FP(sve_scvt_dh, uint64_t,     , int64_to_float16)
 -DO_ZPZ_FP(sve_scvt_ds, uint64_t,     , int64_to_float32)
 -DO_ZPZ_FP(sve_scvt_dd, uint64_t,     , int64_to_float64)
 +DO_ZPZ_FP(sve_scvt_sd, uint64_t, H1_8, int32_to_float64)
 +DO_ZPZ_FP(sve_scvt_dh, uint64_t, H1_8, int64_to_float16)
 +DO_ZPZ_FP(sve_scvt_ds, uint64_t, H1_8, int64_to_float32)
 +DO_ZPZ_FP(sve_scvt_dd, uint64_t, H1_8, int64_to_float64)
  DO_ZPZ_FP(sve_ucvt_hh, uint16_t, H1_2, uint16_to_float16)
  DO_ZPZ_FP(sve_ucvt_sh, uint32_t, H1_4, uint32_to_float16)
  DO_ZPZ_FP(sve_ucvt_ss, uint32_t, H1_4, uint32_to_float32)
 -DO_ZPZ_FP(sve_ucvt_sd, uint64_t,     , uint32_to_float64)
 -DO_ZPZ_FP(sve_ucvt_dh, uint64_t,     , uint64_to_float16)
 -DO_ZPZ_FP(sve_ucvt_ds, uint64_t,     , uint64_to_float32)
 -DO_ZPZ_FP(sve_ucvt_dd, uint64_t,     , uint64_to_float64)
 +DO_ZPZ_FP(sve_ucvt_sd, uint64_t, H1_8, uint32_to_float64)
 +DO_ZPZ_FP(sve_ucvt_dh, uint64_t, H1_8, uint64_to_float16)
 +DO_ZPZ_FP(sve_ucvt_ds, uint64_t, H1_8, uint64_to_float32)
 +DO_ZPZ_FP(sve_ucvt_dd, uint64_t, H1_8, uint64_to_float64)
  static int16_t do_float16_logb_as_int(float16 a, float_status *s)
  {
@@ -XXX,XX +XXX,XX @@ static int64_t do_float64_logb_as_int(float64 a, float_status *s)
  DO_ZPZ_FP(flogb_h, float16, H1_2, do_float16_logb_as_int)
  DO_ZPZ_FP(flogb_s, float32, H1_4, do_float32_logb_as_int)
 -DO_ZPZ_FP(flogb_d, float64,     , do_float64_logb_as_int)
 +DO_ZPZ_FP(flogb_d, float64, H1_8, do_float64_logb_as_int)
  #undef DO_ZPZ_FP
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg,               \
  #define DO_FPCMP_PPZZ_S(NAME, OP) \
      DO_FPCMP_PPZZ(NAME##_s, float32, H1_4, OP)
  #define DO_FPCMP_PPZZ_D(NAME, OP) \
 -    DO_FPCMP_PPZZ(NAME##_d, float64,     , OP)
 +    DO_FPCMP_PPZZ(NAME##_d, float64, H1_8, OP)
  #define DO_FPCMP_PPZZ_ALL(NAME, OP) \
      DO_FPCMP_PPZZ_H(NAME, OP)   \
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vg,            \
  #define DO_FPCMP_PPZ0_S(NAME, OP) \
      DO_FPCMP_PPZ0(NAME##_s, float32, H1_4, OP)
  #define DO_FPCMP_PPZ0_D(NAME, OP) \
 -    DO_FPCMP_PPZ0(NAME##_d, float64,     , OP)
 +    DO_FPCMP_PPZ0(NAME##_d, float64, H1_8, OP)
  #define DO_FPCMP_PPZ0_ALL(NAME, OP) \
      DO_FPCMP_PPZ0_H(NAME, OP)   \
@@ -XXX,XX +XXX,XX @@ DO_LD_PRIM_1(ld1bhu, H1_2, uint16_t, uint8_t)
  DO_LD_PRIM_1(ld1bhs, H1_2, uint16_t,  int8_t)
  DO_LD_PRIM_1(ld1bsu, H1_4, uint32_t, uint8_t)
  DO_LD_PRIM_1(ld1bss, H1_4, uint32_t,  int8_t)
 -DO_LD_PRIM_1(ld1bdu,     , uint64_t, uint8_t)
 -DO_LD_PRIM_1(ld1bds,     , uint64_t,  int8_t)
 +DO_LD_PRIM_1(ld1bdu, H1_8, uint64_t, uint8_t)
 +DO_LD_PRIM_1(ld1bds, H1_8, uint64_t,  int8_t)
  #define DO_ST_PRIM_1(NAME, H, TE, TM)                   \
      DO_ST_HOST(st1##NAME, H, TE, TM, stb_p)             \
@@ -XXX,XX +XXX,XX @@ DO_LD_PRIM_1(ld1bds,     , uint64_t,  int8_t)
  DO_ST_PRIM_1(bb,   H1,  uint8_t, uint8_t)
  DO_ST_PRIM_1(bh, H1_2, uint16_t, uint8_t)
  DO_ST_PRIM_1(bs, H1_4, uint32_t, uint8_t)
 -DO_ST_PRIM_1(bd,     , uint64_t, uint8_t)
 +DO_ST_PRIM_1(bd, H1_8, uint64_t, uint8_t)
  #define DO_LD_PRIM_2(NAME, H, TE, TM, LD) \
      DO_LD_HOST(ld1##NAME##_be, H, TE, TM, LD##_be_p)    \
@@ -XXX,XX +XXX,XX @@ DO_ST_PRIM_1(bd,     , uint64_t, uint8_t)
  DO_LD_PRIM_2(hh,  H1_2, uint16_t, uint16_t, lduw)
  DO_LD_PRIM_2(hsu, H1_4, uint32_t, uint16_t, lduw)
  DO_LD_PRIM_2(hss, H1_4, uint32_t,  int16_t, lduw)
 -DO_LD_PRIM_2(hdu,     , uint64_t, uint16_t, lduw)
 -DO_LD_PRIM_2(hds,     , uint64_t,  int16_t, lduw)
 +DO_LD_PRIM_2(hdu, H1_8, uint64_t, uint16_t, lduw)
 +DO_LD_PRIM_2(hds, H1_8, uint64_t,  int16_t, lduw)
  DO_ST_PRIM_2(hh, H1_2, uint16_t, uint16_t, stw)
  DO_ST_PRIM_2(hs, H1_4, uint32_t, uint16_t, stw)
 -DO_ST_PRIM_2(hd,     , uint64_t, uint16_t, stw)
 +DO_ST_PRIM_2(hd, H1_8, uint64_t, uint16_t, stw)
  DO_LD_PRIM_2(ss,  H1_4, uint32_t, uint32_t, ldl)
 -DO_LD_PRIM_2(sdu,     , uint64_t, uint32_t, ldl)
 -DO_LD_PRIM_2(sds,     , uint64_t,  int32_t, ldl)
 +DO_LD_PRIM_2(sdu, H1_8, uint64_t, uint32_t, ldl)
 +DO_LD_PRIM_2(sds, H1_8, uint64_t,  int32_t, ldl)
  DO_ST_PRIM_2(ss, H1_4, uint32_t, uint32_t, stl)
 -DO_ST_PRIM_2(sd,     , uint64_t, uint32_t, stl)
 +DO_ST_PRIM_2(sd, H1_8, uint64_t, uint32_t, stl)
 -DO_LD_PRIM_2(dd,     , uint64_t, uint64_t, ldq)
 -DO_ST_PRIM_2(dd,     , uint64_t, uint64_t, stq)
 +DO_LD_PRIM_2(dd, H1_8, uint64_t, uint64_t, ldq)
 +DO_ST_PRIM_2(dd, H1_8, uint64_t, uint64_t, stq)
  #undef DO_LD_TLB
  #undef DO_ST_TLB
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vg, void *status, uint32_t desc)  \
  DO_FCVTNT(sve_bfcvtnt,    uint32_t, uint16_t, H1_4, H1_2, float32_to_bfloat16)
  DO_FCVTNT(sve2_fcvtnt_sh, uint32_t, uint16_t, H1_4, H1_2, sve_f32_to_f16)
 -DO_FCVTNT(sve2_fcvtnt_ds, uint64_t, uint32_t,     , H1_4, float64_to_float32)
 +DO_FCVTNT(sve2_fcvtnt_ds, uint64_t, uint32_t, H1_8, H1_4, float64_to_float32)
  #define DO_FCVTLT(NAME, TYPEW, TYPEN, HW, HN, OP)                             \
  void HELPER(NAME)(void *vd, void *vn, void *vg, void *status, uint32_t desc)  \
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vg, void *status, uint32_t desc)  \
  }
  DO_FCVTLT(sve2_fcvtlt_hs, uint32_t, uint16_t, H1_4, H1_2, sve_f16_to_f32)
 -DO_FCVTLT(sve2_fcvtlt_sd, uint64_t, uint32_t,     , H1_4, float32_to_float64)
 +DO_FCVTLT(sve2_fcvtlt_sd, uint64_t, uint32_t, H1_8, H1_4, float32_to_float64)
  #undef DO_FCVTLT
  #undef DO_FCVTNT
 diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vec_helper.c
 +++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ DO_DOT_IDX(gvec_sdot_idx_b, int32_t, int8_t, int8_t, H4)
  DO_DOT_IDX(gvec_udot_idx_b, uint32_t, uint8_t, uint8_t, H4)
  DO_DOT_IDX(gvec_sudot_idx_b, int32_t, int8_t, uint8_t, H4)
  DO_DOT_IDX(gvec_usdot_idx_b, int32_t, uint8_t, int8_t, H4)
 -DO_DOT_IDX(gvec_sdot_idx_h, int64_t, int16_t, int16_t, )
 -DO_DOT_IDX(gvec_udot_idx_h, uint64_t, uint16_t, uint16_t, )
 +DO_DOT_IDX(gvec_sdot_idx_h, int64_t, int16_t, int16_t, H8)
 +DO_DOT_IDX(gvec_udot_idx_h, uint64_t, uint16_t, uint16_t, H8)
  void HELPER(gvec_fcaddh)(void *vd, void *vn, void *vm,
                           void *vfpst, uint32_t desc)
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \
  DO_MUL_IDX(gvec_mul_idx_h, uint16_t, H2)
  DO_MUL_IDX(gvec_mul_idx_s, uint32_t, H4)
 -DO_MUL_IDX(gvec_mul_idx_d, uint64_t, )
 +DO_MUL_IDX(gvec_mul_idx_d, uint64_t, H8)
  #undef DO_MUL_IDX
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc)   \
  DO_MLA_IDX(gvec_mla_idx_h, uint16_t, +, H2)
  DO_MLA_IDX(gvec_mla_idx_s, uint32_t, +, H4)
 -DO_MLA_IDX(gvec_mla_idx_d, uint64_t, +,   )
 +DO_MLA_IDX(gvec_mla_idx_d, uint64_t, +, H8)
  DO_MLA_IDX(gvec_mls_idx_h, uint16_t, -, H2)
  DO_MLA_IDX(gvec_mls_idx_s, uint32_t, -, H4)
 -DO_MLA_IDX(gvec_mls_idx_d, uint64_t, -,   )
 +DO_MLA_IDX(gvec_mls_idx_d, uint64_t, -, H8)
  #undef DO_MLA_IDX
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *stat, uint32_t desc) \
  DO_FMUL_IDX(gvec_fmul_idx_h, nop, float16, H2)
  DO_FMUL_IDX(gvec_fmul_idx_s, nop, float32, H4)
 -DO_FMUL_IDX(gvec_fmul_idx_d, nop, float64, )
 +DO_FMUL_IDX(gvec_fmul_idx_d, nop, float64, H8)
  /*
   * Non-fused multiply-accumulate operations, for Neon. NB that unlike
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *va,                  \
  DO_FMLA_IDX(gvec_fmla_idx_h, float16, H2)
  DO_FMLA_IDX(gvec_fmla_idx_s, float32, H4)
 -DO_FMLA_IDX(gvec_fmla_idx_d, float64, )
 +DO_FMLA_IDX(gvec_fmla_idx_d, float64, H8)
  #undef DO_FMLA_IDX
 --
-.20.1
+.25.1
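
Note on the H1_8/H8 substitutions above: a minimal sketch of why a named macro can replace an empty macro argument without changing the generated code. It assumes H1_8 and H8 expand to their argument unchanged, which this hunk does not show. DEMO_ID, DEMO_SUM, demo_sum_empty and demo_sum_named are hypothetical names for illustration, not QEMU definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical identity adjustment, standing in for H1_8/H8 (assumption). */
#define DEMO_ID(x) (x)

/*
 * Expander in the style of the DO_* macros: H adjusts the byte offset of
 * each element. With an empty H, "H(i)" expands to "(i)"; with DEMO_ID
 * it also expands to "(i)", so both helpers behave identically.
 */
#define DEMO_SUM(NAME, TYPE, H)                                   \
static TYPE NAME(const void *v, size_t bytes)                     \
{                                                                 \
    TYPE sum = 0;                                                 \
    assert(bytes % sizeof(TYPE) == 0);                            \
    for (size_t i = 0; i < bytes; i += sizeof(TYPE)) {            \
        sum += *(const TYPE *)((const char *)v + H(i));           \
    }                                                             \
    return sum;                                                   \
}

DEMO_SUM(demo_sum_empty, uint64_t,        )
DEMO_SUM(demo_sum_named, uint64_t, DEMO_ID)
```

Calling both helpers on the same uint64_t array gives the same result; the named form only makes the intent visible at the call site.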

-[PULL 13/28] hw/arm: gsj add i2c comments
+[PULL 25/30] riscv: re-randomize rng-seed on reboot
-From: Patrick Venture <venture@google.com>
+From: "Jason A. Donenfeld" <Jason@zx2c4.com>
-Adds comments to the board init to identify missing i2c devices.
+When the system reboots, the rng-seed that the FDT has should be
 re-randomized, so that the new boot gets a new seed. Since the FDT is in
 the ROM region at this point, we add a hook right after the ROM has been
 added, so that we have a pointer to that copy of the FDT.
-Signed-off-by: Patrick Venture <venture@google.com>
+Cc: Palmer Dabbelt <palmer@dabbelt.com>
-Reviewed-by: Hao Wu <wuhaotsh@google.com>
+Cc: Alistair Francis <alistair.francis@wdc.com>
-Reviewed-by: Joel Stanley <joel@jms.id.au>
+Cc: Bin Meng <bin.meng@windriver.com>
-Message-id: 20210608202522.2677850-2-venture@google.com
+Cc: qemu-riscv@nongnu.org
 Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
 Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
 Message-id: 20221025004327.568476-6-Jason@zx2c4.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/arm/npcm7xx_boards.c | 16 +++++++++++++++-
+ hw/riscv/boot.c | 3 +++
-file changed, 15 insertions(+), 1 deletion(-)
+file changed, 3 insertions(+)
-diff --git a/hw/arm/npcm7xx_boards.c b/hw/arm/npcm7xx_boards.c
+diff --git a/hw/riscv/boot.c b/hw/riscv/boot.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/npcm7xx_boards.c
+--- a/hw/riscv/boot.c
-+++ b/hw/arm/npcm7xx_boards.c
++++ b/hw/riscv/boot.c
-@@ -XXX,XX +XXX,XX @@ static void quanta_gsj_i2c_init(NPCM7xxState *soc)
+@@ -XXX,XX +XXX,XX @@
-     at24c_eeprom_init(soc, 9, 0x55, 8192);
+ #include "sysemu/device_tree.h"
-     at24c_eeprom_init(soc, 10, 0x55, 8192);
+ #include "sysemu/qtest.h"
+ #include "sysemu/kvm.h"
--    /* TODO: Add additional i2c devices. */
++#include "sysemu/reset.h"
-+    /*
-+     * i2c-11:
+ #include <libfdt.h>
-+     * - power-brick@36: delta,dps800
-+     * - hotswap@15: ti,lm5066i
+@@ -XXX,XX +XXX,XX @@ uint64_t riscv_load_fdt(hwaddr dram_base, uint64_t mem_size, void *fdt)
-+     */
-+
+     rom_add_blob_fixed_as("fdt", fdt, fdtsize, fdt_addr,
-+    /*
+                           &address_space_memory);
-+     * i2c-12:
++    qemu_register_reset_nosnapshotload(qemu_fdt_randomize_seeds,
-+     * - ucd90160@6b
++                        rom_ptr_for_as(&address_space_memory, fdt_addr, fdtsize));
-+     */
-+
+     return fdt_addr;
 +    /*
 +     * i2c-15:
 +     * - pca9548@75
 +     */
  }
- static void quanta_gsj_fan_init(NPCM7xxMachine *machine, NPCM7xxState *soc)
 --
-.20.1
+.25.1
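
Note on the reset hook used above: a self-contained model of the idea behind registering a handler that runs on reboot but is skipped when a snapshot is loaded. This is only a sketch under stated assumptions. Every demo_* and Demo* name is hypothetical, and the registry below is not QEMU's implementation of qemu_register_reset_nosnapshotload().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void DemoResetHandler(void *opaque);

typedef enum {
    DEMO_RESET_REBOOT,
    DEMO_RESET_SNAPSHOT_LOAD,
} DemoResetReason;

typedef struct {
    DemoResetHandler *func;
    void *opaque;
    bool skip_on_snapshot_load;
} DemoResetEntry;

#define DEMO_MAX_HANDLERS 8
static DemoResetEntry demo_handlers[DEMO_MAX_HANDLERS];
static size_t demo_handler_count;

static void demo_register(DemoResetHandler *func, void *opaque, bool skip)
{
    assert(demo_handler_count < DEMO_MAX_HANDLERS);
    demo_handlers[demo_handler_count++] = (DemoResetEntry){ func, opaque, skip };
}

/* Handler that runs on every reset, whatever the reason. */
static void demo_register_reset(DemoResetHandler *func, void *opaque)
{
    demo_register(func, opaque, false);
}

/* Handler that is skipped when the reset is part of a snapshot load. */
static void demo_register_reset_nosnapshotload(DemoResetHandler *func,
                                               void *opaque)
{
    demo_register(func, opaque, true);
}

/* Run the registered handlers in registration order. */
static void demo_system_reset(DemoResetReason reason)
{
    for (size_t i = 0; i < demo_handler_count; i++) {
        DemoResetEntry *e = &demo_handlers[i];
        if (reason == DEMO_RESET_SNAPSHOT_LOAD && e->skip_on_snapshot_load) {
            continue;
        }
        e->func(e->opaque);
    }
}

/* Example handler: counts how many times it was invoked. */
static void demo_bump(void *opaque)
{
    ++*(int *)opaque;
}
```

In this model a seed-randomizing handler registered with demo_register_reset_nosnapshotload() fires on a reboot and stays quiet on a snapshot load, so a restored snapshot keeps the seed it was saved with.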

-[PULL 07/28] hw/arm: quanta-gbs-bmc add i2c comments
+[PULL 26/30] m68k/virt: do not re-randomize RNG seed on snapshot load
-From: Patrick Venture <venture@google.com>
+From: "Jason A. Donenfeld" <Jason@zx2c4.com>
-Add a comment and i2c method that describes the board layout.
+Snapshot loading is supposed to be deterministic, so we shouldn't
 re-randomize the various seeds used.
-Tested: firmware booted to userspace.
+Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
-Signed-off-by: Patrick Venture <venture@google.com>
+Message-id: 20221025004327.568476-7-Jason@zx2c4.com
-Reviewed-by: Brandon Kim <brandonkim@google.com>
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Hao Wu <wuhaotsh@google.com>
 Message-id: 20210608193605.2611114-3-venture@google.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/arm/npcm7xx_boards.c | 60 +++++++++++++++++++++++++++++++++++++++++
+ hw/m68k/virt.c | 20 +++++++++++---------
-file changed, 60 insertions(+)
+file changed, 11 insertions(+), 9 deletions(-)
-diff --git a/hw/arm/npcm7xx_boards.c b/hw/arm/npcm7xx_boards.c
+diff --git a/hw/m68k/virt.c b/hw/m68k/virt.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/npcm7xx_boards.c
+--- a/hw/m68k/virt.c
-+++ b/hw/arm/npcm7xx_boards.c
++++ b/hw/m68k/virt.c
-@@ -XXX,XX +XXX,XX @@ static void quanta_gsj_fan_init(NPCM7xxMachine *machine, NPCM7xxState *soc)
+@@ -XXX,XX +XXX,XX @@ typedef struct {
-     npcm7xx_connect_pwm_fan(soc, &splitter[2], 0x05, 1);
+     M68kCPU *cpu;
      hwaddr initial_pc;
      hwaddr initial_stack;
 -    struct bi_record *rng_seed;
  } ResetInfo;
  static void main_cpu_reset(void *opaque)
@@ -XXX,XX +XXX,XX @@ static void main_cpu_reset(void *opaque)
      M68kCPU *cpu = reset_info->cpu;
      CPUState *cs = CPU(cpu);
 -    if (reset_info->rng_seed) {
 -        qemu_guest_getrandom_nofail((void *)reset_info->rng_seed->data + 2,
 -            be16_to_cpu(*(uint16_t *)reset_info->rng_seed->data));
 -    }
 -
      cpu_reset(cs);
      cpu->env.aregs[7] = reset_info->initial_stack;
      cpu->env.pc = reset_info->initial_pc;
  }
-+static void quanta_gbs_i2c_init(NPCM7xxState *soc)
++static void rerandomize_rng_seed(void *opaque)
 +{
-+    /*
++    struct bi_record *rng_seed = opaque;
-+     * i2c-0:
++    qemu_guest_getrandom_nofail((void *)rng_seed->data + 2,
-+     *     pca9546@71
++                                be16_to_cpu(*(uint16_t *)rng_seed->data));
 +     *
 +     * i2c-1:
 +     *     pca9535@24
 +     *     pca9535@20
 +     *     pca9535@21
 +     *     pca9535@22
 +     *     pca9535@23
 +     *     pca9535@25
 +     *     pca9535@26
 +     *
 +     * i2c-2:
 +     *     sbtsi@4c
 +     *
 +     * i2c-5:
 +     *     atmel,24c64@50 mb_fru
 +     *     pca9546@71
 +     *         - channel 0: max31725@54
 +     *         - channel 1: max31725@55
 +     *         - channel 2: max31725@5d
 +     *                      atmel,24c64@51 fan_fru
 +     *         - channel 3: atmel,24c64@52 hsbp_fru
 +     *
 +     * i2c-6:
 +     *     pca9545@73
 +     *
 +     * i2c-7:
 +     *     pca9545@72
 +     *
 +     * i2c-8:
 +     *     adi,adm1272@10
 +     *
 +     * i2c-9:
 +     *     pca9546@71
 +     *         - channel 0: isil,isl68137@60
 +     *         - channel 1: isil,isl68137@61
 +     *         - channel 2: isil,isl68137@63
 +     *         - channel 3: isil,isl68137@45
 +     *
 +     * i2c-10:
 +     *     pca9545@71
 +     *
 +     * i2c-11:
 +     *     pca9545@76
 +     *
 +     * i2c-12:
 +     *     maxim,max34451@4e
 +     *     isil,isl68137@5d
 +     *     isil,isl68137@5e
 +     *
 +     * i2c-14:
 +     *     pca9545@70
 +     */
 +}
 +
- static void npcm750_evb_init(MachineState *machine)
+ static void virt_init(MachineState *machine)
  {
-     NPCM7xxState *soc;
+     M68kCPU *cpu = NULL;
-@@ -XXX,XX +XXX,XX @@ static void quanta_gbs_init(MachineState *machine)
+@@ -XXX,XX +XXX,XX @@ static void virt_init(MachineState *machine)
-     npcm7xx_connect_flash(&soc->fiu[0], 0, "mx66u51235f",
+         BOOTINFO0(param_ptr, BI_LAST);
-                           drive_get(IF_MTD, 0, 0));
+         rom_add_blob_fixed_as("bootinfo", param_blob, param_ptr - param_blob,
+                               parameters_base, cs->as);
-+    quanta_gbs_i2c_init(soc);
+-        reset_info->rng_seed = rom_ptr_for_as(cs->as, parameters_base,
-     npcm7xx_load_kernel(machine, soc);
+-                                              param_ptr - param_blob) +
 -                               (param_rng_seed - param_blob);
 +        qemu_register_reset_nosnapshotload(rerandomize_rng_seed,
 +                            rom_ptr_for_as(cs->as, parameters_base,
 +                                           param_ptr - param_blob) +
 +                            (param_rng_seed - param_blob));
          g_free(param_blob);
      }
  }
 --
-.20.1
+.25.1
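
Note on rerandomize_rng_seed() above: the handler reads a big-endian 16-bit length from the first two bytes of the record data and refills exactly that many bytes after it. A minimal sketch of that layout follows. The demo_* names and the capacity argument are hypothetical, and rand() merely stands in for a real random source (it is not suitable for seeding a guest RNG).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Decode a big-endian 16-bit value regardless of host byte order. */
static uint16_t demo_read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * data layout: [len_hi][len_lo][len bytes of seed]...
 * Refill only the seed bytes; the length prefix and anything after the
 * seed are left untouched. Returns the number of bytes rewritten.
 */
static size_t demo_rerandomize(uint8_t *data, size_t capacity)
{
    assert(capacity >= 2);
    size_t len = demo_read_be16(data);
    assert(2 + len <= capacity);

    for (size_t i = 0; i < len; i++) {
        data[2 + i] = (uint8_t)(rand() & 0xff);   /* placeholder RNG */
    }
    return len;
}
```

Because the length travels with the record, the handler needs only a pointer to the record itself, which is why the patch can pass it directly as the opaque argument.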

-[PULL 06/28] hw/arm: add quanta-gbs-bmc machine
+[PULL 27/30] m68k/q800: do not re-randomize RNG seed on snapshot load
-From: Patrick Venture <venture@google.com>
+From: "Jason A. Donenfeld" <Jason@zx2c4.com>
-Adds initial quanta-gbs-bmc machine support.
+Snapshot loading is supposed to be deterministic, so we shouldn't
 re-randomize the various seeds used.
-Tested: Boots to userspace.
+Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
-Signed-off-by: Patrick Venture <venture@google.com>
+Message-id: 20221025004327.568476-8-Jason@zx2c4.com
 Reviewed-by: Brandon Kim <brandonkim@google.com>
 Reviewed-by: Hao Wu <wuhaotsh@google.com>
 Message-id: 20210608193605.2611114-2-venture@google.com
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/arm/npcm7xx_boards.c | 33 +++++++++++++++++++++++++++++++++
+ hw/m68k/q800.c | 33 +++++++++++++--------------------
-file changed, 33 insertions(+)
+file changed, 13 insertions(+), 20 deletions(-)
-diff --git a/hw/arm/npcm7xx_boards.c b/hw/arm/npcm7xx_boards.c
+diff --git a/hw/m68k/q800.c b/hw/m68k/q800.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/npcm7xx_boards.c
+--- a/hw/m68k/q800.c
-+++ b/hw/arm/npcm7xx_boards.c
++++ b/hw/m68k/q800.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static const TypeInfo glue_info = {
+     },
- #define NPCM750_EVB_POWER_ON_STRAPS 0x00001ff7
+ };
- #define QUANTA_GSJ_POWER_ON_STRAPS 0x00001fff
-+#define QUANTA_GBS_POWER_ON_STRAPS 0x000017ff
+-typedef struct {
+-    M68kCPU *cpu;
- static const char npcm7xx_default_bootrom[] = "npcm7xx_bootrom.bin";
+-    struct bi_record *rng_seed;
+-} ResetInfo;
-@@ -XXX,XX +XXX,XX @@ static void quanta_gsj_init(MachineState *machine)
+-
-     npcm7xx_load_kernel(machine, soc);
+ static void main_cpu_reset(void *opaque)
  {
 -    ResetInfo *reset_info = opaque;
 -    M68kCPU *cpu = reset_info->cpu;
 +    M68kCPU *cpu = opaque;
      CPUState *cs = CPU(cpu);
 -    if (reset_info->rng_seed) {
 -        qemu_guest_getrandom_nofail((void *)reset_info->rng_seed->data + 2,
 -            be16_to_cpu(*(uint16_t *)reset_info->rng_seed->data));
 -    }
 -
      cpu_reset(cs);
      cpu->env.aregs[7] = ldl_phys(cs->as, 0);
      cpu->env.pc = ldl_phys(cs->as, 4);
  }
-+static void quanta_gbs_init(MachineState *machine)
++static void rerandomize_rng_seed(void *opaque)
 +{
-+    NPCM7xxState *soc;
++    struct bi_record *rng_seed = opaque;
-+
++    qemu_guest_getrandom_nofail((void *)rng_seed->data + 2,
-+    soc = npcm7xx_create_soc(machine, QUANTA_GBS_POWER_ON_STRAPS);
++                                be16_to_cpu(*(uint16_t *)rng_seed->data));
 +    npcm7xx_connect_dram(soc, machine->ram);
 +    qdev_realize(DEVICE(soc), NULL, &error_fatal);
 +
 +    npcm7xx_load_bootrom(machine, soc);
 +
 +    npcm7xx_connect_flash(&soc->fiu[0], 0, "mx66u51235f",
 +                          drive_get(IF_MTD, 0, 0));
 +
 +    npcm7xx_load_kernel(machine, soc);
 +}
 +
- static void npcm7xx_set_soc_type(NPCM7xxMachineClass *nmc, const char *type)
+ static uint8_t fake_mac_rom[] = {
- {
+, 0, 0, 0, 0, 0, 0, 0, 0, 0,
-     NPCM7xxClass *sc = NPCM7XX_CLASS(object_class_by_name(type));
-@@ -XXX,XX +XXX,XX @@ static void gsj_machine_class_init(ObjectClass *oc, void *data)
+@@ -XXX,XX +XXX,XX @@ static void q800_init(MachineState *machine)
-     mc->default_ram_size = 512 * MiB;
+     NubusBus *nubus;
- };
+     DeviceState *glue;
+     DriveInfo *dinfo;
-+static void gbs_bmc_machine_class_init(ObjectClass *oc, void *data)
+-    ResetInfo *reset_info;
-+{
+     uint8_t rng_seed[32];
-+    NPCM7xxMachineClass *nmc = NPCM7XX_MACHINE_CLASS(oc);
-+    MachineClass *mc = MACHINE_CLASS(oc);
+     linux_boot = (kernel_filename != NULL);
-+
+@@ -XXX,XX +XXX,XX @@ static void q800_init(MachineState *machine)
-+    npcm7xx_set_soc_type(nmc, TYPE_NPCM730);
+         exit(1);
-+
+     }
-+    mc->desc = "Quanta GBS (Cortex-A9)";
-+    mc->init = quanta_gbs_init;
+-    reset_info = g_new0(ResetInfo, 1);
-+    mc->default_ram_size = 1 * GiB;
+-
-+}
+     /* init CPUs */
-+
+     cpu = M68K_CPU(cpu_create(machine->cpu_type));
- static const TypeInfo npcm7xx_machine_types[] = {
+-    reset_info->cpu = cpu;
-     {
+-    qemu_register_reset(main_cpu_reset, reset_info);
-         .name           = TYPE_NPCM7XX_MACHINE,
++    qemu_register_reset(main_cpu_reset, cpu);
-@@ -XXX,XX +XXX,XX @@ static const TypeInfo npcm7xx_machine_types[] = {
-         .name           = MACHINE_TYPE_NAME("quanta-gsj"),
+     /* RAM */
-         .parent         = TYPE_NPCM7XX_MACHINE,
+     memory_region_add_subregion(get_system_memory(), 0, machine->ram);
-         .class_init     = gsj_machine_class_init,
+@@ -XXX,XX +XXX,XX @@ static void q800_init(MachineState *machine)
-+    }, {
+         BOOTINFO0(param_ptr, BI_LAST);
-+        .name           = MACHINE_TYPE_NAME("quanta-gbs-bmc"),
+         rom_add_blob_fixed_as("bootinfo", param_blob, param_ptr - param_blob,
-+        .parent         = TYPE_NPCM7XX_MACHINE,
+                               parameters_base, cs->as);
-+        .class_init     = gbs_bmc_machine_class_init,
+-        reset_info->rng_seed = rom_ptr_for_as(cs->as, parameters_base,
-     },
+-                                              param_ptr - param_blob) +
- };
+-                               (param_rng_seed - param_blob);
++        qemu_register_reset_nosnapshotload(rerandomize_rng_seed,
 +                            rom_ptr_for_as(cs->as, parameters_base,
 +                                           param_ptr - param_blob) +
 +                            (param_rng_seed - param_blob));
          g_free(param_blob);
      } else {
          uint8_t *ptr;
 --
-.20.1
+.25.1

-[PULL 11/28] target/arm: Use acpi_ghes_present() to see if we report ACPI memory errors
+[PULL 28/30] mips/boston: re-randomize rng-seed on reboot
-The virt_is_acpi_enabled() function is specific to the virt board, as
+From: "Jason A. Donenfeld" <Jason@zx2c4.com>
 is the check for its 'ras' property.  Use the new acpi_ghes_present()
 function to check whether we should report memory errors via
 acpi_ghes_record_errors().
-This avoids a link error if QEMU was built without support for the
+When the system reboots, the rng-seed that the FDT has should be
-virt board, and provides a mechanism that can be used by any future
+re-randomized, so that the new boot gets a new seed. Since the FDT is in
-board models that want to add ACPI memory error reporting support
+the ROM region at this point, we add a hook right after the ROM has been
-(they only need to call acpi_ghes_add_fw_cfg()).
+added, so that we have a pointer to that copy of the FDT.
+Cc: Aleksandar Rikalo <aleksandar.rikalo@syrmia.com>
+Cc: Paul Burton <paulburton@kernel.org>
+Cc: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
+Message-id: 20221025004327.568476-9-Jason@zx2c4.com
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Dongjiu Geng <gengdongjiu1@gmail.com>
-Message-id: 20210603171259.27962-4-peter.maydell@linaro.org
 ---
- target/arm/kvm64.c | 6 +-----
+ hw/mips/boston.c | 3 +++
-file changed, 1 insertion(+), 5 deletions(-)
+file changed, 3 insertions(+)
-diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
+diff --git a/hw/mips/boston.c b/hw/mips/boston.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/kvm64.c
+--- a/hw/mips/boston.c
-+++ b/target/arm/kvm64.c
++++ b/hw/mips/boston.c
-@@ -XXX,XX +XXX,XX @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
+@@ -XXX,XX +XXX,XX @@
- {
+ #include "sysemu/sysemu.h"
-     ram_addr_t ram_addr;
+ #include "sysemu/qtest.h"
-     hwaddr paddr;
+ #include "sysemu/runstate.h"
--    Object *obj = qdev_get_machine();
++#include "sysemu/reset.h"
--    VirtMachineState *vms = VIRT_MACHINE(obj);
--    bool acpi_enabled = virt_is_acpi_enabled(vms);
+ #include <libfdt.h>
+ #include "qom/object.h"
-     assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
+@@ -XXX,XX +XXX,XX @@ static void boston_mach_init(MachineState *machine)
+             /* Calculate real fdt size after filter */
--    if (acpi_enabled && addr &&
+             dt_size = fdt_totalsize(dtb_load_data);
--            object_property_get_bool(obj, "ras", NULL)) {
+             rom_add_blob_fixed("dtb", dtb_load_data, dt_size, dtb_paddr);
-+    if (acpi_ghes_present() && addr) {
++            qemu_register_reset_nosnapshotload(qemu_fdt_randomize_seeds,
-         ram_addr = qemu_ram_addr_from_host(addr);
++                                rom_ptr(dtb_paddr, dt_size));
-         if (ram_addr != RAM_ADDR_INVALID &&
+         } else {
-             kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
+             /* Try to load file as FIT */
              fit_err = load_fit(&boston_fit_loader, machine->kernel_filename, s);
 --
-.20.1
+.25.1

-[PULL 09/28] hw/acpi: Provide stub version of acpi_ghes_record_errors()
+[PULL 29/30] openrisc: re-randomize rng-seed on reboot
-Generic code in target/arm wants to call acpi_ghes_record_errors();
+From: "Jason A. Donenfeld" <Jason@zx2c4.com>
 provide a stub version so that we don't fail to link when
 CONFIG_ACPI_APEI is not set. This requires us to add a new
 ghes-stub.c file to contain it and the meson.build mechanics
 to use it when appropriate.
+When the system reboots, the rng-seed that the FDT has should be
+re-randomized, so that the new boot gets a new seed. Since the FDT is in
+the ROM region at this point, we add a hook right after the ROM has been
+added, so that we have a pointer to that copy of the FDT.
+Cc: Stafford Horne <shorne@gmail.com>
+Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
+Message-id: 20221025004327.568476-11-Jason@zx2c4.com
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Dongjiu Geng <gengdongjiu1@gmail.com>
-Message-id: 20210603171259.27962-2-peter.maydell@linaro.org
 ---
- hw/acpi/ghes-stub.c | 17 +++++++++++++++++
+ hw/openrisc/boot.c | 3 +++
- hw/acpi/meson.build |  6 +++---
+file changed, 3 insertions(+)
 files changed, 20 insertions(+), 3 deletions(-)
  create mode 100644 hw/acpi/ghes-stub.c
-diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c
+diff --git a/hw/openrisc/boot.c b/hw/openrisc/boot.c
-new file mode 100644
+index XXXXXXX..XXXXXXX 100644
-index XXXXXXX..XXXXXXX
+--- a/hw/openrisc/boot.c
---- /dev/null
++++ b/hw/openrisc/boot.c
 +++ b/hw/acpi/ghes-stub.c
 @@ -XXX,XX +XXX,XX @@
-+/*
+ #include "hw/openrisc/boot.h"
-+ * Support for generating APEI tables and recording CPER for Guests:
+ #include "sysemu/device_tree.h"
-+ * stub functions.
+ #include "sysemu/qtest.h"
-+ *
++#include "sysemu/reset.h"
-+ * Copyright (c) 2021 Linaro, Ltd
-+ *
+ #include <libfdt.h>
-+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
-+ * See the COPYING file in the top-level directory.
+@@ -XXX,XX +XXX,XX @@ uint32_t openrisc_load_fdt(void *fdt, hwaddr load_start,
-+ */
-+
+     rom_add_blob_fixed_as("fdt", fdt, fdtsize, fdt_addr,
-+#include "qemu/osdep.h"
+                           &address_space_memory);
-+#include "hw/acpi/ghes.h"
++    qemu_register_reset_nosnapshotload(qemu_fdt_randomize_seeds,
-+
++                        rom_ptr_for_as(&address_space_memory, fdt_addr, fdtsize));
-+int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
-+{
+     return fdt_addr;
-+    return -1;
+ }
 +}
 diff --git a/hw/acpi/meson.build b/hw/acpi/meson.build
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/acpi/meson.build
 +++ b/hw/acpi/meson.build
@@ -XXX,XX +XXX,XX @@ acpi_ss.add(when: 'CONFIG_ACPI_PCI', if_true: files('pci.c'))
  acpi_ss.add(when: 'CONFIG_ACPI_VMGENID', if_true: files('vmgenid.c'))
  acpi_ss.add(when: 'CONFIG_ACPI_HW_REDUCED', if_true: files('generic_event_device.c'))
  acpi_ss.add(when: 'CONFIG_ACPI_HMAT', if_true: files('hmat.c'))
 -acpi_ss.add(when: 'CONFIG_ACPI_APEI', if_true: files('ghes.c'))
 +acpi_ss.add(when: 'CONFIG_ACPI_APEI', if_true: files('ghes.c'), if_false:('ghes-stub.c'))
  acpi_ss.add(when: 'CONFIG_ACPI_X86', if_true: files('core.c', 'piix4.c', 'pcihp.c'), if_false: files('acpi-stub.c'))
  acpi_ss.add(when: 'CONFIG_ACPI_X86_ICH', if_true: files('ich9.c', 'tco.c'))
  acpi_ss.add(when: 'CONFIG_IPMI', if_true: files('ipmi.c'), if_false: files('ipmi-stub.c'))
  acpi_ss.add(when: 'CONFIG_PC', if_false: files('acpi-x86-stub.c'))
  acpi_ss.add(when: 'CONFIG_TPM', if_true: files('tpm.c'))
 -softmmu_ss.add(when: 'CONFIG_ACPI', if_false: files('acpi-stub.c', 'aml-build-stub.c'))
 +softmmu_ss.add(when: 'CONFIG_ACPI', if_false: files('acpi-stub.c', 'aml-build-stub.c', 'ghes-stub.c'))
  softmmu_ss.add_all(when: 'CONFIG_ACPI', if_true: acpi_ss)
  softmmu_ss.add(when: 'CONFIG_ALL', if_true: files('acpi-stub.c', 'aml-build-stub.c',
 -                                                  'acpi-x86-stub.c', 'ipmi-stub.c'))
 +                                                  'acpi-x86-stub.c', 'ipmi-stub.c', 'ghes-stub.c'))
 --
-2.20.1
+2.25.1

-[PULL 08/28] hw/intc/armv7m_nvic: Remove stale comment
+[PULL 30/30] rx: re-randomize rng-seed on reboot
-In commit da6d674e509f0939b we split the NVIC code out from the GIC.
+From: "Jason A. Donenfeld" <Jason@zx2c4.com>
 This allowed us to specify the NVIC's default value for the num-irq
 property (64) in the usual way in its property list, and we deleted
 the previous hack where we updated the value in the state struct in
 the instance init function.  Remove a stale comment about that hack
 which we forgot to delete at that time.
+When the system reboots, the rng-seed that the FDT has should be
+re-randomized, so that the new boot gets a new seed. Since the FDT is in
+the ROM region at this point, we add a hook right after the ROM has been
+added, so that we have a pointer to that copy of the FDT.
+Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
+Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
+Message-id: 20221025004327.568476-12-Jason@zx2c4.com
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210614161243.14211-1-peter.maydell@linaro.org
 ---
- hw/intc/armv7m_nvic.c | 6 ------
+ hw/rx/rx-gdbsim.c | 3 +++
- 1 file changed, 6 deletions(-)
+ 1 file changed, 3 insertions(+)
-diff --git a/hw/intc/armv7m_nvic.c b/hw/intc/armv7m_nvic.c
+diff --git a/hw/rx/rx-gdbsim.c b/hw/rx/rx-gdbsim.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/intc/armv7m_nvic.c
+--- a/hw/rx/rx-gdbsim.c
-+++ b/hw/intc/armv7m_nvic.c
++++ b/hw/rx/rx-gdbsim.c
-@@ -XXX,XX +XXX,XX @@ static void armv7m_nvic_realize(DeviceState *dev, Error **errp)
+@@ -XXX,XX +XXX,XX @@
+ #include "hw/rx/rx62n.h"
- static void armv7m_nvic_instance_init(Object *obj)
+ #include "sysemu/qtest.h"
- {
+ #include "sysemu/device_tree.h"
--    /* We have a different default value for the num-irq property
++#include "sysemu/reset.h"
--     * than our superclass. This function runs after qdev init
+ #include "hw/boards.h"
--     * has set the defaults from the Property array and before
+ #include "qom/object.h"
--     * any user-specified property setting, so just modify the
--     * value in the GICState struct.
+@@ -XXX,XX +XXX,XX @@ static void rx_gdbsim_init(MachineState *machine)
--     */
+             dtb_offset = ROUND_DOWN(machine->ram_size - dtb_size, 16);
-     DeviceState *dev = DEVICE(obj);
+             rom_add_blob_fixed("dtb", dtb, dtb_size,
-     NVICState *nvic = NVIC(obj);
+                                SDRAM_BASE + dtb_offset);
-     SysBusDevice *sbd = SYS_BUS_DEVICE(obj);
++            qemu_register_reset_nosnapshotload(qemu_fdt_randomize_seeds,
 +                                rom_ptr(SDRAM_BASE + dtb_offset, dtb_size));
              /* Set dtb address to R1 */
              RX_CPU(first_cpu)->env.regs[1] = SDRAM_BASE + dtb_offset;
          }
 --
-2.20.1
+2.25.1

The following changes since commit 1ea06abceec61b6f3ab33dadb0510b6e09fb61e2:

Merge remote-tracking branch 'remotes/berrange-gitlab/tags/misc-fixes-pull-request' into staging (2021-06-14 15:59:13 +0100)

are available in the Git repository at:

https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20210615

for you to fetch changes up to c611c956c7fdce651e30687b1f5d19b4cab78b6a:

include/qemu/int128.h: Add function to create Int128 from int64_t (2021-06-15 16:18:50 +0100)

----------------------------------------------------------------
target-arm queue:
 * hw/intc/arm_gicv3_cpuif: Tolerate spurious EOIR writes
 * handle some UNALLOCATED decode cases correctly rather
   than asserting
 * hw: virt: consider hw_compat_6_0
 * hw/arm: add quanta-gbs-bmc machine
 * hw/intc/armv7m_nvic: Remove stale comment
 * arm, acpi: Remove dependency on presence of 'virt' board
 * target/arm: Fix mte page crossing test
 * hw/arm: quanta-q71l add pca954x muxes
 * target/arm: First few parts of MVE support

----------------------------------------------------------------
Heinrich Schuchardt (1):
      hw: virt: consider hw_compat_6_0

Jean-Philippe Brucker (1):
      hw/intc/arm_gicv3_cpuif: Tolerate spurious EOIR writes

Patrick Venture (5):
      hw/arm: add quanta-gbs-bmc machine
      hw/arm: quanta-gbs-bmc add i2c comments
      hw/arm: gsj add i2c comments
      hw/arm: gsj add pca9548
      hw/arm: quanta-q71l add pca954x muxes

Peter Maydell (17):
      hw/intc/armv7m_nvic: Remove stale comment
      hw/acpi: Provide stub version of acpi_ghes_record_errors()
      hw/acpi: Provide function acpi_ghes_present()
      target/arm: Use acpi_ghes_present() to see if we report ACPI memory errors
      target/arm: Provide and use H8 and H1_8 macros
      target/arm: Enable FPSCR.QC bit for MVE
      target/arm: Handle VPR semantics in existing code
      target/arm: Add handling for PSR.ECI/ICI
      target/arm: Let vfp_access_check() handle late NOCP checks
      target/arm: Implement MVE LCTP
      target/arm: Implement MVE WLSTP insn
      target/arm: Implement MVE DLSTP
      target/arm: Implement MVE LETP insn
      target/arm: Add framework for MVE decode
      target/arm: Move expand_pred_b() data to vec_helper.c
      bitops.h: Provide hswap32(), hswap64(), wswap64() swapping operations
      include/qemu/int128.h: Add function to create Int128 from int64_t

Richard Henderson (4):
      target/arm: Diagnose UNALLOCATED in disas_simd_two_reg_misc_fp16
      target/arm: Remove fprintf from disas_simd_mod_imm
      target/arm: Diagnose UNALLOCATED in disas_simd_three_reg_same_fp16
      target/arm: Fix mte page crossing test

From: Jean-Philippe Brucker <jean-philippe@linaro.org>

Commit 382c7160d1cd ("hw/intc/arm_gicv3_cpuif: Fix EOIR write access
check logic") added an assert_not_reached() if the guest writes the EOIR
register while no interrupt is active.

It turns out some software does this: EDK2, in
GicV3ExitBootServicesEvent(), unconditionally writes EOIR for all
interrupts that it manages. This now causes QEMU to abort when running
UEFI on a VM with GICv3. Although it is UNPREDICTABLE behavior and EDK2
does need fixing, the punishment seems a little harsh, especially since
icc_eoir_write() already tolerates writes of nonexistent interrupt
numbers. Display a guest error and tolerate spurious EOIR writes.

Fixes: 382c7160d1cd ("hw/intc/arm_gicv3_cpuif: Fix EOIR write access check logic")
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20210604130352.1887560-1-jean-philippe@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/intc/arm_gicv3_cpuif.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/intc/arm_gicv3_cpuif.c
+++ b/hw/intc/arm_gicv3_cpuif.c
@@ -XXX,XX +XXX,XX @@
 
 #include "qemu/osdep.h"
 #include "qemu/bitops.h"
+#include "qemu/log.h"
 #include "qemu/main-loop.h"
 #include "trace.h"
 #include "gicv3_internal.h"
@@ -XXX,XX +XXX,XX @@ static void icc_eoir_write(CPUARMState *env, const ARMCPRegInfo *ri,
         }
         break;
     default:
-        g_assert_not_reached();
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "%s: IRQ %d isn't active\n", __func__, irq);
+        return;
     }
 
     icc_drop_prio(cs, grp);
-- 
2.20.1
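
Note on the change above: it replaces an assertion with a logged guest
error and an early return, so a misbehaving guest can no longer abort
the emulator. A minimal standalone sketch of that pattern follows. All
names in it (irq_state, guest_error_count, log_guest_error, eoir_write)
are hypothetical stand-ins for illustration and are not QEMU types or
APIs; the real code reports through qemu_log_mask(LOG_GUEST_ERROR, ...)
as shown in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration; not QEMU's real types. */
enum irq_state { IRQ_INACTIVE, IRQ_ACTIVE };

/* Counts how many spurious writes were reported. */
static int guest_error_count;

/* Stand-in for a guest-error log call: report the problem, do not abort. */
static void log_guest_error(const char *func, int irq)
{
    guest_error_count++;
    fprintf(stderr, "%s: IRQ %d isn't active\n", func, irq);
}

/*
 * Handle an end-of-interrupt write. Returns true if the interrupt was
 * active and has been deactivated, false if the write was spurious and
 * was ignored after logging.
 */
static bool eoir_write(enum irq_state *state, int irq)
{
    assert(state != NULL);   /* programming error in the caller, not the guest */

    switch (*state) {
    case IRQ_ACTIVE:
        *state = IRQ_INACTIVE;
        return true;
    default:
        /* Guest-triggerable condition: log it and carry on. */
        log_guest_error(__func__, irq);
        return false;
    }
}
```

The sketch keeps an assertion only for a condition the guest cannot
trigger (a NULL pointer from the caller); anything the guest can
provoke is logged and tolerated.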

From: Richard Henderson <richard.henderson@linaro.org>

This fprintf+assert has been in place since the beginning.
It is prior to the fp_access_check, so we're still good to
raise SIGILL here.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/381
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20210604183506.916654-2-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-a64.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

From: Richard Henderson <richard.henderson@linaro.org>

This fprintf+assert has been in place since the beginning.
It is after the fp_access_check, so we need to move the
check up.  Fold that into the pairwise filter.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20210604183506.916654-4-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-a64.c | 82 +++++++++++++++++++++++---------------
 1 file changed, 50 insertions(+), 32 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same(DisasContext *s, uint32_t insn)
  */
 static void disas_simd_three_reg_same_fp16(DisasContext *s, uint32_t insn)
 {
-    int opcode, fpopcode;
-    int is_q, u, a, rm, rn, rd;
-    int datasize, elements;
-    int pass;
+    int opcode = extract32(insn, 11, 3);
+    int u = extract32(insn, 29, 1);
+    int a = extract32(insn, 23, 1);
+    int is_q = extract32(insn, 30, 1);
+    int rm = extract32(insn, 16, 5);
+    int rn = extract32(insn, 5, 5);
+    int rd = extract32(insn, 0, 5);
+    /*
+     * For these floating point ops, the U, a and opcode bits
+     * together indicate the operation.
+     */
+    int fpopcode = opcode | (a << 3) | (u << 4);
+    int datasize = is_q ? 128 : 64;
+    int elements = datasize / 16;
+    bool pairwise;
     TCGv_ptr fpst;
-    bool pairwise = false;
+    int pass;
+
+    switch (fpopcode) {
+    case 0x0: /* FMAXNM */
+    case 0x1: /* FMLA */
+    case 0x2: /* FADD */
+    case 0x3: /* FMULX */
+    case 0x4: /* FCMEQ */
+    case 0x6: /* FMAX */
+    case 0x7: /* FRECPS */
+    case 0x8: /* FMINNM */
+    case 0x9: /* FMLS */
+    case 0xa: /* FSUB */
+    case 0xe: /* FMIN */
+    case 0xf: /* FRSQRTS */
+    case 0x13: /* FMUL */
+    case 0x14: /* FCMGE */
+    case 0x15: /* FACGE */
+    case 0x17: /* FDIV */
+    case 0x1a: /* FABD */
+    case 0x1c: /* FCMGT */
+    case 0x1d: /* FACGT */
+        pairwise = false;
+        break;
+    case 0x10: /* FMAXNMP */
+    case 0x12: /* FADDP */
+    case 0x16: /* FMAXP */
+    case 0x18: /* FMINNMP */
+    case 0x1e: /* FMINP */
+        pairwise = true;
+        break;
+    default:
+        unallocated_encoding(s);
+        return;
+    }
 
     if (!dc_isar_feature(aa64_fp16, s)) {
         unallocated_encoding(s);
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_fp16(DisasContext *s, uint32_t insn)
         return;
     }
 
-    /* For these floating point ops, the U, a and opcode bits
-     * together indicate the operation.
-     */
-    opcode = extract32(insn, 11, 3);
-    u = extract32(insn, 29, 1);
-    a = extract32(insn, 23, 1);
-    is_q = extract32(insn, 30, 1);
-    rm = extract32(insn, 16, 5);
-    rn = extract32(insn, 5, 5);
-    rd = extract32(insn, 0, 5);
-
-    fpopcode = opcode | (a << 3) |  (u << 4);
-    datasize = is_q ? 128 : 64;
-    elements = datasize / 16;
-
-    switch (fpopcode) {
-    case 0x10: /* FMAXNMP */
-    case 0x12: /* FADDP */
-    case 0x16: /* FMAXP */
-    case 0x18: /* FMINNMP */
-    case 0x1e: /* FMINP */
-        pairwise = true;
-        break;
-    }
-
     fpst = fpstatus_ptr(FPST_FPCR_F16);
 
     if (pairwise) {
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_fp16(DisasContext *s, uint32_t insn)
                 gen_helper_advsimd_acgt_f16(tcg_res, tcg_op1, tcg_op2, fpst);
                 break;
             default:
-                fprintf(stderr, "%s: insn 0x%04x, fpop 0x%2x @ 0x%" PRIx64 "\n",
-                        __func__, insn, fpopcode, s->pc_curr);
                 g_assert_not_reached();
             }
 
-- 
2.20.1
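
Note on the change above: the decoder now builds fpopcode from the U,
a and opcode bits first, and rejects unallocated values before any
access check runs. The following standalone sketch shows the same bit
composition and classification, using the bit positions and case lists
from the diff. The helper names (extract_bits, fp16_fpopcode,
fp16_classify) are hypothetical and exist only for this illustration;
QEMU has its own extract32().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical local helper: extract 'length' bits starting at bit 'start'. */
static uint32_t extract_bits(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (~0U >> (32 - length));
}

/* Combine opcode (bits 13:11), a (bit 23) and U (bit 29) into a 5-bit index. */
static int fp16_fpopcode(uint32_t insn)
{
    int opcode = (int)extract_bits(insn, 11, 3);
    int a = (int)extract_bits(insn, 23, 1);
    int u = (int)extract_bits(insn, 29, 1);

    return opcode | (a << 3) | (u << 4);
}

/*
 * Classify an fpopcode. Returns false for an unallocated encoding;
 * otherwise returns true and sets *pairwise.
 */
static bool fp16_classify(int fpopcode, bool *pairwise)
{
    switch (fpopcode) {
    case 0x0: case 0x1: case 0x2: case 0x3: case 0x4:
    case 0x6: case 0x7: case 0x8: case 0x9: case 0xa:
    case 0xe: case 0xf: case 0x13: case 0x14: case 0x15:
    case 0x17: case 0x1a: case 0x1c: case 0x1d:
        *pairwise = false;
        return true;
    case 0x10: case 0x12: case 0x16: case 0x18: case 0x1e:
        *pairwise = true;
        return true;
    default:
        return false;   /* unallocated: caller should reject the insn */
    }
}
```

For example, an instruction word with U = 1, a = 0 and opcode = 2
gives fpopcode 0x12 (FADDP, pairwise), while U = 0, a = 0, opcode = 5
gives 0x5, which is in neither list and is therefore unallocated.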

From: Patrick Venture <venture@google.com>

Adds initial quanta-gbs-bmc machine support.

Tested: Boots to userspace.
Signed-off-by: Patrick Venture <venture@google.com>
Reviewed-by: Brandon Kim <brandonkim@google.com>
Reviewed-by: Hao Wu <wuhaotsh@google.com>
Message-id: 20210608193605.2611114-2-venture@google.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/npcm7xx_boards.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/hw/arm/npcm7xx_boards.c b/hw/arm/npcm7xx_boards.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/npcm7xx_boards.c
+++ b/hw/arm/npcm7xx_boards.c
@@ -XXX,XX +XXX,XX @@
 
 #define NPCM750_EVB_POWER_ON_STRAPS 0x00001ff7
 #define QUANTA_GSJ_POWER_ON_STRAPS 0x00001fff
+#define QUANTA_GBS_POWER_ON_STRAPS 0x000017ff
 
 static const char npcm7xx_default_bootrom[] = "npcm7xx_bootrom.bin";
 
@@ -XXX,XX +XXX,XX @@ static void quanta_gsj_init(MachineState *machine)
     npcm7xx_load_kernel(machine, soc);
 }
 
+static void quanta_gbs_init(MachineState *machine)
+{
+    NPCM7xxState *soc;
+
+    soc = npcm7xx_create_soc(machine, QUANTA_GBS_POWER_ON_STRAPS);
+    npcm7xx_connect_dram(soc, machine->ram);
+    qdev_realize(DEVICE(soc), NULL, &error_fatal);
+
+    npcm7xx_load_bootrom(machine, soc);
+
+    npcm7xx_connect_flash(&soc->fiu[0], 0, "mx66u51235f",
+                          drive_get(IF_MTD, 0, 0));
+
+    npcm7xx_load_kernel(machine, soc);
+}
+
 static void npcm7xx_set_soc_type(NPCM7xxMachineClass *nmc, const char *type)
 {
     NPCM7xxClass *sc = NPCM7XX_CLASS(object_class_by_name(type));
@@ -XXX,XX +XXX,XX @@ static void gsj_machine_class_init(ObjectClass *oc, void *data)
     mc->default_ram_size = 512 * MiB;
 };
 
+static void gbs_bmc_machine_class_init(ObjectClass *oc, void *data)
+{
+    NPCM7xxMachineClass *nmc = NPCM7XX_MACHINE_CLASS(oc);
+    MachineClass *mc = MACHINE_CLASS(oc);
+
+    npcm7xx_set_soc_type(nmc, TYPE_NPCM730);
+
+    mc->desc = "Quanta GBS (Cortex-A9)";
+    mc->init = quanta_gbs_init;
+    mc->default_ram_size = 1 * GiB;
+}
+
 static const TypeInfo npcm7xx_machine_types[] = {
     {
         .name           = TYPE_NPCM7XX_MACHINE,
@@ -XXX,XX +XXX,XX @@ static const TypeInfo npcm7xx_machine_types[] = {
         .name           = MACHINE_TYPE_NAME("quanta-gsj"),
         .parent         = TYPE_NPCM7XX_MACHINE,
         .class_init     = gsj_machine_class_init,
+    }, {
+        .name           = MACHINE_TYPE_NAME("quanta-gbs-bmc"),
+        .parent         = TYPE_NPCM7XX_MACHINE,
+        .class_init     = gbs_bmc_machine_class_init,
     },
 };
 
-- 
2.20.1

From: Patrick Venture <venture@google.com>

Add a comment and i2c method that describes the board layout.

Tested: firmware booted to userspace.
Signed-off-by: Patrick Venture <venture@google.com>
Reviewed-by: Brandon Kim <brandonkim@google.com>
Reviewed-by: Hao Wu <wuhaotsh@google.com>
Message-id: 20210608193605.2611114-3-venture@google.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/npcm7xx_boards.c | 60 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/hw/arm/npcm7xx_boards.c b/hw/arm/npcm7xx_boards.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/npcm7xx_boards.c
+++ b/hw/arm/npcm7xx_boards.c
@@ -XXX,XX +XXX,XX @@ static void quanta_gsj_fan_init(NPCM7xxMachine *machine, NPCM7xxState *soc)
     npcm7xx_connect_pwm_fan(soc, &splitter[2], 0x05, 1);
 }
 
+static void quanta_gbs_i2c_init(NPCM7xxState *soc)
+{
+    /*
+     * i2c-0:
+     *     pca9546@71
+     *
+     * i2c-1:
+     *     pca9535@24
+     *     pca9535@20
+     *     pca9535@21
+     *     pca9535@22
+     *     pca9535@23
+     *     pca9535@25
+     *     pca9535@26
+     *
+     * i2c-2:
+     *     sbtsi@4c
+     *
+     * i2c-5:
+     *     atmel,24c64@50 mb_fru
+     *     pca9546@71
+     *         - channel 0: max31725@54
+     *         - channel 1: max31725@55
+     *         - channel 2: max31725@5d
+     *                      atmel,24c64@51 fan_fru
+     *         - channel 3: atmel,24c64@52 hsbp_fru
+     *
+     * i2c-6:
+     *     pca9545@73
+     *
+     * i2c-7:
+     *     pca9545@72
+     *
+     * i2c-8:
+     *     adi,adm1272@10
+     *
+     * i2c-9:
+     *     pca9546@71
+     *         - channel 0: isil,isl68137@60
+     *         - channel 1: isil,isl68137@61
+     *         - channel 2: isil,isl68137@63
+     *         - channel 3: isil,isl68137@45
+     *
+     * i2c-10:
+     *     pca9545@71
+     *
+     * i2c-11:
+     *     pca9545@76
+     *
+     * i2c-12:
+     *     maxim,max34451@4e
+     *     isil,isl68137@5d
+     *     isil,isl68137@5e
+     *
+     * i2c-14:
+     *     pca9545@70
+     */
+}
+
 static void npcm750_evb_init(MachineState *machine)
 {
     NPCM7xxState *soc;
@@ -XXX,XX +XXX,XX @@ static void quanta_gbs_init(MachineState *machine)
     npcm7xx_connect_flash(&soc->fiu[0], 0, "mx66u51235f",
                           drive_get(IF_MTD, 0, 0));
 
+    quanta_gbs_i2c_init(soc);
     npcm7xx_load_kernel(machine, soc);
 }
 
-- 
2.20.1

In commit da6d674e509f0939b we split the NVIC code out from the GIC.
This allowed us to specify the NVIC's default value for the num-irq
property (64) in the usual way in its property list, and we deleted
the previous hack where we updated the value in the state struct in
the instance init function.  Remove a stale comment about that hack
which we forgot to delete at that time.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210614161243.14211-1-peter.maydell@linaro.org
---
 hw/intc/armv7m_nvic.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/hw/intc/armv7m_nvic.c b/hw/intc/armv7m_nvic.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/intc/armv7m_nvic.c
+++ b/hw/intc/armv7m_nvic.c
@@ -XXX,XX +XXX,XX @@ static void armv7m_nvic_realize(DeviceState *dev, Error **errp)
 
 static void armv7m_nvic_instance_init(Object *obj)
 {
-    /* We have a different default value for the num-irq property
-     * than our superclass. This function runs after qdev init
-     * has set the defaults from the Property array and before
-     * any user-specified property setting, so just modify the
-     * value in the GICState struct.
-     */
     DeviceState *dev = DEVICE(obj);
     NVICState *nvic = NVIC(obj);
     SysBusDevice *sbd = SYS_BUS_DEVICE(obj);
-- 
2.20.1

Generic code in target/arm wants to call acpi_ghes_record_errors();
provide a stub version so that we don't fail to link when
CONFIG_ACPI_APEI is not set. This requires us to add a new
ghes-stub.c file to contain it and the meson.build mechanics
to use it when appropriate.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Dongjiu Geng <gengdongjiu1@gmail.com>
Message-id: 20210603171259.27962-2-peter.maydell@linaro.org
---
 hw/acpi/ghes-stub.c | 17 +++++++++++++++++
 hw/acpi/meson.build |  6 +++---
 2 files changed, 20 insertions(+), 3 deletions(-)
 create mode 100644 hw/acpi/ghes-stub.c

diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/hw/acpi/ghes-stub.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * Support for generating APEI tables and recording CPER for Guests:
+ * stub functions.
+ *
+ * Copyright (c) 2021 Linaro, Ltd
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/acpi/ghes.h"
+
+int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
+{
+    return -1;
+}
diff --git a/hw/acpi/meson.build b/hw/acpi/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/hw/acpi/meson.build
+++ b/hw/acpi/meson.build
@@ -XXX,XX +XXX,XX @@ acpi_ss.add(when: 'CONFIG_ACPI_PCI', if_true: files('pci.c'))
 acpi_ss.add(when: 'CONFIG_ACPI_VMGENID', if_true: files('vmgenid.c'))
 acpi_ss.add(when: 'CONFIG_ACPI_HW_REDUCED', if_true: files('generic_event_device.c'))
 acpi_ss.add(when: 'CONFIG_ACPI_HMAT', if_true: files('hmat.c'))
-acpi_ss.add(when: 'CONFIG_ACPI_APEI', if_true: files('ghes.c'))
+acpi_ss.add(when: 'CONFIG_ACPI_APEI', if_true: files('ghes.c'), if_false: files('ghes-stub.c'))
 acpi_ss.add(when: 'CONFIG_ACPI_X86', if_true: files('core.c', 'piix4.c', 'pcihp.c'), if_false: files('acpi-stub.c'))
 acpi_ss.add(when: 'CONFIG_ACPI_X86_ICH', if_true: files('ich9.c', 'tco.c'))
 acpi_ss.add(when: 'CONFIG_IPMI', if_true: files('ipmi.c'), if_false: files('ipmi-stub.c'))
 acpi_ss.add(when: 'CONFIG_PC', if_false: files('acpi-x86-stub.c'))
 acpi_ss.add(when: 'CONFIG_TPM', if_true: files('tpm.c'))
-softmmu_ss.add(when: 'CONFIG_ACPI', if_false: files('acpi-stub.c', 'aml-build-stub.c'))
+softmmu_ss.add(when: 'CONFIG_ACPI', if_false: files('acpi-stub.c', 'aml-build-stub.c', 'ghes-stub.c'))
 softmmu_ss.add_all(when: 'CONFIG_ACPI', if_true: acpi_ss)
 softmmu_ss.add(when: 'CONFIG_ALL', if_true: files('acpi-stub.c', 'aml-build-stub.c',
-                                                  'acpi-x86-stub.c', 'ipmi-stub.c'))
+                                                  'acpi-x86-stub.c', 'ipmi-stub.c', 'ghes-stub.c'))
-- 
2.20.1

Allow code elsewhere in the system to check whether the ACPI GHES
table is present, so it can determine whether it is OK to try to
record an error by calling acpi_ghes_record_errors().

(We don't need to migrate the new 'present' field in AcpiGhesState,
because it is set once at system initialization and doesn't change.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Dongjiu Geng <gengdongjiu1@gmail.com>
Message-id: 20210603171259.27962-3-peter.maydell@linaro.org
---
 include/hw/acpi/ghes.h |  9 +++++++++
 hw/acpi/ghes-stub.c    |  5 +++++
 hw/acpi/ghes.c         | 17 +++++++++++++++++
 3 files changed, 31 insertions(+)

diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -XXX,XX +XXX,XX @@ enum {
 
 typedef struct AcpiGhesState {
     uint64_t ghes_addr_le;
+    bool present; /* True if GHES is present at all on this board */
 } AcpiGhesState;
 
 void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker);
@@ -XXX,XX +XXX,XX @@ void acpi_build_hest(GArray *table_data, BIOSLinker *linker,
 void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
                           GArray *hardware_errors);
 int acpi_ghes_record_errors(uint8_t notify, uint64_t error_physical_addr);
+
+/**
+ * acpi_ghes_present: Report whether ACPI GHES table is present
+ *
+ * Returns: true if the system has an ACPI GHES table and it is
+ * safe to call acpi_ghes_record_errors() to record a memory error.
+ */
+bool acpi_ghes_present(void);
 #endif
diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/acpi/ghes-stub.c
+++ b/hw/acpi/ghes-stub.c
@@ -XXX,XX +XXX,XX @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
 {
     return -1;
 }
+
+bool acpi_ghes_present(void)
+{
+    return false;
+}
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -XXX,XX +XXX,XX @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
     /* Create a read-write fw_cfg file for Address */
     fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
         NULL, &(ags->ghes_addr_le), sizeof(ags->ghes_addr_le), false);
+
+    ags->present = true;
 }
 
 int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
@@ -XXX,XX +XXX,XX @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
 
     return ret;
 }
+
+bool acpi_ghes_present(void)
+{
+    AcpiGedState *acpi_ged_state;
+    AcpiGhesState *ags;
+
+    acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
+                                                       NULL));
+
+    if (!acpi_ged_state) {
+        return false;
+    }
+    ags = &acpi_ged_state->ghes_state;
+    return ags->present;
+}
-- 
2.20.1
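
Note on the change above: the 'present' flag is set once during board
setup and is only read afterwards, so callers can use it as a cheap
guard before trying to record an error. The sketch below models that
pattern in isolation. Every name in it (SketchGhesState, board_ghes,
sketch_ghes_init and so on) is hypothetical; in particular the global
pointer stands in for the object lookup done by the real
acpi_ghes_present(), and NULL models a machine without the device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature of the state struct; only the flag matters. */
typedef struct SketchGhesState {
    uint64_t last_error_addr;
    bool present;            /* set once at init, never cleared */
} SketchGhesState;

/* NULL models a board that has no device owning GHES state. */
static SketchGhesState *board_ghes;

/* Board setup: the only place that writes the flag. */
static void sketch_ghes_init(SketchGhesState *s)
{
    assert(s != NULL);
    s->present = true;
    board_ghes = s;
}

/* Safe to call on any board, including ones that never ran the init. */
static bool sketch_ghes_present(void)
{
    return board_ghes != NULL && board_ghes->present;
}

/* Returns 0 on success, -1 if there is nowhere to record the error. */
static int sketch_ghes_record_error(uint64_t paddr)
{
    if (!sketch_ghes_present()) {
        return -1;
    }
    board_ghes->last_error_addr = paddr;
    return 0;
}

/* Caller-side guard: only try to record when the table exists. */
static bool sketch_report_memory_error(const void *host_addr, uint64_t paddr)
{
    if (sketch_ghes_present() && host_addr != NULL) {
        return sketch_ghes_record_error(paddr) == 0;
    }
    return false;
}
```

Because the flag never changes after setup, the sketch has no
synchronisation or save/restore logic for it, which mirrors the
reasoning given in the commit message for not migrating the field.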

The virt_is_acpi_enabled() function is specific to the virt board, as
is the check for its 'ras' property.  Use the new acpi_ghes_present()
function to check whether we should report memory errors via
acpi_ghes_record_errors().

This avoids a link error if QEMU was built without support for the
virt board, and provides a mechanism that can be used by any future
board models that want to add ACPI memory error reporting support
(they only need to call acpi_ghes_add_fw_cfg()).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Dongjiu Geng <gengdongjiu1@gmail.com>
Message-id: 20210603171259.27962-4-peter.maydell@linaro.org
---
 target/arm/kvm64.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -XXX,XX +XXX,XX @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
 {
     ram_addr_t ram_addr;
     hwaddr paddr;
-    Object *obj = qdev_get_machine();
-    VirtMachineState *vms = VIRT_MACHINE(obj);
-    bool acpi_enabled = virt_is_acpi_enabled(vms);
 
     assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
 
-    if (acpi_enabled && addr &&
-            object_property_get_bool(obj, "ras", NULL)) {
+    if (acpi_ghes_present() && addr) {
         ram_addr = qemu_ram_addr_from_host(addr);
         if (ram_addr != RAM_ADDR_INVALID &&
             kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

The test was off-by-one, because tag_last points to the
last byte of the tag to check, thus tag_last - prev_page
will equal TARGET_PAGE_SIZE when we use the first byte
of the next page.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/403
Reported-by: Peter Collingbourne <pcc@google.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210612195707.840217-1-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/mte_helper.c           |  2 +-
 tests/tcg/aarch64/mte-7.c         | 31 +++++++++++++++++++++++++++++++
 tests/tcg/aarch64/Makefile.target |  2 +-
 3 files changed, 33 insertions(+), 2 deletions(-)
 create mode 100644 tests/tcg/aarch64/mte-7.c

diff --git a/target/arm/mte_helper.c b/target/arm/mte_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mte_helper.c
+++ b/target/arm/mte_helper.c
@@ -XXX,XX +XXX,XX @@ static int mte_probe_int(CPUARMState *env, uint32_t desc, uint64_t ptr,
     prev_page = ptr & TARGET_PAGE_MASK;
     next_page = prev_page + TARGET_PAGE_SIZE;
 
-    if (likely(tag_last - prev_page <= TARGET_PAGE_SIZE)) {
+    if (likely(tag_last - prev_page < TARGET_PAGE_SIZE)) {
         /* Memory access stays on one page. */
         tag_size = ((tag_byte_last - tag_byte_first) / (2 * TAG_GRANULE)) + 1;
         mem1 = allocation_tag_mem(env, mmu_idx, ptr, type, sizem1 + 1,
diff --git a/tests/tcg/aarch64/mte-7.c b/tests/tcg/aarch64/mte-7.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/tests/tcg/aarch64/mte-7.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * Memory tagging, unaligned access crossing pages.
+ * https://gitlab.com/qemu-project/qemu/-/issues/403
+ *
+ * Copyright (c) 2021 Linaro Ltd
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "mte.h"
+
+int main(int ac, char **av)
+{
+    void *p;
+
+    enable_mte(PR_MTE_TCF_SYNC);
+    p = alloc_mte_mem(2 * 0x1000);
+
+    /* Tag the pointer. */
+    p = (void *)((unsigned long)p | (1ul << 56));
+
+    /* Store tag in sequential granules. */
+    asm("stg %0, [%0]" : : "r"(p + 0x0ff0));
+    asm("stg %0, [%0]" : : "r"(p + 0x1000));
+
+    /*
+     * Perform an unaligned store with tag 1 crossing the pages.
+     * Failure dies with SIGSEGV.
+     */
+    asm("str %0, [%0]" : : "r"(p + 0x0ffc));
+    return 0;
+}
diff --git a/tests/tcg/aarch64/Makefile.target b/tests/tcg/aarch64/Makefile.target
index XXXXXXX..XXXXXXX 100644
--- a/tests/tcg/aarch64/Makefile.target
+++ b/tests/tcg/aarch64/Makefile.target
@@ -XXX,XX +XXX,XX @@ AARCH64_TESTS += bti-2
 
 # MTE Tests
 ifneq ($(DOCKER_IMAGE)$(CROSS_CC_HAS_ARMV8_MTE),)
-AARCH64_TESTS += mte-1 mte-2 mte-3 mte-4 mte-5 mte-6
+AARCH64_TESTS += mte-1 mte-2 mte-3 mte-4 mte-5 mte-6 mte-7
 mte-%: CFLAGS += -march=armv8.5-a+memtag
 endif
 
-- 
2.20.1
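
Note on the change above: the off-by-one can be checked with plain
arithmetic. The sketch below compares the old and new conditions,
taking tag_last as a parameter in the sense used by the commit message.
The 4 KiB page size and all names (SKETCH_PAGE_SIZE, one_page_fixed,
one_page_buggy) are assumptions made for this illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed 4 KiB pages for illustration; the real size is target-specific. */
#define SKETCH_PAGE_SIZE 0x1000ULL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* New condition: true only if tag_last is on the same page as ptr. */
static bool one_page_fixed(uint64_t ptr, uint64_t tag_last)
{
    uint64_t prev_page = ptr & SKETCH_PAGE_MASK;

    assert(tag_last >= prev_page);
    return tag_last - prev_page < SKETCH_PAGE_SIZE;
}

/* Old condition: also accepts the first byte of the next page. */
static bool one_page_buggy(uint64_t ptr, uint64_t tag_last)
{
    uint64_t prev_page = ptr & SKETCH_PAGE_MASK;

    assert(tag_last >= prev_page);
    return tag_last - prev_page <= SKETCH_PAGE_SIZE;
}
```

With ptr = 0x0ffc and tag_last = 0x1000, prev_page is 0 and the
difference is exactly 0x1000. The old "<=" test treats that as a
single-page access, while the new "<" test correctly reports a page
crossing.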

From: Patrick Venture <venture@google.com>

Adds comments to the board init to identify missing i2c devices.

Signed-off-by: Patrick Venture <venture@google.com>
Reviewed-by: Hao Wu <wuhaotsh@google.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Message-id: 20210608202522.2677850-2-venture@google.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/npcm7xx_boards.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/hw/arm/npcm7xx_boards.c b/hw/arm/npcm7xx_boards.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/npcm7xx_boards.c
+++ b/hw/arm/npcm7xx_boards.c
@@ -XXX,XX +XXX,XX @@ static void quanta_gsj_i2c_init(NPCM7xxState *soc)
     at24c_eeprom_init(soc, 9, 0x55, 8192);
     at24c_eeprom_init(soc, 10, 0x55, 8192);
 
-    /* TODO: Add additional i2c devices. */
+    /*
+     * i2c-11:
+     * - power-brick@36: delta,dps800
+     * - hotswap@15: ti,lm5066i
+     */
+
+    /*
+     * i2c-12:
+     * - ucd90160@6b
+     */
+
+    /*
+     * i2c-15:
+     * - pca9548@75
+     */
 }
 
 static void quanta_gsj_fan_init(NPCM7xxMachine *machine, NPCM7xxState *soc)
-- 
2.20.1

From: Patrick Venture <venture@google.com>

Tested: Quanta-gsj firmware booted.

i2c /dev entries driver
I2C init bus 1 freq 100000
I2C init bus 2 freq 100000
I2C init bus 3 freq 100000
I2C init bus 4 freq 100000
I2C init bus 8 freq 100000
I2C init bus 9 freq 100000
at24 9-0055: 8192 byte 24c64 EEPROM, writable, 1 bytes/write
I2C init bus 10 freq 100000
at24 10-0055: 8192 byte 24c64 EEPROM, writable, 1 bytes/write
I2C init bus 12 freq 100000
I2C init bus 15 freq 100000
i2c i2c-15: Added multiplexed i2c bus 16
i2c i2c-15: Added multiplexed i2c bus 17
i2c i2c-15: Added multiplexed i2c bus 18
i2c i2c-15: Added multiplexed i2c bus 19
i2c i2c-15: Added multiplexed i2c bus 20
i2c i2c-15: Added multiplexed i2c bus 21
i2c i2c-15: Added multiplexed i2c bus 22
i2c i2c-15: Added multiplexed i2c bus 23
pca954x 15-0075: registered 8 multiplexed busses for I2C switch pca9548

Signed-off-by: Patrick Venture <venture@google.com>
Reviewed-by: Hao Wu <wuhaotsh@google.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Message-id: 20210608202522.2677850-3-venture@google.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/npcm7xx_boards.c | 6 ++----
 hw/arm/Kconfig          | 1 +
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/hw/arm/npcm7xx_boards.c b/hw/arm/npcm7xx_boards.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/npcm7xx_boards.c
+++ b/hw/arm/npcm7xx_boards.c
@@ -XXX,XX +XXX,XX @@
 
 #include "hw/arm/npcm7xx.h"
 #include "hw/core/cpu.h"
+#include "hw/i2c/i2c_mux_pca954x.h"
 #include "hw/i2c/smbus_eeprom.h"
 #include "hw/loader.h"
 #include "hw/qdev-core.h"
@@ -XXX,XX +XXX,XX @@ static void quanta_gsj_i2c_init(NPCM7xxState *soc)
      * - ucd90160@6b
      */
 
-    /*
-     * i2c-15:
-     * - pca9548@75
-     */
+    i2c_slave_create_simple(npcm7xx_i2c_get_bus(soc, 15), "pca9548", 0x75);
 }
 
 static void quanta_gsj_fan_init(NPCM7xxMachine *machine, NPCM7xxState *soc)
diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -XXX,XX +XXX,XX @@ config NPCM7XX
     select SERIAL
     select SSI
     select UNIMP
+    select PCA954X
 
 config FSL_IMX25
     bool
-- 
2.20.1

From: Patrick Venture <venture@google.com>

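Add the pca954x i2c muxes that the quanta-q71l image expects: a
pca9546 at 0x74 and a pca9548 at 0x77 on i2c-2, and a pca9546 at
0x70 on i2c-7.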

Tested: Booted quanta-q71l image to userspace.
Signed-off-by: Patrick Venture <venture@google.com>
Reviewed-by: Hao Wu <wuhaotsh@google.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-id: 20210608202522.2677850-4-venture@google.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/aspeed.c | 11 ++++++++---
 hw/arm/Kconfig  |  1 +
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/aspeed.c
+++ b/hw/arm/aspeed.c
@@ -XXX,XX +XXX,XX @@
 #include "hw/arm/boot.h"
 #include "hw/arm/aspeed.h"
 #include "hw/arm/aspeed_soc.h"
+#include "hw/i2c/i2c_mux_pca954x.h"
 #include "hw/i2c/smbus_eeprom.h"
 #include "hw/misc/pca9552.h"
 #include "hw/misc/tmp105.h"
@@ -XXX,XX +XXX,XX @@ static void quanta_q71l_bmc_i2c_init(AspeedMachineState *bmc)
     /* TODO: i2c-1: Add Frontpanel FRU eeprom@57 24c64 */
     /* TODO: Add Memory Riser i2c mux and eeproms. */
 
-    /* TODO: i2c-2: pca9546@74 */
-    /* TODO: i2c-2: pca9548@77 */
+    i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 2), "pca9546", 0x74);
+    i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 2), "pca9548", 0x77);
+
     /* TODO: i2c-3: Add BIOS FRU eeprom@56 24c64 */
-    /* TODO: i2c-7: Add pca9546@70 */
+
+    /* i2c-7 */
+    i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 7), "pca9546", 0x70);
     /*        - i2c@0: pmbus@59 */
     /*        - i2c@1: pmbus@58 */
     /*        - i2c@2: pmbus@58 */
     /*        - i2c@3: pmbus@59 */
+
     /* TODO: i2c-7: Add PDB FRU eeprom@52 */
     /* TODO: i2c-8: Add BMC FRU eeprom@50 */
 }
diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -XXX,XX +XXX,XX @@ config ASPEED_SOC
     select PCA9552
     select SERIAL
     select SMBUS_EEPROM
+    select PCA954X
     select SSI
     select SSI_M25P80
     select TMP105
-- 
2.20.1

Currently we provide Hn and H1_n macros for accessing the correct
data within arrays of vector elements of size 1, 2 and 4, accounting
for host endianness.  We don't provide any macros for elements of
size 8 because host endianness doesn't matter for them.  However,
this is awkward when a function is generated from a macro: for the
64-bit case we have to pass an empty macro argument, and checkpatch
complains about empty arguments.  The empty argument is also a little
confusing for humans to read.

Add H8() and H1_8() macros and use them where we were previously
passing empty arguments to macros.
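
As a rough illustration of the pattern (a minimal sketch, not code
from this patch): DO_SUM and the sum_s/sum_d functions below are
hypothetical names invented for the example, and the H1_4/H1_8
definitions shown are the identity forms a little-endian host would
use. The point is only that the 64-bit instantiation can now name
H1_8 instead of leaving the macro argument empty.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Identity definitions, as on a little-endian host. Assumption: a
 * big-endian host would adjust the offset for H1_4, while H1_8 stays
 * an identity because 64-bit elements are not host-endian dependent.
 */
#define H1_4(x) (x)
#define H1_8(x) (x)

/*
 * Hypothetical generator macro (not from QEMU): emits a function that
 * sums every TYPE-sized element of a vector. H maps a byte offset to
 * the host byte offset of that element.
 */
#define DO_SUM(NAME, TYPE, H)                                   \
static uint64_t NAME(const void *vn, size_t opr_sz)             \
{                                                               \
    const char *base = vn;                                      \
    uint64_t sum = 0;                                           \
    assert(opr_sz % sizeof(TYPE) == 0);                         \
    for (size_t i = 0; i < opr_sz; i += sizeof(TYPE)) {         \
        sum += *(const TYPE *)(base + H(i));                    \
    }                                                           \
    return sum;                                                 \
}

DO_SUM(sum_s, uint32_t, H1_4)
/* Without H1_8 this would read DO_SUM(sum_d, uint64_t, ) */
DO_SUM(sum_d, uint64_t, H1_8)
```

Calling sum_s() on an array of uint32_t and sum_d() on an array of
uint64_t, each with the array's size in bytes, exercises both
instantiations through the same generator.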

Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210614151007.4545-2-peter.maydell@linaro.org
Message-id: 20210610132505.5827-1-peter.maydell@linaro.org
---
 target/arm/vec_internal.h |   8 +-
 target/arm/sve_helper.c   | 258 +++++++++++++++++++-------------------
 target/arm/vec_helper.c   |  14 +--
 3 files changed, 143 insertions(+), 137 deletions(-)

diff --git a/target/arm/vec_internal.h b/target/arm/vec_internal.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_internal.h
+++ b/target/arm/vec_internal.h
@@ -XXX,XX +XXX,XX @@
 #define H2(x)   (x)
 #define H4(x)   (x)
 #endif
-
+/*
+ * Access to 64-bit elements isn't host-endian dependent; we provide H8
+ * and H1_8 so that when a function is being generated from a macro we
+ * can pass these rather than an empty macro argument, for clarity.
+ */
+#define H8(x)   (x)
+#define H1_8(x) (x)
 
 static inline void clear_tail(void *vd, uintptr_t opr_sz, uintptr_t max_sz)
 {
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg,               \
 
 DO_ZPZZ_PAIR_FP(sve2_faddp_zpzz_h, float16, H1_2, float16_add)
 DO_ZPZZ_PAIR_FP(sve2_faddp_zpzz_s, float32, H1_4, float32_add)
-DO_ZPZZ_PAIR_FP(sve2_faddp_zpzz_d, float64,     , float64_add)
+DO_ZPZZ_PAIR_FP(sve2_faddp_zpzz_d, float64, H1_8, float64_add)
 
 DO_ZPZZ_PAIR_FP(sve2_fmaxnmp_zpzz_h, float16, H1_2, float16_maxnum)
 DO_ZPZZ_PAIR_FP(sve2_fmaxnmp_zpzz_s, float32, H1_4, float32_maxnum)
-DO_ZPZZ_PAIR_FP(sve2_fmaxnmp_zpzz_d, float64,     , float64_maxnum)
+DO_ZPZZ_PAIR_FP(sve2_fmaxnmp_zpzz_d, float64, H1_8, float64_maxnum)
 
 DO_ZPZZ_PAIR_FP(sve2_fminnmp_zpzz_h, float16, H1_2, float16_minnum)
 DO_ZPZZ_PAIR_FP(sve2_fminnmp_zpzz_s, float32, H1_4, float32_minnum)
-DO_ZPZZ_PAIR_FP(sve2_fminnmp_zpzz_d, float64,     , float64_minnum)
+DO_ZPZZ_PAIR_FP(sve2_fminnmp_zpzz_d, float64, H1_8, float64_minnum)
 
 DO_ZPZZ_PAIR_FP(sve2_fmaxp_zpzz_h, float16, H1_2, float16_max)
 DO_ZPZZ_PAIR_FP(sve2_fmaxp_zpzz_s, float32, H1_4, float32_max)
-DO_ZPZZ_PAIR_FP(sve2_fmaxp_zpzz_d, float64,     , float64_max)
+DO_ZPZZ_PAIR_FP(sve2_fmaxp_zpzz_d, float64, H1_8, float64_max)
 
 DO_ZPZZ_PAIR_FP(sve2_fminp_zpzz_h, float16, H1_2, float16_min)
 DO_ZPZZ_PAIR_FP(sve2_fminp_zpzz_s, float32, H1_4, float32_min)
-DO_ZPZZ_PAIR_FP(sve2_fminp_zpzz_d, float64,     , float64_min)
+DO_ZPZZ_PAIR_FP(sve2_fminp_zpzz_d, float64, H1_8, float64_min)
 
 #undef DO_ZPZZ_PAIR_FP
 
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc)          \
 
 DO_ZZZ_TB(sve2_saddl_h, int16_t, int8_t, H1_2, H1, DO_ADD)
 DO_ZZZ_TB(sve2_saddl_s, int32_t, int16_t, H1_4, H1_2, DO_ADD)
-DO_ZZZ_TB(sve2_saddl_d, int64_t, int32_t,     , H1_4, DO_ADD)
+DO_ZZZ_TB(sve2_saddl_d, int64_t, int32_t, H1_8, H1_4, DO_ADD)
 
 DO_ZZZ_TB(sve2_ssubl_h, int16_t, int8_t, H1_2, H1, DO_SUB)
 DO_ZZZ_TB(sve2_ssubl_s, int32_t, int16_t, H1_4, H1_2, DO_SUB)
-DO_ZZZ_TB(sve2_ssubl_d, int64_t, int32_t,     , H1_4, DO_SUB)
+DO_ZZZ_TB(sve2_ssubl_d, int64_t, int32_t, H1_8, H1_4, DO_SUB)
 
 DO_ZZZ_TB(sve2_sabdl_h, int16_t, int8_t, H1_2, H1, DO_ABD)
 DO_ZZZ_TB(sve2_sabdl_s, int32_t, int16_t, H1_4, H1_2, DO_ABD)
-DO_ZZZ_TB(sve2_sabdl_d, int64_t, int32_t,     , H1_4, DO_ABD)
+DO_ZZZ_TB(sve2_sabdl_d, int64_t, int32_t, H1_8, H1_4, DO_ABD)
 
 DO_ZZZ_TB(sve2_uaddl_h, uint16_t, uint8_t, H1_2, H1, DO_ADD)
 DO_ZZZ_TB(sve2_uaddl_s, uint32_t, uint16_t, H1_4, H1_2, DO_ADD)
-DO_ZZZ_TB(sve2_uaddl_d, uint64_t, uint32_t,     , H1_4, DO_ADD)
+DO_ZZZ_TB(sve2_uaddl_d, uint64_t, uint32_t, H1_8, H1_4, DO_ADD)
 
 DO_ZZZ_TB(sve2_usubl_h, uint16_t, uint8_t, H1_2, H1, DO_SUB)
 DO_ZZZ_TB(sve2_usubl_s, uint32_t, uint16_t, H1_4, H1_2, DO_SUB)
-DO_ZZZ_TB(sve2_usubl_d, uint64_t, uint32_t,     , H1_4, DO_SUB)
+DO_ZZZ_TB(sve2_usubl_d, uint64_t, uint32_t, H1_8, H1_4, DO_SUB)
 
 DO_ZZZ_TB(sve2_uabdl_h, uint16_t, uint8_t, H1_2, H1, DO_ABD)
 DO_ZZZ_TB(sve2_uabdl_s, uint32_t, uint16_t, H1_4, H1_2, DO_ABD)
-DO_ZZZ_TB(sve2_uabdl_d, uint64_t, uint32_t,     , H1_4, DO_ABD)
+DO_ZZZ_TB(sve2_uabdl_d, uint64_t, uint32_t, H1_8, H1_4, DO_ABD)
 
 DO_ZZZ_TB(sve2_smull_zzz_h, int16_t, int8_t, H1_2, H1, DO_MUL)
 DO_ZZZ_TB(sve2_smull_zzz_s, int32_t, int16_t, H1_4, H1_2, DO_MUL)
-DO_ZZZ_TB(sve2_smull_zzz_d, int64_t, int32_t,     , H1_4, DO_MUL)
+DO_ZZZ_TB(sve2_smull_zzz_d, int64_t, int32_t, H1_8, H1_4, DO_MUL)
 
 DO_ZZZ_TB(sve2_umull_zzz_h, uint16_t, uint8_t, H1_2, H1, DO_MUL)
 DO_ZZZ_TB(sve2_umull_zzz_s, uint32_t, uint16_t, H1_4, H1_2, DO_MUL)
-DO_ZZZ_TB(sve2_umull_zzz_d, uint64_t, uint32_t,     , H1_4, DO_MUL)
+DO_ZZZ_TB(sve2_umull_zzz_d, uint64_t, uint32_t, H1_8, H1_4, DO_MUL)
 
 /* Note that the multiply cannot overflow, but the doubling can. */
 static inline int16_t do_sqdmull_h(int16_t n, int16_t m)
@@ -XXX,XX +XXX,XX @@ static inline int64_t do_sqdmull_d(int64_t n, int64_t m)
 
 DO_ZZZ_TB(sve2_sqdmull_zzz_h, int16_t, int8_t, H1_2, H1, do_sqdmull_h)
 DO_ZZZ_TB(sve2_sqdmull_zzz_s, int32_t, int16_t, H1_4, H1_2, do_sqdmull_s)
-DO_ZZZ_TB(sve2_sqdmull_zzz_d, int64_t, int32_t,     , H1_4, do_sqdmull_d)
+DO_ZZZ_TB(sve2_sqdmull_zzz_d, int64_t, int32_t, H1_8, H1_4, do_sqdmull_d)
 
 #undef DO_ZZZ_TB
 
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \
 
 DO_ZZZ_WTB(sve2_saddw_h, int16_t, int8_t, H1_2, H1, DO_ADD)
 DO_ZZZ_WTB(sve2_saddw_s, int32_t, int16_t, H1_4, H1_2, DO_ADD)
-DO_ZZZ_WTB(sve2_saddw_d, int64_t, int32_t,     , H1_4, DO_ADD)
+DO_ZZZ_WTB(sve2_saddw_d, int64_t, int32_t, H1_8, H1_4, DO_ADD)
 
 DO_ZZZ_WTB(sve2_ssubw_h, int16_t, int8_t, H1_2, H1, DO_SUB)
 DO_ZZZ_WTB(sve2_ssubw_s, int32_t, int16_t, H1_4, H1_2, DO_SUB)
-DO_ZZZ_WTB(sve2_ssubw_d, int64_t, int32_t,     , H1_4, DO_SUB)
+DO_ZZZ_WTB(sve2_ssubw_d, int64_t, int32_t, H1_8, H1_4, DO_SUB)
 
 DO_ZZZ_WTB(sve2_uaddw_h, uint16_t, uint8_t, H1_2, H1, DO_ADD)
 DO_ZZZ_WTB(sve2_uaddw_s, uint32_t, uint16_t, H1_4, H1_2, DO_ADD)
-DO_ZZZ_WTB(sve2_uaddw_d, uint64_t, uint32_t,     , H1_4, DO_ADD)
+DO_ZZZ_WTB(sve2_uaddw_d, uint64_t, uint32_t, H1_8, H1_4, DO_ADD)
 
 DO_ZZZ_WTB(sve2_usubw_h, uint16_t, uint8_t, H1_2, H1, DO_SUB)
 DO_ZZZ_WTB(sve2_usubw_s, uint32_t, uint16_t, H1_4, H1_2, DO_SUB)
-DO_ZZZ_WTB(sve2_usubw_d, uint64_t, uint32_t,     , H1_4, DO_SUB)
+DO_ZZZ_WTB(sve2_usubw_d, uint64_t, uint32_t, H1_8, H1_4, DO_SUB)
 
 #undef DO_ZZZ_WTB
 
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc)          \
 DO_ZZZ_NTB(sve2_eoril_b, uint8_t, H1, DO_EOR)
 DO_ZZZ_NTB(sve2_eoril_h, uint16_t, H1_2, DO_EOR)
 DO_ZZZ_NTB(sve2_eoril_s, uint32_t, H1_4, DO_EOR)
-DO_ZZZ_NTB(sve2_eoril_d, uint64_t,     , DO_EOR)
+DO_ZZZ_NTB(sve2_eoril_d, uint64_t, H1_8, DO_EOR)
 
 #undef DO_ZZZ_NTB
 
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \
 
 DO_ZZZW_ACC(sve2_sabal_h, int16_t, int8_t, H1_2, H1, DO_ABD)
 DO_ZZZW_ACC(sve2_sabal_s, int32_t, int16_t, H1_4, H1_2, DO_ABD)
-DO_ZZZW_ACC(sve2_sabal_d, int64_t, int32_t,     , H1_4, DO_ABD)
+DO_ZZZW_ACC(sve2_sabal_d, int64_t, int32_t, H1_8, H1_4, DO_ABD)
 
 DO_ZZZW_ACC(sve2_uabal_h, uint16_t, uint8_t, H1_2, H1, DO_ABD)
 DO_ZZZW_ACC(sve2_uabal_s, uint32_t, uint16_t, H1_4, H1_2, DO_ABD)
-DO_ZZZW_ACC(sve2_uabal_d, uint64_t, uint32_t,     , H1_4, DO_ABD)
+DO_ZZZW_ACC(sve2_uabal_d, uint64_t, uint32_t, H1_8, H1_4, DO_ABD)
 
 DO_ZZZW_ACC(sve2_smlal_zzzw_h, int16_t, int8_t, H1_2, H1, DO_MUL)
 DO_ZZZW_ACC(sve2_smlal_zzzw_s, int32_t, int16_t, H1_4, H1_2, DO_MUL)
-DO_ZZZW_ACC(sve2_smlal_zzzw_d, int64_t, int32_t,     , H1_4, DO_MUL)
+DO_ZZZW_ACC(sve2_smlal_zzzw_d, int64_t, int32_t, H1_8, H1_4, DO_MUL)
 
 DO_ZZZW_ACC(sve2_umlal_zzzw_h, uint16_t, uint8_t, H1_2, H1, DO_MUL)
 DO_ZZZW_ACC(sve2_umlal_zzzw_s, uint32_t, uint16_t, H1_4, H1_2, DO_MUL)
-DO_ZZZW_ACC(sve2_umlal_zzzw_d, uint64_t, uint32_t,     , H1_4, DO_MUL)
+DO_ZZZW_ACC(sve2_umlal_zzzw_d, uint64_t, uint32_t, H1_8, H1_4, DO_MUL)
 
 #define DO_NMUL(N, M)  -(N * M)
 
 DO_ZZZW_ACC(sve2_smlsl_zzzw_h, int16_t, int8_t, H1_2, H1, DO_NMUL)
 DO_ZZZW_ACC(sve2_smlsl_zzzw_s, int32_t, int16_t, H1_4, H1_2, DO_NMUL)
-DO_ZZZW_ACC(sve2_smlsl_zzzw_d, int64_t, int32_t,     , H1_4, DO_NMUL)
+DO_ZZZW_ACC(sve2_smlsl_zzzw_d, int64_t, int32_t, H1_8, H1_4, DO_NMUL)
 
 DO_ZZZW_ACC(sve2_umlsl_zzzw_h, uint16_t, uint8_t, H1_2, H1, DO_NMUL)
 DO_ZZZW_ACC(sve2_umlsl_zzzw_s, uint32_t, uint16_t, H1_4, H1_2, DO_NMUL)
-DO_ZZZW_ACC(sve2_umlsl_zzzw_d, uint64_t, uint32_t,     , H1_4, DO_NMUL)
+DO_ZZZW_ACC(sve2_umlsl_zzzw_d, uint64_t, uint32_t, H1_8, H1_4, DO_NMUL)
 
 #undef DO_ZZZW_ACC
 
@@ -XXX,XX +XXX,XX @@ DO_SQDMLAL(sve2_sqdmlal_zzzw_h, int16_t, int8_t, H1_2, H1,
            do_sqdmull_h, DO_SQADD_H)
 DO_SQDMLAL(sve2_sqdmlal_zzzw_s, int32_t, int16_t, H1_4, H1_2,
            do_sqdmull_s, DO_SQADD_S)
-DO_SQDMLAL(sve2_sqdmlal_zzzw_d, int64_t, int32_t,     , H1_4,
+DO_SQDMLAL(sve2_sqdmlal_zzzw_d, int64_t, int32_t, H1_8, H1_4,
            do_sqdmull_d, do_sqadd_d)
 
 DO_SQDMLAL(sve2_sqdmlsl_zzzw_h, int16_t, int8_t, H1_2, H1,
            do_sqdmull_h, DO_SQSUB_H)
 DO_SQDMLAL(sve2_sqdmlsl_zzzw_s, int32_t, int16_t, H1_4, H1_2,
            do_sqdmull_s, DO_SQSUB_S)
-DO_SQDMLAL(sve2_sqdmlsl_zzzw_d, int64_t, int32_t,     , H1_4,
+DO_SQDMLAL(sve2_sqdmlsl_zzzw_d, int64_t, int32_t, H1_8, H1_4,
            do_sqdmull_d, do_sqsub_d)
 
 #undef DO_SQDMLAL
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \
 DO_CMLA_FUNC(sve2_cmla_zzzz_b, uint8_t, H1, DO_CMLA)
 DO_CMLA_FUNC(sve2_cmla_zzzz_h, uint16_t, H2, DO_CMLA)
 DO_CMLA_FUNC(sve2_cmla_zzzz_s, uint32_t, H4, DO_CMLA)
-DO_CMLA_FUNC(sve2_cmla_zzzz_d, uint64_t,   , DO_CMLA)
+DO_CMLA_FUNC(sve2_cmla_zzzz_d, uint64_t, H8, DO_CMLA)
 
 #define DO_SQRDMLAH_B(N, M, A, S) \
     do_sqrdmlah_b(N, M, A, S, true)
@@ -XXX,XX +XXX,XX @@ DO_CMLA_FUNC(sve2_cmla_zzzz_d, uint64_t,   , DO_CMLA)
 DO_CMLA_FUNC(sve2_sqrdcmlah_zzzz_b, int8_t, H1, DO_SQRDMLAH_B)
 DO_CMLA_FUNC(sve2_sqrdcmlah_zzzz_h, int16_t, H2, DO_SQRDMLAH_H)
 DO_CMLA_FUNC(sve2_sqrdcmlah_zzzz_s, int32_t, H4, DO_SQRDMLAH_S)
-DO_CMLA_FUNC(sve2_sqrdcmlah_zzzz_d, int64_t,   , DO_SQRDMLAH_D)
+DO_CMLA_FUNC(sve2_sqrdcmlah_zzzz_d, int64_t, H8, DO_SQRDMLAH_D)
 
 #define DO_CMLA_IDX_FUNC(NAME, TYPE, H, OP) \
 void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc)    \
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \
 
 DO_ZZXZ(sve2_sqrdmlah_idx_h, int16_t, H2, DO_SQRDMLAH_H)
 DO_ZZXZ(sve2_sqrdmlah_idx_s, int32_t, H4, DO_SQRDMLAH_S)
-DO_ZZXZ(sve2_sqrdmlah_idx_d, int64_t,   , DO_SQRDMLAH_D)
+DO_ZZXZ(sve2_sqrdmlah_idx_d, int64_t, H8, DO_SQRDMLAH_D)
 
 #define DO_SQRDMLSH_H(N, M, A) \
     ({ uint32_t discard; do_sqrdmlah_h(N, M, A, true, true, &discard); })
@@ -XXX,XX +XXX,XX @@ DO_ZZXZ(sve2_sqrdmlah_idx_d, int64_t,   , DO_SQRDMLAH_D)
 
 DO_ZZXZ(sve2_sqrdmlsh_idx_h, int16_t, H2, DO_SQRDMLSH_H)
 DO_ZZXZ(sve2_sqrdmlsh_idx_s, int32_t, H4, DO_SQRDMLSH_S)
-DO_ZZXZ(sve2_sqrdmlsh_idx_d, int64_t,   , DO_SQRDMLSH_D)
+DO_ZZXZ(sve2_sqrdmlsh_idx_d, int64_t, H8, DO_SQRDMLSH_D)
 
 #undef DO_ZZXZ
 
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc)  \
 #define DO_MLA(N, M, A)  (A + N * M)
 
 DO_ZZXW(sve2_smlal_idx_s, int32_t, int16_t, H1_4, H1_2, DO_MLA)
-DO_ZZXW(sve2_smlal_idx_d, int64_t, int32_t,     , H1_4, DO_MLA)
+DO_ZZXW(sve2_smlal_idx_d, int64_t, int32_t, H1_8, H1_4, DO_MLA)
 DO_ZZXW(sve2_umlal_idx_s, uint32_t, uint16_t, H1_4, H1_2, DO_MLA)
-DO_ZZXW(sve2_umlal_idx_d, uint64_t, uint32_t,     , H1_4, DO_MLA)
+DO_ZZXW(sve2_umlal_idx_d, uint64_t, uint32_t, H1_8, H1_4, DO_MLA)
 
 #define DO_MLS(N, M, A)  (A - N * M)
 
 DO_ZZXW(sve2_smlsl_idx_s, int32_t, int16_t, H1_4, H1_2, DO_MLS)
-DO_ZZXW(sve2_smlsl_idx_d, int64_t, int32_t,     , H1_4, DO_MLS)
+DO_ZZXW(sve2_smlsl_idx_d, int64_t, int32_t, H1_8, H1_4, DO_MLS)
 DO_ZZXW(sve2_umlsl_idx_s, uint32_t, uint16_t, H1_4, H1_2, DO_MLS)
-DO_ZZXW(sve2_umlsl_idx_d, uint64_t, uint32_t,     , H1_4, DO_MLS)
+DO_ZZXW(sve2_umlsl_idx_d, uint64_t, uint32_t, H1_8, H1_4, DO_MLS)
 
 #define DO_SQDMLAL_S(N, M, A)  DO_SQADD_S(A, do_sqdmull_s(N, M))
 #define DO_SQDMLAL_D(N, M, A)  do_sqadd_d(A, do_sqdmull_d(N, M))
 
 DO_ZZXW(sve2_sqdmlal_idx_s, int32_t, int16_t, H1_4, H1_2, DO_SQDMLAL_S)
-DO_ZZXW(sve2_sqdmlal_idx_d, int64_t, int32_t,     , H1_4, DO_SQDMLAL_D)
+DO_ZZXW(sve2_sqdmlal_idx_d, int64_t, int32_t, H1_8, H1_4, DO_SQDMLAL_D)
 
 #define DO_SQDMLSL_S(N, M, A)  DO_SQSUB_S(A, do_sqdmull_s(N, M))
 #define DO_SQDMLSL_D(N, M, A)  do_sqsub_d(A, do_sqdmull_d(N, M))
 
 DO_ZZXW(sve2_sqdmlsl_idx_s, int32_t, int16_t, H1_4, H1_2, DO_SQDMLSL_S)
-DO_ZZXW(sve2_sqdmlsl_idx_d, int64_t, int32_t,     , H1_4, DO_SQDMLSL_D)
+DO_ZZXW(sve2_sqdmlsl_idx_d, int64_t, int32_t, H1_8, H1_4, DO_SQDMLSL_D)
 
 #undef DO_MLA
 #undef DO_MLS
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc)            \
 }
 
 DO_ZZX(sve2_sqdmull_idx_s, int32_t, int16_t, H1_4, H1_2, do_sqdmull_s)
-DO_ZZX(sve2_sqdmull_idx_d, int64_t, int32_t,     , H1_4, do_sqdmull_d)
+DO_ZZX(sve2_sqdmull_idx_d, int64_t, int32_t, H1_8, H1_4, do_sqdmull_d)
 
 DO_ZZX(sve2_smull_idx_s, int32_t, int16_t, H1_4, H1_2, DO_MUL)
-DO_ZZX(sve2_smull_idx_d, int64_t, int32_t,     , H1_4, DO_MUL)
+DO_ZZX(sve2_smull_idx_d, int64_t, int32_t, H1_8, H1_4, DO_MUL)
 
 DO_ZZX(sve2_umull_idx_s, uint32_t, uint16_t, H1_4, H1_2, DO_MUL)
-DO_ZZX(sve2_umull_idx_d, uint64_t, uint32_t,     , H1_4, DO_MUL)
+DO_ZZX(sve2_umull_idx_d, uint64_t, uint32_t, H1_8, H1_4, DO_MUL)
 
 #undef DO_ZZX
 
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc)  \
 DO_CADD(sve2_cadd_b, int8_t, H1, DO_ADD, DO_SUB)
 DO_CADD(sve2_cadd_h, int16_t, H1_2, DO_ADD, DO_SUB)
 DO_CADD(sve2_cadd_s, int32_t, H1_4, DO_ADD, DO_SUB)
-DO_CADD(sve2_cadd_d, int64_t,     , DO_ADD, DO_SUB)
+DO_CADD(sve2_cadd_d, int64_t, H1_8, DO_ADD, DO_SUB)
 
 DO_CADD(sve2_sqcadd_b, int8_t, H1, DO_SQADD_B, DO_SQSUB_B)
 DO_CADD(sve2_sqcadd_h, int16_t, H1_2, DO_SQADD_H, DO_SQSUB_H)
 DO_CADD(sve2_sqcadd_s, int32_t, H1_4, DO_SQADD_S, DO_SQSUB_S)
-DO_CADD(sve2_sqcadd_d, int64_t,     , do_sqadd_d, do_sqsub_d)
+DO_CADD(sve2_sqcadd_d, int64_t, H1_8, do_sqadd_d, do_sqsub_d)
 
 #undef DO_CADD
 
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, uint32_t desc)           \
 
 DO_ZZI_SHLL(sve2_sshll_h, int16_t, int8_t, H1_2, H1)
 DO_ZZI_SHLL(sve2_sshll_s, int32_t, int16_t, H1_4, H1_2)
-DO_ZZI_SHLL(sve2_sshll_d, int64_t, int32_t,     , H1_4)
+DO_ZZI_SHLL(sve2_sshll_d, int64_t, int32_t, H1_8, H1_4)
 
 DO_ZZI_SHLL(sve2_ushll_h, uint16_t, uint8_t, H1_2, H1)
 DO_ZZI_SHLL(sve2_ushll_s, uint32_t, uint16_t, H1_4, H1_2)
-DO_ZZI_SHLL(sve2_ushll_d, uint64_t, uint32_t,     , H1_4)
+DO_ZZI_SHLL(sve2_ushll_d, uint64_t, uint32_t, H1_8, H1_4)
 
 #undef DO_ZZI_SHLL
 
@@ -XXX,XX +XXX,XX @@ DO_SHRNB(sve2_shrnb_d, uint64_t, uint32_t, DO_SHR)
 
 DO_SHRNT(sve2_shrnt_h, uint16_t, uint8_t, H1_2, H1, DO_SHR)
 DO_SHRNT(sve2_shrnt_s, uint32_t, uint16_t, H1_4, H1_2, DO_SHR)
-DO_SHRNT(sve2_shrnt_d, uint64_t, uint32_t,     , H1_4, DO_SHR)
+DO_SHRNT(sve2_shrnt_d, uint64_t, uint32_t, H1_8, H1_4, DO_SHR)
 
 DO_SHRNB(sve2_rshrnb_h, uint16_t, uint8_t, do_urshr)
 DO_SHRNB(sve2_rshrnb_s, uint32_t, uint16_t, do_urshr)
@@ -XXX,XX +XXX,XX @@ DO_SHRNB(sve2_rshrnb_d, uint64_t, uint32_t, do_urshr)
 
 DO_SHRNT(sve2_rshrnt_h, uint16_t, uint8_t, H1_2, H1, do_urshr)
 DO_SHRNT(sve2_rshrnt_s, uint32_t, uint16_t, H1_4, H1_2, do_urshr)
-DO_SHRNT(sve2_rshrnt_d, uint64_t, uint32_t,     , H1_4, do_urshr)
+DO_SHRNT(sve2_rshrnt_d, uint64_t, uint32_t, H1_8, H1_4, do_urshr)
 
 #define DO_SQSHRUN_H(x, sh) do_sat_bhs((int64_t)(x) >> sh, 0, UINT8_MAX)
 #define DO_SQSHRUN_S(x, sh) do_sat_bhs((int64_t)(x) >> sh, 0, UINT16_MAX)
@@ -XXX,XX +XXX,XX @@ DO_SHRNB(sve2_sqshrunb_d, int64_t, uint32_t, DO_SQSHRUN_D)
 
 DO_SHRNT(sve2_sqshrunt_h, int16_t, uint8_t, H1_2, H1, DO_SQSHRUN_H)
 DO_SHRNT(sve2_sqshrunt_s, int32_t, uint16_t, H1_4, H1_2, DO_SQSHRUN_S)
-DO_SHRNT(sve2_sqshrunt_d, int64_t, uint32_t,     , H1_4, DO_SQSHRUN_D)
+DO_SHRNT(sve2_sqshrunt_d, int64_t, uint32_t, H1_8, H1_4, DO_SQSHRUN_D)
 
 #define DO_SQRSHRUN_H(x, sh) do_sat_bhs(do_srshr(x, sh), 0, UINT8_MAX)
 #define DO_SQRSHRUN_S(x, sh) do_sat_bhs(do_srshr(x, sh), 0, UINT16_MAX)
@@ -XXX,XX +XXX,XX @@ DO_SHRNB(sve2_sqrshrunb_d, int64_t, uint32_t, DO_SQRSHRUN_D)
 
 DO_SHRNT(sve2_sqrshrunt_h, int16_t, uint8_t, H1_2, H1, DO_SQRSHRUN_H)
 DO_SHRNT(sve2_sqrshrunt_s, int32_t, uint16_t, H1_4, H1_2, DO_SQRSHRUN_S)
-DO_SHRNT(sve2_sqrshrunt_d, int64_t, uint32_t,     , H1_4, DO_SQRSHRUN_D)
+DO_SHRNT(sve2_sqrshrunt_d, int64_t, uint32_t, H1_8, H1_4, DO_SQRSHRUN_D)
 
 #define DO_SQSHRN_H(x, sh) do_sat_bhs(x >> sh, INT8_MIN, INT8_MAX)
 #define DO_SQSHRN_S(x, sh) do_sat_bhs(x >> sh, INT16_MIN, INT16_MAX)
@@ -XXX,XX +XXX,XX @@ DO_SHRNB(sve2_sqshrnb_d, int64_t, uint32_t, DO_SQSHRN_D)
 
 DO_SHRNT(sve2_sqshrnt_h, int16_t, uint8_t, H1_2, H1, DO_SQSHRN_H)
 DO_SHRNT(sve2_sqshrnt_s, int32_t, uint16_t, H1_4, H1_2, DO_SQSHRN_S)
-DO_SHRNT(sve2_sqshrnt_d, int64_t, uint32_t,     , H1_4, DO_SQSHRN_D)
+DO_SHRNT(sve2_sqshrnt_d, int64_t, uint32_t, H1_8, H1_4, DO_SQSHRN_D)
 
 #define DO_SQRSHRN_H(x, sh) do_sat_bhs(do_srshr(x, sh), INT8_MIN, INT8_MAX)
 #define DO_SQRSHRN_S(x, sh) do_sat_bhs(do_srshr(x, sh), INT16_MIN, INT16_MAX)
@@ -XXX,XX +XXX,XX @@ DO_SHRNB(sve2_sqrshrnb_d, int64_t, uint32_t, DO_SQRSHRN_D)
 
 DO_SHRNT(sve2_sqrshrnt_h, int16_t, uint8_t, H1_2, H1, DO_SQRSHRN_H)
 DO_SHRNT(sve2_sqrshrnt_s, int32_t, uint16_t, H1_4, H1_2, DO_SQRSHRN_S)
-DO_SHRNT(sve2_sqrshrnt_d, int64_t, uint32_t,     , H1_4, DO_SQRSHRN_D)
+DO_SHRNT(sve2_sqrshrnt_d, int64_t, uint32_t, H1_8, H1_4, DO_SQRSHRN_D)
 
 #define DO_UQSHRN_H(x, sh) MIN(x >> sh, UINT8_MAX)
 #define DO_UQSHRN_S(x, sh) MIN(x >> sh, UINT16_MAX)
@@ -XXX,XX +XXX,XX @@ DO_SHRNB(sve2_uqshrnb_d, uint64_t, uint32_t, DO_UQSHRN_D)
 
 DO_SHRNT(sve2_uqshrnt_h, uint16_t, uint8_t, H1_2, H1, DO_UQSHRN_H)
 DO_SHRNT(sve2_uqshrnt_s, uint32_t, uint16_t, H1_4, H1_2, DO_UQSHRN_S)
-DO_SHRNT(sve2_uqshrnt_d, uint64_t, uint32_t,     , H1_4, DO_UQSHRN_D)
+DO_SHRNT(sve2_uqshrnt_d, uint64_t, uint32_t, H1_8, H1_4, DO_UQSHRN_D)
 
 #define DO_UQRSHRN_H(x, sh) MIN(do_urshr(x, sh), UINT8_MAX)
 #define DO_UQRSHRN_S(x, sh) MIN(do_urshr(x, sh), UINT16_MAX)
@@ -XXX,XX +XXX,XX @@ DO_SHRNB(sve2_uqrshrnb_d, uint64_t, uint32_t, DO_UQRSHRN_D)
 
 DO_SHRNT(sve2_uqrshrnt_h, uint16_t, uint8_t, H1_2, H1, DO_UQRSHRN_H)
 DO_SHRNT(sve2_uqrshrnt_s, uint32_t, uint16_t, H1_4, H1_2, DO_UQRSHRN_S)
-DO_SHRNT(sve2_uqrshrnt_d, uint64_t, uint32_t,     , H1_4, DO_UQRSHRN_D)
+DO_SHRNT(sve2_uqrshrnt_d, uint64_t, uint32_t, H1_8, H1_4, DO_UQRSHRN_D)
 
 #undef DO_SHRNB
 #undef DO_SHRNT
@@ -XXX,XX +XXX,XX @@ DO_BINOPNB(sve2_addhnb_d, uint64_t, uint32_t, 32, DO_ADDHN)
 
 DO_BINOPNT(sve2_addhnt_h, uint16_t, uint8_t, 8, H1_2, H1, DO_ADDHN)
 DO_BINOPNT(sve2_addhnt_s, uint32_t, uint16_t, 16, H1_4, H1_2, DO_ADDHN)
-DO_BINOPNT(sve2_addhnt_d, uint64_t, uint32_t, 32,     , H1_4, DO_ADDHN)
+DO_BINOPNT(sve2_addhnt_d, uint64_t, uint32_t, 32, H1_8, H1_4, DO_ADDHN)
 
 DO_BINOPNB(sve2_raddhnb_h, uint16_t, uint8_t, 8, DO_RADDHN)
 DO_BINOPNB(sve2_raddhnb_s, uint32_t, uint16_t, 16, DO_RADDHN)
@@ -XXX,XX +XXX,XX @@ DO_BINOPNB(sve2_raddhnb_d, uint64_t, uint32_t, 32, DO_RADDHN)
 
 DO_BINOPNT(sve2_raddhnt_h, uint16_t, uint8_t, 8, H1_2, H1, DO_RADDHN)
 DO_BINOPNT(sve2_raddhnt_s, uint32_t, uint16_t, 16, H1_4, H1_2, DO_RADDHN)
-DO_BINOPNT(sve2_raddhnt_d, uint64_t, uint32_t, 32,     , H1_4, DO_RADDHN)
+DO_BINOPNT(sve2_raddhnt_d, uint64_t, uint32_t, 32, H1_8, H1_4, DO_RADDHN)
 
 DO_BINOPNB(sve2_subhnb_h, uint16_t, uint8_t, 8, DO_SUBHN)
 DO_BINOPNB(sve2_subhnb_s, uint32_t, uint16_t, 16, DO_SUBHN)
@@ -XXX,XX +XXX,XX @@ DO_BINOPNB(sve2_subhnb_d, uint64_t, uint32_t, 32, DO_SUBHN)
 
 DO_BINOPNT(sve2_subhnt_h, uint16_t, uint8_t, 8, H1_2, H1, DO_SUBHN)
 DO_BINOPNT(sve2_subhnt_s, uint32_t, uint16_t, 16, H1_4, H1_2, DO_SUBHN)
-DO_BINOPNT(sve2_subhnt_d, uint64_t, uint32_t, 32,     , H1_4, DO_SUBHN)
+DO_BINOPNT(sve2_subhnt_d, uint64_t, uint32_t, 32, H1_8, H1_4, DO_SUBHN)
 
 DO_BINOPNB(sve2_rsubhnb_h, uint16_t, uint8_t, 8, DO_RSUBHN)
 DO_BINOPNB(sve2_rsubhnb_s, uint32_t, uint16_t, 16, DO_RSUBHN)
@@ -XXX,XX +XXX,XX @@ DO_BINOPNB(sve2_rsubhnb_d, uint64_t, uint32_t, 32, DO_RSUBHN)
 
 DO_BINOPNT(sve2_rsubhnt_h, uint16_t, uint8_t, 8, H1_2, H1, DO_RSUBHN)
 DO_BINOPNT(sve2_rsubhnt_s, uint32_t, uint16_t, 16, H1_4, H1_2, DO_RSUBHN)
-DO_BINOPNT(sve2_rsubhnt_d, uint64_t, uint32_t, 32,     , H1_4, DO_RSUBHN)
+DO_BINOPNT(sve2_rsubhnt_d, uint64_t, uint32_t, 32, H1_8, H1_4, DO_RSUBHN)
 
 #undef DO_RSUBHN
 #undef DO_SUBHN
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, uint64_t val, uint32_t desc) \
 DO_INSR(sve_insr_b, uint8_t, H1)
 DO_INSR(sve_insr_h, uint16_t, H1_2)
 DO_INSR(sve_insr_s, uint32_t, H1_4)
-DO_INSR(sve_insr_d, uint64_t, )
+DO_INSR(sve_insr_d, uint64_t, H1_8)
 
 #undef DO_INSR
 
@@ -XXX,XX +XXX,XX @@ void HELPER(sve2_tbx_##SUFF)(void *vd, void *vn, void *vm, uint32_t desc) \
 DO_TB(b, uint8_t, H1)
 DO_TB(h, uint16_t, H2)
 DO_TB(s, uint32_t, H4)
-DO_TB(d, uint64_t,   )
+DO_TB(d, uint64_t, H8)
 
 #undef DO_TB
 
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, uint32_t desc)           \
 
 DO_UNPK(sve_sunpk_h, int16_t, int8_t, H2, H1)
 DO_UNPK(sve_sunpk_s, int32_t, int16_t, H4, H2)
-DO_UNPK(sve_sunpk_d, int64_t, int32_t, , H4)
+DO_UNPK(sve_sunpk_d, int64_t, int32_t, H8, H4)
 
 DO_UNPK(sve_uunpk_h, uint16_t, uint8_t, H2, H1)
 DO_UNPK(sve_uunpk_s, uint32_t, uint16_t, H4, H2)
-DO_UNPK(sve_uunpk_d, uint64_t, uint32_t, , H4)
+DO_UNPK(sve_uunpk_d, uint64_t, uint32_t, H8, H4)
 
 #undef DO_UNPK
 
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc)       \
 DO_ZIP(sve_zip_b, uint8_t, H1)
 DO_ZIP(sve_zip_h, uint16_t, H1_2)
 DO_ZIP(sve_zip_s, uint32_t, H1_4)
-DO_ZIP(sve_zip_d, uint64_t, )
+DO_ZIP(sve_zip_d, uint64_t, H1_8)
 DO_ZIP(sve2_zip_q, Int128, )
 
 #define DO_UZP(NAME, TYPE, H) \
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc)         \
 DO_UZP(sve_uzp_b, uint8_t, H1)
 DO_UZP(sve_uzp_h, uint16_t, H1_2)
 DO_UZP(sve_uzp_s, uint32_t, H1_4)
-DO_UZP(sve_uzp_d, uint64_t, )
+DO_UZP(sve_uzp_d, uint64_t, H1_8)
 DO_UZP(sve2_uzp_q, Int128, )
 
 #define DO_TRN(NAME, TYPE, H) \
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc)         \
 DO_TRN(sve_trn_b, uint8_t, H1)
 DO_TRN(sve_trn_h, uint16_t, H1_2)
 DO_TRN(sve_trn_s, uint32_t, H1_4)
-DO_TRN(sve_trn_d, uint64_t, )
+DO_TRN(sve_trn_d, uint64_t, H1_8)
 DO_TRN(sve2_trn_q, Int128, )
 
 #undef DO_ZIP
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(NAME)(void *vd, void *vn, void *vm, void *vg, uint32_t desc) \
 #define DO_CMP_PPZZ_S(NAME, TYPE, OP) \
     DO_CMP_PPZZ(NAME, TYPE, OP, H1_4, 0x1111111111111111ull)
 #define DO_CMP_PPZZ_D(NAME, TYPE, OP) \
-    DO_CMP_PPZZ(NAME, TYPE, OP,     , 0x0101010101010101ull)
+    DO_CMP_PPZZ(NAME, TYPE, OP, H1_8, 0x0101010101010101ull)
 
 DO_CMP_PPZZ_B(sve_cmpeq_ppzz_b, uint8_t,  ==)
 DO_CMP_PPZZ_H(sve_cmpeq_ppzz_h, uint16_t, ==)
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(NAME)(void *vd, void *vn, void *vg, uint32_t desc)   \
 #define DO_CMP_PPZI_S(NAME, TYPE, OP) \
     DO_CMP_PPZI(NAME, TYPE, OP, H1_4, 0x1111111111111111ull)
 #define DO_CMP_PPZI_D(NAME, TYPE, OP) \
-    DO_CMP_PPZI(NAME, TYPE, OP,     , 0x0101010101010101ull)
+    DO_CMP_PPZI(NAME, TYPE, OP, H1_8, 0x0101010101010101ull)
 
 DO_CMP_PPZI_B(sve_cmpeq_ppzi_b, uint8_t,  ==)
 DO_CMP_PPZI_H(sve_cmpeq_ppzi_h, uint16_t, ==)
@@ -XXX,XX +XXX,XX @@ uint64_t HELPER(NAME)(void *vn, void *vg, void *vs, uint32_t desc)    \
 
 DO_REDUCE(sve_faddv_h, float16, H1_2, add, float16_zero)
 DO_REDUCE(sve_faddv_s, float32, H1_4, add, float32_zero)
-DO_REDUCE(sve_faddv_d, float64,     , add, float64_zero)
+DO_REDUCE(sve_faddv_d, float64, H1_8, add, float64_zero)
 
 /* Identity is floatN_default_nan, without the function call.  */
 DO_REDUCE(sve_fminnmv_h, float16, H1_2, minnum, 0x7E00)
 DO_REDUCE(sve_fminnmv_s, float32, H1_4, minnum, 0x7FC00000)
-DO_REDUCE(sve_fminnmv_d, float64,     , minnum, 0x7FF8000000000000ULL)
+DO_REDUCE(sve_fminnmv_d, float64, H1_8, minnum, 0x7FF8000000000000ULL)
 
 DO_REDUCE(sve_fmaxnmv_h, float16, H1_2, maxnum, 0x7E00)
 DO_REDUCE(sve_fmaxnmv_s, float32, H1_4, maxnum, 0x7FC00000)
-DO_REDUCE(sve_fmaxnmv_d, float64,     , maxnum, 0x7FF8000000000000ULL)
+DO_REDUCE(sve_fmaxnmv_d, float64, H1_8, maxnum, 0x7FF8000000000000ULL)
 
 DO_REDUCE(sve_fminv_h, float16, H1_2, min, float16_infinity)
 DO_REDUCE(sve_fminv_s, float32, H1_4, min, float32_infinity)
-DO_REDUCE(sve_fminv_d, float64,     , min, float64_infinity)
+DO_REDUCE(sve_fminv_d, float64, H1_8, min, float64_infinity)
 
 DO_REDUCE(sve_fmaxv_h, float16, H1_2, max, float16_chs(float16_infinity))
 DO_REDUCE(sve_fmaxv_s, float32, H1_4, max, float32_chs(float32_infinity))
-DO_REDUCE(sve_fmaxv_d, float64,     , max, float64_chs(float64_infinity))
+DO_REDUCE(sve_fmaxv_d, float64, H1_8, max, float64_chs(float64_infinity))
 
 #undef DO_REDUCE
 
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg,       \
 
 DO_ZPZZ_FP(sve_fadd_h, uint16_t, H1_2, float16_add)
 DO_ZPZZ_FP(sve_fadd_s, uint32_t, H1_4, float32_add)
-DO_ZPZZ_FP(sve_fadd_d, uint64_t,     , float64_add)
+DO_ZPZZ_FP(sve_fadd_d, uint64_t, H1_8, float64_add)
 
 DO_ZPZZ_FP(sve_fsub_h, uint16_t, H1_2, float16_sub)
 DO_ZPZZ_FP(sve_fsub_s, uint32_t, H1_4, float32_sub)
-DO_ZPZZ_FP(sve_fsub_d, uint64_t,     , float64_sub)
+DO_ZPZZ_FP(sve_fsub_d, uint64_t, H1_8, float64_sub)
 
 DO_ZPZZ_FP(sve_fmul_h, uint16_t, H1_2, float16_mul)
 DO_ZPZZ_FP(sve_fmul_s, uint32_t, H1_4, float32_mul)
-DO_ZPZZ_FP(sve_fmul_d, uint64_t,     , float64_mul)
+DO_ZPZZ_FP(sve_fmul_d, uint64_t, H1_8, float64_mul)
 
 DO_ZPZZ_FP(sve_fdiv_h, uint16_t, H1_2, float16_div)
 DO_ZPZZ_FP(sve_fdiv_s, uint32_t, H1_4, float32_div)
-DO_ZPZZ_FP(sve_fdiv_d, uint64_t,     , float64_div)
+DO_ZPZZ_FP(sve_fdiv_d, uint64_t, H1_8, float64_div)
 
 DO_ZPZZ_FP(sve_fmin_h, uint16_t, H1_2, float16_min)
 DO_ZPZZ_FP(sve_fmin_s, uint32_t, H1_4, float32_min)
-DO_ZPZZ_FP(sve_fmin_d, uint64_t,     , float64_min)
+DO_ZPZZ_FP(sve_fmin_d, uint64_t, H1_8, float64_min)
 
 DO_ZPZZ_FP(sve_fmax_h, uint16_t, H1_2, float16_max)
 DO_ZPZZ_FP(sve_fmax_s, uint32_t, H1_4, float32_max)
-DO_ZPZZ_FP(sve_fmax_d, uint64_t,     , float64_max)
+DO_ZPZZ_FP(sve_fmax_d, uint64_t, H1_8, float64_max)
 
 DO_ZPZZ_FP(sve_fminnum_h, uint16_t, H1_2, float16_minnum)
 DO_ZPZZ_FP(sve_fminnum_s, uint32_t, H1_4, float32_minnum)
-DO_ZPZZ_FP(sve_fminnum_d, uint64_t,     , float64_minnum)
+DO_ZPZZ_FP(sve_fminnum_d, uint64_t, H1_8, float64_minnum)
 
 DO_ZPZZ_FP(sve_fmaxnum_h, uint16_t, H1_2, float16_maxnum)
 DO_ZPZZ_FP(sve_fmaxnum_s, uint32_t, H1_4, float32_maxnum)
-DO_ZPZZ_FP(sve_fmaxnum_d, uint64_t,     , float64_maxnum)
+DO_ZPZZ_FP(sve_fmaxnum_d, uint64_t, H1_8, float64_maxnum)
 
 static inline float16 abd_h(float16 a, float16 b, float_status *s)
 {
@@ -XXX,XX +XXX,XX @@ static inline float64 abd_d(float64 a, float64 b, float_status *s)
 
 DO_ZPZZ_FP(sve_fabd_h, uint16_t, H1_2, abd_h)
 DO_ZPZZ_FP(sve_fabd_s, uint32_t, H1_4, abd_s)
-DO_ZPZZ_FP(sve_fabd_d, uint64_t,     , abd_d)
+DO_ZPZZ_FP(sve_fabd_d, uint64_t, H1_8, abd_d)
 
 static inline float64 scalbn_d(float64 a, int64_t b, float_status *s)
 {
@@ -XXX,XX +XXX,XX @@ static inline float64 scalbn_d(float64 a, int64_t b, float_status *s)
 
 DO_ZPZZ_FP(sve_fscalbn_h, int16_t, H1_2, float16_scalbn)
 DO_ZPZZ_FP(sve_fscalbn_s, int32_t, H1_4, float32_scalbn)
-DO_ZPZZ_FP(sve_fscalbn_d, int64_t,     , scalbn_d)
+DO_ZPZZ_FP(sve_fscalbn_d, int64_t, H1_8, scalbn_d)
 
 DO_ZPZZ_FP(sve_fmulx_h, uint16_t, H1_2, helper_advsimd_mulxh)
 DO_ZPZZ_FP(sve_fmulx_s, uint32_t, H1_4, helper_vfp_mulxs)
-DO_ZPZZ_FP(sve_fmulx_d, uint64_t,     , helper_vfp_mulxd)
+DO_ZPZZ_FP(sve_fmulx_d, uint64_t, H1_8, helper_vfp_mulxd)
 
 #undef DO_ZPZZ_FP
 
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vg, uint64_t scalar,  \
 
 DO_ZPZS_FP(sve_fadds_h, float16, H1_2, float16_add)
 DO_ZPZS_FP(sve_fadds_s, float32, H1_4, float32_add)
-DO_ZPZS_FP(sve_fadds_d, float64,     , float64_add)
+DO_ZPZS_FP(sve_fadds_d, float64, H1_8, float64_add)
 
 DO_ZPZS_FP(sve_fsubs_h, float16, H1_2, float16_sub)
 DO_ZPZS_FP(sve_fsubs_s, float32, H1_4, float32_sub)
-DO_ZPZS_FP(sve_fsubs_d, float64,     , float64_sub)
+DO_ZPZS_FP(sve_fsubs_d, float64, H1_8, float64_sub)
 
 DO_ZPZS_FP(sve_fmuls_h, float16, H1_2, float16_mul)
 DO_ZPZS_FP(sve_fmuls_s, float32, H1_4, float32_mul)
-DO_ZPZS_FP(sve_fmuls_d, float64,     , float64_mul)
+DO_ZPZS_FP(sve_fmuls_d, float64, H1_8, float64_mul)
 
 static inline float16 subr_h(float16 a, float16 b, float_status *s)
 {
@@ -XXX,XX +XXX,XX @@ static inline float64 subr_d(float64 a, float64 b, float_status *s)
 
 DO_ZPZS_FP(sve_fsubrs_h, float16, H1_2, subr_h)
 DO_ZPZS_FP(sve_fsubrs_s, float32, H1_4, subr_s)
-DO_ZPZS_FP(sve_fsubrs_d, float64,     , subr_d)
+DO_ZPZS_FP(sve_fsubrs_d, float64, H1_8, subr_d)
 
 DO_ZPZS_FP(sve_fmaxnms_h, float16, H1_2, float16_maxnum)
 DO_ZPZS_FP(sve_fmaxnms_s, float32, H1_4, float32_maxnum)
-DO_ZPZS_FP(sve_fmaxnms_d, float64,     , float64_maxnum)
+DO_ZPZS_FP(sve_fmaxnms_d, float64, H1_8, float64_maxnum)
 
 DO_ZPZS_FP(sve_fminnms_h, float16, H1_2, float16_minnum)
 DO_ZPZS_FP(sve_fminnms_s, float32, H1_4, float32_minnum)
-DO_ZPZS_FP(sve_fminnms_d, float64,     , float64_minnum)
+DO_ZPZS_FP(sve_fminnms_d, float64, H1_8, float64_minnum)
 
 DO_ZPZS_FP(sve_fmaxs_h, float16, H1_2, float16_max)
 DO_ZPZS_FP(sve_fmaxs_s, float32, H1_4, float32_max)
-DO_ZPZS_FP(sve_fmaxs_d, float64,     , float64_max)
+DO_ZPZS_FP(sve_fmaxs_d, float64, H1_8, float64_max)
 
 DO_ZPZS_FP(sve_fmins_h, float16, H1_2, float16_min)
 DO_ZPZS_FP(sve_fmins_s, float32, H1_4, float32_min)
-DO_ZPZS_FP(sve_fmins_d, float64,     , float64_min)
+DO_ZPZS_FP(sve_fmins_d, float64, H1_8, float64_min)
 
 /* Fully general two-operand expander, controlled by a predicate,
  * With the extra float_status parameter.
@@ -XXX,XX +XXX,XX @@ static inline uint64_t vfp_float64_to_uint64_rtz(float64 f, float_status *s)
 DO_ZPZ_FP(sve_fcvt_sh, uint32_t, H1_4, sve_f32_to_f16)
 DO_ZPZ_FP(sve_fcvt_hs, uint32_t, H1_4, sve_f16_to_f32)
 DO_ZPZ_FP(sve_bfcvt,   uint32_t, H1_4, float32_to_bfloat16)
-DO_ZPZ_FP(sve_fcvt_dh, uint64_t,     , sve_f64_to_f16)
-DO_ZPZ_FP(sve_fcvt_hd, uint64_t,     , sve_f16_to_f64)
-DO_ZPZ_FP(sve_fcvt_ds, uint64_t,     , float64_to_float32)
-DO_ZPZ_FP(sve_fcvt_sd, uint64_t,     , float32_to_float64)
+DO_ZPZ_FP(sve_fcvt_dh, uint64_t, H1_8, sve_f64_to_f16)
+DO_ZPZ_FP(sve_fcvt_hd, uint64_t, H1_8, sve_f16_to_f64)
+DO_ZPZ_FP(sve_fcvt_ds, uint64_t, H1_8, float64_to_float32)
+DO_ZPZ_FP(sve_fcvt_sd, uint64_t, H1_8, float32_to_float64)
 
 DO_ZPZ_FP(sve_fcvtzs_hh, uint16_t, H1_2, vfp_float16_to_int16_rtz)
 DO_ZPZ_FP(sve_fcvtzs_hs, uint32_t, H1_4, helper_vfp_tosizh)
 DO_ZPZ_FP(sve_fcvtzs_ss, uint32_t, H1_4, helper_vfp_tosizs)
-DO_ZPZ_FP(sve_fcvtzs_hd, uint64_t,     , vfp_float16_to_int64_rtz)
-DO_ZPZ_FP(sve_fcvtzs_sd, uint64_t,     , vfp_float32_to_int64_rtz)
-DO_ZPZ_FP(sve_fcvtzs_ds, uint64_t,     , helper_vfp_tosizd)
-DO_ZPZ_FP(sve_fcvtzs_dd, uint64_t,     , vfp_float64_to_int64_rtz)
+DO_ZPZ_FP(sve_fcvtzs_hd, uint64_t, H1_8, vfp_float16_to_int64_rtz)
+DO_ZPZ_FP(sve_fcvtzs_sd, uint64_t, H1_8, vfp_float32_to_int64_rtz)
+DO_ZPZ_FP(sve_fcvtzs_ds, uint64_t, H1_8, helper_vfp_tosizd)
+DO_ZPZ_FP(sve_fcvtzs_dd, uint64_t, H1_8, vfp_float64_to_int64_rtz)
 
 DO_ZPZ_FP(sve_fcvtzu_hh, uint16_t, H1_2, vfp_float16_to_uint16_rtz)
 DO_ZPZ_FP(sve_fcvtzu_hs, uint32_t, H1_4, helper_vfp_touizh)
 DO_ZPZ_FP(sve_fcvtzu_ss, uint32_t, H1_4, helper_vfp_touizs)
-DO_ZPZ_FP(sve_fcvtzu_hd, uint64_t,     , vfp_float16_to_uint64_rtz)
-DO_ZPZ_FP(sve_fcvtzu_sd, uint64_t,     , vfp_float32_to_uint64_rtz)
-DO_ZPZ_FP(sve_fcvtzu_ds, uint64_t,     , helper_vfp_touizd)
-DO_ZPZ_FP(sve_fcvtzu_dd, uint64_t,     , vfp_float64_to_uint64_rtz)
+DO_ZPZ_FP(sve_fcvtzu_hd, uint64_t, H1_8, vfp_float16_to_uint64_rtz)
+DO_ZPZ_FP(sve_fcvtzu_sd, uint64_t, H1_8, vfp_float32_to_uint64_rtz)
+DO_ZPZ_FP(sve_fcvtzu_ds, uint64_t, H1_8, helper_vfp_touizd)
+DO_ZPZ_FP(sve_fcvtzu_dd, uint64_t, H1_8, vfp_float64_to_uint64_rtz)
 
 DO_ZPZ_FP(sve_frint_h, uint16_t, H1_2, helper_advsimd_rinth)
 DO_ZPZ_FP(sve_frint_s, uint32_t, H1_4, helper_rints)
-DO_ZPZ_FP(sve_frint_d, uint64_t,     , helper_rintd)
+DO_ZPZ_FP(sve_frint_d, uint64_t, H1_8, helper_rintd)
 
 DO_ZPZ_FP(sve_frintx_h, uint16_t, H1_2, float16_round_to_int)
 DO_ZPZ_FP(sve_frintx_s, uint32_t, H1_4, float32_round_to_int)
-DO_ZPZ_FP(sve_frintx_d, uint64_t,     , float64_round_to_int)
+DO_ZPZ_FP(sve_frintx_d, uint64_t, H1_8, float64_round_to_int)
 
 DO_ZPZ_FP(sve_frecpx_h, uint16_t, H1_2, helper_frecpx_f16)
 DO_ZPZ_FP(sve_frecpx_s, uint32_t, H1_4, helper_frecpx_f32)
-DO_ZPZ_FP(sve_frecpx_d, uint64_t,     , helper_frecpx_f64)
+DO_ZPZ_FP(sve_frecpx_d, uint64_t, H1_8, helper_frecpx_f64)
 
 DO_ZPZ_FP(sve_fsqrt_h, uint16_t, H1_2, float16_sqrt)
 DO_ZPZ_FP(sve_fsqrt_s, uint32_t, H1_4, float32_sqrt)
-DO_ZPZ_FP(sve_fsqrt_d, uint64_t,     , float64_sqrt)
+DO_ZPZ_FP(sve_fsqrt_d, uint64_t, H1_8, float64_sqrt)
 
 DO_ZPZ_FP(sve_scvt_hh, uint16_t, H1_2, int16_to_float16)
 DO_ZPZ_FP(sve_scvt_sh, uint32_t, H1_4, int32_to_float16)
 DO_ZPZ_FP(sve_scvt_ss, uint32_t, H1_4, int32_to_float32)
-DO_ZPZ_FP(sve_scvt_sd, uint64_t,     , int32_to_float64)
-DO_ZPZ_FP(sve_scvt_dh, uint64_t,     , int64_to_float16)
-DO_ZPZ_FP(sve_scvt_ds, uint64_t,     , int64_to_float32)
-DO_ZPZ_FP(sve_scvt_dd, uint64_t,     , int64_to_float64)
+DO_ZPZ_FP(sve_scvt_sd, uint64_t, H1_8, int32_to_float64)
+DO_ZPZ_FP(sve_scvt_dh, uint64_t, H1_8, int64_to_float16)
+DO_ZPZ_FP(sve_scvt_ds, uint64_t, H1_8, int64_to_float32)
+DO_ZPZ_FP(sve_scvt_dd, uint64_t, H1_8, int64_to_float64)
 
 DO_ZPZ_FP(sve_ucvt_hh, uint16_t, H1_2, uint16_to_float16)
 DO_ZPZ_FP(sve_ucvt_sh, uint32_t, H1_4, uint32_to_float16)
 DO_ZPZ_FP(sve_ucvt_ss, uint32_t, H1_4, uint32_to_float32)
-DO_ZPZ_FP(sve_ucvt_sd, uint64_t,     , uint32_to_float64)
-DO_ZPZ_FP(sve_ucvt_dh, uint64_t,     , uint64_to_float16)
-DO_ZPZ_FP(sve_ucvt_ds, uint64_t,     , uint64_to_float32)
-DO_ZPZ_FP(sve_ucvt_dd, uint64_t,     , uint64_to_float64)
+DO_ZPZ_FP(sve_ucvt_sd, uint64_t, H1_8, uint32_to_float64)
+DO_ZPZ_FP(sve_ucvt_dh, uint64_t, H1_8, uint64_to_float16)
+DO_ZPZ_FP(sve_ucvt_ds, uint64_t, H1_8, uint64_to_float32)
+DO_ZPZ_FP(sve_ucvt_dd, uint64_t, H1_8, uint64_to_float64)
 
 static int16_t do_float16_logb_as_int(float16 a, float_status *s)
 {
@@ -XXX,XX +XXX,XX @@ static int64_t do_float64_logb_as_int(float64 a, float_status *s)
 
 DO_ZPZ_FP(flogb_h, float16, H1_2, do_float16_logb_as_int)
 DO_ZPZ_FP(flogb_s, float32, H1_4, do_float32_logb_as_int)
-DO_ZPZ_FP(flogb_d, float64,     , do_float64_logb_as_int)
+DO_ZPZ_FP(flogb_d, float64, H1_8, do_float64_logb_as_int)
 
 #undef DO_ZPZ_FP
 
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg,               \
 #define DO_FPCMP_PPZZ_S(NAME, OP) \
     DO_FPCMP_PPZZ(NAME##_s, float32, H1_4, OP)
 #define DO_FPCMP_PPZZ_D(NAME, OP) \
-    DO_FPCMP_PPZZ(NAME##_d, float64,     , OP)
+    DO_FPCMP_PPZZ(NAME##_d, float64, H1_8, OP)
 
 #define DO_FPCMP_PPZZ_ALL(NAME, OP) \
     DO_FPCMP_PPZZ_H(NAME, OP)   \
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vg,            \
 #define DO_FPCMP_PPZ0_S(NAME, OP) \
     DO_FPCMP_PPZ0(NAME##_s, float32, H1_4, OP)
 #define DO_FPCMP_PPZ0_D(NAME, OP) \
-    DO_FPCMP_PPZ0(NAME##_d, float64,     , OP)
+    DO_FPCMP_PPZ0(NAME##_d, float64, H1_8, OP)
 
 #define DO_FPCMP_PPZ0_ALL(NAME, OP) \
     DO_FPCMP_PPZ0_H(NAME, OP)   \
@@ -XXX,XX +XXX,XX @@ DO_LD_PRIM_1(ld1bhu, H1_2, uint16_t, uint8_t)
 DO_LD_PRIM_1(ld1bhs, H1_2, uint16_t,  int8_t)
 DO_LD_PRIM_1(ld1bsu, H1_4, uint32_t, uint8_t)
 DO_LD_PRIM_1(ld1bss, H1_4, uint32_t,  int8_t)
-DO_LD_PRIM_1(ld1bdu,     , uint64_t, uint8_t)
-DO_LD_PRIM_1(ld1bds,     , uint64_t,  int8_t)
+DO_LD_PRIM_1(ld1bdu, H1_8, uint64_t, uint8_t)
+DO_LD_PRIM_1(ld1bds, H1_8, uint64_t,  int8_t)
 
 #define DO_ST_PRIM_1(NAME, H, TE, TM)                   \
     DO_ST_HOST(st1##NAME, H, TE, TM, stb_p)             \
@@ -XXX,XX +XXX,XX @@ DO_LD_PRIM_1(ld1bds,     , uint64_t,  int8_t)
 DO_ST_PRIM_1(bb,   H1,  uint8_t, uint8_t)
 DO_ST_PRIM_1(bh, H1_2, uint16_t, uint8_t)
 DO_ST_PRIM_1(bs, H1_4, uint32_t, uint8_t)
-DO_ST_PRIM_1(bd,     , uint64_t, uint8_t)
+DO_ST_PRIM_1(bd, H1_8, uint64_t, uint8_t)
 
 #define DO_LD_PRIM_2(NAME, H, TE, TM, LD) \
     DO_LD_HOST(ld1##NAME##_be, H, TE, TM, LD##_be_p)    \
@@ -XXX,XX +XXX,XX @@ DO_ST_PRIM_1(bd,     , uint64_t, uint8_t)
 DO_LD_PRIM_2(hh,  H1_2, uint16_t, uint16_t, lduw)
 DO_LD_PRIM_2(hsu, H1_4, uint32_t, uint16_t, lduw)
 DO_LD_PRIM_2(hss, H1_4, uint32_t,  int16_t, lduw)
-DO_LD_PRIM_2(hdu,     , uint64_t, uint16_t, lduw)
-DO_LD_PRIM_2(hds,     , uint64_t,  int16_t, lduw)
+DO_LD_PRIM_2(hdu, H1_8, uint64_t, uint16_t, lduw)
+DO_LD_PRIM_2(hds, H1_8, uint64_t,  int16_t, lduw)
 
 DO_ST_PRIM_2(hh, H1_2, uint16_t, uint16_t, stw)
 DO_ST_PRIM_2(hs, H1_4, uint32_t, uint16_t, stw)
-DO_ST_PRIM_2(hd,     , uint64_t, uint16_t, stw)
+DO_ST_PRIM_2(hd, H1_8, uint64_t, uint16_t, stw)
 
 DO_LD_PRIM_2(ss,  H1_4, uint32_t, uint32_t, ldl)
-DO_LD_PRIM_2(sdu,     , uint64_t, uint32_t, ldl)
-DO_LD_PRIM_2(sds,     , uint64_t,  int32_t, ldl)
+DO_LD_PRIM_2(sdu, H1_8, uint64_t, uint32_t, ldl)
+DO_LD_PRIM_2(sds, H1_8, uint64_t,  int32_t, ldl)
 
 DO_ST_PRIM_2(ss, H1_4, uint32_t, uint32_t, stl)
-DO_ST_PRIM_2(sd,     , uint64_t, uint32_t, stl)
+DO_ST_PRIM_2(sd, H1_8, uint64_t, uint32_t, stl)
 
-DO_LD_PRIM_2(dd,     , uint64_t, uint64_t, ldq)
-DO_ST_PRIM_2(dd,     , uint64_t, uint64_t, stq)
+DO_LD_PRIM_2(dd, H1_8, uint64_t, uint64_t, ldq)
+DO_ST_PRIM_2(dd, H1_8, uint64_t, uint64_t, stq)
 
 #undef DO_LD_TLB
 #undef DO_ST_TLB
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vg, void *status, uint32_t desc)  \
 
 DO_FCVTNT(sve_bfcvtnt,    uint32_t, uint16_t, H1_4, H1_2, float32_to_bfloat16)
 DO_FCVTNT(sve2_fcvtnt_sh, uint32_t, uint16_t, H1_4, H1_2, sve_f32_to_f16)
-DO_FCVTNT(sve2_fcvtnt_ds, uint64_t, uint32_t,     , H1_4, float64_to_float32)
+DO_FCVTNT(sve2_fcvtnt_ds, uint64_t, uint32_t, H1_8, H1_4, float64_to_float32)
 
 #define DO_FCVTLT(NAME, TYPEW, TYPEN, HW, HN, OP)                             \
 void HELPER(NAME)(void *vd, void *vn, void *vg, void *status, uint32_t desc)  \
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vg, void *status, uint32_t desc)  \
 }
 
 DO_FCVTLT(sve2_fcvtlt_hs, uint32_t, uint16_t, H1_4, H1_2, sve_f16_to_f32)
-DO_FCVTLT(sve2_fcvtlt_sd, uint64_t, uint32_t,     , H1_4, float32_to_float64)
+DO_FCVTLT(sve2_fcvtlt_sd, uint64_t, uint32_t, H1_8, H1_4, float32_to_float64)
 
 #undef DO_FCVTLT
 #undef DO_FCVTNT
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ DO_DOT_IDX(gvec_sdot_idx_b, int32_t, int8_t, int8_t, H4)
 DO_DOT_IDX(gvec_udot_idx_b, uint32_t, uint8_t, uint8_t, H4)
 DO_DOT_IDX(gvec_sudot_idx_b, int32_t, int8_t, uint8_t, H4)
 DO_DOT_IDX(gvec_usdot_idx_b, int32_t, uint8_t, int8_t, H4)
-DO_DOT_IDX(gvec_sdot_idx_h, int64_t, int16_t, int16_t, )
-DO_DOT_IDX(gvec_udot_idx_h, uint64_t, uint16_t, uint16_t, )
+DO_DOT_IDX(gvec_sdot_idx_h, int64_t, int16_t, int16_t, H8)
+DO_DOT_IDX(gvec_udot_idx_h, uint64_t, uint16_t, uint16_t, H8)
 
 void HELPER(gvec_fcaddh)(void *vd, void *vn, void *vm,
                          void *vfpst, uint32_t desc)
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \
 
 DO_MUL_IDX(gvec_mul_idx_h, uint16_t, H2)
 DO_MUL_IDX(gvec_mul_idx_s, uint32_t, H4)
-DO_MUL_IDX(gvec_mul_idx_d, uint64_t, )
+DO_MUL_IDX(gvec_mul_idx_d, uint64_t, H8)
 
 #undef DO_MUL_IDX
 
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc)   \
 
 DO_MLA_IDX(gvec_mla_idx_h, uint16_t, +, H2)
 DO_MLA_IDX(gvec_mla_idx_s, uint32_t, +, H4)
-DO_MLA_IDX(gvec_mla_idx_d, uint64_t, +,   )
+DO_MLA_IDX(gvec_mla_idx_d, uint64_t, +, H8)
 
 DO_MLA_IDX(gvec_mls_idx_h, uint16_t, -, H2)
 DO_MLA_IDX(gvec_mls_idx_s, uint32_t, -, H4)
-DO_MLA_IDX(gvec_mls_idx_d, uint64_t, -,   )
+DO_MLA_IDX(gvec_mls_idx_d, uint64_t, -, H8)
 
 #undef DO_MLA_IDX
 
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *stat, uint32_t desc) \
 
 DO_FMUL_IDX(gvec_fmul_idx_h, nop, float16, H2)
 DO_FMUL_IDX(gvec_fmul_idx_s, nop, float32, H4)
-DO_FMUL_IDX(gvec_fmul_idx_d, nop, float64, )
+DO_FMUL_IDX(gvec_fmul_idx_d, nop, float64, H8)
 
 /*
  * Non-fused multiply-accumulate operations, for Neon. NB that unlike
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *va,                  \
 
 DO_FMLA_IDX(gvec_fmla_idx_h, float16, H2)
 DO_FMLA_IDX(gvec_fmla_idx_s, float32, H4)
-DO_FMLA_IDX(gvec_fmla_idx_d, float64, )
+DO_FMLA_IDX(gvec_fmla_idx_d, float64, H8)
 
 #undef DO_FMLA_IDX
 
-- 
2.20.1

MVE has an FPSCR.QC bit similar to the A-profile Neon one; when MVE
is implemented make the bit writeable, both in the generic "load and
store FPSCR" helper functions and in the code for handling the NZCVQC
sysreg which we had previously left as "TODO when we implement MVE".
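
As an illustration only (this is not the code in the patch), the
"QC is zero vs non-zero" convention can be sketched as a standalone C
model. The SketchFpState type and the sk_* names are hypothetical;
the only architectural facts assumed are that NZCV occupies FPSCR
bits [31:28] and QC is bit 27.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_NZCV_MASK 0xf0000000u   /* FPSCR bits [31:28] */
#define SK_QC        (1u << 27)    /* FPSCR bit 27 */

/* Hypothetical stand-in for the relevant part of the CPU state */
typedef struct {
    uint32_t nzcv;   /* only bits [31:28] are meaningful */
    uint32_t qc[4];  /* QC is "set" if any element is non-zero */
} SketchFpState;

/* Write NZCVQC: QC is writeable only when MVE is present, else RES0 */
static void sk_write_nzcvqc(SketchFpState *st, uint32_t val, bool have_mve)
{
    assert(st != NULL);
    if (have_mve) {
        uint32_t qc = val & SK_QC;
        /* Writing the same value into every element is simplest */
        for (int i = 0; i < 4; i++) {
            st->qc[i] = qc;
        }
    }
    st->nzcv = val & SK_NZCV_MASK;
}

/* Read NZCVQC: without MVE this degenerates to a plain NZCV read */
static uint32_t sk_read_nzcvqc(const SketchFpState *st, bool have_mve)
{
    uint32_t val;

    assert(st != NULL);
    val = st->nzcv & SK_NZCV_MASK;
    if (have_mve && (st->qc[0] | st->qc[1] | st->qc[2] | st->qc[3])) {
        val |= SK_QC;
    }
    return val;
}
```

The model ignores every other FPSCR field; it only shows why the
value stored in the individual qc[] elements is arbitrary.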

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210614151007.4545-3-peter.maydell@linaro.org
---
 target/arm/translate-vfp.c | 30 +++++++++++++++++++++---------
 target/arm/vfp_helper.c    |  3 ++-
 2 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c
+++ b/target/arm/translate-vfp.c
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
     {
         TCGv_i32 fpscr;
         tmp = loadfn(s, opaque);
-        /*
-         * TODO: when we implement MVE, write the QC bit.
-         * For non-MVE, QC is RES0.
-         */
+        if (dc_isar_feature(aa32_mve, s)) {
+            /* QC is only present for MVE; otherwise RES0 */
+            TCGv_i32 qc = tcg_temp_new_i32();
+            tcg_gen_andi_i32(qc, tmp, FPCR_QC);
+            /*
+             * The 4 vfp.qc[] fields need only be "zero" vs "non-zero";
+             * here writing the same value into all elements is simplest.
+             */
+            tcg_gen_gvec_dup_i32(MO_32, offsetof(CPUARMState, vfp.qc),
+                                 16, 16, qc);
+        }
         tcg_gen_andi_i32(tmp, tmp, FPCR_NZCV_MASK);
         fpscr = load_cpu_field(vfp.xregs[ARM_VFP_FPSCR]);
         tcg_gen_andi_i32(fpscr, fpscr, ~FPCR_NZCV_MASK);
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
         break;
     }
 
+    if (regno == ARM_VFP_FPSCR_NZCVQC && !dc_isar_feature(aa32_mve, s)) {
+        /* QC is RES0 without MVE, so NZCVQC simplifies to NZCV */
+        regno = QEMU_VFP_FPSCR_NZCV;
+    }
+
     switch (regno) {
     case ARM_VFP_FPSCR:
         tmp = tcg_temp_new_i32();
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
         storefn(s, opaque, tmp);
         break;
     case ARM_VFP_FPSCR_NZCVQC:
-        /*
-         * TODO: MVE has a QC bit, which we probably won't store
-         * in the xregs[] field. For non-MVE, where QC is RES0,
-         * we can just fall through to the FPSCR_NZCV case.
-         */
+        tmp = tcg_temp_new_i32();
+        gen_helper_vfp_get_fpscr(tmp, cpu_env);
+        tcg_gen_andi_i32(tmp, tmp, FPCR_NZCVQC_MASK);
+        storefn(s, opaque, tmp);
+        break;
     case QEMU_VFP_FPSCR_NZCV:
         /*
          * Read just NZCV; this is a special case to avoid the
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
                                      FPCR_LTPSIZE_LENGTH);
     }
 
-    if (arm_feature(env, ARM_FEATURE_NEON)) {
+    if (arm_feature(env, ARM_FEATURE_NEON) ||
+        cpu_isar_feature(aa32_mve, cpu)) {
         /*
          * The bit we set within fpscr_q is arbitrary; the register as a
          * whole being zero/non-zero is what counts.
-- 
2.20.1

When MVE is supported, the VPR register has a place on the exception
stack frame in a previously reserved slot just above the FPSCR.
It must also be zeroed in various situations when we invalidate
FPU context.

Update the code which handles the stack frames (exception entry and
exit code, VLLDM, and VLSTM) to save/restore VPR.

Update code which invalidates FP registers (mostly also exception
entry and exit code, but also VSCCLRM and the code in
full_vfp_access_check() that corresponds to the ExecuteFPCheck()
pseudocode) to zero VPR.
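
For reference, the layout of the FP part of the frame that this code
relies on can be sketched as below. The offsets are the ones used in
the diff (relative to the start of the FP area, which is fptr for
VLSTM/VLLDM and frameptr + 0x20 for exception entry/exit); the SK_*
and sk_* names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets relative to the start of the FP area of the frame */
#define SK_FP_FPSCR_OFFSET 0x40u  /* FPSCR follows s0..s15 */
#define SK_FP_VPR_OFFSET   0x44u  /* VPR uses the formerly reserved slot */

/* Offset of single-precision register s<i> within the FP area */
static uint32_t sk_fp_sreg_offset(unsigned i)
{
    uint32_t off;

    assert(i < 32);
    off = 4 * i;
    if (i >= 16) {
        off += 8;  /* skip the two words holding FPSCR and VPR */
    }
    return off;
}

/* Same offsets as seen from the base of a full exception frame */
static uint32_t sk_exc_frame_offset(uint32_t fp_area_offset)
{
    return 0x20u + fp_area_offset;  /* integer part occupies 0x00..0x1f */
}
```

With these definitions s15 sits at 0x3c, FPSCR at 0x40, VPR at 0x44
and s16 at 0x48, which matches the "faddr += 8" adjustment in the
loops that save and restore s16..s31.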

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210614151007.4545-4-peter.maydell@linaro.org
---
 target/arm/m_helper.c         | 54 +++++++++++++++++++++++++++++------
 target/arm/translate-m-nocp.c |  5 +++-
 target/arm/translate-vfp.c    |  9 ++++--
 3 files changed, 57 insertions(+), 11 deletions(-)

diff --git a/target/arm/m_helper.c b/target/arm/m_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/m_helper.c
+++ b/target/arm/m_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_preserve_fp_state)(CPUARMState *env)
             uint32_t shi = extract64(dn, 32, 32);
 
             if (i >= 16) {
-                faddr += 8; /* skip the slot for the FPSCR */
+                faddr += 8; /* skip the slot for the FPSCR/VPR */
             }
             stacked_ok = stacked_ok &&
                 v7m_stack_write(cpu, faddr, slo, mmu_idx, STACK_LAZYFP) &&
@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_preserve_fp_state)(CPUARMState *env)
         stacked_ok = stacked_ok &&
             v7m_stack_write(cpu, fpcar + 0x40,
                             vfp_get_fpscr(env), mmu_idx, STACK_LAZYFP);
+        if (cpu_isar_feature(aa32_mve, cpu)) {
+            stacked_ok = stacked_ok &&
+                v7m_stack_write(cpu, fpcar + 0x44,
+                                env->v7m.vpr, mmu_idx, STACK_LAZYFP);
+        }
     }
 
     /*
@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_preserve_fp_state)(CPUARMState *env)
     env->v7m.fpccr[is_secure] &= ~R_V7M_FPCCR_LSPACT_MASK;
 
     if (ts) {
-        /* Clear s0 to s31 and the FPSCR */
+        /* Clear s0 to s31 and the FPSCR and VPR */
         int i;
 
         for (i = 0; i < 32; i += 2) {
             *aa32_vfp_dreg(env, i / 2) = 0;
         }
         vfp_set_fpscr(env, 0);
+        if (cpu_isar_feature(aa32_mve, cpu)) {
+            env->v7m.vpr = 0;
+        }
     }
     /*
-     * Otherwise s0 to s15 and FPSCR are UNKNOWN; we choose to leave them
+     * Otherwise s0 to s15, FPSCR and VPR are UNKNOWN; we choose to leave them
      * unchanged.
      */
 }
@@ -XXX,XX +XXX,XX @@ static void v7m_update_fpccr(CPUARMState *env, uint32_t frameptr,
 void HELPER(v7m_vlstm)(CPUARMState *env, uint32_t fptr)
 {
     /* fptr is the value of Rn, the frame pointer we store the FP regs to */
+    ARMCPU *cpu = env_archcpu(env);
     bool s = env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_S_MASK;
     bool lspact = env->v7m.fpccr[s] & R_V7M_FPCCR_LSPACT_MASK;
     uintptr_t ra = GETPC();
@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_vlstm)(CPUARMState *env, uint32_t fptr)
             cpu_stl_data_ra(env, faddr + 4, shi, ra);
         }
         cpu_stl_data_ra(env, fptr + 0x40, vfp_get_fpscr(env), ra);
+        if (cpu_isar_feature(aa32_mve, cpu)) {
+            cpu_stl_data_ra(env, fptr + 0x44, env->v7m.vpr, ra);
+        }
 
         /*
-         * If TS is 0 then s0 to s15 and FPSCR are UNKNOWN; we choose to
+         * If TS is 0 then s0 to s15, FPSCR and VPR are UNKNOWN; we choose to
          * leave them unchanged, matching our choice in v7m_preserve_fp_state.
          */
         if (ts) {
@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_vlstm)(CPUARMState *env, uint32_t fptr)
                 *aa32_vfp_dreg(env, i / 2) = 0;
             }
             vfp_set_fpscr(env, 0);
+            if (cpu_isar_feature(aa32_mve, cpu)) {
+                env->v7m.vpr = 0;
+            }
         }
     } else {
         v7m_update_fpccr(env, fptr, false);
@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_vlstm)(CPUARMState *env, uint32_t fptr)
 
 void HELPER(v7m_vlldm)(CPUARMState *env, uint32_t fptr)
 {
+    ARMCPU *cpu = env_archcpu(env);
     uintptr_t ra = GETPC();
 
     /* fptr is the value of Rn, the frame pointer we load the FP regs from */
@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_vlldm)(CPUARMState *env, uint32_t fptr)
             uint32_t faddr = fptr + 4 * i;
 
             if (i >= 16) {
-                faddr += 8; /* skip the slot for the FPSCR */
+                faddr += 8; /* skip the slot for the FPSCR and VPR */
             }
 
             slo = cpu_ldl_data_ra(env, faddr, ra);
@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_vlldm)(CPUARMState *env, uint32_t fptr)
         }
         fpscr = cpu_ldl_data_ra(env, fptr + 0x40, ra);
         vfp_set_fpscr(env, fpscr);
+        if (cpu_isar_feature(aa32_mve, cpu)) {
+            env->v7m.vpr = cpu_ldl_data_ra(env, fptr + 0x44, ra);
+        }
     }
 
     env->v7m.control[M_REG_S] |= R_V7M_CONTROL_FPCA_MASK;
@@ -XXX,XX +XXX,XX @@ static bool v7m_push_stack(ARMCPU *cpu)
                     uint32_t shi = extract64(dn, 32, 32);
 
                     if (i >= 16) {
-                        faddr += 8; /* skip the slot for the FPSCR */
+                        faddr += 8; /* skip the slot for the FPSCR and VPR */
                     }
                     stacked_ok = stacked_ok &&
                         v7m_stack_write(cpu, faddr, slo,
@@ -XXX,XX +XXX,XX @@ static bool v7m_push_stack(ARMCPU *cpu)
                 stacked_ok = stacked_ok &&
                     v7m_stack_write(cpu, frameptr + 0x60,
                                     vfp_get_fpscr(env), mmu_idx, STACK_NORMAL);
+                if (cpu_isar_feature(aa32_mve, cpu)) {
+                    stacked_ok = stacked_ok &&
+                        v7m_stack_write(cpu, frameptr + 0x64,
+                                        env->v7m.vpr, mmu_idx, STACK_NORMAL);
+                }
                 if (cpacr_pass) {
                     for (i = 0; i < ((framesize == 0xa8) ? 32 : 16); i += 2) {
                         *aa32_vfp_dreg(env, i / 2) = 0;
                     }
                     vfp_set_fpscr(env, 0);
+                    if (cpu_isar_feature(aa32_mve, cpu)) {
+                        env->v7m.vpr = 0;
+                    }
                 }
             } else {
                 /* Lazy stacking enabled, save necessary info to stack later */
@@ -XXX,XX +XXX,XX @@ static void do_v7m_exception_exit(ARMCPU *cpu)
                     v7m_exception_taken(cpu, excret, true, false);
                 }
             }
-            /* Clear s0..s15 and FPSCR; TODO also VPR when MVE is implemented */
+            /* Clear s0..s15, FPSCR and VPR */
             int i;
 
             for (i = 0; i < 16; i += 2) {
                 *aa32_vfp_dreg(env, i / 2) = 0;
             }
             vfp_set_fpscr(env, 0);
+            if (cpu_isar_feature(aa32_mve, cpu)) {
+                env->v7m.vpr = 0;
+            }
         }
     }
 
@@ -XXX,XX +XXX,XX @@ static void do_v7m_exception_exit(ARMCPU *cpu)
                     uint32_t faddr = frameptr + 0x20 + 4 * i;
 
                     if (i >= 16) {
-                        faddr += 8; /* Skip the slot for the FPSCR */
+                        faddr += 8; /* Skip the slot for the FPSCR and VPR */
                     }
 
                     pop_ok = pop_ok &&
@@ -XXX,XX +XXX,XX @@ static void do_v7m_exception_exit(ARMCPU *cpu)
                 if (pop_ok) {
                     vfp_set_fpscr(env, fpscr);
                 }
+                if (cpu_isar_feature(aa32_mve, cpu)) {
+                    pop_ok = pop_ok &&
+                        v7m_stack_read(cpu, &env->v7m.vpr,
+                                       frameptr + 0x64, mmu_idx);
+                }
                 if (!pop_ok) {
                     /*
                      * These regs are 0 if security extension present;
@@ -XXX,XX +XXX,XX @@ static void do_v7m_exception_exit(ARMCPU *cpu)
                         *aa32_vfp_dreg(env, i / 2) = 0;
                     }
                     vfp_set_fpscr(env, 0);
+                    if (cpu_isar_feature(aa32_mve, cpu)) {
+                        env->v7m.vpr = 0;
+                    }
                 }
             }
         }
diff --git a/target/arm/translate-m-nocp.c b/target/arm/translate-m-nocp.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-m-nocp.c
+++ b/target/arm/translate-m-nocp.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VSCCLRM(DisasContext *s, arg_VSCCLRM *a)
         btmreg++;
     }
     assert(btmreg == topreg + 1);
-    /* TODO: when MVE is implemented, zero VPR here */
+    if (dc_isar_feature(aa32_mve, s)) {
+        TCGv_i32 z32 = tcg_const_i32(0);
+        store_cpu_field(z32, v7m.vpr);
+    }
     return true;
 }
 
diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c
+++ b/target/arm/translate-vfp.c
@@ -XXX,XX +XXX,XX @@ static bool full_vfp_access_check(DisasContext *s, bool ignore_vfp_enabled)
 
         if (s->v7m_new_fp_ctxt_needed) {
             /*
-             * Create new FP context by updating CONTROL.FPCA, CONTROL.SFPA
-             * and the FPSCR.
+             * Create new FP context by updating CONTROL.FPCA, CONTROL.SFPA,
+             * the FPSCR, and VPR.
              */
             TCGv_i32 control, fpscr;
             uint32_t bits = R_V7M_CONTROL_FPCA_MASK;
@@ -XXX,XX +XXX,XX @@ static bool full_vfp_access_check(DisasContext *s, bool ignore_vfp_enabled)
             fpscr = load_cpu_field(v7m.fpdscr[s->v8m_secure]);
             gen_helper_vfp_set_fpscr(cpu_env, fpscr);
             tcg_temp_free_i32(fpscr);
+            if (dc_isar_feature(aa32_mve, s)) {
+                TCGv_i32 z32 = tcg_const_i32(0);
+                store_cpu_field(z32, v7m.vpr);
+            }
+
             /*
              * We don't need to arrange to end the TB, because the only
              * parts of FPSCR which we cache in the TB flags are the VECLEN
-- 
2.20.1

On A-profile, PSR bits [15:10][26:25] are always the IT state bits.
On M-profile, some of the reserved encodings of the IT state are used
to instead indicate partial progress through instructions that were
interrupted partway through by an exception and can be resumed.

These resumable instructions fall into two categories:

(1) load/store multiple instructions, where these bits are called
"ICI" and specify the register in the ldm/stm list where execution
should resume.  (Specifically: LDM, STM, VLDM, VSTM, VLLDM, VLSTM,
CLRM, VSCCLRM.)

(2) MVE instructions subject to beatwise execution, where these bits
are called "ECI" and specify which beats in this and possibly also
the following MVE insn have been executed.

There are also a few insns (LE, LETP, and BKPT) which do not use the
ICI/ECI bits but must leave them alone.

Otherwise, we should raise an INVSTATE UsageFault for any attempt to
execute an insn with non-zero ICI/ECI bits.

So far we have been able to ignore ECI/ICI, because the architecture
allows the IMPDEF choice of "always restart load/store multiple from
the beginning regardless of ICI state", so the only thing we have
been missing is that we don't raise the INVSTATE fault for bad guest
code.  However, MVE requires that we honour ECI bits and do not
re-execute beats of an insn that have already been executed.

Add the support in the decoder for handling ECI/ICI:
 * identify the ECI/ICI case in the CONDEXEC TB flags
 * when a load/store multiple insn succeeds, it updates the ECI/ICI
   state (both in DisasContext and in the CPU state), and sets a flag
   to say that the ECI/ICI state was handled
 * if we find that the insn we just decoded did not handle the
   ECI/ICI state, we delete all the code that we just generated for
   it and instead emit the code to raise the INVSTATE UsageFault.
   This allows us to avoid having to update every non-MVE
   non-LDM/STM insn to make it check for "is ECI/ICI set?".
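
The "generate, then rewind if the insn did not handle ECI/ICI" idea
in the list above can be sketched in isolation as follows. This is a
toy model and not the TCG code: the op buffer, SkCtx and all sk_*
names are hypothetical, and real op rewinding works on TCG ops
rather than an array index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum SkOp { SK_OP_NORMAL, SK_OP_CLEAR_ECI, SK_OP_RAISE_INVSTATE };

/* Hypothetical stand-in for the translator context */
typedef struct {
    enum SkOp ops[16];
    size_t n_ops;
    int eci;           /* non-zero: ECI/ICI state is set for this insn */
    bool eci_handled;  /* set by trans functions for continuable insns */
} SkCtx;

static void sk_emit(SkCtx *s, enum SkOp op)
{
    assert(s->n_ops < 16);
    s->ops[s->n_ops++] = op;
}

/* A continuable insn (LDM-like): marks ECI as handled, then zeroes it */
static void sk_trans_ldm(SkCtx *s)
{
    s->eci_handled = true;
    sk_emit(s, SK_OP_NORMAL);
    if (s->eci) {
        sk_emit(s, SK_OP_CLEAR_ECI);
        s->eci = 0;
    }
}

/* An ordinary insn: knows nothing about ECI/ICI */
static void sk_trans_add(SkCtx *s)
{
    sk_emit(s, SK_OP_NORMAL);
}

/* Translate one insn, discarding its ops if ECI was set but ignored */
static void sk_translate_insn(SkCtx *s, void (*trans)(SkCtx *))
{
    size_t rewind = s->n_ops;  /* remember where this insn's ops start */

    s->eci_handled = false;
    trans(s);
    if (s->eci && !s->eci_handled) {
        s->n_ops = rewind;     /* delete everything the insn generated */
        sk_emit(s, SK_OP_RAISE_INVSTATE);
    }
}
```

The point of the design is visible in sk_trans_add(): it needs no
ECI check of its own, because the common wrapper catches the case.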

We continue with our existing IMPDEF choice of not caring about the
ICI state for the load/store multiples and simply restarting them
from the beginning.  Because we don't allow interrupts in the middle
of an insn, the only way we would see this state is if the guest set
ICI manually on return from an exception handler, so it's a corner
case which doesn't merit optimisation.

ICI update for LDM/STM is simple -- it always zeroes the state.  ECI
update for MVE beatwise insns will be a little more complex, since
the ECI state may include information for the following insn.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210614151007.4545-5-peter.maydell@linaro.org
---
 target/arm/translate-a32.h    |   1 +
 target/arm/translate.h        |   9 +++
 target/arm/translate-m-nocp.c |  11 ++++
 target/arm/translate-vfp.c    |   6 ++
 target/arm/translate.c        | 111 ++++++++++++++++++++++++++++++++--
 5 files changed, 133 insertions(+), 5 deletions(-)

diff --git a/target/arm/translate-a32.h b/target/arm/translate-a32.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a32.h
+++ b/target/arm/translate-a32.h
@@ -XXX,XX +XXX,XX @@ long vfp_reg_offset(bool dp, unsigned reg);
 long neon_full_reg_offset(unsigned reg);
 long neon_element_offset(int reg, int element, MemOp memop);
 void gen_rev16(TCGv_i32 dest, TCGv_i32 var);
+void clear_eci_state(DisasContext *s);
 
 static inline TCGv_i32 load_cpu_offset(int offset)
 {
diff --git a/target/arm/translate.h b/target/arm/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ typedef struct DisasContext {
     /* Thumb-2 conditional execution bits.  */
     int condexec_mask;
     int condexec_cond;
+    /* M-profile ECI/ICI exception-continuable instruction state */
+    int eci;
+    /*
+     * trans_ functions for insns which are continuable should set this true
+     * after decode (ie after any UNDEF checks)
+     */
+    bool eci_handled;
+    /* TCG op to rewind to if this turns out to be an invalid ECI state */
+    TCGOp *insn_eci_rewind;
     int thumb;
     int sctlr_b;
     MemOp be_data;
diff --git a/target/arm/translate-m-nocp.c b/target/arm/translate-m-nocp.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-m-nocp.c
+++ b/target/arm/translate-m-nocp.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VLLDM_VLSTM(DisasContext *s, arg_VLLDM_VLSTM *a)
         unallocated_encoding(s);
         return true;
     }
+
+    s->eci_handled = true;
+
     /* If no fpu, NOP. */
     if (!dc_isar_feature(aa32_vfp, s)) {
+        clear_eci_state(s);
         return true;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VLLDM_VLSTM(DisasContext *s, arg_VLLDM_VLSTM *a)
     }
     tcg_temp_free_i32(fptr);
 
+    clear_eci_state(s);
+
     /* End the TB, because we have updated FP control bits */
     s->base.is_jmp = DISAS_UPDATE_EXIT;
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VSCCLRM(DisasContext *s, arg_VSCCLRM *a)
         return true;
     }
 
+    s->eci_handled = true;
+
     if (!dc_isar_feature(aa32_vfp_simd, s)) {
         /* NOP if we have neither FP nor MVE */
+        clear_eci_state(s);
         return true;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VSCCLRM(DisasContext *s, arg_VSCCLRM *a)
         TCGv_i32 z32 = tcg_const_i32(0);
         store_cpu_field(z32, v7m.vpr);
     }
+
+    clear_eci_state(s);
     return true;
 }
 
diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c
+++ b/target/arm/translate-vfp.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_sp(DisasContext *s, arg_VLDM_VSTM_sp *a)
         return false;
     }
 
+    s->eci_handled = true;
+
     if (!vfp_access_check(s)) {
         return true;
     }
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_sp(DisasContext *s, arg_VLDM_VSTM_sp *a)
         tcg_temp_free_i32(addr);
     }
 
+    clear_eci_state(s);
     return true;
 }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
         return false;
     }
 
+    s->eci_handled = true;
+
     if (!vfp_access_check(s)) {
         return true;
     }
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
         tcg_temp_free_i32(addr);
     }
 
+    clear_eci_state(s);
     return true;
 }
 
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline bool is_singlestepping(DisasContext *s)
     return s->base.singlestep_enabled || s->ss_active;
 }
 
+void clear_eci_state(DisasContext *s)
+{
+    /*
+     * Clear any ECI/ICI state: used when a load multiple/store
+     * multiple insn executes.
+     */
+    if (s->eci) {
+        TCGv_i32 tmp = tcg_const_i32(0);
+        store_cpu_field(tmp, condexec_bits);
+        s->eci = 0;
+    }
+}
+
 static void gen_smul_dual(TCGv_i32 a, TCGv_i32 b)
 {
     TCGv_i32 tmp1 = tcg_temp_new_i32();
@@ -XXX,XX +XXX,XX @@ static bool trans_BKPT(DisasContext *s, arg_BKPT *a)
     if (!ENABLE_ARCH_5) {
         return false;
     }
+    /* BKPT is OK with ECI set and leaves it untouched */
+    s->eci_handled = true;
     if (arm_dc_feature(s, ARM_FEATURE_M) &&
         semihosting_enabled() &&
 #ifndef CONFIG_USER_ONLY
@@ -XXX,XX +XXX,XX @@ static bool op_stm(DisasContext *s, arg_ldst_block *a, int min_n)
         return true;
     }
 
+    s->eci_handled = true;
+
     addr = op_addr_block_pre(s, a, n);
     mem_idx = get_mem_index(s);
 
@@ -XXX,XX +XXX,XX @@ static bool op_stm(DisasContext *s, arg_ldst_block *a, int min_n)
     }
 
     op_addr_block_post(s, a, addr, n);
+    clear_eci_state(s);
     return true;
 }
 
@@ -XXX,XX +XXX,XX @@ static bool do_ldm(DisasContext *s, arg_ldst_block *a, int min_n)
         return true;
     }
 
+    s->eci_handled = true;
+
     addr = op_addr_block_pre(s, a, n);
     mem_idx = get_mem_index(s);
     loaded_base = false;
@@ -XXX,XX +XXX,XX @@ static bool do_ldm(DisasContext *s, arg_ldst_block *a, int min_n)
         /* Must exit loop to check un-masked IRQs */
         s->base.is_jmp = DISAS_EXIT;
     }
+    clear_eci_state(s);
     return true;
 }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_CLRM(DisasContext *s, arg_CLRM *a)
         return false;
     }
 
+    s->eci_handled = true;
+
     zero = tcg_const_i32(0);
     for (i = 0; i < 15; i++) {
         if (extract32(a->list, i, 1)) {
@@ -XXX,XX +XXX,XX @@ static bool trans_CLRM(DisasContext *s, arg_CLRM *a)
         tcg_temp_free_i32(maskreg);
     }
     tcg_temp_free_i32(zero);
+    clear_eci_state(s);
     return true;
 }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_LE(DisasContext *s, arg_LE *a)
         return false;
     }
 
+    /* LE/LETP is OK with ECI set and leaves it untouched */
+    s->eci_handled = true;
+
     if (!a->f) {
         /* Not loop-forever. If LR <= 1 this is the last loop: do nothing. */
         arm_gen_condlabel(s);
@@ -XXX,XX +XXX,XX @@ static void arm_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
     dc->thumb = EX_TBFLAG_AM32(tb_flags, THUMB);
     dc->be_data = EX_TBFLAG_ANY(tb_flags, BE_DATA) ? MO_BE : MO_LE;
     condexec = EX_TBFLAG_AM32(tb_flags, CONDEXEC);
-    dc->condexec_mask = (condexec & 0xf) << 1;
-    dc->condexec_cond = condexec >> 4;
+    /*
+     * The CONDEXEC TB flags are CPSR bits [15:10][26:25]. On A-profile this
+     * is always the IT bits. On M-profile, some of the reserved encodings
+     * of IT are used instead to indicate either ICI or ECI, which
+     * indicate partial progress of a restartable insn that was interrupted
+     * partway through by an exception:
+     *  * if CONDEXEC[3:0] != 0b0000 : CONDEXEC is IT bits
+     *  * if CONDEXEC[3:0] == 0b0000 : CONDEXEC is ICI or ECI bits
+     * In all cases CONDEXEC == 0 means "not in IT block or restartable
+     * insn, behave normally".
+     */
+    dc->eci = dc->condexec_mask = dc->condexec_cond = 0;
+    dc->eci_handled = false;
+    dc->insn_eci_rewind = NULL;
+    if (condexec & 0xf) {
+        dc->condexec_mask = (condexec & 0xf) << 1;
+        dc->condexec_cond = condexec >> 4;
+    } else {
+        if (arm_feature(env, ARM_FEATURE_M)) {
+            dc->eci = condexec >> 4;
+        }
+    }
 
     core_mmu_idx = EX_TBFLAG_ANY(tb_flags, MMUIDX);
     dc->mmu_idx = core_to_arm_mmu_idx(env, core_mmu_idx);
@@ -XXX,XX +XXX,XX @@ static void arm_tr_tb_start(DisasContextBase *dcbase, CPUState *cpu)
 static void arm_tr_insn_start(DisasContextBase *dcbase, CPUState *cpu)
 {
     DisasContext *dc = container_of(dcbase, DisasContext, base);
+    /*
+     * The ECI/ICI bits share PSR bits with the IT bits, so we
+     * need to reconstitute the bits from the split-out DisasContext
+     * fields here.
+     */
+    uint32_t condexec_bits;
 
-    tcg_gen_insn_start(dc->base.pc_next,
-                       (dc->condexec_cond << 4) | (dc->condexec_mask >> 1),
-                       0);
+    if (dc->eci) {
+        condexec_bits = dc->eci << 4;
+    } else {
+        condexec_bits = (dc->condexec_cond << 4) | (dc->condexec_mask >> 1);
+    }
+    tcg_gen_insn_start(dc->base.pc_next, condexec_bits, 0);
     dc->insn_start = tcg_last_op();
 }
 
@@ -XXX,XX +XXX,XX @@ static void thumb_tr_translate_insn(DisasContextBase *dcbase, CPUState *cpu)
     }
     dc->insn = insn;
 
+    if (dc->eci) {
+        /*
+         * For M-profile continuable instructions, ECI/ICI handling
+         * falls into these cases:
+         *  - interrupt-continuable instructions
+         *     These are the various load/store multiple insns (both
+         *     integer and fp). The ICI bits indicate the register
+         *     where the load/store can resume. We make the IMPDEF
+         *     choice to always do "instruction restart", ie ignore
+         *     the ICI value and always execute the ldm/stm from the
+         *     start. So all we need to do is zero PSR.ICI if the
+         *     insn executes.
+         *  - MVE instructions subject to beat-wise execution
+         *     Here the ECI bits indicate which beats have already been
+         *     executed, and we must honour this. Each insn of this
+         *     type will handle it correctly. We will update PSR.ECI
+         *     in the helper function for the insn (some ECI values
+         *     mean that the following insn also has been partially
+         *     executed).
+         *  - Special cases which don't advance ECI
+         *     The insns LE, LETP and BKPT leave the ECI/ICI state
+         *     bits untouched.
+         *  - all other insns (the common case)
+         *     Non-zero ECI/ICI means an INVSTATE UsageFault.
+         *     We place a rewind-marker here. Insns in the previous
+         *     three categories will set a flag in the DisasContext.
+         *     If the flag isn't set after we call disas_thumb_insn()
+         *     or disas_thumb2_insn() then we know we have a "some other
+         *     insn" case. We will rewind to the marker (ie throwing away
+         *     all the generated code) and instead emit "take exception".
+         */
+        dc->insn_eci_rewind = tcg_last_op();
+    }
+
     if (dc->condexec_mask && !thumb_insn_is_unconditional(dc, insn)) {
         uint32_t cond = dc->condexec_cond;
 
@@ -XXX,XX +XXX,XX @@ static void thumb_tr_translate_insn(DisasContextBase *dcbase, CPUState *cpu)
         }
     }
 
+    if (dc->eci && !dc->eci_handled) {
+        /*
+         * Insn wasn't valid for ECI/ICI at all: undo what we
+         * just generated and instead emit an exception
+         */
+        tcg_remove_ops_after(dc->insn_eci_rewind);
+        dc->condjmp = 0;
+        gen_exception_insn(dc, dc->pc_curr, EXCP_INVSTATE, syn_uncategorized(),
+                           default_exception_el(dc));
+    }
+
     arm_post_translate_insn(dc);
 
     /* Thumb is a variable-length ISA.  Stop translation when the next insn
-- 
2.20.1
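
The CONDEXEC handling in the patch above can be modelled outside QEMU
as a split/join pair. This is only a minimal sketch: CondexecState,
condexec_split() and condexec_join() are hypothetical names standing
in for the DisasContext fields and for the logic in
arm_tr_init_disas_context() and arm_tr_insn_start(); it models nothing
else about translation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the three DisasContext fields in the patch. */
typedef struct {
    uint32_t condexec_mask;
    uint32_t condexec_cond;
    uint32_t eci;
} CondexecState;

/* Split the 8-bit CONDEXEC value as arm_tr_init_disas_context() does. */
static CondexecState condexec_split(uint32_t condexec, bool is_m_profile)
{
    CondexecState st = { 0, 0, 0 };

    assert(condexec <= 0xff);
    if (condexec & 0xf) {
        /* Low nibble non-zero: these are ordinary IT bits. */
        st.condexec_mask = (condexec & 0xf) << 1;
        st.condexec_cond = condexec >> 4;
    } else if (is_m_profile) {
        /* Low nibble zero: on M-profile the high nibble is ECI/ICI. */
        st.eci = condexec >> 4;
    }
    return st;
}

/* Rebuild the CONDEXEC value as arm_tr_insn_start() does. */
static uint32_t condexec_join(const CondexecState *st)
{
    if (st->eci) {
        return st->eci << 4;
    }
    return (st->condexec_cond << 4) | (st->condexec_mask >> 1);
}
```

Because the two interpretations are distinguished only by the low
nibble, a value such as 0x58 round-trips as IT state while 0x30 on
M-profile round-trips as ECI state.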

In commit a3494d4671797c we reworked how M-profile checks whether the
NOCP exception should be raised because the FPU is disabled, so that
(in line with the architecture) the NOCP check is done early over a
large range of the encoding space and takes precedence over UNDEF
exceptions.  As part of this, we removed the code in
full_vfp_access_check() which raised an exception for M-profile with
the FPU disabled, because it was no longer reachable.

For MVE, some instructions which are outside the "coprocessor space"
region of the encoding space must nonetheless do "is the FPU enabled"
checks and possibly raise a NOCP exception.  (In particular this
covers the MVE-specific low-overhead branch insns LCTP, DLSTP and
WLSTP.) To support these insns, reinstate the code in
full_vfp_access_check(), so that their trans functions can call
vfp_access_check() and get the correct behaviour.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210614151007.4545-6-peter.maydell@linaro.org
---
 target/arm/translate-vfp.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c
+++ b/target/arm/translate-vfp.c
@@ -XXX,XX +XXX,XX @@ static void gen_preserve_fp_state(DisasContext *s)
 static bool full_vfp_access_check(DisasContext *s, bool ignore_vfp_enabled)
 {
     if (s->fp_excp_el) {
-        /* M-profile handled this earlier, in disas_m_nocp() */
-        assert (!arm_dc_feature(s, ARM_FEATURE_M));
-        gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
-                           syn_fp_access_trap(1, 0xe, false),
-                           s->fp_excp_el);
+        if (arm_dc_feature(s, ARM_FEATURE_M)) {
+            /*
+             * M-profile mostly catches the "FPU disabled" case early, in
+             * disas_m_nocp(), but a few insns (eg LCTP, WLSTP, DLSTP)
+             * which do coprocessor-checks are outside the large ranges of
+             * the encoding space handled by the patterns in m-nocp.decode,
+             * and for them we may need to raise NOCP here.
+             */
+            gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
+                               syn_uncategorized(), s->fp_excp_el);
+        } else {
+            gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
+                               syn_fp_access_trap(1, 0xe, false),
+                               s->fp_excp_el);
+        }
         return false;
     }
 
-- 
2.20.1

Implement the MVE LCTP instruction.

We put its decode and implementation with the other
low-overhead-branch insns because, although it is only present if MVE
is implemented, it is logically in the same group as the other LOB
insns.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210614151007.4545-7-peter.maydell@linaro.org
---
 target/arm/t32.decode  |  2 ++
 target/arm/translate.c | 24 ++++++++++++++++++++++++
 2 files changed, 26 insertions(+)

Implement the MVE WLSTP insn; this is like the existing WLS insn,
except that it specifies a size value which is used to set
FPSCR.LTPSIZE.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210614151007.4545-8-peter.maydell@linaro.org
---
 target/arm/t32.decode  |  8 ++++++--
 target/arm/translate.c | 37 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 42 insertions(+), 3 deletions(-)

Implement the MVE DLSTP insn; this is like the existing DLS
insn, except that it must do an FPU access check and it
sets LTPSIZE to the value specified in the insn.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210614151007.4545-9-peter.maydell@linaro.org
---
 target/arm/t32.decode  |  9 ++++++---
 target/arm/translate.c | 23 +++++++++++++++++++++--
 2 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/target/arm/t32.decode b/target/arm/t32.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/t32.decode
+++ b/target/arm/t32.decode
@@ -XXX,XX +XXX,XX @@ BL               1111 0. .......... 11.1 ............         @branch24
     # LE and WLS immediate
     %lob_imm 1:10 11:1 !function=times_2
 
-    DLS          1111 0 0000 100     rn:4 1110 0000 0000 0001
+    DLS          1111 0 0000 100     rn:4 1110 0000 0000 0001 size=4
     WLS          1111 0 0000 100     rn:4 1100 . .......... 1 imm=%lob_imm size=4
     {
       LE         1111 0 0000 0 f:1 0 1111 1100 . .......... 1 imm=%lob_imm
       # This is WLSTP
       WLS        1111 0 0000 0 size:2 rn:4 1100 . .......... 1 imm=%lob_imm
     }
-
-    LCTP         1111 0 0000 000     1111 1110 0000 0000 0001
+    {
+      LCTP       1111 0 0000 000     1111 1110 0000 0000 0001
+      # This is DLSTP
+      DLS        1111 0 0000 0 size:2 rn:4 1110 0000 0000 0001
+    }
   ]
 }
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static bool trans_DLS(DisasContext *s, arg_DLS *a)
         return false;
     }
     if (a->rn == 13 || a->rn == 15) {
-        /* CONSTRAINED UNPREDICTABLE: we choose to UNDEF */
+        /*
+         * For DLSTP rn == 15 is a related encoding (LCTP); the
+         * other cases caught by this condition are all
+         * CONSTRAINED UNPREDICTABLE: we choose to UNDEF
+         */
         return false;
     }
 
-    /* Not a while loop, no tail predication: just set LR to the count */
+    if (a->size != 4) {
+        /* DLSTP */
+        if (!dc_isar_feature(aa32_mve, s)) {
+            return false;
+        }
+        if (!vfp_access_check(s)) {
+            return true;
+        }
+    }
+
+    /* Not a while loop: set LR to the count, and set LTPSIZE for DLSTP */
     tmp = load_reg(s, a->rn);
     store_reg(s, 14, tmp);
+    if (a->size != 4) {
+        /* DLSTP: set FPSCR.LTPSIZE */
+        tmp = tcg_const_i32(a->size);
+        store_cpu_field(tmp, v7m.ltpsize);
+    }
     return true;
 }
 
-- 
2.20.1
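
The way the three overlapping patterns in t32.decode (DLS, DLSTP and
LCTP) and the register check in trans_DLS() fit together can be
sketched as a standalone classifier. This is an illustration only:
LobKind, LobInsn and classify_dls() are hypothetical names, the bit
positions are read off the patterns in the patch above, and the
feature checks (aa32_lob, aa32_mve) and the FPU access check are
deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type; QEMU's generated decoder uses arg_* structs. */
typedef enum { LOB_UNDEF, LOB_DLS, LOB_DLSTP, LOB_LCTP } LobKind;

typedef struct {
    LobKind kind;
    unsigned rn;    /* register holding the loop count */
    unsigned size;  /* value for LTPSIZE; 4 means no tail predication */
} LobInsn;

static LobInsn classify_dls(uint32_t insn)
{
    LobInsn r = { LOB_UNDEF, 0, 4 };
    unsigned rn = (insn >> 16) & 0xf;
    unsigned op = (insn >> 20) & 0x7;   /* bits [22:20] */

    /* Bits [31:23] and [15:0] are fixed in all three patterns. */
    if ((insn & 0xff80ffffu) != 0xf000e001u) {
        return r;
    }
    if (op == 0 && rn == 15) {
        /* LCTP is listed first in the overlap group, so it wins. */
        r.kind = LOB_LCTP;
        return r;
    }
    if (op > 4 || rn == 13 || rn == 15) {
        /* No pattern matches, or trans_DLS() rejects the register. */
        return r;
    }
    r.kind = (op == 4) ? LOB_DLS : LOB_DLSTP;
    r.rn = rn;
    r.size = (op == 4) ? 4 : op;
    return r;
}

/* A few encodings worked out by hand from the patterns. */
static void classify_dls_examples(void)
{
    assert(classify_dls(0xf043e001u).kind == LOB_DLS);    /* DLS, rn = 3 */
    assert(classify_dls(0xf025e001u).kind == LOB_DLSTP);  /* size = 2, rn = 5 */
    assert(classify_dls(0xf00fe001u).kind == LOB_LCTP);
}
```

The sketch folds the decode step and the rn check into one function
for brevity; in the patch they are separate, with decodetree choosing
the pattern and trans_DLS() returning false for rn == 13 or rn == 15.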

Implement the MVE LETP insn.  This is like the existing LE loop-end
insn, but it must perform an FPU-enabled check, and on loop-exit it
resets LTPSIZE to 4.

To accommodate the requirement to do something on loop-exit, we drop
the use of condlabel and instead manage both the TB exits manually,
in the same way we already do in trans_WLS().

The other MVE-specific change to the LE insn is that we must raise an
INVSTATE UsageFault if LTPSIZE is not 4.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210614151007.4545-10-peter.maydell@linaro.org
---
 target/arm/t32.decode  |   2 +-
 target/arm/translate.c | 104 +++++++++++++++++++++++++++++++++++++----
 2 files changed, 97 insertions(+), 9 deletions(-)

diff --git a/target/arm/t32.decode b/target/arm/t32.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/t32.decode
+++ b/target/arm/t32.decode
@@ -XXX,XX +XXX,XX @@ BL               1111 0. .......... 11.1 ............         @branch24
     DLS          1111 0 0000 100     rn:4 1110 0000 0000 0001 size=4
     WLS          1111 0 0000 100     rn:4 1100 . .......... 1 imm=%lob_imm size=4
     {
-      LE         1111 0 0000 0 f:1 0 1111 1100 . .......... 1 imm=%lob_imm
+      LE         1111 0 0000 0 f:1 tp:1 1111 1100 . .......... 1 imm=%lob_imm
       # This is WLSTP
       WLS        1111 0 0000 0 size:2 rn:4 1100 . .......... 1 imm=%lob_imm
     }
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static bool trans_LE(DisasContext *s, arg_LE *a)
      * any faster.
      */
     TCGv_i32 tmp;
+    TCGLabel *loopend;
+    bool fpu_active;
 
     if (!dc_isar_feature(aa32_lob, s)) {
         return false;
     }
+    if (a->f && a->tp) {
+        return false;
+    }
+    if (s->condexec_mask) {
+        /*
+         * LE in an IT block is CONSTRAINED UNPREDICTABLE;
+         * we choose to UNDEF, because otherwise our use of
+         * gen_goto_tb(1) would clash with the use of TB exit 1
+         * in the dc->condjmp condition-failed codepath in
+         * arm_tr_tb_stop() and we'd get an assertion.
+         */
+        return false;
+    }
+    if (a->tp) {
+        /* LETP */
+        if (!dc_isar_feature(aa32_mve, s)) {
+            return false;
+        }
+        if (!vfp_access_check(s)) {
+            s->eci_handled = true;
+            return true;
+        }
+    }
 
     /* LE/LETP is OK with ECI set and leaves it untouched */
     s->eci_handled = true;
 
-    if (!a->f) {
-        /* Not loop-forever. If LR <= 1 this is the last loop: do nothing. */
-        arm_gen_condlabel(s);
-        tcg_gen_brcondi_i32(TCG_COND_LEU, cpu_R[14], 1, s->condlabel);
-        /* Decrement LR */
-        tmp = load_reg(s, 14);
-        tcg_gen_addi_i32(tmp, tmp, -1);
-        store_reg(s, 14, tmp);
+    /*
+     * With MVE, LTPSIZE might not be 4, and we must emit an INVSTATE
+     * UsageFault exception for the LE insn in that case. Note that we
+     * are not directly checking FPSCR.LTPSIZE but instead check the
+     * pseudocode LTPSIZE() function, which returns 4 if the FPU is
+     * not currently active (ie ActiveFPState() returns false). We
+     * can identify not-active purely from our TB state flags, as the
+     * FPU is active only if:
+     *  the FPU is enabled
+     *  AND lazy state preservation is not active
+     *  AND we do not need a new fp context (this is the ASPEN/FPCA check)
+     *
+     * Usually we don't need to care about this distinction between
+     * LTPSIZE and FPSCR.LTPSIZE, because the code in vfp_access_check()
+     * will either take an exception or clear the conditions that make
+     * the FPU not active. But LE is an unusual case of a non-FP insn
+     * that looks at LTPSIZE.
+     */
+    fpu_active = !s->fp_excp_el && !s->v7m_lspact && !s->v7m_new_fp_ctxt_needed;
+
+    if (!a->tp && dc_isar_feature(aa32_mve, s) && fpu_active) {
+        /* Need to do a runtime check for LTPSIZE != 4 */
+        TCGLabel *skipexc = gen_new_label();
+        tmp = load_cpu_field(v7m.ltpsize);
+        tcg_gen_brcondi_i32(TCG_COND_EQ, tmp, 4, skipexc);
+        tcg_temp_free_i32(tmp);
+        gen_exception_insn(s, s->pc_curr, EXCP_INVSTATE, syn_uncategorized(),
+                           default_exception_el(s));
+        gen_set_label(skipexc);
+    }
+
+    if (a->f) {
+        /* Loop-forever: just jump back to the loop start */
+        gen_jmp(s, read_pc(s) - a->imm);
+        return true;
+    }
+
+    /*
+     * Not loop-forever. If LR <= loop-decrement-value this is the last loop.
+     * For LE, we know at this point that LTPSIZE must be 4 and the
+     * loop decrement value is 1. For LETP we need to calculate the decrement
+     * value from LTPSIZE.
+     */
+    loopend = gen_new_label();
+    if (!a->tp) {
+        tcg_gen_brcondi_i32(TCG_COND_LEU, cpu_R[14], 1, loopend);
+        tcg_gen_addi_i32(cpu_R[14], cpu_R[14], -1);
+    } else {
+        /*
+         * Decrement by 1 << (4 - LTPSIZE). We need to use a TCG local
+         * so that decr stays live after the brcondi.
+         */
+        TCGv_i32 decr = tcg_temp_local_new_i32();
+        TCGv_i32 ltpsize = load_cpu_field(v7m.ltpsize);
+        tcg_gen_sub_i32(decr, tcg_constant_i32(4), ltpsize);
+        tcg_gen_shl_i32(decr, tcg_constant_i32(1), decr);
+        tcg_temp_free_i32(ltpsize);
+
+        tcg_gen_brcond_i32(TCG_COND_LEU, cpu_R[14], decr, loopend);
+
+        tcg_gen_sub_i32(cpu_R[14], cpu_R[14], decr);
+        tcg_temp_free_i32(decr);
     }
     /* Jump back to the loop start */
     gen_jmp(s, read_pc(s) - a->imm);
+
+    gen_set_label(loopend);
+    if (a->tp) {
+        /* Exits from tail-pred loops must reset LTPSIZE to 4 */
+        tmp = tcg_const_i32(4);
+        store_cpu_field(tmp, v7m.ltpsize);
+    }
+    /* End TB, continuing to following insn */
+    gen_jmp_tb(s, s->base.pc_next, 1);
     return true;
 }
 
-- 
2.20.1
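
The loop-end arithmetic that trans_LE() emits as TCG ops can be
written as ordinary C to make the control flow easier to follow. This
is a sketch under stated assumptions: LoopState and loop_end_step()
are hypothetical names, the loop-forever form, the FPU access check
and the INVSTATE check for plain LE are not modelled, and the caller
is assumed to pass tp = true for LETP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the two pieces of CPU state the insn touches. */
typedef struct {
    uint32_t lr;       /* loop counter, R14 */
    uint32_t ltpsize;  /* FPSCR.LTPSIZE, expected to be in 0..4 */
} LoopState;

/*
 * Model one execution of LE (tp == false) or LETP (tp == true).
 * Returns true if the branch back to the loop start is taken.
 */
static bool loop_end_step(LoopState *st, bool tp)
{
    uint32_t decr;

    assert(st->ltpsize <= 4);
    /* LE always counts down by 1; LETP by 1 << (4 - LTPSIZE). */
    decr = tp ? (1u << (4 - st->ltpsize)) : 1u;

    if (st->lr <= decr) {
        /* Last iteration: fall through, leaving LR untouched. */
        if (tp) {
            /* Exits from tail-predicated loops reset LTPSIZE to 4. */
            st->ltpsize = 4;
        }
        return false;
    }
    st->lr -= decr;
    return true;
}
```

With LR = 40 and LTPSIZE = 0 the decrement is 16, so the branch is
taken twice (LR becomes 24, then 8) and the third call falls through
and resets LTPSIZE to 4.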

Add the framework for decoding MVE insns, with the necessary new
files and the meson.build rules, but no actual content yet.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210614151007.4545-11-peter.maydell@linaro.org
---
 target/arm/translate-a32.h |  1 +
 target/arm/mve.decode      | 20 ++++++++++++++++++++
 target/arm/translate-mve.c | 29 +++++++++++++++++++++++++++++
 target/arm/translate.c     |  1 +
 target/arm/meson.build     |  2 ++
 5 files changed, 53 insertions(+)
 create mode 100644 target/arm/mve.decode
 create mode 100644 target/arm/translate-mve.c

diff --git a/target/arm/translate-a32.h b/target/arm/translate-a32.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a32.h
+++ b/target/arm/translate-a32.h
@@ -XXX,XX +XXX,XX @@
 
 /* Prototypes for autogenerated disassembler functions */
 bool disas_m_nocp(DisasContext *dc, uint32_t insn);
+bool disas_mve(DisasContext *dc, uint32_t insn);
 bool disas_vfp(DisasContext *s, uint32_t insn);
 bool disas_vfp_uncond(DisasContext *s, uint32_t insn);
 bool disas_neon_dp(DisasContext *s, uint32_t insn);
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@
+# M-profile MVE instruction descriptions
+#
+#  Copyright (c) 2021 Linaro, Ltd
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
+
+#
+# This file is processed by scripts/decodetree.py
+#
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@
+/*
+ *  ARM translation: M-profile MVE instructions
+ *
+ *  Copyright (c) 2021 Linaro, Ltd.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "tcg/tcg-op.h"
+#include "tcg/tcg-op-gvec.h"
+#include "exec/exec-all.h"
+#include "exec/gen-icount.h"
+#include "translate.h"
+#include "translate-a32.h"
+
+/* Include the generated decoder */
+#include "decode-mve.c.inc"
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
     if (disas_t32(s, insn) ||
         disas_vfp_uncond(s, insn) ||
         disas_neon_shared(s, insn) ||
+        disas_mve(s, insn) ||
         ((insn >> 28) == 0xe && disas_vfp(s, insn))) {
         return;
     }
diff --git a/target/arm/meson.build b/target/arm/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/meson.build
+++ b/target/arm/meson.build
@@ -XXX,XX +XXX,XX @@ gen = [
   decodetree.process('vfp.decode', extra_args: '--decode=disas_vfp'),
   decodetree.process('vfp-uncond.decode', extra_args: '--decode=disas_vfp_uncond'),
   decodetree.process('m-nocp.decode', extra_args: '--decode=disas_m_nocp'),
+  decodetree.process('mve.decode', extra_args: '--decode=disas_mve'),
   decodetree.process('a32.decode', extra_args: '--static-decode=disas_a32'),
   decodetree.process('a32-uncond.decode', extra_args: '--static-decode=disas_a32_uncond'),
   decodetree.process('t32.decode', extra_args: '--static-decode=disas_t32'),
@@ -XXX,XX +XXX,XX @@ arm_ss.add(files(
   'tlb_helper.c',
   'translate.c',
   'translate-m-nocp.c',
+  'translate-mve.c',
   'translate-neon.c',
   'translate-vfp.c',
   'vec_helper.c',
-- 
2.20.1

For MVE, we want to re-use the large data table from expand_pred_b().
Move the data table to vec_helper.c so it is no longer in an
SVE-specific source file.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210614151007.4545-14-peter.maydell@linaro.org
---
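For reference, each entry of the table can be computed directly from
its index, following the generator loop quoted in the comment that
moves with the table. The sketch below is not part of the patch:
expand_pred_b_calc() is a hypothetical helper, and it uses uint64_t
rather than unsigned long so that it does not depend on the host's
long being 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute the value stored at expand_pred_b_data[byte]: each set bit j
 * of the predicate byte becomes an all-ones byte at byte position j.
 */
static uint64_t expand_pred_b_calc(uint8_t byte)
{
    uint64_t m = 0;

    for (int j = 0; j < 8; j++) {
        if ((byte >> j) & 1) {
            m |= (uint64_t)0xff << (j * 8);
        }
    }
    return m;
}

/* Spot checks against entries that appear in the table in the diff. */
static void expand_pred_b_examples(void)
{
    assert(expand_pred_b_calc(0x00) == 0x0000000000000000ull);
    assert(expand_pred_b_calc(0x01) == 0x00000000000000ffull);
    assert(expand_pred_b_calc(0x04) == 0x0000000000ff0000ull);
    assert(expand_pred_b_calc(0xff) == 0xffffffffffffffffull);
}
```

The patch keeps the precomputed table rather than a loop like this;
the sketch is only meant to show what the 256 constants encode.
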
 target/arm/vec_internal.h |   3 ++
 target/arm/sve_helper.c   | 103 ++------------------------------------
 target/arm/vec_helper.c   | 102 +++++++++++++++++++++++++++++++++++++
 3 files changed, 109 insertions(+), 99 deletions(-)

diff --git a/target/arm/vec_internal.h b/target/arm/vec_internal.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_internal.h
+++ b/target/arm/vec_internal.h
@@ -XXX,XX +XXX,XX @@
 #define H8(x)   (x)
 #define H1_8(x) (x)
 
+/* Data for expanding active predicate bits to bytes, for byte elements. */
+extern const uint64_t expand_pred_b_data[256];
+
 static inline void clear_tail(void *vd, uintptr_t opr_sz, uintptr_t max_sz)
 {
     uint64_t *d = vd + opr_sz;
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_predtest)(void *vd, void *vg, uint32_t words)
     return flags;
 }
 
-/* Expand active predicate bits to bytes, for byte elements.
- *  for (i = 0; i < 256; ++i) {
- *      unsigned long m = 0;
- *      for (j = 0; j < 8; j++) {
- *          if ((i >> j) & 1) {
- *              m |= 0xfful << (j << 3);
- *          }
- *      }
- *      printf("0x%016lx,\n", m);
- *  }
+/*
+ * Expand active predicate bits to bytes, for byte elements.
+ * (The data table itself is in vec_helper.c as MVE also needs it.)
  */
 static inline uint64_t expand_pred_b(uint8_t byte)
 {
-    static const uint64_t word[256] = {
-        0x0000000000000000, 0x00000000000000ff, 0x000000000000ff00,
-        0x000000000000ffff, 0x0000000000ff0000, 0x0000000000ff00ff,
-        0x0000000000ffff00, 0x0000000000ffffff, 0x00000000ff000000,
-        0x00000000ff0000ff, 0x00000000ff00ff00, 0x00000000ff00ffff,
-        0x00000000ffff0000, 0x00000000ffff00ff, 0x00000000ffffff00,
-        0x00000000ffffffff, 0x000000ff00000000, 0x000000ff000000ff,
-        0x000000ff0000ff00, 0x000000ff0000ffff, 0x000000ff00ff0000,
-        0x000000ff00ff00ff, 0x000000ff00ffff00, 0x000000ff00ffffff,
-        0x000000ffff000000, 0x000000ffff0000ff, 0x000000ffff00ff00,
-        0x000000ffff00ffff, 0x000000ffffff0000, 0x000000ffffff00ff,
-        0x000000ffffffff00, 0x000000ffffffffff, 0x0000ff0000000000,
-        0x0000ff00000000ff, 0x0000ff000000ff00, 0x0000ff000000ffff,
-        0x0000ff0000ff0000, 0x0000ff0000ff00ff, 0x0000ff0000ffff00,
-        0x0000ff0000ffffff, 0x0000ff00ff000000, 0x0000ff00ff0000ff,
-        0x0000ff00ff00ff00, 0x0000ff00ff00ffff, 0x0000ff00ffff0000,
-        0x0000ff00ffff00ff, 0x0000ff00ffffff00, 0x0000ff00ffffffff,
-        0x0000ffff00000000, 0x0000ffff000000ff, 0x0000ffff0000ff00,
-        0x0000ffff0000ffff, 0x0000ffff00ff0000, 0x0000ffff00ff00ff,
-        0x0000ffff00ffff00, 0x0000ffff00ffffff, 0x0000ffffff000000,
-        0x0000ffffff0000ff, 0x0000ffffff00ff00, 0x0000ffffff00ffff,
-        0x0000ffffffff0000, 0x0000ffffffff00ff, 0x0000ffffffffff00,
-        0x0000ffffffffffff, 0x00ff000000000000, 0x00ff0000000000ff,
-        0x00ff00000000ff00, 0x00ff00000000ffff, 0x00ff000000ff0000,
-        0x00ff000000ff00ff, 0x00ff000000ffff00, 0x00ff000000ffffff,
-        0x00ff0000ff000000, 0x00ff0000ff0000ff, 0x00ff0000ff00ff00,
-        0x00ff0000ff00ffff, 0x00ff0000ffff0000, 0x00ff0000ffff00ff,
-        0x00ff0000ffffff00, 0x00ff0000ffffffff, 0x00ff00ff00000000,
-        0x00ff00ff000000ff, 0x00ff00ff0000ff00, 0x00ff00ff0000ffff,
-        0x00ff00ff00ff0000, 0x00ff00ff00ff00ff, 0x00ff00ff00ffff00,
-        0x00ff00ff00ffffff, 0x00ff00ffff000000, 0x00ff00ffff0000ff,
-        0x00ff00ffff00ff00, 0x00ff00ffff00ffff, 0x00ff00ffffff0000,
-        0x00ff00ffffff00ff, 0x00ff00ffffffff00, 0x00ff00ffffffffff,
-        0x00ffff0000000000, 0x00ffff00000000ff, 0x00ffff000000ff00,
-        0x00ffff000000ffff, 0x00ffff0000ff0000, 0x00ffff0000ff00ff,
-        0x00ffff0000ffff00, 0x00ffff0000ffffff, 0x00ffff00ff000000,
-        0x00ffff00ff0000ff, 0x00ffff00ff00ff00, 0x00ffff00ff00ffff,
-        0x00ffff00ffff0000, 0x00ffff00ffff00ff, 0x00ffff00ffffff00,
-        0x00ffff00ffffffff, 0x00ffffff00000000, 0x00ffffff000000ff,
-        0x00ffffff0000ff00, 0x00ffffff0000ffff, 0x00ffffff00ff0000,
-        0x00ffffff00ff00ff, 0x00ffffff00ffff00, 0x00ffffff00ffffff,
-        0x00ffffffff000000, 0x00ffffffff0000ff, 0x00ffffffff00ff00,
-        0x00ffffffff00ffff, 0x00ffffffffff0000, 0x00ffffffffff00ff,
-        0x00ffffffffffff00, 0x00ffffffffffffff, 0xff00000000000000,
-        0xff000000000000ff, 0xff0000000000ff00, 0xff0000000000ffff,
-        0xff00000000ff0000, 0xff00000000ff00ff, 0xff00000000ffff00,
-        0xff00000000ffffff, 0xff000000ff000000, 0xff000000ff0000ff,
-        0xff000000ff00ff00, 0xff000000ff00ffff, 0xff000000ffff0000,
-        0xff000000ffff00ff, 0xff000000ffffff00, 0xff000000ffffffff,
-        0xff0000ff00000000, 0xff0000ff000000ff, 0xff0000ff0000ff00,
-        0xff0000ff0000ffff, 0xff0000ff00ff0000, 0xff0000ff00ff00ff,
-        0xff0000ff00ffff00, 0xff0000ff00ffffff, 0xff0000ffff000000,
-        0xff0000ffff0000ff, 0xff0000ffff00ff00, 0xff0000ffff00ffff,
-        0xff0000ffffff0000, 0xff0000ffffff00ff, 0xff0000ffffffff00,
-        0xff0000ffffffffff, 0xff00ff0000000000, 0xff00ff00000000ff,
-        0xff00ff000000ff00, 0xff00ff000000ffff, 0xff00ff0000ff0000,
-        0xff00ff0000ff00ff, 0xff00ff0000ffff00, 0xff00ff0000ffffff,
-        0xff00ff00ff000000, 0xff00ff00ff0000ff, 0xff00ff00ff00ff00,
-        0xff00ff00ff00ffff, 0xff00ff00ffff0000, 0xff00ff00ffff00ff,
-        0xff00ff00ffffff00, 0xff00ff00ffffffff, 0xff00ffff00000000,
-        0xff00ffff000000ff, 0xff00ffff0000ff00, 0xff00ffff0000ffff,
-        0xff00ffff00ff0000, 0xff00ffff00ff00ff, 0xff00ffff00ffff00,
-        0xff00ffff00ffffff, 0xff00ffffff000000, 0xff00ffffff0000ff,
-        0xff00ffffff00ff00, 0xff00ffffff00ffff, 0xff00ffffffff0000,
-        0xff00ffffffff00ff, 0xff00ffffffffff00, 0xff00ffffffffffff,
-        0xffff000000000000, 0xffff0000000000ff, 0xffff00000000ff00,
-        0xffff00000000ffff, 0xffff000000ff0000, 0xffff000000ff00ff,
-        0xffff000000ffff00, 0xffff000000ffffff, 0xffff0000ff000000,
-        0xffff0000ff0000ff, 0xffff0000ff00ff00, 0xffff0000ff00ffff,
-        0xffff0000ffff0000, 0xffff0000ffff00ff, 0xffff0000ffffff00,
-        0xffff0000ffffffff, 0xffff00ff00000000, 0xffff00ff000000ff,
-        0xffff00ff0000ff00, 0xffff00ff0000ffff, 0xffff00ff00ff0000,
-        0xffff00ff00ff00ff, 0xffff00ff00ffff00, 0xffff00ff00ffffff,
-        0xffff00ffff000000, 0xffff00ffff0000ff, 0xffff00ffff00ff00,
-        0xffff00ffff00ffff, 0xffff00ffffff0000, 0xffff00ffffff00ff,
-        0xffff00ffffffff00, 0xffff00ffffffffff, 0xffffff0000000000,
-        0xffffff00000000ff, 0xffffff000000ff00, 0xffffff000000ffff,
-        0xffffff0000ff0000, 0xffffff0000ff00ff, 0xffffff0000ffff00,
-        0xffffff0000ffffff, 0xffffff00ff000000, 0xffffff00ff0000ff,
-        0xffffff00ff00ff00, 0xffffff00ff00ffff, 0xffffff00ffff0000,
-        0xffffff00ffff00ff, 0xffffff00ffffff00, 0xffffff00ffffffff,
-        0xffffffff00000000, 0xffffffff000000ff, 0xffffffff0000ff00,
-        0xffffffff0000ffff, 0xffffffff00ff0000, 0xffffffff00ff00ff,
-        0xffffffff00ffff00, 0xffffffff00ffffff, 0xffffffffff000000,
-        0xffffffffff0000ff, 0xffffffffff00ff00, 0xffffffffff00ffff,
-        0xffffffffffff0000, 0xffffffffffff00ff, 0xffffffffffffff00,
-        0xffffffffffffffff,
-    };
-    return word[byte];
+    return expand_pred_b_data[byte];
 }
 
 /* Similarly for half-word elements.
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@
 #include "qemu/int128.h"
 #include "vec_internal.h"
 
+/*
+ * Data for expanding active predicate bits to bytes, for byte elements.
+ *
+ *  for (i = 0; i < 256; ++i) {
+ *      unsigned long m = 0;
+ *      for (j = 0; j < 8; j++) {
+ *          if ((i >> j) & 1) {
+ *              m |= 0xfful << (j << 3);
+ *          }
+ *      }
+ *      printf("0x%016lx,\n", m);
+ *  }
+ */
+const uint64_t expand_pred_b_data[256] = {
+    0x0000000000000000, 0x00000000000000ff, 0x000000000000ff00,
+    0x000000000000ffff, 0x0000000000ff0000, 0x0000000000ff00ff,
+    0x0000000000ffff00, 0x0000000000ffffff, 0x00000000ff000000,
+    0x00000000ff0000ff, 0x00000000ff00ff00, 0x00000000ff00ffff,
+    0x00000000ffff0000, 0x00000000ffff00ff, 0x00000000ffffff00,
+    0x00000000ffffffff, 0x000000ff00000000, 0x000000ff000000ff,
+    0x000000ff0000ff00, 0x000000ff0000ffff, 0x000000ff00ff0000,
+    0x000000ff00ff00ff, 0x000000ff00ffff00, 0x000000ff00ffffff,
+    0x000000ffff000000, 0x000000ffff0000ff, 0x000000ffff00ff00,
+    0x000000ffff00ffff, 0x000000ffffff0000, 0x000000ffffff00ff,
+    0x000000ffffffff00, 0x000000ffffffffff, 0x0000ff0000000000,
+    0x0000ff00000000ff, 0x0000ff000000ff00, 0x0000ff000000ffff,
+    0x0000ff0000ff0000, 0x0000ff0000ff00ff, 0x0000ff0000ffff00,
+    0x0000ff0000ffffff, 0x0000ff00ff000000, 0x0000ff00ff0000ff,
+    0x0000ff00ff00ff00, 0x0000ff00ff00ffff, 0x0000ff00ffff0000,
+    0x0000ff00ffff00ff, 0x0000ff00ffffff00, 0x0000ff00ffffffff,
+    0x0000ffff00000000, 0x0000ffff000000ff, 0x0000ffff0000ff00,
+    0x0000ffff0000ffff, 0x0000ffff00ff0000, 0x0000ffff00ff00ff,
+    0x0000ffff00ffff00, 0x0000ffff00ffffff, 0x0000ffffff000000,
+    0x0000ffffff0000ff, 0x0000ffffff00ff00, 0x0000ffffff00ffff,
+    0x0000ffffffff0000, 0x0000ffffffff00ff, 0x0000ffffffffff00,
+    0x0000ffffffffffff, 0x00ff000000000000, 0x00ff0000000000ff,
+    0x00ff00000000ff00, 0x00ff00000000ffff, 0x00ff000000ff0000,
+    0x00ff000000ff00ff, 0x00ff000000ffff00, 0x00ff000000ffffff,
+    0x00ff0000ff000000, 0x00ff0000ff0000ff, 0x00ff0000ff00ff00,
+    0x00ff0000ff00ffff, 0x00ff0000ffff0000, 0x00ff0000ffff00ff,
+    0x00ff0000ffffff00, 0x00ff0000ffffffff, 0x00ff00ff00000000,
+    0x00ff00ff000000ff, 0x00ff00ff0000ff00, 0x00ff00ff0000ffff,
+    0x00ff00ff00ff0000, 0x00ff00ff00ff00ff, 0x00ff00ff00ffff00,
+    0x00ff00ff00ffffff, 0x00ff00ffff000000, 0x00ff00ffff0000ff,
+    0x00ff00ffff00ff00, 0x00ff00ffff00ffff, 0x00ff00ffffff0000,
+    0x00ff00ffffff00ff, 0x00ff00ffffffff00, 0x00ff00ffffffffff,
+    0x00ffff0000000000, 0x00ffff00000000ff, 0x00ffff000000ff00,
+    0x00ffff000000ffff, 0x00ffff0000ff0000, 0x00ffff0000ff00ff,
+    0x00ffff0000ffff00, 0x00ffff0000ffffff, 0x00ffff00ff000000,
+    0x00ffff00ff0000ff, 0x00ffff00ff00ff00, 0x00ffff00ff00ffff,
+    0x00ffff00ffff0000, 0x00ffff00ffff00ff, 0x00ffff00ffffff00,
+    0x00ffff00ffffffff, 0x00ffffff00000000, 0x00ffffff000000ff,
+    0x00ffffff0000ff00, 0x00ffffff0000ffff, 0x00ffffff00ff0000,
+    0x00ffffff00ff00ff, 0x00ffffff00ffff00, 0x00ffffff00ffffff,
+    0x00ffffffff000000, 0x00ffffffff0000ff, 0x00ffffffff00ff00,
+    0x00ffffffff00ffff, 0x00ffffffffff0000, 0x00ffffffffff00ff,
+    0x00ffffffffffff00, 0x00ffffffffffffff, 0xff00000000000000,
+    0xff000000000000ff, 0xff0000000000ff00, 0xff0000000000ffff,
+    0xff00000000ff0000, 0xff00000000ff00ff, 0xff00000000ffff00,
+    0xff00000000ffffff, 0xff000000ff000000, 0xff000000ff0000ff,
+    0xff000000ff00ff00, 0xff000000ff00ffff, 0xff000000ffff0000,
+    0xff000000ffff00ff, 0xff000000ffffff00, 0xff000000ffffffff,
+    0xff0000ff00000000, 0xff0000ff000000ff, 0xff0000ff0000ff00,
+    0xff0000ff0000ffff, 0xff0000ff00ff0000, 0xff0000ff00ff00ff,
+    0xff0000ff00ffff00, 0xff0000ff00ffffff, 0xff0000ffff000000,
+    0xff0000ffff0000ff, 0xff0000ffff00ff00, 0xff0000ffff00ffff,
+    0xff0000ffffff0000, 0xff0000ffffff00ff, 0xff0000ffffffff00,
+    0xff0000ffffffffff, 0xff00ff0000000000, 0xff00ff00000000ff,
+    0xff00ff000000ff00, 0xff00ff000000ffff, 0xff00ff0000ff0000,
+    0xff00ff0000ff00ff, 0xff00ff0000ffff00, 0xff00ff0000ffffff,
+    0xff00ff00ff000000, 0xff00ff00ff0000ff, 0xff00ff00ff00ff00,
+    0xff00ff00ff00ffff, 0xff00ff00ffff0000, 0xff00ff00ffff00ff,
+    0xff00ff00ffffff00, 0xff00ff00ffffffff, 0xff00ffff00000000,
+    0xff00ffff000000ff, 0xff00ffff0000ff00, 0xff00ffff0000ffff,
+    0xff00ffff00ff0000, 0xff00ffff00ff00ff, 0xff00ffff00ffff00,
+    0xff00ffff00ffffff, 0xff00ffffff000000, 0xff00ffffff0000ff,
+    0xff00ffffff00ff00, 0xff00ffffff00ffff, 0xff00ffffffff0000,
+    0xff00ffffffff00ff, 0xff00ffffffffff00, 0xff00ffffffffffff,
+    0xffff000000000000, 0xffff0000000000ff, 0xffff00000000ff00,
+    0xffff00000000ffff, 0xffff000000ff0000, 0xffff000000ff00ff,
+    0xffff000000ffff00, 0xffff000000ffffff, 0xffff0000ff000000,
+    0xffff0000ff0000ff, 0xffff0000ff00ff00, 0xffff0000ff00ffff,
+    0xffff0000ffff0000, 0xffff0000ffff00ff, 0xffff0000ffffff00,
+    0xffff0000ffffffff, 0xffff00ff00000000, 0xffff00ff000000ff,
+    0xffff00ff0000ff00, 0xffff00ff0000ffff, 0xffff00ff00ff0000,
+    0xffff00ff00ff00ff, 0xffff00ff00ffff00, 0xffff00ff00ffffff,
+    0xffff00ffff000000, 0xffff00ffff0000ff, 0xffff00ffff00ff00,
+    0xffff00ffff00ffff, 0xffff00ffffff0000, 0xffff00ffffff00ff,
+    0xffff00ffffffff00, 0xffff00ffffffffff, 0xffffff0000000000,
+    0xffffff00000000ff, 0xffffff000000ff00, 0xffffff000000ffff,
+    0xffffff0000ff0000, 0xffffff0000ff00ff, 0xffffff0000ffff00,
+    0xffffff0000ffffff, 0xffffff00ff000000, 0xffffff00ff0000ff,
+    0xffffff00ff00ff00, 0xffffff00ff00ffff, 0xffffff00ffff0000,
+    0xffffff00ffff00ff, 0xffffff00ffffff00, 0xffffff00ffffffff,
+    0xffffffff00000000, 0xffffffff000000ff, 0xffffffff0000ff00,
+    0xffffffff0000ffff, 0xffffffff00ff0000, 0xffffffff00ff00ff,
+    0xffffffff00ffff00, 0xffffffff00ffffff, 0xffffffffff000000,
+    0xffffffffff0000ff, 0xffffffffff00ff00, 0xffffffffff00ffff,
+    0xffffffffffff0000, 0xffffffffffff00ff, 0xffffffffffffff00,
+    0xffffffffffffffff,
+};
+
 /* Signed saturating rounding doubling multiply-accumulate high half, 8-bit */
 int8_t do_sqrdmlah_b(int8_t src1, int8_t src2, int8_t src3,
                      bool neg, bool round)
-- 
2.20.1
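Note on the expand_pred_b_data table in the patch above: the generator shown in the table's comment is only a fragment. A minimal self-contained sketch of the same computation follows. The helper name expand_pred_b_compute() is hypothetical and not part of the patch, and the sketch uses uint64_t instead of unsigned long so that the shift is 64 bits wide on every host.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): expand each set bit j of an
 * 8-bit predicate into a full 0xff byte at byte position j.
 */
static uint64_t expand_pred_b_compute(uint8_t byte)
{
    uint64_t m = 0;

    for (int j = 0; j < 8; j++) {
        if ((byte >> j) & 1) {
            /* Byte j occupies bits [8*j+7 : 8*j] of the result. */
            m |= (uint64_t)0xff << (j * 8);
        }
    }
    return m;
}
```

For example, predicate byte 0x81 expands to 0xff000000000000ff, which matches entry 129 of the table.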

Currently the ARM SVE helper code locally defines some utility
functions for swapping 16-bit halfwords within 32-bit or 64-bit
values and for swapping 32-bit words within 64-bit values,
parallel to the byte-swapping bswap16/32/64 functions.

We want these also for the ARM MVE code, and they're potentially
generally useful for other targets, so move them to bitops.h.
(We don't put them in bswap.h with the bswap* functions because
they are implemented in terms of the rotate operations also
defined in bitops.h, and including bitops.h from bswap.h seems
better avoided.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20210614151007.4545-17-peter.maydell@linaro.org
---
 include/qemu/bitops.h   | 29 +++++++++++++++++++++++++++++
 target/arm/sve_helper.c | 20 --------------------
 2 files changed, 29 insertions(+), 20 deletions(-)
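Illustrative note (not part of the commit): the sketch below copies the three helpers into a standalone file to show what they compute. The rol32()/rol64() definitions here are local stand-ins written for this sketch, on the assumption that they behave like the rotate helpers in bitops.h.

```c
#include <stdint.h>

/* Local stand-ins for the bitops.h rotate helpers (assumed equivalent). */
static inline uint32_t rol32(uint32_t word, unsigned int shift)
{
    return (word << shift) | (word >> ((32 - shift) & 31));
}

static inline uint64_t rol64(uint64_t word, unsigned int shift)
{
    return (word << shift) | (word >> ((64 - shift) & 63));
}

/* Swap the two 16-bit halfwords of a 32-bit value. */
static inline uint32_t hswap32(uint32_t h)
{
    return rol32(h, 16);
}

/* Reverse the order of the four 16-bit halfwords of a 64-bit value. */
static inline uint64_t hswap64(uint64_t h)
{
    uint64_t m = 0x0000ffff0000ffffull;

    h = rol64(h, 32);                         /* swap the 32-bit words */
    return ((h & m) << 16) | ((h >> 16) & m); /* swap halfwords in each word */
}

/* Swap the two 32-bit words of a 64-bit value. */
static inline uint64_t wswap64(uint64_t h)
{
    return rol64(h, 32);
}
```

With these definitions, hswap32(0x11223344) gives 0x33441122, wswap64(0x0011223344556677) gives 0x4455667700112233, and hswap64(0x0011223344556677) gives 0x6677445522330011.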

diff --git a/include/qemu/bitops.h b/include/qemu/bitops.h
index XXXXXXX..XXXXXXX 100644
--- a/include/qemu/bitops.h
+++ b/include/qemu/bitops.h
@@ -XXX,XX +XXX,XX @@ static inline uint64_t ror64(uint64_t word, unsigned int shift)
     return (word >> shift) | (word << ((64 - shift) & 63));
 }
 
+/**
+ * hswap32 - swap 16-bit halfwords within a 32-bit value
+ * @h: value to swap
+ */
+static inline uint32_t hswap32(uint32_t h)
+{
+    return rol32(h, 16);
+}
+
+/**
+ * hswap64 - swap 16-bit halfwords within a 64-bit value
+ * @h: value to swap
+ */
+static inline uint64_t hswap64(uint64_t h)
+{
+    uint64_t m = 0x0000ffff0000ffffull;
+    h = rol64(h, 32);
+    return ((h & m) << 16) | ((h >> 16) & m);
+}
+
+/**
+ * wswap64 - swap 32-bit words within a 64-bit value
+ * @h: value to swap
+ */
+static inline uint64_t wswap64(uint64_t h)
+{
+    return rol64(h, 32);
+}
+
 /**
  * extract32:
  * @value: the value to extract the bit field from
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -XXX,XX +XXX,XX @@ static inline uint64_t expand_pred_s(uint8_t byte)
     return word[byte & 0x11];
 }
 
-/* Swap 16-bit words within a 32-bit word.  */
-static inline uint32_t hswap32(uint32_t h)
-{
-    return rol32(h, 16);
-}
-
-/* Swap 16-bit words within a 64-bit word.  */
-static inline uint64_t hswap64(uint64_t h)
-{
-    uint64_t m = 0x0000ffff0000ffffull;
-    h = rol64(h, 32);
-    return ((h & m) << 16) | ((h >> 16) & m);
-}
-
-/* Swap 32-bit words within a 64-bit word.  */
-static inline uint64_t wswap64(uint64_t h)
-{
-    return rol64(h, 32);
-}
-
 #define LOGICAL_PPPP(NAME, FUNC) \
 void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg, uint32_t desc)  \
 {                                                                         \
-- 
2.20.1

int128_make64() creates an Int128 from an unsigned 64 bit value; add
a function int128_makes64() creating an Int128 from a signed 64 bit
value.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20210614151007.4545-34-peter.maydell@linaro.org
---
 include/qemu/int128.h | 10 ++++++++++
 1 file changed, 10 insertions(+)
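Illustrative note (not part of the commit): the difference between the two constructors is only in how the high half is filled. The sketch below models the non-__int128 fallback with a hypothetical struct SkInt128 (the field names and layout are assumptions for this example). It spells the sign extension as a comparison rather than as a right shift of a negative value, which is implementation-defined in C.

```c
#include <stdint.h>

/* Hypothetical stand-in for the struct-based Int128 fallback. */
typedef struct SkInt128 {
    uint64_t lo;
    int64_t hi;
} SkInt128;

/* Unsigned source: the high half is always zero. */
static inline SkInt128 sk_int128_make64(uint64_t a)
{
    return (SkInt128) { a, 0 };
}

/* Signed source: the high half replicates the sign bit of 'a'. */
static inline SkInt128 sk_int128_makes64(int64_t a)
{
    return (SkInt128) { (uint64_t)a, a < 0 ? -1 : 0 };
}
```

So sk_int128_makes64(-1) yields all-ones in both halves, whereas sk_int128_make64() applied to the same bit pattern leaves the high half zero.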

diff --git a/include/qemu/int128.h b/include/qemu/int128.h
index XXXXXXX..XXXXXXX 100644
--- a/include/qemu/int128.h
+++ b/include/qemu/int128.h
@@ -XXX,XX +XXX,XX @@ static inline Int128 int128_make64(uint64_t a)
     return a;
 }
 
+static inline Int128 int128_makes64(int64_t a)
+{
+    return a;
+}
+
 static inline Int128 int128_make128(uint64_t lo, uint64_t hi)
 {
     return (__uint128_t)hi << 64 | lo;
@@ -XXX,XX +XXX,XX @@ static inline Int128 int128_make64(uint64_t a)
     return (Int128) { a, 0 };
 }
 
+static inline Int128 int128_makes64(int64_t a)
+{
+    return (Int128) { a, a >> 63 };
+}
+
 static inline Int128 int128_make128(uint64_t lo, uint64_t hi)
 {
     return (Int128) { lo, hi };
-- 
2.20.1

Hi; this is the latest target-arm queue. Most of the patches
here are RTH's FEAT_HAFDBS finally landing. I've also included
the RNG-seed randomization patches from Jason, as well as a few
more minor things. The patches include a couple of regression
fixes:
 * the resettable patch fixes a SCSI reset regression
 * the 'do not re-randomize on snapshot load' patches fix
   record-and-replay regressions

thanks
-- PMM

The following changes since commit e750a7ace492f0b450653d4ad368a77d6f660fb8:

Merge tag 'pull-9p-20221024' of https://github.com/cschoenebeck/qemu into staging (2022-10-24 14:27:12 -0400)

are available in the Git repository at:

https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20221025

for you to fetch changes up to e2114f701c78f76246e4b1872639dad94a6bdd21:

rx: re-randomize rng-seed on reboot (2022-10-25 17:32:24 +0100)

----------------------------------------------------------------
target-arm queue:
 * Implement FEAT_E0PD
 * Implement FEAT_HAFDBS
 * honor HCR_E2H and HCR_TGE in arm_excp_unmasked()
 * hw/arm/virt: Fix devicetree warnings about the virtio-iommu node
 * hw/core/resettable: fix reset level counting
 * hw/hyperv/hyperv.c: Use device_cold_reset() instead of device_legacy_reset()
 * imx: reload cmp timer outside of the reload ptimer transaction
 * x86: do not re-randomize RNG seed on snapshot load
 * m68k/virt: do not re-randomize RNG seed on snapshot load
 * m68k/q800: do not re-randomize RNG seed on snapshot load
 * arm: re-randomize rng-seed on reboot
 * riscv: re-randomize rng-seed on reboot
 * mips/boston: re-randomize rng-seed on reboot
 * openrisc: re-randomize rng-seed on reboot
 * rx: re-randomize rng-seed on reboot

----------------------------------------------------------------
Ake Koomsin (1):
      target/arm: honor HCR_E2H and HCR_TGE in arm_excp_unmasked()

Axel Heider (1):
      target/imx: reload cmp timer outside of the reload ptimer transaction

Damien Hedde (1):
      hw/core/resettable: fix reset level counting

Jason A. Donenfeld (10):
      reset: allow registering handlers that aren't called by snapshot loading
      device-tree: add re-randomization helper function
      x86: do not re-randomize RNG seed on snapshot load
      arm: re-randomize rng-seed on reboot
      riscv: re-randomize rng-seed on reboot
      m68k/virt: do not re-randomize RNG seed on snapshot load
      m68k/q800: do not re-randomize RNG seed on snapshot load
      mips/boston: re-randomize rng-seed on reboot
      openrisc: re-randomize rng-seed on reboot
      rx: re-randomize rng-seed on reboot

Jean-Philippe Brucker (1):
      hw/arm/virt: Fix devicetree warnings about the virtio-iommu node

Peter Maydell (2):
      target/arm: Implement FEAT_E0PD
      hw/hyperv/hyperv.c: Use device_cold_reset() instead of device_legacy_reset()

Richard Henderson (14):
      target/arm: Introduce regime_is_stage2
      target/arm: Add ptw_idx to S1Translate
      target/arm: Add isar predicates for FEAT_HAFDBS
      target/arm: Extract HA and HD in aa64_va_parameters
      target/arm: Move S1_ptw_translate outside arm_ld[lq]_ptw
      target/arm: Add ARMFault_UnsuppAtomicUpdate
      target/arm: Remove loop from get_phys_addr_lpae
      target/arm: Fix fault reporting in get_phys_addr_lpae
      target/arm: Don't shift attrs in get_phys_addr_lpae
      target/arm: Consider GP an attribute in get_phys_addr_lpae
      target/arm: Tidy merging of attributes from descriptor and table
      target/arm: Implement FEAT_HAFDBS, access flag portion
      target/arm: Implement FEAT_HAFDBS, dirty bit portion
      target/arm: Use the max page size in a 2-stage ptw

FEAT_E0PD adds new bits E0PD0 and E0PD1 to TCR_EL1, which allow the
OS to forbid EL0 access to half of the address space.  Since this is
an EL0-specific variation on the existing TCR_ELx.{EPD0,EPD1}, we can
implement it entirely in aa64_va_parameters().

This requires moving the existing regime_is_user() to internals.h
so that the code in helper.c can get at it.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20221021160131.3531787-1-peter.maydell@linaro.org
---
 docs/system/arm/emulation.rst |  1 +
 target/arm/cpu.h              |  5 +++++
 target/arm/internals.h        | 19 +++++++++++++++++++
 target/arm/cpu64.c            |  1 +
 target/arm/helper.c           |  9 +++++++++
 target/arm/ptw.c              | 19 -------------------
 6 files changed, 35 insertions(+), 19 deletions(-)
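Illustrative note (not part of the commit): a minimal sketch of the idea described above, treating E0PDx as an EL0-only extension of EPDx. The function name and parameters are hypothetical, and the bit positions (EPD0 = bit 7, EPD1 = bit 23, E0PD0 = bit 55, E0PD1 = bit 56 of TCR_EL1) are taken from my reading of the Arm ARM and should be checked against it; this is not the code added to aa64_va_parameters().

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if translation for the selected half
 * of the address space is disabled for this access.
 *   upper_half: true for the TTBR1 range, false for the TTBR0 range
 *   is_el0:     true if the access is made on behalf of EL0
 *   have_e0pd:  true if the CPU implements FEAT_E0PD
 */
static bool sk_walk_disabled(uint64_t tcr, bool upper_half,
                             bool is_el0, bool have_e0pd)
{
    /* Existing behaviour: EPD0/EPD1 apply to every access. */
    bool epd = (tcr >> (upper_half ? 23 : 7)) & 1;

    /* FEAT_E0PD: E0PD0/E0PD1 additionally apply to EL0 accesses only. */
    if (have_e0pd && is_el0) {
        epd |= (tcr >> (upper_half ? 56 : 55)) & 1;
    }
    return epd;
}
```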

From: Jean-Philippe Brucker <jean-philippe@linaro.org>

The "PCI Bus Binding to: IEEE Std 1275-1994" defines the compatible
string for a PCIe bus or endpoint as "pci<vendorid>,<deviceid>" or
similar. Since the initial binding for PCI virtio-iommu didn't follow
this rule, it was modified to accept both strings and ensure backward
compatibility. Also, the unit-name for the node should be
"device,function".

Fix corresponding dt-validate and dtc warnings:

pcie@10000000: virtio_iommu@16:compatible: ['virtio,pci-iommu'] does not contain items matching the given schema
  pcie@10000000: Unevaluated properties are not allowed (... 'virtio_iommu@16' were unexpected)
  From schema: linux/Documentation/devicetree/bindings/pci/host-generic-pci.yaml
  virtio_iommu@16: compatible: 'oneOf' conditional failed, one must be fixed:
        ['virtio,pci-iommu'] is too short
        'pci1af4,1057' was expected
  From schema: dtschema/schemas/pci/pci-bus.yaml

Warning (pci_device_reg): /pcie@10000000/virtio_iommu@16: PCI unit address format error, expected "2,0"

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/virt.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
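Illustrative note (not part of the commit): the sketch below shows the two details the fix relies on. A string literal with an embedded "\0" holds two NUL-terminated compatible strings, and sizeof() on the array covers both; the unit-name is derived from the slot and function fields of the BDF. The SK_PCI_SLOT/SK_PCI_FUNC macros and function names are local to this example, written on the assumption that they match the usual devfn encoding (slot in bits [7:3], function in bits [2:0]).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Local stand-ins for the PCI devfn accessors (assumed encoding). */
#define SK_PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)
#define SK_PCI_FUNC(devfn) ((devfn) & 0x07)

/* Two compatible strings back to back, each NUL-terminated. */
static const char sk_compat[] = "virtio,pci-iommu\0pci1af4,1057";

/* Size to pass as the property length: covers both strings. */
static size_t sk_compat_size(void)
{
    return sizeof(sk_compat);
}

/* The second entry starts right after the first string's terminator. */
static const char *sk_compat_second(void)
{
    return sk_compat + strlen(sk_compat) + 1;
}

/* Format the node name with a "device,function" unit address in hex. */
static int sk_node_name(char *buf, size_t len, uint16_t bdf)
{
    return snprintf(buf, len, "virtio_iommu@%x,%x",
                    (unsigned)SK_PCI_SLOT(bdf), (unsigned)SK_PCI_FUNC(bdf));
}
```

For the BDF value 16 from the warning quoted above, the slot is 2 and the function is 0, so the node name becomes "virtio_iommu@2,0", which is the unit address dtc expected.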

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -XXX,XX +XXX,XX @@ static void create_smmu(const VirtMachineState *vms,
 
 static void create_virtio_iommu_dt_bindings(VirtMachineState *vms)
 {
-    const char compat[] = "virtio,pci-iommu";
+    const char compat[] = "virtio,pci-iommu\0pci1af4,1057";
     uint16_t bdf = vms->virtio_iommu_bdf;
     MachineState *ms = MACHINE(vms);
     char *node;
 
     vms->iommu_phandle = qemu_fdt_alloc_phandle(ms->fdt);
 
-    node = g_strdup_printf("%s/virtio_iommu@%d", vms->pciehb_nodename, bdf);
+    node = g_strdup_printf("%s/virtio_iommu@%x,%x", vms->pciehb_nodename,
+                           PCI_SLOT(bdf), PCI_FUNC(bdf));
     qemu_fdt_add_subnode(ms->fdt, node);
     qemu_fdt_setprop(ms->fdt, node, "compatible", compat, sizeof(compat));
     qemu_fdt_setprop_sized_cells(ms->fdt, node, "reg",
-- 
2.25.1

From: Ake Koomsin <ake@igel.co.jp>

An exception targeting EL2 from a lower EL is actually maskable when
HCR_E2H and HCR_TGE are both set. This applies to both Secure and
Non-secure state.

We can remove the conditions that try to suppress masking of
interrupts when we are Secure and the exception targets EL2 and
Secure EL2 is disabled.  This is OK because in that situation
arm_phys_excp_target_el() will never return 2 as the target EL.  The
'not if secure' check in this function was originally written before
arm_hcr_el2_eff(), and back then the target EL returned by
arm_phys_excp_target_el() could be 2 even if we were in Secure
EL0/EL1; but it is no longer needed.

Signed-off-by: Ake Koomsin <ake@igel.co.jp>
Message-id: 20221017092432.546881-1-ake@igel.co.jp
[PMM: Add commit message paragraph explaining why it's OK to
 remove the checks on secure and SCR_EEL2]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.c | 24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)
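Illustrative note (not part of the commit): the AArch64 decision made by the new switch can be sketched as a pure function. The names are hypothetical; the HCR_EL2 bit positions (TGE = bit 27, E2H = bit 34) are from my reading of the Arm ARM. The sketch only covers the case the patch touches, where the exception targets EL2 or EL3 from a lower EL.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_HCR_TGE (1ULL << 27)
#define SK_HCR_E2H (1ULL << 34)

/*
 * Hypothetical helper: returns true if the exception is taken regardless
 * of the PSTATE mask bit. 'pstate_unmasked' is true when the relevant
 * PSTATE mask bit is clear. Only meaningful for target_el 2 or 3 with
 * target_el above the current EL.
 */
static bool sk_excp_unmasked(int target_el, uint64_t hcr_el2,
                             bool pstate_unmasked)
{
    const uint64_t both = SK_HCR_E2H | SK_HCR_TGE;
    bool forced = false;

    if (target_el == 3) {
        /* Exceptions to EL3 cannot be masked from a lower EL. */
        forced = true;
    } else if (target_el == 2) {
        /* Maskable only when E2H and TGE are both set. */
        forced = (hcr_el2 & both) != both;
    }
    return forced || pstate_unmasked;
}
```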

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static inline bool arm_excp_unmasked(CPUState *cs, unsigned int excp_idx,
     if ((target_el > cur_el) && (target_el != 1)) {
         /* Exceptions targeting a higher EL may not be maskable */
         if (arm_feature(env, ARM_FEATURE_AARCH64)) {
-            /*
-             * 64-bit masking rules are simple: exceptions to EL3
-             * can't be masked, and exceptions to EL2 can only be
-             * masked from Secure state. The HCR and SCR settings
-             * don't affect the masking logic, only the interrupt routing.
-             */
-            if (target_el == 3 || !secure || (env->cp15.scr_el3 & SCR_EEL2)) {
+            switch (target_el) {
+            case 2:
+                /*
+                 * According to ARM DDI 0487H.a, an interrupt can be masked
+                 * when HCR_E2H and HCR_TGE are both set regardless of the
+                 * current Security state. Note that we need to revisit this
+                 * part again once we need to support NMI.
+                 */
+                if ((hcr_el2 & (HCR_E2H | HCR_TGE)) != (HCR_E2H | HCR_TGE)) {
+                        unmasked = true;
+                }
+                break;
+            case 3:
+                /* Interrupt cannot be masked when the target EL is 3 */
                 unmasked = true;
+                break;
+            default:
+                g_assert_not_reached();
             }
         } else {
             /*
-- 
2.25.1

From: Damien Hedde <damien.hedde@greensocs.com>

The code for handling the reset level count in the Resettable code
has two issues:

The reset count is only decremented for the 1->0 case.  This means
that if there's ever a nested reset that takes the count to 2 then it
will never again be decremented.  Eventually the count will exceed
the '50' limit in resettable_phase_enter() and QEMU will trip over
the assertion failure.  The repro case in issue 1266 is an example of
this, which happens now that the SCSI subsystem uses three-phase reset.

Secondly, the count is decremented only after the exit phase handler
is called.  Moving the reset count decrement from "just after" to
"just before" calling the exit phase handler allows
resettable_is_in_reset() to return false during the handler
execution.

This simplifies reset handling in resettable devices.  Typically, a
function that updates the device state will just need to read the
current reset state, and no longer needs to treat the "in a reset-exit
transition" state as a special case.

Note that the semantics change to the *_is_in_reset() functions
will have no effect on the current codebase, because only two
devices (hw/char/cadence_uart.c and hw/misc/zynq_sclr.c) currently
call those functions, and in neither case do they do it from the
device's exit phase method.

Fixes: 4a5fc890 ("scsi: Use device_cold_reset() and bus_cold_reset()")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1266
Signed-off-by: Damien Hedde <damien.hedde@greensocs.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reported-by: Michael Peter <michael.peter@hensoldt-cyber.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20221020142749.3357951-1-peter.maydell@linaro.org
Buglink: https://bugs.launchpad.net/qemu/+bug/1905297
[PMM: adjust the docs paragraph changed to get the name of the
 'enter' phase right and to clarify exactly when the count is
 adjusted; rewrite the commit message]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 docs/devel/reset.rst | 8 +++++---
 hw/core/resettable.c | 3 +--
 2 files changed, 6 insertions(+), 5 deletions(-)
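Illustrative note (not part of the commit): a toy model of the two issues described above, using a hypothetical SkResetState struct rather than the real ResettableState. It contrasts the old exit logic (decrement only on 1->0, after the handler) with the new one (always decrement, before the handler).

```c
#include <stdbool.h>

/* Hypothetical toy model of per-object reset bookkeeping. */
typedef struct SkResetState {
    unsigned count;             /* nesting depth of resets in progress */
    unsigned exit_calls;        /* how many times the exit handler ran */
    bool in_reset_during_exit;  /* what the exit handler observed */
} SkResetState;

static bool sk_is_in_reset(const SkResetState *s)
{
    return s->count > 0;
}

static void sk_exit_handler(SkResetState *s)
{
    s->exit_calls++;
    s->in_reset_during_exit = sk_is_in_reset(s);
}

static void sk_enter(SkResetState *s)
{
    s->count++;
}

/* Old behaviour: the count only ever goes 1 -> 0, after the handler. */
static void sk_exit_old(SkResetState *s)
{
    if (s->count == 1) {
        sk_exit_handler(s);
        s->count = 0;
    }
}

/* New behaviour: always decrement, and do so before the handler. */
static void sk_exit_new(SkResetState *s)
{
    if (--s->count == 0) {
        sk_exit_handler(s);
    }
}
```

With two nested enters followed by two exits, the old logic leaves the count stuck at 2 and never runs the exit handler, while the new logic returns to 0 and runs the handler once, with the object already reported as out of reset.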

diff --git a/docs/devel/reset.rst b/docs/devel/reset.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/devel/reset.rst
+++ b/docs/devel/reset.rst
@@ -XXX,XX +XXX,XX @@ Polling the reset state
 Resettable interface provides the ``resettable_is_in_reset()`` function.
 This function returns true if the object parameter is currently under reset.
 
-An object is under reset from the beginning of the *init* phase to the end of
-the *exit* phase. During all three phases, the function will return that the
-object is in reset.
+An object is under reset from the beginning of the *enter* phase (before
+either its children or its own enter method is called) to the *exit*
+phase. During *enter* and *hold* phase only, the function will return that the
+object is in reset. The state is changed after the *exit* is propagated to
+its children and just before calling the object's own *exit* method.
 
 This function may be used if the object behavior has to be adapted
 while in reset state. For example if a device has an irq input,
diff --git a/hw/core/resettable.c b/hw/core/resettable.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/core/resettable.c
+++ b/hw/core/resettable.c
@@ -XXX,XX +XXX,XX @@ static void resettable_phase_exit(Object *obj, void *opaque, ResetType type)
     resettable_child_foreach(rc, obj, resettable_phase_exit, NULL, type);
 
     assert(s->count > 0);
-    if (s->count == 1) {
+    if (--s->count == 0) {
         trace_resettable_phase_exit_exec(obj, obj_typename, !!rc->phases.exit);
         if (rc->phases.exit && !resettable_get_tr_func(rc, obj)) {
             rc->phases.exit(obj);
         }
-        s->count = 0;
     }
     s->exit_phase_in_progress = false;
     trace_resettable_phase_exit_end(obj, obj_typename, s->count);
-- 
2.25.1

The semantic difference between the deprecated device_legacy_reset()
function and the newer device_cold_reset() function is that the new
function resets both the device itself and any qbuses it owns,
whereas the legacy function resets just the device itself and nothing
else.  In hyperv_synic_reset() we reset a SynICState, which has no
qbuses, so for this purpose the two functions behave identically and
we can stop using the deprecated one.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-id: 20221013171817.1447562-1-peter.maydell@linaro.org
---
 hw/hyperv/hyperv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/hyperv/hyperv.c b/hw/hyperv/hyperv.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/hyperv/hyperv.c
+++ b/hw/hyperv/hyperv.c
@@ -XXX,XX +XXX,XX @@ void hyperv_synic_reset(CPUState *cs)
     SynICState *synic = get_synic(cs);
 
     if (synic) {
-        device_legacy_reset(DEVICE(synic));
+        device_cold_reset(DEVICE(synic));
     }
 }
 
-- 
2.25.1

From: Axel Heider <axel.heider@hensoldt.net>

When running seL4 tests (https://docs.sel4.systems/projects/sel4test)
on the sabrelight platform, the timer tests fail. The arm/imx6 EPIT
timer interrupt does not fire properly: instead of e.g. a second, it
can take up to a minute to finally see the interrupt.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1263

Signed-off-by: Axel Heider <axel.heider@hensoldt.net>
Message-id: 166663118138.13362.1229967229046092876-0@git.sr.ht
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/timer/imx_epit.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/hw/timer/imx_epit.c b/hw/timer/imx_epit.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/timer/imx_epit.c
+++ b/hw/timer/imx_epit.c
@@ -XXX,XX +XXX,XX @@ static void imx_epit_write(void *opaque, hwaddr offset, uint64_t value,
             /* If IOVW bit is set then set the timer value */
             ptimer_set_count(s->timer_reload, s->lr);
         }
-
+        /*
+         * Commit the change to s->timer_reload, so it can propagate. Otherwise
+         * the timer interrupt may not fire properly. The commit must happen
+         * before calling imx_epit_reload_compare_timer(), which reads
+         * s->timer_reload internally again.
+         */
+        ptimer_transaction_commit(s->timer_reload);
         imx_epit_reload_compare_timer(s);
         ptimer_transaction_commit(s->timer_cmp);
-        ptimer_transaction_commit(s->timer_reload);
         break;
 
     case 3: /* CMP */
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

Reduce the amount of typing required for this check.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20221024051851.3074715-2-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/internals.h |  5 +++++
 target/arm/helper.c    | 14 +++++---------
 target/arm/ptw.c       | 14 ++++++--------
 3 files changed, 16 insertions(+), 17 deletions(-)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ static inline bool regime_is_pan(CPUARMState *env, ARMMMUIdx mmu_idx)
     }
 }
 
+static inline bool regime_is_stage2(ARMMMUIdx mmu_idx)
+{
+    return mmu_idx == ARMMMUIdx_Stage2 || mmu_idx == ARMMMUIdx_Stage2_S;
+}
+
 /* Return the exception level which controls this address translation regime */
 static inline uint32_t regime_el(CPUARMState *env, ARMMMUIdx mmu_idx)
 {
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ int aa64_va_parameter_tbi(uint64_t tcr, ARMMMUIdx mmu_idx)
 {
     if (regime_has_2_ranges(mmu_idx)) {
         return extract64(tcr, 37, 2);
-    } else if (mmu_idx == ARMMMUIdx_Stage2 || mmu_idx == ARMMMUIdx_Stage2_S) {
+    } else if (regime_is_stage2(mmu_idx)) {
         return 0; /* VTCR_EL2 */
     } else {
         /* Replicate the single TBI bit so we always have 2 bits.  */
@@ -XXX,XX +XXX,XX @@ int aa64_va_parameter_tbid(uint64_t tcr, ARMMMUIdx mmu_idx)
 {
     if (regime_has_2_ranges(mmu_idx)) {
         return extract64(tcr, 51, 2);
-    } else if (mmu_idx == ARMMMUIdx_Stage2 || mmu_idx == ARMMMUIdx_Stage2_S) {
+    } else if (regime_is_stage2(mmu_idx)) {
         return 0; /* VTCR_EL2 */
     } else {
         /* Replicate the single TBID bit so we always have 2 bits.  */
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
     int select, tsz, tbi, max_tsz, min_tsz, ps, sh;
     ARMGranuleSize gran;
     ARMCPU *cpu = env_archcpu(env);
-    bool stage2 = mmu_idx == ARMMMUIdx_Stage2 || mmu_idx == ARMMMUIdx_Stage2_S;
+    bool stage2 = regime_is_stage2(mmu_idx);
 
     if (!regime_has_2_ranges(mmu_idx)) {
         select = 0;
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
         }
         ds = false;
     } else if (ds) {
-        switch (mmu_idx) {
-        case ARMMMUIdx_Stage2:
-        case ARMMMUIdx_Stage2_S:
+        if (regime_is_stage2(mmu_idx)) {
             if (gran == Gran16K) {
                 ds = cpu_isar_feature(aa64_tgran16_2_lpa2, cpu);
             } else {
                 ds = cpu_isar_feature(aa64_tgran4_2_lpa2, cpu);
             }
-            break;
-        default:
+        } else {
             if (gran == Gran16K) {
                 ds = cpu_isar_feature(aa64_tgran16_lpa2, cpu);
             } else {
                 ds = cpu_isar_feature(aa64_tgran4_lpa2, cpu);
             }
-            break;
         }
         if (ds) {
             min_tsz = 12;
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static int get_S1prot(CPUARMState *env, ARMMMUIdx mmu_idx, bool is_aa64,
     bool have_wxn;
     int wxn = 0;
 
-    assert(mmu_idx != ARMMMUIdx_Stage2);
-    assert(mmu_idx != ARMMMUIdx_Stage2_S);
+    assert(!regime_is_stage2(mmu_idx));
 
     user_rw = simple_ap_to_rw_prot_is_user(ap, true);
     if (is_user) {
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
         goto do_fault;
     }
 
-    if (mmu_idx != ARMMMUIdx_Stage2 && mmu_idx != ARMMMUIdx_Stage2_S) {
+    if (!regime_is_stage2(mmu_idx)) {
         /*
          * The starting level depends on the virtual address size (which can
          * be up to 48 bits) and the translation granule size. It indicates
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
         attrs = extract64(descriptor, 2, 10)
             | (extract64(descriptor, 52, 12) << 10);
 
-        if (mmu_idx == ARMMMUIdx_Stage2 || mmu_idx == ARMMMUIdx_Stage2_S) {
+        if (regime_is_stage2(mmu_idx)) {
             /* Stage 2 table descriptors do not include any attribute fields */
             break;
         }
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
 
     ap = extract32(attrs, 4, 2);
 
-    if (mmu_idx == ARMMMUIdx_Stage2 || mmu_idx == ARMMMUIdx_Stage2_S) {
+    if (regime_is_stage2(mmu_idx)) {
         ns = mmu_idx == ARMMMUIdx_Stage2;
         xn = extract32(attrs, 11, 2);
         result->f.prot = get_S2prot(env, ap, xn, s1_is_el0);
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
         result->f.guarded = guarded;
     }
 
-    if (mmu_idx == ARMMMUIdx_Stage2 || mmu_idx == ARMMMUIdx_Stage2_S) {
+    if (regime_is_stage2(mmu_idx)) {
         result->cacheattrs.is_s2_format = true;
         result->cacheattrs.attrs = extract32(attrs, 0, 4);
     } else {
@@ -XXX,XX +XXX,XX @@ do_fault:
     fi->type = fault_type;
     fi->level = level;
     /* Tag the error as S2 for failed S1 PTW at S2 or ordinary S2.  */
-    fi->stage2 = fi->s1ptw || (mmu_idx == ARMMMUIdx_Stage2 ||
-                               mmu_idx == ARMMMUIdx_Stage2_S);
+    fi->stage2 = fi->s1ptw || regime_is_stage2(mmu_idx);
     fi->s1ns = mmu_idx == ARMMMUIdx_Stage2;
     return true;
 }
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

Hoist the computation of the mmu_idx for the ptw up to
get_phys_addr_with_struct and get_phys_addr_twostage.
This removes the duplicate check for stage2 disabled
from the middle of the walk, performing it only once.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20221024051851.3074715-3-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
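Note: the secure to non-secure index switch in this patch relies on the
paired indexes differing only in bit 0. A minimal standalone sketch of
that idea follows. The enum values and the helper name
ptw_idx_to_nonsecure() are hypothetical, chosen only to satisfy the same
even/odd layout the patch asserts, and _Static_assert stands in for
QEMU_BUILD_BUG_ON.

```c
#include <stdint.h>

/* Hypothetical index values; only the even/odd pairing matters here. */
enum ptw_idx {
    IDX_PHYS_NS  = 0,
    IDX_PHYS_S   = 1,
    IDX_STAGE2   = 2,
    IDX_STAGE2_S = 3,
};

/* Compile-time layout checks, standing in for QEMU_BUILD_BUG_ON. */
_Static_assert((IDX_PHYS_NS & 1) == 0, "non-secure phys index must be even");
_Static_assert((IDX_STAGE2 & 1) == 0, "non-secure stage2 index must be even");
_Static_assert(IDX_PHYS_NS + 1 == IDX_PHYS_S, "secure phys must follow");
_Static_assert(IDX_STAGE2 + 1 == IDX_STAGE2_S, "secure stage2 must follow");

/* Map a secure index onto its non-secure partner; others are unchanged. */
static inline int ptw_idx_to_nonsecure(int idx)
{
    return idx & ~1;
}
```

With this layout a single mask handles both the Stage2_S -> Stage2 and
the Phys_S -> Phys_NS cases without a branch on which regime is in use.
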
 target/arm/ptw.c | 71 ++++++++++++++++++++++++++++++++++++------------
 1 file changed, 54 insertions(+), 17 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@
 
 typedef struct S1Translate {
     ARMMMUIdx in_mmu_idx;
+    ARMMMUIdx in_ptw_idx;
     bool in_secure;
     bool in_debug;
     bool out_secure;
@@ -XXX,XX +XXX,XX @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
 {
     bool is_secure = ptw->in_secure;
     ARMMMUIdx mmu_idx = ptw->in_mmu_idx;
-    ARMMMUIdx s2_mmu_idx = is_secure ? ARMMMUIdx_Stage2_S : ARMMMUIdx_Stage2;
-    bool s2_phys = false;
+    ARMMMUIdx s2_mmu_idx = ptw->in_ptw_idx;
     uint8_t pte_attrs;
     bool pte_secure;
 
-    if (!arm_mmu_idx_is_stage1_of_2(mmu_idx)
-        || regime_translation_disabled(env, s2_mmu_idx, is_secure)) {
-        s2_mmu_idx = is_secure ? ARMMMUIdx_Phys_S : ARMMMUIdx_Phys_NS;
-        s2_phys = true;
-    }
-
     if (unlikely(ptw->in_debug)) {
         /*
          * From gdbstub, do not use softmmu so that we don't modify the
          * state of the cpu at all, including softmmu tlb contents.
          */
-        if (s2_phys) {
-            ptw->out_phys = addr;
-            pte_attrs = 0;
-            pte_secure = is_secure;
-        } else {
+        if (regime_is_stage2(s2_mmu_idx)) {
             S1Translate s2ptw = {
                 .in_mmu_idx = s2_mmu_idx,
+                .in_ptw_idx = is_secure ? ARMMMUIdx_Phys_S : ARMMMUIdx_Phys_NS,
                 .in_secure = is_secure,
                 .in_debug = true,
             };
             GetPhysAddrResult s2 = { };
+
             if (!get_phys_addr_lpae(env, &s2ptw, addr, MMU_DATA_LOAD,
                                     false, &s2, fi)) {
                 goto fail;
@@ -XXX,XX +XXX,XX @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
             ptw->out_phys = s2.f.phys_addr;
             pte_attrs = s2.cacheattrs.attrs;
             pte_secure = s2.f.attrs.secure;
+        } else {
+            /* Regime is physical. */
+            ptw->out_phys = addr;
+            pte_attrs = 0;
+            pte_secure = is_secure;
         }
         ptw->out_host = NULL;
     } else {
@@ -XXX,XX +XXX,XX @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
         pte_secure = full->attrs.secure;
     }
 
-    if (!s2_phys) {
+    if (regime_is_stage2(s2_mmu_idx)) {
         uint64_t hcr = arm_hcr_el2_eff_secstate(env, is_secure);
 
         if ((hcr & HCR_PTW) && S2_attrs_are_device(hcr, pte_attrs)) {
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
         descaddr |= (address >> (stride * (4 - level))) & indexmask;
         descaddr &= ~7ULL;
         nstable = extract32(tableattrs, 4, 1);
-        ptw->in_secure = !nstable;
+        if (!nstable) {
+            /*
+             * Stage2_S -> Stage2 or Phys_S -> Phys_NS
+             * Assert that the non-secure idx are even, and relative order.
+             */
+            QEMU_BUILD_BUG_ON((ARMMMUIdx_Phys_NS & 1) != 0);
+            QEMU_BUILD_BUG_ON((ARMMMUIdx_Stage2 & 1) != 0);
+            QEMU_BUILD_BUG_ON(ARMMMUIdx_Phys_NS + 1 != ARMMMUIdx_Phys_S);
+            QEMU_BUILD_BUG_ON(ARMMMUIdx_Stage2 + 1 != ARMMMUIdx_Stage2_S);
+            ptw->in_ptw_idx &= ~1;
+            ptw->in_secure = false;
+        }
         descriptor = arm_ldq_ptw(env, ptw, descaddr, fi);
         if (fi->type != ARMFault_None) {
             goto do_fault;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_twostage(CPUARMState *env, S1Translate *ptw,
 
     is_el0 = ptw->in_mmu_idx == ARMMMUIdx_Stage1_E0;
     ptw->in_mmu_idx = s2walk_secure ? ARMMMUIdx_Stage2_S : ARMMMUIdx_Stage2;
+    ptw->in_ptw_idx = s2walk_secure ? ARMMMUIdx_Phys_S : ARMMMUIdx_Phys_NS;
     ptw->in_secure = s2walk_secure;
 
     /*
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_with_struct(CPUARMState *env, S1Translate *ptw,
                                       ARMMMUFaultInfo *fi)
 {
     ARMMMUIdx mmu_idx = ptw->in_mmu_idx;
-    ARMMMUIdx s1_mmu_idx = stage_1_mmu_idx(mmu_idx);
     bool is_secure = ptw->in_secure;
+    ARMMMUIdx s1_mmu_idx;
 
-    if (mmu_idx != s1_mmu_idx) {
+    switch (mmu_idx) {
+    case ARMMMUIdx_Phys_S:
+    case ARMMMUIdx_Phys_NS:
+        /* Checking Phys early avoids special casing later vs regime_el. */
+        return get_phys_addr_disabled(env, address, access_type, mmu_idx,
+                                      is_secure, result, fi);
+
+    case ARMMMUIdx_Stage1_E0:
+    case ARMMMUIdx_Stage1_E1:
+    case ARMMMUIdx_Stage1_E1_PAN:
+        /* First stage lookup uses second stage for ptw. */
+        ptw->in_ptw_idx = is_secure ? ARMMMUIdx_Stage2_S : ARMMMUIdx_Stage2;
+        break;
+
+    case ARMMMUIdx_E10_0:
+        s1_mmu_idx = ARMMMUIdx_Stage1_E0;
+        goto do_twostage;
+    case ARMMMUIdx_E10_1:
+        s1_mmu_idx = ARMMMUIdx_Stage1_E1;
+        goto do_twostage;
+    case ARMMMUIdx_E10_1_PAN:
+        s1_mmu_idx = ARMMMUIdx_Stage1_E1_PAN;
+    do_twostage:
         /*
          * Call ourselves recursively to do the stage 1 and then stage 2
          * translations if mmu_idx is a two-stage regime, and EL2 present.
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_with_struct(CPUARMState *env, S1Translate *ptw,
             return get_phys_addr_twostage(env, ptw, address, access_type,
                                           result, fi);
         }
+        /* fall through */
+
+    default:
+        /* Single stage and second stage uses physical for ptw. */
+        ptw->in_ptw_idx = is_secure ? ARMMMUIdx_Phys_S : ARMMMUIdx_Phys_NS;
+        break;
     }
 
     /*
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

The ID_AA64MMFR1.HAFDBS field may indicate support for hardware update
of the access flag alone, or of both the access flag and the dirty bit.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20221024051851.3074715-4-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
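Note: a standalone sketch of the two tests added below, for readers
without the QEMU headers to hand. It assumes HAFDBS occupies bits [3:0]
of ID_AA64MMFR1_EL1 and open-codes the extraction instead of using
FIELD_EX64. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: HAFDBS is bits [3:0] of ID_AA64MMFR1_EL1. */
static inline unsigned hafdbs_field(uint64_t id_aa64mmfr1)
{
    return id_aa64mmfr1 & 0xf;
}

/* Any non-zero value: hardware can update the access flag. */
static inline bool has_hw_access_flag(uint64_t id_aa64mmfr1)
{
    return hafdbs_field(id_aa64mmfr1) != 0;
}

/* Value 2 or above: hardware can also update the dirty state. */
static inline bool has_hw_dirty_state(uint64_t id_aa64mmfr1)
{
    return hafdbs_field(id_aa64mmfr1) >= 2;
}
```

The second test implies the first, which matches the "access flag alone,
or access flag and dirty bit" wording of the commit message.
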
 target/arm/cpu.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_e0pd(const ARMISARegisters *id)
     return FIELD_EX64(id->id_aa64mmfr2, ID_AA64MMFR2, E0PD) != 0;
 }
 
+static inline bool isar_feature_aa64_hafs(const ARMISARegisters *id)
+{
+    return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, HAFDBS) != 0;
+}
+
+static inline bool isar_feature_aa64_hdbs(const ARMISARegisters *id)
+{
+    return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, HAFDBS) >= 2;
+}
+
 static inline bool isar_feature_aa64_tts2uxn(const ARMISARegisters *id)
 {
     return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, XNX) != 0;
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20221024051851.3074715-5-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
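Note: a standalone sketch of how the ha and hd parameters are derived in
the diff below. The bit positions (21/22 for the single-range layout,
39/40 for the two-range layout) are taken from the diff; the struct,
the function name and the two_ranges argument are hypothetical
simplifications of aa64_va_parameters().

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool ha;    /* hardware access flag update enabled */
    bool hd;    /* hardware dirty state update enabled */
} HwUpdateParams;

static inline HwUpdateParams hw_update_params(uint64_t tcr, bool two_ranges,
                                              bool feat_hafs, bool feat_hdbs)
{
    /* Bit positions as used in the diff for each TCR layout. */
    unsigned ha_bit = two_ranges ? 39 : 21;
    unsigned hd_bit = two_ranges ? 40 : 22;

    /* A control bit only takes effect if the CPU implements the feature. */
    bool ha = ((tcr >> ha_bit) & 1) && feat_hafs;
    bool hd = ((tcr >> hd_bit) & 1) && feat_hdbs;

    /* As in the diff, hd is only reported when ha is also in effect. */
    return (HwUpdateParams){ .ha = ha, .hd = ha && hd };
}
```

Folding the "hd requires ha" rule into the returned value means callers
can test hd on its own.
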
 target/arm/internals.h | 2 ++
 target/arm/helper.c    | 8 +++++++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ typedef struct ARMVAParameters {
     bool hpd        : 1;
     bool tsz_oob    : 1;  /* tsz has been clamped to legal range */
     bool ds         : 1;
+    bool ha         : 1;
+    bool hd         : 1;
     ARMGranuleSize gran : 2;
 } ARMVAParameters;
 
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
                                    ARMMMUIdx mmu_idx, bool data)
 {
     uint64_t tcr = regime_tcr(env, mmu_idx);
-    bool epd, hpd, tsz_oob, ds;
+    bool epd, hpd, tsz_oob, ds, ha, hd;
     int select, tsz, tbi, max_tsz, min_tsz, ps, sh;
     ARMGranuleSize gran;
     ARMCPU *cpu = env_archcpu(env);
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
         epd = false;
         sh = extract32(tcr, 12, 2);
         ps = extract32(tcr, 16, 3);
+        ha = extract32(tcr, 21, 1) && cpu_isar_feature(aa64_hafs, cpu);
+        hd = extract32(tcr, 22, 1) && cpu_isar_feature(aa64_hdbs, cpu);
         ds = extract64(tcr, 32, 1);
     } else {
         bool e0pd;
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
             e0pd = extract64(tcr, 56, 1);
         }
         ps = extract64(tcr, 32, 3);
+        ha = extract64(tcr, 39, 1) && cpu_isar_feature(aa64_hafs, cpu);
+        hd = extract64(tcr, 40, 1) && cpu_isar_feature(aa64_hdbs, cpu);
         ds = extract64(tcr, 59, 1);
 
         if (e0pd && cpu_isar_feature(aa64_e0pd, cpu) &&
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
         .hpd = hpd,
         .tsz_oob = tsz_oob,
         .ds = ds,
+        .ha = ha,
+        .hd = ha && hd,
         .gran = gran,
     };
 }
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

Separate S1 translation from the actual lookup.
This will enable hardware updates for LPAE.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20221024051851.3074715-6-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/ptw.c | 41 ++++++++++++++++++++++-------------------
 1 file changed, 22 insertions(+), 19 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
 }
 
 /* All loads done in the course of a page table walk go through here. */
-static uint32_t arm_ldl_ptw(CPUARMState *env, S1Translate *ptw, hwaddr addr,
+static uint32_t arm_ldl_ptw(CPUARMState *env, S1Translate *ptw,
                             ARMMMUFaultInfo *fi)
 {
     CPUState *cs = env_cpu(env);
     uint32_t data;
 
-    if (!S1_ptw_translate(env, ptw, addr, fi)) {
-        /* Failure. */
-        assert(fi->s1ptw);
-        return 0;
-    }
-
     if (likely(ptw->out_host)) {
         /* Page tables are in RAM, and we have the host address. */
         if (ptw->out_be) {
@@ -XXX,XX +XXX,XX @@ static uint32_t arm_ldl_ptw(CPUARMState *env, S1Translate *ptw, hwaddr addr,
     return data;
 }
 
-static uint64_t arm_ldq_ptw(CPUARMState *env, S1Translate *ptw, hwaddr addr,
+static uint64_t arm_ldq_ptw(CPUARMState *env, S1Translate *ptw,
                             ARMMMUFaultInfo *fi)
 {
     CPUState *cs = env_cpu(env);
     uint64_t data;
 
-    if (!S1_ptw_translate(env, ptw, addr, fi)) {
-        /* Failure. */
-        assert(fi->s1ptw);
-        return 0;
-    }
-
     if (likely(ptw->out_host)) {
         /* Page tables are in RAM, and we have the host address. */
         if (ptw->out_be) {
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_v5(CPUARMState *env, S1Translate *ptw,
         fi->type = ARMFault_Translation;
         goto do_fault;
     }
-    desc = arm_ldl_ptw(env, ptw, table, fi);
+    if (!S1_ptw_translate(env, ptw, table, fi)) {
+        goto do_fault;
+    }
+    desc = arm_ldl_ptw(env, ptw, fi);
     if (fi->type != ARMFault_None) {
         goto do_fault;
     }
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_v5(CPUARMState *env, S1Translate *ptw,
             /* Fine pagetable.  */
             table = (desc & 0xfffff000) | ((address >> 8) & 0xffc);
         }
-        desc = arm_ldl_ptw(env, ptw, table, fi);
+        if (!S1_ptw_translate(env, ptw, table, fi)) {
+            goto do_fault;
+        }
+        desc = arm_ldl_ptw(env, ptw, fi);
         if (fi->type != ARMFault_None) {
             goto do_fault;
         }
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_v6(CPUARMState *env, S1Translate *ptw,
         fi->type = ARMFault_Translation;
         goto do_fault;
     }
-    desc = arm_ldl_ptw(env, ptw, table, fi);
+    if (!S1_ptw_translate(env, ptw, table, fi)) {
+        goto do_fault;
+    }
+    desc = arm_ldl_ptw(env, ptw, fi);
     if (fi->type != ARMFault_None) {
         goto do_fault;
     }
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_v6(CPUARMState *env, S1Translate *ptw,
         ns = extract32(desc, 3, 1);
         /* Lookup l2 entry.  */
         table = (desc & 0xfffffc00) | ((address >> 10) & 0x3fc);
-        desc = arm_ldl_ptw(env, ptw, table, fi);
+        if (!S1_ptw_translate(env, ptw, table, fi)) {
+            goto do_fault;
+        }
+        desc = arm_ldl_ptw(env, ptw, fi);
         if (fi->type != ARMFault_None) {
             goto do_fault;
         }
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
             ptw->in_ptw_idx &= ~1;
             ptw->in_secure = false;
         }
-        descriptor = arm_ldq_ptw(env, ptw, descaddr, fi);
+        if (!S1_ptw_translate(env, ptw, descaddr, fi)) {
+            goto do_fault;
+        }
+        descriptor = arm_ldq_ptw(env, ptw, fi);
         if (fi->type != ARMFault_None) {
             goto do_fault;
         }
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

This fault type is to be used with FEAT_HAFDBS when
the guest enables hw updates, but places the tables
in memory where atomic updates are unsupported.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20221024051851.3074715-7-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
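Note: a standalone sketch of the long-descriptor fault status codes
touched by this patch. Only the three values visible in the diff are
covered; the enum and function names are hypothetical, not the QEMU
ones.

```c
#include <stdint.h>

typedef enum {
    FAULT_TLB_CONFLICT,
    FAULT_UNSUPP_ATOMIC_UPDATE,
    FAULT_LOCKDOWN,
} ToyFaultType;

/* Map a fault type to its long-descriptor fault status code. */
static inline uint32_t toy_fault_to_lfsc(ToyFaultType type)
{
    switch (type) {
    case FAULT_TLB_CONFLICT:
        return 0x30;
    case FAULT_UNSUPP_ATOMIC_UPDATE:
        return 0x31;
    case FAULT_LOCKDOWN:
        return 0x34;
    }
    return 0;   /* not reached for the enumerators above */
}
```
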
 target/arm/internals.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ typedef enum ARMFaultType {
     ARMFault_AsyncExternal,
     ARMFault_Debug,
     ARMFault_TLBConflict,
+    ARMFault_UnsuppAtomicUpdate,
     ARMFault_Lockdown,
     ARMFault_Exclusive,
     ARMFault_ICacheMaint,
@@ -XXX,XX +XXX,XX @@ static inline uint32_t arm_fi_to_lfsc(ARMMMUFaultInfo *fi)
     case ARMFault_TLBConflict:
         fsc = 0x30;
         break;
+    case ARMFault_UnsuppAtomicUpdate:
+        fsc = 0x31;
+        break;
     case ARMFault_Lockdown:
         fsc = 0x34;
         break;
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

The unconditional loop was used both to iterate over levels
and to control parsing of attributes.  Use an explicit goto
in both cases.

While this appears less clean for iterating over levels, we
will need to jump back into the middle of this loop for
atomic updates, which would be even uglier with a for loop.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20221024051851.3074715-8-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
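Note: a toy illustration of the control flow this patch moves to, with
the level iteration expressed as a backwards goto. It is a sketch only:
the descriptors come from a flat array indexed by level instead of from
guest memory, and toy_walk() and the DESC_* names are hypothetical.
start_level is assumed to be in the range 0..3.

```c
#include <stdbool.h>
#include <stdint.h>

#define DESC_VALID 1u
#define DESC_TABLE 2u

/*
 * descs[level] is the descriptor seen at each level (0..3).
 * Returns true and stores the leaf level on success, false on a
 * translation fault.
 */
static inline bool toy_walk(const uint64_t descs[4], int start_level,
                            int *leaf_level)
{
    int level = start_level;
    uint64_t descriptor;

 next_level:
    descriptor = descs[level];
    if (!(descriptor & DESC_VALID) ||
        (!(descriptor & DESC_TABLE) && level == 3)) {
        /* Invalid, or the reserved level 3 encoding. */
        return false;
    }
    if ((descriptor & DESC_TABLE) && level < 3) {
        /* Table entry: descend one level and go round again. */
        level++;
        goto next_level;
    }
    /* Block entry at level < 3, or page entry at level 3. */
    *leaf_level = level;
    return true;
}
```

Because next_level is an ordinary label, a later step (such as a retry
after a descriptor update) can jump to it or to another label in the
same straight-line sequence, which a for (;;) body does not allow
cleanly.
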
 target/arm/ptw.c | 192 +++++++++++++++++++++++------------------------
 1 file changed, 96 insertions(+), 96 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
     uint64_t descaddrmask;
     bool aarch64 = arm_el_is_aa64(env, el);
     bool guarded = false;
+    uint64_t descriptor;
+    bool nstable;
 
     /* TODO: This code does not support shareability levels. */
     if (aarch64) {
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
      * bits at each step.
      */
     tableattrs = is_secure ? 0 : (1 << 4);
-    for (;;) {
-        uint64_t descriptor;
-        bool nstable;
-
-        descaddr |= (address >> (stride * (4 - level))) & indexmask;
-        descaddr &= ~7ULL;
-        nstable = extract32(tableattrs, 4, 1);
-        if (!nstable) {
-            /*
-             * Stage2_S -> Stage2 or Phys_S -> Phys_NS
-             * Assert that the non-secure idx are even, and relative order.
-             */
-            QEMU_BUILD_BUG_ON((ARMMMUIdx_Phys_NS & 1) != 0);
-            QEMU_BUILD_BUG_ON((ARMMMUIdx_Stage2 & 1) != 0);
-            QEMU_BUILD_BUG_ON(ARMMMUIdx_Phys_NS + 1 != ARMMMUIdx_Phys_S);
-            QEMU_BUILD_BUG_ON(ARMMMUIdx_Stage2 + 1 != ARMMMUIdx_Stage2_S);
-            ptw->in_ptw_idx &= ~1;
-            ptw->in_secure = false;
-        }
-        if (!S1_ptw_translate(env, ptw, descaddr, fi)) {
-            goto do_fault;
-        }
-        descriptor = arm_ldq_ptw(env, ptw, fi);
-        if (fi->type != ARMFault_None) {
-            goto do_fault;
-        }
-
-        if (!(descriptor & 1) ||
-            (!(descriptor & 2) && (level == 3))) {
-            /* Invalid, or the Reserved level 3 encoding */
-            goto do_fault;
-        }
-
-        descaddr = descriptor & descaddrmask;
 
+ next_level:
+    descaddr |= (address >> (stride * (4 - level))) & indexmask;
+    descaddr &= ~7ULL;
+    nstable = extract32(tableattrs, 4, 1);
+    if (!nstable) {
         /*
-         * For FEAT_LPA and PS=6, bits [51:48] of descaddr are in [15:12]
-         * of descriptor.  For FEAT_LPA2 and effective DS, bits [51:50] of
-         * descaddr are in [9:8].  Otherwise, if descaddr is out of range,
-         * raise AddressSizeFault.
+         * Stage2_S -> Stage2 or Phys_S -> Phys_NS
+         * Assert that the non-secure idx are even, and relative order.
          */
-        if (outputsize > 48) {
-            if (param.ds) {
-                descaddr |= extract64(descriptor, 8, 2) << 50;
-            } else {
-                descaddr |= extract64(descriptor, 12, 4) << 48;
-            }
-        } else if (descaddr >> outputsize) {
-            fault_type = ARMFault_AddressSize;
-            goto do_fault;
-        }
-
-        if ((descriptor & 2) && (level < 3)) {
-            /*
-             * Table entry. The top five bits are attributes which may
-             * propagate down through lower levels of the table (and
-             * which are all arranged so that 0 means "no effect", so
-             * we can gather them up by ORing in the bits at each level).
-             */
-            tableattrs |= extract64(descriptor, 59, 5);
-            level++;
-            indexmask = indexmask_grainsize;
-            continue;
-        }
-        /*
-         * Block entry at level 1 or 2, or page entry at level 3.
-         * These are basically the same thing, although the number
-         * of bits we pull in from the vaddr varies. Note that although
-         * descaddrmask masks enough of the low bits of the descriptor
-         * to give a correct page or table address, the address field
-         * in a block descriptor is smaller; so we need to explicitly
-         * clear the lower bits here before ORing in the low vaddr bits.
-         */
-        page_size = (1ULL << ((stride * (4 - level)) + 3));
-        descaddr &= ~(hwaddr)(page_size - 1);
-        descaddr |= (address & (page_size - 1));
-        /* Extract attributes from the descriptor */
-        attrs = extract64(descriptor, 2, 10)
-            | (extract64(descriptor, 52, 12) << 10);
-
-        if (regime_is_stage2(mmu_idx)) {
-            /* Stage 2 table descriptors do not include any attribute fields */
-            break;
-        }
-        /* Merge in attributes from table descriptors */
-        attrs |= nstable << 3; /* NS */
-        guarded = extract64(descriptor, 50, 1);  /* GP */
-        if (param.hpd) {
-            /* HPD disables all the table attributes except NSTable.  */
-            break;
-        }
-        attrs |= extract32(tableattrs, 0, 2) << 11;     /* XN, PXN */
-        /*
-         * The sense of AP[1] vs APTable[0] is reversed, as APTable[0] == 1
-         * means "force PL1 access only", which means forcing AP[1] to 0.
-         */
-        attrs &= ~(extract32(tableattrs, 2, 1) << 4);   /* !APT[0] => AP[1] */
-        attrs |= extract32(tableattrs, 3, 1) << 5;      /* APT[1] => AP[2] */
-        break;
+        QEMU_BUILD_BUG_ON((ARMMMUIdx_Phys_NS & 1) != 0);
+        QEMU_BUILD_BUG_ON((ARMMMUIdx_Stage2 & 1) != 0);
+        QEMU_BUILD_BUG_ON(ARMMMUIdx_Phys_NS + 1 != ARMMMUIdx_Phys_S);
+        QEMU_BUILD_BUG_ON(ARMMMUIdx_Stage2 + 1 != ARMMMUIdx_Stage2_S);
+        ptw->in_ptw_idx &= ~1;
+        ptw->in_secure = false;
     }
+    if (!S1_ptw_translate(env, ptw, descaddr, fi)) {
+        goto do_fault;
+    }
+    descriptor = arm_ldq_ptw(env, ptw, fi);
+    if (fi->type != ARMFault_None) {
+        goto do_fault;
+    }
+
+    if (!(descriptor & 1) || (!(descriptor & 2) && (level == 3))) {
+        /* Invalid, or the Reserved level 3 encoding */
+        goto do_fault;
+    }
+
+    descaddr = descriptor & descaddrmask;
+
+    /*
+     * For FEAT_LPA and PS=6, bits [51:48] of descaddr are in [15:12]
+     * of descriptor.  For FEAT_LPA2 and effective DS, bits [51:50] of
+     * descaddr are in [9:8].  Otherwise, if descaddr is out of range,
+     * raise AddressSizeFault.
+     */
+    if (outputsize > 48) {
+        if (param.ds) {
+            descaddr |= extract64(descriptor, 8, 2) << 50;
+        } else {
+            descaddr |= extract64(descriptor, 12, 4) << 48;
+        }
+    } else if (descaddr >> outputsize) {
+        fault_type = ARMFault_AddressSize;
+        goto do_fault;
+    }
+
+    if ((descriptor & 2) && (level < 3)) {
+        /*
+         * Table entry. The top five bits are attributes which may
+         * propagate down through lower levels of the table (and
+         * which are all arranged so that 0 means "no effect", so
+         * we can gather them up by ORing in the bits at each level).
+         */
+        tableattrs |= extract64(descriptor, 59, 5);
+        level++;
+        indexmask = indexmask_grainsize;
+        goto next_level;
+    }
+
+    /*
+     * Block entry at level 1 or 2, or page entry at level 3.
+     * These are basically the same thing, although the number
+     * of bits we pull in from the vaddr varies. Note that although
+     * descaddrmask masks enough of the low bits of the descriptor
+     * to give a correct page or table address, the address field
+     * in a block descriptor is smaller; so we need to explicitly
+     * clear the lower bits here before ORing in the low vaddr bits.
+     */
+    page_size = (1ULL << ((stride * (4 - level)) + 3));
+    descaddr &= ~(hwaddr)(page_size - 1);
+    descaddr |= (address & (page_size - 1));
+    /* Extract attributes from the descriptor */
+    attrs = extract64(descriptor, 2, 10)
+        | (extract64(descriptor, 52, 12) << 10);
+
+    if (regime_is_stage2(mmu_idx)) {
+        /* Stage 2 table descriptors do not include any attribute fields */
+        goto skip_attrs;
+    }
+    /* Merge in attributes from table descriptors */
+    attrs |= nstable << 3; /* NS */
+    guarded = extract64(descriptor, 50, 1);  /* GP */
+    if (param.hpd) {
+        /* HPD disables all the table attributes except NSTable.  */
+        goto skip_attrs;
+    }
+    attrs |= extract32(tableattrs, 0, 2) << 11;     /* XN, PXN */
+    /*
+     * The sense of AP[1] vs APTable[0] is reversed, as APTable[0] == 1
+     * means "force PL1 access only", which means forcing AP[1] to 0.
+     */
+    attrs &= ~(extract32(tableattrs, 2, 1) << 4);   /* !APT[0] => AP[1] */
+    attrs |= extract32(tableattrs, 3, 1) << 5;      /* APT[1] => AP[2] */
+ skip_attrs:
+
     /*
      * Here descaddr is the final physical address, and attributes
      * are all in attrs.
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

Always overriding fi->type was incorrect, as we would not properly
propagate the fault type from S1_ptw_translate, or arm_ldq_ptw.
Simplify things by providing a new label for a translation fault.
For other faults, store into fi directly.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20221024051851.3074715-9-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
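Note: a toy version of the two-label exit pattern introduced below, to
show why it preserves a fault type recorded by a callee. All names here
are hypothetical, and the callee is modelled as a plain argument. As in
the diff, the function returns false on success and true on fault.

```c
#include <stdbool.h>

typedef enum {
    TOY_FAULT_NONE,
    TOY_FAULT_TRANSLATION,
    TOY_FAULT_ADDRESS_SIZE,
    TOY_FAULT_SYNC_EXTERNAL,
} ToyFault;

typedef struct {
    ToyFault type;
    int level;
} ToyFaultInfo;

static inline bool toy_lookup(ToyFaultInfo *fi, ToyFault callee_fault,
                              bool desc_valid, bool addr_in_range, int level)
{
    /* A callee (the descriptor load, say) may already have set a fault. */
    fi->type = callee_fault;
    if (fi->type != TOY_FAULT_NONE) {
        goto do_fault;              /* keep the callee's fault type */
    }
    if (!desc_valid) {
        goto do_translation_fault;
    }
    if (!addr_in_range) {
        fi->type = TOY_FAULT_ADDRESS_SIZE;
        goto do_fault;
    }
    return false;                   /* success */

 do_translation_fault:
    fi->type = TOY_FAULT_TRANSLATION;
 do_fault:
    fi->level = level;
    return true;
}
```

With a single label that always stored a local fault_type, the first
branch would have overwritten the callee's fault with the default.
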
 target/arm/ptw.c | 31 +++++++++++++------------------
 1 file changed, 13 insertions(+), 18 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
     ARMCPU *cpu = env_archcpu(env);
     ARMMMUIdx mmu_idx = ptw->in_mmu_idx;
     bool is_secure = ptw->in_secure;
-    /* Read an LPAE long-descriptor translation table. */
-    ARMFaultType fault_type = ARMFault_Translation;
     uint32_t level;
     ARMVAParameters param;
     uint64_t ttbr;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
          * so our choice is to always raise the fault.
          */
         if (param.tsz_oob) {
-            fault_type = ARMFault_Translation;
-            goto do_fault;
+            goto do_translation_fault;
         }
 
         addrsize = 64 - 8 * param.tbi;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
                                            addrsize - inputsize);
         if (-top_bits != param.select) {
             /* The gap between the two regions is a Translation fault */
-            fault_type = ARMFault_Translation;
-            goto do_fault;
+            goto do_translation_fault;
         }
     }
 
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
          * Translation table walk disabled => Translation fault on TLB miss
          * Note: This is always 0 on 64-bit EL2 and EL3.
          */
-        goto do_fault;
+        goto do_translation_fault;
     }
 
     if (!regime_is_stage2(mmu_idx)) {
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
         if (param.ds && stride == 9 && sl2) {
             if (sl0 != 0) {
                 level = 0;
-                fault_type = ARMFault_Translation;
-                goto do_fault;
+                goto do_translation_fault;
             }
             startlevel = -1;
         } else if (!aarch64 || stride == 9) {
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
         ok = check_s2_mmu_setup(cpu, aarch64, startlevel,
                                 inputsize, stride, outputsize);
         if (!ok) {
-            fault_type = ARMFault_Translation;
-            goto do_fault;
+            goto do_translation_fault;
         }
         level = startlevel;
     }
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
         descaddr |= extract64(ttbr, 2, 4) << 48;
     } else if (descaddr >> outputsize) {
         level = 0;
-        fault_type = ARMFault_AddressSize;
+        fi->type = ARMFault_AddressSize;
         goto do_fault;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
 
     if (!(descriptor & 1) || (!(descriptor & 2) && (level == 3))) {
         /* Invalid, or the Reserved level 3 encoding */
-        goto do_fault;
+        goto do_translation_fault;
     }
 
     descaddr = descriptor & descaddrmask;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
             descaddr |= extract64(descriptor, 12, 4) << 48;
         }
     } else if (descaddr >> outputsize) {
-        fault_type = ARMFault_AddressSize;
+        fi->type = ARMFault_AddressSize;
         goto do_fault;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
      * Here descaddr is the final physical address, and attributes
      * are all in attrs.
      */
-    fault_type = ARMFault_AccessFlag;
     if ((attrs & (1 << 8)) == 0) {
         /* Access flag */
+        fi->type = ARMFault_AccessFlag;
         goto do_fault;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
         result->f.prot = get_S1prot(env, mmu_idx, aarch64, ap, ns, xn, pxn);
     }
 
-    fault_type = ARMFault_Permission;
     if (!(result->f.prot & (1 << access_type))) {
+        fi->type = ARMFault_Permission;
         goto do_fault;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
     result->f.lg_page_size = ctz64(page_size);
     return false;
 
-do_fault:
-    fi->type = fault_type;
+ do_translation_fault:
+    fi->type = ARMFault_Translation;
+ do_fault:
     fi->level = level;
     /* Tag the error as S2 for failed S1 PTW at S2 or ordinary S2.  */
     fi->stage2 = fi->s1ptw || regime_is_stage2(mmu_idx);
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

Leave the upper and lower attributes in the place they originate
from in the descriptor.  Shifting them around is confusing, since
one cannot read the bit numbers out of the manual.  Also, new
attributes have been added which would alter the shifts.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20221024051851.3074715-10-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
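Note: a standalone sketch of the in-place attribute layout after this
patch, so the bit numbers can be compared with the descriptor format
directly. MAKE_64BIT_MASK is reproduced here as I understand the QEMU
definition; the helper names are hypothetical and the shifts are the
ones used in the diff (AF at bit 10, AP at [7:6], the two bits at
[54:53]).

```c
#include <stdbool.h>
#include <stdint.h>

#define MAKE_64BIT_MASK(shift, length) \
    (((~0ULL) >> (64 - (length))) << (shift))

/* Keep lower attributes [11:2] and upper attributes [63:52] in place. */
static inline uint64_t desc_attrs(uint64_t descriptor)
{
    return descriptor & (MAKE_64BIT_MASK(2, 10) | MAKE_64BIT_MASK(52, 12));
}

/* Access flag, descriptor bit 10. */
static inline bool attrs_access_flag(uint64_t attrs)
{
    return (attrs >> 10) & 1;
}

/* Access permissions, descriptor bits [7:6]. */
static inline unsigned attrs_ap(uint64_t attrs)
{
    return (attrs >> 6) & 3;
}

/* The two execute-never bits, descriptor bits [54:53]. */
static inline unsigned attrs_xn_bits(uint64_t attrs)
{
    return (attrs >> 53) & 3;
}
```

Masking rather than repacking means the output address bits [51:12] and
the valid/type bits [1:0] are simply cleared, and every remaining bit
keeps its descriptor position.
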
 target/arm/ptw.c | 31 +++++++++++++++----------------
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
     hwaddr descaddr, indexmask, indexmask_grainsize;
     uint32_t tableattrs;
     target_ulong page_size;
-    uint32_t attrs;
+    uint64_t attrs;
     int32_t stride;
     int addrsize, inputsize, outputsize;
     uint64_t tcr = regime_tcr(env, mmu_idx);
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
     descaddr &= ~(hwaddr)(page_size - 1);
     descaddr |= (address & (page_size - 1));
     /* Extract attributes from the descriptor */
-    attrs = extract64(descriptor, 2, 10)
-        | (extract64(descriptor, 52, 12) << 10);
+    attrs = descriptor & (MAKE_64BIT_MASK(2, 10) | MAKE_64BIT_MASK(52, 12));
 
     if (regime_is_stage2(mmu_idx)) {
         /* Stage 2 table descriptors do not include any attribute fields */
         goto skip_attrs;
     }
     /* Merge in attributes from table descriptors */
-    attrs |= nstable << 3; /* NS */
+    attrs |= nstable << 5; /* NS */
     guarded = extract64(descriptor, 50, 1);  /* GP */
     if (param.hpd) {
         /* HPD disables all the table attributes except NSTable.  */
         goto skip_attrs;
     }
-    attrs |= extract32(tableattrs, 0, 2) << 11;     /* XN, PXN */
+    attrs |= extract64(tableattrs, 0, 2) << 53;     /* XN, PXN */
     /*
      * The sense of AP[1] vs APTable[0] is reversed, as APTable[0] == 1
      * means "force PL1 access only", which means forcing AP[1] to 0.
      */
-    attrs &= ~(extract32(tableattrs, 2, 1) << 4);   /* !APT[0] => AP[1] */
-    attrs |= extract32(tableattrs, 3, 1) << 5;      /* APT[1] => AP[2] */
+    attrs &= ~(extract64(tableattrs, 2, 1) << 6);   /* !APT[0] => AP[1] */
+    attrs |= extract32(tableattrs, 3, 1) << 7;      /* APT[1] => AP[2] */
  skip_attrs:
 
     /*
      * Here descaddr is the final physical address, and attributes
      * are all in attrs.
      */
-    if ((attrs & (1 << 8)) == 0) {
+    if ((attrs & (1 << 10)) == 0) {
         /* Access flag */
         fi->type = ARMFault_AccessFlag;
         goto do_fault;
     }
 
-    ap = extract32(attrs, 4, 2);
+    ap = extract32(attrs, 6, 2);
 
     if (regime_is_stage2(mmu_idx)) {
         ns = mmu_idx == ARMMMUIdx_Stage2;
-        xn = extract32(attrs, 11, 2);
+        xn = extract64(attrs, 53, 2);
         result->f.prot = get_S2prot(env, ap, xn, s1_is_el0);
     } else {
-        ns = extract32(attrs, 3, 1);
-        xn = extract32(attrs, 12, 1);
-        pxn = extract32(attrs, 11, 1);
+        ns = extract32(attrs, 5, 1);
+        xn = extract64(attrs, 54, 1);
+        pxn = extract64(attrs, 53, 1);
         result->f.prot = get_S1prot(env, mmu_idx, aarch64, ap, ns, xn, pxn);
     }
 
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
 
     if (regime_is_stage2(mmu_idx)) {
         result->cacheattrs.is_s2_format = true;
-        result->cacheattrs.attrs = extract32(attrs, 0, 4);
+        result->cacheattrs.attrs = extract32(attrs, 2, 4);
     } else {
         /* Index into MAIR registers for cache attributes */
-        uint8_t attrindx = extract32(attrs, 0, 3);
+        uint8_t attrindx = extract32(attrs, 2, 3);
         uint64_t mair = env->cp15.mair_el[regime_el(env, mmu_idx)];
         assert(attrindx <= 7);
         result->cacheattrs.is_s2_format = false;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
     if (param.ds) {
         result->cacheattrs.shareability = param.sh;
     } else {
-        result->cacheattrs.shareability = extract32(attrs, 6, 2);
+        result->cacheattrs.shareability = extract32(attrs, 8, 2);
     }
 
     result->f.phys_addr = descaddr;
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

Both GP and DBM are in the upper attribute block.
Extend the computation of attrs to include them,
then simplify the setting of guarded.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20221024051851.3074715-11-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/ptw.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

From: Richard Henderson <richard.henderson@linaro.org>

Replace some gotos with some nested if statements.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20221024051851.3074715-12-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/ptw.c | 34 ++++++++++++++++------------------
 1 file changed, 16 insertions(+), 18 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
     page_size = (1ULL << ((stride * (4 - level)) + 3));
     descaddr &= ~(hwaddr)(page_size - 1);
     descaddr |= (address & (page_size - 1));
-    /* Extract attributes from the descriptor */
-    attrs = descriptor & (MAKE_64BIT_MASK(2, 10) | MAKE_64BIT_MASK(50, 14));
 
-    if (regime_is_stage2(mmu_idx)) {
-        /* Stage 2 table descriptors do not include any attribute fields */
-        goto skip_attrs;
-    }
-    /* Merge in attributes from table descriptors */
-    attrs |= nstable << 5; /* NS */
-    if (param.hpd) {
-        /* HPD disables all the table attributes except NSTable.  */
-        goto skip_attrs;
-    }
-    attrs |= extract64(tableattrs, 0, 2) << 53;     /* XN, PXN */
     /*
-     * The sense of AP[1] vs APTable[0] is reversed, as APTable[0] == 1
-     * means "force PL1 access only", which means forcing AP[1] to 0.
+     * Extract attributes from the descriptor, and apply table descriptors.
+     * Stage 2 table descriptors do not include any attribute fields.
+     * HPD disables all the table attributes except NSTable.
      */
-    attrs &= ~(extract64(tableattrs, 2, 1) << 6);   /* !APT[0] => AP[1] */
-    attrs |= extract32(tableattrs, 3, 1) << 7;      /* APT[1] => AP[2] */
- skip_attrs:
+    attrs = descriptor & (MAKE_64BIT_MASK(2, 10) | MAKE_64BIT_MASK(50, 14));
+    if (!regime_is_stage2(mmu_idx)) {
+        attrs |= nstable << 5; /* NS */
+        if (!param.hpd) {
+            attrs |= extract64(tableattrs, 0, 2) << 53;     /* XN, PXN */
+            /*
+             * The sense of AP[1] vs APTable[0] is reversed, as APTable[0] == 1
+             * means "force PL1 access only", which means forcing AP[1] to 0.
+             */
+            attrs &= ~(extract64(tableattrs, 2, 1) << 6); /* !APT[0] => AP[1] */
+            attrs |= extract32(tableattrs, 3, 1) << 7;    /* APT[1] => AP[2] */
+        }
+    }
 
     /*
      * Here descaddr is the final physical address, and attributes
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

Perform the atomic update for hardware management of the access flag.
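
The general shape of such an update can be sketched with C11 atomics.
This is a simplified illustration under stated assumptions, not the
patch's implementation: set_access_flag() and PTE_AF are names invented
here, and the real code additionally handles endianness, MMIO-backed
page tables, read-only stage 2 mappings and hosts without 64-bit
atomics.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_AF (1ULL << 10)   /* access flag, bit 10 of the descriptor */

/*
 * Try to set AF in the in-memory descriptor, given the value previously
 * read into *seen.  Returns true if memory now holds *seen with AF set.
 * Returns false if the descriptor changed underneath us; *seen is then
 * updated to the current value and the caller must re-validate it.
 */
static bool set_access_flag(_Atomic uint64_t *pte, uint64_t *seen)
{
    uint64_t expected = *seen;

    if (expected & PTE_AF) {
        return true;                    /* nothing to write back */
    }
    if (atomic_compare_exchange_strong(pte, &expected, expected | PTE_AF)) {
        *seen = expected | PTE_AF;      /* our update landed */
        return true;
    }
    *seen = expected;                   /* somebody else changed the PTE */
    return false;
}
```

A false return corresponds to the restart case in the patch: the walk
has to be redone using the descriptor value that is actually in memory.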

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20221024051851.3074715-13-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 docs/system/arm/emulation.rst |   1 +
 target/arm/cpu64.c            |   1 +
 target/arm/ptw.c              | 176 +++++++++++++++++++++++++++++-----
 3 files changed, 156 insertions(+), 22 deletions(-)

diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/arm/emulation.rst
+++ b/docs/system/arm/emulation.rst
@@ -XXX,XX +XXX,XX @@ the following architecture extensions:
 - FEAT_FlagM (Flag manipulation instructions v2)
 - FEAT_FlagM2 (Enhancements to flag manipulation instructions)
 - FEAT_GTG (Guest translation granule size)
+- FEAT_HAFDBS (Hardware management of the access flag and dirty bit state)
 - FEAT_HCX (Support for the HCRX_EL2 register)
 - FEAT_HPDS (Hierarchical permission disables)
 - FEAT_I8MM (AArch64 Int8 matrix multiplication instructions)
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
     cpu->isar.id_aa64mmfr0 = t;
 
     t = cpu->isar.id_aa64mmfr1;
+    t = FIELD_DP64(t, ID_AA64MMFR1, HAFDBS, 1);   /* FEAT_HAFDBS, AF only */
     t = FIELD_DP64(t, ID_AA64MMFR1, VMIDBITS, 2); /* FEAT_VMID16 */
     t = FIELD_DP64(t, ID_AA64MMFR1, VH, 1);       /* FEAT_VHE */
     t = FIELD_DP64(t, ID_AA64MMFR1, HPDS, 1);     /* FEAT_HPDS */
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ typedef struct S1Translate {
     bool in_secure;
     bool in_debug;
     bool out_secure;
+    bool out_rw;
     bool out_be;
+    hwaddr out_virt;
     hwaddr out_phys;
     void *out_host;
 } S1Translate;
@@ -XXX,XX +XXX,XX @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
     uint8_t pte_attrs;
     bool pte_secure;
 
+    ptw->out_virt = addr;
+
     if (unlikely(ptw->in_debug)) {
         /*
          * From gdbstub, do not use softmmu so that we don't modify the
@@ -XXX,XX +XXX,XX @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
             pte_secure = is_secure;
         }
         ptw->out_host = NULL;
+        ptw->out_rw = false;
     } else {
         CPUTLBEntryFull *full;
         int flags;
@@ -XXX,XX +XXX,XX @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
             goto fail;
         }
         ptw->out_phys = full->phys_addr;
+        ptw->out_rw = full->prot & PROT_WRITE;
         pte_attrs = full->pte_attrs;
         pte_secure = full->attrs.secure;
     }
@@ -XXX,XX +XXX,XX @@ static uint32_t arm_ldl_ptw(CPUARMState *env, S1Translate *ptw,
                             ARMMMUFaultInfo *fi)
 {
     CPUState *cs = env_cpu(env);
+    void *host = ptw->out_host;
     uint32_t data;
 
-    if (likely(ptw->out_host)) {
+    if (likely(host)) {
         /* Page tables are in RAM, and we have the host address. */
+        data = qatomic_read((uint32_t *)host);
         if (ptw->out_be) {
-            data = ldl_be_p(ptw->out_host);
+            data = be32_to_cpu(data);
         } else {
-            data = ldl_le_p(ptw->out_host);
+            data = le32_to_cpu(data);
         }
     } else {
         /* Page tables are in MMIO. */
@@ -XXX,XX +XXX,XX @@ static uint64_t arm_ldq_ptw(CPUARMState *env, S1Translate *ptw,
                             ARMMMUFaultInfo *fi)
 {
     CPUState *cs = env_cpu(env);
+    void *host = ptw->out_host;
     uint64_t data;
 
-    if (likely(ptw->out_host)) {
+    if (likely(host)) {
         /* Page tables are in RAM, and we have the host address. */
+#ifdef CONFIG_ATOMIC64
+        data = qatomic_read__nocheck((uint64_t *)host);
         if (ptw->out_be) {
-            data = ldq_be_p(ptw->out_host);
+            data = be64_to_cpu(data);
         } else {
-            data = ldq_le_p(ptw->out_host);
+            data = le64_to_cpu(data);
         }
+#else
+        if (ptw->out_be) {
+            data = ldq_be_p(host);
+        } else {
+            data = ldq_le_p(host);
+        }
+#endif
     } else {
         /* Page tables are in MMIO. */
         MemTxAttrs attrs = { .secure = ptw->out_secure };
@@ -XXX,XX +XXX,XX @@ static uint64_t arm_ldq_ptw(CPUARMState *env, S1Translate *ptw,
     return data;
 }
 
+static uint64_t arm_casq_ptw(CPUARMState *env, uint64_t old_val,
+                             uint64_t new_val, S1Translate *ptw,
+                             ARMMMUFaultInfo *fi)
+{
+    uint64_t cur_val;
+    void *host = ptw->out_host;
+
+    if (unlikely(!host)) {
+        fi->type = ARMFault_UnsuppAtomicUpdate;
+        fi->s1ptw = true;
+        return 0;
+    }
+
+    /*
+     * Raising a stage2 Protection fault for an atomic update to a read-only
+     * page is delayed until it is certain that there is a change to make.
+     */
+    if (unlikely(!ptw->out_rw)) {
+        int flags;
+        void *discard;
+
+        env->tlb_fi = fi;
+        flags = probe_access_flags(env, ptw->out_virt, MMU_DATA_STORE,
+                                   arm_to_core_mmu_idx(ptw->in_ptw_idx),
+                                   true, &discard, 0);
+        env->tlb_fi = NULL;
+
+        if (unlikely(flags & TLB_INVALID_MASK)) {
+            assert(fi->type != ARMFault_None);
+            fi->s2addr = ptw->out_virt;
+            fi->stage2 = true;
+            fi->s1ptw = true;
+            fi->s1ns = !ptw->in_secure;
+            return 0;
+        }
+
+        /* In case CAS mismatches and we loop, remember writability. */
+        ptw->out_rw = true;
+    }
+
+#ifdef CONFIG_ATOMIC64
+    if (ptw->out_be) {
+        old_val = cpu_to_be64(old_val);
+        new_val = cpu_to_be64(new_val);
+        cur_val = qatomic_cmpxchg__nocheck((uint64_t *)host, old_val, new_val);
+        cur_val = be64_to_cpu(cur_val);
+    } else {
+        old_val = cpu_to_le64(old_val);
+        new_val = cpu_to_le64(new_val);
+        cur_val = qatomic_cmpxchg__nocheck((uint64_t *)host, old_val, new_val);
+        cur_val = le64_to_cpu(cur_val);
+    }
+#else
+    /*
+     * We can't support the full 64-bit atomic cmpxchg on the host.
+     * Because this is only used for FEAT_HAFDBS, which is only for AA64,
+     * we know that TCG_OVERSIZED_GUEST is set, which means that we are
+     * running in round-robin mode and could only race with dma i/o.
+     */
+#ifndef TCG_OVERSIZED_GUEST
+# error "Unexpected configuration"
+#endif
+    bool locked = qemu_mutex_iothread_locked();
+    if (!locked) {
+        qemu_mutex_lock_iothread();
+    }
+    if (ptw->out_be) {
+        cur_val = ldq_be_p(host);
+        if (cur_val == old_val) {
+            stq_be_p(host, new_val);
+        }
+    } else {
+        cur_val = ldq_le_p(host);
+        if (cur_val == old_val) {
+            stq_le_p(host, new_val);
+        }
+    }
+    if (!locked) {
+        qemu_mutex_unlock_iothread();
+    }
+#endif
+
+    return cur_val;
+}
+
 static bool get_level1_table_address(CPUARMState *env, ARMMMUIdx mmu_idx,
                                      uint32_t *table, uint32_t address)
 {
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
     uint32_t el = regime_el(env, mmu_idx);
     uint64_t descaddrmask;
     bool aarch64 = arm_el_is_aa64(env, el);
-    uint64_t descriptor;
+    uint64_t descriptor, new_descriptor;
     bool nstable;
 
     /* TODO: This code does not support shareability levels. */
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
     if (fi->type != ARMFault_None) {
         goto do_fault;
     }
+    new_descriptor = descriptor;
 
+ restart_atomic_update:
     if (!(descriptor & 1) || (!(descriptor & 2) && (level == 3))) {
         /* Invalid, or the Reserved level 3 encoding */
         goto do_translation_fault;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
      * to give a correct page or table address, the address field
      * in a block descriptor is smaller; so we need to explicitly
      * clear the lower bits here before ORing in the low vaddr bits.
+     *
+     * Afterward, descaddr is the final physical address.
      */
     page_size = (1ULL << ((stride * (4 - level)) + 3));
     descaddr &= ~(hwaddr)(page_size - 1);
     descaddr |= (address & (page_size - 1));
 
+    if (likely(!ptw->in_debug)) {
+        /*
+         * Access flag.
+         * If HA is enabled, prepare to update the descriptor below.
+         * Otherwise, pass the access fault on to software.
+         */
+        if (!(descriptor & (1 << 10))) {
+            if (param.ha) {
+                new_descriptor |= 1 << 10; /* AF */
+            } else {
+                fi->type = ARMFault_AccessFlag;
+                goto do_fault;
+            }
+        }
+    }
+
     /*
-     * Extract attributes from the descriptor, and apply table descriptors.
-     * Stage 2 table descriptors do not include any attribute fields.
-     * HPD disables all the table attributes except NSTable.
+     * Extract attributes from the (modified) descriptor, and apply
+     * table descriptors. Stage 2 table descriptors do not include
+     * any attribute fields. HPD disables all the table attributes
+     * except NSTable.
      */
-    attrs = descriptor & (MAKE_64BIT_MASK(2, 10) | MAKE_64BIT_MASK(50, 14));
+    attrs = new_descriptor & (MAKE_64BIT_MASK(2, 10) | MAKE_64BIT_MASK(50, 14));
     if (!regime_is_stage2(mmu_idx)) {
         attrs |= nstable << 5; /* NS */
         if (!param.hpd) {
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
         }
     }
 
-    /*
-     * Here descaddr is the final physical address, and attributes
-     * are all in attrs.
-     */
-    if ((attrs & (1 << 10)) == 0) {
-        /* Access flag */
-        fi->type = ARMFault_AccessFlag;
-        goto do_fault;
-    }
-
     ap = extract32(attrs, 6, 2);
-
     if (regime_is_stage2(mmu_idx)) {
         ns = mmu_idx == ARMMMUIdx_Stage2;
         xn = extract64(attrs, 53, 2);
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
         goto do_fault;
     }
 
+    /* If FEAT_HAFDBS has made changes, update the PTE. */
+    if (new_descriptor != descriptor) {
+        new_descriptor = arm_casq_ptw(env, descriptor, new_descriptor, ptw, fi);
+        if (fi->type != ARMFault_None) {
+            goto do_fault;
+        }
+        /*
+         * I_YZSVV says that if the in-memory descriptor has changed,
+         * then we must use the information in that new value
+         * (which might include a different output address, different
+         * attributes, or generate a fault).
+         * Restart the handling of the descriptor value from scratch.
+         */
+        if (new_descriptor != descriptor) {
+            descriptor = new_descriptor;
+            goto restart_atomic_update;
+        }
+    }
+
     if (ns) {
         /*
          * The NS bit will (as required by the architecture) have no effect if
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

Perform the atomic update for hardware management of the dirty bit.
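
The descriptor change that gets written back can be sketched as a pure
function. This is an illustrative reduction of the hunk below, with
hypothetical names (apply_dirty_bit, PTE_DBM, PTE_BIT7). It covers only
the computation of the new descriptor value, not the permission check or
the atomic writeback.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_BIT7 (1ULL << 7)    /* AP[2] at stage 1, S2AP[1] at stage 2 */
#define PTE_DBM  (1ULL << 51)   /* dirty bit modifier */

/*
 * Compute the descriptor to write back for a store when hardware dirty
 * state management (HD) is enabled and the descriptor has DBM set.
 * Stage 1 marks the page dirty by clearing AP[2] (read-only -> writable);
 * stage 2 does so by setting S2AP[1] (grants write permission).
 */
static uint64_t apply_dirty_bit(uint64_t desc, bool hd, bool is_store,
                                bool stage2)
{
    if (!hd || !is_store || !(desc & PTE_DBM)) {
        return desc;            /* no hardware update applies */
    }
    return stage2 ? (desc | PTE_BIT7) : (desc & ~PTE_BIT7);
}
```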

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20221024051851.3074715-14-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu64.c |  2 +-
 target/arm/ptw.c   | 16 ++++++++++++++++
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
     cpu->isar.id_aa64mmfr0 = t;
 
     t = cpu->isar.id_aa64mmfr1;
-    t = FIELD_DP64(t, ID_AA64MMFR1, HAFDBS, 1);   /* FEAT_HAFDBS, AF only */
+    t = FIELD_DP64(t, ID_AA64MMFR1, HAFDBS, 2);   /* FEAT_HAFDBS */
     t = FIELD_DP64(t, ID_AA64MMFR1, VMIDBITS, 2); /* FEAT_VMID16 */
     t = FIELD_DP64(t, ID_AA64MMFR1, VH, 1);       /* FEAT_VHE */
     t = FIELD_DP64(t, ID_AA64MMFR1, HPDS, 1);     /* FEAT_HPDS */
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
                 goto do_fault;
             }
         }
+
+        /*
+         * Dirty Bit.
+         * If HD is enabled, pre-emptively set/clear the appropriate AP/S2AP
+         * bit for writeback. The actual write protection test may still be
+         * overridden by tableattrs, to be merged below.
+         */
+        if (param.hd
+            && extract64(descriptor, 51, 1)  /* DBM */
+            && access_type == MMU_DATA_STORE) {
+            if (regime_is_stage2(mmu_idx)) {
+                new_descriptor |= 1ull << 7;    /* set S2AP[1] */
+            } else {
+                new_descriptor &= ~(1ull << 7); /* clear AP[2] */
+            }
+        }
     }
 
     /*
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

We had only been reporting the stage2 page size.  This causes
problems if stage1 is using a larger page size (16k, 2M, etc),
but stage2 is using a smaller page size, because cputlb does
not set large_page_{addr,mask} properly.

Fix by using the max of the two page sizes.
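
As a small worked example (merged_lg_page_size is a hypothetical helper
written for this note): page sizes are tracked as log2 values, so a 2M
stage1 block (21) combined with a 4K stage2 page (12) should be
reported as 21.

```c
/* Report the larger of the stage 1 and stage 2 log2 page sizes. */
static int merged_lg_page_size(int s1_lgpgsz, int s2_lgpgsz)
{
    return s2_lgpgsz < s1_lgpgsz ? s1_lgpgsz : s2_lgpgsz;
}
```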

Reported-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20221024051851.3074715-15-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/ptw.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_twostage(CPUARMState *env, S1Translate *ptw,
                                    ARMMMUFaultInfo *fi)
 {
     hwaddr ipa;
-    int s1_prot;
+    int s1_prot, s1_lgpgsz;
     bool is_secure = ptw->in_secure;
     bool ret, ipa_secure, s2walk_secure;
     ARMCacheAttrs cacheattrs1;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_twostage(CPUARMState *env, S1Translate *ptw,
      * Save the stage1 results so that we may merge prot and cacheattrs later.
      */
     s1_prot = result->f.prot;
+    s1_lgpgsz = result->f.lg_page_size;
     cacheattrs1 = result->cacheattrs;
     memset(result, 0, sizeof(*result));
 
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_twostage(CPUARMState *env, S1Translate *ptw,
         return ret;
     }
 
+    /*
+     * Use the maximum of the S1 & S2 page size, so that invalidation
+     * of pages > TARGET_PAGE_SIZE works correctly.
+     */
+    if (result->f.lg_page_size < s1_lgpgsz) {
+        result->f.lg_page_size = s1_lgpgsz;
+    }
+
     /* Combine the S1 and S2 cache attributes. */
     hcr = arm_hcr_el2_eff_secstate(env, is_secure);
     if (hcr & HCR_DC) {
-- 
2.25.1

From: "Jason A. Donenfeld" <Jason@zx2c4.com>

Snapshot loading only expects to call deterministic handlers, not
non-deterministic ones. So introduce a way of registering handlers that
won't be called when resetting for snapshots.
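
The idea can be sketched with a toy registry. This is a simplified
standalone illustration, not the QEMU API: register_reset(),
devices_reset(), enum reset_cause and the fixed-size table are all
invented for the example. The real code keeps a QTAILQ of
QEMUResetEntry and keys off ShutdownCause.

```c
#include <stdbool.h>
#include <stddef.h>

typedef void reset_fn(void *opaque);

enum reset_cause { CAUSE_GUEST_RESET, CAUSE_SNAPSHOT_LOAD };

struct reset_entry {
    reset_fn *func;
    void *opaque;
    bool skip_on_snapshot_load;   /* set for non-deterministic handlers */
};

#define MAX_HANDLERS 8
static struct reset_entry handlers[MAX_HANDLERS];
static size_t n_handlers;

/* Register a handler; returns false if the toy table is full. */
static bool register_reset(reset_fn *func, void *opaque, bool skip)
{
    if (n_handlers == MAX_HANDLERS) {
        return false;
    }
    handlers[n_handlers++] = (struct reset_entry){ func, opaque, skip };
    return true;
}

/* Run all handlers, skipping the flagged ones when loading a snapshot. */
static void devices_reset(enum reset_cause cause)
{
    for (size_t i = 0; i < n_handlers; i++) {
        if (cause == CAUSE_SNAPSHOT_LOAD &&
            handlers[i].skip_on_snapshot_load) {
            continue;
        }
        handlers[i].func(handlers[i].opaque);
    }
}

/* Example handler: counts how many times it was invoked. */
static void count_reset(void *opaque)
{
    ++*(int *)opaque;
}
```

A handler that reseeds an RNG would be registered with skip set, so
that loading a snapshot leaves the recorded state untouched while an
ordinary reboot still runs it.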

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-id: 20221025004327.568476-2-Jason@zx2c4.com
[PMM: updated json doc comment with Markus' text; fixed
 checkpatch style nit]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 qapi/run-state.json        |  6 +++++-
 include/hw/boards.h        |  2 +-
 include/sysemu/reset.h     |  5 ++++-
 hw/arm/aspeed.c            |  4 ++--
 hw/arm/mps2-tz.c           |  4 ++--
 hw/core/reset.c            | 17 ++++++++++++++++-
 hw/hppa/machine.c          |  4 ++--
 hw/i386/microvm.c          |  4 ++--
 hw/i386/pc.c               |  6 +++---
 hw/ppc/pegasos2.c          |  4 ++--
 hw/ppc/pnv.c               |  4 ++--
 hw/ppc/spapr.c             |  4 ++--
 hw/s390x/s390-virtio-ccw.c |  4 ++--
 migration/savevm.c         |  2 +-
 softmmu/runstate.c         | 11 ++++++++---
 15 files changed, 54 insertions(+), 27 deletions(-)

diff --git a/qapi/run-state.json b/qapi/run-state.json
index XXXXXXX..XXXXXXX 100644
--- a/qapi/run-state.json
+++ b/qapi/run-state.json
@@ -XXX,XX +XXX,XX @@
 #                   ignores --no-reboot. This is useful for sanitizing
 #                   hypercalls on s390 that are used during kexec/kdump/boot
 #
+# @snapshot-load: A snapshot is being loaded by the record & replay
+#                 subsystem. This value is used only within QEMU.  It
+#                 doesn't occur in QMP. (since 7.2)
+#
 ##
 { 'enum': 'ShutdownCause',
   # Beware, shutdown_caused_by_guest() depends on enumeration order
   'data': [ 'none', 'host-error', 'host-qmp-quit', 'host-qmp-system-reset',
             'host-signal', 'host-ui', 'guest-shutdown', 'guest-reset',
-            'guest-panic', 'subsystem-reset'] }
+            'guest-panic', 'subsystem-reset', 'snapshot-load'] }
 
 ##
 # @StatusInfo:
diff --git a/include/hw/boards.h b/include/hw/boards.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -XXX,XX +XXX,XX @@ struct MachineClass {
     const char *deprecation_reason;
 
     void (*init)(MachineState *state);
-    void (*reset)(MachineState *state);
+    void (*reset)(MachineState *state, ShutdownCause reason);
     void (*wakeup)(MachineState *state);
     int (*kvm_type)(MachineState *machine, const char *arg);
 
diff --git a/include/sysemu/reset.h b/include/sysemu/reset.h
index XXXXXXX..XXXXXXX 100644
--- a/include/sysemu/reset.h
+++ b/include/sysemu/reset.h
@@ -XXX,XX +XXX,XX @@
 #ifndef QEMU_SYSEMU_RESET_H
 #define QEMU_SYSEMU_RESET_H
 
+#include "qapi/qapi-events-run-state.h"
+
 typedef void QEMUResetHandler(void *opaque);
 
 void qemu_register_reset(QEMUResetHandler *func, void *opaque);
+void qemu_register_reset_nosnapshotload(QEMUResetHandler *func, void *opaque);
 void qemu_unregister_reset(QEMUResetHandler *func, void *opaque);
-void qemu_devices_reset(void);
+void qemu_devices_reset(ShutdownCause reason);
 
 #endif
diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/aspeed.c
+++ b/hw/arm/aspeed.c
@@ -XXX,XX +XXX,XX @@ static void aspeed_machine_bletchley_class_init(ObjectClass *oc, void *data)
         aspeed_soc_num_cpus(amc->soc_name);
 }
 
-static void fby35_reset(MachineState *state)
+static void fby35_reset(MachineState *state, ShutdownCause reason)
 {
     AspeedMachineState *bmc = ASPEED_MACHINE(state);
     AspeedGPIOState *gpio = &bmc->soc.gpio;
 
-    qemu_devices_reset();
+    qemu_devices_reset(reason);
 
     /* Board ID: 7 (Class-1, 4 slots) */
     object_property_set_bool(OBJECT(gpio), "gpioV4", true, &error_fatal);
diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/mps2-tz.c
+++ b/hw/arm/mps2-tz.c
@@ -XXX,XX +XXX,XX @@ static void mps2_set_remap(Object *obj, const char *value, Error **errp)
     }
 }
 
-static void mps2_machine_reset(MachineState *machine)
+static void mps2_machine_reset(MachineState *machine, ShutdownCause reason)
 {
     MPS2TZMachineState *mms = MPS2TZ_MACHINE(machine);
 
@@ -XXX,XX +XXX,XX @@ static void mps2_machine_reset(MachineState *machine)
      * reset see the correct mapping.
      */
     remap_memory(mms, mms->remap);
-    qemu_devices_reset();
+    qemu_devices_reset(reason);
 }
 
 static void mps2tz_class_init(ObjectClass *oc, void *data)
diff --git a/hw/core/reset.c b/hw/core/reset.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/core/reset.c
+++ b/hw/core/reset.c
@@ -XXX,XX +XXX,XX @@ typedef struct QEMUResetEntry {
     QTAILQ_ENTRY(QEMUResetEntry) entry;
     QEMUResetHandler *func;
     void *opaque;
+    bool skip_on_snapshot_load;
 } QEMUResetEntry;
 
 static QTAILQ_HEAD(, QEMUResetEntry) reset_handlers =
@@ -XXX,XX +XXX,XX @@ void qemu_register_reset(QEMUResetHandler *func, void *opaque)
     QTAILQ_INSERT_TAIL(&reset_handlers, re, entry);
 }
 
+void qemu_register_reset_nosnapshotload(QEMUResetHandler *func, void *opaque)
+{
+    QEMUResetEntry *re = g_new0(QEMUResetEntry, 1);
+
+    re->func = func;
+    re->opaque = opaque;
+    re->skip_on_snapshot_load = true;
+    QTAILQ_INSERT_TAIL(&reset_handlers, re, entry);
+}
+
 void qemu_unregister_reset(QEMUResetHandler *func, void *opaque)
 {
     QEMUResetEntry *re;
@@ -XXX,XX +XXX,XX @@ void qemu_unregister_reset(QEMUResetHandler *func, void *opaque)
     }
 }
 
-void qemu_devices_reset(void)
+void qemu_devices_reset(ShutdownCause reason)
 {
     QEMUResetEntry *re, *nre;
 
     /* reset all devices */
     QTAILQ_FOREACH_SAFE(re, &reset_handlers, entry, nre) {
+        if (reason == SHUTDOWN_CAUSE_SNAPSHOT_LOAD &&
+            re->skip_on_snapshot_load) {
+            continue;
+        }
         re->func(re->opaque);
     }
 }
diff --git a/hw/hppa/machine.c b/hw/hppa/machine.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/hppa/machine.c
+++ b/hw/hppa/machine.c
@@ -XXX,XX +XXX,XX @@ static void machine_hppa_init(MachineState *machine)
     cpu[0]->env.gr[19] = FW_CFG_IO_BASE;
 }
 
-static void hppa_machine_reset(MachineState *ms)
+static void hppa_machine_reset(MachineState *ms, ShutdownCause reason)
 {
     unsigned int smp_cpus = ms->smp.cpus;
     int i;
 
-    qemu_devices_reset();
+    qemu_devices_reset(reason);
 
     /* Start all CPUs at the firmware entry point.
      *  Monarch CPU will initialize firmware, secondary CPUs
diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/i386/microvm.c
+++ b/hw/i386/microvm.c
@@ -XXX,XX +XXX,XX @@ static void microvm_machine_state_init(MachineState *machine)
     microvm_devices_init(mms);
 }
 
-static void microvm_machine_reset(MachineState *machine)
+static void microvm_machine_reset(MachineState *machine, ShutdownCause reason)
 {
     MicrovmMachineState *mms = MICROVM_MACHINE(machine);
     CPUState *cs;
@@ -XXX,XX +XXX,XX @@ static void microvm_machine_reset(MachineState *machine)
         mms->kernel_cmdline_fixed = true;
     }
 
-    qemu_devices_reset();
+    qemu_devices_reset(reason);
 
     CPU_FOREACH(cs) {
         cpu = X86_CPU(cs);
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -XXX,XX +XXX,XX @@ static void pc_machine_initfn(Object *obj)
     cxl_machine_init(obj, &pcms->cxl_devices_state);
 }
 
-static void pc_machine_reset(MachineState *machine)
+static void pc_machine_reset(MachineState *machine, ShutdownCause reason)
 {
     CPUState *cs;
     X86CPU *cpu;
 
-    qemu_devices_reset();
+    qemu_devices_reset(reason);
 
     /* Reset APIC after devices have been reset to cancel
      * any changes that qemu_devices_reset() might have done.
@@ -XXX,XX +XXX,XX @@ static void pc_machine_reset(MachineState *machine)
 static void pc_machine_wakeup(MachineState *machine)
 {
     cpu_synchronize_all_states();
-    pc_machine_reset(machine);
+    pc_machine_reset(machine, SHUTDOWN_CAUSE_NONE);
     cpu_synchronize_all_post_reset();
 }
 
diff --git a/hw/ppc/pegasos2.c b/hw/ppc/pegasos2.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/ppc/pegasos2.c
+++ b/hw/ppc/pegasos2.c
@@ -XXX,XX +XXX,XX @@ static void pegasos2_pci_config_write(Pegasos2MachineState *pm, int bus,
     pegasos2_mv_reg_write(pm, pcicfg + 4, len, val);
 }
 
-static void pegasos2_machine_reset(MachineState *machine)
+static void pegasos2_machine_reset(MachineState *machine, ShutdownCause reason)
 {
     Pegasos2MachineState *pm = PEGASOS2_MACHINE(machine);
     void *fdt;
     uint64_t d[2];
     int sz;
 
-    qemu_devices_reset();
+    qemu_devices_reset(reason);
     if (!pm->vof) {
         return; /* Firmware should set up machine so nothing to do */
     }
diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -XXX,XX +XXX,XX @@ static void pnv_powerdown_notify(Notifier *n, void *opaque)
     }
 }
 
-static void pnv_reset(MachineState *machine)
+static void pnv_reset(MachineState *machine, ShutdownCause reason)
 {
     PnvMachineState *pnv = PNV_MACHINE(machine);
     IPMIBmc *bmc;
     void *fdt;
 
-    qemu_devices_reset();
+    qemu_devices_reset(reason);
 
     /*
      * The machine should provide by default an internal BMC simulator.
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -XXX,XX +XXX,XX @@ void spapr_check_mmu_mode(bool guest_radix)
     }
 }
 
-static void spapr_machine_reset(MachineState *machine)
+static void spapr_machine_reset(MachineState *machine, ShutdownCause reason)
 {
     SpaprMachineState *spapr = SPAPR_MACHINE(machine);
     PowerPCCPU *first_ppc_cpu;
@@ -XXX,XX +XXX,XX @@ static void spapr_machine_reset(MachineState *machine)
         spapr_setup_hpt(spapr);
     }
 
-    qemu_devices_reset();
+    qemu_devices_reset(reason);
 
     spapr_ovec_cleanup(spapr->ov5_cas);
     spapr->ov5_cas = spapr_ovec_new();
diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -XXX,XX +XXX,XX @@ static void s390_pv_prepare_reset(S390CcwMachineState *ms)
     s390_pv_prep_reset();
 }
 
-static void s390_machine_reset(MachineState *machine)
+static void s390_machine_reset(MachineState *machine, ShutdownCause reason)
 {
     S390CcwMachineState *ms = S390_CCW_MACHINE(machine);
     enum s390_reset reset_type;
@@ -XXX,XX +XXX,XX @@ static void s390_machine_reset(MachineState *machine)
             s390_machine_unprotect(ms);
         }
 
-        qemu_devices_reset();
+        qemu_devices_reset(reason);
         s390_crypto_reset();
 
         /* configure and start the ipl CPU only */
diff --git a/migration/savevm.c b/migration/savevm.c
index XXXXXXX..XXXXXXX 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -XXX,XX +XXX,XX @@ bool load_snapshot(const char *name, const char *vmstate,
         goto err_drain;
     }
 
-    qemu_system_reset(SHUTDOWN_CAUSE_NONE);
+    qemu_system_reset(SHUTDOWN_CAUSE_SNAPSHOT_LOAD);
     mis->from_src_file = f;
 
     if (!yank_register_instance(MIGRATION_YANK_INSTANCE, errp)) {
diff --git a/softmmu/runstate.c b/softmmu/runstate.c
index XXXXXXX..XXXXXXX 100644
--- a/softmmu/runstate.c
+++ b/softmmu/runstate.c
@@ -XXX,XX +XXX,XX @@ void qemu_system_reset(ShutdownCause reason)
     cpu_synchronize_all_states();
 
     if (mc && mc->reset) {
-        mc->reset(current_machine);
+        mc->reset(current_machine, reason);
     } else {
-        qemu_devices_reset();
+        qemu_devices_reset(reason);
     }
-    if (reason && reason != SHUTDOWN_CAUSE_SUBSYSTEM_RESET) {
+    switch (reason) {
+    case SHUTDOWN_CAUSE_NONE:
+    case SHUTDOWN_CAUSE_SUBSYSTEM_RESET:
+    case SHUTDOWN_CAUSE_SNAPSHOT_LOAD:
+        break;
+    default:
         qapi_event_send_reset(shutdown_caused_by_guest(reason), reason);
     }
     cpu_synchronize_all_post_reset();
-- 
2.25.1

From: "Jason A. Donenfeld" <Jason@zx2c4.com>

When the system reboots, the rng-seed that the FDT has should be
re-randomized, so that the new boot gets a new seed. Several
architectures require this functionality, so export a function for
injecting a new seed into the given FDT.

Cc: Alistair Francis <alistair.francis@wdc.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20221025004327.568476-3-Jason@zx2c4.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
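Note: as a rough illustration of the traversal this patch describes, the standalone sketch below walks a toy property table and overwrites, in place, the value of every property named "rng-seed". It does not use libfdt or a real FDT. `ToyProp`, `fill_random()` and `toy_randomize_seeds()` are hypothetical names made up for this sketch, and `rand()` is only a stand-in for `qemu_guest_getrandom_nofail()`; it is not suitable for real seeds.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one FDT property: a name plus a value buffer. */
typedef struct {
    const char *name;
    uint8_t *data;
    size_t len;
} ToyProp;

/* Placeholder entropy source (assumption: real code would use a CSPRNG). */
static void fill_random(uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        buf[i] = (uint8_t)(rand() & 0xff);
    }
}

/*
 * Overwrite every "rng-seed" value in place, keeping its length.
 * Properties with no data or with another name are left untouched.
 * Returns the number of properties that were refilled.
 */
static int toy_randomize_seeds(ToyProp *props, size_t nprops)
{
    int touched = 0;

    for (size_t i = 0; i < nprops; i++) {
        if (!props[i].data || strcmp(props[i].name, "rng-seed") != 0) {
            continue;
        }
        fill_random(props[i].data, props[i].len);
        touched++;
    }
    return touched;
}
```

The point of rewriting the bytes in place is that the property length, and therefore the blob layout, stays unchanged across reboots.
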
 include/sysemu/device_tree.h |  9 +++++++++
 softmmu/device_tree.c        | 21 +++++++++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/include/sysemu/device_tree.h b/include/sysemu/device_tree.h
index XXXXXXX..XXXXXXX 100644
--- a/include/sysemu/device_tree.h
+++ b/include/sysemu/device_tree.h
@@ -XXX,XX +XXX,XX @@ int qemu_fdt_setprop_sized_cells_from_array(void *fdt,
                                                 qdt_tmp);                 \
     })
 
+
+/**
+ * qemu_fdt_randomize_seeds:
+ * @fdt: device tree blob
+ *
+ * Re-randomize all "rng-seed" properties with new seeds.
+ */
+void qemu_fdt_randomize_seeds(void *fdt);
+
 #define FDT_PCI_RANGE_RELOCATABLE          0x80000000
 #define FDT_PCI_RANGE_PREFETCHABLE         0x40000000
 #define FDT_PCI_RANGE_ALIASED              0x20000000
diff --git a/softmmu/device_tree.c b/softmmu/device_tree.c
index XXXXXXX..XXXXXXX 100644
--- a/softmmu/device_tree.c
+++ b/softmmu/device_tree.c
@@ -XXX,XX +XXX,XX @@
 #include "qemu/option.h"
 #include "qemu/bswap.h"
 #include "qemu/cutils.h"
+#include "qemu/guest-random.h"
 #include "sysemu/device_tree.h"
 #include "hw/loader.h"
 #include "hw/boards.h"
@@ -XXX,XX +XXX,XX @@ void hmp_dumpdtb(Monitor *mon, const QDict *qdict)
 
     info_report("dtb dumped to %s", filename);
 }
+
+void qemu_fdt_randomize_seeds(void *fdt)
+{
+    int noffset, poffset, len;
+    const char *name;
+    uint8_t *data;
+
+    for (noffset = fdt_next_node(fdt, 0, NULL);
+         noffset >= 0;
+         noffset = fdt_next_node(fdt, noffset, NULL)) {
+        for (poffset = fdt_first_property_offset(fdt, noffset);
+             poffset >= 0;
+             poffset = fdt_next_property_offset(fdt, poffset)) {
+            data = (uint8_t *)fdt_getprop_by_offset(fdt, poffset, &name, &len);
+            if (!data || strcmp(name, "rng-seed"))
+                continue;
+            qemu_guest_getrandom_nofail(data, len);
+        }
+    }
+}
-- 
2.25.1

From: "Jason A. Donenfeld" <Jason@zx2c4.com>

Snapshot loading is supposed to be deterministic, so we shouldn't
re-randomize the various seeds used.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-id: 20221025004327.568476-4-Jason@zx2c4.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
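Note: the idea of skipping seed re-randomization on snapshot load can be modelled with a small reset-handler table in which each entry carries a "skip on snapshot load" flag. The sketch below is a toy model under that assumption, not QEMU's reset code; every `Toy*`/`toy_*` name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of reset causes, for this sketch only. */
typedef enum {
    TOY_CAUSE_GUEST_RESET,
    TOY_CAUSE_SNAPSHOT_LOAD,
} ToyResetCause;

typedef void ToyResetHandler(void *opaque);

typedef struct {
    ToyResetHandler *func;
    void *opaque;
    bool skip_on_snapshot_load;
} ToyResetEntry;

#define TOY_MAX_HANDLERS 16
static ToyResetEntry toy_handlers[TOY_MAX_HANDLERS];
static size_t toy_nhandlers;

/* Register a handler; returns false if the fixed-size table is full. */
static bool toy_register(ToyResetHandler *func, void *opaque, bool skip)
{
    if (toy_nhandlers == TOY_MAX_HANDLERS) {
        return false;
    }
    toy_handlers[toy_nhandlers].func = func;
    toy_handlers[toy_nhandlers].opaque = opaque;
    toy_handlers[toy_nhandlers].skip_on_snapshot_load = skip;
    toy_nhandlers++;
    return true;
}

/* Run all handlers, except flagged ones when the cause is a snapshot load. */
static void toy_devices_reset(ToyResetCause cause)
{
    for (size_t i = 0; i < toy_nhandlers; i++) {
        ToyResetEntry *e = &toy_handlers[i];

        if (cause == TOY_CAUSE_SNAPSHOT_LOAD && e->skip_on_snapshot_load) {
            continue;
        }
        e->func(e->opaque);
    }
}

/* Example handler: counts how often it ran via an int passed as opaque. */
static void toy_count_call(void *opaque)
{
    (*(int *)opaque)++;
}
```

With one ordinary handler and one flagged handler registered, a guest reset runs both, while a snapshot load runs only the ordinary one, so the state restored from the snapshot is not overwritten with fresh random bytes.
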
 hw/i386/x86.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -XXX,XX +XXX,XX @@ void x86_load_linux(X86MachineState *x86ms,
         setup_data->type = cpu_to_le32(SETUP_RNG_SEED);
         setup_data->len = cpu_to_le32(RNG_SEED_LENGTH);
         qemu_guest_getrandom_nofail(setup_data->data, RNG_SEED_LENGTH);
-        qemu_register_reset(reset_rng_seed, setup_data);
+        qemu_register_reset_nosnapshotload(reset_rng_seed, setup_data);
         fw_cfg_add_bytes_callback(fw_cfg, FW_CFG_KERNEL_DATA, reset_rng_seed, NULL,
                                   setup_data, kernel, kernel_size, true);
     } else {
-- 
2.25.1

From: "Jason A. Donenfeld" <Jason@zx2c4.com>

When the system reboots, the rng-seed that the FDT has should be
re-randomized, so that the new boot gets a new seed. Since the FDT is in
the ROM region at this point, we add a hook right after the ROM has been
added, so that we have a pointer to that copy of the FDT.

Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: qemu-arm@nongnu.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-id: 20221025004327.568476-5-Jason@zx2c4.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/boot.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -XXX,XX +XXX,XX @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
      * the DTB is copied again upon reset, even if addr points into RAM.
      */
     rom_add_blob_fixed_as("dtb", fdt, size, addr, as);
+    qemu_register_reset_nosnapshotload(qemu_fdt_randomize_seeds,
+                                       rom_ptr_for_as(as, addr, size));
 
     g_free(fdt);
 
-- 
2.25.1

From: "Jason A. Donenfeld" <Jason@zx2c4.com>

Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Alistair Francis <alistair.francis@wdc.com>
Cc: Bin Meng <bin.meng@windriver.com>
Cc: qemu-riscv@nongnu.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20221025004327.568476-6-Jason@zx2c4.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/riscv/boot.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/riscv/boot.c b/hw/riscv/boot.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/riscv/boot.c
+++ b/hw/riscv/boot.c
@@ -XXX,XX +XXX,XX @@
 #include "sysemu/device_tree.h"
 #include "sysemu/qtest.h"
 #include "sysemu/kvm.h"
+#include "sysemu/reset.h"
 
 #include <libfdt.h>
 
@@ -XXX,XX +XXX,XX @@ uint64_t riscv_load_fdt(hwaddr dram_base, uint64_t mem_size, void *fdt)
 
     rom_add_blob_fixed_as("fdt", fdt, fdtsize, fdt_addr,
                           &address_space_memory);
+    qemu_register_reset_nosnapshotload(qemu_fdt_randomize_seeds,
+                        rom_ptr_for_as(&address_space_memory, fdt_addr, fdtsize));
 
     return fdt_addr;
 }
-- 
2.25.1

From: "Jason A. Donenfeld" <Jason@zx2c4.com>

Snapshot loading is supposed to be deterministic, so we shouldn't
re-randomize the various seeds used.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-id: 20221025004327.568476-7-Jason@zx2c4.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
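Note: judging from the `rerandomize_rng_seed()` helper in this patch, the seed record's data area starts with a big-endian 16-bit length, followed by that many seed bytes. The sketch below shows that layout on a plain byte buffer. It is an illustration only: `toy_load_be16()` and `toy_rerandomize_payload()` are hypothetical names, the length is read byte by byte instead of through `be16_to_cpu()` on a cast pointer, and `rand()` stands in for `qemu_guest_getrandom_nofail()`.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Read a big-endian 16-bit value byte by byte (no alignment assumptions). */
static uint16_t toy_load_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * payload layout: [len_hi][len_lo][len seed bytes ...]
 * Refill only the seed bytes; the length prefix and anything after the
 * seed are left untouched. Returns the seed length that was read.
 */
static uint16_t toy_rerandomize_payload(uint8_t *payload)
{
    uint16_t len = toy_load_be16(payload);

    for (uint16_t i = 0; i < len; i++) {
        payload[2 + i] = (uint8_t)(rand() & 0xff);
    }
    return len;
}
```
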
 hw/m68k/virt.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/hw/m68k/virt.c b/hw/m68k/virt.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/m68k/virt.c
+++ b/hw/m68k/virt.c
@@ -XXX,XX +XXX,XX @@ typedef struct {
     M68kCPU *cpu;
     hwaddr initial_pc;
     hwaddr initial_stack;
-    struct bi_record *rng_seed;
 } ResetInfo;
 
 static void main_cpu_reset(void *opaque)
@@ -XXX,XX +XXX,XX @@ static void main_cpu_reset(void *opaque)
     M68kCPU *cpu = reset_info->cpu;
     CPUState *cs = CPU(cpu);
 
-    if (reset_info->rng_seed) {
-        qemu_guest_getrandom_nofail((void *)reset_info->rng_seed->data + 2,
-            be16_to_cpu(*(uint16_t *)reset_info->rng_seed->data));
-    }
-
     cpu_reset(cs);
     cpu->env.aregs[7] = reset_info->initial_stack;
     cpu->env.pc = reset_info->initial_pc;
 }
 
+static void rerandomize_rng_seed(void *opaque)
+{
+    struct bi_record *rng_seed = opaque;
+    qemu_guest_getrandom_nofail((void *)rng_seed->data + 2,
+                                be16_to_cpu(*(uint16_t *)rng_seed->data));
+}
+
 static void virt_init(MachineState *machine)
 {
     M68kCPU *cpu = NULL;
@@ -XXX,XX +XXX,XX @@ static void virt_init(MachineState *machine)
         BOOTINFO0(param_ptr, BI_LAST);
         rom_add_blob_fixed_as("bootinfo", param_blob, param_ptr - param_blob,
                               parameters_base, cs->as);
-        reset_info->rng_seed = rom_ptr_for_as(cs->as, parameters_base,
-                                              param_ptr - param_blob) +
-                               (param_rng_seed - param_blob);
+        qemu_register_reset_nosnapshotload(rerandomize_rng_seed,
+                            rom_ptr_for_as(cs->as, parameters_base,
+                                           param_ptr - param_blob) +
+                            (param_rng_seed - param_blob));
         g_free(param_blob);
     }
 }
-- 
2.25.1

From: "Jason A. Donenfeld" <Jason@zx2c4.com>

Snapshot loading is supposed to be deterministic, so we shouldn't
re-randomize the various seeds used.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-id: 20221025004327.568476-8-Jason@zx2c4.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/m68k/q800.c | 33 +++++++++++++--------------------
 1 file changed, 13 insertions(+), 20 deletions(-)

diff --git a/hw/m68k/q800.c b/hw/m68k/q800.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/m68k/q800.c
+++ b/hw/m68k/q800.c
@@ -XXX,XX +XXX,XX @@ static const TypeInfo glue_info = {
     },
 };
 
-typedef struct {
-    M68kCPU *cpu;
-    struct bi_record *rng_seed;
-} ResetInfo;
-
 static void main_cpu_reset(void *opaque)
 {
-    ResetInfo *reset_info = opaque;
-    M68kCPU *cpu = reset_info->cpu;
+    M68kCPU *cpu = opaque;
     CPUState *cs = CPU(cpu);
 
-    if (reset_info->rng_seed) {
-        qemu_guest_getrandom_nofail((void *)reset_info->rng_seed->data + 2,
-            be16_to_cpu(*(uint16_t *)reset_info->rng_seed->data));
-    }
-
     cpu_reset(cs);
     cpu->env.aregs[7] = ldl_phys(cs->as, 0);
     cpu->env.pc = ldl_phys(cs->as, 4);
 }
 
+static void rerandomize_rng_seed(void *opaque)
+{
+    struct bi_record *rng_seed = opaque;
+    qemu_guest_getrandom_nofail((void *)rng_seed->data + 2,
+                                be16_to_cpu(*(uint16_t *)rng_seed->data));
+}
+
 static uint8_t fake_mac_rom[] = {
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 
@@ -XXX,XX +XXX,XX @@ static void q800_init(MachineState *machine)
     NubusBus *nubus;
     DeviceState *glue;
     DriveInfo *dinfo;
-    ResetInfo *reset_info;
     uint8_t rng_seed[32];
 
     linux_boot = (kernel_filename != NULL);
@@ -XXX,XX +XXX,XX @@ static void q800_init(MachineState *machine)
         exit(1);
     }
 
-    reset_info = g_new0(ResetInfo, 1);
-
     /* init CPUs */
     cpu = M68K_CPU(cpu_create(machine->cpu_type));
-    reset_info->cpu = cpu;
-    qemu_register_reset(main_cpu_reset, reset_info);
+    qemu_register_reset(main_cpu_reset, cpu);
 
     /* RAM */
     memory_region_add_subregion(get_system_memory(), 0, machine->ram);
@@ -XXX,XX +XXX,XX @@ static void q800_init(MachineState *machine)
         BOOTINFO0(param_ptr, BI_LAST);
         rom_add_blob_fixed_as("bootinfo", param_blob, param_ptr - param_blob,
                               parameters_base, cs->as);
-        reset_info->rng_seed = rom_ptr_for_as(cs->as, parameters_base,
-                                              param_ptr - param_blob) +
-                               (param_rng_seed - param_blob);
+        qemu_register_reset_nosnapshotload(rerandomize_rng_seed,
+                            rom_ptr_for_as(cs->as, parameters_base,
+                                           param_ptr - param_blob) +
+                            (param_rng_seed - param_blob));
         g_free(param_blob);
     } else {
         uint8_t *ptr;
-- 
2.25.1

From: "Jason A. Donenfeld" <Jason@zx2c4.com>

Cc: Aleksandar Rikalo <aleksandar.rikalo@syrmia.com>
Cc: Paul Burton <paulburton@kernel.org>
Cc: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-id: 20221025004327.568476-9-Jason@zx2c4.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/mips/boston.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/mips/boston.c b/hw/mips/boston.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/mips/boston.c
+++ b/hw/mips/boston.c
@@ -XXX,XX +XXX,XX @@
 #include "sysemu/sysemu.h"
 #include "sysemu/qtest.h"
 #include "sysemu/runstate.h"
+#include "sysemu/reset.h"
 
 #include <libfdt.h>
 #include "qom/object.h"
@@ -XXX,XX +XXX,XX @@ static void boston_mach_init(MachineState *machine)
             /* Calculate real fdt size after filter */
             dt_size = fdt_totalsize(dtb_load_data);
             rom_add_blob_fixed("dtb", dtb_load_data, dt_size, dtb_paddr);
+            qemu_register_reset_nosnapshotload(qemu_fdt_randomize_seeds,
+                                rom_ptr(dtb_paddr, dt_size));
         } else {
             /* Try to load file as FIT */
             fit_err = load_fit(&boston_fit_loader, machine->kernel_filename, s);
-- 
2.25.1

From: "Jason A. Donenfeld" <Jason@zx2c4.com>

Cc: Stafford Horne <shorne@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-id: 20221025004327.568476-11-Jason@zx2c4.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/openrisc/boot.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/openrisc/boot.c b/hw/openrisc/boot.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/openrisc/boot.c
+++ b/hw/openrisc/boot.c
@@ -XXX,XX +XXX,XX @@
 #include "hw/openrisc/boot.h"
 #include "sysemu/device_tree.h"
 #include "sysemu/qtest.h"
+#include "sysemu/reset.h"
 
 #include <libfdt.h>
 
@@ -XXX,XX +XXX,XX @@ uint32_t openrisc_load_fdt(void *fdt, hwaddr load_start,
 
     rom_add_blob_fixed_as("fdt", fdt, fdtsize, fdt_addr,
                           &address_space_memory);
+    qemu_register_reset_nosnapshotload(qemu_fdt_randomize_seeds,
+                        rom_ptr_for_as(&address_space_memory, fdt_addr, fdtsize));
 
     return fdt_addr;
 }
-- 
2.25.1

From: "Jason A. Donenfeld" <Jason@zx2c4.com>

Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-id: 20221025004327.568476-12-Jason@zx2c4.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/rx/rx-gdbsim.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/rx/rx-gdbsim.c b/hw/rx/rx-gdbsim.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/rx/rx-gdbsim.c
+++ b/hw/rx/rx-gdbsim.c
@@ -XXX,XX +XXX,XX @@
 #include "hw/rx/rx62n.h"
 #include "sysemu/qtest.h"
 #include "sysemu/device_tree.h"
+#include "sysemu/reset.h"
 #include "hw/boards.h"
 #include "qom/object.h"
 
@@ -XXX,XX +XXX,XX @@ static void rx_gdbsim_init(MachineState *machine)
             dtb_offset = ROUND_DOWN(machine->ram_size - dtb_size, 16);
             rom_add_blob_fixed("dtb", dtb, dtb_size,
                                SDRAM_BASE + dtb_offset);
+            qemu_register_reset_nosnapshotload(qemu_fdt_randomize_seeds,
+                                rom_ptr(SDRAM_BASE + dtb_offset, dtb_size));
             /* Set dtb address to R1 */
             RX_CPU(first_cpu)->env.regs[1] = SDRAM_BASE + dtb_offset;
         }
-- 
2.25.1