Series comparison

-[PULL 00/26] target-arm queue
+[PULL 00/33] target-arm queue
-Small pile of bug fixes for rc1. I've included my patches to get
+The following changes since commit bf4460a8d9a86f6cfe05d7a7f470c48e3a93d8b2:
 our docs building with Sphinx 3, just for convenience...
--- PMM
+  Merge tag 'pull-tcg-20230123' of https://gitlab.com/rth7680/qemu into staging (2023-02-03 09:30:45 +0000)
 The following changes since commit b149dea55cce97cb226683d06af61984a1c11e96:
   Merge remote-tracking branch 'remotes/cschoenebeck/tags/pull-9p-20201102' into staging (2020-11-02 10:57:48 +0000)
 are available in the Git repository at:
-  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20201102
+  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20230203
-for you to fetch changes up to ffb4fbf90a2f63c9cb33e4bb9f854c79bf04ca4a:
+for you to fetch changes up to bb18151d8bd9bedc497ee9d4e8d81b39a4e5bbf6:
-  tests/qtest/npcm7xx_rng-test: Disable randomness tests (2020-11-02 16:52:18 +0000)
+  target/arm: Enable FEAT_FGT on '-cpu max' (2023-02-03 12:59:24 +0000)
 ----------------------------------------------------------------
 target-arm queue:
- * target/arm: Fix Neon emulation bugs on big-endian hosts
+ * Fix physical address resolution for Stage2
- * target/arm: fix handling of HCR.FB
+ * pl011: refactoring, implement reset method
- * target/arm: fix LORID_EL1 access check
+ * Support GICv3 with hvf acceleration
- * disas/capstone: Fix monitor disassembly of >32 bytes
+ * sbsa-ref: remove cortex-a76 from list of supported cpus
- * hw/arm/smmuv3: Fix potential integer overflow (CID 1432363)
+ * Correct syndrome for ATS12NSO* traps at Secure EL1
- * hw/arm/boot: fix SVE for EL3 direct kernel boot
+ * Fix priority of HSTR_EL2 traps vs UNDEFs
- * hw/display/omap_lcdc: Fix potential NULL pointer dereference
+ * Implement FEAT_FGT for '-cpu max'
  * hw/display/exynos4210_fimd: Fix potential NULL pointer dereference
  * target/arm: Get correct MMU index for other-security-state
  * configure: Test that gio libs from pkg-config work
  * hw/intc/arm_gicv3_cpuif: Make GIC maintenance interrupts work
  * docs: Fix building with Sphinx 3
  * tests/qtest/npcm7xx_rng-test: Disable randomness tests
 ----------------------------------------------------------------
-AlexChen (2):
+Alexander Graf (3):
-      hw/display/omap_lcdc: Fix potential NULL pointer dereference
+      hvf: arm: Add support for GICv3
-      hw/display/exynos4210_fimd: Fix potential NULL pointer dereference
+      hw/arm/virt: Consolidate GIC finalize logic
       hw/arm/virt: Make accels in GIC finalize logic explicit
-Peter Maydell (9):
+Evgeny Iakovlev (4):
-      target/arm: Fix float16 pairwise Neon ops on big-endian hosts
+      hw/char/pl011: refactor FIFO depth handling code
-      target/arm: Fix VUDOT/VSDOT (scalar) on big-endian hosts
+      hw/char/pl011: add post_load hook for backwards-compatibility
-      disas/capstone: Fix monitor disassembly of >32 bytes
+      hw/char/pl011: implement a reset method
-      target/arm: Get correct MMU index for other-security-state
+      hw/char/pl011: better handling of FIFO flags on LCR reset
       configure: Test that gio libs from pkg-config work
       hw/intc/arm_gicv3_cpuif: Make GIC maintenance interrupts work
       scripts/kerneldoc: For Sphinx 3 use c:macro for macros with arguments
       qemu-option-trace.rst.inc: Don't use option:: markup
       tests/qtest/npcm7xx_rng-test: Disable randomness tests
-Philippe Mathieu-Daudé (1):
+Marcin Juszkiewicz (1):
-      hw/arm/smmuv3: Fix potential integer overflow (CID 1432363)
+      sbsa-ref: remove cortex-a76 from list of supported cpus
-Richard Henderson (11):
+Peter Maydell (23):
-      target/arm: Introduce neon_full_reg_offset
+      target/arm: Name AT_S1E1RP and AT_S1E1WP cpregs correctly
-      target/arm: Move neon_element_offset to translate.c
+      target/arm: Correct syndrome for ATS12NSO* at Secure EL1
-      target/arm: Use neon_element_offset in neon_load/store_reg
+      target/arm: Remove CP_ACCESS_TRAP_UNCATEGORIZED_{EL2, EL3}
-      target/arm: Use neon_element_offset in vfp_reg_offset
+      target/arm: Move do_coproc_insn() syndrome calculation earlier
-      target/arm: Add read/write_neon_element32
+      target/arm: All UNDEF-at-EL0 traps take priority over HSTR_EL2 traps
-      target/arm: Expand read/write_neon_element32 to all MemOp
+      target/arm: Make HSTR_EL2 traps take priority over UNDEF-at-EL1
-      target/arm: Rename neon_load_reg32 to vfp_load_reg32
+      target/arm: Disable HSTR_EL2 traps if EL2 is not enabled
-      target/arm: Add read/write_neon_element64
+      target/arm: Define the FEAT_FGT registers
-      target/arm: Rename neon_load_reg64 to vfp_load_reg64
+      target/arm: Implement FGT trapping infrastructure
-      target/arm: Simplify do_long_3d and do_2scalar_long
+      target/arm: Mark up sysregs for HFGRTR bits 0..11
-      target/arm: Improve do_prewiden_3d
+      target/arm: Mark up sysregs for HFGRTR bits 12..23
       target/arm: Mark up sysregs for HFGRTR bits 24..35
       target/arm: Mark up sysregs for HFGRTR bits 36..63
       target/arm: Mark up sysregs for HDFGRTR bits 0..11
       target/arm: Mark up sysregs for HDFGRTR bits 12..63
       target/arm: Mark up sysregs for HFGITR bits 0..11
       target/arm: Mark up sysregs for HFGITR bits 12..17
       target/arm: Mark up sysregs for HFGITR bits 18..47
       target/arm: Mark up sysregs for HFGITR bits 48..63
       target/arm: Implement the HFGITR_EL2.ERET trap
       target/arm: Implement the HFGITR_EL2.SVC_EL0 and SVC_EL1 traps
       target/arm: Implement MDCR_EL2.TDCC and MDCR_EL3.TDCC traps
       target/arm: Enable FEAT_FGT on '-cpu max'
-Rémi Denis-Courmont (3):
+Richard Henderson (2):
-      target/arm: fix handling of HCR.FB
+      hw/arm: Use TYPE_ARM_SMMUV3
-      target/arm: fix LORID_EL1 access check
+      target/arm: Fix physical address resolution for Stage2
       hw/arm/boot: fix SVE for EL3 direct kernel boot
- docs/qemu-option-trace.rst.inc     |   6 +-
+ docs/system/arm/emulation.rst |   1 +
- configure                          |  10 +-
+ include/hw/arm/virt.h         |  15 +-
- include/hw/intc/arm_gicv3_common.h |   1 -
+ include/hw/char/pl011.h       |   5 +-
- disas/capstone.c                   |   2 +-
+ target/arm/cpregs.h           | 484 +++++++++++++++++++++++++++++++++++++++++-
- hw/arm/boot.c                      |   3 +
+ target/arm/cpu.h              |  18 ++
- hw/arm/smmuv3.c                    |   3 +-
+ target/arm/internals.h        |  20 ++
- hw/display/exynos4210_fimd.c       |   4 +-
+ target/arm/syndrome.h         |  10 +
- hw/display/omap_lcdc.c             |  10 +-
+ target/arm/translate.h        |   6 +
- hw/intc/arm_gicv3_cpuif.c          |   5 +-
+ hw/arm/sbsa-ref.c             |   4 +-
- target/arm/helper.c                |  24 +-
+ hw/arm/virt.c                 | 203 +++++++++---------
- target/arm/m_helper.c              |   3 +-
+ hw/char/pl011.c               |  93 ++++++--
- target/arm/translate.c             | 153 +++++++++---
+ hw/intc/arm_gicv3_cpuif.c     |  18 +-
- target/arm/vec_helper.c            |  12 +-
+ target/arm/cpu64.c            |   1 +
- tests/qtest/npcm7xx_rng-test.c     |  14 +-
+ target/arm/debug_helper.c     |  46 +++-
- scripts/kernel-doc                 |  18 +-
+ target/arm/helper.c           | 245 ++++++++++++++++++++-
- target/arm/translate-neon.c.inc    | 472 ++++++++++++++++++++-----------------
+ target/arm/hvf/hvf.c          | 151 +++++++++++++
- target/arm/translate-vfp.c.inc     | 341 +++++++++++----------------
+ target/arm/op_helper.c        |  58 ++++-
-files changed, 588 insertions(+), 493 deletions(-)
+ target/arm/ptw.c              |   2 +-
+ target/arm/translate-a64.c    |  22 +-
  target/arm/translate.c        | 125 +++++++----
  target/arm/hvf/trace-events   |   2 +
 files changed, 1340 insertions(+), 189 deletions(-)

-[PULL 10/26] target/arm: Simplify do_long_3d and do_2scalar_long
+[PULL 01/33] hw/arm: Use TYPE_ARM_SMMUV3
 From: Richard Henderson <richard.henderson@linaro.org>
-In both cases, we can sink the write-back and perform
+Use the macro instead of two explicit string literals.
 the accumulate into the normal destination temps.
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030022618.785675-11-richard.henderson@linaro.org
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Eric Auger <eric.auger@redhat.com>
 Message-id: 20230124232059.4017615-1-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate-neon.c.inc | 23 +++++++++--------------
+ hw/arm/sbsa-ref.c | 3 ++-
-file changed, 9 insertions(+), 14 deletions(-)
+ hw/arm/virt.c     | 2 +-
 files changed, 3 insertions(+), 2 deletions(-)
-diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
+diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.c.inc
+--- a/hw/arm/sbsa-ref.c
-+++ b/target/arm/translate-neon.c.inc
++++ b/hw/arm/sbsa-ref.c
-@@ -XXX,XX +XXX,XX @@ static bool do_long_3d(DisasContext *s, arg_3diff *a,
+@@ -XXX,XX +XXX,XX @@
-     if (accfn) {
+ #include "exec/hwaddr.h"
-         tmp = tcg_temp_new_i64();
+ #include "kvm_arm.h"
-         read_neon_element64(tmp, a->vd, 0, MO_64);
+ #include "hw/arm/boot.h"
--        accfn(tmp, tmp, rd0);
++#include "hw/arm/smmuv3.h"
--        write_neon_element64(tmp, a->vd, 0, MO_64);
+ #include "hw/block/flash.h"
-+        accfn(rd0, tmp, rd0);
+ #include "hw/boards.h"
-         read_neon_element64(tmp, a->vd, 1, MO_64);
+ #include "hw/ide/internal.h"
--        accfn(tmp, tmp, rd1);
+@@ -XXX,XX +XXX,XX @@ static void create_smmu(const SBSAMachineState *sms, PCIBus *bus)
--        write_neon_element64(tmp, a->vd, 1, MO_64);
+     DeviceState *dev;
-+        accfn(rd1, tmp, rd1);
+     int i;
-         tcg_temp_free_i64(tmp);
--    } else {
+-    dev = qdev_new("arm-smmuv3");
--        write_neon_element64(rd0, a->vd, 0, MO_64);
++    dev = qdev_new(TYPE_ARM_SMMUV3);
--        write_neon_element64(rd1, a->vd, 1, MO_64);
      object_property_set_link(OBJECT(dev), "primary-bus", OBJECT(bus),
                               &error_abort);
 diff --git a/hw/arm/virt.c b/hw/arm/virt.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/virt.c
 +++ b/hw/arm/virt.c
@@ -XXX,XX +XXX,XX @@ static void create_smmu(const VirtMachineState *vms,
          return;
      }
-+    write_neon_element64(rd0, a->vd, 0, MO_64);
+-    dev = qdev_new("arm-smmuv3");
-+    write_neon_element64(rd1, a->vd, 1, MO_64);
++    dev = qdev_new(TYPE_ARM_SMMUV3);
-     tcg_temp_free_i64(rd0);
-     tcg_temp_free_i64(rd1);
+     object_property_set_link(OBJECT(dev), "primary-bus", OBJECT(bus),
+                              &error_abort);
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
      if (accfn) {
          TCGv_i64 t64 = tcg_temp_new_i64();
          read_neon_element64(t64, a->vd, 0, MO_64);
 -        accfn(t64, t64, rn0_64);
 -        write_neon_element64(t64, a->vd, 0, MO_64);
 +        accfn(rn0_64, t64, rn0_64);
          read_neon_element64(t64, a->vd, 1, MO_64);
 -        accfn(t64, t64, rn1_64);
 -        write_neon_element64(t64, a->vd, 1, MO_64);
 +        accfn(rn1_64, t64, rn1_64);
          tcg_temp_free_i64(t64);
 -    } else {
 -        write_neon_element64(rn0_64, a->vd, 0, MO_64);
 -        write_neon_element64(rn1_64, a->vd, 1, MO_64);
      }
 +
 +    write_neon_element64(rn0_64, a->vd, 0, MO_64);
 +    write_neon_element64(rn1_64, a->vd, 1, MO_64);
      tcg_temp_free_i64(rn0_64);
      tcg_temp_free_i64(rn1_64);
      return true;
 --
-2.20.1
+2.34.1

-[PULL 16/26] disas/capstone: Fix monitor disassembly of >32 bytes
+[PULL 02/33] target/arm: Fix physical address resolution for Stage2
-If we're using the capstone disassembler, disassembly of a run of
+From: Richard Henderson <richard.henderson@linaro.org>
 instructions more than 32 bytes long disassembles the wrong data for
 instructions beyond the 32 byte mark:
-(qemu) xp /16x 0x100
+Conversion to probe_access_full missed applying the page offset.
 0000000000000100: 0x00000005 0x54410001 0x00000001 0x00001000
 0000000000000110: 0x00000000 0x00000004 0x54410002 0x3c000000
 0000000000000120: 0x00000000 0x00000004 0x54410009 0x74736574
 0000000000000130: 0x00000000 0x00000000 0x00000000 0x00000000
 (qemu) xp /16i 0x100
 0x00000100: 00000005 andeq r0, r0, r5
 0x00000104: 54410001 strbpl r0, [r1], #-1
 0x00000108: 00000001 andeq r0, r0, r1
 0x0000010c: 00001000 andeq r1, r0, r0
 0x00000110: 00000000 andeq r0, r0, r0
 0x00000114: 00000004 andeq r0, r0, r4
 0x00000118: 54410002 strbpl r0, [r1], #-2
 0x0000011c: 3c000000 .byte 0x00, 0x00, 0x00, 0x3c
 0x00000120: 54410001 strbpl r0, [r1], #-1
 0x00000124: 00000001 andeq r0, r0, r1
 0x00000128: 00001000 andeq r1, r0, r0
 0x0000012c: 00000000 andeq r0, r0, r0
 0x00000130: 00000004 andeq r0, r0, r4
 0x00000134: 54410002 strbpl r0, [r1], #-2
 0x00000138: 3c000000 .byte 0x00, 0x00, 0x00, 0x3c
 0x0000013c: 00000000 andeq r0, r0, r0
 Here the disassembly of 0x120..0x13f is using the data that is in
 0x104..0x123.
 This is caused by passing the wrong value to the read_memory_func().
 The intention is that at this point in the loop the 'cap_buf' buffer
 already contains 'csize' bytes of data for the instruction at guest
 addr 'pc', and we want to read in an extra 'tsize' bytes.  Those
 extra bytes are therefore at 'pc + csize', not 'pc'.  On the first
 time through the loop 'csize' happens to be zero, so the initial read
 of 32 bytes into cap_buf is correct and as long as the disassembly
 never needs to read more data we return the correct information.
 Use the correct guest address in the call to read_memory_func().
 Cc: qemu-stable@nongnu.org
-Fixes: https://bugs.launchpad.net/qemu/+bug/1900779
+Reported-by: Sid Manning <sidneym@quicinc.com>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Message-id: 20230126233134.103193-1-richard.henderson@linaro.org
 Fixes: f3639a64f602 ("target/arm: Use softmmu tlbs for page table walking")
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Message-id: 20201022132445.25039-1-peter.maydell@linaro.org
 ---
- disas/capstone.c | 2 +-
+ target/arm/ptw.c | 2 +-
 file changed, 1 insertion(+), 1 deletion(-)
-diff --git a/disas/capstone.c b/disas/capstone.c
+diff --git a/target/arm/ptw.c b/target/arm/ptw.c
 index XXXXXXX..XXXXXXX 100644
---- a/disas/capstone.c
+--- a/target/arm/ptw.c
-+++ b/disas/capstone.c
++++ b/target/arm/ptw.c
-@@ -XXX,XX +XXX,XX @@ bool cap_disas_monitor(disassemble_info *info, uint64_t pc, int count)
+@@ -XXX,XX +XXX,XX @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
+         if (unlikely(flags & TLB_INVALID_MASK)) {
-         /* Make certain that we can make progress.  */
+             goto fail;
-         assert(tsize != 0);
+         }
--        info->read_memory_func(pc, cap_buf + csize, tsize, info);
+-        ptw->out_phys = full->phys_addr;
-+        info->read_memory_func(pc + csize, cap_buf + csize, tsize, info);
++        ptw->out_phys = full->phys_addr | (addr & ~TARGET_PAGE_MASK);
-         csize += tsize;
+         ptw->out_rw = full->prot & PAGE_WRITE;
+         pte_attrs = full->pte_attrs;
-         if (cs_disasm_iter(handle, &cbuf, &csize, &pc, insn)) {
+         pte_secure = full->attrs.secure;
 --
-2.20.1
+2.34.1
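
The Stage2 fix in patch 02/33 can be illustrated in isolation. The following is a minimal sketch, not QEMU code: the names demo_resolve_phys, DEMO_PAGE_BITS and DEMO_PAGE_MASK are hypothetical, and it assumes 4 KiB pages and a page-aligned physical base (mirroring how TARGET_PAGE_MASK selects the page-number bits, so its complement selects the in-page offset).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical page geometry for illustration only: 4 KiB pages. */
#define DEMO_PAGE_BITS 12
#define DEMO_PAGE_MASK (~((UINT64_C(1) << DEMO_PAGE_BITS) - 1))

/*
 * Combine a page-aligned physical base with the in-page offset of the
 * address being translated. Returning page_phys alone (the pre-fix
 * behaviour described in the commit message) would lose the offset.
 */
static uint64_t demo_resolve_phys(uint64_t page_phys, uint64_t addr)
{
    /* Assumption of this sketch: the base has no offset bits set. */
    assert((page_phys & ~DEMO_PAGE_MASK) == 0);
    return page_phys | (addr & ~DEMO_PAGE_MASK);
}
```

With these assumptions, translating address 0x12345678 through a page based at 0x80000000 yields 0x80000678; dropping the offset would return 0x80000000 for every address in the page.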

-[PULL 09/26] target/arm: Rename neon_load_reg64 to vfp_load_reg64
+[PULL 03/33] hw/char/pl011: refactor FIFO depth handling code
-From: Richard Henderson <richard.henderson@linaro.org>
+From: Evgeny Iakovlev <eiakovlev@linux.microsoft.com>
-The only uses of this function are for loading VFP
+PL011 can be in either of 2 modes depending on guest config: FIFO and
-double-precision values, and nothing to do with NEON.
+single register. The last mode could be viewed as a 1-element-deep FIFO.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Current code open-codes a bunch of depth-dependent logic. Refactor FIFO
-Message-id: 20201030022618.785675-10-richard.henderson@linaro.org
+depth handling code to isolate calculating current FIFO depth.
 One functional (albeit guest-invisible) side-effect of this change is
 that previously we would always increment s->read_pos in UARTDR read
 handler even if FIFO was disabled, now we are limiting read_pos to not
 exceed FIFO depth (read_pos itself is reset to 0 if user disables FIFO).
 Signed-off-by: Evgeny Iakovlev <eiakovlev@linux.microsoft.com>
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
+Message-id: 20230123162304.26254-2-eiakovlev@linux.microsoft.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate.c         |  8 ++--
+ include/hw/char/pl011.h |  5 ++++-
- target/arm/translate-vfp.c.inc | 84 +++++++++++++++++-----------------
+ hw/char/pl011.c         | 30 ++++++++++++++++++------------
-files changed, 46 insertions(+), 46 deletions(-)
+files changed, 22 insertions(+), 13 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/include/hw/char/pl011.h b/include/hw/char/pl011.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/include/hw/char/pl011.h
-+++ b/target/arm/translate.c
++++ b/include/hw/char/pl011.h
-@@ -XXX,XX +XXX,XX @@ static long vfp_reg_offset(bool dp, unsigned reg)
+@@ -XXX,XX +XXX,XX @@ OBJECT_DECLARE_SIMPLE_TYPE(PL011State, PL011)
  /* This shares the same struct (and cast macro) as the base pl011 device */
  #define TYPE_PL011_LUMINARY "pl011_luminary"
 +/* Depth of UART FIFO in bytes, when FIFO mode is enabled (else depth == 1) */
 +#define PL011_FIFO_DEPTH 16
 +
  struct PL011State {
      SysBusDevice parent_obj;
@@ -XXX,XX +XXX,XX @@ struct PL011State {
      uint32_t dmacr;
      uint32_t int_enabled;
      uint32_t int_level;
 -    uint32_t read_fifo[16];
 +    uint32_t read_fifo[PL011_FIFO_DEPTH];
      uint32_t ilpr;
      uint32_t ibrd;
      uint32_t fbrd;
 diff --git a/hw/char/pl011.c b/hw/char/pl011.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/char/pl011.c
 +++ b/hw/char/pl011.c
@@ -XXX,XX +XXX,XX @@ static void pl011_update(PL011State *s)
      }
  }
--static inline void neon_load_reg64(TCGv_i64 var, int reg)
++static bool pl011_is_fifo_enabled(PL011State *s)
-+static inline void vfp_load_reg64(TCGv_i64 var, int reg)
++{
 +    return (s->lcr & 0x10) != 0;
 +}
 +
 +static inline unsigned pl011_get_fifo_depth(PL011State *s)
 +{
 +    /* Note: FIFO depth is expected to be power-of-2 */
 +    return pl011_is_fifo_enabled(s) ? PL011_FIFO_DEPTH : 1;
 +}
 +
  static uint64_t pl011_read(void *opaque, hwaddr offset,
                             unsigned size)
  {
--    tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(1, reg));
+@@ -XXX,XX +XXX,XX @@ static uint64_t pl011_read(void *opaque, hwaddr offset,
-+    tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(true, reg));
+         c = s->read_fifo[s->read_pos];
          if (s->read_count > 0) {
              s->read_count--;
 -            if (++s->read_pos == 16)
 -                s->read_pos = 0;
 +            s->read_pos = (s->read_pos + 1) & (pl011_get_fifo_depth(s) - 1);
          }
          if (s->read_count == 0) {
              s->flags |= PL011_FLAG_RXFE;
@@ -XXX,XX +XXX,XX @@ static int pl011_can_receive(void *opaque)
      PL011State *s = (PL011State *)opaque;
      int r;
 -    if (s->lcr & 0x10) {
 -        r = s->read_count < 16;
 -    } else {
 -        r = s->read_count < 1;
 -    }
 +    r = s->read_count < pl011_get_fifo_depth(s);
      trace_pl011_can_receive(s->lcr, s->read_count, r);
      return r;
  }
+@@ -XXX,XX +XXX,XX @@ static void pl011_put_fifo(void *opaque, uint32_t value)
 -static inline void neon_store_reg64(TCGv_i64 var, int reg)
 +static inline void vfp_store_reg64(TCGv_i64 var, int reg)
  {
--    tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(1, reg));
+     PL011State *s = (PL011State *)opaque;
-+    tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(true, reg));
+     int slot;
- }
++    unsigned pipe_depth;
- static inline void vfp_load_reg32(TCGv_i32 var, int reg)
+-    slot = s->read_pos + s->read_count;
-diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
+-    if (slot >= 16)
-index XXXXXXX..XXXXXXX 100644
+-        slot -= 16;
---- a/target/arm/translate-vfp.c.inc
++    pipe_depth = pl011_get_fifo_depth(s);
-+++ b/target/arm/translate-vfp.c.inc
++    slot = (s->read_pos + s->read_count) & (pipe_depth - 1);
-@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
+     s->read_fifo[slot] = value;
-         tcg_gen_ext_i32_i64(nf, cpu_NF);
+     s->read_count++;
-         tcg_gen_ext_i32_i64(vf, cpu_VF);
+     s->flags &= ~PL011_FLAG_RXFE;
+     trace_pl011_put_fifo(value, s->read_count);
--        neon_load_reg64(frn, rn);
+-    if (!(s->lcr & 0x10) || s->read_count == 16) {
--        neon_load_reg64(frm, rm);
++    if (s->read_count == pipe_depth) {
-+        vfp_load_reg64(frn, rn);
+         trace_pl011_put_fifo_full();
-+        vfp_load_reg64(frm, rm);
+         s->flags |= PL011_FLAG_RXFF;
          switch (a->cc) {
          case 0: /* eq: Z */
              tcg_gen_movcond_i64(TCG_COND_EQ, dest, zf, zero,
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
              tcg_temp_free_i64(tmp);
              break;
          }
 -        neon_store_reg64(dest, rd);
 +        vfp_store_reg64(dest, rd);
          tcg_temp_free_i64(frn);
          tcg_temp_free_i64(frm);
          tcg_temp_free_i64(dest);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
          TCGv_i64 tcg_res;
          tcg_op = tcg_temp_new_i64();
          tcg_res = tcg_temp_new_i64();
 -        neon_load_reg64(tcg_op, rm);
 +        vfp_load_reg64(tcg_op, rm);
          gen_helper_rintd(tcg_res, tcg_op, fpst);
 -        neon_store_reg64(tcg_res, rd);
 +        vfp_store_reg64(tcg_res, rd);
          tcg_temp_free_i64(tcg_op);
          tcg_temp_free_i64(tcg_res);
      } else {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
          tcg_double = tcg_temp_new_i64();
          tcg_res = tcg_temp_new_i64();
          tcg_tmp = tcg_temp_new_i32();
 -        neon_load_reg64(tcg_double, rm);
 +        vfp_load_reg64(tcg_double, rm);
          if (is_signed) {
              gen_helper_vfp_tosld(tcg_res, tcg_double, tcg_shift, fpst);
          } else {
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_dp *a)
      tmp = tcg_temp_new_i64();
      if (a->l) {
          gen_aa32_ld64(s, tmp, addr, get_mem_index(s));
 -        neon_store_reg64(tmp, a->vd);
 +        vfp_store_reg64(tmp, a->vd);
      } else {
 -        neon_load_reg64(tmp, a->vd);
 +        vfp_load_reg64(tmp, a->vd);
          gen_aa32_st64(s, tmp, addr, get_mem_index(s));
      }
-     tcg_temp_free_i64(tmp);
+@@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_pl011 = {
-@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
+         VMSTATE_UINT32(dmacr, PL011State),
-         if (a->l) {
+         VMSTATE_UINT32(int_enabled, PL011State),
-             /* load */
+         VMSTATE_UINT32(int_level, PL011State),
-             gen_aa32_ld64(s, tmp, addr, get_mem_index(s));
+-        VMSTATE_UINT32_ARRAY(read_fifo, PL011State, 16),
--            neon_store_reg64(tmp, a->vd + i);
++        VMSTATE_UINT32_ARRAY(read_fifo, PL011State, PL011_FIFO_DEPTH),
-+            vfp_store_reg64(tmp, a->vd + i);
+         VMSTATE_UINT32(ilpr, PL011State),
-         } else {
+         VMSTATE_UINT32(ibrd, PL011State),
-             /* store */
+         VMSTATE_UINT32(fbrd, PL011State),
 -            neon_load_reg64(tmp, a->vd + i);
 +            vfp_load_reg64(tmp, a->vd + i);
              gen_aa32_st64(s, tmp, addr, get_mem_index(s));
          }
          tcg_gen_addi_i32(addr, addr, offset);
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
      fd = tcg_temp_new_i64();
      fpst = fpstatus_ptr(FPST_FPCR);
 -    neon_load_reg64(f0, vn);
 -    neon_load_reg64(f1, vm);
 +    vfp_load_reg64(f0, vn);
 +    vfp_load_reg64(f1, vm);
      for (;;) {
          if (reads_vd) {
 -            neon_load_reg64(fd, vd);
 +            vfp_load_reg64(fd, vd);
          }
          fn(fd, f0, f1, fpst);
 -        neon_store_reg64(fd, vd);
 +        vfp_store_reg64(fd, vd);
          if (veclen == 0) {
              break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
          veclen--;
          vd = vfp_advance_dreg(vd, delta_d);
          vn = vfp_advance_dreg(vn, delta_d);
 -        neon_load_reg64(f0, vn);
 +        vfp_load_reg64(f0, vn);
          if (delta_m) {
              vm = vfp_advance_dreg(vm, delta_m);
 -            neon_load_reg64(f1, vm);
 +            vfp_load_reg64(f1, vm);
          }
      }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
      f0 = tcg_temp_new_i64();
      fd = tcg_temp_new_i64();
 -    neon_load_reg64(f0, vm);
 +    vfp_load_reg64(f0, vm);
      for (;;) {
          fn(fd, f0);
 -        neon_store_reg64(fd, vd);
 +        vfp_store_reg64(fd, vd);
          if (veclen == 0) {
              break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
              /* single source one-many */
              while (veclen--) {
                  vd = vfp_advance_dreg(vd, delta_d);
 -                neon_store_reg64(fd, vd);
 +                vfp_store_reg64(fd, vd);
              }
              break;
          }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
          veclen--;
          vd = vfp_advance_dreg(vd, delta_d);
          vd = vfp_advance_dreg(vm, delta_m);
 -        neon_load_reg64(f0, vm);
 +        vfp_load_reg64(f0, vm);
      }
      tcg_temp_free_i64(f0);
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_dp(DisasContext *s, arg_VFMA_dp *a, bool neg_n, bool neg_d)
      vm = tcg_temp_new_i64();
      vd = tcg_temp_new_i64();
 -    neon_load_reg64(vn, a->vn);
 -    neon_load_reg64(vm, a->vm);
 +    vfp_load_reg64(vn, a->vn);
 +    vfp_load_reg64(vm, a->vm);
      if (neg_n) {
          /* VFNMS, VFMS */
          gen_helper_vfp_negd(vn, vn);
      }
 -    neon_load_reg64(vd, a->vd);
 +    vfp_load_reg64(vd, a->vd);
      if (neg_d) {
          /* VFNMA, VFNMS */
          gen_helper_vfp_negd(vd, vd);
      }
      fpst = fpstatus_ptr(FPST_FPCR);
      gen_helper_vfp_muladdd(vd, vn, vm, vd, fpst);
 -    neon_store_reg64(vd, a->vd);
 +    vfp_store_reg64(vd, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i64(vn);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
      fd = tcg_const_i64(vfp_expand_imm(MO_64, a->imm));
      for (;;) {
 -        neon_store_reg64(fd, vd);
 +        vfp_store_reg64(fd, vd);
          if (veclen == 0) {
              break;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_dp(DisasContext *s, arg_VCMP_dp *a)
      vd = tcg_temp_new_i64();
      vm = tcg_temp_new_i64();
 -    neon_load_reg64(vd, a->vd);
 +    vfp_load_reg64(vd, a->vd);
      if (a->z) {
          tcg_gen_movi_i64(vm, 0);
      } else {
 -        neon_load_reg64(vm, a->vm);
 +        vfp_load_reg64(vm, a->vm);
      }
      if (a->e) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f64_f16(DisasContext *s, arg_VCVT_f64_f16 *a)
      tcg_gen_ld16u_i32(tmp, cpu_env, vfp_f16_offset(a->vm, a->t));
      vd = tcg_temp_new_i64();
      gen_helper_vfp_fcvt_f16_to_f64(vd, tmp, fpst, ahp_mode);
 -    neon_store_reg64(vd, a->vd);
 +    vfp_store_reg64(vd, a->vd);
      tcg_temp_free_i32(ahp_mode);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f16_f64(DisasContext *s, arg_VCVT_f16_f64 *a)
      tmp = tcg_temp_new_i32();
      vm = tcg_temp_new_i64();
 -    neon_load_reg64(vm, a->vm);
 +    vfp_load_reg64(vm, a->vm);
      gen_helper_vfp_fcvt_f64_to_f16(tmp, vm, fpst, ahp_mode);
      tcg_temp_free_i64(vm);
      tcg_gen_st16_i32(tmp, cpu_env, vfp_f16_offset(a->vd, a->t));
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_dp(DisasContext *s, arg_VRINTR_dp *a)
      }
      tmp = tcg_temp_new_i64();
 -    neon_load_reg64(tmp, a->vm);
 +    vfp_load_reg64(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      gen_helper_rintd(tmp, tmp, fpst);
 -    neon_store_reg64(tmp, a->vd);
 +    vfp_store_reg64(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i64(tmp);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_dp(DisasContext *s, arg_VRINTZ_dp *a)
      }
      tmp = tcg_temp_new_i64();
 -    neon_load_reg64(tmp, a->vm);
 +    vfp_load_reg64(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      tcg_rmode = tcg_const_i32(float_round_to_zero);
      gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
      gen_helper_rintd(tmp, tmp, fpst);
      gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
 -    neon_store_reg64(tmp, a->vd);
 +    vfp_store_reg64(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i64(tmp);
      tcg_temp_free_i32(tcg_rmode);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_dp(DisasContext *s, arg_VRINTX_dp *a)
      }
      tmp = tcg_temp_new_i64();
 -    neon_load_reg64(tmp, a->vm);
 +    vfp_load_reg64(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      gen_helper_rintd_exact(tmp, tmp, fpst);
 -    neon_store_reg64(tmp, a->vd);
 +    vfp_store_reg64(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i64(tmp);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp(DisasContext *s, arg_VCVT_sp *a)
      vd = tcg_temp_new_i64();
      vfp_load_reg32(vm, a->vm);
      gen_helper_vfp_fcvtds(vd, vm, cpu_env);
 -    neon_store_reg64(vd, a->vd);
 +    vfp_store_reg64(vd, a->vd);
      tcg_temp_free_i32(vm);
      tcg_temp_free_i64(vd);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp(DisasContext *s, arg_VCVT_dp *a)
      vd = tcg_temp_new_i32();
      vm = tcg_temp_new_i64();
 -    neon_load_reg64(vm, a->vm);
 +    vfp_load_reg64(vm, a->vm);
      gen_helper_vfp_fcvtsd(vd, vm, cpu_env);
      vfp_store_reg32(vd, a->vd);
      tcg_temp_free_i32(vd);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_dp(DisasContext *s, arg_VCVT_int_dp *a)
          /* u32 -> f64 */
          gen_helper_vfp_uitod(vd, vm, fpst);
      }
 -    neon_store_reg64(vd, a->vd);
 +    vfp_store_reg64(vd, a->vd);
      tcg_temp_free_i32(vm);
      tcg_temp_free_i64(vd);
      tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
      vm = tcg_temp_new_i64();
      vd = tcg_temp_new_i32();
 -    neon_load_reg64(vm, a->vm);
 +    vfp_load_reg64(vm, a->vm);
      gen_helper_vjcvt(vd, vm, cpu_env);
      vfp_store_reg32(vd, a->vd);
      tcg_temp_free_i64(vm);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
      frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
      vd = tcg_temp_new_i64();
 -    neon_load_reg64(vd, a->vd);
 +    vfp_load_reg64(vd, a->vd);
      fpst = fpstatus_ptr(FPST_FPCR);
      shift = tcg_const_i32(frac_bits);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
          g_assert_not_reached();
      }
 -    neon_store_reg64(vd, a->vd);
 +    vfp_store_reg64(vd, a->vd);
      tcg_temp_free_i64(vd);
      tcg_temp_free_i32(shift);
      tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp_int(DisasContext *s, arg_VCVT_dp_int *a)
      fpst = fpstatus_ptr(FPST_FPCR);
      vm = tcg_temp_new_i64();
      vd = tcg_temp_new_i32();
 -    neon_load_reg64(vm, a->vm);
 +    vfp_load_reg64(vm, a->vm);
      if (a->s) {
          if (a->rz) {
 --
-.20.1
+.34.1

-[PULL 20/26] hw/display/exynos4210_fimd: Fix potential NULL pointer dereference
+[PULL 04/33] hw/char/pl011: add post_load hook for backwards-compatibility
-From: AlexChen <alex.chen@huawei.com>
+From: Evgeny Iakovlev <eiakovlev@linux.microsoft.com>
-In exynos4210_fimd_update(), the pointer s is dereferenced before
+Previous change slightly modified the way we handle data writes when
-being checked if it is valid, which may lead to NULL pointer dereference.
+FIFO is disabled. Previously we kept incrementing read_pos and were
-So move the assignment to global_width after checking that the s is valid.
+storing data at that position, although we only have a
 single-register-deep FIFO now. Then we changed it to always store data
 at pos 0.
-Reported-by: Euler Robot <euler.robot@huawei.com>
+If guest disables FIFO and then proceeds to read data, it will work out
-Signed-off-by: Alex Chen <alex.chen@huawei.com>
+fine, because we still read from current read_pos before setting it to
-Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
+0.
-Message-id: 5F9F8D88.9030102@huawei.com
 However, to make code less fragile, introduce a post_load hook for
 PL011State and move fixup read FIFO state when FIFO is disabled. Since
 we are introducing a post_load hook, also do some sanity checking on
 untrusted incoming input state.
 Signed-off-by: Evgeny Iakovlev <eiakovlev@linux.microsoft.com>
 Message-id: 20230123162304.26254-3-eiakovlev@linux.microsoft.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
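Note: the fixup described in the commit message above can be modelled outside QEMU. The following is a minimal standalone sketch, not the device code itself. `ModelUartState`, `MODEL_FIFO_DEPTH`, `model_post_load()` and `model_peek()` are hypothetical names introduced for illustration, and the 16-entry depth is an assumption. The 0x10 FIFO-enable bit follows the UARTLCR_H check used elsewhere in this series.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a 16-entry read FIFO, standing in for ARRAY_SIZE(s->read_fifo). */
#define MODEL_FIFO_DEPTH 16

/* Hypothetical stand-in for the migrated PL011 fields used by the hook. */
typedef struct {
    uint32_t lcr;
    uint32_t read_fifo[MODEL_FIFO_DEPTH];
    int read_pos;
    int read_count;
} ModelUartState;

/* FIFO mode is selected by bit 4 (0x10) of the line control register. */
static int model_fifo_enabled(const ModelUartState *s)
{
    return (s->lcr & 0x10) != 0;
}

/*
 * Validate incoming state, then move a single pending character to
 * element 0 when the FIFO is disabled. Returns 0 on success and -1 if
 * the incoming indices are out of range.
 */
static int model_post_load(ModelUartState *s)
{
    if (s->read_pos < 0 || s->read_pos >= MODEL_FIFO_DEPTH ||
        s->read_count < 0 || s->read_count > MODEL_FIFO_DEPTH) {
        return -1;
    }

    if (!model_fifo_enabled(s) && s->read_count > 0 && s->read_pos > 0) {
        s->read_fifo[0] = s->read_fifo[s->read_pos];
        s->read_pos = 0;
    }

    return 0;
}

/* Return the character at the head of the FIFO without consuming it. */
static uint32_t model_peek(const ModelUartState *s)
{
    assert(s->read_pos >= 0 && s->read_pos < MODEL_FIFO_DEPTH);
    return s->read_fifo[s->read_pos];
}
```

In this model, a state saved with the FIFO disabled, read_count == 1 and read_pos == 3 loads with read_pos == 0 and the same pending character at the head, while a state with read_pos == MODEL_FIFO_DEPTH is rejected.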
- hw/display/exynos4210_fimd.c | 4 +++-
+ hw/char/pl011.c | 25 +++++++++++++++++++++++++
-file changed, 3 insertions(+), 1 deletion(-)
+file changed, 25 insertions(+)
-diff --git a/hw/display/exynos4210_fimd.c b/hw/display/exynos4210_fimd.c
+diff --git a/hw/char/pl011.c b/hw/char/pl011.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/display/exynos4210_fimd.c
+--- a/hw/char/pl011.c
-+++ b/hw/display/exynos4210_fimd.c
++++ b/hw/char/pl011.c
-@@ -XXX,XX +XXX,XX @@ static void exynos4210_fimd_update(void *opaque)
+@@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_pl011_clock = {
      bool blend = false;
      uint8_t *host_fb_addr;
      bool is_dirty = false;
 -    const int global_width = (s->vidtcon[2] & FIMD_VIDTCON2_SIZE_MASK) + 1;
 +    int global_width;
      if (!s || !s->console || !s->enabled ||
          surface_bits_per_pixel(qemu_console_surface(s->console)) == 0) {
          return;
      }
+ };
++static int pl011_post_load(void *opaque, int version_id)
++{
++    PL011State* s = opaque;
 +
-+    global_width = (s->vidtcon[2] & FIMD_VIDTCON2_SIZE_MASK) + 1;
++    /* Sanity-check input state */
-     exynos4210_update_resolution(s);
++    if (s->read_pos >= ARRAY_SIZE(s->read_fifo) ||
-     surface = qemu_console_surface(s->console);
++        s->read_count > ARRAY_SIZE(s->read_fifo)) {
++        return -1;
 +    }
 +
 +    if (!pl011_is_fifo_enabled(s) && s->read_count > 0 && s->read_pos > 0) {
 +        /*
 +         * Older versions of PL011 didn't ensure that the single
 +         * character in the FIFO in FIFO-disabled mode is in
 +         * element 0 of the array; convert to follow the current
 +         * code's assumptions.
 +         */
 +        s->read_fifo[0] = s->read_fifo[s->read_pos];
 +        s->read_pos = 0;
 +    }
 +
 +    return 0;
 +}
 +
  static const VMStateDescription vmstate_pl011 = {
      .name = "pl011",
      .version_id = 2,
      .minimum_version_id = 2,
 +    .post_load = pl011_post_load,
      .fields = (VMStateField[]) {
          VMSTATE_UINT32(readbuff, PL011State),
          VMSTATE_UINT32(flags, PL011State),
 --
-.20.1
+.34.1

-[PULL 05/26] target/arm: Add read/write_neon_element32
+[PULL 05/33] hw/char/pl011: implement a reset method
-From: Richard Henderson <richard.henderson@linaro.org>
+From: Evgeny Iakovlev <eiakovlev@linux.microsoft.com>
-Model these off the aa64 read/write_vec_element functions.
+PL011 currently lacks a reset method. Implement it.
 Use it within translate-neon.c.inc.  The new functions do
 not allocate or free temps, so this rearranges the calling
 code a bit.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Signed-off-by: Evgeny Iakovlev <eiakovlev@linux.microsoft.com>
 Message-id: 20201030022618.785675-6-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
+Message-id: 20230123162304.26254-4-eiakovlev@linux.microsoft.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate.c          |  26 ++++
+ hw/char/pl011.c | 26 +++++++++++++++++++++-----
- target/arm/translate-neon.c.inc | 256 ++++++++++++++++++++------------
+file changed, 21 insertions(+), 5 deletions(-)
 files changed, 183 insertions(+), 99 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/hw/char/pl011.c b/hw/char/pl011.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/hw/char/pl011.c
-+++ b/target/arm/translate.c
++++ b/hw/char/pl011.c
-@@ -XXX,XX +XXX,XX @@ static inline void neon_store_reg32(TCGv_i32 var, int reg)
+@@ -XXX,XX +XXX,XX @@ static void pl011_init(Object *obj)
-     tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
+     s->clk = qdev_init_clock_in(DEVICE(obj), "clk", pl011_clock_update, s,
                                  ClockUpdate);
 -    s->read_trigger = 1;
 -    s->ifl = 0x12;
 -    s->cr = 0x300;
 -    s->flags = 0x90;
 -
      s->id = pl011_id_arm;
  }
-+static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp size)
+@@ -XXX,XX +XXX,XX @@ static void pl011_realize(DeviceState *dev, Error **errp)
                               pl011_event, NULL, s, NULL, true);
  }
 +static void pl011_reset(DeviceState *dev)
 +{
-+    long off = neon_element_offset(reg, ele, size);
++    PL011State *s = PL011(dev);
 +
-+    switch (size) {
++    s->lcr = 0;
-+    case MO_32:
++    s->rsr = 0;
-+        tcg_gen_ld_i32(dest, cpu_env, off);
++    s->dmacr = 0;
-+        break;
++    s->int_enabled = 0;
-+    default:
++    s->int_level = 0;
-+        g_assert_not_reached();
++    s->ilpr = 0;
-+    }
++    s->ibrd = 0;
 +    s->fbrd = 0;
 +    s->read_pos = 0;
 +    s->read_count = 0;
 +    s->read_trigger = 1;
 +    s->ifl = 0x12;
 +    s->cr = 0x300;
 +    s->flags = 0x90;
 +}
 +
-+static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp size)
+ static void pl011_class_init(ObjectClass *oc, void *data)
 +{
 +    long off = neon_element_offset(reg, ele, size);
 +
 +    switch (size) {
 +    case MO_32:
 +        tcg_gen_st_i32(src, cpu_env, off);
 +        break;
 +    default:
 +        g_assert_not_reached();
 +    }
 +}
 +
  static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
  {
-     TCGv_ptr ret = tcg_temp_new_ptr();
+     DeviceClass *dc = DEVICE_CLASS(oc);
-diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
-index XXXXXXX..XXXXXXX 100644
+     dc->realize = pl011_realize;
---- a/target/arm/translate-neon.c.inc
++    dc->reset = pl011_reset;
-+++ b/target/arm/translate-neon.c.inc
+     dc->vmsd = &vmstate_pl011;
-@@ -XXX,XX +XXX,XX @@ static bool do_3same_pair(DisasContext *s, arg_3same *a, NeonGenTwoOpFn *fn)
+     device_class_set_props(dc, pl011_properties);
       * early. Since Q is 0 there are always just two passes, so instead
       * of a complicated loop over each pass we just unroll.
       */
 -    tmp = neon_load_reg(a->vn, 0);
 -    tmp2 = neon_load_reg(a->vn, 1);
 +    tmp = tcg_temp_new_i32();
 +    tmp2 = tcg_temp_new_i32();
 +    tmp3 = tcg_temp_new_i32();
 +
 +    read_neon_element32(tmp, a->vn, 0, MO_32);
 +    read_neon_element32(tmp2, a->vn, 1, MO_32);
      fn(tmp, tmp, tmp2);
 -    tcg_temp_free_i32(tmp2);
 -    tmp3 = neon_load_reg(a->vm, 0);
 -    tmp2 = neon_load_reg(a->vm, 1);
 +    read_neon_element32(tmp3, a->vm, 0, MO_32);
 +    read_neon_element32(tmp2, a->vm, 1, MO_32);
      fn(tmp3, tmp3, tmp2);
 -    tcg_temp_free_i32(tmp2);
 -    neon_store_reg(a->vd, 0, tmp);
 -    neon_store_reg(a->vd, 1, tmp3);
 +    write_neon_element32(tmp, a->vd, 0, MO_32);
 +    write_neon_element32(tmp3, a->vd, 1, MO_32);
 +
 +    tcg_temp_free_i32(tmp);
 +    tcg_temp_free_i32(tmp2);
 +    tcg_temp_free_i32(tmp3);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_32(DisasContext *s, arg_2reg_shift *a,
       * 2-reg-and-shift operations, size < 3 case, where the
       * helper needs to be passed cpu_env.
       */
 -    TCGv_i32 constimm;
 +    TCGv_i32 constimm, tmp;
      int pass;
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_32(DisasContext *s, arg_2reg_shift *a,
       * by immediate using the variable shift operations.
       */
      constimm = tcg_const_i32(dup_const(a->size, a->shift));
 +    tmp = tcg_temp_new_i32();
      for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 -        TCGv_i32 tmp = neon_load_reg(a->vm, pass);
 +        read_neon_element32(tmp, a->vm, pass, MO_32);
          fn(tmp, cpu_env, tmp, constimm);
 -        neon_store_reg(a->vd, pass, tmp);
 +        write_neon_element32(tmp, a->vd, pass, MO_32);
      }
 +    tcg_temp_free_i32(tmp);
      tcg_temp_free_i32(constimm);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_64(DisasContext *s, arg_2reg_shift *a,
      constimm = tcg_const_i64(-a->shift);
      rm1 = tcg_temp_new_i64();
      rm2 = tcg_temp_new_i64();
 +    rd = tcg_temp_new_i32();
      /* Load both inputs first to avoid potential overwrite if rm == rd */
      neon_load_reg64(rm1, a->vm);
      neon_load_reg64(rm2, a->vm + 1);
      shiftfn(rm1, rm1, constimm);
 -    rd = tcg_temp_new_i32();
      narrowfn(rd, cpu_env, rm1);
 -    neon_store_reg(a->vd, 0, rd);
 +    write_neon_element32(rd, a->vd, 0, MO_32);
      shiftfn(rm2, rm2, constimm);
 -    rd = tcg_temp_new_i32();
      narrowfn(rd, cpu_env, rm2);
 -    neon_store_reg(a->vd, 1, rd);
 +    write_neon_element32(rd, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(rd);
      tcg_temp_free_i64(rm1);
      tcg_temp_free_i64(rm2);
      tcg_temp_free_i64(constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_32(DisasContext *s, arg_2reg_shift *a,
      constimm = tcg_const_i32(imm);
      /* Load all inputs first to avoid potential overwrite */
 -    rm1 = neon_load_reg(a->vm, 0);
 -    rm2 = neon_load_reg(a->vm, 1);
 -    rm3 = neon_load_reg(a->vm + 1, 0);
 -    rm4 = neon_load_reg(a->vm + 1, 1);
 +    rm1 = tcg_temp_new_i32();
 +    rm2 = tcg_temp_new_i32();
 +    rm3 = tcg_temp_new_i32();
 +    rm4 = tcg_temp_new_i32();
 +    read_neon_element32(rm1, a->vm, 0, MO_32);
 +    read_neon_element32(rm2, a->vm, 1, MO_32);
 +    read_neon_element32(rm3, a->vm, 2, MO_32);
 +    read_neon_element32(rm4, a->vm, 3, MO_32);
      rtmp = tcg_temp_new_i64();
      shiftfn(rm1, rm1, constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_32(DisasContext *s, arg_2reg_shift *a,
      tcg_temp_free_i32(rm2);
      narrowfn(rm1, cpu_env, rtmp);
 -    neon_store_reg(a->vd, 0, rm1);
 +    write_neon_element32(rm1, a->vd, 0, MO_32);
 +    tcg_temp_free_i32(rm1);
      shiftfn(rm3, rm3, constimm);
      shiftfn(rm4, rm4, constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_32(DisasContext *s, arg_2reg_shift *a,
      narrowfn(rm3, cpu_env, rtmp);
      tcg_temp_free_i64(rtmp);
 -    neon_store_reg(a->vd, 1, rm3);
 +    write_neon_element32(rm3, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(rm3);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
          widen_mask = dup_const(a->size + 1, widen_mask);
      }
 -    rm0 = neon_load_reg(a->vm, 0);
 -    rm1 = neon_load_reg(a->vm, 1);
 +    rm0 = tcg_temp_new_i32();
 +    rm1 = tcg_temp_new_i32();
 +    read_neon_element32(rm0, a->vm, 0, MO_32);
 +    read_neon_element32(rm1, a->vm, 1, MO_32);
      tmp = tcg_temp_new_i64();
      widenfn(tmp, rm0);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
      if (src1_wide) {
          neon_load_reg64(rn0_64, a->vn);
      } else {
 -        TCGv_i32 tmp = neon_load_reg(a->vn, 0);
 +        TCGv_i32 tmp = tcg_temp_new_i32();
 +        read_neon_element32(tmp, a->vn, 0, MO_32);
          widenfn(rn0_64, tmp);
          tcg_temp_free_i32(tmp);
      }
 -    rm = neon_load_reg(a->vm, 0);
 +    rm = tcg_temp_new_i32();
 +    read_neon_element32(rm, a->vm, 0, MO_32);
      widenfn(rm_64, rm);
      tcg_temp_free_i32(rm);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
      if (src1_wide) {
          neon_load_reg64(rn1_64, a->vn + 1);
      } else {
 -        TCGv_i32 tmp = neon_load_reg(a->vn, 1);
 +        TCGv_i32 tmp = tcg_temp_new_i32();
 +        read_neon_element32(tmp, a->vn, 1, MO_32);
          widenfn(rn1_64, tmp);
          tcg_temp_free_i32(tmp);
      }
 -    rm = neon_load_reg(a->vm, 1);
 +    rm = tcg_temp_new_i32();
 +    read_neon_element32(rm, a->vm, 1, MO_32);
      neon_store_reg64(rn0_64, a->vd);
@@ -XXX,XX +XXX,XX @@ static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
      narrowfn(rd1, rn_64);
 -    neon_store_reg(a->vd, 0, rd0);
 -    neon_store_reg(a->vd, 1, rd1);
 +    write_neon_element32(rd0, a->vd, 0, MO_32);
 +    write_neon_element32(rd1, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(rd0);
 +    tcg_temp_free_i32(rd1);
      tcg_temp_free_i64(rn_64);
      tcg_temp_free_i64(rm_64);
@@ -XXX,XX +XXX,XX @@ static bool do_long_3d(DisasContext *s, arg_3diff *a,
      rd0 = tcg_temp_new_i64();
      rd1 = tcg_temp_new_i64();
 -    rn = neon_load_reg(a->vn, 0);
 -    rm = neon_load_reg(a->vm, 0);
 +    rn = tcg_temp_new_i32();
 +    rm = tcg_temp_new_i32();
 +    read_neon_element32(rn, a->vn, 0, MO_32);
 +    read_neon_element32(rm, a->vm, 0, MO_32);
      opfn(rd0, rn, rm);
 -    tcg_temp_free_i32(rn);
 -    tcg_temp_free_i32(rm);
 -    rn = neon_load_reg(a->vn, 1);
 -    rm = neon_load_reg(a->vm, 1);
 +    read_neon_element32(rn, a->vn, 1, MO_32);
 +    read_neon_element32(rm, a->vm, 1, MO_32);
      opfn(rd1, rn, rm);
      tcg_temp_free_i32(rn);
      tcg_temp_free_i32(rm);
@@ -XXX,XX +XXX,XX @@ static void gen_neon_dup_high16(TCGv_i32 var)
  static inline TCGv_i32 neon_get_scalar(int size, int reg)
  {
 -    TCGv_i32 tmp;
 -    if (size == 1) {
 -        tmp = neon_load_reg(reg & 7, reg >> 4);
 +    TCGv_i32 tmp = tcg_temp_new_i32();
 +    if (size == MO_16) {
 +        read_neon_element32(tmp, reg & 7, reg >> 4, MO_32);
          if (reg & 8) {
              gen_neon_dup_high16(tmp);
          } else {
              gen_neon_dup_low16(tmp);
          }
      } else {
 -        tmp = neon_load_reg(reg & 15, reg >> 4);
 +        read_neon_element32(tmp, reg & 15, reg >> 4, MO_32);
      }
      return tmp;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar(DisasContext *s, arg_2scalar *a,
       * perform an accumulation operation of that result into the
       * destination.
       */
 -    TCGv_i32 scalar;
 +    TCGv_i32 scalar, tmp;
      int pass;
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar(DisasContext *s, arg_2scalar *a,
      }
      scalar = neon_get_scalar(a->size, a->vm);
 +    tmp = tcg_temp_new_i32();
      for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 -        TCGv_i32 tmp = neon_load_reg(a->vn, pass);
 +        read_neon_element32(tmp, a->vn, pass, MO_32);
          opfn(tmp, tmp, scalar);
          if (accfn) {
 -            TCGv_i32 rd = neon_load_reg(a->vd, pass);
 +            TCGv_i32 rd = tcg_temp_new_i32();
 +            read_neon_element32(rd, a->vd, pass, MO_32);
              accfn(tmp, rd, tmp);
              tcg_temp_free_i32(rd);
          }
 -        neon_store_reg(a->vd, pass, tmp);
 +        write_neon_element32(tmp, a->vd, pass, MO_32);
      }
 +    tcg_temp_free_i32(tmp);
      tcg_temp_free_i32(scalar);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_vqrdmlah_2sc(DisasContext *s, arg_2scalar *a,
       * performs a kind of fused op-then-accumulate using a helper
       * function that takes all of rd, rn and the scalar at once.
       */
 -    TCGv_i32 scalar;
 +    TCGv_i32 scalar, rn, rd;
      int pass;
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_vqrdmlah_2sc(DisasContext *s, arg_2scalar *a,
      }
      scalar = neon_get_scalar(a->size, a->vm);
 +    rn = tcg_temp_new_i32();
 +    rd = tcg_temp_new_i32();
      for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 -        TCGv_i32 rn = neon_load_reg(a->vn, pass);
 -        TCGv_i32 rd = neon_load_reg(a->vd, pass);
 +        read_neon_element32(rn, a->vn, pass, MO_32);
 +        read_neon_element32(rd, a->vd, pass, MO_32);
          opfn(rd, cpu_env, rn, scalar, rd);
 -        tcg_temp_free_i32(rn);
 -        neon_store_reg(a->vd, pass, rd);
 +        write_neon_element32(rd, a->vd, pass, MO_32);
      }
 +    tcg_temp_free_i32(rn);
 +    tcg_temp_free_i32(rd);
      tcg_temp_free_i32(scalar);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
      scalar = neon_get_scalar(a->size, a->vm);
      /* Load all inputs before writing any outputs, in case of overlap */
 -    rn = neon_load_reg(a->vn, 0);
 +    rn = tcg_temp_new_i32();
 +    read_neon_element32(rn, a->vn, 0, MO_32);
      rn0_64 = tcg_temp_new_i64();
      opfn(rn0_64, rn, scalar);
 -    tcg_temp_free_i32(rn);
 -    rn = neon_load_reg(a->vn, 1);
 +    read_neon_element32(rn, a->vn, 1, MO_32);
      rn1_64 = tcg_temp_new_i64();
      opfn(rn1_64, rn, scalar);
      tcg_temp_free_i32(rn);
@@ -XXX,XX +XXX,XX @@ static bool trans_VTBL(DisasContext *s, arg_VTBL *a)
          return false;
      }
      n <<= 3;
 +    tmp = tcg_temp_new_i32();
      if (a->op) {
 -        tmp = neon_load_reg(a->vd, 0);
 +        read_neon_element32(tmp, a->vd, 0, MO_32);
      } else {
 -        tmp = tcg_temp_new_i32();
          tcg_gen_movi_i32(tmp, 0);
      }
 -    tmp2 = neon_load_reg(a->vm, 0);
 +    tmp2 = tcg_temp_new_i32();
 +    read_neon_element32(tmp2, a->vm, 0, MO_32);
      ptr1 = vfp_reg_ptr(true, a->vn);
      tmp4 = tcg_const_i32(n);
      gen_helper_neon_tbl(tmp2, tmp2, tmp, ptr1, tmp4);
 -    tcg_temp_free_i32(tmp);
 +
      if (a->op) {
 -        tmp = neon_load_reg(a->vd, 1);
 +        read_neon_element32(tmp, a->vd, 1, MO_32);
      } else {
 -        tmp = tcg_temp_new_i32();
          tcg_gen_movi_i32(tmp, 0);
      }
 -    tmp3 = neon_load_reg(a->vm, 1);
 +    tmp3 = tcg_temp_new_i32();
 +    read_neon_element32(tmp3, a->vm, 1, MO_32);
      gen_helper_neon_tbl(tmp3, tmp3, tmp, ptr1, tmp4);
 +    tcg_temp_free_i32(tmp);
      tcg_temp_free_i32(tmp4);
      tcg_temp_free_ptr(ptr1);
 -    neon_store_reg(a->vd, 0, tmp2);
 -    neon_store_reg(a->vd, 1, tmp3);
 -    tcg_temp_free_i32(tmp);
 +
 +    write_neon_element32(tmp2, a->vd, 0, MO_32);
 +    write_neon_element32(tmp3, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(tmp2);
 +    tcg_temp_free_i32(tmp3);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP_scalar(DisasContext *s, arg_VDUP_scalar *a)
  static bool trans_VREV64(DisasContext *s, arg_VREV64 *a)
  {
      int pass, half;
 +    TCGv_i32 tmp[2];
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
          return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VREV64(DisasContext *s, arg_VREV64 *a)
          return true;
      }
 -    for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
 -        TCGv_i32 tmp[2];
 +    tmp[0] = tcg_temp_new_i32();
 +    tmp[1] = tcg_temp_new_i32();
 +    for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
          for (half = 0; half < 2; half++) {
 -            tmp[half] = neon_load_reg(a->vm, pass * 2 + half);
 +            read_neon_element32(tmp[half], a->vm, pass * 2 + half, MO_32);
              switch (a->size) {
              case 0:
                  tcg_gen_bswap32_i32(tmp[half], tmp[half]);
@@ -XXX,XX +XXX,XX @@ static bool trans_VREV64(DisasContext *s, arg_VREV64 *a)
                  g_assert_not_reached();
              }
          }
 -        neon_store_reg(a->vd, pass * 2, tmp[1]);
 -        neon_store_reg(a->vd, pass * 2 + 1, tmp[0]);
 +        write_neon_element32(tmp[1], a->vd, pass * 2, MO_32);
 +        write_neon_element32(tmp[0], a->vd, pass * 2 + 1, MO_32);
      }
 +
 +    tcg_temp_free_i32(tmp[0]);
 +    tcg_temp_free_i32(tmp[1]);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_2misc_pairwise(DisasContext *s, arg_2misc *a,
          rm0_64 = tcg_temp_new_i64();
          rm1_64 = tcg_temp_new_i64();
          rd_64 = tcg_temp_new_i64();
 -        tmp = neon_load_reg(a->vm, pass * 2);
 +
 +        tmp = tcg_temp_new_i32();
 +        read_neon_element32(tmp, a->vm, pass * 2, MO_32);
          widenfn(rm0_64, tmp);
 -        tcg_temp_free_i32(tmp);
 -        tmp = neon_load_reg(a->vm, pass * 2 + 1);
 +        read_neon_element32(tmp, a->vm, pass * 2 + 1, MO_32);
          widenfn(rm1_64, tmp);
          tcg_temp_free_i32(tmp);
 +
          opfn(rd_64, rm0_64, rm1_64);
          tcg_temp_free_i64(rm0_64);
          tcg_temp_free_i64(rm1_64);
@@ -XXX,XX +XXX,XX @@ static bool do_vmovn(DisasContext *s, arg_2misc *a,
      narrowfn(rd0, cpu_env, rm);
      neon_load_reg64(rm, a->vm + 1);
      narrowfn(rd1, cpu_env, rm);
 -    neon_store_reg(a->vd, 0, rd0);
 -    neon_store_reg(a->vd, 1, rd1);
 +    write_neon_element32(rd0, a->vd, 0, MO_32);
 +    write_neon_element32(rd1, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(rd0);
 +    tcg_temp_free_i32(rd1);
      tcg_temp_free_i64(rm);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL(DisasContext *s, arg_2misc *a)
      }
      rd = tcg_temp_new_i64();
 +    rm0 = tcg_temp_new_i32();
 +    rm1 = tcg_temp_new_i32();
 -    rm0 = neon_load_reg(a->vm, 0);
 -    rm1 = neon_load_reg(a->vm, 1);
 +    read_neon_element32(rm0, a->vm, 0, MO_32);
 +    read_neon_element32(rm1, a->vm, 1, MO_32);
      widenfn(rd, rm0);
      tcg_gen_shli_i64(rd, rd, 8 << a->size);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_F16_F32(DisasContext *s, arg_2misc *a)
      fpst = fpstatus_ptr(FPST_STD);
      ahp = get_ahp_flag();
 -    tmp = neon_load_reg(a->vm, 0);
 +    tmp = tcg_temp_new_i32();
 +    read_neon_element32(tmp, a->vm, 0, MO_32);
      gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp);
 -    tmp2 = neon_load_reg(a->vm, 1);
 +    tmp2 = tcg_temp_new_i32();
 +    read_neon_element32(tmp2, a->vm, 1, MO_32);
      gen_helper_vfp_fcvt_f32_to_f16(tmp2, tmp2, fpst, ahp);
      tcg_gen_shli_i32(tmp2, tmp2, 16);
      tcg_gen_or_i32(tmp2, tmp2, tmp);
 -    tcg_temp_free_i32(tmp);
 -    tmp = neon_load_reg(a->vm, 2);
 +    read_neon_element32(tmp, a->vm, 2, MO_32);
      gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp);
 -    tmp3 = neon_load_reg(a->vm, 3);
 -    neon_store_reg(a->vd, 0, tmp2);
 +    tmp3 = tcg_temp_new_i32();
 +    read_neon_element32(tmp3, a->vm, 3, MO_32);
 +    write_neon_element32(tmp2, a->vd, 0, MO_32);
 +    tcg_temp_free_i32(tmp2);
      gen_helper_vfp_fcvt_f32_to_f16(tmp3, tmp3, fpst, ahp);
      tcg_gen_shli_i32(tmp3, tmp3, 16);
      tcg_gen_or_i32(tmp3, tmp3, tmp);
 -    neon_store_reg(a->vd, 1, tmp3);
 +    write_neon_element32(tmp3, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(tmp3);
      tcg_temp_free_i32(tmp);
      tcg_temp_free_i32(ahp);
      tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_F32_F16(DisasContext *s, arg_2misc *a)
      fpst = fpstatus_ptr(FPST_STD);
      ahp = get_ahp_flag();
      tmp3 = tcg_temp_new_i32();
 -    tmp = neon_load_reg(a->vm, 0);
 -    tmp2 = neon_load_reg(a->vm, 1);
 +    tmp2 = tcg_temp_new_i32();
 +    tmp = tcg_temp_new_i32();
 +    read_neon_element32(tmp, a->vm, 0, MO_32);
 +    read_neon_element32(tmp2, a->vm, 1, MO_32);
      tcg_gen_ext16u_i32(tmp3, tmp);
      gen_helper_vfp_fcvt_f16_to_f32(tmp3, tmp3, fpst, ahp);
 -    neon_store_reg(a->vd, 0, tmp3);
 +    write_neon_element32(tmp3, a->vd, 0, MO_32);
      tcg_gen_shri_i32(tmp, tmp, 16);
      gen_helper_vfp_fcvt_f16_to_f32(tmp, tmp, fpst, ahp);
 -    neon_store_reg(a->vd, 1, tmp);
 -    tmp3 = tcg_temp_new_i32();
 +    write_neon_element32(tmp, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(tmp);
      tcg_gen_ext16u_i32(tmp3, tmp2);
      gen_helper_vfp_fcvt_f16_to_f32(tmp3, tmp3, fpst, ahp);
 -    neon_store_reg(a->vd, 2, tmp3);
 +    write_neon_element32(tmp3, a->vd, 2, MO_32);
 +    tcg_temp_free_i32(tmp3);
      tcg_gen_shri_i32(tmp2, tmp2, 16);
      gen_helper_vfp_fcvt_f16_to_f32(tmp2, tmp2, fpst, ahp);
 -    neon_store_reg(a->vd, 3, tmp2);
 +    write_neon_element32(tmp2, a->vd, 3, MO_32);
 +    tcg_temp_free_i32(tmp2);
      tcg_temp_free_i32(ahp);
      tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ DO_2M_CRYPTO(SHA256SU0, aa32_sha2, 2)
  static bool do_2misc(DisasContext *s, arg_2misc *a, NeonGenOneOpFn *fn)
  {
 +    TCGv_i32 tmp;
      int pass;
      /* Handle a 2-reg-misc operation by iterating 32 bits at a time */
@@ -XXX,XX +XXX,XX @@ static bool do_2misc(DisasContext *s, arg_2misc *a, NeonGenOneOpFn *fn)
          return true;
      }
 +    tmp = tcg_temp_new_i32();
      for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 -        TCGv_i32 tmp = neon_load_reg(a->vm, pass);
 +        read_neon_element32(tmp, a->vm, pass, MO_32);
          fn(tmp, tmp);
 -        neon_store_reg(a->vd, pass, tmp);
 +        write_neon_element32(tmp, a->vd, pass, MO_32);
      }
 +    tcg_temp_free_i32(tmp);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool trans_VTRN(DisasContext *s, arg_2misc *a)
          return true;
      }
 -    if (a->size == 2) {
 +    tmp = tcg_temp_new_i32();
 +    tmp2 = tcg_temp_new_i32();
 +    if (a->size == MO_32) {
          for (pass = 0; pass < (a->q ? 4 : 2); pass += 2) {
 -            tmp = neon_load_reg(a->vm, pass);
 -            tmp2 = neon_load_reg(a->vd, pass + 1);
 -            neon_store_reg(a->vm, pass, tmp2);
 -            neon_store_reg(a->vd, pass + 1, tmp);
 +            read_neon_element32(tmp, a->vm, pass, MO_32);
 +            read_neon_element32(tmp2, a->vd, pass + 1, MO_32);
 +            write_neon_element32(tmp2, a->vm, pass, MO_32);
 +            write_neon_element32(tmp, a->vd, pass + 1, MO_32);
          }
      } else {
          for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 -            tmp = neon_load_reg(a->vm, pass);
 -            tmp2 = neon_load_reg(a->vd, pass);
 -            if (a->size == 0) {
 +            read_neon_element32(tmp, a->vm, pass, MO_32);
 +            read_neon_element32(tmp2, a->vd, pass, MO_32);
 +            if (a->size == MO_8) {
                  gen_neon_trn_u8(tmp, tmp2);
              } else {
                  gen_neon_trn_u16(tmp, tmp2);
              }
 -            neon_store_reg(a->vm, pass, tmp2);
 -            neon_store_reg(a->vd, pass, tmp);
 +            write_neon_element32(tmp2, a->vm, pass, MO_32);
 +            write_neon_element32(tmp, a->vd, pass, MO_32);
          }
      }
 +    tcg_temp_free_i32(tmp);
 +    tcg_temp_free_i32(tmp2);
      return true;
  }
 --
-.20.1
+.34.1

-[PULL 01/26] target/arm: Introduce neon_full_reg_offset
+[PULL 06/33] hw/char/pl011: better handling of FIFO flags on LCR reset
-From: Richard Henderson <richard.henderson@linaro.org>
+From: Evgeny Iakovlev <eiakovlev@linux.microsoft.com>
-This function makes it clear that we're talking about the whole
+Current FIFO handling code does not reset RXFE/RXFF flags when guest
-register, and not the 32-bit piece at index 0.  This fixes a bug
+resets FIFO by writing to UARTLCR register, although internal FIFO state
-when running on a big-endian host.
+is reset to 0 read count. Actual guest-visible flag update will happen
 only on next data read or write attempt. As a result of that any guest
 that expects RXFE flag to be set (and RXFF to be cleared) after resetting
 FIFO will never see that happen.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Signed-off-by: Evgeny Iakovlev <eiakovlev@linux.microsoft.com>
 Message-id: 20201030022618.785675-2-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Message-id: 20230123162304.26254-5-eiakovlev@linux.microsoft.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
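Note: the guest-visible effect described in the commit message above can be illustrated with a small standalone model. This is a sketch under stated assumptions, not the device code. `ModelFifo`, the `MODEL_FLAG_*` constants, `model_reset_fifo()` and `model_write_lcr()` are hypothetical names. The flag bit values are assumed to follow the PL011 UARTFR layout, which is consistent with the 0x90 reset value that this patch replaces.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed UARTFR bit layout; RXFE | TXFE equals the old 0x90 reset value. */
enum {
    MODEL_FLAG_RXFE = 0x10,
    MODEL_FLAG_TXFF = 0x20,
    MODEL_FLAG_RXFF = 0x40,
    MODEL_FLAG_TXFE = 0x80,
};

/* Hypothetical stand-in for the PL011 fields touched by a FIFO reset. */
typedef struct {
    uint32_t lcr;
    uint32_t flags;
    int read_pos;
    int read_count;
} ModelFifo;

/* Empty the FIFO and make the flags agree with the now-empty state. */
static void model_reset_fifo(ModelFifo *s)
{
    s->read_count = 0;
    s->read_pos = 0;

    s->flags &= ~(uint32_t)(MODEL_FLAG_RXFF | MODEL_FLAG_TXFF);
    s->flags |= MODEL_FLAG_RXFE | MODEL_FLAG_TXFE;

    /* After a reset both "empty" flags must be visible immediately. */
    assert((s->flags & (MODEL_FLAG_RXFE | MODEL_FLAG_TXFE)) ==
           (MODEL_FLAG_RXFE | MODEL_FLAG_TXFE));
}

/* Model of a line control write: toggling bit 4 (0x10) resets the FIFO. */
static void model_write_lcr(ModelFifo *s, uint32_t value)
{
    if ((s->lcr ^ value) & 0x10) {
        model_reset_fifo(s);
    }
    s->lcr = value;
}
```

With this model, a guest that disables the FIFO while the receive side is full sees RXFE set and RXFF cleared right after the register write, rather than only after the next data read or write.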
- target/arm/translate.c          |  8 ++++++
+ hw/char/pl011.c | 18 +++++++++++++-----
- target/arm/translate-neon.c.inc | 44 ++++++++++++++++-----------------
+file changed, 13 insertions(+), 5 deletions(-)
  target/arm/translate-vfp.c.inc  |  2 +-
 files changed, 31 insertions(+), 23 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/hw/char/pl011.c b/hw/char/pl011.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/hw/char/pl011.c
-+++ b/target/arm/translate.c
++++ b/hw/char/pl011.c
-@@ -XXX,XX +XXX,XX @@ static inline void gen_hlt(DisasContext *s, int imm)
+@@ -XXX,XX +XXX,XX @@ static inline unsigned pl011_get_fifo_depth(PL011State *s)
-     unallocated_encoding(s);
+     return pl011_is_fifo_enabled(s) ? PL011_FIFO_DEPTH : 1;
  }
-+/*
++static inline void pl011_reset_fifo(PL011State *s)
 + * Return the offset of a "full" NEON Dreg.
 + */
 +static long neon_full_reg_offset(unsigned reg)
 +{
-+    return offsetof(CPUARMState, vfp.zregs[reg >> 1].d[reg & 1]);
++    s->read_count = 0;
 +    s->read_pos = 0;
 +
 +    /* Reset FIFO flags */
 +    s->flags &= ~(PL011_FLAG_RXFF | PL011_FLAG_TXFF);
 +    s->flags |= PL011_FLAG_RXFE | PL011_FLAG_TXFE;
 +}
 +
- static inline long vfp_reg_offset(bool dp, unsigned reg)
+ static uint64_t pl011_read(void *opaque, hwaddr offset,
                             unsigned size)
  {
-     if (dp) {
+@@ -XXX,XX +XXX,XX @@ static void pl011_write(void *opaque, hwaddr offset,
-diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
+     case 11: /* UARTLCR_H */
-index XXXXXXX..XXXXXXX 100644
+         /* Reset the FIFO state on FIFO enable or disable */
---- a/target/arm/translate-neon.c.inc
+         if ((s->lcr ^ value) & 0x10) {
-+++ b/target/arm/translate-neon.c.inc
+-            s->read_count = 0;
-@@ -XXX,XX +XXX,XX @@ neon_element_offset(int reg, int element, MemOp size)
+-            s->read_pos = 0;
-         ofs ^= 8 - element_size;
++            pl011_reset_fifo(s);
-     }
+         }
- #endif
+         if ((s->lcr ^ value) & 0x1) {
--    return neon_reg_offset(reg, 0) + ofs;
+             int break_enable = value & 0x1;
-+    return neon_full_reg_offset(reg) + ofs;
+@@ -XXX,XX +XXX,XX @@ static void pl011_reset(DeviceState *dev)
      s->ilpr = 0;
      s->ibrd = 0;
      s->fbrd = 0;
 -    s->read_pos = 0;
 -    s->read_count = 0;
      s->read_trigger = 1;
      s->ifl = 0x12;
      s->cr = 0x300;
 -    s->flags = 0x90;
 +    s->flags = 0;
 +    pl011_reset_fifo(s);
  }
- static void neon_load_element(TCGv_i32 var, int reg, int ele, MemOp mop)
+ static void pl011_class_init(ObjectClass *oc, void *data)
@@ -XXX,XX +XXX,XX @@ static bool trans_VLD_all_lanes(DisasContext *s, arg_VLD_all_lanes *a)
               * We cannot write 16 bytes at once because the
               * destination is unaligned.
               */
 -            tcg_gen_gvec_dup_i32(size, neon_reg_offset(vd, 0),
 +            tcg_gen_gvec_dup_i32(size, neon_full_reg_offset(vd),
                                   8, 8, tmp);
 -            tcg_gen_gvec_mov(0, neon_reg_offset(vd + 1, 0),
 -                             neon_reg_offset(vd, 0), 8, 8);
 +            tcg_gen_gvec_mov(0, neon_full_reg_offset(vd + 1),
 +                             neon_full_reg_offset(vd), 8, 8);
          } else {
 -            tcg_gen_gvec_dup_i32(size, neon_reg_offset(vd, 0),
 +            tcg_gen_gvec_dup_i32(size, neon_full_reg_offset(vd),
                                   vec_size, vec_size, tmp);
          }
          tcg_gen_addi_i32(addr, addr, 1 << size);
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDST_single(DisasContext *s, arg_VLDST_single *a)
  static bool do_3same(DisasContext *s, arg_3same *a, GVecGen3Fn fn)
  {
      int vec_size = a->q ? 16 : 8;
 -    int rd_ofs = neon_reg_offset(a->vd, 0);
 -    int rn_ofs = neon_reg_offset(a->vn, 0);
 -    int rm_ofs = neon_reg_offset(a->vm, 0);
 +    int rd_ofs = neon_full_reg_offset(a->vd);
 +    int rn_ofs = neon_full_reg_offset(a->vn);
 +    int rm_ofs = neon_full_reg_offset(a->vm);
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
          return false;
@@ -XXX,XX +XXX,XX @@ static bool do_vector_2sh(DisasContext *s, arg_2reg_shift *a, GVecGen2iFn *fn)
  {
      /* Handle a 2-reg-shift insn which can be vectorized. */
      int vec_size = a->q ? 16 : 8;
 -    int rd_ofs = neon_reg_offset(a->vd, 0);
 -    int rm_ofs = neon_reg_offset(a->vm, 0);
 +    int rd_ofs = neon_full_reg_offset(a->vd);
 +    int rm_ofs = neon_full_reg_offset(a->vm);
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
          return false;
@@ -XXX,XX +XXX,XX @@ static bool do_fp_2sh(DisasContext *s, arg_2reg_shift *a,
  {
      /* FP operations in 2-reg-and-shift group */
      int vec_size = a->q ? 16 : 8;
 -    int rd_ofs = neon_reg_offset(a->vd, 0);
 -    int rm_ofs = neon_reg_offset(a->vm, 0);
 +    int rd_ofs = neon_full_reg_offset(a->vd);
 +    int rm_ofs = neon_full_reg_offset(a->vm);
      TCGv_ptr fpst;
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_1reg_imm(DisasContext *s, arg_1reg_imm *a,
          return true;
      }
 -    reg_ofs = neon_reg_offset(a->vd, 0);
 +    reg_ofs = neon_full_reg_offset(a->vd);
      vec_size = a->q ? 16 : 8;
      imm = asimd_imm_const(a->imm, a->cmode, a->op);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMULL_P_3d(DisasContext *s, arg_3diff *a)
          return true;
      }
 -    tcg_gen_gvec_3_ool(neon_reg_offset(a->vd, 0),
 -                       neon_reg_offset(a->vn, 0),
 -                       neon_reg_offset(a->vm, 0),
 +    tcg_gen_gvec_3_ool(neon_full_reg_offset(a->vd),
 +                       neon_full_reg_offset(a->vn),
 +                       neon_full_reg_offset(a->vm),
                        16, 16, 0, fn_gvec);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_fp_vec(DisasContext *s, arg_2scalar *a,
  {
      /* Two registers and a scalar, using gvec */
      int vec_size = a->q ? 16 : 8;
 -    int rd_ofs = neon_reg_offset(a->vd, 0);
 -    int rn_ofs = neon_reg_offset(a->vn, 0);
 +    int rd_ofs = neon_full_reg_offset(a->vd);
 +    int rn_ofs = neon_full_reg_offset(a->vn);
      int rm_ofs;
      int idx;
      TCGv_ptr fpstatus;
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_fp_vec(DisasContext *s, arg_2scalar *a,
      /* a->vm is M:Vm, which encodes both register and index */
      idx = extract32(a->vm, a->size + 2, 2);
      a->vm = extract32(a->vm, 0, a->size + 2);
 -    rm_ofs = neon_reg_offset(a->vm, 0);
 +    rm_ofs = neon_full_reg_offset(a->vm);
      fpstatus = fpstatus_ptr(a->size == 1 ? FPST_STD_F16 : FPST_STD);
      tcg_gen_gvec_3_ptr(rd_ofs, rn_ofs, rm_ofs, fpstatus,
@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP_scalar(DisasContext *s, arg_VDUP_scalar *a)
          return true;
      }
 -    tcg_gen_gvec_dup_mem(a->size, neon_reg_offset(a->vd, 0),
 +    tcg_gen_gvec_dup_mem(a->size, neon_full_reg_offset(a->vd),
                           neon_element_offset(a->vm, a->index, a->size),
                           a->q ? 16 : 8, a->q ? 16 : 8);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_F32_F16(DisasContext *s, arg_2misc *a)
  static bool do_2misc_vec(DisasContext *s, arg_2misc *a, GVecGen2Fn *fn)
  {
      int vec_size = a->q ? 16 : 8;
 -    int rd_ofs = neon_reg_offset(a->vd, 0);
 -    int rm_ofs = neon_reg_offset(a->vm, 0);
 +    int rd_ofs = neon_full_reg_offset(a->vd);
 +    int rm_ofs = neon_full_reg_offset(a->vm);
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
          return false;
 diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-vfp.c.inc
 +++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
      }
      tmp = load_reg(s, a->rt);
 -    tcg_gen_gvec_dup_i32(size, neon_reg_offset(a->vn, 0),
 +    tcg_gen_gvec_dup_i32(size, neon_full_reg_offset(a->vn),
                           vec_size, vec_size, tmp);
      tcg_temp_free_i32(tmp);
 --
-2.20.1
+2.34.1

-[PULL 06/26] target/arm: Expand read/write_neon_element32 to all MemOp
+[PULL 07/33] hvf: arm: Add support for GICv3
-From: Richard Henderson <richard.henderson@linaro.org>
+From: Alexander Graf <agraf@csgraf.de>
-We can then use this to improve VMOV (scalar to gp) and
+We currently only support GICv2 emulation. To also support GICv3, we will
-VMOV (gp to scalar) so that we simply perform the memory
+need to pass a few system registers into their respective handler functions.
-operation that we wanted, rather than inserting or
-extracting from a 32-bit quantity.
+This patch adds support for HVF to call into the TCG callbacks for GICv3
+system register handlers. This is safe because the GICv3 TCG code is generic
-These were the last uses of neon_load/store_reg, so remove them.
+as long as we limit ourselves to EL0 and EL1 - which are the only modes
+supported by HVF.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030022618.785675-7-richard.henderson@linaro.org
+To make sure nobody trips over that, we also annotate callbacks that don't
 work in HVF mode, such as EL state change hooks.
 With GICv3 support in place, we can run with more than 8 vCPUs.
 Signed-off-by: Alexander Graf <agraf@csgraf.de>
 Message-id: 20230128224459.70676-1-agraf@csgraf.de
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
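The read path described above (look up the register, run its access check, then return a constant, a read callback result, or the raw backing field) can be sketched in isolation. This is a minimal illustration only, not the QEMU code: reg_info, lookup() and sysreg_read() are hypothetical stand-ins for ARMCPRegInfo, get_arm_cp_reginfo() and hvf_sysreg_read_cp(), and the key is a plain integer rather than an encoded system register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a register description. */
typedef struct reg_info reg_info;
struct reg_info {
    uint32_t key;                 /* lookup key (assumed plain integer) */
    bool is_const;                /* register always reads as reset_value */
    uint64_t reset_value;
    bool (*access_ok)(const reg_info *ri, bool is_read); /* optional */
    uint64_t (*read_fn)(const reg_info *ri);             /* optional */
    uint64_t *field;              /* backing storage when no read_fn */
};

/* Linear search; a real implementation would likely use a hash table. */
static const reg_info *lookup(const reg_info *table, size_t n, uint32_t key)
{
    for (size_t i = 0; i < n; i++) {
        if (table[i].key == key) {
            return &table[i];
        }
    }
    return NULL;
}

/*
 * Returns false when the register is unknown or the access check
 * refuses the read; the caller would then raise an undefined
 * instruction exception in the guest.
 */
static bool sysreg_read(const reg_info *table, size_t n,
                        uint32_t key, uint64_t *val)
{
    const reg_info *ri = lookup(table, n, key);

    if (!ri) {
        return false;
    }
    if (ri->access_ok && !ri->access_ok(ri, true)) {
        return false;
    }
    if (ri->is_const) {
        *val = ri->reset_value;
    } else if (ri->read_fn) {
        *val = ri->read_fn(ri);
    } else {
        assert(ri->field != NULL); /* sketch assumes storage exists */
        *val = *ri->field;
    }
    return true;
}
```

The write path in the patch follows the same shape, with a write callback taking priority over a direct store to the backing field.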
- target/arm/translate.c         | 50 +++++++++++++-----------
+ hw/intc/arm_gicv3_cpuif.c   |  16 +++-
- target/arm/translate-vfp.c.inc | 71 +++++-----------------------------
+ target/arm/hvf/hvf.c        | 151 ++++++++++++++++++++++++++++++++++++
- 2 files changed, 37 insertions(+), 84 deletions(-)
+ target/arm/hvf/trace-events |   2 +
+ 3 files changed, 168 insertions(+), 1 deletion(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
 diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/hw/intc/arm_gicv3_cpuif.c
-+++ b/target/arm/translate.c
++++ b/hw/intc/arm_gicv3_cpuif.c
-@@ -XXX,XX +XXX,XX @@ static long neon_full_reg_offset(unsigned reg)
+@@ -XXX,XX +XXX,XX @@
-  * Return the offset of a 2**SIZE piece of a NEON register, at index ELE,
+ #include "hw/irq.h"
-  * where 0 is the least significant end of the register.
+ #include "cpu.h"
-  */
+ #include "target/arm/cpregs.h"
--static long neon_element_offset(int reg, int element, MemOp size)
++#include "sysemu/tcg.h"
-+static long neon_element_offset(int reg, int element, MemOp memop)
++#include "sysemu/qtest.h"
- {
--    int element_size = 1 << size;
+ /*
-+    int element_size = 1 << (memop & MO_SIZE);
+  * Special case return value from hppvi_index(); must be larger than
-     int ofs = element * element_size;
+@@ -XXX,XX +XXX,XX @@ void gicv3_init_cpuif(GICv3State *s)
- #ifdef HOST_WORDS_BIGENDIAN
+          * which case we'd get the wrong value.
-     /*
+          * So instead we define the regs with no ri->opaque info, and
-@@ -XXX,XX +XXX,XX @@ static long vfp_reg_offset(bool dp, unsigned reg)
+          * get back to the GICv3CPUState from the CPUARMState.
 +         *
 +         * These CP regs callbacks can be called from either TCG or HVF code.
           */
          define_arm_cp_regs(cpu, gicv3_cpuif_reginfo);
@@ -XXX,XX +XXX,XX @@ void gicv3_init_cpuif(GICv3State *s)
                  define_arm_cp_regs(cpu, gicv3_cpuif_ich_apxr23_reginfo);
              }
          }
 -        arm_register_el_change_hook(cpu, gicv3_cpuif_el_change_hook, cs);
 +        if (tcg_enabled() || qtest_enabled()) {
 +            /*
 +             * We can only trap EL changes with TCG. However the GIC interrupt
 +             * state only changes on EL changes involving EL2 or EL3, so for
 +             * the non-TCG case this is OK, as EL2 and EL3 can't exist.
 +             */
 +            arm_register_el_change_hook(cpu, gicv3_cpuif_el_change_hook, cs);
 +        } else {
 +            assert(!arm_feature(&cpu->env, ARM_FEATURE_EL2));
 +            assert(!arm_feature(&cpu->env, ARM_FEATURE_EL3));
 +        }
      }
  }
+diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
--static TCGv_i32 neon_load_reg(int reg, int pass)
+index XXXXXXX..XXXXXXX 100644
--{
+--- a/target/arm/hvf/hvf.c
--    TCGv_i32 tmp = tcg_temp_new_i32();
++++ b/target/arm/hvf/hvf.c
--    tcg_gen_ld_i32(tmp, cpu_env, neon_element_offset(reg, pass, MO_32));
+@@ -XXX,XX +XXX,XX @@
--    return tmp;
+ #define SYSREG_PMCCNTR_EL0    SYSREG(3, 3, 9, 13, 0)
--}
+ #define SYSREG_PMCCFILTR_EL0  SYSREG(3, 3, 14, 15, 7)
--
--static void neon_store_reg(int reg, int pass, TCGv_i32 var)
++#define SYSREG_ICC_AP0R0_EL1     SYSREG(3, 0, 12, 8, 4)
--{
++#define SYSREG_ICC_AP0R1_EL1     SYSREG(3, 0, 12, 8, 5)
--    tcg_gen_st_i32(var, cpu_env, neon_element_offset(reg, pass, MO_32));
++#define SYSREG_ICC_AP0R2_EL1     SYSREG(3, 0, 12, 8, 6)
--    tcg_temp_free_i32(var);
++#define SYSREG_ICC_AP0R3_EL1     SYSREG(3, 0, 12, 8, 7)
--}
++#define SYSREG_ICC_AP1R0_EL1     SYSREG(3, 0, 12, 9, 0)
--
++#define SYSREG_ICC_AP1R1_EL1     SYSREG(3, 0, 12, 9, 1)
- static inline void neon_load_reg64(TCGv_i64 var, int reg)
++#define SYSREG_ICC_AP1R2_EL1     SYSREG(3, 0, 12, 9, 2)
 +#define SYSREG_ICC_AP1R3_EL1     SYSREG(3, 0, 12, 9, 3)
 +#define SYSREG_ICC_ASGI1R_EL1    SYSREG(3, 0, 12, 11, 6)
 +#define SYSREG_ICC_BPR0_EL1      SYSREG(3, 0, 12, 8, 3)
 +#define SYSREG_ICC_BPR1_EL1      SYSREG(3, 0, 12, 12, 3)
 +#define SYSREG_ICC_CTLR_EL1      SYSREG(3, 0, 12, 12, 4)
 +#define SYSREG_ICC_DIR_EL1       SYSREG(3, 0, 12, 11, 1)
 +#define SYSREG_ICC_EOIR0_EL1     SYSREG(3, 0, 12, 8, 1)
 +#define SYSREG_ICC_EOIR1_EL1     SYSREG(3, 0, 12, 12, 1)
 +#define SYSREG_ICC_HPPIR0_EL1    SYSREG(3, 0, 12, 8, 2)
 +#define SYSREG_ICC_HPPIR1_EL1    SYSREG(3, 0, 12, 12, 2)
 +#define SYSREG_ICC_IAR0_EL1      SYSREG(3, 0, 12, 8, 0)
 +#define SYSREG_ICC_IAR1_EL1      SYSREG(3, 0, 12, 12, 0)
 +#define SYSREG_ICC_IGRPEN0_EL1   SYSREG(3, 0, 12, 12, 6)
 +#define SYSREG_ICC_IGRPEN1_EL1   SYSREG(3, 0, 12, 12, 7)
 +#define SYSREG_ICC_PMR_EL1       SYSREG(3, 0, 4, 6, 0)
 +#define SYSREG_ICC_RPR_EL1       SYSREG(3, 0, 12, 11, 3)
 +#define SYSREG_ICC_SGI0R_EL1     SYSREG(3, 0, 12, 11, 7)
 +#define SYSREG_ICC_SGI1R_EL1     SYSREG(3, 0, 12, 11, 5)
 +#define SYSREG_ICC_SRE_EL1       SYSREG(3, 0, 12, 12, 5)
 +
  #define WFX_IS_WFE (1 << 0)
  #define TMR_CTL_ENABLE  (1 << 0)
@@ -XXX,XX +XXX,XX @@ static bool is_id_sysreg(uint32_t reg)
             SYSREG_CRM(reg) < 8;
  }
 +static uint32_t hvf_reg2cp_reg(uint32_t reg)
 +{
 +    return ENCODE_AA64_CP_REG(CP_REG_ARM64_SYSREG_CP,
 +                              (reg >> SYSREG_CRN_SHIFT) & SYSREG_CRN_MASK,
 +                              (reg >> SYSREG_CRM_SHIFT) & SYSREG_CRM_MASK,
 +                              (reg >> SYSREG_OP0_SHIFT) & SYSREG_OP0_MASK,
 +                              (reg >> SYSREG_OP1_SHIFT) & SYSREG_OP1_MASK,
 +                              (reg >> SYSREG_OP2_SHIFT) & SYSREG_OP2_MASK);
 +}
 +
 +static bool hvf_sysreg_read_cp(CPUState *cpu, uint32_t reg, uint64_t *val)
 +{
 +    ARMCPU *arm_cpu = ARM_CPU(cpu);
 +    CPUARMState *env = &arm_cpu->env;
 +    const ARMCPRegInfo *ri;
 +
 +    ri = get_arm_cp_reginfo(arm_cpu->cp_regs, hvf_reg2cp_reg(reg));
 +    if (ri) {
 +        if (ri->accessfn) {
 +            if (ri->accessfn(env, ri, true) != CP_ACCESS_OK) {
 +                return false;
 +            }
 +        }
 +        if (ri->type & ARM_CP_CONST) {
 +            *val = ri->resetvalue;
 +        } else if (ri->readfn) {
 +            *val = ri->readfn(env, ri);
 +        } else {
 +            *val = CPREG_FIELD64(env, ri);
 +        }
 +        trace_hvf_vgic_read(ri->name, *val);
 +        return true;
 +    }
 +
 +    return false;
 +}
 +
  static int hvf_sysreg_read(CPUState *cpu, uint32_t reg, uint32_t rt)
  {
-     tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(1, reg));
+     ARMCPU *arm_cpu = ARM_CPU(cpu);
-@@ -XXX,XX +XXX,XX @@ static inline void neon_store_reg32(TCGv_i32 var, int reg)
+@@ -XXX,XX +XXX,XX @@ static int hvf_sysreg_read(CPUState *cpu, uint32_t reg, uint32_t rt)
-     tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
+     case SYSREG_OSDLR_EL1:
- }
+         /* Dummy register */
+         break;
--static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp size)
++    case SYSREG_ICC_AP0R0_EL1:
-+static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp memop)
++    case SYSREG_ICC_AP0R1_EL1:
- {
++    case SYSREG_ICC_AP0R2_EL1:
--    long off = neon_element_offset(reg, ele, size);
++    case SYSREG_ICC_AP0R3_EL1:
-+    long off = neon_element_offset(reg, ele, memop);
++    case SYSREG_ICC_AP1R0_EL1:
++    case SYSREG_ICC_AP1R1_EL1:
--    switch (size) {
++    case SYSREG_ICC_AP1R2_EL1:
--    case MO_32:
++    case SYSREG_ICC_AP1R3_EL1:
-+    switch (memop) {
++    case SYSREG_ICC_ASGI1R_EL1:
-+    case MO_SB:
++    case SYSREG_ICC_BPR0_EL1:
-+        tcg_gen_ld8s_i32(dest, cpu_env, off);
++    case SYSREG_ICC_BPR1_EL1:
 +    case SYSREG_ICC_DIR_EL1:
 +    case SYSREG_ICC_EOIR0_EL1:
 +    case SYSREG_ICC_EOIR1_EL1:
 +    case SYSREG_ICC_HPPIR0_EL1:
 +    case SYSREG_ICC_HPPIR1_EL1:
 +    case SYSREG_ICC_IAR0_EL1:
 +    case SYSREG_ICC_IAR1_EL1:
 +    case SYSREG_ICC_IGRPEN0_EL1:
 +    case SYSREG_ICC_IGRPEN1_EL1:
 +    case SYSREG_ICC_PMR_EL1:
 +    case SYSREG_ICC_SGI0R_EL1:
 +    case SYSREG_ICC_SGI1R_EL1:
 +    case SYSREG_ICC_SRE_EL1:
 +    case SYSREG_ICC_CTLR_EL1:
 +        /* Call the TCG sysreg handler. This is only safe for GICv3 regs. */
 +        if (!hvf_sysreg_read_cp(cpu, reg, &val)) {
 +            hvf_raise_exception(cpu, EXCP_UDEF, syn_uncategorized());
 +        }
 +        break;
-+    case MO_UB:
-+        tcg_gen_ld8u_i32(dest, cpu_env, off);
-+        break;
-+    case MO_SW:
-+        tcg_gen_ld16s_i32(dest, cpu_env, off);
-+        break;
-+    case MO_UW:
-+        tcg_gen_ld16u_i32(dest, cpu_env, off);
-+        break;
-+    case MO_UL:
-+    case MO_SL:
-         tcg_gen_ld_i32(dest, cpu_env, off);
-         break;
      default:
-@@ -XXX,XX +XXX,XX @@ static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp size)
+         if (is_id_sysreg(reg)) {
              /* ID system registers read as RES0 */
@@ -XXX,XX +XXX,XX @@ static void pmswinc_write(CPUARMState *env, uint64_t value)
      }
  }
--static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp size)
++static bool hvf_sysreg_write_cp(CPUState *cpu, uint32_t reg, uint64_t val)
-+static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
++{
 +    ARMCPU *arm_cpu = ARM_CPU(cpu);
 +    CPUARMState *env = &arm_cpu->env;
 +    const ARMCPRegInfo *ri;
 +
 +    ri = get_arm_cp_reginfo(arm_cpu->cp_regs, hvf_reg2cp_reg(reg));
 +
 +    if (ri) {
 +        if (ri->accessfn) {
 +            if (ri->accessfn(env, ri, false) != CP_ACCESS_OK) {
 +                return false;
 +            }
 +        }
 +        if (ri->writefn) {
 +            ri->writefn(env, ri, val);
 +        } else {
 +            CPREG_FIELD64(env, ri) = val;
 +        }
 +
 +        trace_hvf_vgic_write(ri->name, val);
 +        return true;
 +    }
 +
 +    return false;
 +}
 +
  static int hvf_sysreg_write(CPUState *cpu, uint32_t reg, uint64_t val)
  {
--    long off = neon_element_offset(reg, ele, size);
+     ARMCPU *arm_cpu = ARM_CPU(cpu);
-+    long off = neon_element_offset(reg, ele, memop);
+@@ -XXX,XX +XXX,XX @@ static int hvf_sysreg_write(CPUState *cpu, uint32_t reg, uint64_t val)
+     case SYSREG_OSDLR_EL1:
--    switch (size) {
+         /* Dummy register */
-+    switch (memop) {
+         break;
-+    case MO_8:
++    case SYSREG_ICC_AP0R0_EL1:
-+        tcg_gen_st8_i32(src, cpu_env, off);
++    case SYSREG_ICC_AP0R1_EL1:
 +    case SYSREG_ICC_AP0R2_EL1:
 +    case SYSREG_ICC_AP0R3_EL1:
 +    case SYSREG_ICC_AP1R0_EL1:
 +    case SYSREG_ICC_AP1R1_EL1:
 +    case SYSREG_ICC_AP1R2_EL1:
 +    case SYSREG_ICC_AP1R3_EL1:
 +    case SYSREG_ICC_ASGI1R_EL1:
 +    case SYSREG_ICC_BPR0_EL1:
 +    case SYSREG_ICC_BPR1_EL1:
 +    case SYSREG_ICC_CTLR_EL1:
 +    case SYSREG_ICC_DIR_EL1:
 +    case SYSREG_ICC_EOIR0_EL1:
 +    case SYSREG_ICC_EOIR1_EL1:
 +    case SYSREG_ICC_HPPIR0_EL1:
 +    case SYSREG_ICC_HPPIR1_EL1:
 +    case SYSREG_ICC_IAR0_EL1:
 +    case SYSREG_ICC_IAR1_EL1:
 +    case SYSREG_ICC_IGRPEN0_EL1:
 +    case SYSREG_ICC_IGRPEN1_EL1:
 +    case SYSREG_ICC_PMR_EL1:
 +    case SYSREG_ICC_SGI0R_EL1:
 +    case SYSREG_ICC_SGI1R_EL1:
 +    case SYSREG_ICC_SRE_EL1:
 +        /* Call the TCG sysreg handler. This is only safe for GICv3 regs. */
 +        if (!hvf_sysreg_write_cp(cpu, reg, val)) {
 +            hvf_raise_exception(cpu, EXCP_UDEF, syn_uncategorized());
 +        }
 +        break;
-+    case MO_16:
+     default:
-+        tcg_gen_st16_i32(src, cpu_env, off);
+         cpu_synchronize_state(cpu);
-+        break;
+         trace_hvf_unhandled_sysreg_write(env->pc, reg,
-     case MO_32:
+diff --git a/target/arm/hvf/trace-events b/target/arm/hvf/trace-events
          tcg_gen_st_i32(src, cpu_env, off);
          break;
 diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.c.inc
+--- a/target/arm/hvf/trace-events
-+++ b/target/arm/translate-vfp.c.inc
++++ b/target/arm/hvf/trace-events
-@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
+@@ -XXX,XX +XXX,XX @@ hvf_unknown_hvc(uint64_t x0) "unknown HVC! 0x%016"PRIx64
- {
+ hvf_unknown_smc(uint64_t x0) "unknown SMC! 0x%016"PRIx64
-     /* VMOV scalar to general purpose register */
+ hvf_exit(uint64_t syndrome, uint32_t ec, uint64_t pc) "exit: 0x%"PRIx64" [ec=0x%x pc=0x%"PRIx64"]"
-     TCGv_i32 tmp;
+ hvf_psci_call(uint64_t x0, uint64_t x1, uint64_t x2, uint64_t x3, uint32_t cpuid) "PSCI Call x0=0x%016"PRIx64" x1=0x%016"PRIx64" x2=0x%016"PRIx64" x3=0x%016"PRIx64" cpu=0x%x"
--    int pass;
++hvf_vgic_write(const char *name, uint64_t val) "vgic write to %s [val=0x%016"PRIx64"]"
--    uint32_t offset;
++hvf_vgic_read(const char *name, uint64_t val) "vgic read from %s [val=0x%016"PRIx64"]"
 -    /* SIZE == 2 is a VFP instruction; otherwise NEON.  */
 -    if (a->size == 2
 +    /* SIZE == MO_32 is a VFP instruction; otherwise NEON.  */
 +    if (a->size == MO_32
          ? !dc_isar_feature(aa32_fpsp_v2, s)
          : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
          return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
          return false;
      }
 -    offset = a->index << a->size;
 -    pass = extract32(offset, 2, 1);
 -    offset = extract32(offset, 0, 2) * 8;
 -
      if (!vfp_access_check(s)) {
          return true;
      }
 -    tmp = neon_load_reg(a->vn, pass);
 -    switch (a->size) {
 -    case 0:
 -        if (offset) {
 -            tcg_gen_shri_i32(tmp, tmp, offset);
 -        }
 -        if (a->u) {
 -            gen_uxtb(tmp);
 -        } else {
 -            gen_sxtb(tmp);
 -        }
 -        break;
 -    case 1:
 -        if (a->u) {
 -            if (offset) {
 -                tcg_gen_shri_i32(tmp, tmp, 16);
 -            } else {
 -                gen_uxth(tmp);
 -            }
 -        } else {
 -            if (offset) {
 -                tcg_gen_sari_i32(tmp, tmp, 16);
 -            } else {
 -                gen_sxth(tmp);
 -            }
 -        }
 -        break;
 -    case 2:
 -        break;
 -    }
 +    tmp = tcg_temp_new_i32();
 +    read_neon_element32(tmp, a->vn, a->index, a->size | (a->u ? 0 : MO_SIGN));
      store_reg(s, a->rt, tmp);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
  static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
  {
      /* VMOV general purpose register to scalar */
 -    TCGv_i32 tmp, tmp2;
 -    int pass;
 -    uint32_t offset;
 +    TCGv_i32 tmp;
 -    /* SIZE == 2 is a VFP instruction; otherwise NEON.  */
 -    if (a->size == 2
 +    /* SIZE == MO_32 is a VFP instruction; otherwise NEON.  */
 +    if (a->size == MO_32
          ? !dc_isar_feature(aa32_fpsp_v2, s)
          : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
          return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
          return false;
      }
 -    offset = a->index << a->size;
 -    pass = extract32(offset, 2, 1);
 -    offset = extract32(offset, 0, 2) * 8;
 -
      if (!vfp_access_check(s)) {
          return true;
      }
      tmp = load_reg(s, a->rt);
 -    switch (a->size) {
 -    case 0:
 -        tmp2 = neon_load_reg(a->vn, pass);
 -        tcg_gen_deposit_i32(tmp, tmp2, tmp, offset, 8);
 -        tcg_temp_free_i32(tmp2);
 -        break;
 -    case 1:
 -        tmp2 = neon_load_reg(a->vn, pass);
 -        tcg_gen_deposit_i32(tmp, tmp2, tmp, offset, 16);
 -        tcg_temp_free_i32(tmp2);
 -        break;
 -    case 2:
 -        break;
 -    }
 -    neon_store_reg(a->vn, pass, tmp);
 +    write_neon_element32(tmp, a->vn, a->index, a->size);
 +    tcg_temp_free_i32(tmp);
      return true;
  }
 --
-2.20.1
+2.34.1

-[PULL 08/26] target/arm: Add read/write_neon_element64
+[PULL 08/33] hw/arm/virt: Consolidate GIC finalize logic
-From: Richard Henderson <richard.henderson@linaro.org>
+From: Alexander Graf <agraf@csgraf.de>
-Replace all uses of neon_load/store_reg64 within translate-neon.c.inc.
+Up to now, the finalize_gic_version() code open coded what is essentially
+a support bitmap match between host/emulation environment and desired
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+target GIC type.
-Message-id: 20201030022618.785675-9-richard.henderson@linaro.org
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+This open coding leads to undesirable side effects. For example, a VM with
 KVM and -smp 10 will automatically choose GICv3 while the same command
 line with TCG will stay on GICv2 and fail the launch.
 This patch combines the TCG and KVM matching code paths by making
 everything a 2 pass process. First, we determine which GIC versions the
 current environment is able to support, then we go through a single
 state machine to determine which target GIC mode that means for us.
 After this patch, the only user-noticeable changes should be consolidated
 error messages as well as TCG -M virt supporting -smp > 8 automatically.
 Signed-off-by: Alexander Graf <agraf@csgraf.de>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Reviewed-by: Cornelia Huck <cohuck@redhat.com>
 Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
 Message-id: 20221223090107.98888-2-agraf@csgraf.de
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
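The two-pass selection described in the commit message can be sketched independently of the machine code. This is an illustrative approximation under stated assumptions: pick_gic(), the gic_choice enum, GIC_MASK() and GICV2_MAX_CPUS are hypothetical names, failures are reported with a GIC_ERROR return value where the patch calls error_report() and exit(1), and "host" is simply treated like "max" without the KVM check. The first pass (building the bitmap of supported versions) is assumed to have happened already and is passed in as `supported`.

```c
#include <assert.h>

/* Hypothetical stand-ins; concrete values match the GIC version number. */
enum gic_choice {
    GIC_MAX = 0,
    GIC_HOST = 1,
    GIC_V2 = 2,
    GIC_V3 = 3,
    GIC_V4 = 4,
    GIC_NOSEL,
    GIC_ERROR = -1,
};

#define GIC_MASK(v) (1u << (v))
#define GICV2_MAX_CPUS 8 /* GICv2 addresses at most 8 CPUs */

/*
 * Second pass: map the requested setting onto a concrete version,
 * then confirm that version is in the supported bitmap.
 */
static int pick_gic(int requested, unsigned supported, unsigned max_cpus)
{
    int v = requested;

    assert(max_cpus >= 1); /* a machine has at least one vCPU */

    switch (requested) {
    case GIC_HOST: /* sketch only: treated like max, no KVM check */
    case GIC_MAX:
        if (supported & GIC_MASK(GIC_V4)) {
            v = GIC_V4;
        } else if (supported & GIC_MASK(GIC_V3)) {
            v = GIC_V3;
        } else {
            v = GIC_V2;
        }
        break;
    case GIC_NOSEL:
        if ((supported & GIC_MASK(GIC_V2)) && max_cpus <= GICV2_MAX_CPUS) {
            v = GIC_V2;
        } else if (supported & GIC_MASK(GIC_V3)) {
            v = GIC_V3;
        } else {
            return GIC_ERROR; /* nothing usable for this CPU count */
        }
        break;
    default:
        break; /* explicit version: validated below */
    }

    if (v < GIC_V2 || v > GIC_V4 || !(supported & GIC_MASK(v))) {
        return GIC_ERROR;
    }
    return v;
}
```

With a bitmap containing both v2 and v3 and no explicit selection, a request for 10 CPUs yields v3 in this sketch, which mirrors the "-smp > 8" behaviour the commit message describes for TCG.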
- target/arm/translate.c          | 26 +++++++++
+ include/hw/arm/virt.h |  15 ++--
- target/arm/translate-neon.c.inc | 94 ++++++++++++++++-----------------
+ hw/arm/virt.c         | 198 ++++++++++++++++++++++--------------------
- 2 files changed, 73 insertions(+), 47 deletions(-)
+ 2 files changed, 112 insertions(+), 101 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/include/hw/arm/virt.h
-+++ b/target/arm/translate.c
++++ b/include/hw/arm/virt.h
-@@ -XXX,XX +XXX,XX @@ static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp memop)
+@@ -XXX,XX +XXX,XX @@ typedef enum VirtMSIControllerType {
  } VirtMSIControllerType;
  typedef enum VirtGICType {
 -    VIRT_GIC_VERSION_MAX,
 -    VIRT_GIC_VERSION_HOST,
 -    VIRT_GIC_VERSION_2,
 -    VIRT_GIC_VERSION_3,
 -    VIRT_GIC_VERSION_4,
 +    VIRT_GIC_VERSION_MAX = 0,
 +    VIRT_GIC_VERSION_HOST = 1,
 +    /* The concrete GIC values have to match the GIC version number */
 +    VIRT_GIC_VERSION_2 = 2,
 +    VIRT_GIC_VERSION_3 = 3,
 +    VIRT_GIC_VERSION_4 = 4,
      VIRT_GIC_VERSION_NOSEL,
  } VirtGICType;
 +#define VIRT_GIC_VERSION_2_MASK BIT(VIRT_GIC_VERSION_2)
 +#define VIRT_GIC_VERSION_3_MASK BIT(VIRT_GIC_VERSION_3)
 +#define VIRT_GIC_VERSION_4_MASK BIT(VIRT_GIC_VERSION_4)
 +
  struct VirtMachineClass {
      MachineClass parent;
      bool disallow_affinity_adjustment;
 diff --git a/hw/arm/virt.c b/hw/arm/virt.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/virt.c
 +++ b/hw/arm/virt.c
@@ -XXX,XX +XXX,XX @@ static void virt_set_memmap(VirtMachineState *vms, int pa_bits)
      }
  }
-+static void read_neon_element64(TCGv_i64 dest, int reg, int ele, MemOp memop)
++static VirtGICType finalize_gic_version_do(const char *accel_name,
 +                                           VirtGICType gic_version,
 +                                           int gics_supported,
 +                                           unsigned int max_cpus)
 +{
-+    long off = neon_element_offset(reg, ele, memop);
++    /* Convert host/max/nosel to GIC version number */
-+
++    switch (gic_version) {
-+    switch (memop) {
++    case VIRT_GIC_VERSION_HOST:
-+    case MO_Q:
++        if (!kvm_enabled()) {
-+        tcg_gen_ld_i64(dest, cpu_env, off);
++            error_report("gic-version=host requires KVM");
 +            exit(1);
 +        }
 +
 +        /* For KVM, gic-version=host means gic-version=max */
 +        return finalize_gic_version_do(accel_name, VIRT_GIC_VERSION_MAX,
 +                                       gics_supported, max_cpus);
 +    case VIRT_GIC_VERSION_MAX:
 +        if (gics_supported & VIRT_GIC_VERSION_4_MASK) {
 +            gic_version = VIRT_GIC_VERSION_4;
 +        } else if (gics_supported & VIRT_GIC_VERSION_3_MASK) {
 +            gic_version = VIRT_GIC_VERSION_3;
 +        } else {
 +            gic_version = VIRT_GIC_VERSION_2;
 +        }
 +        break;
 +    case VIRT_GIC_VERSION_NOSEL:
 +        if ((gics_supported & VIRT_GIC_VERSION_2_MASK) &&
 +            max_cpus <= GIC_NCPU) {
 +            gic_version = VIRT_GIC_VERSION_2;
 +        } else if (gics_supported & VIRT_GIC_VERSION_3_MASK) {
 +            /*
 +             * in case the host does not support v2 emulation or
 +             * the end-user requested more than 8 VCPUs we now default
 +             * to v3. In any case defaulting to v2 would be broken.
 +             */
 +            gic_version = VIRT_GIC_VERSION_3;
 +        } else if (max_cpus > GIC_NCPU) {
 +            error_report("%s only supports GICv2 emulation but more than 8 "
 +                         "vcpus are requested", accel_name);
 +            exit(1);
 +        }
 +        break;
 +    case VIRT_GIC_VERSION_2:
 +    case VIRT_GIC_VERSION_3:
 +    case VIRT_GIC_VERSION_4:
 +        break;
 +    }
 +
 +    /* Check chosen version is effectively supported */
 +    switch (gic_version) {
 +    case VIRT_GIC_VERSION_2:
 +        if (!(gics_supported & VIRT_GIC_VERSION_2_MASK)) {
 +            error_report("%s does not support GICv2 emulation", accel_name);
 +            exit(1);
 +        }
 +        break;
 +    case VIRT_GIC_VERSION_3:
 +        if (!(gics_supported & VIRT_GIC_VERSION_3_MASK)) {
 +            error_report("%s does not support GICv3 emulation", accel_name);
 +            exit(1);
 +        }
 +        break;
 +    case VIRT_GIC_VERSION_4:
 +        if (!(gics_supported & VIRT_GIC_VERSION_4_MASK)) {
 +            error_report("%s does not support GICv4 emulation, is virtualization=on?",
 +                         accel_name);
 +            exit(1);
 +        }
 +        break;
 +    default:
-+        g_assert_not_reached();
++        error_report("logic error in finalize_gic_version");
 +        exit(1);
 +        break;
 +    }
++
++    return gic_version;
 +}
 +
- static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
+ /*
   * finalize_gic_version - Determines the final gic_version
   * according to the gic-version property
@@ -XXX,XX +XXX,XX @@ static void virt_set_memmap(VirtMachineState *vms, int pa_bits)
   */
  static void finalize_gic_version(VirtMachineState *vms)
  {
-     long off = neon_element_offset(reg, ele, memop);
++    const char *accel_name = current_accel_name();
-@@ -XXX,XX +XXX,XX @@ static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
+     unsigned int max_cpus = MACHINE(vms)->smp.max_cpus;
 +    int gics_supported = 0;
 -    if (kvm_enabled()) {
 -        int probe_bitmap;
 +    /* Determine which GIC versions the current environment supports */
 +    if (kvm_enabled() && kvm_irqchip_in_kernel()) {
 +        int probe_bitmap = kvm_arm_vgic_probe();
 -        if (!kvm_irqchip_in_kernel()) {
 -            switch (vms->gic_version) {
 -            case VIRT_GIC_VERSION_HOST:
 -                warn_report(
 -                    "gic-version=host not relevant with kernel-irqchip=off "
 -                     "as only userspace GICv2 is supported. Using v2 ...");
 -                return;
 -            case VIRT_GIC_VERSION_MAX:
 -            case VIRT_GIC_VERSION_NOSEL:
 -                vms->gic_version = VIRT_GIC_VERSION_2;
 -                return;
 -            case VIRT_GIC_VERSION_2:
 -                return;
 -            case VIRT_GIC_VERSION_3:
 -                error_report(
 -                    "gic-version=3 is not supported with kernel-irqchip=off");
 -                exit(1);
 -            case VIRT_GIC_VERSION_4:
 -                error_report(
 -                    "gic-version=4 is not supported with kernel-irqchip=off");
 -                exit(1);
 -            }
 -        }
 -
 -        probe_bitmap = kvm_arm_vgic_probe();
          if (!probe_bitmap) {
              error_report("Unable to determine GIC version supported by host");
              exit(1);
          }
 -        switch (vms->gic_version) {
 -        case VIRT_GIC_VERSION_HOST:
 -        case VIRT_GIC_VERSION_MAX:
 -            if (probe_bitmap & KVM_ARM_VGIC_V3) {
 -                vms->gic_version = VIRT_GIC_VERSION_3;
 -            } else {
 -                vms->gic_version = VIRT_GIC_VERSION_2;
 -            }
 -            return;
 -        case VIRT_GIC_VERSION_NOSEL:
 -            if ((probe_bitmap & KVM_ARM_VGIC_V2) && max_cpus <= GIC_NCPU) {
 -                vms->gic_version = VIRT_GIC_VERSION_2;
 -            } else if (probe_bitmap & KVM_ARM_VGIC_V3) {
 -                /*
 -                 * in case the host does not support v2 in-kernel emulation or
 -                 * the end-user requested more than 8 VCPUs we now default
 -                 * to v3. In any case defaulting to v2 would be broken.
 -                 */
 -                vms->gic_version = VIRT_GIC_VERSION_3;
 -            } else if (max_cpus > GIC_NCPU) {
 -                error_report("host only supports in-kernel GICv2 emulation "
 -                             "but more than 8 vcpus are requested");
 -                exit(1);
 -            }
 -            break;
 -        case VIRT_GIC_VERSION_2:
 -        case VIRT_GIC_VERSION_3:
 -            break;
 -        case VIRT_GIC_VERSION_4:
 -            error_report("gic-version=4 is not supported with KVM");
 -            exit(1);
 +        if (probe_bitmap & KVM_ARM_VGIC_V2) {
 +            gics_supported |= VIRT_GIC_VERSION_2_MASK;
          }
 -
 -        /* Check chosen version is effectively supported by the host */
 -        if (vms->gic_version == VIRT_GIC_VERSION_2 &&
 -            !(probe_bitmap & KVM_ARM_VGIC_V2)) {
 -            error_report("host does not support in-kernel GICv2 emulation");
 -            exit(1);
 -        } else if (vms->gic_version == VIRT_GIC_VERSION_3 &&
 -                   !(probe_bitmap & KVM_ARM_VGIC_V3)) {
 -            error_report("host does not support in-kernel GICv3 emulation");
 -            exit(1);
 +        if (probe_bitmap & KVM_ARM_VGIC_V3) {
 +            gics_supported |= VIRT_GIC_VERSION_3_MASK;
          }
 -        return;
 -    }
 -
 -    /* TCG mode */
 -    switch (vms->gic_version) {
 -    case VIRT_GIC_VERSION_NOSEL:
 -        vms->gic_version = VIRT_GIC_VERSION_2;
 -        break;
 -    case VIRT_GIC_VERSION_MAX:
 +    } else if (kvm_enabled() && !kvm_irqchip_in_kernel()) {
 +        /* KVM w/o kernel irqchip can only deal with GICv2 */
 +        gics_supported |= VIRT_GIC_VERSION_2_MASK;
 +        accel_name = "KVM with kernel-irqchip=off";
 +    } else {
 +        gics_supported |= VIRT_GIC_VERSION_2_MASK;
          if (module_object_class_by_name("arm-gicv3")) {
 -            /* CONFIG_ARM_GICV3_TCG was set */
 +            gics_supported |= VIRT_GIC_VERSION_3_MASK;
              if (vms->virt) {
                  /* GICv4 only makes sense if CPU has EL2 */
 -                vms->gic_version = VIRT_GIC_VERSION_4;
 -            } else {
 -                vms->gic_version = VIRT_GIC_VERSION_3;
 +                gics_supported |= VIRT_GIC_VERSION_4_MASK;
              }
 -        } else {
 -            vms->gic_version = VIRT_GIC_VERSION_2;
          }
 -        break;
 -    case VIRT_GIC_VERSION_HOST:
 -        error_report("gic-version=host requires KVM");
 -        exit(1);
 -    case VIRT_GIC_VERSION_4:
 -        if (!vms->virt) {
 -            error_report("gic-version=4 requires virtualization enabled");
 -            exit(1);
 -        }
 -        break;
 -    case VIRT_GIC_VERSION_2:
 -    case VIRT_GIC_VERSION_3:
 -        break;
      }
++
++    /*
++     * Then convert helpers like host/max to concrete GIC versions and ensure
++     * the desired version is supported
++     */
++    vms->gic_version = finalize_gic_version_do(accel_name, vms->gic_version,
++                                               gics_supported, max_cpus);
  }
-+static void write_neon_element64(TCGv_i64 src, int reg, int ele, MemOp memop)
+ /*
 +{
 +    long off = neon_element_offset(reg, ele, memop);
 +
 +    switch (memop) {
 +    case MO_64:
 +        tcg_gen_st_i64(src, cpu_env, off);
 +        break;
 +    default:
 +        g_assert_not_reached();
 +    }
 +}
 +
  static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
  {
      TCGv_ptr ret = tcg_temp_new_ptr();
 diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.c.inc
 +++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_64(DisasContext *s, arg_2reg_shift *a,
      for (pass = 0; pass < a->q + 1; pass++) {
          TCGv_i64 tmp = tcg_temp_new_i64();
 -        neon_load_reg64(tmp, a->vm + pass);
 +        read_neon_element64(tmp, a->vm, pass, MO_64);
          fn(tmp, cpu_env, tmp, constimm);
 -        neon_store_reg64(tmp, a->vd + pass);
 +        write_neon_element64(tmp, a->vd, pass, MO_64);
          tcg_temp_free_i64(tmp);
      }
      tcg_temp_free_i64(constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_64(DisasContext *s, arg_2reg_shift *a,
      rd = tcg_temp_new_i32();
      /* Load both inputs first to avoid potential overwrite if rm == rd */
 -    neon_load_reg64(rm1, a->vm);
 -    neon_load_reg64(rm2, a->vm + 1);
 +    read_neon_element64(rm1, a->vm, 0, MO_64);
 +    read_neon_element64(rm2, a->vm, 1, MO_64);
      shiftfn(rm1, rm1, constimm);
      narrowfn(rd, cpu_env, rm1);
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
          tcg_gen_shli_i64(tmp, tmp, a->shift);
          tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
      }
 -    neon_store_reg64(tmp, a->vd);
 +    write_neon_element64(tmp, a->vd, 0, MO_64);
      widenfn(tmp, rm1);
      tcg_temp_free_i32(rm1);
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
          tcg_gen_shli_i64(tmp, tmp, a->shift);
          tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
      }
 -    neon_store_reg64(tmp, a->vd + 1);
 +    write_neon_element64(tmp, a->vd, 1, MO_64);
      tcg_temp_free_i64(tmp);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
      rm_64 = tcg_temp_new_i64();
      if (src1_wide) {
 -        neon_load_reg64(rn0_64, a->vn);
 +        read_neon_element64(rn0_64, a->vn, 0, MO_64);
      } else {
          TCGv_i32 tmp = tcg_temp_new_i32();
          read_neon_element32(tmp, a->vn, 0, MO_32);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
       * avoid incorrect results if a narrow input overlaps with the result.
       */
      if (src1_wide) {
 -        neon_load_reg64(rn1_64, a->vn + 1);
 +        read_neon_element64(rn1_64, a->vn, 1, MO_64);
      } else {
          TCGv_i32 tmp = tcg_temp_new_i32();
          read_neon_element32(tmp, a->vn, 1, MO_32);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
      rm = tcg_temp_new_i32();
      read_neon_element32(rm, a->vm, 1, MO_32);
 -    neon_store_reg64(rn0_64, a->vd);
 +    write_neon_element64(rn0_64, a->vd, 0, MO_64);
      widenfn(rm_64, rm);
      tcg_temp_free_i32(rm);
      opfn(rn1_64, rn1_64, rm_64);
 -    neon_store_reg64(rn1_64, a->vd + 1);
 +    write_neon_element64(rn1_64, a->vd, 1, MO_64);
      tcg_temp_free_i64(rn0_64);
      tcg_temp_free_i64(rn1_64);
@@ -XXX,XX +XXX,XX @@ static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
      rd0 = tcg_temp_new_i32();
      rd1 = tcg_temp_new_i32();
 -    neon_load_reg64(rn_64, a->vn);
 -    neon_load_reg64(rm_64, a->vm);
 +    read_neon_element64(rn_64, a->vn, 0, MO_64);
 +    read_neon_element64(rm_64, a->vm, 0, MO_64);
      opfn(rn_64, rn_64, rm_64);
      narrowfn(rd0, rn_64);
 -    neon_load_reg64(rn_64, a->vn + 1);
 -    neon_load_reg64(rm_64, a->vm + 1);
 +    read_neon_element64(rn_64, a->vn, 1, MO_64);
 +    read_neon_element64(rm_64, a->vm, 1, MO_64);
      opfn(rn_64, rn_64, rm_64);
@@ -XXX,XX +XXX,XX @@ static bool do_long_3d(DisasContext *s, arg_3diff *a,
      /* Don't store results until after all loads: they might overlap */
      if (accfn) {
          tmp = tcg_temp_new_i64();
 -        neon_load_reg64(tmp, a->vd);
 +        read_neon_element64(tmp, a->vd, 0, MO_64);
          accfn(tmp, tmp, rd0);
 -        neon_store_reg64(tmp, a->vd);
 -        neon_load_reg64(tmp, a->vd + 1);
 +        write_neon_element64(tmp, a->vd, 0, MO_64);
 +        read_neon_element64(tmp, a->vd, 1, MO_64);
          accfn(tmp, tmp, rd1);
 -        neon_store_reg64(tmp, a->vd + 1);
 +        write_neon_element64(tmp, a->vd, 1, MO_64);
          tcg_temp_free_i64(tmp);
      } else {
 -        neon_store_reg64(rd0, a->vd);
 -        neon_store_reg64(rd1, a->vd + 1);
 +        write_neon_element64(rd0, a->vd, 0, MO_64);
 +        write_neon_element64(rd1, a->vd, 1, MO_64);
      }
      tcg_temp_free_i64(rd0);
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
      if (accfn) {
          TCGv_i64 t64 = tcg_temp_new_i64();
 -        neon_load_reg64(t64, a->vd);
 +        read_neon_element64(t64, a->vd, 0, MO_64);
          accfn(t64, t64, rn0_64);
 -        neon_store_reg64(t64, a->vd);
 -        neon_load_reg64(t64, a->vd + 1);
 +        write_neon_element64(t64, a->vd, 0, MO_64);
 +        read_neon_element64(t64, a->vd, 1, MO_64);
          accfn(t64, t64, rn1_64);
 -        neon_store_reg64(t64, a->vd + 1);
 +        write_neon_element64(t64, a->vd, 1, MO_64);
          tcg_temp_free_i64(t64);
      } else {
 -        neon_store_reg64(rn0_64, a->vd);
 -        neon_store_reg64(rn1_64, a->vd + 1);
 +        write_neon_element64(rn0_64, a->vd, 0, MO_64);
 +        write_neon_element64(rn1_64, a->vd, 1, MO_64);
      }
      tcg_temp_free_i64(rn0_64);
      tcg_temp_free_i64(rn1_64);
@@ -XXX,XX +XXX,XX @@ static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
          right = tcg_temp_new_i64();
          dest = tcg_temp_new_i64();
 -        neon_load_reg64(right, a->vn);
 -        neon_load_reg64(left, a->vm);
 +        read_neon_element64(right, a->vn, 0, MO_64);
 +        read_neon_element64(left, a->vm, 0, MO_64);
          tcg_gen_extract2_i64(dest, right, left, a->imm * 8);
 -        neon_store_reg64(dest, a->vd);
 +        write_neon_element64(dest, a->vd, 0, MO_64);
          tcg_temp_free_i64(left);
          tcg_temp_free_i64(right);
@@ -XXX,XX +XXX,XX @@ static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
          destright = tcg_temp_new_i64();
          if (a->imm < 8) {
 -            neon_load_reg64(right, a->vn);
 -            neon_load_reg64(middle, a->vn + 1);
 +            read_neon_element64(right, a->vn, 0, MO_64);
 +            read_neon_element64(middle, a->vn, 1, MO_64);
              tcg_gen_extract2_i64(destright, right, middle, a->imm * 8);
 -            neon_load_reg64(left, a->vm);
 +            read_neon_element64(left, a->vm, 0, MO_64);
              tcg_gen_extract2_i64(destleft, middle, left, a->imm * 8);
          } else {
 -            neon_load_reg64(right, a->vn + 1);
 -            neon_load_reg64(middle, a->vm);
 +            read_neon_element64(right, a->vn, 1, MO_64);
 +            read_neon_element64(middle, a->vm, 0, MO_64);
              tcg_gen_extract2_i64(destright, right, middle, (a->imm - 8) * 8);
 -            neon_load_reg64(left, a->vm + 1);
 +            read_neon_element64(left, a->vm, 1, MO_64);
              tcg_gen_extract2_i64(destleft, middle, left, (a->imm - 8) * 8);
          }
 -        neon_store_reg64(destright, a->vd);
 -        neon_store_reg64(destleft, a->vd + 1);
 +        write_neon_element64(destright, a->vd, 0, MO_64);
 +        write_neon_element64(destleft, a->vd, 1, MO_64);
          tcg_temp_free_i64(destright);
          tcg_temp_free_i64(destleft);
@@ -XXX,XX +XXX,XX @@ static bool do_2misc_pairwise(DisasContext *s, arg_2misc *a,
          if (accfn) {
              TCGv_i64 tmp64 = tcg_temp_new_i64();
 -            neon_load_reg64(tmp64, a->vd + pass);
 +            read_neon_element64(tmp64, a->vd, pass, MO_64);
              accfn(rd_64, tmp64, rd_64);
              tcg_temp_free_i64(tmp64);
          }
 -        neon_store_reg64(rd_64, a->vd + pass);
 +        write_neon_element64(rd_64, a->vd, pass, MO_64);
          tcg_temp_free_i64(rd_64);
      }
      return true;
@@ -XXX,XX +XXX,XX @@ static bool do_vmovn(DisasContext *s, arg_2misc *a,
      rd0 = tcg_temp_new_i32();
      rd1 = tcg_temp_new_i32();
 -    neon_load_reg64(rm, a->vm);
 +    read_neon_element64(rm, a->vm, 0, MO_64);
      narrowfn(rd0, cpu_env, rm);
 -    neon_load_reg64(rm, a->vm + 1);
 +    read_neon_element64(rm, a->vm, 1, MO_64);
      narrowfn(rd1, cpu_env, rm);
      write_neon_element32(rd0, a->vd, 0, MO_32);
      write_neon_element32(rd1, a->vd, 1, MO_32);
@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL(DisasContext *s, arg_2misc *a)
      widenfn(rd, rm0);
      tcg_gen_shli_i64(rd, rd, 8 << a->size);
 -    neon_store_reg64(rd, a->vd);
 +    write_neon_element64(rd, a->vd, 0, MO_64);
      widenfn(rd, rm1);
      tcg_gen_shli_i64(rd, rd, 8 << a->size);
 -    neon_store_reg64(rd, a->vd + 1);
 +    write_neon_element64(rd, a->vd, 1, MO_64);
      tcg_temp_free_i64(rd);
      tcg_temp_free_i32(rm0);
@@ -XXX,XX +XXX,XX @@ static bool trans_VSWP(DisasContext *s, arg_2misc *a)
      rm = tcg_temp_new_i64();
      rd = tcg_temp_new_i64();
      for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
 -        neon_load_reg64(rm, a->vm + pass);
 -        neon_load_reg64(rd, a->vd + pass);
 -        neon_store_reg64(rm, a->vd + pass);
 -        neon_store_reg64(rd, a->vm + pass);
 +        read_neon_element64(rm, a->vm, pass, MO_64);
 +        read_neon_element64(rd, a->vd, pass, MO_64);
 +        write_neon_element64(rm, a->vd, pass, MO_64);
 +        write_neon_element64(rd, a->vm, pass, MO_64);
      }
      tcg_temp_free_i64(rm);
      tcg_temp_free_i64(rd);
 --
-.20.1
+.34.1

-[PULL 17/26] hw/arm/smmuv3: Fix potential integer overflow (CID 1432363)
+[PULL 09/33] hw/arm/virt: Make accels in GIC finalize logic explicit
-From: Philippe Mathieu-Daudé <philmd@redhat.com>
+From: Alexander Graf <agraf@csgraf.de>
-Use the BIT_ULL() macro to ensure we use 64-bit arithmetic.
+Let's explicitly list out all accelerators that we support when trying to
-This fixes the following Coverity issue (OVERFLOW_BEFORE_WIDEN):
+determine the supported set of GIC versions. KVM was already separate, so
 the only missing one is HVF which simply reuses all of TCG's emulation
 code and thus has the same compatibility matrix.
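
As a rough illustration of the compatibility matrix described above, the
following is a minimal standalone sketch, not the code in hw/arm/virt.c.
The Accel enum, the GIC_V*_MASK constants and gics_supported_for() are
hypothetical names invented for this example; QEMU itself queries
tcg_enabled(), hvf_enabled() and qtest_enabled() and uses the
VIRT_GIC_VERSION_*_MASK bits. KVM with an in-kernel irqchip is left out
because it depends on probing the host.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the VIRT_GIC_VERSION_*_MASK bits. */
enum {
    GIC_V2_MASK = 1u << 0,
    GIC_V3_MASK = 1u << 1,
    GIC_V4_MASK = 1u << 2,
};

/* Hypothetical accelerator tags, used only in this sketch. */
typedef enum {
    ACCEL_KVM_NO_IRQCHIP,   /* KVM with kernel-irqchip=off */
    ACCEL_TCG,
    ACCEL_HVF,
    ACCEL_QTEST,
    ACCEL_OTHER,            /* anything not explicitly listed */
} Accel;

/*
 * Return the set of GIC versions the accelerator can provide.
 * A result of 0 means "unsupported accelerator"; the caller is
 * expected to report an error instead of guessing.
 */
static unsigned gics_supported_for(Accel accel, bool have_gicv3, bool has_el2)
{
    unsigned mask = 0;

    switch (accel) {
    case ACCEL_KVM_NO_IRQCHIP:
        /* Userspace irqchip under KVM can only deal with GICv2. */
        mask |= GIC_V2_MASK;
        break;
    case ACCEL_TCG:
    case ACCEL_HVF:
    case ACCEL_QTEST:
        /* All three share the emulated GIC models. */
        mask |= GIC_V2_MASK;
        if (have_gicv3) {
            mask |= GIC_V3_MASK;
            if (has_el2) {
                /* GICv4 only makes sense if the CPU has EL2. */
                mask |= GIC_V4_MASK;
            }
        }
        break;
    default:
        /* Explicit fall-through case: leave the mask empty. */
        break;
    }
    return mask;
}
```

Listing every accelerator and leaving the mask empty otherwise means a
newly added accelerator fails loudly rather than silently inheriting the
TCG matrix.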
-  CID 1432363 (#1 of 1): Unintentional integer overflow:
+Signed-off-by: Alexander Graf <agraf@csgraf.de>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-  overflow_before_widen:
+Reviewed-by: Cornelia Huck <cohuck@redhat.com>
-    Potentially overflowing expression 1 << scale with type int
+Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
-    (32 bits, signed) is evaluated using 32-bit arithmetic, and
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-    then used in a context that expects an expression of type
+Message-id: 20221223090107.98888-3-agraf@csgraf.de
-    hwaddr (64 bits, unsigned).
+[PMM: Added qtest to the list of accelerators]
 Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
 Acked-by: Eric Auger <eric.auger@redhat.com>
 Message-id: 20201030144617.1535064-1-philmd@redhat.com
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/arm/smmuv3.c | 3 ++-
+ hw/arm/virt.c | 7 ++++++-
-file changed, 2 insertions(+), 1 deletion(-)
+file changed, 6 insertions(+), 1 deletion(-)
-diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
+diff --git a/hw/arm/virt.c b/hw/arm/virt.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/smmuv3.c
+--- a/hw/arm/virt.c
-+++ b/hw/arm/smmuv3.c
++++ b/hw/arm/virt.c
 @@ -XXX,XX +XXX,XX @@
-  */
+ #include "sysemu/numa.h"
+ #include "sysemu/runstate.h"
- #include "qemu/osdep.h"
+ #include "sysemu/tpm.h"
-+#include "qemu/bitops.h"
++#include "sysemu/tcg.h"
- #include "hw/irq.h"
+ #include "sysemu/kvm.h"
- #include "hw/sysbus.h"
+ #include "sysemu/hvf.h"
- #include "migration/vmstate.h"
++#include "sysemu/qtest.h"
-@@ -XXX,XX +XXX,XX @@ static void smmuv3_s1_range_inval(SMMUState *s, Cmd *cmd)
+ #include "hw/loader.h"
-         scale = CMD_SCALE(cmd);
+ #include "qapi/error.h"
-         num = CMD_NUM(cmd);
+ #include "qemu/bitops.h"
-         ttl = CMD_TTL(cmd);
+@@ -XXX,XX +XXX,XX @@ static void finalize_gic_version(VirtMachineState *vms)
--        num_pages = (num + 1) * (1 << (scale));
+         /* KVM w/o kernel irqchip can only deal with GICv2 */
-+        num_pages = (num + 1) * BIT_ULL(scale);
+         gics_supported |= VIRT_GIC_VERSION_2_MASK;
          accel_name = "KVM with kernel-irqchip=off";
 -    } else {
 +    } else if (tcg_enabled() || hvf_enabled() || qtest_enabled())  {
          gics_supported |= VIRT_GIC_VERSION_2_MASK;
          if (module_object_class_by_name("arm-gicv3")) {
              gics_supported |= VIRT_GIC_VERSION_3_MASK;
@@ -XXX,XX +XXX,XX @@ static void finalize_gic_version(VirtMachineState *vms)
                  gics_supported |= VIRT_GIC_VERSION_4_MASK;
              }
          }
 +    } else {
 +        error_report("Unsupported accelerator, can not determine GIC support");
 +        exit(1);
      }
-     if (type == SMMU_CMD_TLBI_NH_VA) {
+     /*
 --
-.20.1
+.34.1

-[PULL 18/26] hw/arm/boot: fix SVE for EL3 direct kernel boot
+[PULL 10/33] sbsa-ref: remove cortex-a76 from list of supported cpus
-From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
+From: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
-When booting a CPU with EL3 using the -kernel flag, set up CPTR_EL3 so
+Cortex-A76 supports 40bits of address space. sbsa-ref's memory
-that SVE will not trap to EL3.
+starts above this limit.
-Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
+Signed-off-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
 Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030151541.11976-1-remi@remlab.net
+Message-id: 20230126114416.2447685-1-marcin.juszkiewicz@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/arm/boot.c | 3 +++
+ hw/arm/sbsa-ref.c | 1 -
-file changed, 3 insertions(+)
+file changed, 1 deletion(-)
-diff --git a/hw/arm/boot.c b/hw/arm/boot.c
+diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/boot.c
+--- a/hw/arm/sbsa-ref.c
-+++ b/hw/arm/boot.c
++++ b/hw/arm/sbsa-ref.c
-@@ -XXX,XX +XXX,XX @@ static void do_cpu_reset(void *opaque)
+@@ -XXX,XX +XXX,XX @@ static const int sbsa_ref_irqmap[] = {
-                     if (cpu_isar_feature(aa64_mte, cpu)) {
+ static const char * const valid_cpus[] = {
-                         env->cp15.scr_el3 |= SCR_ATA;
+     ARM_CPU_TYPE_NAME("cortex-a57"),
-                     }
+     ARM_CPU_TYPE_NAME("cortex-a72"),
-+                    if (cpu_isar_feature(aa64_sve, cpu)) {
+-    ARM_CPU_TYPE_NAME("cortex-a76"),
-+                        env->cp15.cptr_el[3] |= CPTR_EZ;
+     ARM_CPU_TYPE_NAME("neoverse-n1"),
-+                    }
+     ARM_CPU_TYPE_NAME("max"),
-                     /* AArch64 kernels never boot in secure mode */
+ };
                      assert(!info->secure_boot);
                      /* This hook is only supported for AArch32 currently:
 --
-.20.1
+.34.1

-[PULL 14/26] target/arm: fix handling of HCR.FB
+[PULL 11/33] target/arm: Name AT_S1E1RP and AT_S1E1WP cpregs correctly
-From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
+The encodings 0,0,C7,C9,0 and 0,0,C7,C9,1 are AT S1E1RP and AT
 S1E1WP, but our ARMCPRegInfo definitions for them incorrectly name
 them AT S1E1R and AT S1E1W (which are entirely different
 instructions).  Fix the names.
-HCR should be applied when NS is set, not when it is cleared.
+(This has no guest-visible effect as the names are for debug purposes
 only.)
-Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Tested-by: Fuad Tabba <tabba@google.com>
+Message-id: 20230130182459.3309057-2-peter.maydell@linaro.org
+Message-id: 20230127175507.2895013-2-peter.maydell@linaro.org
 ---
- target/arm/helper.c | 5 ++---
+ target/arm/helper.c | 4 ++--
-file changed, 2 insertions(+), 3 deletions(-)
+file changed, 2 insertions(+), 2 deletions(-)
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ static void tlbimvaa_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo vhe_reginfo[] = {
- /*
+ #ifndef CONFIG_USER_ONLY
-  * Non-IS variants of TLB operations are upgraded to
+ static const ARMCPRegInfo ats1e1_reginfo[] = {
-- * IS versions if we are at NS EL1 and HCR_EL2.FB is set to
+-    { .name = "AT_S1E1R", .state = ARM_CP_STATE_AA64,
-+ * IS versions if we are at EL1 and HCR_EL2.FB is effectively set to
++    { .name = "AT_S1E1RP", .state = ARM_CP_STATE_AA64,
-  * force broadcast of these operations.
+       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 9, .opc2 = 0,
-  */
+       .access = PL1_W, .type = ARM_CP_NO_RAW | ARM_CP_RAISES_EXC,
- static bool tlb_force_broadcast(CPUARMState *env)
+       .writefn = ats_write64 },
- {
+-    { .name = "AT_S1E1W", .state = ARM_CP_STATE_AA64,
--    return (env->cp15.hcr_el2 & HCR_FB) &&
++    { .name = "AT_S1E1WP", .state = ARM_CP_STATE_AA64,
--        arm_current_el(env) == 1 && arm_is_secure_below_el3(env);
+       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 9, .opc2 = 1,
-+    return arm_current_el(env) == 1 && (arm_hcr_el2_eff(env) & HCR_FB);
+       .access = PL1_W, .type = ARM_CP_NO_RAW | ARM_CP_RAISES_EXC,
- }
+       .writefn = ats_write64 },
  static void tlbiall_write(CPUARMState *env, const ARMCPRegInfo *ri,
 --
-.20.1
+.34.1

-[PULL 13/26] target/arm: Fix VUDOT/VSDOT (scalar) on big-endian hosts
+[PULL 12/33] target/arm: Correct syndrome for ATS12NSO* at Secure EL1
-The helper functions for performing the udot/sdot operations against
+The AArch32 ATS12NSO* address translation operations are supposed to
-a scalar were not using an address-swizzling macro when converting
+trap to either EL2 or EL3 if they're executed at Secure EL1 (which
-the index of the scalar element into a pointer into the vm array.
+can only happen if EL3 is AArch64).  We implement this, but we got
-This had no effect on little-endian hosts but meant we generated
+the syndrome value wrong: like other traps to EL2 or EL3 on an
-incorrect results on big-endian hosts.
+AArch32 cpreg access, they should report the 0x3 syndrome, not the
 0x0 'uncategorized' syndrome.  This is clear in the access pseudocode
 for these instructions.
-For these insns, the index is indexing over group of 4 8-bit values,
+Fix the syndrome value for these operations by correcting the
-so 32 bits per indexed entity, and H4() is therefore what we want.
+returned value from the ats_access() function.
 (For Neon the only possible input indexes are 0 and 1.)
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Tested-by: Fuad Tabba <tabba@google.com>
-Message-id: 20201028191712.4910-3-peter.maydell@linaro.org
+Message-id: 20230130182459.3309057-3-peter.maydell@linaro.org
 Message-id: 20230127175507.2895013-3-peter.maydell@linaro.org
 ---
- target/arm/vec_helper.c | 4 ++--
+ target/arm/helper.c | 4 ++--
 file changed, 2 insertions(+), 2 deletions(-)
-diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
+diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/vec_helper.c
+--- a/target/arm/helper.c
-+++ b/target/arm/vec_helper.c
++++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_sdot_idx_b)(void *vd, void *vn, void *vm, uint32_t desc)
+@@ -XXX,XX +XXX,XX @@ static CPAccessResult ats_access(CPUARMState *env, const ARMCPRegInfo *ri,
-     intptr_t index = simd_data(desc);
+         if (arm_current_el(env) == 1) {
-     uint32_t *d = vd;
+             if (arm_is_secure_below_el3(env)) {
-     int8_t *n = vn;
+                 if (env->cp15.scr_el3 & SCR_EEL2) {
--    int8_t *m_indexed = (int8_t *)vm + index * 4;
+-                    return CP_ACCESS_TRAP_UNCATEGORIZED_EL2;
-+    int8_t *m_indexed = (int8_t *)vm + H4(index) * 4;
++                    return CP_ACCESS_TRAP_EL2;
+                 }
-     /* Notice the special case of opr_sz == 8, from aa64/aa32 advsimd.
+-                return CP_ACCESS_TRAP_UNCATEGORIZED_EL3;
-      * Otherwise opr_sz is a multiple of 16.
++                return CP_ACCESS_TRAP_EL3;
-@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_udot_idx_b)(void *vd, void *vn, void *vm, uint32_t desc)
+             }
-     intptr_t index = simd_data(desc);
+             return CP_ACCESS_TRAP_UNCATEGORIZED;
-     uint32_t *d = vd;
+         }
      uint8_t *n = vn;
 -    uint8_t *m_indexed = (uint8_t *)vm + index * 4;
 +    uint8_t *m_indexed = (uint8_t *)vm + H4(index) * 4;
      /* Notice the special case of opr_sz == 8, from aa64/aa32 advsimd.
       * Otherwise opr_sz is a multiple of 16.
 --
-.20.1
+.34.1

-New patch
+[PULL 13/33] target/arm: Remove CP_ACCESS_TRAP_UNCATEGORIZED_{EL2, EL3}
+We added the CPAccessResult values CP_ACCESS_TRAP_UNCATEGORIZED_EL2
+and CP_ACCESS_TRAP_UNCATEGORIZED_EL3 purely in order to use them in
+the ats_access() function, but doing so was incorrect (a bug fixed in
+a previous commit).  There aren't any cases where we want an access
+function to be able to request a trap to EL2 or EL3 with a zero
+syndrome value, so remove these enum values.
+As well as cleaning up dead code, the motivation here is that
+we'd like to implement fine-grained-trap handling in
+helper_access_check_cp_reg(). Although the fine-grained traps
+to EL2 are always lower priority than trap-to-same-EL and
+higher priority than trap-to-EL3, they are in the middle of
+various other kinds of trap-to-EL2. Knowing that a trap-to-EL2
+must always for us have the same syndrome (ie that an access
+function will return CP_ACCESS_TRAP_EL2 and there is no other
+kind of trap-to-EL2 enum value) means we don't have to try
+to choose which of the two syndrome values to report if the
+access would trap to EL2 both for the fine-grained-trap and
+because the access function requires it.
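
The encoding argument above can be sketched in isolation. This is a
simplified model, not the definition in target/arm/cpregs.h: the
AccessResult names and access_target_el() are hypothetical, and the
value used for the plain trap, (1 << 2), is an assumption chosen to
mirror the (2 << 2) shown for the uncategorized case in the diff below.
The low two bits carry an explicit target EL, with 0 meaning "the usual
target EL".

```c
#include <assert.h>

/* Hypothetical mirror of the CPAccessResult layout, for illustration only. */
typedef enum {
    ACCESS_OK = 0,
    ACCESS_EL_MASK = 3,                     /* low bits: explicit target EL */
    ACCESS_TRAP = (1 << 2),                 /* assumed value, see lead-in */
    ACCESS_TRAP_EL2 = ACCESS_TRAP | 2,
    ACCESS_TRAP_EL3 = ACCESS_TRAP | 3,
    ACCESS_TRAP_UNCATEGORIZED = (2 << 2),   /* never combined with an EL */
} AccessResult;

/*
 * Work out which EL a failed access traps to.  'usual_el' is the EL the
 * exception would normally target when no explicit EL is encoded.
 */
static int access_target_el(AccessResult res, int usual_el)
{
    int el = res & ACCESS_EL_MASK;

    if ((res & ~ACCESS_EL_MASK) == ACCESS_TRAP_UNCATEGORIZED) {
        /* Only plain traps may name a target EL directly. */
        assert(el == 0);
    }
    return el ? el : usual_el;
}
```

With only one kind of trap-to-EL2 value left, a caller that also wants
to raise a fine-grained trap to EL2 never has to pick between two
syndromes for the same target EL.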
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Tested-by: Fuad Tabba <tabba@google.com>
+Message-id: 20230130182459.3309057-4-peter.maydell@linaro.org
+Message-id: 20230127175507.2895013-4-peter.maydell@linaro.org
+---
+ target/arm/cpregs.h    | 4 ++--
+ target/arm/op_helper.c | 2 ++
+files changed, 4 insertions(+), 2 deletions(-)
+diff --git a/target/arm/cpregs.h b/target/arm/cpregs.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpregs.h
++++ b/target/arm/cpregs.h
+@@ -XXX,XX +XXX,XX @@ typedef enum CPAccessResult {
+      * Access fails and results in an exception syndrome 0x0 ("uncategorized").
+      * Note that this is not a catch-all case -- the set of cases which may
+      * result in this failure is specifically defined by the architecture.
++     * This trap is always to the usual target EL, never directly to a
++     * specified target EL.
+      */
+     CP_ACCESS_TRAP_UNCATEGORIZED = (2 << 2),
+-    CP_ACCESS_TRAP_UNCATEGORIZED_EL2 = CP_ACCESS_TRAP_UNCATEGORIZED | 2,
+-    CP_ACCESS_TRAP_UNCATEGORIZED_EL3 = CP_ACCESS_TRAP_UNCATEGORIZED | 3,
+ } CPAccessResult;
+ typedef struct ARMCPRegInfo ARMCPRegInfo;
+diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/op_helper.c
++++ b/target/arm/op_helper.c
+@@ -XXX,XX +XXX,XX @@ const void *HELPER(access_check_cp_reg)(CPUARMState *env, uint32_t key,
+     case CP_ACCESS_TRAP:
+         break;
+     case CP_ACCESS_TRAP_UNCATEGORIZED:
++        /* Only CP_ACCESS_TRAP traps are direct to a specified EL */
++        assert((res & CP_ACCESS_EL_MASK) == 0);
+         if (cpu_isar_feature(aa64_ids, cpu) && isread &&
+             arm_cpreg_in_idspace(ri)) {
+             /*
+--
+.34.1

-[PULL 04/26] target/arm: Use neon_element_offset in vfp_reg_offset
+[PULL 14/33] target/arm: Move do_coproc_insn() syndrome calculation earlier
-From: Richard Henderson <richard.henderson@linaro.org>
+Rearrange the code in do_coproc_insn() so that we calculate the
 syndrome value for a potential trap early; we're about to add a
 second check that wants this value earlier than where it is currently
 determined.
-This seems a bit more readable than using offsetof CPU_DoubleU.
+(Specifically, a trap to EL2 because of HSTR_EL2 should take
 priority over an UNDEF to EL1, even when the UNDEF is because
 the register does not exist at all or because its ri->access
 bits non-configurably fail the access. So the check we put in
 for HSTR_EL2 trapping at EL1 (which needs the syndrome) is
 going to have to be done before the check "is the ARMCPRegInfo
 pointer NULL".)
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+This commit is just code motion; the change to HSTR_EL2
-Message-id: 20201030022618.785675-5-richard.henderson@linaro.org
+handling that will use the 'syndrome' variable is in a
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+subsequent commit.
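
The ordering constraint described above can be sketched as a plain
decision function. This is a heavily simplified model under stated
assumptions, not what do_coproc_insn() does: the real code emits TCG
ops, and the Outcome, Decision and decide_el1_access() names, together
with the hstr_traps_access flag, are hypothetical. The point is only
that the syndrome must exist before the "is the register known" test,
because the higher-priority trap needs it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    OUTCOME_ACCESS_OK,
    OUTCOME_TRAP_EL2,   /* trap carrying the precomputed syndrome */
    OUTCOME_UNDEF,      /* register does not exist */
} Outcome;

typedef struct {
    Outcome outcome;
    uint32_t syndrome;  /* only meaningful for OUTCOME_TRAP_EL2 */
} Decision;

/*
 * Decide the result of an EL1 coprocessor access.
 * 'reginfo' is NULL when the register is unknown;
 * 'trap_syndrome' stands in for the value computed up front.
 */
static Decision decide_el1_access(const void *reginfo,
                                  bool hstr_traps_access,
                                  uint32_t trap_syndrome)
{
    /* Step 1: the syndrome is available before any early exit. */
    uint32_t syndrome = trap_syndrome;

    /* Step 2: the trap to EL2 outranks an UNDEF to EL1. */
    if (hstr_traps_access) {
        return (Decision){ OUTCOME_TRAP_EL2, syndrome };
    }

    /* Step 3: only now does a missing register become an UNDEF. */
    if (reginfo == NULL) {
        return (Decision){ OUTCOME_UNDEF, 0 };
    }
    return (Decision){ OUTCOME_ACCESS_OK, 0 };
}
```

If step 3 ran first, an unknown register would UNDEF before the trap
check could see it, which is the priority inversion the later patch in
this series sets out to avoid.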
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Tested-by: Fuad Tabba <tabba@google.com>
+Message-id: 20230130182459.3309057-5-peter.maydell@linaro.org
+Message-id: 20230127175507.2895013-5-peter.maydell@linaro.org
 ---
- target/arm/translate.c | 13 ++++---------
+ target/arm/translate.c | 83 +++++++++++++++++++++---------------------
-file changed, 4 insertions(+), 9 deletions(-)
+file changed, 41 insertions(+), 42 deletions(-)
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static long neon_element_offset(int reg, int element, MemOp size)
+@@ -XXX,XX +XXX,XX @@ static void do_coproc_insn(DisasContext *s, int cpnum, int is64,
-     return neon_full_reg_offset(reg) + ofs;
+     const ARMCPRegInfo *ri = get_arm_cp_reginfo(s->cp_regs, key);
- }
+     TCGv_ptr tcg_ri = NULL;
+     bool need_exit_tb;
--static inline long vfp_reg_offset(bool dp, unsigned reg)
++    uint32_t syndrome;
-+/* Return the offset of a VFP Dreg (dp = true) or VFP Sreg (dp = false). */
++
-+static long vfp_reg_offset(bool dp, unsigned reg)
++    /*
- {
++     * Note that since we are an implementation which takes an
-     if (dp) {
++     * exception on a trapped conditional instruction only if the
--        return offsetof(CPUARMState, vfp.zregs[reg >> 1].d[reg & 1]);
++     * instruction passes its condition code check, we can take
-+        return neon_element_offset(reg, 0, MO_64);
++     * advantage of the clause in the ARM ARM that allows us to set
-     } else {
++     * the COND field in the instruction to 0xE in all cases.
--        long ofs = offsetof(CPUARMState, vfp.zregs[reg >> 2].d[(reg >> 1) & 1]);
++     * We could fish the actual condition out of the insn (ARM)
--        if (reg & 1) {
++     * or the condexec bits (Thumb) but it isn't necessary.
--            ofs += offsetof(CPU_DoubleU, l.upper);
++     */
--        } else {
++    switch (cpnum) {
--            ofs += offsetof(CPU_DoubleU, l.lower);
++    case 14:
 +        if (is64) {
 +            syndrome = syn_cp14_rrt_trap(1, 0xe, opc1, crm, rt, rt2,
 +                                         isread, false);
 +        } else {
 +            syndrome = syn_cp14_rt_trap(1, 0xe, opc1, opc2, crn, crm,
 +                                        rt, isread, false);
 +        }
 +        break;
 +    case 15:
 +        if (is64) {
 +            syndrome = syn_cp15_rrt_trap(1, 0xe, opc1, crm, rt, rt2,
 +                                         isread, false);
 +        } else {
 +            syndrome = syn_cp15_rt_trap(1, 0xe, opc1, opc2, crn, crm,
 +                                        rt, isread, false);
 +        }
 +        break;
 +    default:
 +        /*
 +         * ARMv8 defines that only coprocessors 14 and 15 exist,
 +         * so this can only happen if this is an ARMv7 or earlier CPU,
 +         * in which case the syndrome information won't actually be
 +         * guest visible.
 +         */
 +        assert(!arm_dc_feature(s, ARM_FEATURE_V8));
 +        syndrome = syn_uncategorized();
 +        break;
 +    }
      if (!ri) {
          /*
@@ -XXX,XX +XXX,XX @@ static void do_coproc_insn(DisasContext *s, int cpnum, int is64,
           * Note that on XScale all cp0..c13 registers do an access check
           * call in order to handle c15_cpar.
           */
 -        uint32_t syndrome;
 -
 -        /*
 -         * Note that since we are an implementation which takes an
 -         * exception on a trapped conditional instruction only if the
 -         * instruction passes its condition code check, we can take
 -         * advantage of the clause in the ARM ARM that allows us to set
 -         * the COND field in the instruction to 0xE in all cases.
 -         * We could fish the actual condition out of the insn (ARM)
 -         * or the condexec bits (Thumb) but it isn't necessary.
 -         */
 -        switch (cpnum) {
 -        case 14:
 -            if (is64) {
 -                syndrome = syn_cp14_rrt_trap(1, 0xe, opc1, crm, rt, rt2,
 -                                             isread, false);
 -            } else {
 -                syndrome = syn_cp14_rt_trap(1, 0xe, opc1, opc2, crn, crm,
 -                                            rt, isread, false);
 -            }
 -            break;
 -        case 15:
 -            if (is64) {
 -                syndrome = syn_cp15_rrt_trap(1, 0xe, opc1, crm, rt, rt2,
 -                                             isread, false);
 -            } else {
 -                syndrome = syn_cp15_rt_trap(1, 0xe, opc1, opc2, crn, crm,
 -                                            rt, isread, false);
 -            }
 -            break;
 -        default:
 -            /*
 -             * ARMv8 defines that only coprocessors 14 and 15 exist,
 -             * so this can only happen if this is an ARMv7 or earlier CPU,
 -             * in which case the syndrome information won't actually be
 -             * guest visible.
 -             */
 -            assert(!arm_dc_feature(s, ARM_FEATURE_V8));
 -            syndrome = syn_uncategorized();
 -            break;
 -        }
--        return ofs;
+-
-+        return neon_element_offset(reg >> 1, reg & 1, MO_32);
+         gen_set_condexec(s);
-     }
+         gen_update_pc(s, 0);
- }
+         tcg_ri = tcg_temp_new_ptr();
 --
-.20.1
+.34.1

-[PULL 19/26] hw/display/omap_lcdc: Fix potential NULL pointer dereference
+[PULL 15/33] target/arm: All UNDEF-at-EL0 traps take priority over HSTR_EL2 traps
-From: AlexChen <alex.chen@huawei.com>
+The HSTR_EL2 register has a collection of trap bits which allow
 trapping to EL2 for AArch32 EL0 or EL1 accesses to coprocessor
 registers.  The specification of these bits is that when the bit is
 set we should trap
  * EL1 accesses
  * EL0 accesses, if the access is not UNDEFINED when the
    trap bit is 0
-In omap_lcd_interrupts(), the pointer omap_lcd is dereferenced before
+In other words, all UNDEF traps from EL0 to EL1 take precedence over
-being checked for validity, which may lead to a NULL pointer dereference.
+the HSTR_EL2 trap to EL2.  (Since this is all AArch32, the only kind
-So move the assignment to surface after checking that the omap_lcd is valid
+of trap-to-EL1 is the UNDEF.)
 and move surface_bits_per_pixel(surface) to after the surface assignment.
-Reported-by: Euler Robot <euler.robot@huawei.com>
+Our implementation doesn't quite get this right -- we check for traps
-Signed-off-by: AlexChen <alex.chen@huawei.com>
+in the order:
-Message-id: 5F9CDB8A.9000001@huawei.com
+ * no such register
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+ * ARMCPRegInfo::access bits
  * HSTR_EL2 trap bits
  * ARMCPRegInfo::accessfn
 So UNDEFs that happen because of the access bits or because the
 register doesn't exist at all correctly take priority over the
 HSTR_EL2 trap, but where a register can UNDEF at EL0 because of the
 accessfn we are incorrectly always taking the HSTR_EL2 trap.  There
 aren't many of these, but one example is the PMCR; if you look at the
 access pseudocode for this register you can see that UNDEFs taken
 because of the value of PMUSERENR.EN are checked before the HSTR_EL2
 bit.
 Rearrange helper_access_check_cp_reg() so that we always call the
 accessfn, and use its return value if it indicates that the access
 traps from EL0 to EL1 rather than continuing to do the HSTR_EL2 check.
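The reordered priority described above can be sketched as a small standalone C model. This is only an illustration under stated assumptions: the AccessOutcome enum and check_order() are hypothetical names, not QEMU's CPAccessResult or helper_access_check_cp_reg(), and the model ignores traps to EL3.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome names; these are not QEMU's CPAccessResult values. */
typedef enum {
    ACCESS_OK,
    ACCESS_UNDEF_TO_EL1,
    ACCESS_TRAP_TO_EL2,
} AccessOutcome;

/*
 * accessfn_result: what the register's access function reported
 *                  (ACCESS_OK if the register has no access function).
 * hstr_bit_set:    whether the relevant HSTR_EL2 trap bit is set.
 */
AccessOutcome check_order(int current_el, AccessOutcome accessfn_result,
                          bool hstr_bit_set)
{
    assert(current_el == 0 || current_el == 1);

    /* An UNDEF taken from EL0 to EL1 always beats the HSTR_EL2 trap. */
    if (current_el == 0 && accessfn_result == ACCESS_UNDEF_TO_EL1) {
        return ACCESS_UNDEF_TO_EL1;
    }
    /* Otherwise the HSTR_EL2 trap bit is consulted next. */
    if (hstr_bit_set) {
        return ACCESS_TRAP_TO_EL2;
    }
    /* Finally fall back to whatever the access function reported. */
    return accessfn_result;
}
```

In this model the access function is always evaluated first, but its result only short-circuits the HSTR_EL2 check for an EL0 access that would UNDEF to EL1.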
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Tested-by: Fuad Tabba <tabba@google.com>
+Message-id: 20230130182459.3309057-6-peter.maydell@linaro.org
+Message-id: 20230127175507.2895013-6-peter.maydell@linaro.org
 ---
- hw/display/omap_lcdc.c | 10 +++++++---
+ target/arm/op_helper.c | 21 ++++++++++++++++-----
-file changed, 7 insertions(+), 3 deletions(-)
+file changed, 16 insertions(+), 5 deletions(-)
-diff --git a/hw/display/omap_lcdc.c b/hw/display/omap_lcdc.c
+diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/display/omap_lcdc.c
+--- a/target/arm/op_helper.c
-+++ b/hw/display/omap_lcdc.c
++++ b/target/arm/op_helper.c
-@@ -XXX,XX +XXX,XX @@ static void omap_lcd_interrupts(struct omap_lcd_panel_s *s)
+@@ -XXX,XX +XXX,XX @@ const void *HELPER(access_check_cp_reg)(CPUARMState *env, uint32_t key,
- static void omap_update_display(void *opaque)
+         goto fail;
- {
+     }
-     struct omap_lcd_panel_s *omap_lcd = (struct omap_lcd_panel_s *) opaque;
--    DisplaySurface *surface = qemu_console_surface(omap_lcd->con);
++    if (ri->accessfn) {
-+    DisplaySurface *surface;
++        res = ri->accessfn(env, ri, isread);
      draw_line_func draw_line;
      int size, height, first, last;
      int width, linesize, step, bpp, frame_offset;
      hwaddr frame_base;
 -    if (!omap_lcd || omap_lcd->plm == 1 || !omap_lcd->enable ||
 -        !surface_bits_per_pixel(surface)) {
 +    if (!omap_lcd || omap_lcd->plm == 1 || !omap_lcd->enable) {
 +        return;
 +    }
 +
-+    surface = qemu_console_surface(omap_lcd->con);
+     /*
-+    if (!surface_bits_per_pixel(surface)) {
+-     * Check for an EL2 trap due to HSTR_EL2. We expect EL0 accesses
-         return;
+-     * to sysregs non accessible at EL0 to have UNDEF-ed already.
 +     * If the access function indicates a trap from EL0 to EL1 then
 +     * that always takes priority over the HSTR_EL2 trap. (If it indicates
 +     * a trap to EL3, then the HSTR_EL2 trap takes priority; if it indicates
 +     * a trap to EL2, then the syndrome is the same either way so we don't
 +     * care whether technically the architecture says that HSTR_EL2 trap or
 +     * the other trap takes priority. So we take the "check HSTR_EL2" path
 +     * for all of those cases.)
       */
 +    if (res != CP_ACCESS_OK && ((res & CP_ACCESS_EL_MASK) == 0) &&
 +        arm_current_el(env) == 0) {
 +        goto fail;
 +    }
 +
      if (!is_a64(env) && arm_current_el(env) < 2 && ri->cp == 15 &&
          (arm_hcr_el2_eff(env) & (HCR_E2H | HCR_TGE)) != (HCR_E2H | HCR_TGE)) {
          uint32_t mask = 1 << ri->crn;
@@ -XXX,XX +XXX,XX @@ const void *HELPER(access_check_cp_reg)(CPUARMState *env, uint32_t key,
          }
      }
+-    if (ri->accessfn) {
+-        res = ri->accessfn(env, ri, isread);
+-    }
+     if (likely(res == CP_ACCESS_OK)) {
+         return ri;
+     }
 --
-.20.1
+.34.1

-[PULL 11/26] target/arm: Improve do_prewiden_3d
+[PULL 16/33] target/arm: Make HSTR_EL2 traps take priority over UNDEF-at-EL1
-From: Richard Henderson <richard.henderson@linaro.org>
+The semantics of HSTR_EL2 require that it traps cpreg accesses
 to EL2 for:
  * EL1 accesses
  * EL0 accesses, if the access is not UNDEFINED when the
    trap bit is 0
-We can use proper widening loads to extend 32-bit inputs,
+(You can see this in the I_ZFGJP priority ordering, where HSTR_EL2
-and skip the "widenfn" step.
+traps from EL1 to EL2 are priority 12, UNDEFs are priority 13, and
 HSTR_EL2 traps from EL0 are priority 15.)
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+However, we don't get this right for EL1 accesses which UNDEF because
-Message-id: 20201030022618.785675-12-richard.henderson@linaro.org
+the register doesn't exist at all or because its ri->access bits
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+non-configurably forbid the access.  At EL1, check for the HSTR_EL2
 trap early, before either of these UNDEF reasons.
 We have to retain the HSTR_EL2 check in access_check_cp_reg(),
 because at EL0 any kind of UNDEF-to-EL1 (including "no such
 register", "bad ri->access" and "ri->accessfn returns 'trap to EL1'")
 takes precedence over the trap to EL2.  But we only need to do that
 check for EL0 now.
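A minimal sketch of the bit test that the early EL1 check performs, written as plain C rather than as generated TCG code. The function name hstr_traps_el1_access() is hypothetical; the bit selection (crm for 64-bit accesses, crn otherwise) and the RES0 handling of T4 and T14 follow the translate.c hunk in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if an AArch32 EL1 cp15 access would be trapped to EL2 by
 * HSTR_EL2. Callers are assumed to have checked already that HSTR_EL2
 * trapping is active for the current CPU state.
 */
bool hstr_traps_el1_access(uint32_t hstr_el2, bool is64, int crn, int crm)
{
    /* 64-bit (MCRR/MRRC) accesses are selected by CRm, others by CRn. */
    int maskbit = is64 ? crm : crn;

    assert(maskbit >= 0 && maskbit < 16);

    /* T4 and T14 are RES0, so they never cause traps. */
    if (maskbit == 4 || maskbit == 14) {
        return false;
    }
    return (hstr_el2 & (1u << maskbit)) != 0;
}
```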
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Tested-by: Fuad Tabba <tabba@google.com>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20230130182459.3309057-7-peter.maydell@linaro.org
+Message-id: 20230127175507.2895013-7-peter.maydell@linaro.org
 ---
- target/arm/translate.c          |  6 +++
+ target/arm/op_helper.c |  6 +++++-
- target/arm/translate-neon.c.inc | 66 ++++++++++++++++++---------------
+ target/arm/translate.c | 28 +++++++++++++++++++++++++++-
-files changed, 43 insertions(+), 29 deletions(-)
+files changed, 32 insertions(+), 2 deletions(-)
+diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/op_helper.c
++++ b/target/arm/op_helper.c
+@@ -XXX,XX +XXX,XX @@ const void *HELPER(access_check_cp_reg)(CPUARMState *env, uint32_t key,
+         goto fail;
+     }
+-    if (!is_a64(env) && arm_current_el(env) < 2 && ri->cp == 15 &&
++    /*
++     * HSTR_EL2 traps from EL1 are checked earlier, in generated code;
++     * we only need to check here for traps from EL0.
++     */
++    if (!is_a64(env) && arm_current_el(env) == 0 && ri->cp == 15 &&
+         (arm_hcr_el2_eff(env) & (HCR_E2H | HCR_TGE)) != (HCR_E2H | HCR_TGE)) {
+         uint32_t mask = 1 << ri->crn;
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static void read_neon_element64(TCGv_i64 dest, int reg, int ele, MemOp memop)
+@@ -XXX,XX +XXX,XX @@ static void do_coproc_insn(DisasContext *s, int cpnum, int is64,
      long off = neon_element_offset(reg, ele, memop);
      switch (memop) {
 +    case MO_SL:
 +        tcg_gen_ld32s_i64(dest, cpu_env, off);
 +        break;
 +    case MO_UL:
 +        tcg_gen_ld32u_i64(dest, cpu_env, off);
 +        break;
      case MO_Q:
          tcg_gen_ld_i64(dest, cpu_env, off);
          break;
-diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.c.inc
-+++ b/target/arm/translate-neon.c.inc
-@@ -XXX,XX +XXX,XX @@ static bool trans_Vimm_1r(DisasContext *s, arg_1reg_imm *a)
- static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
-                            NeonGenWidenFn *widenfn,
-                            NeonGenTwo64OpFn *opfn,
--                           bool src1_wide)
-+                           int src1_mop, int src2_mop)
- {
-     /* 3-regs different lengths, prewidening case (VADDL/VSUBL/VADDW/VSUBW) */
-     TCGv_i64 rn0_64, rn1_64, rm_64;
--    TCGv_i32 rm;
-     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
-         return false;
-@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
-         return false;
      }
--    if (!widenfn || !opfn) {
++    if (s->hstr_active && cpnum == 15 && s->current_el == 1) {
-+    if (!opfn) {
++        /*
-         /* size == 3 case, which is an entirely different insn group */
++         * At EL1, check for a HSTR_EL2 trap, which must take precedence
-         return false;
++         * over the UNDEF for "no such register" or the UNDEF for "access
 +         * permissions forbid this EL1 access". HSTR_EL2 traps from EL0
 +         * only happen if the cpreg doesn't UNDEF at EL0, so we do those in
 +         * access_check_cp_reg(), after the checks for whether the access
 +         * configurably trapped to EL1.
 +         */
 +        uint32_t maskbit = is64 ? crm : crn;
 +
 +        if (maskbit != 4 && maskbit != 14) {
 +            /* T4 and T14 are RES0 so never cause traps */
 +            TCGv_i32 t;
 +            DisasLabel over = gen_disas_label(s);
 +
 +            t = load_cpu_offset(offsetoflow32(CPUARMState, cp15.hstr_el2));
 +            tcg_gen_andi_i32(t, t, 1u << maskbit);
 +            tcg_gen_brcondi_i32(TCG_COND_EQ, t, 0, over.label);
 +            tcg_temp_free_i32(t);
 +
 +            gen_exception_insn(s, 0, EXCP_UDEF, syndrome);
 +            set_disas_label(s, over);
 +        }
 +    }
 +
      if (!ri) {
          /*
           * Unknown register; this might be a guest error or a QEMU
@@ -XXX,XX +XXX,XX @@ static void do_coproc_insn(DisasContext *s, int cpnum, int is64,
          return;
      }
--    if ((a->vd & 1) || (src1_wide && (a->vn & 1))) {
+-    if (s->hstr_active || ri->accessfn ||
-+    if ((a->vd & 1) || (src1_mop == MO_Q && (a->vn & 1))) {
++    if ((s->hstr_active && s->current_el == 0) || ri->accessfn ||
-         return false;
+         (arm_dc_feature(s, ARM_FEATURE_XSCALE) && cpnum < 14)) {
-     }
+         /*
+          * Emit code to perform further access permissions checks at
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
      rn1_64 = tcg_temp_new_i64();
      rm_64 = tcg_temp_new_i64();
 -    if (src1_wide) {
 -        read_neon_element64(rn0_64, a->vn, 0, MO_64);
 +    if (src1_mop >= 0) {
 +        read_neon_element64(rn0_64, a->vn, 0, src1_mop);
      } else {
          TCGv_i32 tmp = tcg_temp_new_i32();
          read_neon_element32(tmp, a->vn, 0, MO_32);
          widenfn(rn0_64, tmp);
          tcg_temp_free_i32(tmp);
      }
 -    rm = tcg_temp_new_i32();
 -    read_neon_element32(rm, a->vm, 0, MO_32);
 +    if (src2_mop >= 0) {
 +        read_neon_element64(rm_64, a->vm, 0, src2_mop);
 +    } else {
 +        TCGv_i32 tmp = tcg_temp_new_i32();
 +        read_neon_element32(tmp, a->vm, 0, MO_32);
 +        widenfn(rm_64, tmp);
 +        tcg_temp_free_i32(tmp);
 +    }
 -    widenfn(rm_64, rm);
 -    tcg_temp_free_i32(rm);
      opfn(rn0_64, rn0_64, rm_64);
      /*
       * Load second pass inputs before storing the first pass result, to
       * avoid incorrect results if a narrow input overlaps with the result.
       */
 -    if (src1_wide) {
 -        read_neon_element64(rn1_64, a->vn, 1, MO_64);
 +    if (src1_mop >= 0) {
 +        read_neon_element64(rn1_64, a->vn, 1, src1_mop);
      } else {
          TCGv_i32 tmp = tcg_temp_new_i32();
          read_neon_element32(tmp, a->vn, 1, MO_32);
          widenfn(rn1_64, tmp);
          tcg_temp_free_i32(tmp);
      }
 -    rm = tcg_temp_new_i32();
 -    read_neon_element32(rm, a->vm, 1, MO_32);
 +    if (src2_mop >= 0) {
 +        read_neon_element64(rm_64, a->vm, 1, src2_mop);
 +    } else {
 +        TCGv_i32 tmp = tcg_temp_new_i32();
 +        read_neon_element32(tmp, a->vm, 1, MO_32);
 +        widenfn(rm_64, tmp);
 +        tcg_temp_free_i32(tmp);
 +    }
      write_neon_element64(rn0_64, a->vd, 0, MO_64);
 -    widenfn(rm_64, rm);
 -    tcg_temp_free_i32(rm);
      opfn(rn1_64, rn1_64, rm_64);
      write_neon_element64(rn1_64, a->vd, 1, MO_64);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
      return true;
  }
 -#define DO_PREWIDEN(INSN, S, EXT, OP, SRC1WIDE)                         \
 +#define DO_PREWIDEN(INSN, S, OP, SRC1WIDE, SIGN)                        \
      static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a)        \
      {                                                                   \
          static NeonGenWidenFn * const widenfn[] = {                     \
              gen_helper_neon_widen_##S##8,                               \
              gen_helper_neon_widen_##S##16,                              \
 -            tcg_gen_##EXT##_i32_i64,                                    \
 -            NULL,                                                       \
 +            NULL, NULL,                                                 \
          };                                                              \
          static NeonGenTwo64OpFn * const addfn[] = {                     \
              gen_helper_neon_##OP##l_u16,                                \
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
              tcg_gen_##OP##_i64,                                         \
              NULL,                                                       \
          };                                                              \
 -        return do_prewiden_3d(s, a, widenfn[a->size],                   \
 -                              addfn[a->size], SRC1WIDE);                \
 +        int narrow_mop = a->size == MO_32 ? MO_32 | SIGN : -1;          \
 +        return do_prewiden_3d(s, a, widenfn[a->size], addfn[a->size],   \
 +                              SRC1WIDE ? MO_Q : narrow_mop,             \
 +                              narrow_mop);                              \
      }
 -DO_PREWIDEN(VADDL_S, s, ext, add, false)
 -DO_PREWIDEN(VADDL_U, u, extu, add, false)
 -DO_PREWIDEN(VSUBL_S, s, ext, sub, false)
 -DO_PREWIDEN(VSUBL_U, u, extu, sub, false)
 -DO_PREWIDEN(VADDW_S, s, ext, add, true)
 -DO_PREWIDEN(VADDW_U, u, extu, add, true)
 -DO_PREWIDEN(VSUBW_S, s, ext, sub, true)
 -DO_PREWIDEN(VSUBW_U, u, extu, sub, true)
 +DO_PREWIDEN(VADDL_S, s, add, false, MO_SIGN)
 +DO_PREWIDEN(VADDL_U, u, add, false, 0)
 +DO_PREWIDEN(VSUBL_S, s, sub, false, MO_SIGN)
 +DO_PREWIDEN(VSUBL_U, u, sub, false, 0)
 +DO_PREWIDEN(VADDW_S, s, add, true, MO_SIGN)
 +DO_PREWIDEN(VADDW_U, u, add, true, 0)
 +DO_PREWIDEN(VSUBW_S, s, sub, true, MO_SIGN)
 +DO_PREWIDEN(VSUBW_U, u, sub, true, 0)
  static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
                           NeonGenTwo64OpFn *opfn, NeonGenNarrowFn *narrowfn)
 --
-.20.1
+.34.1

-[PULL 12/26] target/arm: Fix float16 pairwise Neon ops on big-endian hosts
+[PULL 17/33] target/arm: Disable HSTR_EL2 traps if EL2 is not enabled
-In the neon_padd/pmax/pmin helpers for float16, a cut-and-paste error
+The HSTR_EL2 register is not supposed to have an effect unless EL2 is
-meant we were using the H4() address swizzler macro rather than the
+enabled in the current security state.  We weren't checking for this,
-H2() which is required for 2-byte data.  This had no effect on
+which meant that if the guest set up the HSTR_EL2 register we would
-little-endian hosts but meant we put the result data into the
+incorrectly trap even for accesses from Secure EL0 and EL1.
-destination Dreg in the wrong order on big-endian hosts.
 Add the missing checks. (Other places where we look at HSTR_EL2
 for the not-in-v8A bits TTEE and TJDBX are already checking that
 we are in NS EL0 or EL1, so there we already know EL2 is enabled.)
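The condition with the added check can be sketched as a standalone predicate. This is an illustration only: hstr_traps_active() and its boolean parameters are hypothetical stand-ins for the state that the patch reads via arm_is_el2_enabled() and arm_hcr_el2_eff().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * el:              current exception level (0..3)
 * hstr_el2:        current HSTR_EL2 value
 * el2_enabled:     whether EL2 is enabled in the current security state
 * e2h_and_tge_set: whether HCR_EL2.E2H and HCR_EL2.TGE are both set
 */
bool hstr_traps_active(int el, uint32_t hstr_el2, bool el2_enabled,
                       bool e2h_and_tge_set)
{
    assert(el >= 0 && el <= 3);

    /* Without the el2_enabled term, Secure EL0/EL1 could trap wrongly. */
    return el < 2 && hstr_el2 != 0 && el2_enabled && !e2h_and_tge_set;
}
```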
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Tested-by: Fuad Tabba <tabba@google.com>
-Message-id: 20201028191712.4910-2-peter.maydell@linaro.org
+Message-id: 20230130182459.3309057-8-peter.maydell@linaro.org
 Message-id: 20230127175507.2895013-8-peter.maydell@linaro.org
 ---
- target/arm/vec_helper.c | 8 ++++----
+ target/arm/helper.c    | 2 +-
-file changed, 4 insertions(+), 4 deletions(-)
+ target/arm/op_helper.c | 1 +
 files changed, 2 insertions(+), 1 deletion(-)
-diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
+diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/vec_helper.c
+--- a/target/arm/helper.c
-+++ b/target/arm/vec_helper.c
++++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ DO_ABA(gvec_uaba_d, uint64_t)
+@@ -XXX,XX +XXX,XX @@ static CPUARMTBFlags rebuild_hflags_a32(CPUARMState *env, int fp_el,
-         r2 = float16_##OP(m[H2(0)], m[H2(1)], fpst);                    \
+         DP_TBFLAG_A32(flags, VFPEN, 1);
          r3 = float16_##OP(m[H2(2)], m[H2(3)], fpst);                    \
                                                                          \
 -        d[H4(0)] = r0;                                                  \
 -        d[H4(1)] = r1;                                                  \
 -        d[H4(2)] = r2;                                                  \
 -        d[H4(3)] = r3;                                                  \
 +        d[H2(0)] = r0;                                                  \
 +        d[H2(1)] = r1;                                                  \
 +        d[H2(2)] = r2;                                                  \
 +        d[H2(3)] = r3;                                                  \
      }
- DO_NEON_PAIRWISE(neon_padd, add)
+-    if (el < 2 && env->cp15.hstr_el2 &&
 +    if (el < 2 && env->cp15.hstr_el2 && arm_is_el2_enabled(env) &&
          (arm_hcr_el2_eff(env) & (HCR_E2H | HCR_TGE)) != (HCR_E2H | HCR_TGE)) {
          DP_TBFLAG_A32(flags, HSTR_ACTIVE, 1);
      }
 diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/op_helper.c
 +++ b/target/arm/op_helper.c
@@ -XXX,XX +XXX,XX @@ const void *HELPER(access_check_cp_reg)(CPUARMState *env, uint32_t key,
       * we only need to check here for traps from EL0.
       */
      if (!is_a64(env) && arm_current_el(env) == 0 && ri->cp == 15 &&
 +        arm_is_el2_enabled(env) &&
          (arm_hcr_el2_eff(env) & (HCR_E2H | HCR_TGE)) != (HCR_E2H | HCR_TGE)) {
          uint32_t mask = 1 << ri->crn;
 --
-.20.1
+.34.1

-New patch
+[PULL 18/33] target/arm: Define the FEAT_FGT registers
+Define the system registers which are provided by the
 FEAT_FGT fine-grained trap architectural feature:
  HFGRTR_EL2, HFGWTR_EL2, HDFGRTR_EL2, HDFGWTR_EL2, HFGITR_EL2
 All these registers are a set of bit fields, where each bit is set
 for a trap and clear to not trap on a particular system register
 access.  The R and W register pairs are for system registers,
 allowing trapping to be done separately for reads and writes; the I
 register is for system instructions where trapping is on instruction
 execution.
 The data storage in the CPU state struct is arranged as a set of
 arrays rather than separate fields so that when we're looking up the
 bits for a system register access we can just index into the array
 rather than having to use a switch to select a named struct member.
 The later FEAT_FGT2 will add extra elements to these arrays.
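To illustrate why arrays help here, the following sketch looks up a trap bit by array index and bit position. The FGTREG_* index values are taken from this patch; the FgtState struct, its array sizes (inferred from those indexes) and fgt_read_traps() are hypothetical stand-ins, not the actual CPU state layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Index values as defined in the patch. */
#define FGTREG_HFGRTR 0
#define FGTREG_HDFGRTR 1
#define FGTREG_HFGWTR 0
#define FGTREG_HDFGWTR 1
#define FGTREG_HFGITR 0

/* Hypothetical stand-in for the relevant part of the CPU state. */
typedef struct {
    uint64_t fgt_read[2];   /* HFGRTR_EL2, HDFGRTR_EL2 */
    uint64_t fgt_write[2];  /* HFGWTR_EL2, HDFGWTR_EL2 */
    uint64_t fgt_exec[1];   /* HFGITR_EL2 */
} FgtState;

/* A read traps if the selected bit of the selected register is set. */
bool fgt_read_traps(const FgtState *st, int regidx, int bitpos)
{
    assert(regidx >= 0 && regidx < 2);
    assert(bitpos >= 0 && bitpos < 64);

    /* One indexed load replaces a switch over named struct members. */
    return (st->fgt_read[regidx] >> bitpos) & 1;
}
```

With this layout, per-register metadata only needs to record an (index, bit) pair, and a later extension can grow the arrays without changing the lookup.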
 The field definitions for the new registers are in cpregs.h because
 in practice the code that needs them is code that also needs
 the cpregs information; cpu.h is included in a lot more files.
 We're also going to add some FGT-specific definitions to cpregs.h
 in the next commit.
 We do not implement HAFGRTR_EL2, because we don't implement
 FEAT_AMUv1.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Tested-by: Fuad Tabba <tabba@google.com>
 Message-id: 20230130182459.3309057-9-peter.maydell@linaro.org
 Message-id: 20230127175507.2895013-9-peter.maydell@linaro.org
 ---
  target/arm/cpregs.h | 285 ++++++++++++++++++++++++++++++++++++++++++++
  target/arm/cpu.h    |  15 +++
  target/arm/helper.c |  40 +++++++
 files changed, 340 insertions(+)
 diff --git a/target/arm/cpregs.h b/target/arm/cpregs.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpregs.h
 +++ b/target/arm/cpregs.h
@@ -XXX,XX +XXX,XX @@ typedef enum CPAccessResult {
      CP_ACCESS_TRAP_UNCATEGORIZED = (2 << 2),
  } CPAccessResult;
 +/* Indexes into fgt_read[] */
 +#define FGTREG_HFGRTR 0
 +#define FGTREG_HDFGRTR 1
 +/* Indexes into fgt_write[] */
 +#define FGTREG_HFGWTR 0
 +#define FGTREG_HDFGWTR 1
 +/* Indexes into fgt_exec[] */
 +#define FGTREG_HFGITR 0
 +
 +FIELD(HFGRTR_EL2, AFSR0_EL1, 0, 1)
 +FIELD(HFGRTR_EL2, AFSR1_EL1, 1, 1)
 +FIELD(HFGRTR_EL2, AIDR_EL1, 2, 1)
 +FIELD(HFGRTR_EL2, AMAIR_EL1, 3, 1)
 +FIELD(HFGRTR_EL2, APDAKEY, 4, 1)
 +FIELD(HFGRTR_EL2, APDBKEY, 5, 1)
 +FIELD(HFGRTR_EL2, APGAKEY, 6, 1)
 +FIELD(HFGRTR_EL2, APIAKEY, 7, 1)
 +FIELD(HFGRTR_EL2, APIBKEY, 8, 1)
 +FIELD(HFGRTR_EL2, CCSIDR_EL1, 9, 1)
 +FIELD(HFGRTR_EL2, CLIDR_EL1, 10, 1)
 +FIELD(HFGRTR_EL2, CONTEXTIDR_EL1, 11, 1)
 +FIELD(HFGRTR_EL2, CPACR_EL1, 12, 1)
 +FIELD(HFGRTR_EL2, CSSELR_EL1, 13, 1)
 +FIELD(HFGRTR_EL2, CTR_EL0, 14, 1)
 +FIELD(HFGRTR_EL2, DCZID_EL0, 15, 1)
 +FIELD(HFGRTR_EL2, ESR_EL1, 16, 1)
 +FIELD(HFGRTR_EL2, FAR_EL1, 17, 1)
 +FIELD(HFGRTR_EL2, ISR_EL1, 18, 1)
 +FIELD(HFGRTR_EL2, LORC_EL1, 19, 1)
 +FIELD(HFGRTR_EL2, LOREA_EL1, 20, 1)
 +FIELD(HFGRTR_EL2, LORID_EL1, 21, 1)
 +FIELD(HFGRTR_EL2, LORN_EL1, 22, 1)
 +FIELD(HFGRTR_EL2, LORSA_EL1, 23, 1)
 +FIELD(HFGRTR_EL2, MAIR_EL1, 24, 1)
 +FIELD(HFGRTR_EL2, MIDR_EL1, 25, 1)
 +FIELD(HFGRTR_EL2, MPIDR_EL1, 26, 1)
 +FIELD(HFGRTR_EL2, PAR_EL1, 27, 1)
 +FIELD(HFGRTR_EL2, REVIDR_EL1, 28, 1)
 +FIELD(HFGRTR_EL2, SCTLR_EL1, 29, 1)
 +FIELD(HFGRTR_EL2, SCXTNUM_EL1, 30, 1)
 +FIELD(HFGRTR_EL2, SCXTNUM_EL0, 31, 1)
 +FIELD(HFGRTR_EL2, TCR_EL1, 32, 1)
 +FIELD(HFGRTR_EL2, TPIDR_EL1, 33, 1)
 +FIELD(HFGRTR_EL2, TPIDRRO_EL0, 34, 1)
 +FIELD(HFGRTR_EL2, TPIDR_EL0, 35, 1)
 +FIELD(HFGRTR_EL2, TTBR0_EL1, 36, 1)
 +FIELD(HFGRTR_EL2, TTBR1_EL1, 37, 1)
 +FIELD(HFGRTR_EL2, VBAR_EL1, 38, 1)
 +FIELD(HFGRTR_EL2, ICC_IGRPENN_EL1, 39, 1)
 +FIELD(HFGRTR_EL2, ERRIDR_EL1, 40, 1)
 +FIELD(HFGRTR_EL2, ERRSELR_EL1, 41, 1)
 +FIELD(HFGRTR_EL2, ERXFR_EL1, 42, 1)
 +FIELD(HFGRTR_EL2, ERXCTLR_EL1, 43, 1)
 +FIELD(HFGRTR_EL2, ERXSTATUS_EL1, 44, 1)
 +FIELD(HFGRTR_EL2, ERXMISCN_EL1, 45, 1)
 +FIELD(HFGRTR_EL2, ERXPFGF_EL1, 46, 1)
 +FIELD(HFGRTR_EL2, ERXPFGCTL_EL1, 47, 1)
 +FIELD(HFGRTR_EL2, ERXPFGCDN_EL1, 48, 1)
 +FIELD(HFGRTR_EL2, ERXADDR_EL1, 49, 1)
 +FIELD(HFGRTR_EL2, NACCDATA_EL1, 50, 1)
 +/* 51-53: RES0 */
 +FIELD(HFGRTR_EL2, NSMPRI_EL1, 54, 1)
 +FIELD(HFGRTR_EL2, NTPIDR2_EL0, 55, 1)
 +/* 56-63: RES0 */
 +
 +/* These match HFGRTR but bits for RO registers are RES0 */
 +FIELD(HFGWTR_EL2, AFSR0_EL1, 0, 1)
 +FIELD(HFGWTR_EL2, AFSR1_EL1, 1, 1)
 +FIELD(HFGWTR_EL2, AMAIR_EL1, 3, 1)
 +FIELD(HFGWTR_EL2, APDAKEY, 4, 1)
 +FIELD(HFGWTR_EL2, APDBKEY, 5, 1)
 +FIELD(HFGWTR_EL2, APGAKEY, 6, 1)
 +FIELD(HFGWTR_EL2, APIAKEY, 7, 1)
 +FIELD(HFGWTR_EL2, APIBKEY, 8, 1)
 +FIELD(HFGWTR_EL2, CONTEXTIDR_EL1, 11, 1)
 +FIELD(HFGWTR_EL2, CPACR_EL1, 12, 1)
 +FIELD(HFGWTR_EL2, CSSELR_EL1, 13, 1)
 +FIELD(HFGWTR_EL2, ESR_EL1, 16, 1)
 +FIELD(HFGWTR_EL2, FAR_EL1, 17, 1)
 +FIELD(HFGWTR_EL2, LORC_EL1, 19, 1)
 +FIELD(HFGWTR_EL2, LOREA_EL1, 20, 1)
 +FIELD(HFGWTR_EL2, LORN_EL1, 22, 1)
 +FIELD(HFGWTR_EL2, LORSA_EL1, 23, 1)
 +FIELD(HFGWTR_EL2, MAIR_EL1, 24, 1)
 +FIELD(HFGWTR_EL2, PAR_EL1, 27, 1)
 +FIELD(HFGWTR_EL2, SCTLR_EL1, 29, 1)
 +FIELD(HFGWTR_EL2, SCXTNUM_EL1, 30, 1)
 +FIELD(HFGWTR_EL2, SCXTNUM_EL0, 31, 1)
 +FIELD(HFGWTR_EL2, TCR_EL1, 32, 1)
 +FIELD(HFGWTR_EL2, TPIDR_EL1, 33, 1)
 +FIELD(HFGWTR_EL2, TPIDRRO_EL0, 34, 1)
 +FIELD(HFGWTR_EL2, TPIDR_EL0, 35, 1)
 +FIELD(HFGWTR_EL2, TTBR0_EL1, 36, 1)
 +FIELD(HFGWTR_EL2, TTBR1_EL1, 37, 1)
 +FIELD(HFGWTR_EL2, VBAR_EL1, 38, 1)
 +FIELD(HFGWTR_EL2, ICC_IGRPENN_EL1, 39, 1)
 +FIELD(HFGWTR_EL2, ERRSELR_EL1, 41, 1)
 +FIELD(HFGWTR_EL2, ERXCTLR_EL1, 43, 1)
 +FIELD(HFGWTR_EL2, ERXSTATUS_EL1, 44, 1)
 +FIELD(HFGWTR_EL2, ERXMISCN_EL1, 45, 1)
 +FIELD(HFGWTR_EL2, ERXPFGCTL_EL1, 47, 1)
 +FIELD(HFGWTR_EL2, ERXPFGCDN_EL1, 48, 1)
 +FIELD(HFGWTR_EL2, ERXADDR_EL1, 49, 1)
 +FIELD(HFGWTR_EL2, NACCDATA_EL1, 50, 1)
 +FIELD(HFGWTR_EL2, NSMPRI_EL1, 54, 1)
 +FIELD(HFGWTR_EL2, NTPIDR2_EL0, 55, 1)
 +
 +FIELD(HFGITR_EL2, ICIALLUIS, 0, 1)
 +FIELD(HFGITR_EL2, ICIALLU, 1, 1)
 +FIELD(HFGITR_EL2, ICIVAU, 2, 1)
 +FIELD(HFGITR_EL2, DCIVAC, 3, 1)
 +FIELD(HFGITR_EL2, DCISW, 4, 1)
 +FIELD(HFGITR_EL2, DCCSW, 5, 1)
 +FIELD(HFGITR_EL2, DCCISW, 6, 1)
 +FIELD(HFGITR_EL2, DCCVAU, 7, 1)
 +FIELD(HFGITR_EL2, DCCVAP, 8, 1)
 +FIELD(HFGITR_EL2, DCCVADP, 9, 1)
 +FIELD(HFGITR_EL2, DCCIVAC, 10, 1)
 +FIELD(HFGITR_EL2, DCZVA, 11, 1)
 +FIELD(HFGITR_EL2, ATS1E1R, 12, 1)
 +FIELD(HFGITR_EL2, ATS1E1W, 13, 1)
 +FIELD(HFGITR_EL2, ATS1E0R, 14, 1)
 +FIELD(HFGITR_EL2, ATS1E0W, 15, 1)
 +FIELD(HFGITR_EL2, ATS1E1RP, 16, 1)
 +FIELD(HFGITR_EL2, ATS1E1WP, 17, 1)
 +FIELD(HFGITR_EL2, TLBIVMALLE1OS, 18, 1)
 +FIELD(HFGITR_EL2, TLBIVAE1OS, 19, 1)
 +FIELD(HFGITR_EL2, TLBIASIDE1OS, 20, 1)
 +FIELD(HFGITR_EL2, TLBIVAAE1OS, 21, 1)
 +FIELD(HFGITR_EL2, TLBIVALE1OS, 22, 1)
 +FIELD(HFGITR_EL2, TLBIVAALE1OS, 23, 1)
 +FIELD(HFGITR_EL2, TLBIRVAE1OS, 24, 1)
 +FIELD(HFGITR_EL2, TLBIRVAAE1OS, 25, 1)
 +FIELD(HFGITR_EL2, TLBIRVALE1OS, 26, 1)
 +FIELD(HFGITR_EL2, TLBIRVAALE1OS, 27, 1)
 +FIELD(HFGITR_EL2, TLBIVMALLE1IS, 28, 1)
 +FIELD(HFGITR_EL2, TLBIVAE1IS, 29, 1)
 +FIELD(HFGITR_EL2, TLBIASIDE1IS, 30, 1)
 +FIELD(HFGITR_EL2, TLBIVAAE1IS, 31, 1)
 +FIELD(HFGITR_EL2, TLBIVALE1IS, 32, 1)
 +FIELD(HFGITR_EL2, TLBIVAALE1IS, 33, 1)
 +FIELD(HFGITR_EL2, TLBIRVAE1IS, 34, 1)
 +FIELD(HFGITR_EL2, TLBIRVAAE1IS, 35, 1)
 +FIELD(HFGITR_EL2, TLBIRVALE1IS, 36, 1)
 +FIELD(HFGITR_EL2, TLBIRVAALE1IS, 37, 1)
 +FIELD(HFGITR_EL2, TLBIRVAE1, 38, 1)
 +FIELD(HFGITR_EL2, TLBIRVAAE1, 39, 1)
 +FIELD(HFGITR_EL2, TLBIRVALE1, 40, 1)
 +FIELD(HFGITR_EL2, TLBIRVAALE1, 41, 1)
 +FIELD(HFGITR_EL2, TLBIVMALLE1, 42, 1)
 +FIELD(HFGITR_EL2, TLBIVAE1, 43, 1)
 +FIELD(HFGITR_EL2, TLBIASIDE1, 44, 1)
 +FIELD(HFGITR_EL2, TLBIVAAE1, 45, 1)
 +FIELD(HFGITR_EL2, TLBIVALE1, 46, 1)
 +FIELD(HFGITR_EL2, TLBIVAALE1, 47, 1)
 +FIELD(HFGITR_EL2, CFPRCTX, 48, 1)
 +FIELD(HFGITR_EL2, DVPRCTX, 49, 1)
 +FIELD(HFGITR_EL2, CPPRCTX, 50, 1)
 +FIELD(HFGITR_EL2, ERET, 51, 1)
 +FIELD(HFGITR_EL2, SVC_EL0, 52, 1)
 +FIELD(HFGITR_EL2, SVC_EL1, 53, 1)
 +FIELD(HFGITR_EL2, DCCVAC, 54, 1)
 +FIELD(HFGITR_EL2, NBRBINJ, 55, 1)
 +FIELD(HFGITR_EL2, NBRBIALL, 56, 1)
 +
 +FIELD(HDFGRTR_EL2, DBGBCRN_EL1, 0, 1)
 +FIELD(HDFGRTR_EL2, DBGBVRN_EL1, 1, 1)
 +FIELD(HDFGRTR_EL2, DBGWCRN_EL1, 2, 1)
 +FIELD(HDFGRTR_EL2, DBGWVRN_EL1, 3, 1)
 +FIELD(HDFGRTR_EL2, MDSCR_EL1, 4, 1)
 +FIELD(HDFGRTR_EL2, DBGCLAIM, 5, 1)
 +FIELD(HDFGRTR_EL2, DBGAUTHSTATUS_EL1, 6, 1)
 +FIELD(HDFGRTR_EL2, DBGPRCR_EL1, 7, 1)
 +/* 8: RES0: OSLAR_EL1 is WO */
 +FIELD(HDFGRTR_EL2, OSLSR_EL1, 9, 1)
 +FIELD(HDFGRTR_EL2, OSECCR_EL1, 10, 1)
 +FIELD(HDFGRTR_EL2, OSDLR_EL1, 11, 1)
 +FIELD(HDFGRTR_EL2, PMEVCNTRN_EL0, 12, 1)
 +FIELD(HDFGRTR_EL2, PMEVTYPERN_EL0, 13, 1)
 +FIELD(HDFGRTR_EL2, PMCCFILTR_EL0, 14, 1)
 +FIELD(HDFGRTR_EL2, PMCCNTR_EL0, 15, 1)
 +FIELD(HDFGRTR_EL2, PMCNTEN, 16, 1)
 +FIELD(HDFGRTR_EL2, PMINTEN, 17, 1)
 +FIELD(HDFGRTR_EL2, PMOVS, 18, 1)
 +FIELD(HDFGRTR_EL2, PMSELR_EL0, 19, 1)
 +/* 20: RES0: PMSWINC_EL0 is WO */
 +/* 21: RES0: PMCR_EL0 is WO */
 +FIELD(HDFGRTR_EL2, PMMIR_EL1, 22, 1)
 +FIELD(HDFGRTR_EL2, PMBLIMITR_EL1, 23, 1)
 +FIELD(HDFGRTR_EL2, PMBPTR_EL1, 24, 1)
 +FIELD(HDFGRTR_EL2, PMBSR_EL1, 25, 1)
 +FIELD(HDFGRTR_EL2, PMSCR_EL1, 26, 1)
 +FIELD(HDFGRTR_EL2, PMSEVFR_EL1, 27, 1)
 +FIELD(HDFGRTR_EL2, PMSFCR_EL1, 28, 1)
 +FIELD(HDFGRTR_EL2, PMSICR_EL1, 29, 1)
 +FIELD(HDFGRTR_EL2, PMSIDR_EL1, 30, 1)
 +FIELD(HDFGRTR_EL2, PMSIRR_EL1, 31, 1)
 +FIELD(HDFGRTR_EL2, PMSLATFR_EL1, 32, 1)
 +FIELD(HDFGRTR_EL2, TRC, 33, 1)
 +FIELD(HDFGRTR_EL2, TRCAUTHSTATUS, 34, 1)
 +FIELD(HDFGRTR_EL2, TRCAUXCTLR, 35, 1)
 +FIELD(HDFGRTR_EL2, TRCCLAIM, 36, 1)
 +FIELD(HDFGRTR_EL2, TRCCNTVRn, 37, 1)
 +/* 38, 39: RES0 */
 +FIELD(HDFGRTR_EL2, TRCID, 40, 1)
 +FIELD(HDFGRTR_EL2, TRCIMSPECN, 41, 1)
 +/* 42: RES0: TRCOSLAR is WO */
 +FIELD(HDFGRTR_EL2, TRCOSLSR, 43, 1)
 +FIELD(HDFGRTR_EL2, TRCPRGCTLR, 44, 1)
 +FIELD(HDFGRTR_EL2, TRCSEQSTR, 45, 1)
 +FIELD(HDFGRTR_EL2, TRCSSCSRN, 46, 1)
 +FIELD(HDFGRTR_EL2, TRCSTATR, 47, 1)
 +FIELD(HDFGRTR_EL2, TRCVICTLR, 48, 1)
 +/* 49: RES0: TRFCR_EL1 is WO */
 +FIELD(HDFGRTR_EL2, TRBBASER_EL1, 50, 1)
 +FIELD(HDFGRTR_EL2, TRBIDR_EL1, 51, 1)
 +FIELD(HDFGRTR_EL2, TRBLIMITR_EL1, 52, 1)
 +FIELD(HDFGRTR_EL2, TRBMAR_EL1, 53, 1)
 +FIELD(HDFGRTR_EL2, TRBPTR_EL1, 54, 1)
 +FIELD(HDFGRTR_EL2, TRBSR_EL1, 55, 1)
 +FIELD(HDFGRTR_EL2, TRBTRG_EL1, 56, 1)
 +FIELD(HDFGRTR_EL2, PMUSERENR_EL0, 57, 1)
 +FIELD(HDFGRTR_EL2, PMCEIDN_EL0, 58, 1)
 +FIELD(HDFGRTR_EL2, NBRBIDR, 59, 1)
 +FIELD(HDFGRTR_EL2, NBRBCTL, 60, 1)
 +FIELD(HDFGRTR_EL2, NBRBDATA, 61, 1)
 +FIELD(HDFGRTR_EL2, NPMSNEVFR_EL1, 62, 1)
 +FIELD(HDFGRTR_EL2, PMBIDR_EL1, 63, 1)
 +
 +/*
 + * These match HDFGRTR_EL2, but bits for RO registers are RES0.
 + * A few bits are for WO registers, where the HDFGRTR_EL2 bit is RES0.
 + */
 +FIELD(HDFGWTR_EL2, DBGBCRN_EL1, 0, 1)
 +FIELD(HDFGWTR_EL2, DBGBVRN_EL1, 1, 1)
 +FIELD(HDFGWTR_EL2, DBGWCRN_EL1, 2, 1)
 +FIELD(HDFGWTR_EL2, DBGWVRN_EL1, 3, 1)
 +FIELD(HDFGWTR_EL2, MDSCR_EL1, 4, 1)
 +FIELD(HDFGWTR_EL2, DBGCLAIM, 5, 1)
 +FIELD(HDFGWTR_EL2, DBGPRCR_EL1, 7, 1)
 +FIELD(HDFGWTR_EL2, OSLAR_EL1, 8, 1)
 +FIELD(HDFGWTR_EL2, OSLSR_EL1, 9, 1)
 +FIELD(HDFGWTR_EL2, OSECCR_EL1, 10, 1)
 +FIELD(HDFGWTR_EL2, OSDLR_EL1, 11, 1)
 +FIELD(HDFGWTR_EL2, PMEVCNTRN_EL0, 12, 1)
 +FIELD(HDFGWTR_EL2, PMEVTYPERN_EL0, 13, 1)
 +FIELD(HDFGWTR_EL2, PMCCFILTR_EL0, 14, 1)
 +FIELD(HDFGWTR_EL2, PMCCNTR_EL0, 15, 1)
 +FIELD(HDFGWTR_EL2, PMCNTEN, 16, 1)
 +FIELD(HDFGWTR_EL2, PMINTEN, 17, 1)
 +FIELD(HDFGWTR_EL2, PMOVS, 18, 1)
 +FIELD(HDFGWTR_EL2, PMSELR_EL0, 19, 1)
 +FIELD(HDFGWTR_EL2, PMSWINC_EL0, 20, 1)
 +FIELD(HDFGWTR_EL2, PMCR_EL0, 21, 1)
 +FIELD(HDFGWTR_EL2, PMBLIMITR_EL1, 23, 1)
 +FIELD(HDFGWTR_EL2, PMBPTR_EL1, 24, 1)
 +FIELD(HDFGWTR_EL2, PMBSR_EL1, 25, 1)
 +FIELD(HDFGWTR_EL2, PMSCR_EL1, 26, 1)
 +FIELD(HDFGWTR_EL2, PMSEVFR_EL1, 27, 1)
 +FIELD(HDFGWTR_EL2, PMSFCR_EL1, 28, 1)
 +FIELD(HDFGWTR_EL2, PMSICR_EL1, 29, 1)
 +FIELD(HDFGWTR_EL2, PMSIRR_EL1, 31, 1)
 +FIELD(HDFGWTR_EL2, PMSLATFR_EL1, 32, 1)
 +FIELD(HDFGWTR_EL2, TRC, 33, 1)
 +FIELD(HDFGWTR_EL2, TRCAUXCTLR, 35, 1)
 +FIELD(HDFGWTR_EL2, TRCCLAIM, 36, 1)
 +FIELD(HDFGWTR_EL2, TRCCNTVRn, 37, 1)
 +FIELD(HDFGWTR_EL2, TRCIMSPECN, 41, 1)
 +FIELD(HDFGWTR_EL2, TRCOSLAR, 42, 1)
 +FIELD(HDFGWTR_EL2, TRCPRGCTLR, 44, 1)
 +FIELD(HDFGWTR_EL2, TRCSEQSTR, 45, 1)
 +FIELD(HDFGWTR_EL2, TRCSSCSRN, 46, 1)
 +FIELD(HDFGWTR_EL2, TRCVICTLR, 48, 1)
 +FIELD(HDFGWTR_EL2, TRFCR_EL1, 49, 1)
 +FIELD(HDFGWTR_EL2, TRBBASER_EL1, 50, 1)
 +FIELD(HDFGWTR_EL2, TRBLIMITR_EL1, 52, 1)
 +FIELD(HDFGWTR_EL2, TRBMAR_EL1, 53, 1)
 +FIELD(HDFGWTR_EL2, TRBPTR_EL1, 54, 1)
 +FIELD(HDFGWTR_EL2, TRBSR_EL1, 55, 1)
 +FIELD(HDFGWTR_EL2, TRBTRG_EL1, 56, 1)
 +FIELD(HDFGWTR_EL2, PMUSERENR_EL0, 57, 1)
 +FIELD(HDFGWTR_EL2, NBRBCTL, 60, 1)
 +FIELD(HDFGWTR_EL2, NBRBDATA, 61, 1)
 +FIELD(HDFGWTR_EL2, NPMSNEVFR_EL1, 62, 1)
 +
  typedef struct ARMCPRegInfo ARMCPRegInfo;
  /*
 diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.h
 +++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
          uint64_t disr_el1;
          uint64_t vdisr_el2;
          uint64_t vsesr_el2;
 +
 +        /*
 +         * Fine-Grained Trap registers. We store these as arrays so the
 +         * access checking code doesn't have to manually select
 +         * HFGRTR_EL2 vs HDFGRTR_EL2 etc when looking up the bit to test.
 +         * FEAT_FGT2 will add more elements to these arrays.
 +         */
 +        uint64_t fgt_read[2]; /* HFGRTR, HDFGRTR */
 +        uint64_t fgt_write[2]; /* HFGWTR, HDFGWTR */
 +        uint64_t fgt_exec[1]; /* HFGITR */
      } cp15;
      struct {
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_tgran64_2(const ARMISARegisters *id)
      return t >= 2 || (t == 0 && isar_feature_aa64_tgran64(id));
  }
 +static inline bool isar_feature_aa64_fgt(const ARMISARegisters *id)
 +{
 +    return FIELD_EX64(id->id_aa64mmfr0, ID_AA64MMFR0, FGT) != 0;
 +}
 +
  static inline bool isar_feature_aa64_ccidx(const ARMISARegisters *id)
  {
      return FIELD_EX64(id->id_aa64mmfr2, ID_AA64MMFR2, CCIDX) != 0;
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void scr_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
          if (cpu_isar_feature(aa64_hcx, cpu)) {
              valid_mask |= SCR_HXEN;
          }
 +        if (cpu_isar_feature(aa64_fgt, cpu)) {
 +            valid_mask |= SCR_FGTEN;
 +        }
      } else {
          valid_mask &= ~(SCR_RW | SCR_ST);
          if (cpu_isar_feature(aa32_ras, cpu)) {
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo scxtnum_reginfo[] = {
        .access = PL3_RW,
        .fieldoffset = offsetof(CPUARMState, scxtnum_el[3]) },
  };
 +
 +static CPAccessResult access_fgt(CPUARMState *env, const ARMCPRegInfo *ri,
 +                                 bool isread)
 +{
 +    if (arm_current_el(env) == 2 &&
 +        arm_feature(env, ARM_FEATURE_EL3) && !(env->cp15.scr_el3 & SCR_FGTEN)) {
 +        return CP_ACCESS_TRAP_EL3;
 +    }
 +    return CP_ACCESS_OK;
 +}
 +
 +static const ARMCPRegInfo fgt_reginfo[] = {
 +    { .name = "HFGRTR_EL2", .state = ARM_CP_STATE_AA64,
 +      .opc0 = 3, .opc1 = 4, .crn = 1, .crm = 1, .opc2 = 4,
 +      .access = PL2_RW, .accessfn = access_fgt,
 +      .fieldoffset = offsetof(CPUARMState, cp15.fgt_read[FGTREG_HFGRTR]) },
 +    { .name = "HFGWTR_EL2", .state = ARM_CP_STATE_AA64,
 +      .opc0 = 3, .opc1 = 4, .crn = 1, .crm = 1, .opc2 = 5,
 +      .access = PL2_RW, .accessfn = access_fgt,
 +      .fieldoffset = offsetof(CPUARMState, cp15.fgt_write[FGTREG_HFGWTR]) },
 +    { .name = "HDFGRTR_EL2", .state = ARM_CP_STATE_AA64,
 +      .opc0 = 3, .opc1 = 4, .crn = 3, .crm = 1, .opc2 = 4,
 +      .access = PL2_RW, .accessfn = access_fgt,
 +      .fieldoffset = offsetof(CPUARMState, cp15.fgt_read[FGTREG_HDFGRTR]) },
 +    { .name = "HDFGWTR_EL2", .state = ARM_CP_STATE_AA64,
 +      .opc0 = 3, .opc1 = 4, .crn = 3, .crm = 1, .opc2 = 5,
 +      .access = PL2_RW, .accessfn = access_fgt,
 +      .fieldoffset = offsetof(CPUARMState, cp15.fgt_write[FGTREG_HDFGWTR]) },
 +    { .name = "HFGITR_EL2", .state = ARM_CP_STATE_AA64,
 +      .opc0 = 3, .opc1 = 4, .crn = 1, .crm = 1, .opc2 = 6,
 +      .access = PL2_RW, .accessfn = access_fgt,
 +      .fieldoffset = offsetof(CPUARMState, cp15.fgt_exec[FGTREG_HFGITR]) },
 +};
  #endif /* TARGET_AARCH64 */
  static CPAccessResult access_predinv(CPUARMState *env, const ARMCPRegInfo *ri,
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
      if (cpu_isar_feature(aa64_scxtnum, cpu)) {
          define_arm_cp_regs(cpu, scxtnum_reginfo);
      }
 +
 +    if (cpu_isar_feature(aa64_fgt, cpu)) {
 +        define_arm_cp_regs(cpu, fgt_reginfo);
 +    }
  #endif
      if (cpu_isar_feature(any_predinv, cpu)) {
 --
 .34.1
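
Note for readers of the hunk above: the access rule for the five FGT registers themselves can be modelled in isolation. This is a minimal sketch, not QEMU code. The names sk_access_fgt and SkAccessResult, and the choice to pass CPU state as plain arguments, are assumptions made for illustration; only the condition is taken from access_fgt() in the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for QEMU's CPAccessResult values. */
typedef enum {
    SK_ACCESS_OK,
    SK_ACCESS_TRAP_EL3,
} SkAccessResult;

/*
 * Model of the condition in access_fgt(): an EL2 access to an FGT
 * register traps to EL3 when EL3 is implemented and the SCR_EL3
 * FGTEN bit is clear. Every other case is allowed.
 */
static SkAccessResult sk_access_fgt(int current_el, bool have_el3,
                                    bool scr_fgten)
{
    if (current_el == 2 && have_el3 && !scr_fgten) {
        return SK_ACCESS_TRAP_EL3;
    }
    return SK_ACCESS_OK;
}
```

With this model, an EL2 access on a CPU without EL3 is always allowed, and an EL3 access is never redirected by this check.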

-[PULL 03/26] target/arm: Use neon_element_offset in neon_load/store_reg
+[PULL 19/33] target/arm: Implement FGT trapping infrastructure
-From: Richard Henderson <richard.henderson@linaro.org>
+Implement the machinery for fine-grained traps on normal sysregs.
+Any sysreg with a fine-grained trap will set the new field to
-These are the only users of neon_reg_offset, so remove that.
+indicate which FGT register bit it should trap on.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+FGT traps only happen when an AArch64 EL2 enables them for
-Message-id: 20201030022618.785675-4-richard.henderson@linaro.org
+an AArch64 EL1. They therefore are only relevant for AArch32
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+cpregs when the cpreg can be accessed from EL0. The logic
 in access_check_cp_reg() will check this, so it is safe to
 add a .fgt marking to an ARM_CP_STATE_BOTH ARMCPRegInfo.
 The DO_BIT and DO_REV_BIT macros define enum constants FGT_##bitname
 which can be used to specify the FGT bit, eg
    .fgt = FGT_AFSR0_EL1
 (We assume that there is no bit name duplication across the FGT
 registers, for brevity's sake.)
 Subsequent commits will add the .fgt fields to the relevant register
 definitions and define the FGT_nnn values for them.
 Note that some of the FGT traps are for instructions that we don't
 handle via the cpregs mechanisms (mostly these are instruction traps).
 Those we will have to handle separately.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Tested-by: Fuad Tabba <tabba@google.com>
+Message-id: 20230130182459.3309057-10-peter.maydell@linaro.org
+Message-id: 20230127175507.2895013-10-peter.maydell@linaro.org
 ---
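
The FGTBit encoding described in the commit message can be sketched standalone as below. This is an illustration under stated assumptions, not the code in the patch: all SK_ and sk_ names are hypothetical, the array index 0 for HFGRTR is inferred from the "HFGRTR, HDFGRTR" ordering comment on fgt_read[], and CONTEXTIDR_EL1 is taken to be bit 11, the last of the bits 0..11 marked up in the following patch. The field positions are copied from the FIELD(FGT, ...) definitions.

```c
#include <stdint.h>

/* Field layout, as in the FIELD(FGT, ...) definitions in this patch. */
#define SK_FGT_BITPOS_SHIFT 0   /* 6 bits: bit position in the uint64_t */
#define SK_FGT_IDX_SHIFT    6   /* 3 bits: index into the register array */
#define SK_FGT_REV_SHIFT    9   /* 1 bit: trap bit has reversed sense */
#define SK_FGT_TYPE_SHIFT   10  /* 3 bits: R / W / EXEC */

#define SK_FGT_R    (1u << SK_FGT_TYPE_SHIFT)
#define SK_FGT_W    (2u << SK_FGT_TYPE_SHIFT)
#define SK_FGT_EXEC (4u << SK_FGT_TYPE_SHIFT)
#define SK_FGT_RW   (SK_FGT_R | SK_FGT_W)
#define SK_FGT_REV  (1u << SK_FGT_REV_SHIFT)

/* Assumption: HFGRTR/HFGWTR occupy element 0 of fgt_read[]/fgt_write[]. */
#define SK_FGTREG_HFGRTR 0u

/* Register selector, then one bit within it (what DO_BIT would produce). */
#define SK_FGT_HFGRTR (SK_FGT_RW | (SK_FGTREG_HFGRTR << SK_FGT_IDX_SHIFT))
#define SK_FGT_CONTEXTIDR_EL1 (SK_FGT_HFGRTR | 11u)

/* Decoders for the four fields packed into one value. */
static unsigned sk_fgt_bitpos(uint32_t fgt)
{
    return (fgt >> SK_FGT_BITPOS_SHIFT) & 0x3f;
}

static unsigned sk_fgt_idx(uint32_t fgt)
{
    return (fgt >> SK_FGT_IDX_SHIFT) & 0x7;
}

static unsigned sk_fgt_rev(uint32_t fgt)
{
    return (fgt >> SK_FGT_REV_SHIFT) & 0x1;
}

static unsigned sk_fgt_type(uint32_t fgt)
{
    return (fgt >> SK_FGT_TYPE_SHIFT) & 0x7;
}
```

Because the TYPE field is nonzero for every real trap bit, a value of 0 decodes as "no trap", which is what lets the translator test ri->fgt directly.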
- target/arm/translate.c | 14 ++------------
+ target/arm/cpregs.h        | 72 ++++++++++++++++++++++++++++++++++++++
-file changed, 2 insertions(+), 12 deletions(-)
+ target/arm/cpu.h           |  1 +
+ target/arm/internals.h     | 20 +++++++++++
  target/arm/translate.h     |  2 ++
  target/arm/helper.c        |  9 +++++
  target/arm/op_helper.c     | 30 ++++++++++++++++
  target/arm/translate-a64.c |  3 +-
  target/arm/translate.c     |  2 ++
 files changed, 138 insertions(+), 1 deletion(-)
 diff --git a/target/arm/cpregs.h b/target/arm/cpregs.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpregs.h
 +++ b/target/arm/cpregs.h
@@ -XXX,XX +XXX,XX @@ FIELD(HDFGWTR_EL2, NBRBCTL, 60, 1)
  FIELD(HDFGWTR_EL2, NBRBDATA, 61, 1)
  FIELD(HDFGWTR_EL2, NPMSNEVFR_EL1, 62, 1)
 +/* Which fine-grained trap bit register to check, if any */
 +FIELD(FGT, TYPE, 10, 3)
 +FIELD(FGT, REV, 9, 1) /* Is bit sense reversed? */
 +FIELD(FGT, IDX, 6, 3) /* Index within a uint64_t[] array */
 +FIELD(FGT, BITPOS, 0, 6) /* Bit position within the uint64_t */
 +
 +/*
 + * Macros to define FGT_##bitname enum constants to use in ARMCPRegInfo::fgt
 + * fields. We assume for brevity's sake that there are no duplicated
 + * bit names across the various FGT registers.
 + */
 +#define DO_BIT(REG, BITNAME)                                    \
 +    FGT_##BITNAME = FGT_##REG | R_##REG##_EL2_##BITNAME##_SHIFT
 +
 +/* Some bits have reversed sense, so 0 means trap and 1 means not */
 +#define DO_REV_BIT(REG, BITNAME)                                        \
 +    FGT_##BITNAME = FGT_##REG | FGT_REV | R_##REG##_EL2_##BITNAME##_SHIFT
 +
 +typedef enum FGTBit {
 +    /*
 +     * These bits tell us which register arrays to use:
 +     * if FGT_R is set then reads are checked against fgt_read[];
 +     * if FGT_W is set then writes are checked against fgt_write[];
 +     * if FGT_EXEC is set then all accesses are checked against fgt_exec[].
 +     *
 +     * For almost all bits in the R/W register pairs, the bit exists in
 +     * both registers for a RW register, in HFGRTR/HDFGRTR for a RO register
 +     * with the corresponding HFGWTR/HDFGWTR bit being RES0, and vice-versa
 +     * for a WO register. There are unfortunately a couple of exceptions
 +     * (PMCR_EL0, TRFCR_EL1) where the register being trapped is RW but
 +     * the FGT system only allows trapping of writes, not reads.
 +     *
 +     * Note that we arrange these bits so that a 0 FGTBit means "no trap".
 +     */
 +    FGT_R = 1 << R_FGT_TYPE_SHIFT,
 +    FGT_W = 2 << R_FGT_TYPE_SHIFT,
 +    FGT_EXEC = 4 << R_FGT_TYPE_SHIFT,
 +    FGT_RW = FGT_R | FGT_W,
 +    /* Bit to identify whether trap bit is reversed sense */
 +    FGT_REV = R_FGT_REV_MASK,
 +
 +    /*
 +     * If a bit exists in HFGRTR/HDFGRTR then either the register being
 +     * trapped is RO or the bit also exists in HFGWTR/HDFGWTR, so we either
 +     * want to trap for both reads and writes or else it's harmless to mark
 +     * it as trap-on-writes.
 +     * If a bit exists only in HFGWTR/HDFGWTR then either the register being
 +     * trapped is WO, or else it is one of the two oddball special cases
 +     * which are RW but have only a write trap. We mark these as only
 +     * FGT_W so we get the right behaviour for those special cases.
 +     * (If a bit was added in future that provided only a read trap for an
 +     * RW register we'd need to do something special to get the FGT_R bit
 +     * only. But this seems unlikely to happen.)
 +     *
 +     * So for the DO_BIT/DO_REV_BIT macros: use FGT_HFGRTR/FGT_HDFGRTR if
 +     * the bit exists in that register. Otherwise use FGT_HFGWTR/FGT_HDFGWTR.
 +     */
 +    FGT_HFGRTR = FGT_RW | (FGTREG_HFGRTR << R_FGT_IDX_SHIFT),
 +    FGT_HFGWTR = FGT_W | (FGTREG_HFGWTR << R_FGT_IDX_SHIFT),
 +    FGT_HDFGRTR = FGT_RW | (FGTREG_HDFGRTR << R_FGT_IDX_SHIFT),
 +    FGT_HDFGWTR = FGT_W | (FGTREG_HDFGWTR << R_FGT_IDX_SHIFT),
 +    FGT_HFGITR = FGT_EXEC | (FGTREG_HFGITR << R_FGT_IDX_SHIFT),
 +} FGTBit;
 +
 +#undef DO_BIT
 +#undef DO_REV_BIT
 +
  typedef struct ARMCPRegInfo ARMCPRegInfo;
  /*
@@ -XXX,XX +XXX,XX @@ struct ARMCPRegInfo {
      CPAccessRights access;
      /* Security state: ARM_CP_SECSTATE_* bits/values */
      CPSecureState secure;
 +    /*
 +     * Which fine-grained trap register bit to check, if any. This
 +     * value encodes both the trap register and bit within it.
 +     */
 +    FGTBit fgt;
      /*
       * The opaque pointer passed to define_arm_cp_regs_with_opaque() when
       * this register was defined: can be used to hand data through to the
 diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.h
 +++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_ANY, FPEXC_EL, 8, 2)
  /* Memory operations require alignment: SCTLR_ELx.A or CCR.UNALIGN_TRP */
  FIELD(TBFLAG_ANY, ALIGN_MEM, 10, 1)
  FIELD(TBFLAG_ANY, PSTATE__IL, 11, 1)
 +FIELD(TBFLAG_ANY, FGT_ACTIVE, 12, 1)
  /*
   * Bit usage when in AArch32 state, both A- and M-profile.
 diff --git a/target/arm/internals.h b/target/arm/internals.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/internals.h
 +++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ static inline uint64_t arm_mdcr_el2_eff(CPUARMState *env)
      ((1 << (1 - 1)) | (1 << (2 - 1)) |                  \
       (1 << (4 - 1)) | (1 << (8 - 1)) | (1 << (16 - 1)))
 +/*
 + * Return true if it is possible to take a fine-grained-trap to EL2.
 + */
 +static inline bool arm_fgt_active(CPUARMState *env, int el)
 +{
 +    /*
 +     * The Arm ARM only requires the "{E2H,TGE} != {1,1}" test for traps
 +     * that can affect EL0, but it is harmless to do the test also for
 +     * traps on registers that are only accessible at EL1 because if the test
 +     * returns true then we can't be executing at EL1 anyway.
 +     * FGT traps only happen when EL2 is enabled and EL1 is AArch64;
 +     * traps from AArch32 only happen when EL0 is AArch32.
 +     */
 +    return cpu_isar_feature(aa64_fgt, env_archcpu(env)) &&
 +        el < 2 && arm_is_el2_enabled(env) &&
 +        arm_el_is_aa64(env, 1) &&
 +        (arm_hcr_el2_eff(env) & (HCR_E2H | HCR_TGE)) != (HCR_E2H | HCR_TGE) &&
 +        (!arm_feature(env, ARM_FEATURE_EL3) || (env->cp15.scr_el3 & SCR_FGTEN));
 +}
 +
  #endif
 diff --git a/target/arm/translate.h b/target/arm/translate.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.h
 +++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ typedef struct DisasContext {
      bool is_nonstreaming;
      /* True if MVE insns are definitely not predicated by VPR or LTPSIZE */
      bool mve_no_pred;
 +    /* True if fine-grained traps are active */
 +    bool fgt_active;
      /*
       * >= 0, a copy of PSTATE.BTYPE, which will be 0 without v8.5-BTI.
       *  < 0, set by the current instruction.
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static CPUARMTBFlags rebuild_hflags_common(CPUARMState *env, int fp_el,
      if (arm_singlestep_active(env)) {
          DP_TBFLAG_ANY(flags, SS_ACTIVE, 1);
      }
 +
      return flags;
  }
@@ -XXX,XX +XXX,XX @@ static CPUARMTBFlags rebuild_hflags_a32(CPUARMState *env, int fp_el,
          DP_TBFLAG_A32(flags, HSTR_ACTIVE, 1);
      }
 +    if (arm_fgt_active(env, el)) {
 +        DP_TBFLAG_ANY(flags, FGT_ACTIVE, 1);
 +    }
 +
      if (env->uncached_cpsr & CPSR_IL) {
          DP_TBFLAG_ANY(flags, PSTATE__IL, 1);
      }
@@ -XXX,XX +XXX,XX @@ static CPUARMTBFlags rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
          DP_TBFLAG_ANY(flags, PSTATE__IL, 1);
      }
 +    if (arm_fgt_active(env, el)) {
 +        DP_TBFLAG_ANY(flags, FGT_ACTIVE, 1);
 +    }
 +
      if (cpu_isar_feature(aa64_mte, env_archcpu(env))) {
          /*
           * Set MTE_ACTIVE if any access may be Checked, and leave clear
 diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/op_helper.c
 +++ b/target/arm/op_helper.c
@@ -XXX,XX +XXX,XX @@ const void *HELPER(access_check_cp_reg)(CPUARMState *env, uint32_t key,
          }
      }
 +    /*
 +     * Fine-grained traps also are lower priority than undef-to-EL1,
 +     * higher priority than trap-to-EL3, and we don't care about priority
 +     * order with other EL2 traps because the syndrome value is the same.
 +     */
 +    if (arm_fgt_active(env, arm_current_el(env))) {
 +        uint64_t trapword = 0;
 +        unsigned int idx = FIELD_EX32(ri->fgt, FGT, IDX);
 +        unsigned int bitpos = FIELD_EX32(ri->fgt, FGT, BITPOS);
 +        bool rev = FIELD_EX32(ri->fgt, FGT, REV);
 +        bool trapbit;
 +
 +        if (ri->fgt & FGT_EXEC) {
 +            assert(idx < ARRAY_SIZE(env->cp15.fgt_exec));
 +            trapword = env->cp15.fgt_exec[idx];
 +        } else if (isread && (ri->fgt & FGT_R)) {
 +            assert(idx < ARRAY_SIZE(env->cp15.fgt_read));
 +            trapword = env->cp15.fgt_read[idx];
 +        } else if (!isread && (ri->fgt & FGT_W)) {
 +            assert(idx < ARRAY_SIZE(env->cp15.fgt_write));
 +            trapword = env->cp15.fgt_write[idx];
 +        }
 +
 +        trapbit = extract64(trapword, bitpos, 1);
 +        if (trapbit != rev) {
 +            res = CP_ACCESS_TRAP_EL2;
 +            goto fail;
 +        }
 +    }
 +
      if (likely(res == CP_ACCESS_OK)) {
          return ri;
      }
 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-a64.c
 +++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void handle_sys(DisasContext *s, uint32_t insn, bool isread,
          return;
      }
 -    if (ri->accessfn) {
 +    if (ri->accessfn || (ri->fgt && s->fgt_active)) {
          /* Emit code to perform further access permissions checks at
           * runtime; this may result in an exception.
           */
@@ -XXX,XX +XXX,XX @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase,
      dc->fp_excp_el = EX_TBFLAG_ANY(tb_flags, FPEXC_EL);
      dc->align_mem = EX_TBFLAG_ANY(tb_flags, ALIGN_MEM);
      dc->pstate_il = EX_TBFLAG_ANY(tb_flags, PSTATE__IL);
 +    dc->fgt_active = EX_TBFLAG_ANY(tb_flags, FGT_ACTIVE);
      dc->sve_excp_el = EX_TBFLAG_A64(tb_flags, SVEEXC_EL);
      dc->sme_excp_el = EX_TBFLAG_A64(tb_flags, SMEEXC_EL);
      dc->vl = (EX_TBFLAG_A64(tb_flags, VL) + 1) * 16;
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static inline long vfp_reg_offset(bool dp, unsigned reg)
+@@ -XXX,XX +XXX,XX @@ static void do_coproc_insn(DisasContext *s, int cpnum, int is64,
      }
- }
+     if ((s->hstr_active && s->current_el == 0) || ri->accessfn ||
--/* Return the offset of a 32-bit piece of a NEON register.
++        (ri->fgt && s->fgt_active) ||
--   zero is the least significant end of the register.  */
+         (arm_dc_feature(s, ARM_FEATURE_XSCALE) && cpnum < 14)) {
--static inline long
+         /*
--neon_reg_offset (int reg, int n)
+          * Emit code to perform further access permissions checks at
--{
+@@ -XXX,XX +XXX,XX @@ static void arm_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
--    int sreg;
+     dc->fp_excp_el = EX_TBFLAG_ANY(tb_flags, FPEXC_EL);
--    sreg = reg * 2 + n;
+     dc->align_mem = EX_TBFLAG_ANY(tb_flags, ALIGN_MEM);
--    return vfp_reg_offset(0, sreg);
+     dc->pstate_il = EX_TBFLAG_ANY(tb_flags, PSTATE__IL);
--}
++    dc->fgt_active = EX_TBFLAG_ANY(tb_flags, FGT_ACTIVE);
--
- static TCGv_i32 neon_load_reg(int reg, int pass)
+     if (arm_feature(env, ARM_FEATURE_M)) {
- {
+         dc->vfp_enabled = 1;
      TCGv_i32 tmp = tcg_temp_new_i32();
 -    tcg_gen_ld_i32(tmp, cpu_env, neon_reg_offset(reg, pass));
 +    tcg_gen_ld_i32(tmp, cpu_env, neon_element_offset(reg, pass, MO_32));
      return tmp;
  }
  static void neon_store_reg(int reg, int pass, TCGv_i32 var)
  {
 -    tcg_gen_st_i32(var, cpu_env, neon_reg_offset(reg, pass));
 +    tcg_gen_st_i32(var, cpu_env, neon_element_offset(reg, pass, MO_32));
      tcg_temp_free_i32(var);
  }
 --
-.20.1
+.34.1
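
The runtime decision added to access_check_cp_reg() in the patch above can be exercised on its own with a sketch like the following. It is a simplified model under stated assumptions rather than the QEMU implementation: SkFgtState and sk_fgt_traps are hypothetical names, the caller is assumed to have already established that arm_fgt_active() is true, and the field positions are hard-coded from the FIELD(FGT, ...) layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_FGT_R    (1u << 10)
#define SK_FGT_W    (2u << 10)
#define SK_FGT_EXEC (4u << 10)

/* Mirrors the three arrays added to the cp15 state in the previous patch. */
typedef struct {
    uint64_t fgt_read[2];   /* HFGRTR, HDFGRTR */
    uint64_t fgt_write[2];  /* HFGWTR, HDFGWTR */
    uint64_t fgt_exec[1];   /* HFGITR */
} SkFgtState;

/*
 * Return true if an access to a register tagged with 'fgt' should trap
 * to EL2. Assumes fine-grained traps are active for the current EL.
 */
static bool sk_fgt_traps(const SkFgtState *st, uint32_t fgt, bool isread)
{
    unsigned idx = (fgt >> 6) & 0x7;
    unsigned bitpos = fgt & 0x3f;
    bool rev = (fgt >> 9) & 1;
    uint64_t trapword = 0;

    /* Pick the trap register array that applies to this access type. */
    if (fgt & SK_FGT_EXEC) {
        assert(idx < 1);
        trapword = st->fgt_exec[idx];
    } else if (isread && (fgt & SK_FGT_R)) {
        assert(idx < 2);
        trapword = st->fgt_read[idx];
    } else if (!isread && (fgt & SK_FGT_W)) {
        assert(idx < 2);
        trapword = st->fgt_write[idx];
    }

    /* A normal bit traps when set; a reversed-sense bit traps when clear. */
    return ((trapword >> bitpos) & 1) != rev;
}
```

A register tagged only with the write type, as the commit describes for the RW registers that have only a write trap, never selects a read trap word, so reads of it pass through while writes are checked.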

-New patch
+[PULL 20/33] target/arm: Mark up sysregs for HFGRTR bits 0..11
+Mark up the sysreg definitions for the registers trapped
+by HFGRTR/HFGWTR bits 0..11.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Tested-by: Fuad Tabba <tabba@google.com>
+Message-id: 20230130182459.3309057-11-peter.maydell@linaro.org
+Message-id: 20230127175507.2895013-11-peter.maydell@linaro.org
+---
+ target/arm/cpregs.h | 14 ++++++++++++++
+ target/arm/helper.c | 17 +++++++++++++++++
+files changed, 31 insertions(+)
+diff --git a/target/arm/cpregs.h b/target/arm/cpregs.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpregs.h
++++ b/target/arm/cpregs.h
+@@ -XXX,XX +XXX,XX @@ typedef enum FGTBit {
+     FGT_HDFGRTR = FGT_RW | (FGTREG_HDFGRTR << R_FGT_IDX_SHIFT),
+     FGT_HDFGWTR = FGT_W | (FGTREG_HDFGWTR << R_FGT_IDX_SHIFT),
+     FGT_HFGITR = FGT_EXEC | (FGTREG_HFGITR << R_FGT_IDX_SHIFT),
++
++    /* Trap bits in HFGRTR_EL2 / HFGWTR_EL2, starting from bit 0. */
++    DO_BIT(HFGRTR, AFSR0_EL1),
++    DO_BIT(HFGRTR, AFSR1_EL1),
++    DO_BIT(HFGRTR, AIDR_EL1),
++    DO_BIT(HFGRTR, AMAIR_EL1),
++    DO_BIT(HFGRTR, APDAKEY),
++    DO_BIT(HFGRTR, APDBKEY),
++    DO_BIT(HFGRTR, APGAKEY),
++    DO_BIT(HFGRTR, APIAKEY),
++    DO_BIT(HFGRTR, APIBKEY),
++    DO_BIT(HFGRTR, CCSIDR_EL1),
++    DO_BIT(HFGRTR, CLIDR_EL1),
++    DO_BIT(HFGRTR, CONTEXTIDR_EL1),
+ } FGTBit;
+ #undef DO_BIT
+diff --git a/target/arm/helper.c b/target/arm/helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/helper.c
++++ b/target/arm/helper.c
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo cp_reginfo[] = {
+     { .name = "CONTEXTIDR_EL1", .state = ARM_CP_STATE_BOTH,
+       .opc0 = 3, .opc1 = 0, .crn = 13, .crm = 0, .opc2 = 1,
+       .access = PL1_RW, .accessfn = access_tvm_trvm,
++      .fgt = FGT_CONTEXTIDR_EL1,
+       .secure = ARM_CP_SECSTATE_NS,
+       .fieldoffset = offsetof(CPUARMState, cp15.contextidr_el[1]),
+       .resetvalue = 0, .writefn = contextidr_write, .raw_writefn = raw_write, },
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
+       .opc0 = 3, .crn = 0, .crm = 0, .opc1 = 1, .opc2 = 0,
+       .access = PL1_R,
+       .accessfn = access_tid4,
++      .fgt = FGT_CCSIDR_EL1,
+       .readfn = ccsidr_read, .type = ARM_CP_NO_RAW },
+     { .name = "CSSELR", .state = ARM_CP_STATE_BOTH,
+       .opc0 = 3, .crn = 0, .crm = 0, .opc1 = 2, .opc2 = 0,
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
+       .opc0 = 3, .opc1 = 1, .crn = 0, .crm = 0, .opc2 = 7,
+       .access = PL1_R, .type = ARM_CP_CONST,
+       .accessfn = access_aa64_tid1,
++      .fgt = FGT_AIDR_EL1,
+       .resetvalue = 0 },
+     /*
+      * Auxiliary fault status registers: these also are IMPDEF, and we
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
+     { .name = "AFSR0_EL1", .state = ARM_CP_STATE_BOTH,
+       .opc0 = 3, .opc1 = 0, .crn = 5, .crm = 1, .opc2 = 0,
+       .access = PL1_RW, .accessfn = access_tvm_trvm,
++      .fgt = FGT_AFSR0_EL1,
+       .type = ARM_CP_CONST, .resetvalue = 0 },
+     { .name = "AFSR1_EL1", .state = ARM_CP_STATE_BOTH,
+       .opc0 = 3, .opc1 = 0, .crn = 5, .crm = 1, .opc2 = 1,
+       .access = PL1_RW, .accessfn = access_tvm_trvm,
++      .fgt = FGT_AFSR1_EL1,
+       .type = ARM_CP_CONST, .resetvalue = 0 },
+     /*
+      * MAIR can just read-as-written because we don't implement caches
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo lpae_cp_reginfo[] = {
+     { .name = "AMAIR0", .state = ARM_CP_STATE_BOTH,
+       .opc0 = 3, .crn = 10, .crm = 3, .opc1 = 0, .opc2 = 0,
+       .access = PL1_RW, .accessfn = access_tvm_trvm,
++      .fgt = FGT_AMAIR_EL1,
+       .type = ARM_CP_CONST, .resetvalue = 0 },
+     /* AMAIR1 is mapped to AMAIR_EL1[63:32] */
+     { .name = "AMAIR1", .cp = 15, .crn = 10, .crm = 3, .opc1 = 0, .opc2 = 1,
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo pauth_reginfo[] = {
+     { .name = "APDAKEYLO_EL1", .state = ARM_CP_STATE_AA64,
+       .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 2, .opc2 = 0,
+       .access = PL1_RW, .accessfn = access_pauth,
++      .fgt = FGT_APDAKEY,
+       .fieldoffset = offsetof(CPUARMState, keys.apda.lo) },
+     { .name = "APDAKEYHI_EL1", .state = ARM_CP_STATE_AA64,
+       .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 2, .opc2 = 1,
+       .access = PL1_RW, .accessfn = access_pauth,
++      .fgt = FGT_APDAKEY,
+       .fieldoffset = offsetof(CPUARMState, keys.apda.hi) },
+     { .name = "APDBKEYLO_EL1", .state = ARM_CP_STATE_AA64,
+       .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 2, .opc2 = 2,
+       .access = PL1_RW, .accessfn = access_pauth,
++      .fgt = FGT_APDBKEY,
+       .fieldoffset = offsetof(CPUARMState, keys.apdb.lo) },
+     { .name = "APDBKEYHI_EL1", .state = ARM_CP_STATE_AA64,
+       .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 2, .opc2 = 3,
+       .access = PL1_RW, .accessfn = access_pauth,
++      .fgt = FGT_APDBKEY,
+       .fieldoffset = offsetof(CPUARMState, keys.apdb.hi) },
+     { .name = "APGAKEYLO_EL1", .state = ARM_CP_STATE_AA64,
+       .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 3, .opc2 = 0,
+       .access = PL1_RW, .accessfn = access_pauth,
++      .fgt = FGT_APGAKEY,
+       .fieldoffset = offsetof(CPUARMState, keys.apga.lo) },
+     { .name = "APGAKEYHI_EL1", .state = ARM_CP_STATE_AA64,
+       .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 3, .opc2 = 1,
+       .access = PL1_RW, .accessfn = access_pauth,
++      .fgt = FGT_APGAKEY,
+       .fieldoffset = offsetof(CPUARMState, keys.apga.hi) },
+     { .name = "APIAKEYLO_EL1", .state = ARM_CP_STATE_AA64,
+       .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 1, .opc2 = 0,
+       .access = PL1_RW, .accessfn = access_pauth,
++      .fgt = FGT_APIAKEY,
+       .fieldoffset = offsetof(CPUARMState, keys.apia.lo) },
+     { .name = "APIAKEYHI_EL1", .state = ARM_CP_STATE_AA64,
+       .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 1, .opc2 = 1,
+       .access = PL1_RW, .accessfn = access_pauth,
++      .fgt = FGT_APIAKEY,
+       .fieldoffset = offsetof(CPUARMState, keys.apia.hi) },
+     { .name = "APIBKEYLO_EL1", .state = ARM_CP_STATE_AA64,
+       .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 1, .opc2 = 2,
+       .access = PL1_RW, .accessfn = access_pauth,
++      .fgt = FGT_APIBKEY,
+       .fieldoffset = offsetof(CPUARMState, keys.apib.lo) },
+     { .name = "APIBKEYHI_EL1", .state = ARM_CP_STATE_AA64,
+       .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 1, .opc2 = 3,
+       .access = PL1_RW, .accessfn = access_pauth,
++      .fgt = FGT_APIBKEY,
+       .fieldoffset = offsetof(CPUARMState, keys.apib.hi) },
+ };
+@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
+             .opc0 = 3, .crn = 0, .crm = 0, .opc1 = 1, .opc2 = 1,
+             .access = PL1_R, .type = ARM_CP_CONST,
+             .accessfn = access_tid4,
++            .fgt = FGT_CLIDR_EL1,
+             .resetvalue = cpu->clidr
+         };
+         define_one_arm_cp_reg(cpu, &clidr);
+--
+.34.1

-[PULL 15/26] target/arm: fix LORID_EL1 access check
+[PULL 21/33] target/arm: Mark up sysregs for HFGRTR bits 12..23
-From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
+Mark up the sysreg definitions for the registers trapped
 by HFGRTR/HFGWTR bits 12..23.
-Secure mode is not exempted from checking SCR_EL3.TLOR, and in the
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-future HCR_EL2.TLOR when S-EL2 is enabled.
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Tested-by: Fuad Tabba <tabba@google.com>
 Message-id: 20230130182459.3309057-12-peter.maydell@linaro.org
 Message-id: 20230127175507.2895013-12-peter.maydell@linaro.org
 ---
  target/arm/cpregs.h | 12 ++++++++++++
  target/arm/helper.c | 12 ++++++++++++
 files changed, 24 insertions(+)
-Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
+diff --git a/target/arm/cpregs.h b/target/arm/cpregs.h
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+index XXXXXXX..XXXXXXX 100644
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+--- a/target/arm/cpregs.h
----
++++ b/target/arm/cpregs.h
- target/arm/helper.c | 19 +++++--------------
+@@ -XXX,XX +XXX,XX @@ typedef enum FGTBit {
-file changed, 5 insertions(+), 14 deletions(-)
+     DO_BIT(HFGRTR, CCSIDR_EL1),
+     DO_BIT(HFGRTR, CLIDR_EL1),
      DO_BIT(HFGRTR, CONTEXTIDR_EL1),
 +    DO_BIT(HFGRTR, CPACR_EL1),
 +    DO_BIT(HFGRTR, CSSELR_EL1),
 +    DO_BIT(HFGRTR, CTR_EL0),
 +    DO_BIT(HFGRTR, DCZID_EL0),
 +    DO_BIT(HFGRTR, ESR_EL1),
 +    DO_BIT(HFGRTR, FAR_EL1),
 +    DO_BIT(HFGRTR, ISR_EL1),
 +    DO_BIT(HFGRTR, LORC_EL1),
 +    DO_BIT(HFGRTR, LOREA_EL1),
 +    DO_BIT(HFGRTR, LORID_EL1),
 +    DO_BIT(HFGRTR, LORN_EL1),
 +    DO_BIT(HFGRTR, LORSA_EL1),
  } FGTBit;
  #undef DO_BIT
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ static uint64_t id_aa64pfr0_read(CPUARMState *env, const ARMCPRegInfo *ri)
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v6_cp_reginfo[] = {
- #endif
+       .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0, },
+     { .name = "CPACR", .state = ARM_CP_STATE_BOTH, .opc0 = 3,
- /* Shared logic between LORID and the rest of the LOR* registers.
+       .crn = 1, .crm = 0, .opc1 = 0, .opc2 = 2, .accessfn = cpacr_access,
-- * Secure state has already been delt with.
++      .fgt = FGT_CPACR_EL1,
-+ * Secure state exclusion has already been dealt with.
+       .access = PL1_RW, .fieldoffset = offsetof(CPUARMState, cp15.cpacr_el1),
-  */
+       .resetfn = cpacr_reset, .writefn = cpacr_write, .readfn = cpacr_read },
--static CPAccessResult access_lor_ns(CPUARMState *env)
+ };
-+static CPAccessResult access_lor_ns(CPUARMState *env,
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
-+                                    const ARMCPRegInfo *ri, bool isread)
+       .opc0 = 3, .crn = 0, .crm = 0, .opc1 = 2, .opc2 = 0,
- {
+       .access = PL1_RW,
-     int el = arm_current_el(env);
+       .accessfn = access_tid4,
++      .fgt = FGT_CSSELR_EL1,
-@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_lor_ns(CPUARMState *env)
+       .writefn = csselr_write, .resetvalue = 0,
-     return CP_ACCESS_OK;
+       .bank_fieldoffsets = { offsetof(CPUARMState, cp15.csselr_s),
- }
+                              offsetof(CPUARMState, cp15.csselr_ns) } },
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
--static CPAccessResult access_lorid(CPUARMState *env, const ARMCPRegInfo *ri,
+       .resetfn = arm_cp_reset_ignore },
--                                   bool isread)
+     { .name = "ISR_EL1", .state = ARM_CP_STATE_BOTH,
--{
+       .opc0 = 3, .opc1 = 0, .crn = 12, .crm = 1, .opc2 = 0,
--    if (arm_is_secure_below_el3(env)) {
++      .fgt = FGT_ISR_EL1,
--        /* Access ok in secure mode.  */
+       .type = ARM_CP_NO_RAW, .access = PL1_R, .readfn = isr_read },
--        return CP_ACCESS_OK;
+     /* 32 bit ITLB invalidates */
--    }
+     { .name = "ITLBIALL", .cp = 15, .opc1 = 0, .crn = 8, .crm = 5, .opc2 = 0,
--    return access_lor_ns(env);
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo vmsa_pmsa_cp_reginfo[] = {
--}
+     { .name = "FAR_EL1", .state = ARM_CP_STATE_AA64,
--
+       .opc0 = 3, .crn = 6, .crm = 0, .opc1 = 0, .opc2 = 0,
- static CPAccessResult access_lor_other(CPUARMState *env,
+       .access = PL1_RW, .accessfn = access_tvm_trvm,
-                                        const ARMCPRegInfo *ri, bool isread)
++      .fgt = FGT_FAR_EL1,
- {
+       .fieldoffset = offsetof(CPUARMState, cp15.far_el[1]),
-@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_lor_other(CPUARMState *env,
+       .resetvalue = 0, },
-         /* Access denied in secure mode.  */
+ };
-         return CP_ACCESS_TRAP;
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo vmsa_cp_reginfo[] = {
-     }
+     { .name = "ESR_EL1", .state = ARM_CP_STATE_AA64,
--    return access_lor_ns(env);
+       .opc0 = 3, .crn = 5, .crm = 2, .opc1 = 0, .opc2 = 0,
-+    return access_lor_ns(env, ri, isread);
+       .access = PL1_RW, .accessfn = access_tvm_trvm,
- }
++      .fgt = FGT_ESR_EL1,
+       .fieldoffset = offsetof(CPUARMState, cp15.esr_el[1]), .resetvalue = 0, },
- /*
+     { .name = "TTBR0_EL1", .state = ARM_CP_STATE_BOTH,
        .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 0, .opc2 = 0,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
      { .name = "DCZID_EL0", .state = ARM_CP_STATE_AA64,
        .opc0 = 3, .opc1 = 3, .opc2 = 7, .crn = 0, .crm = 0,
        .access = PL0_R, .type = ARM_CP_NO_RAW,
 +      .fgt = FGT_DCZID_EL0,
        .readfn = aa64_dczid_read },
      { .name = "DC_ZVA", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 4, .opc2 = 1,
 @@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo lor_reginfo[] = {
+     { .name = "LORSA_EL1", .state = ARM_CP_STATE_AA64,
+       .opc0 = 3, .opc1 = 0, .crn = 10, .crm = 4, .opc2 = 0,
+       .access = PL1_RW, .accessfn = access_lor_other,
++      .fgt = FGT_LORSA_EL1,
+       .type = ARM_CP_CONST, .resetvalue = 0 },
+     { .name = "LOREA_EL1", .state = ARM_CP_STATE_AA64,
+       .opc0 = 3, .opc1 = 0, .crn = 10, .crm = 4, .opc2 = 1,
+       .access = PL1_RW, .accessfn = access_lor_other,
++      .fgt = FGT_LOREA_EL1,
+       .type = ARM_CP_CONST, .resetvalue = 0 },
+     { .name = "LORN_EL1", .state = ARM_CP_STATE_AA64,
+       .opc0 = 3, .opc1 = 0, .crn = 10, .crm = 4, .opc2 = 2,
+       .access = PL1_RW, .accessfn = access_lor_other,
++      .fgt = FGT_LORN_EL1,
+       .type = ARM_CP_CONST, .resetvalue = 0 },
+     { .name = "LORC_EL1", .state = ARM_CP_STATE_AA64,
+       .opc0 = 3, .opc1 = 0, .crn = 10, .crm = 4, .opc2 = 3,
+       .access = PL1_RW, .accessfn = access_lor_other,
++      .fgt = FGT_LORC_EL1,
        .type = ARM_CP_CONST, .resetvalue = 0 },
      { .name = "LORID_EL1", .state = ARM_CP_STATE_AA64,
        .opc0 = 3, .opc1 = 0, .crn = 10, .crm = 4, .opc2 = 7,
--      .access = PL1_R, .accessfn = access_lorid,
+       .access = PL1_R, .accessfn = access_lor_ns,
-+      .access = PL1_R, .accessfn = access_lor_ns,
++      .fgt = FGT_LORID_EL1,
        .type = ARM_CP_CONST, .resetvalue = 0 },
-     REGINFO_SENTINEL
  };
+@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
+             { .name = "CTR_EL0", .state = ARM_CP_STATE_AA64,
+               .opc0 = 3, .opc1 = 3, .opc2 = 1, .crn = 0, .crm = 0,
+               .access = PL0_R, .accessfn = ctr_el0_access,
++              .fgt = FGT_CTR_EL0,
+               .type = ARM_CP_CONST, .resetvalue = cpu->ctr },
+             /* TCMTR and TLBTR exist in v8 but have no 64-bit versions */
+             { .name = "TCMTR",
 --
-.20.1
+.34.1

-New patch
+[PULL 22/33] target/arm: Mark up sysregs for HFGRTR bits 24..35
+Mark up the sysreg definitions for the registers trapped
+by HFGRTR/HFGWTR bits 24..35.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Tested-by: Fuad Tabba <tabba@google.com>
+Message-id: 20230130182459.3309057-13-peter.maydell@linaro.org
+Message-id: 20230127175507.2895013-13-peter.maydell@linaro.org
+---
+ target/arm/cpregs.h | 12 ++++++++++++
+ target/arm/helper.c | 14 ++++++++++++++
+files changed, 26 insertions(+)
+diff --git a/target/arm/cpregs.h b/target/arm/cpregs.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpregs.h
++++ b/target/arm/cpregs.h
+@@ -XXX,XX +XXX,XX @@ typedef enum FGTBit {
+     DO_BIT(HFGRTR, LORID_EL1),
+     DO_BIT(HFGRTR, LORN_EL1),
+     DO_BIT(HFGRTR, LORSA_EL1),
++    DO_BIT(HFGRTR, MAIR_EL1),
++    DO_BIT(HFGRTR, MIDR_EL1),
++    DO_BIT(HFGRTR, MPIDR_EL1),
++    DO_BIT(HFGRTR, PAR_EL1),
++    DO_BIT(HFGRTR, REVIDR_EL1),
++    DO_BIT(HFGRTR, SCTLR_EL1),
++    DO_BIT(HFGRTR, SCXTNUM_EL1),
++    DO_BIT(HFGRTR, SCXTNUM_EL0),
++    DO_BIT(HFGRTR, TCR_EL1),
++    DO_BIT(HFGRTR, TPIDR_EL1),
++    DO_BIT(HFGRTR, TPIDRRO_EL0),
++    DO_BIT(HFGRTR, TPIDR_EL0),
+ } FGTBit;
+ #undef DO_BIT
+diff --git a/target/arm/helper.c b/target/arm/helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/helper.c
++++ b/target/arm/helper.c
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
+     { .name = "MAIR_EL1", .state = ARM_CP_STATE_AA64,
+       .opc0 = 3, .opc1 = 0, .crn = 10, .crm = 2, .opc2 = 0,
+       .access = PL1_RW, .accessfn = access_tvm_trvm,
++      .fgt = FGT_MAIR_EL1,
+       .fieldoffset = offsetof(CPUARMState, cp15.mair_el[1]),
+       .resetvalue = 0 },
+     { .name = "MAIR_EL3", .state = ARM_CP_STATE_AA64,
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v6k_cp_reginfo[] = {
+     { .name = "TPIDR_EL0", .state = ARM_CP_STATE_AA64,
+       .opc0 = 3, .opc1 = 3, .opc2 = 2, .crn = 13, .crm = 0,
+       .access = PL0_RW,
++      .fgt = FGT_TPIDR_EL0,
+       .fieldoffset = offsetof(CPUARMState, cp15.tpidr_el[0]), .resetvalue = 0 },
+     { .name = "TPIDRURW", .cp = 15, .crn = 13, .crm = 0, .opc1 = 0, .opc2 = 2,
+       .access = PL0_RW,
++      .fgt = FGT_TPIDR_EL0,
+       .bank_fieldoffsets = { offsetoflow32(CPUARMState, cp15.tpidrurw_s),
+                              offsetoflow32(CPUARMState, cp15.tpidrurw_ns) },
+       .resetfn = arm_cp_reset_ignore },
+     { .name = "TPIDRRO_EL0", .state = ARM_CP_STATE_AA64,
+       .opc0 = 3, .opc1 = 3, .opc2 = 3, .crn = 13, .crm = 0,
+       .access = PL0_R | PL1_W,
++      .fgt = FGT_TPIDRRO_EL0,
+       .fieldoffset = offsetof(CPUARMState, cp15.tpidrro_el[0]),
+       .resetvalue = 0},
+     { .name = "TPIDRURO", .cp = 15, .crn = 13, .crm = 0, .opc1 = 0, .opc2 = 3,
+       .access = PL0_R | PL1_W,
++      .fgt = FGT_TPIDRRO_EL0,
+       .bank_fieldoffsets = { offsetoflow32(CPUARMState, cp15.tpidruro_s),
+                              offsetoflow32(CPUARMState, cp15.tpidruro_ns) },
+       .resetfn = arm_cp_reset_ignore },
+     { .name = "TPIDR_EL1", .state = ARM_CP_STATE_AA64,
+       .opc0 = 3, .opc1 = 0, .opc2 = 4, .crn = 13, .crm = 0,
+       .access = PL1_RW,
++      .fgt = FGT_TPIDR_EL1,
+       .fieldoffset = offsetof(CPUARMState, cp15.tpidr_el[1]), .resetvalue = 0 },
+     { .name = "TPIDRPRW", .opc1 = 0, .cp = 15, .crn = 13, .crm = 0, .opc2 = 4,
+       .access = PL1_RW,
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo vmsa_cp_reginfo[] = {
+     { .name = "TCR_EL1", .state = ARM_CP_STATE_AA64,
+       .opc0 = 3, .crn = 2, .crm = 0, .opc1 = 0, .opc2 = 2,
+       .access = PL1_RW, .accessfn = access_tvm_trvm,
++      .fgt = FGT_TCR_EL1,
+       .writefn = vmsa_tcr_el12_write,
+       .raw_writefn = raw_write,
+       .resetvalue = 0,
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
+       .type = ARM_CP_ALIAS,
+       .opc0 = 3, .opc1 = 0, .crn = 7, .crm = 4, .opc2 = 0,
+       .access = PL1_RW, .resetvalue = 0,
++      .fgt = FGT_PAR_EL1,
+       .fieldoffset = offsetof(CPUARMState, cp15.par_el[1]),
+       .writefn = par_write },
+ #endif
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo scxtnum_reginfo[] = {
+     { .name = "SCXTNUM_EL0", .state = ARM_CP_STATE_AA64,
+       .opc0 = 3, .opc1 = 3, .crn = 13, .crm = 0, .opc2 = 7,
+       .access = PL0_RW, .accessfn = access_scxtnum,
++      .fgt = FGT_SCXTNUM_EL0,
+       .fieldoffset = offsetof(CPUARMState, scxtnum_el[0]) },
+     { .name = "SCXTNUM_EL1", .state = ARM_CP_STATE_AA64,
+       .opc0 = 3, .opc1 = 0, .crn = 13, .crm = 0, .opc2 = 7,
+       .access = PL1_RW, .accessfn = access_scxtnum,
++      .fgt = FGT_SCXTNUM_EL1,
+       .fieldoffset = offsetof(CPUARMState, scxtnum_el[1]) },
+     { .name = "SCXTNUM_EL2", .state = ARM_CP_STATE_AA64,
+       .opc0 = 3, .opc1 = 4, .crn = 13, .crm = 0, .opc2 = 7,
+@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
+             { .name = "MIDR_EL1", .state = ARM_CP_STATE_BOTH,
+               .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 0, .opc2 = 0,
+               .access = PL1_R, .type = ARM_CP_NO_RAW, .resetvalue = cpu->midr,
++              .fgt = FGT_MIDR_EL1,
+               .fieldoffset = offsetof(CPUARMState, cp15.c0_cpuid),
+               .readfn = midr_read },
+             /* crn = 0 op1 = 0 crm = 0 op2 = 7 : AArch32 aliases of MIDR */
+@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
+               .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 0, .opc2 = 6,
+               .access = PL1_R,
+               .accessfn = access_aa64_tid1,
++              .fgt = FGT_REVIDR_EL1,
+               .type = ARM_CP_CONST, .resetvalue = cpu->revidr },
+         };
+         ARMCPRegInfo id_v8_midr_alias_cp_reginfo = {
+@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
+         ARMCPRegInfo mpidr_cp_reginfo[] = {
+             { .name = "MPIDR_EL1", .state = ARM_CP_STATE_BOTH,
+               .opc0 = 3, .crn = 0, .crm = 0, .opc1 = 0, .opc2 = 5,
++              .fgt = FGT_MPIDR_EL1,
+               .access = PL1_R, .readfn = mpidr_read, .type = ARM_CP_NO_RAW },
+         };
+ #ifdef CONFIG_USER_ONLY
+@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
+             .name = "SCTLR", .state = ARM_CP_STATE_BOTH,
+             .opc0 = 3, .opc1 = 0, .crn = 1, .crm = 0, .opc2 = 0,
+             .access = PL1_RW, .accessfn = access_tvm_trvm,
++            .fgt = FGT_SCTLR_EL1,
+             .bank_fieldoffsets = { offsetof(CPUARMState, cp15.sctlr_s),
+                                    offsetof(CPUARMState, cp15.sctlr_ns) },
+             .writefn = sctlr_write, .resetvalue = cpu->reset_sctlr,
+--
+.34.1
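
As a reading aid for the patch above: a minimal sketch, assuming a much-simplified model, of what attaching a `.fgt` value to a register definition buys. Every `Toy*`/`toy_*` name is hypothetical and does not exist in QEMU; the bit numbers 24 and 35 are assumed from the "bits 24..35" range and the order of the new enum entries. The real check also depends on state that is left out here (current exception level, whether fine-grained traps are enabled at all).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins only; none of these names exist in QEMU. */
typedef enum ToyTrapReg {
    TOY_HFGRTR,            /* models the HFGRTR/HFGWTR pair as one register */
    TOY_NUM_TRAP_REGS
} ToyTrapReg;

typedef struct ToyFgt {
    bool valid;            /* false: the register has no fine-grained trap */
    ToyTrapReg reg;        /* which trap register holds the bit */
    unsigned bit;          /* bit position, 0..63 */
} ToyFgt;

typedef struct ToySysreg {
    const char *name;
    ToyFgt fgt;
} ToySysreg;

/* Assumed bit numbers: first and last of the 24..35 range. */
#define TOY_FGT_MAIR_EL1  { true, TOY_HFGRTR, 24 }
#define TOY_FGT_TPIDR_EL0 { true, TOY_HFGRTR, 35 }
#define TOY_FGT_NONE      { false, TOY_HFGRTR, 0 }

static const ToySysreg toy_regs[] = {
    { "MAIR_EL1",  TOY_FGT_MAIR_EL1 },
    { "TPIDR_EL0", TOY_FGT_TPIDR_EL0 },
    { "TPIDRURW",  TOY_FGT_TPIDR_EL0 },  /* AArch32 view shares the trap bit */
    { "TPIDRPRW",  TOY_FGT_NONE },       /* not marked up in this patch */
};

/* Return true if an access to @ri would be trapped by its FGT bit. */
static bool toy_fgt_traps(const ToySysreg *ri, const uint64_t *trap_regs)
{
    if (!ri->fgt.valid) {
        return false;
    }
    assert(ri->fgt.reg < TOY_NUM_TRAP_REGS);
    assert(ri->fgt.bit < 64);
    return (trap_regs[ri->fgt.reg] >> ri->fgt.bit) & 1;
}
```

The point of the table form is that one trap bit can be attached to several register definitions, as the patch does for TPIDR_EL0 and TPIDRURW, without any per-register access function.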

-[PULL 23/26] hw/intc/arm_gicv3_cpuif: Make GIC maintenance interrupts work
+[PULL 23/33] target/arm: Mark up sysregs for HFGRTR bits 36..63
-In gicv3_init_cpuif() we copy the ARMCPU gicv3_maintenance_interrupt
+Mark up the sysreg definitions for the registers trapped
-into the GICv3CPUState struct's maintenance_irq field.  This will
+by HFGRTR/HFGWTR bits 36..63.
 only work if the board happens to have already wired up the CPU
 maintenance IRQ before the GIC was realized.  Unfortunately this is
 not the case for the 'virt' board, and so the value that gets copied
 is NULL (since a qemu_irq is really a pointer to an IRQState struct
 under the hood).  The effect is that the CPU interface code never
 actually raises the maintenance interrupt line.
-Instead, since the GICv3CPUState has a pointer to the CPUState, make
+Of these, some correspond to RAS registers which we implement as
-the dereference at the point where we want to raise the interrupt, to
+always-UNDEF: these don't need any extra handling for FGT because the
-avoid an implicit requirement on board code to wire things up in a
+UNDEF-to-EL1 always takes priority over any theoretical
-particular order.
+FGT-trap-to-EL2.
-Reported-by: Jose Martins <josemartins90@gmail.com>
+Bit 50 (NACCDATA_EL1) is for the ACCDATA_EL1 register which is part
 of the FEAT_LS64_ACCDATA feature which we don't yet implement.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Message-id: 20201009153904.28529-1-peter.maydell@linaro.org
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Luc Michel <luc@lmichel.fr>
+Tested-by: Fuad Tabba <tabba@google.com>
 Message-id: 20230130182459.3309057-14-peter.maydell@linaro.org
 Message-id: 20230127175507.2895013-14-peter.maydell@linaro.org
 ---
- include/hw/intc/arm_gicv3_common.h | 1 -
+ target/arm/cpregs.h       |  7 +++++++
- hw/intc/arm_gicv3_cpuif.c          | 5 ++---
+ hw/intc/arm_gicv3_cpuif.c |  2 ++
-files changed, 2 insertions(+), 4 deletions(-)
+ target/arm/helper.c       | 10 ++++++++++
 files changed, 19 insertions(+)
-diff --git a/include/hw/intc/arm_gicv3_common.h b/include/hw/intc/arm_gicv3_common.h
+diff --git a/target/arm/cpregs.h b/target/arm/cpregs.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/intc/arm_gicv3_common.h
+--- a/target/arm/cpregs.h
-+++ b/include/hw/intc/arm_gicv3_common.h
++++ b/target/arm/cpregs.h
-@@ -XXX,XX +XXX,XX @@ struct GICv3CPUState {
+@@ -XXX,XX +XXX,XX @@ typedef enum FGTBit {
-     qemu_irq parent_fiq;
+     DO_BIT(HFGRTR, TPIDR_EL1),
-     qemu_irq parent_virq;
+     DO_BIT(HFGRTR, TPIDRRO_EL0),
-     qemu_irq parent_vfiq;
+     DO_BIT(HFGRTR, TPIDR_EL0),
--    qemu_irq maintenance_irq;
++    DO_BIT(HFGRTR, TTBR0_EL1),
++    DO_BIT(HFGRTR, TTBR1_EL1),
-     /* Redistributor */
++    DO_BIT(HFGRTR, VBAR_EL1),
-     uint32_t level;                  /* Current IRQ level */
++    DO_BIT(HFGRTR, ICC_IGRPENN_EL1),
 +    DO_BIT(HFGRTR, ERRIDR_EL1),
 +    DO_REV_BIT(HFGRTR, NSMPRI_EL1),
 +    DO_REV_BIT(HFGRTR, NTPIDR2_EL0),
  } FGTBit;
  #undef DO_BIT
 diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/intc/arm_gicv3_cpuif.c
 +++ b/hw/intc/arm_gicv3_cpuif.c
-@@ -XXX,XX +XXX,XX @@ static void gicv3_cpuif_virt_update(GICv3CPUState *cs)
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo gicv3_cpuif_reginfo[] = {
-     int irqlevel = 0;
+       .opc0 = 3, .opc1 = 0, .crn = 12, .crm = 12, .opc2 = 6,
-     int fiqlevel = 0;
+       .type = ARM_CP_IO | ARM_CP_NO_RAW,
-     int maintlevel = 0;
+       .access = PL1_RW, .accessfn = gicv3_fiq_access,
-+    ARMCPU *cpu = ARM_CPU(cs->cpu);
++      .fgt = FGT_ICC_IGRPENN_EL1,
+       .readfn = icc_igrpen_read,
-     idx = hppvi_index(cs);
+       .writefn = icc_igrpen_write,
-     trace_gicv3_cpuif_virt_update(gicv3_redist_affid(cs), idx);
+     },
-@@ -XXX,XX +XXX,XX @@ static void gicv3_cpuif_virt_update(GICv3CPUState *cs)
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo gicv3_cpuif_reginfo[] = {
+       .opc0 = 3, .opc1 = 0, .crn = 12, .crm = 12, .opc2 = 7,
-     qemu_set_irq(cs->parent_vfiq, fiqlevel);
+       .type = ARM_CP_IO | ARM_CP_NO_RAW,
-     qemu_set_irq(cs->parent_virq, irqlevel);
+       .access = PL1_RW, .accessfn = gicv3_irq_access,
--    qemu_set_irq(cs->maintenance_irq, maintlevel);
++      .fgt = FGT_ICC_IGRPENN_EL1,
-+    qemu_set_irq(cpu->gicv3_maintenance_interrupt, maintlevel);
+       .readfn = icc_igrpen_read,
- }
+       .writefn = icc_igrpen_write,
+     },
- static uint64_t icv_ap_read(CPUARMState *env, const ARMCPRegInfo *ri)
+diff --git a/target/arm/helper.c b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ void gicv3_init_cpuif(GICv3State *s)
+index XXXXXXX..XXXXXXX 100644
-             && cpu->gic_num_lrs) {
+--- a/target/arm/helper.c
-             int j;
++++ b/target/arm/helper.c
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo vmsa_cp_reginfo[] = {
--            cs->maintenance_irq = cpu->gicv3_maintenance_interrupt;
+     { .name = "TTBR0_EL1", .state = ARM_CP_STATE_BOTH,
--
+       .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 0, .opc2 = 0,
-             cs->num_list_regs = cpu->gic_num_lrs;
+       .access = PL1_RW, .accessfn = access_tvm_trvm,
-             cs->vpribits = cpu->gic_vpribits;
++      .fgt = FGT_TTBR0_EL1,
-             cs->vprebits = cpu->gic_vprebits;
+       .writefn = vmsa_ttbr_write, .resetvalue = 0,
        .bank_fieldoffsets = { offsetof(CPUARMState, cp15.ttbr0_s),
                               offsetof(CPUARMState, cp15.ttbr0_ns) } },
      { .name = "TTBR1_EL1", .state = ARM_CP_STATE_BOTH,
        .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 0, .opc2 = 1,
        .access = PL1_RW, .accessfn = access_tvm_trvm,
 +      .fgt = FGT_TTBR1_EL1,
        .writefn = vmsa_ttbr_write, .resetvalue = 0,
        .bank_fieldoffsets = { offsetof(CPUARMState, cp15.ttbr1_s),
                               offsetof(CPUARMState, cp15.ttbr1_ns) } },
@@ -XXX,XX +XXX,XX @@ static void disr_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t val)
   *   ERRSELR_EL1
   * may generate UNDEFINED, which is the effect we get by not
   * listing them at all.
 + *
 + * These registers have fine-grained trap bits, but UNDEF-to-EL1
 + * is higher priority than FGT-to-EL2 so we do not need to list them
 + * in order to check for an FGT.
   */
  static const ARMCPRegInfo minimal_ras_reginfo[] = {
      { .name = "DISR_EL1", .state = ARM_CP_STATE_BOTH,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo minimal_ras_reginfo[] = {
      { .name = "ERRIDR_EL1", .state = ARM_CP_STATE_BOTH,
        .opc0 = 3, .opc1 = 0, .crn = 5, .crm = 3, .opc2 = 0,
        .access = PL1_R, .accessfn = access_terr,
 +      .fgt = FGT_ERRIDR_EL1,
        .type = ARM_CP_CONST, .resetvalue = 0 },
      { .name = "VDISR_EL2", .state = ARM_CP_STATE_BOTH,
        .opc0 = 3, .opc1 = 4, .crn = 12, .crm = 1, .opc2 = 1,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo sme_reginfo[] = {
      { .name = "TPIDR2_EL0", .state = ARM_CP_STATE_AA64,
        .opc0 = 3, .opc1 = 3, .crn = 13, .crm = 0, .opc2 = 5,
        .access = PL0_RW, .accessfn = access_tpidr2,
 +      .fgt = FGT_NTPIDR2_EL0,
        .fieldoffset = offsetof(CPUARMState, cp15.tpidr2_el0) },
      { .name = "SVCR", .state = ARM_CP_STATE_AA64,
        .opc0 = 3, .opc1 = 3, .crn = 4, .crm = 2, .opc2 = 2,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo sme_reginfo[] = {
      { .name = "SMPRI_EL1", .state = ARM_CP_STATE_AA64,
        .opc0 = 3, .opc1 = 0, .crn = 1, .crm = 2, .opc2 = 4,
        .access = PL1_RW, .accessfn = access_esm,
 +      .fgt = FGT_NSMPRI_EL1,
        .type = ARM_CP_CONST, .resetvalue = 0 },
      { .name = "SMPRIMAP_EL2", .state = ARM_CP_STATE_AA64,
        .opc0 = 3, .opc1 = 4, .crn = 1, .crm = 2, .opc2 = 5,
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
              { .name = "VBAR", .state = ARM_CP_STATE_BOTH,
                .opc0 = 3, .crn = 12, .crm = 0, .opc1 = 0, .opc2 = 0,
                .access = PL1_RW, .writefn = vbar_write,
 +              .fgt = FGT_VBAR_EL1,
                .bank_fieldoffsets = { offsetof(CPUARMState, cp15.vbar_s),
                                       offsetof(CPUARMState, cp15.vbar_ns) },
                .resetvalue = 0 },
 --
-.20.1
+.34.1
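
Two points from the patch above can be sketched in a few lines, under stated assumptions: a register that is not listed at all UNDEFs before any FGT check is reached, and the `DO_REV_BIT` entries (nSMPRI_EL1, nTPIDR2_EL0) are taken here to be negative-polarity bits that trap when the bit is 0. All `Toy*`/`toy_*` names are hypothetical, the bit numbers are placeholders, and the enable conditions of the real check are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only; these types are not QEMU's. */
typedef enum ToyOutcome {
    TOY_UNDEF_EL1,   /* register not implemented: UNDEF taken to EL1 */
    TOY_TRAP_EL2,    /* fine-grained trap taken to EL2 */
    TOY_ACCESS_OK
} ToyOutcome;

typedef struct ToyReg {
    bool implemented;  /* false models "not listed at all" */
    bool has_fgt;
    bool reversed;     /* true for DO_REV_BIT-style bits (trap when 0) */
    unsigned bit;      /* placeholder bit position in the trap register */
} ToyReg;

static ToyOutcome toy_access(const ToyReg *r, uint64_t hfgrtr)
{
    /* UNDEF-to-EL1 is checked first, so it outranks any FGT trap. */
    if (!r->implemented) {
        return TOY_UNDEF_EL1;
    }
    if (r->has_fgt) {
        assert(r->bit < 64);
        bool bit_set = (hfgrtr >> r->bit) & 1;
        bool traps = r->reversed ? !bit_set : bit_set;
        if (traps) {
            return TOY_TRAP_EL2;
        }
    }
    return TOY_ACCESS_OK;
}
```

With this ordering, an always-UNDEF RAS register never needs an `.fgt` annotation: the lookup fails before the trap bit would be consulted.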

-New patch
+[PULL 24/33] target/arm: Mark up sysregs for HDFGRTR bits 0..11
+Mark up the sysreg definitions for the registers trapped
+by HDFGRTR/HDFGWTR bits 0..11. These cover various debug
+related registers.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Tested-by: Fuad Tabba <tabba@google.com>
+Message-id: 20230130182459.3309057-15-peter.maydell@linaro.org
+Message-id: 20230127175507.2895013-15-peter.maydell@linaro.org
+---
+ target/arm/cpregs.h       | 12 ++++++++++++
+ target/arm/debug_helper.c | 11 +++++++++++
+files changed, 23 insertions(+)
+diff --git a/target/arm/cpregs.h b/target/arm/cpregs.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpregs.h
++++ b/target/arm/cpregs.h
+@@ -XXX,XX +XXX,XX @@ typedef enum FGTBit {
+     DO_BIT(HFGRTR, ERRIDR_EL1),
+     DO_REV_BIT(HFGRTR, NSMPRI_EL1),
+     DO_REV_BIT(HFGRTR, NTPIDR2_EL0),
++
++    /* Trap bits in HDFGRTR_EL2 / HDFGWTR_EL2, starting from bit 0. */
++    DO_BIT(HDFGRTR, DBGBCRN_EL1),
++    DO_BIT(HDFGRTR, DBGBVRN_EL1),
++    DO_BIT(HDFGRTR, DBGWCRN_EL1),
++    DO_BIT(HDFGRTR, DBGWVRN_EL1),
++    DO_BIT(HDFGRTR, MDSCR_EL1),
++    DO_BIT(HDFGRTR, DBGCLAIM),
++    DO_BIT(HDFGWTR, OSLAR_EL1),
++    DO_BIT(HDFGRTR, OSLSR_EL1),
++    DO_BIT(HDFGRTR, OSECCR_EL1),
++    DO_BIT(HDFGRTR, OSDLR_EL1),
+ } FGTBit;
+ #undef DO_BIT
+diff --git a/target/arm/debug_helper.c b/target/arm/debug_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/debug_helper.c
++++ b/target/arm/debug_helper.c
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo debug_cp_reginfo[] = {
+     { .name = "MDSCR_EL1", .state = ARM_CP_STATE_BOTH,
+       .cp = 14, .opc0 = 2, .opc1 = 0, .crn = 0, .crm = 2, .opc2 = 2,
+       .access = PL1_RW, .accessfn = access_tda,
++      .fgt = FGT_MDSCR_EL1,
+       .fieldoffset = offsetof(CPUARMState, cp15.mdscr_el1),
+       .resetvalue = 0 },
+     /*
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo debug_cp_reginfo[] = {
+     { .name = "OSECCR_EL1", .state = ARM_CP_STATE_BOTH, .cp = 14,
+       .opc0 = 2, .opc1 = 0, .crn = 0, .crm = 6, .opc2 = 2,
+       .access = PL1_RW, .accessfn = access_tda,
++      .fgt = FGT_OSECCR_EL1,
+       .type = ARM_CP_CONST, .resetvalue = 0 },
+     /*
+      * DBGDSCRint[15,12,5:2] map to MDSCR_EL1[15,12,5:2].  Map all bits as
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo debug_cp_reginfo[] = {
+       .cp = 14, .opc0 = 2, .opc1 = 0, .crn = 1, .crm = 0, .opc2 = 4,
+       .access = PL1_W, .type = ARM_CP_NO_RAW,
+       .accessfn = access_tdosa,
++      .fgt = FGT_OSLAR_EL1,
+       .writefn = oslar_write },
+     { .name = "OSLSR_EL1", .state = ARM_CP_STATE_BOTH,
+       .cp = 14, .opc0 = 2, .opc1 = 0, .crn = 1, .crm = 1, .opc2 = 4,
+       .access = PL1_R, .resetvalue = 10,
+       .accessfn = access_tdosa,
++      .fgt = FGT_OSLSR_EL1,
+       .fieldoffset = offsetof(CPUARMState, cp15.oslsr_el1) },
+     /* Dummy OSDLR_EL1: 32-bit Linux will read this */
+     { .name = "OSDLR_EL1", .state = ARM_CP_STATE_BOTH,
+       .cp = 14, .opc0 = 2, .opc1 = 0, .crn = 1, .crm = 3, .opc2 = 4,
+       .access = PL1_RW, .accessfn = access_tdosa,
++      .fgt = FGT_OSDLR_EL1,
+       .writefn = osdlr_write,
+       .fieldoffset = offsetof(CPUARMState, cp15.osdlr_el1) },
+     /*
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo debug_cp_reginfo[] = {
+       .cp = 14, .opc0 = 2, .opc1 = 0, .crn = 7, .crm = 8, .opc2 = 6,
+       .type = ARM_CP_ALIAS,
+       .access = PL1_RW, .accessfn = access_tda,
++      .fgt = FGT_DBGCLAIM,
+       .writefn = dbgclaimset_write, .readfn = dbgclaimset_read },
+     { .name = "DBGCLAIMCLR_EL1", .state = ARM_CP_STATE_BOTH,
+       .cp = 14, .opc0 = 2, .opc1 = 0, .crn = 7, .crm = 9, .opc2 = 6,
+       .access = PL1_RW, .accessfn = access_tda,
++      .fgt = FGT_DBGCLAIM,
+       .writefn = dbgclaimclr_write, .raw_writefn = raw_write,
+       .fieldoffset = offsetof(CPUARMState, cp15.dbgclaim) },
+ };
+@@ -XXX,XX +XXX,XX @@ void define_debug_regs(ARMCPU *cpu)
+             { .name = dbgbvr_el1_name, .state = ARM_CP_STATE_BOTH,
+               .cp = 14, .opc0 = 2, .opc1 = 0, .crn = 0, .crm = i, .opc2 = 4,
+               .access = PL1_RW, .accessfn = access_tda,
++              .fgt = FGT_DBGBVRN_EL1,
+               .fieldoffset = offsetof(CPUARMState, cp15.dbgbvr[i]),
+               .writefn = dbgbvr_write, .raw_writefn = raw_write
+             },
+             { .name = dbgbcr_el1_name, .state = ARM_CP_STATE_BOTH,
+               .cp = 14, .opc0 = 2, .opc1 = 0, .crn = 0, .crm = i, .opc2 = 5,
+               .access = PL1_RW, .accessfn = access_tda,
++              .fgt = FGT_DBGBCRN_EL1,
+               .fieldoffset = offsetof(CPUARMState, cp15.dbgbcr[i]),
+               .writefn = dbgbcr_write, .raw_writefn = raw_write
+             },
+@@ -XXX,XX +XXX,XX @@ void define_debug_regs(ARMCPU *cpu)
+             { .name = dbgwvr_el1_name, .state = ARM_CP_STATE_BOTH,
+               .cp = 14, .opc0 = 2, .opc1 = 0, .crn = 0, .crm = i, .opc2 = 6,
+               .access = PL1_RW, .accessfn = access_tda,
++              .fgt = FGT_DBGWVRN_EL1,
+               .fieldoffset = offsetof(CPUARMState, cp15.dbgwvr[i]),
+               .writefn = dbgwvr_write, .raw_writefn = raw_write
+             },
+             { .name = dbgwcr_el1_name, .state = ARM_CP_STATE_BOTH,
+               .cp = 14, .opc0 = 2, .opc1 = 0, .crn = 0, .crm = i, .opc2 = 7,
+               .access = PL1_RW, .accessfn = access_tda,
++              .fgt = FGT_DBGWCRN_EL1,
+               .fieldoffset = offsetof(CPUARMState, cp15.dbgwcr[i]),
+               .writefn = dbgwcr_write, .raw_writefn = raw_write
+             },
+--
+.34.1
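
The patch above declares OSLAR_EL1 with `DO_BIT(HDFGWTR, ...)` while its neighbours use `HDFGRTR`. A minimal sketch of why that distinction can matter, assuming reads consult the read-trap register and writes consult the write-trap register: a write-only register has a trap bit only on the write side. The `Toy*`/`toy_*` names are hypothetical and the bit numbers are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Which access directions have a trap bit; toy model, not QEMU's encoding. */
typedef enum ToyDir {
    TOY_DIR_R  = 1,
    TOY_DIR_W  = 2,
    TOY_DIR_RW = 3
} ToyDir;

typedef struct ToyDbgReg {
    const char *name;
    unsigned bit;     /* illustrative bit position */
    ToyDir dirs;      /* directions for which a trap bit exists */
} ToyDbgReg;

/* Pick the read or write trap register by access direction, then test. */
static bool toy_dbg_traps(const ToyDbgReg *r, bool isread,
                          uint64_t hdfgrtr, uint64_t hdfgwtr)
{
    ToyDir want = isread ? TOY_DIR_R : TOY_DIR_W;

    assert(r->bit < 64);
    if (!(r->dirs & want)) {
        return false;  /* no trap bit for this direction */
    }
    return ((isread ? hdfgrtr : hdfgwtr) >> r->bit) & 1;
}
```

In this model a read/write register such as MDSCR_EL1 traps reads and writes independently, and a write-only register such as OSLAR_EL1 ignores the read-side register entirely.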

-[PULL 26/26] tests/qtest/npcm7xx_rng-test: Disable randomness tests
+[PULL 25/33] target/arm: Mark up sysregs for HDFGRTR bits 12..63
-The randomness tests in the NPCM7xx RNG test fail intermittently
+Mark up the sysreg definitions for the registers trapped
-but fairly frequently. On my machine running the test in a loop:
+by HDFGRTR/HDFGWTR bits 12..63.
- while QTEST_QEMU_BINARY=./qemu-system-aarch64 ./tests/qtest/npcm7xx_rng-test; do true; done
+Bits 12..22 and bit 58 are for PMU registers.
-will fail in less than a minute with an error like:
-ERROR:../../tests/qtest/npcm7xx_rng-test.c:256:test_first_byte_runs:
+The remaining bits in HDFGRTR/HDFGWTR are for traps on
-assertion failed (calc_runs_p(buf.l, sizeof(buf) * BITS_PER_BYTE) > 0.01): (0.00286205989 > 0.01)
+registers that are part of features we don't implement:
-(Failures have been observed on all 4 of the randomness tests,
+Bits 23..32 and 63 : FEAT_SPE
-not just first_byte_runs.)
+Bits 33..48 : FEAT_ETE
+Bits 50..56 : FEAT_TRBE
-It's not clear why these tests are failing like this, but intermittent
+Bits 59..61 : FEAT_BRBE
-failures make CI and merge testing awkward, so disable running them
+Bit 62 : FEAT_SPEv1p2.
 unless a developer specifically sets QEMU_TEST_FLAKY_RNG_TESTS when
 running the test suite, until we work out the cause.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201102152454.8287-1-peter.maydell@linaro.org
+Tested-by: Fuad Tabba <tabba@google.com>
-Reviewed-by: Havard Skinnemoen <hskinnemoen@google.com>
+Message-id: 20230130182459.3309057-16-peter.maydell@linaro.org
 Message-id: 20230127175507.2895013-16-peter.maydell@linaro.org
 ---
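
The bit ranges listed in the commit message above can be restated as a small lookup. This is only a sketch of that list, not code from the series: the function name is hypothetical, bits 0..11 are labelled "debug" following the previous patch's description, and bits the message does not mention (49 and 57) are reported as "not listed".

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: which feature owns a given HDFGRTR/HDFGWTR bit. */
static const char *toy_hdfgrtr_bit_owner(unsigned bit)
{
    assert(bit < 64);
    if (bit <= 11) {
        return "debug";
    }
    if (bit <= 22 || bit == 58) {
        return "PMU";
    }
    if (bit <= 32 || bit == 63) {
        return "FEAT_SPE";
    }
    if (bit <= 48) {
        return "FEAT_ETE";
    }
    if (bit >= 50 && bit <= 56) {
        return "FEAT_TRBE";
    }
    if (bit >= 59 && bit <= 61) {
        return "FEAT_BRBE";
    }
    if (bit == 62) {
        return "FEAT_SPEv1p2";
    }
    return "not listed";  /* bits 49 and 57 */
}
```

Only the "debug" and "PMU" rows correspond to registers that this series marks up; the remaining rows belong to features the commit message says are not implemented.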
- tests/qtest/npcm7xx_rng-test.c | 14 ++++++++++----
+ target/arm/cpregs.h | 12 ++++++++++++
-file changed, 10 insertions(+), 4 deletions(-)
+ target/arm/helper.c | 37 +++++++++++++++++++++++++++++++++++++
+files changed, 49 insertions(+)
-diff --git a/tests/qtest/npcm7xx_rng-test.c b/tests/qtest/npcm7xx_rng-test.c
 diff --git a/target/arm/cpregs.h b/target/arm/cpregs.h
 index XXXXXXX..XXXXXXX 100644
---- a/tests/qtest/npcm7xx_rng-test.c
+--- a/target/arm/cpregs.h
-+++ b/tests/qtest/npcm7xx_rng-test.c
++++ b/target/arm/cpregs.h
-@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv)
+@@ -XXX,XX +XXX,XX @@ typedef enum FGTBit {
+     DO_BIT(HDFGRTR, OSLSR_EL1),
-     qtest_add_func("npcm7xx_rng/enable_disable", test_enable_disable);
+     DO_BIT(HDFGRTR, OSECCR_EL1),
-     qtest_add_func("npcm7xx_rng/rosel", test_rosel);
+     DO_BIT(HDFGRTR, OSDLR_EL1),
--    qtest_add_func("npcm7xx_rng/continuous/monobit", test_continuous_monobit);
++    DO_BIT(HDFGRTR, PMEVCNTRN_EL0),
--    qtest_add_func("npcm7xx_rng/continuous/runs", test_continuous_runs);
++    DO_BIT(HDFGRTR, PMEVTYPERN_EL0),
--    qtest_add_func("npcm7xx_rng/first_byte/monobit", test_first_byte_monobit);
++    DO_BIT(HDFGRTR, PMCCFILTR_EL0),
--    qtest_add_func("npcm7xx_rng/first_byte/runs", test_first_byte_runs);
++    DO_BIT(HDFGRTR, PMCCNTR_EL0),
-+    /*
++    DO_BIT(HDFGRTR, PMCNTEN),
-+     * These tests fail intermittently; only run them on explicit
++    DO_BIT(HDFGRTR, PMINTEN),
-+     * request until we figure out why.
++    DO_BIT(HDFGRTR, PMOVS),
-+     */
++    DO_BIT(HDFGRTR, PMSELR_EL0),
-+    if (getenv("QEMU_TEST_FLAKY_RNG_TESTS")) {
++    DO_BIT(HDFGWTR, PMSWINC_EL0),
-+        qtest_add_func("npcm7xx_rng/continuous/monobit", test_continuous_monobit);
++    DO_BIT(HDFGWTR, PMCR_EL0),
-+        qtest_add_func("npcm7xx_rng/continuous/runs", test_continuous_runs);
++    DO_BIT(HDFGRTR, PMMIR_EL1),
-+        qtest_add_func("npcm7xx_rng/first_byte/monobit", test_first_byte_monobit);
++    DO_BIT(HDFGRTR, PMCEIDN_EL0),
-+        qtest_add_func("npcm7xx_rng/first_byte/runs", test_first_byte_runs);
+ } FGTBit;
-+    }
+ #undef DO_BIT
-     qtest_start("-machine npcm750-evb");
+diff --git a/target/arm/helper.c b/target/arm/helper.c
-     ret = g_test_run();
+index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
        .fieldoffset = offsetoflow32(CPUARMState, cp15.c9_pmcnten),
        .writefn = pmcntenset_write,
        .accessfn = pmreg_access,
 +      .fgt = FGT_PMCNTEN,
        .raw_writefn = raw_write },
      { .name = "PMCNTENSET_EL0", .state = ARM_CP_STATE_AA64, .type = ARM_CP_IO,
        .opc0 = 3, .opc1 = 3, .crn = 9, .crm = 12, .opc2 = 1,
        .access = PL0_RW, .accessfn = pmreg_access,
 +      .fgt = FGT_PMCNTEN,
        .fieldoffset = offsetof(CPUARMState, cp15.c9_pmcnten), .resetvalue = 0,
        .writefn = pmcntenset_write, .raw_writefn = raw_write },
      { .name = "PMCNTENCLR", .cp = 15, .crn = 9, .crm = 12, .opc1 = 0, .opc2 = 2,
        .access = PL0_RW,
        .fieldoffset = offsetoflow32(CPUARMState, cp15.c9_pmcnten),
        .accessfn = pmreg_access,
 +      .fgt = FGT_PMCNTEN,
        .writefn = pmcntenclr_write,
        .type = ARM_CP_ALIAS | ARM_CP_IO },
      { .name = "PMCNTENCLR_EL0", .state = ARM_CP_STATE_AA64,
        .opc0 = 3, .opc1 = 3, .crn = 9, .crm = 12, .opc2 = 2,
        .access = PL0_RW, .accessfn = pmreg_access,
 +      .fgt = FGT_PMCNTEN,
        .type = ARM_CP_ALIAS | ARM_CP_IO,
        .fieldoffset = offsetof(CPUARMState, cp15.c9_pmcnten),
        .writefn = pmcntenclr_write },
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
        .access = PL0_RW, .type = ARM_CP_IO,
        .fieldoffset = offsetoflow32(CPUARMState, cp15.c9_pmovsr),
        .accessfn = pmreg_access,
 +      .fgt = FGT_PMOVS,
        .writefn = pmovsr_write,
        .raw_writefn = raw_write },
      { .name = "PMOVSCLR_EL0", .state = ARM_CP_STATE_AA64,
        .opc0 = 3, .opc1 = 3, .crn = 9, .crm = 12, .opc2 = 3,
        .access = PL0_RW, .accessfn = pmreg_access,
 +      .fgt = FGT_PMOVS,
        .type = ARM_CP_ALIAS | ARM_CP_IO,
        .fieldoffset = offsetof(CPUARMState, cp15.c9_pmovsr),
        .writefn = pmovsr_write,
        .raw_writefn = raw_write },
      { .name = "PMSWINC", .cp = 15, .crn = 9, .crm = 12, .opc1 = 0, .opc2 = 4,
        .access = PL0_W, .accessfn = pmreg_access_swinc,
 +      .fgt = FGT_PMSWINC_EL0,
        .type = ARM_CP_NO_RAW | ARM_CP_IO,
        .writefn = pmswinc_write },
      { .name = "PMSWINC_EL0", .state = ARM_CP_STATE_AA64,
        .opc0 = 3, .opc1 = 3, .crn = 9, .crm = 12, .opc2 = 4,
        .access = PL0_W, .accessfn = pmreg_access_swinc,
 +      .fgt = FGT_PMSWINC_EL0,
        .type = ARM_CP_NO_RAW | ARM_CP_IO,
        .writefn = pmswinc_write },
      { .name = "PMSELR", .cp = 15, .crn = 9, .crm = 12, .opc1 = 0, .opc2 = 5,
        .access = PL0_RW, .type = ARM_CP_ALIAS,
 +      .fgt = FGT_PMSELR_EL0,
        .fieldoffset = offsetoflow32(CPUARMState, cp15.c9_pmselr),
        .accessfn = pmreg_access_selr, .writefn = pmselr_write,
        .raw_writefn = raw_write},
      { .name = "PMSELR_EL0", .state = ARM_CP_STATE_AA64,
        .opc0 = 3, .opc1 = 3, .crn = 9, .crm = 12, .opc2 = 5,
        .access = PL0_RW, .accessfn = pmreg_access_selr,
 +      .fgt = FGT_PMSELR_EL0,
        .fieldoffset = offsetof(CPUARMState, cp15.c9_pmselr),
        .writefn = pmselr_write, .raw_writefn = raw_write, },
      { .name = "PMCCNTR", .cp = 15, .crn = 9, .crm = 13, .opc1 = 0, .opc2 = 0,
        .access = PL0_RW, .resetvalue = 0, .type = ARM_CP_ALIAS | ARM_CP_IO,
 +      .fgt = FGT_PMCCNTR_EL0,
        .readfn = pmccntr_read, .writefn = pmccntr_write32,
        .accessfn = pmreg_access_ccntr },
      { .name = "PMCCNTR_EL0", .state = ARM_CP_STATE_AA64,
        .opc0 = 3, .opc1 = 3, .crn = 9, .crm = 13, .opc2 = 0,
        .access = PL0_RW, .accessfn = pmreg_access_ccntr,
 +      .fgt = FGT_PMCCNTR_EL0,
        .type = ARM_CP_IO,
        .fieldoffset = offsetof(CPUARMState, cp15.c15_ccnt),
        .readfn = pmccntr_read, .writefn = pmccntr_write,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
      { .name = "PMCCFILTR", .cp = 15, .opc1 = 0, .crn = 14, .crm = 15, .opc2 = 7,
        .writefn = pmccfiltr_write_a32, .readfn = pmccfiltr_read_a32,
        .access = PL0_RW, .accessfn = pmreg_access,
 +      .fgt = FGT_PMCCFILTR_EL0,
        .type = ARM_CP_ALIAS | ARM_CP_IO,
        .resetvalue = 0, },
      { .name = "PMCCFILTR_EL0", .state = ARM_CP_STATE_AA64,
        .opc0 = 3, .opc1 = 3, .crn = 14, .crm = 15, .opc2 = 7,
        .writefn = pmccfiltr_write, .raw_writefn = raw_write,
        .access = PL0_RW, .accessfn = pmreg_access,
 +      .fgt = FGT_PMCCFILTR_EL0,
        .type = ARM_CP_IO,
        .fieldoffset = offsetof(CPUARMState, cp15.pmccfiltr_el0),
        .resetvalue = 0, },
      { .name = "PMXEVTYPER", .cp = 15, .crn = 9, .crm = 13, .opc1 = 0, .opc2 = 1,
        .access = PL0_RW, .type = ARM_CP_NO_RAW | ARM_CP_IO,
        .accessfn = pmreg_access,
 +      .fgt = FGT_PMEVTYPERN_EL0,
        .writefn = pmxevtyper_write, .readfn = pmxevtyper_read },
      { .name = "PMXEVTYPER_EL0", .state = ARM_CP_STATE_AA64,
        .opc0 = 3, .opc1 = 3, .crn = 9, .crm = 13, .opc2 = 1,
        .access = PL0_RW, .type = ARM_CP_NO_RAW | ARM_CP_IO,
        .accessfn = pmreg_access,
 +      .fgt = FGT_PMEVTYPERN_EL0,
        .writefn = pmxevtyper_write, .readfn = pmxevtyper_read },
      { .name = "PMXEVCNTR", .cp = 15, .crn = 9, .crm = 13, .opc1 = 0, .opc2 = 2,
        .access = PL0_RW, .type = ARM_CP_NO_RAW | ARM_CP_IO,
        .accessfn = pmreg_access_xevcntr,
 +      .fgt = FGT_PMEVCNTRN_EL0,
        .writefn = pmxevcntr_write, .readfn = pmxevcntr_read },
      { .name = "PMXEVCNTR_EL0", .state = ARM_CP_STATE_AA64,
        .opc0 = 3, .opc1 = 3, .crn = 9, .crm = 13, .opc2 = 2,
        .access = PL0_RW, .type = ARM_CP_NO_RAW | ARM_CP_IO,
        .accessfn = pmreg_access_xevcntr,
 +      .fgt = FGT_PMEVCNTRN_EL0,
        .writefn = pmxevcntr_write, .readfn = pmxevcntr_read },
      { .name = "PMUSERENR", .cp = 15, .crn = 9, .crm = 14, .opc1 = 0, .opc2 = 0,
        .access = PL0_R | PL1_RW, .accessfn = access_tpm,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
        .writefn = pmuserenr_write, .raw_writefn = raw_write },
      { .name = "PMINTENSET", .cp = 15, .crn = 9, .crm = 14, .opc1 = 0, .opc2 = 1,
        .access = PL1_RW, .accessfn = access_tpm,
 +      .fgt = FGT_PMINTEN,
        .type = ARM_CP_ALIAS | ARM_CP_IO,
        .fieldoffset = offsetoflow32(CPUARMState, cp15.c9_pminten),
        .resetvalue = 0,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
      { .name = "PMINTENSET_EL1", .state = ARM_CP_STATE_AA64,
        .opc0 = 3, .opc1 = 0, .crn = 9, .crm = 14, .opc2 = 1,
        .access = PL1_RW, .accessfn = access_tpm,
 +      .fgt = FGT_PMINTEN,
        .type = ARM_CP_IO,
        .fieldoffset = offsetof(CPUARMState, cp15.c9_pminten),
        .writefn = pmintenset_write, .raw_writefn = raw_write,
        .resetvalue = 0x0 },
      { .name = "PMINTENCLR", .cp = 15, .crn = 9, .crm = 14, .opc1 = 0, .opc2 = 2,
        .access = PL1_RW, .accessfn = access_tpm,
 +      .fgt = FGT_PMINTEN,
        .type = ARM_CP_ALIAS | ARM_CP_IO | ARM_CP_NO_RAW,
        .fieldoffset = offsetof(CPUARMState, cp15.c9_pminten),
        .writefn = pmintenclr_write, },
      { .name = "PMINTENCLR_EL1", .state = ARM_CP_STATE_AA64,
        .opc0 = 3, .opc1 = 0, .crn = 9, .crm = 14, .opc2 = 2,
        .access = PL1_RW, .accessfn = access_tpm,
 +      .fgt = FGT_PMINTEN,
        .type = ARM_CP_ALIAS | ARM_CP_IO | ARM_CP_NO_RAW,
        .fieldoffset = offsetof(CPUARMState, cp15.c9_pminten),
        .writefn = pmintenclr_write },
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo pmovsset_cp_reginfo[] = {
      /* PMOVSSET is not implemented in v7 before v7ve */
      { .name = "PMOVSSET", .cp = 15, .opc1 = 0, .crn = 9, .crm = 14, .opc2 = 3,
        .access = PL0_RW, .accessfn = pmreg_access,
 +      .fgt = FGT_PMOVS,
        .type = ARM_CP_ALIAS | ARM_CP_IO,
        .fieldoffset = offsetoflow32(CPUARMState, cp15.c9_pmovsr),
        .writefn = pmovsset_write,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo pmovsset_cp_reginfo[] = {
      { .name = "PMOVSSET_EL0", .state = ARM_CP_STATE_AA64,
        .opc0 = 3, .opc1 = 3, .crn = 9, .crm = 14, .opc2 = 3,
        .access = PL0_RW, .accessfn = pmreg_access,
 +      .fgt = FGT_PMOVS,
        .type = ARM_CP_ALIAS | ARM_CP_IO,
        .fieldoffset = offsetof(CPUARMState, cp15.c9_pmovsr),
        .writefn = pmovsset_write,
@@ -XXX,XX +XXX,XX @@ static void define_pmu_regs(ARMCPU *cpu)
      ARMCPRegInfo pmcr = {
          .name = "PMCR", .cp = 15, .crn = 9, .crm = 12, .opc1 = 0, .opc2 = 0,
          .access = PL0_RW,
 +        .fgt = FGT_PMCR_EL0,
          .type = ARM_CP_IO | ARM_CP_ALIAS,
          .fieldoffset = offsetoflow32(CPUARMState, cp15.c9_pmcr),
          .accessfn = pmreg_access, .writefn = pmcr_write,
@@ -XXX,XX +XXX,XX @@ static void define_pmu_regs(ARMCPU *cpu)
          .name = "PMCR_EL0", .state = ARM_CP_STATE_AA64,
          .opc0 = 3, .opc1 = 3, .crn = 9, .crm = 12, .opc2 = 0,
          .access = PL0_RW, .accessfn = pmreg_access,
 +        .fgt = FGT_PMCR_EL0,
          .type = ARM_CP_IO,
          .fieldoffset = offsetof(CPUARMState, cp15.c9_pmcr),
          .resetvalue = cpu->isar.reset_pmcr_el0,
@@ -XXX,XX +XXX,XX @@ static void define_pmu_regs(ARMCPU *cpu)
              { .name = pmevcntr_name, .cp = 15, .crn = 14,
                .crm = 8 | (3 & (i >> 3)), .opc1 = 0, .opc2 = i & 7,
                .access = PL0_RW, .type = ARM_CP_IO | ARM_CP_ALIAS,
 +              .fgt = FGT_PMEVCNTRN_EL0,
                .readfn = pmevcntr_readfn, .writefn = pmevcntr_writefn,
                .accessfn = pmreg_access_xevcntr },
              { .name = pmevcntr_el0_name, .state = ARM_CP_STATE_AA64,
                .opc0 = 3, .opc1 = 3, .crn = 14, .crm = 8 | (3 & (i >> 3)),
                .opc2 = i & 7, .access = PL0_RW, .accessfn = pmreg_access_xevcntr,
                .type = ARM_CP_IO,
 +              .fgt = FGT_PMEVCNTRN_EL0,
                .readfn = pmevcntr_readfn, .writefn = pmevcntr_writefn,
                .raw_readfn = pmevcntr_rawread,
                .raw_writefn = pmevcntr_rawwrite },
              { .name = pmevtyper_name, .cp = 15, .crn = 14,
                .crm = 12 | (3 & (i >> 3)), .opc1 = 0, .opc2 = i & 7,
                .access = PL0_RW, .type = ARM_CP_IO | ARM_CP_ALIAS,
 +              .fgt = FGT_PMEVTYPERN_EL0,
                .readfn = pmevtyper_readfn, .writefn = pmevtyper_writefn,
                .accessfn = pmreg_access },
              { .name = pmevtyper_el0_name, .state = ARM_CP_STATE_AA64,
                .opc0 = 3, .opc1 = 3, .crn = 14, .crm = 12 | (3 & (i >> 3)),
                .opc2 = i & 7, .access = PL0_RW, .accessfn = pmreg_access,
 +              .fgt = FGT_PMEVTYPERN_EL0,
                .type = ARM_CP_IO,
                .readfn = pmevtyper_readfn, .writefn = pmevtyper_writefn,
                .raw_writefn = pmevtyper_rawwrite },
@@ -XXX,XX +XXX,XX @@ static void define_pmu_regs(ARMCPU *cpu)
              { .name = "PMCEID2", .state = ARM_CP_STATE_AA32,
                .cp = 15, .opc1 = 0, .crn = 9, .crm = 14, .opc2 = 4,
                .access = PL0_R, .accessfn = pmreg_access, .type = ARM_CP_CONST,
 +              .fgt = FGT_PMCEIDN_EL0,
                .resetvalue = extract64(cpu->pmceid0, 32, 32) },
              { .name = "PMCEID3", .state = ARM_CP_STATE_AA32,
                .cp = 15, .opc1 = 0, .crn = 9, .crm = 14, .opc2 = 5,
                .access = PL0_R, .accessfn = pmreg_access, .type = ARM_CP_CONST,
 +              .fgt = FGT_PMCEIDN_EL0,
                .resetvalue = extract64(cpu->pmceid1, 32, 32) },
          };
          define_arm_cp_regs(cpu, v81_pmu_regs);
@@ -XXX,XX +XXX,XX @@ static void define_pmu_regs(ARMCPU *cpu)
              .name = "PMMIR_EL1", .state = ARM_CP_STATE_BOTH,
              .opc0 = 3, .opc1 = 0, .crn = 9, .crm = 14, .opc2 = 6,
              .access = PL1_R, .accessfn = pmreg_access, .type = ARM_CP_CONST,
 +            .fgt = FGT_PMMIR_EL1,
              .resetvalue = 0
          };
          define_one_arm_cp_reg(cpu, &v84_pmmir);
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
              { .name = "PMCEID0", .state = ARM_CP_STATE_AA32,
                .cp = 15, .opc1 = 0, .crn = 9, .crm = 12, .opc2 = 6,
                .access = PL0_R, .accessfn = pmreg_access, .type = ARM_CP_CONST,
 +              .fgt = FGT_PMCEIDN_EL0,
                .resetvalue = extract64(cpu->pmceid0, 0, 32) },
              { .name = "PMCEID0_EL0", .state = ARM_CP_STATE_AA64,
                .opc0 = 3, .opc1 = 3, .crn = 9, .crm = 12, .opc2 = 6,
                .access = PL0_R, .accessfn = pmreg_access, .type = ARM_CP_CONST,
 +              .fgt = FGT_PMCEIDN_EL0,
                .resetvalue = cpu->pmceid0 },
              { .name = "PMCEID1", .state = ARM_CP_STATE_AA32,
                .cp = 15, .opc1 = 0, .crn = 9, .crm = 12, .opc2 = 7,
                .access = PL0_R, .accessfn = pmreg_access, .type = ARM_CP_CONST,
 +              .fgt = FGT_PMCEIDN_EL0,
                .resetvalue = extract64(cpu->pmceid1, 0, 32) },
              { .name = "PMCEID1_EL0", .state = ARM_CP_STATE_AA64,
                .opc0 = 3, .opc1 = 3, .crn = 9, .crm = 12, .opc2 = 7,
                .access = PL0_R, .accessfn = pmreg_access, .type = ARM_CP_CONST,
 +              .fgt = FGT_PMCEIDN_EL0,
                .resetvalue = cpu->pmceid1 },
          };
  #ifdef CONFIG_USER_ONLY
 --
-.20.1
+.34.1

-[PULL 25/26] qemu-option-trace.rst.inc: Don't use option:: markup
+[PULL 26/33] target/arm: Mark up sysregs for HFGITR bits 0..11
-Sphinx 3.2 is pickier than earlier versions about the option:: markup,
+Mark up the sysreg definitions for the system instructions
-and complains about our usage in qemu-option-trace.rst:
+trapped by HFGITR bits 0..11. These bits cover various
+cache maintenance operations.
 ../../docs/qemu-option-trace.rst.inc:4:Malformed option description
   '[enable=]PATTERN', should look like "opt", "-opt args", "--opt args",
   "/opt args" or "+opt args"
 In this file, we're really trying to document the different parts of
 the top-level --trace option, which qemu-nbd.rst and qemu-img.rst
 have already introduced with an option:: markup.  So it's not right
 to use option:: here anyway.  Switch to a different markup
 (definition lists) which gives about the same formatted output.
 (Unlike option::, this markup doesn't produce index entries; but
 at the moment we don't do anything much with indexes anyway, and
 in any case I think it doesn't make much sense to have individual
 index entries for the sub-parts of the --trace option.)
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Tested-by: Stefan Hajnoczi <stefanha@redhat.com>
+Tested-by: Fuad Tabba <tabba@google.com>
-Message-id: 20201030174700.7204-3-peter.maydell@linaro.org
+Message-id: 20230130182459.3309057-17-peter.maydell@linaro.org
 Message-id: 20230127175507.2895013-17-peter.maydell@linaro.org
 ---
- docs/qemu-option-trace.rst.inc | 6 +++---
+ target/arm/cpregs.h | 14 ++++++++++++++
-file changed, 3 insertions(+), 3 deletions(-)
+ target/arm/helper.c | 28 ++++++++++++++++++++++++++++
 files changed, 42 insertions(+)
-diff --git a/docs/qemu-option-trace.rst.inc b/docs/qemu-option-trace.rst.inc
+diff --git a/target/arm/cpregs.h b/target/arm/cpregs.h
 index XXXXXXX..XXXXXXX 100644
---- a/docs/qemu-option-trace.rst.inc
+--- a/target/arm/cpregs.h
-+++ b/docs/qemu-option-trace.rst.inc
++++ b/target/arm/cpregs.h
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ typedef enum FGTBit {
+     DO_BIT(HDFGWTR, PMCR_EL0),
- Specify tracing options.
+     DO_BIT(HDFGRTR, PMMIR_EL1),
+     DO_BIT(HDFGRTR, PMCEIDN_EL0),
--.. option:: [enable=]PATTERN
++
-+``[enable=]PATTERN``
++    /* Trap bits in HFGITR_EL2, starting from bit 0 */
++    DO_BIT(HFGITR, ICIALLUIS),
-   Immediately enable events matching *PATTERN*
++    DO_BIT(HFGITR, ICIALLU),
-   (either event name or a globbing pattern).  This option is only
++    DO_BIT(HFGITR, ICIVAU),
-@@ -XXX,XX +XXX,XX @@ Specify tracing options.
++    DO_BIT(HFGITR, DCIVAC),
++    DO_BIT(HFGITR, DCISW),
-   Use :option:`-trace help` to print a list of names of trace points.
++    DO_BIT(HFGITR, DCCSW),
++    DO_BIT(HFGITR, DCCISW),
--.. option:: events=FILE
++    DO_BIT(HFGITR, DCCVAU),
-+``events=FILE``
++    DO_BIT(HFGITR, DCCVAP),
++    DO_BIT(HFGITR, DCCVADP),
-   Immediately enable events listed in *FILE*.
++    DO_BIT(HFGITR, DCCIVAC),
-   The file must contain one event name (as listed in the ``trace-events-all``
++    DO_BIT(HFGITR, DCZVA),
-@@ -XXX,XX +XXX,XX @@ Specify tracing options.
+ } FGTBit;
-   available if QEMU has been compiled with the ``simple``, ``log`` or
-   ``ftrace`` tracing backend.
+ #undef DO_BIT
+diff --git a/target/arm/helper.c b/target/arm/helper.c
--.. option:: file=FILE
+index XXXXXXX..XXXXXXX 100644
-+``file=FILE``
+--- a/target/arm/helper.c
++++ b/target/arm/helper.c
-   Log output traces to *FILE*.
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
-   This option is only available if QEMU has been compiled with
+ #ifndef CONFIG_USER_ONLY
        /* Avoid overhead of an access check that always passes in user-mode */
        .accessfn = aa64_zva_access,
 +      .fgt = FGT_DCZVA,
  #endif
      },
      { .name = "CURRENTEL", .state = ARM_CP_STATE_AA64,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
      { .name = "IC_IALLUIS", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 1, .opc2 = 0,
        .access = PL1_W, .type = ARM_CP_NOP,
 +      .fgt = FGT_ICIALLUIS,
        .accessfn = access_ticab },
      { .name = "IC_IALLU", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 5, .opc2 = 0,
        .access = PL1_W, .type = ARM_CP_NOP,
 +      .fgt = FGT_ICIALLU,
        .accessfn = access_tocu },
      { .name = "IC_IVAU", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 5, .opc2 = 1,
        .access = PL0_W, .type = ARM_CP_NOP,
 +      .fgt = FGT_ICIVAU,
        .accessfn = access_tocu },
      { .name = "DC_IVAC", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 6, .opc2 = 1,
        .access = PL1_W, .accessfn = aa64_cacheop_poc_access,
 +      .fgt = FGT_DCIVAC,
        .type = ARM_CP_NOP },
      { .name = "DC_ISW", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 6, .opc2 = 2,
 +      .fgt = FGT_DCISW,
        .access = PL1_W, .accessfn = access_tsw, .type = ARM_CP_NOP },
      { .name = "DC_CVAC", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 10, .opc2 = 1,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
        .accessfn = aa64_cacheop_poc_access },
      { .name = "DC_CSW", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 10, .opc2 = 2,
 +      .fgt = FGT_DCCSW,
        .access = PL1_W, .accessfn = access_tsw, .type = ARM_CP_NOP },
      { .name = "DC_CVAU", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 11, .opc2 = 1,
        .access = PL0_W, .type = ARM_CP_NOP,
 +      .fgt = FGT_DCCVAU,
        .accessfn = access_tocu },
      { .name = "DC_CIVAC", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 14, .opc2 = 1,
        .access = PL0_W, .type = ARM_CP_NOP,
 +      .fgt = FGT_DCCIVAC,
        .accessfn = aa64_cacheop_poc_access },
      { .name = "DC_CISW", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 14, .opc2 = 2,
 +      .fgt = FGT_DCCISW,
        .access = PL1_W, .accessfn = access_tsw, .type = ARM_CP_NOP },
      /* TLBI operations */
      { .name = "TLBI_VMALLE1IS", .state = ARM_CP_STATE_AA64,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo dcpop_reg[] = {
      { .name = "DC_CVAP", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 12, .opc2 = 1,
        .access = PL0_W, .type = ARM_CP_NO_RAW | ARM_CP_SUPPRESS_TB_END,
 +      .fgt = FGT_DCCVAP,
        .accessfn = aa64_cacheop_poc_access, .writefn = dccvap_writefn },
  };
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo dcpodp_reg[] = {
      { .name = "DC_CVADP", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 13, .opc2 = 1,
        .access = PL0_W, .type = ARM_CP_NO_RAW | ARM_CP_SUPPRESS_TB_END,
 +      .fgt = FGT_DCCVADP,
        .accessfn = aa64_cacheop_poc_access, .writefn = dccvap_writefn },
  };
  #endif /*CONFIG_USER_ONLY*/
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo mte_reginfo[] = {
      { .name = "DC_IGVAC", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 6, .opc2 = 3,
        .type = ARM_CP_NOP, .access = PL1_W,
 +      .fgt = FGT_DCIVAC,
        .accessfn = aa64_cacheop_poc_access },
      { .name = "DC_IGSW", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 6, .opc2 = 4,
 +      .fgt = FGT_DCISW,
        .type = ARM_CP_NOP, .access = PL1_W, .accessfn = access_tsw },
      { .name = "DC_IGDVAC", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 6, .opc2 = 5,
        .type = ARM_CP_NOP, .access = PL1_W,
 +      .fgt = FGT_DCIVAC,
        .accessfn = aa64_cacheop_poc_access },
      { .name = "DC_IGDSW", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 6, .opc2 = 6,
 +      .fgt = FGT_DCISW,
        .type = ARM_CP_NOP, .access = PL1_W, .accessfn = access_tsw },
      { .name = "DC_CGSW", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 10, .opc2 = 4,
 +      .fgt = FGT_DCCSW,
        .type = ARM_CP_NOP, .access = PL1_W, .accessfn = access_tsw },
      { .name = "DC_CGDSW", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 10, .opc2 = 6,
 +      .fgt = FGT_DCCSW,
        .type = ARM_CP_NOP, .access = PL1_W, .accessfn = access_tsw },
      { .name = "DC_CIGSW", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 14, .opc2 = 4,
 +      .fgt = FGT_DCCISW,
        .type = ARM_CP_NOP, .access = PL1_W, .accessfn = access_tsw },
      { .name = "DC_CIGDSW", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 14, .opc2 = 6,
 +      .fgt = FGT_DCCISW,
        .type = ARM_CP_NOP, .access = PL1_W, .accessfn = access_tsw },
  };
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo mte_el0_cacheop_reginfo[] = {
      { .name = "DC_CGVAP", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 12, .opc2 = 3,
        .type = ARM_CP_NOP, .access = PL0_W,
 +      .fgt = FGT_DCCVAP,
        .accessfn = aa64_cacheop_poc_access },
      { .name = "DC_CGDVAP", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 12, .opc2 = 5,
        .type = ARM_CP_NOP, .access = PL0_W,
 +      .fgt = FGT_DCCVAP,
        .accessfn = aa64_cacheop_poc_access },
      { .name = "DC_CGVADP", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 13, .opc2 = 3,
        .type = ARM_CP_NOP, .access = PL0_W,
 +      .fgt = FGT_DCCVADP,
        .accessfn = aa64_cacheop_poc_access },
      { .name = "DC_CGDVADP", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 13, .opc2 = 5,
        .type = ARM_CP_NOP, .access = PL0_W,
 +      .fgt = FGT_DCCVADP,
        .accessfn = aa64_cacheop_poc_access },
      { .name = "DC_CIGVAC", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 14, .opc2 = 3,
        .type = ARM_CP_NOP, .access = PL0_W,
 +      .fgt = FGT_DCCIVAC,
        .accessfn = aa64_cacheop_poc_access },
      { .name = "DC_CIGDVAC", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 14, .opc2 = 5,
        .type = ARM_CP_NOP, .access = PL0_W,
 +      .fgt = FGT_DCCIVAC,
        .accessfn = aa64_cacheop_poc_access },
      { .name = "DC_GVA", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 4, .opc2 = 3,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo mte_el0_cacheop_reginfo[] = {
  #ifndef CONFIG_USER_ONLY
        /* Avoid overhead of an access check that always passes in user-mode */
        .accessfn = aa64_zva_access,
 +      .fgt = FGT_DCZVA,
  #endif
      },
      { .name = "DC_GZVA", .state = ARM_CP_STATE_AA64,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo mte_el0_cacheop_reginfo[] = {
  #ifndef CONFIG_USER_ONLY
        /* Avoid overhead of an access check that always passes in user-mode */
        .accessfn = aa64_zva_access,
 +      .fgt = FGT_DCZVA,
  #endif
      },
  };
 --
-.20.1
+.34.1
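
Illustrative note on the patch above: the .fgt markup ties each system instruction to one trap bit, and several instructions can share a bit (for example DC_IGVAC and DC_IGDVAC both use FGT_DCIVAC). The following is a minimal sketch of that idea only. Every DEMO_* and demo_* name is hypothetical, the encoding (bit position in the low six bits plus a group flag) is an assumption made for illustration, and none of it is QEMU's actual FGTBit layout or trap-check code. The bit positions follow the order the patch lists them in, starting from bit 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical, simplified encoding (NOT QEMU's real FGTBit layout):
 * the low 6 bits hold the bit position inside the trap register, and
 * DEMO_FGT_EXEC marks the value as belonging to the instruction-trap
 * register (HFGITR_EL2).
 */
#define DEMO_FGT_POS_MASK 0x3f
#define DEMO_FGT_EXEC     (1 << 8)

/* Same shape as DO_BIT(HFGITR, NAME), but with an explicit position. */
#define DEMO_BIT(NAME, POS) DEMO_FGT_##NAME = DEMO_FGT_EXEC | (POS)

typedef enum DemoFGTBit {
    DEMO_FGT_NONE = 0,          /* no fine-grained trap applies */
    DEMO_BIT(ICIALLUIS, 0),
    DEMO_BIT(ICIALLU, 1),
    DEMO_BIT(ICIVAU, 2),
    DEMO_BIT(DCIVAC, 3),
    DEMO_BIT(DCISW, 4),
    DEMO_BIT(DCCSW, 5),
    DEMO_BIT(DCCISW, 6),
    DEMO_BIT(DCCVAU, 7),
    DEMO_BIT(DCCVAP, 8),
    DEMO_BIT(DCCVADP, 9),
    DEMO_BIT(DCCIVAC, 10),
    DEMO_BIT(DCZVA, 11),
} DemoFGTBit;

/* Hypothetical stand-in for a register-definition table with .fgt markup. */
typedef struct DemoInsn {
    const char *name;
    DemoFGTBit fgt;
} DemoInsn;

static const DemoInsn demo_insns[] = {
    { "DC_IVAC",   DEMO_FGT_DCIVAC },
    { "DC_IGVAC",  DEMO_FGT_DCIVAC },   /* shares the DC IVAC trap bit */
    { "DC_IGDVAC", DEMO_FGT_DCIVAC },   /* shares the DC IVAC trap bit */
    { "DC_ZVA",    DEMO_FGT_DCZVA },
};

/* Return the trap marker for an instruction name, or DEMO_FGT_NONE. */
static DemoFGTBit demo_lookup(const char *name)
{
    for (size_t i = 0; i < sizeof(demo_insns) / sizeof(demo_insns[0]); i++) {
        if (strcmp(demo_insns[i].name, name) == 0) {
            return demo_insns[i].fgt;
        }
    }
    return DEMO_FGT_NONE;
}

/* True if the given trap-register value has this instruction's bit set. */
static bool demo_fgt_traps(DemoFGTBit fgt, uint64_t hfgitr)
{
    if (!(fgt & DEMO_FGT_EXEC)) {
        return false;
    }
    return (hfgitr >> (fgt & DEMO_FGT_POS_MASK)) & 1;
}

/* Usage example: setting bit 3 traps DC_IVAC and its MTE variants alike. */
static void demo_self_check(void)
{
    uint64_t hfgitr = UINT64_C(1) << 3;

    assert(demo_fgt_traps(demo_lookup("DC_IVAC"), hfgitr));
    assert(demo_fgt_traps(demo_lookup("DC_IGVAC"), hfgitr));
    assert(!demo_fgt_traps(demo_lookup("DC_ZVA"), hfgitr));
}
```

Keeping the marker in the per-instruction table means a new instruction that shares an existing trap bit needs only one extra line of markup, which is the pattern the MTE hunks in the patch follow.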

-New patch
+[PULL 27/33] target/arm: Mark up sysregs for HFGITR bits 12..17
+Mark up the sysreg definitions for the system instructions
+trapped by HFGITR bits 12..17. These bits cover AT address
+translation instructions.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Tested-by: Fuad Tabba <tabba@google.com>
+Message-id: 20230130182459.3309057-18-peter.maydell@linaro.org
+Message-id: 20230127175507.2895013-18-peter.maydell@linaro.org
+---
+ target/arm/cpregs.h | 6 ++++++
+ target/arm/helper.c | 6 ++++++
+files changed, 12 insertions(+)
+diff --git a/target/arm/cpregs.h b/target/arm/cpregs.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpregs.h
++++ b/target/arm/cpregs.h
+@@ -XXX,XX +XXX,XX @@ typedef enum FGTBit {
+     DO_BIT(HFGITR, DCCVADP),
+     DO_BIT(HFGITR, DCCIVAC),
+     DO_BIT(HFGITR, DCZVA),
++    DO_BIT(HFGITR, ATS1E1R),
++    DO_BIT(HFGITR, ATS1E1W),
++    DO_BIT(HFGITR, ATS1E0R),
++    DO_BIT(HFGITR, ATS1E0W),
++    DO_BIT(HFGITR, ATS1E1RP),
++    DO_BIT(HFGITR, ATS1E1WP),
+ } FGTBit;
+ #undef DO_BIT
+diff --git a/target/arm/helper.c b/target/arm/helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/helper.c
++++ b/target/arm/helper.c
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
+     { .name = "AT_S1E1R", .state = ARM_CP_STATE_AA64,
+       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 8, .opc2 = 0,
+       .access = PL1_W, .type = ARM_CP_NO_RAW | ARM_CP_RAISES_EXC,
++      .fgt = FGT_ATS1E1R,
+       .writefn = ats_write64 },
+     { .name = "AT_S1E1W", .state = ARM_CP_STATE_AA64,
+       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 8, .opc2 = 1,
+       .access = PL1_W, .type = ARM_CP_NO_RAW | ARM_CP_RAISES_EXC,
++      .fgt = FGT_ATS1E1W,
+       .writefn = ats_write64 },
+     { .name = "AT_S1E0R", .state = ARM_CP_STATE_AA64,
+       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 8, .opc2 = 2,
+       .access = PL1_W, .type = ARM_CP_NO_RAW | ARM_CP_RAISES_EXC,
++      .fgt = FGT_ATS1E0R,
+       .writefn = ats_write64 },
+     { .name = "AT_S1E0W", .state = ARM_CP_STATE_AA64,
+       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 8, .opc2 = 3,
+       .access = PL1_W, .type = ARM_CP_NO_RAW | ARM_CP_RAISES_EXC,
++      .fgt = FGT_ATS1E0W,
+       .writefn = ats_write64 },
+     { .name = "AT_S12E1R", .state = ARM_CP_STATE_AA64,
+       .opc0 = 1, .opc1 = 4, .crn = 7, .crm = 8, .opc2 = 4,
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo ats1e1_reginfo[] = {
+     { .name = "AT_S1E1RP", .state = ARM_CP_STATE_AA64,
+       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 9, .opc2 = 0,
+       .access = PL1_W, .type = ARM_CP_NO_RAW | ARM_CP_RAISES_EXC,
++      .fgt = FGT_ATS1E1RP,
+       .writefn = ats_write64 },
+     { .name = "AT_S1E1WP", .state = ARM_CP_STATE_AA64,
+       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 9, .opc2 = 1,
+       .access = PL1_W, .type = ARM_CP_NO_RAW | ARM_CP_RAISES_EXC,
++      .fgt = FGT_ATS1E1WP,
+       .writefn = ats_write64 },
+ };
+--
+.34.1
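
Illustrative note: the patches in this part of the series are split by bit range within HFGITR (bits 0..11 for cache maintenance, bits 12..17 for AT instructions). A small helper for turning such an inclusive range into a 64-bit mask can make the ranges easier to check by hand. This is a sketch only; demo_bit_range_mask is a hypothetical name and is not part of QEMU.

```c
#include <stdint.h>

/*
 * Hypothetical helper: return a mask with bits lo..hi (inclusive) set.
 * Callers must ensure 0 <= lo <= hi <= 63.
 */
static uint64_t demo_bit_range_mask(unsigned lo, unsigned hi)
{
    unsigned width = hi - lo + 1;
    /* A shift by 64 is undefined in C, so handle the full-width case. */
    uint64_t ones = (width == 64) ? ~UINT64_C(0)
                                  : ((UINT64_C(1) << width) - 1);

    return ones << lo;
}
```

Under these assumptions, bits 0..11 give the mask 0xfff and bits 12..17 give 0x3f000.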

-[PULL 22/26] configure: Test that gio libs from pkg-config work
+[PULL 28/33] target/arm: Mark up sysregs for HFGITR bits 18..47
-On some hosts (eg Ubuntu Bionic) pkg-config returns a set of
+Mark up the sysreg definitions for the system instructions
-libraries for gio-2.0 which don't actually work when compiling
+trapped by HFGITR bits 18..47. These bits cover TLBI
-statically. (Specifically, the returned library string includes
+TLB maintenance instructions.
 -lmount, but not -lblkid which -lmount depends upon, so linking
 fails due to missing symbols.)
-Check that the libraries work, and don't enable gio if they don't,
+(If we implemented FEAT_XS we would need to trap some of the
-in the same way we do for gnutls.
+instructions added by that feature using these bits; but we don't
 yet, so will need to add the .fgt markup when we do.)
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
+Tested-by: Fuad Tabba <tabba@google.com>
-Message-id: 20200928160402.7961-1-peter.maydell@linaro.org
+Message-id: 20230130182459.3309057-19-peter.maydell@linaro.org
 Message-id: 20230127175507.2895013-19-peter.maydell@linaro.org
 ---
- configure | 10 +++++++++-
+ target/arm/cpregs.h | 30 ++++++++++++++++++++++++++++++
-file changed, 9 insertions(+), 1 deletion(-)
+ target/arm/helper.c | 30 ++++++++++++++++++++++++++++++
 files changed, 60 insertions(+)
-diff --git a/configure b/configure
+diff --git a/target/arm/cpregs.h b/target/arm/cpregs.h
-index XXXXXXX..XXXXXXX 100755
+index XXXXXXX..XXXXXXX 100644
---- a/configure
+--- a/target/arm/cpregs.h
-+++ b/configure
++++ b/target/arm/cpregs.h
-@@ -XXX,XX +XXX,XX @@ if test "$static" = yes && test "$mingw32" = yes; then
+@@ -XXX,XX +XXX,XX @@ typedef enum FGTBit {
- fi
+     DO_BIT(HFGITR, ATS1E0W),
+     DO_BIT(HFGITR, ATS1E1RP),
- if $pkg_config --atleast-version=$glib_req_ver gio-2.0; then
+     DO_BIT(HFGITR, ATS1E1WP),
--    gio=yes
++    DO_BIT(HFGITR, TLBIVMALLE1OS),
-     gio_cflags=$($pkg_config --cflags gio-2.0)
++    DO_BIT(HFGITR, TLBIVAE1OS),
-     gio_libs=$($pkg_config --libs gio-2.0)
++    DO_BIT(HFGITR, TLBIASIDE1OS),
-     gdbus_codegen=$($pkg_config --variable=gdbus_codegen gio-2.0)
++    DO_BIT(HFGITR, TLBIVAAE1OS),
-     if [ ! -x "$gdbus_codegen" ]; then
++    DO_BIT(HFGITR, TLBIVALE1OS),
-         gdbus_codegen=
++    DO_BIT(HFGITR, TLBIVAALE1OS),
-     fi
++    DO_BIT(HFGITR, TLBIRVAE1OS),
-+    # Check that the libraries actually work -- Ubuntu 18.04 ships
++    DO_BIT(HFGITR, TLBIRVAAE1OS),
-+    # with pkg-config --static --libs data for gio-2.0 that is missing
++    DO_BIT(HFGITR, TLBIRVALE1OS),
-+    # -lblkid and will give a link error.
++    DO_BIT(HFGITR, TLBIRVAALE1OS),
-+    write_c_skeleton
++    DO_BIT(HFGITR, TLBIVMALLE1IS),
-+    if compile_prog "" "gio_libs" ; then
++    DO_BIT(HFGITR, TLBIVAE1IS),
-+        gio=yes
++    DO_BIT(HFGITR, TLBIASIDE1IS),
-+    else
++    DO_BIT(HFGITR, TLBIVAAE1IS),
-+        gio=no
++    DO_BIT(HFGITR, TLBIVALE1IS),
-+    fi
++    DO_BIT(HFGITR, TLBIVAALE1IS),
- else
++    DO_BIT(HFGITR, TLBIRVAE1IS),
-     gio=no
++    DO_BIT(HFGITR, TLBIRVAAE1IS),
- fi
++    DO_BIT(HFGITR, TLBIRVALE1IS),
 +    DO_BIT(HFGITR, TLBIRVAALE1IS),
 +    DO_BIT(HFGITR, TLBIRVAE1),
 +    DO_BIT(HFGITR, TLBIRVAAE1),
 +    DO_BIT(HFGITR, TLBIRVALE1),
 +    DO_BIT(HFGITR, TLBIRVAALE1),
 +    DO_BIT(HFGITR, TLBIVMALLE1),
 +    DO_BIT(HFGITR, TLBIVAE1),
 +    DO_BIT(HFGITR, TLBIASIDE1),
 +    DO_BIT(HFGITR, TLBIVAAE1),
 +    DO_BIT(HFGITR, TLBIVALE1),
 +    DO_BIT(HFGITR, TLBIVAALE1),
  } FGTBit;
  #undef DO_BIT
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
      { .name = "TLBI_VMALLE1IS", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 0,
        .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW,
 +      .fgt = FGT_TLBIVMALLE1IS,
        .writefn = tlbi_aa64_vmalle1is_write },
      { .name = "TLBI_VAE1IS", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 1,
        .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW,
 +      .fgt = FGT_TLBIVAE1IS,
        .writefn = tlbi_aa64_vae1is_write },
      { .name = "TLBI_ASIDE1IS", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 2,
        .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW,
 +      .fgt = FGT_TLBIASIDE1IS,
        .writefn = tlbi_aa64_vmalle1is_write },
      { .name = "TLBI_VAAE1IS", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 3,
        .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW,
 +      .fgt = FGT_TLBIVAAE1IS,
        .writefn = tlbi_aa64_vae1is_write },
      { .name = "TLBI_VALE1IS", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 5,
        .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW,
 +      .fgt = FGT_TLBIVALE1IS,
        .writefn = tlbi_aa64_vae1is_write },
      { .name = "TLBI_VAALE1IS", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 7,
        .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW,
 +      .fgt = FGT_TLBIVAALE1IS,
        .writefn = tlbi_aa64_vae1is_write },
      { .name = "TLBI_VMALLE1", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 0,
        .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
 +      .fgt = FGT_TLBIVMALLE1,
        .writefn = tlbi_aa64_vmalle1_write },
      { .name = "TLBI_VAE1", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 1,
        .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
 +      .fgt = FGT_TLBIVAE1,
        .writefn = tlbi_aa64_vae1_write },
      { .name = "TLBI_ASIDE1", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 2,
        .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
 +      .fgt = FGT_TLBIASIDE1,
        .writefn = tlbi_aa64_vmalle1_write },
      { .name = "TLBI_VAAE1", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 3,
        .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
 +      .fgt = FGT_TLBIVAAE1,
        .writefn = tlbi_aa64_vae1_write },
      { .name = "TLBI_VALE1", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 5,
        .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
 +      .fgt = FGT_TLBIVALE1,
        .writefn = tlbi_aa64_vae1_write },
      { .name = "TLBI_VAALE1", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 7,
        .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
 +      .fgt = FGT_TLBIVAALE1,
        .writefn = tlbi_aa64_vae1_write },
      { .name = "TLBI_IPAS2E1IS", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 1,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo tlbirange_reginfo[] = {
      { .name = "TLBI_RVAE1IS", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 2, .opc2 = 1,
        .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW,
 +      .fgt = FGT_TLBIRVAE1IS,
        .writefn = tlbi_aa64_rvae1is_write },
      { .name = "TLBI_RVAAE1IS", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 2, .opc2 = 3,
        .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW,
 +      .fgt = FGT_TLBIRVAAE1IS,
        .writefn = tlbi_aa64_rvae1is_write },
     { .name = "TLBI_RVALE1IS", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 2, .opc2 = 5,
        .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW,
 +      .fgt = FGT_TLBIRVALE1IS,
        .writefn = tlbi_aa64_rvae1is_write },
      { .name = "TLBI_RVAALE1IS", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 2, .opc2 = 7,
        .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW,
 +      .fgt = FGT_TLBIRVAALE1IS,
        .writefn = tlbi_aa64_rvae1is_write },
      { .name = "TLBI_RVAE1OS", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 5, .opc2 = 1,
        .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW,
 +      .fgt = FGT_TLBIRVAE1OS,
        .writefn = tlbi_aa64_rvae1is_write },
      { .name = "TLBI_RVAAE1OS", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 5, .opc2 = 3,
        .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW,
 +      .fgt = FGT_TLBIRVAAE1OS,
        .writefn = tlbi_aa64_rvae1is_write },
     { .name = "TLBI_RVALE1OS", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 5, .opc2 = 5,
        .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW,
 +      .fgt = FGT_TLBIRVALE1OS,
        .writefn = tlbi_aa64_rvae1is_write },
      { .name = "TLBI_RVAALE1OS", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 5, .opc2 = 7,
        .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW,
 +      .fgt = FGT_TLBIRVAALE1OS,
        .writefn = tlbi_aa64_rvae1is_write },
      { .name = "TLBI_RVAE1", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 6, .opc2 = 1,
        .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
 +      .fgt = FGT_TLBIRVAE1,
        .writefn = tlbi_aa64_rvae1_write },
      { .name = "TLBI_RVAAE1", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 6, .opc2 = 3,
        .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
 +      .fgt = FGT_TLBIRVAAE1,
        .writefn = tlbi_aa64_rvae1_write },
     { .name = "TLBI_RVALE1", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 6, .opc2 = 5,
        .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
 +      .fgt = FGT_TLBIRVALE1,
        .writefn = tlbi_aa64_rvae1_write },
      { .name = "TLBI_RVAALE1", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 6, .opc2 = 7,
        .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
 +      .fgt = FGT_TLBIRVAALE1,
        .writefn = tlbi_aa64_rvae1_write },
      { .name = "TLBI_RIPAS2E1IS", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 2,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo tlbios_reginfo[] = {
      { .name = "TLBI_VMALLE1OS", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 1, .opc2 = 0,
        .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW,
 +      .fgt = FGT_TLBIVMALLE1OS,
        .writefn = tlbi_aa64_vmalle1is_write },
      { .name = "TLBI_VAE1OS", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 1, .opc2 = 1,
 +      .fgt = FGT_TLBIVAE1OS,
        .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW,
        .writefn = tlbi_aa64_vae1is_write },
      { .name = "TLBI_ASIDE1OS", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 1, .opc2 = 2,
        .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW,
 +      .fgt = FGT_TLBIASIDE1OS,
        .writefn = tlbi_aa64_vmalle1is_write },
      { .name = "TLBI_VAAE1OS", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 1, .opc2 = 3,
        .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW,
 +      .fgt = FGT_TLBIVAAE1OS,
        .writefn = tlbi_aa64_vae1is_write },
      { .name = "TLBI_VALE1OS", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 1, .opc2 = 5,
        .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW,
 +      .fgt = FGT_TLBIVALE1OS,
        .writefn = tlbi_aa64_vae1is_write },
      { .name = "TLBI_VAALE1OS", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 1, .opc2 = 7,
        .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW,
 +      .fgt = FGT_TLBIVAALE1OS,
        .writefn = tlbi_aa64_vae1is_write },
      { .name = "TLBI_ALLE2OS", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 1, .opc2 = 0,
 --
-.20.1
+.34.1

-New patch
+[PULL 29/33] target/arm: Mark up sysregs for HFGITR bits 48..63
+Mark up the sysreg definitions for the system instructions
+trapped by HFGITR bits 48..63.
+Some of these bits are for trapping instructions which are
+not in the system instruction encoding (i.e. which are
+not handled by the ARMCPRegInfo mechanism):
+ * ERET, ERETAA, ERETAB
+ * SVC
+We will have to handle those separately and manually.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Tested-by: Fuad Tabba <tabba@google.com>
+Message-id: 20230130182459.3309057-20-peter.maydell@linaro.org
+Message-id: 20230127175507.2895013-20-peter.maydell@linaro.org
+---
+ target/arm/cpregs.h | 4 ++++
+ target/arm/helper.c | 9 +++++++++
+files changed, 13 insertions(+)
+diff --git a/target/arm/cpregs.h b/target/arm/cpregs.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpregs.h
++++ b/target/arm/cpregs.h
+@@ -XXX,XX +XXX,XX @@ typedef enum FGTBit {
+     DO_BIT(HFGITR, TLBIVAAE1),
+     DO_BIT(HFGITR, TLBIVALE1),
+     DO_BIT(HFGITR, TLBIVAALE1),
++    DO_BIT(HFGITR, CFPRCTX),
++    DO_BIT(HFGITR, DVPRCTX),
++    DO_BIT(HFGITR, CPPRCTX),
++    DO_BIT(HFGITR, DCCVAC),
+ } FGTBit;
+ #undef DO_BIT
+diff --git a/target/arm/helper.c b/target/arm/helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/helper.c
++++ b/target/arm/helper.c
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
+     { .name = "DC_CVAC", .state = ARM_CP_STATE_AA64,
+       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 10, .opc2 = 1,
+       .access = PL0_W, .type = ARM_CP_NOP,
++      .fgt = FGT_DCCVAC,
+       .accessfn = aa64_cacheop_poc_access },
+     { .name = "DC_CSW", .state = ARM_CP_STATE_AA64,
+       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 10, .opc2 = 2,
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo mte_el0_cacheop_reginfo[] = {
+     { .name = "DC_CGVAC", .state = ARM_CP_STATE_AA64,
+       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 10, .opc2 = 3,
+       .type = ARM_CP_NOP, .access = PL0_W,
++      .fgt = FGT_DCCVAC,
+       .accessfn = aa64_cacheop_poc_access },
+     { .name = "DC_CGDVAC", .state = ARM_CP_STATE_AA64,
+       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 10, .opc2 = 5,
+       .type = ARM_CP_NOP, .access = PL0_W,
++      .fgt = FGT_DCCVAC,
+       .accessfn = aa64_cacheop_poc_access },
+     { .name = "DC_CGVAP", .state = ARM_CP_STATE_AA64,
+       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 12, .opc2 = 3,
+@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_predinv(CPUARMState *env, const ARMCPRegInfo *ri,
+ static const ARMCPRegInfo predinv_reginfo[] = {
+     { .name = "CFP_RCTX", .state = ARM_CP_STATE_AA64,
+       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 3, .opc2 = 4,
++      .fgt = FGT_CFPRCTX,
+       .type = ARM_CP_NOP, .access = PL0_W, .accessfn = access_predinv },
+     { .name = "DVP_RCTX", .state = ARM_CP_STATE_AA64,
+       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 3, .opc2 = 5,
++      .fgt = FGT_DVPRCTX,
+       .type = ARM_CP_NOP, .access = PL0_W, .accessfn = access_predinv },
+     { .name = "CPP_RCTX", .state = ARM_CP_STATE_AA64,
+       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 3, .opc2 = 7,
++      .fgt = FGT_CPPRCTX,
+       .type = ARM_CP_NOP, .access = PL0_W, .accessfn = access_predinv },
+     /*
+      * Note the AArch32 opcodes have a different OPC1.
+      */
+     { .name = "CFPRCTX", .state = ARM_CP_STATE_AA32,
+       .cp = 15, .opc1 = 0, .crn = 7, .crm = 3, .opc2 = 4,
++      .fgt = FGT_CFPRCTX,
+       .type = ARM_CP_NOP, .access = PL0_W, .accessfn = access_predinv },
+     { .name = "DVPRCTX", .state = ARM_CP_STATE_AA32,
+       .cp = 15, .opc1 = 0, .crn = 7, .crm = 3, .opc2 = 5,
++      .fgt = FGT_DVPRCTX,
+       .type = ARM_CP_NOP, .access = PL0_W, .accessfn = access_predinv },
+     { .name = "CPPRCTX", .state = ARM_CP_STATE_AA32,
+       .cp = 15, .opc1 = 0, .crn = 7, .crm = 3, .opc2 = 7,
++      .fgt = FGT_CPPRCTX,
+       .type = ARM_CP_NOP, .access = PL0_W, .accessfn = access_predinv },
+ };
+--
+.34.1
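
Editorial sketch, not part of patch 29/33: the effect of the `.fgt` markup above can be illustrated in standalone C. Each table entry names the HFGITR_EL2 bit that gates it, and one generic check tests that bit. The names `SketchRegInfo`, `sketch_table` and `sketch_access_traps` are hypothetical, and the bit numbers are an assumption taken from the HFGITR_EL2 layout in the Arm architecture manual; QEMU's real check goes through its `FGTBit` encoding, which is not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed HFGITR_EL2 bit positions (verify against the Arm ARM). */
enum {
    SKETCH_BIT_CFPRCTX = 48,
    SKETCH_BIT_DVPRCTX = 49,
    SKETCH_BIT_CPPRCTX = 50,
    SKETCH_BIT_DCCVAC  = 54,
};

/* Hypothetical stand-in for the parts of ARMCPRegInfo that matter here. */
typedef struct {
    const char *name;
    int fgt_bit;
} SketchRegInfo;

/*
 * As in the patch, DC CVAC and its MTE variants DC CGVAC and DC CGDVAC
 * all share the single DCCVAC trap bit.
 */
static const SketchRegInfo sketch_table[] = {
    { "CFP_RCTX",  SKETCH_BIT_CFPRCTX },
    { "DVP_RCTX",  SKETCH_BIT_DVPRCTX },
    { "CPP_RCTX",  SKETCH_BIT_CPPRCTX },
    { "DC_CVAC",   SKETCH_BIT_DCCVAC  },
    { "DC_CGVAC",  SKETCH_BIT_DCCVAC  },
    { "DC_CGDVAC", SKETCH_BIT_DCCVAC  },
};

/* Return true if the given HFGITR_EL2 value traps this instruction. */
static bool sketch_access_traps(uint64_t hfgitr, const SketchRegInfo *ri)
{
    return (hfgitr >> ri->fgt_bit) & 1;
}
```

ERET and SVC have no entry in such a table because they are not in the system instruction encoding space, which is why the commit message says they need separate handling.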

-[PULL 24/26] scripts/kerneldoc: For Sphinx 3 use c:macro for macros with arguments
+[PULL 30/33] target/arm: Implement the HFGITR_EL2.ERET trap
-The kerneldoc script currently emits Sphinx markup for a macro with
+Implement the HFGITR_EL2.ERET fine-grained trap.  This traps
-arguments that uses the c:function directive. This is correct for
+execution from AArch64 EL1 of ERET, ERETAA and ERETAB.  The trap is
-Sphinx versions earlier than Sphinx 3, where c:macro doesn't allow
+reported with a syndrome value of 0x1a.
 documentation of macros with arguments and c:function is not picky
 about the syntax of what it is passed. However, in Sphinx 3 the
 c:macro directive was enhanced to support macros with arguments,
 and c:function was made more picky about what syntax it accepted.
-When kerneldoc is told that it needs to produce output for Sphinx
+The trap must take precedence over a possible pointer-authentication
-or later, make it emit c:function only for functions and c:macro
+trap for ERETAA and ERETAB.
 for macros with arguments. We assume that anything with a return
 type is a function and anything without is a macro.
 This fixes the Sphinx error:
 /home/petmay01/linaro/qemu-from-laptop/qemu/docs/../include/qom/object.h:155:Error in declarator
 If declarator-id with parameters (e.g., 'void f(int arg)'):
   Invalid C declaration: Expected identifier in nested name. [error at 25]
     DECLARE_INSTANCE_CHECKER ( InstanceType,  OBJ_NAME,  TYPENAME)
     -------------------------^
 If parenthesis in noptr-declarator (e.g., 'void (*f(int arg))(double)'):
   Error in declarator or parameters
   Invalid C declaration: Expecting "(" in parameters. [error at 39]
     DECLARE_INSTANCE_CHECKER ( InstanceType,  OBJ_NAME,  TYPENAME)
     ---------------------------------------^
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Tested-by: Stefan Hajnoczi <stefanha@redhat.com>
+Tested-by: Fuad Tabba <tabba@google.com>
-Message-id: 20201030174700.7204-2-peter.maydell@linaro.org
+Message-id: 20230130182459.3309057-21-peter.maydell@linaro.org
 Message-id: 20230127175507.2895013-21-peter.maydell@linaro.org
 ---
- scripts/kernel-doc | 18 +++++++++++++++++-
+ target/arm/cpu.h           |  1 +
-file changed, 17 insertions(+), 1 deletion(-)
+ target/arm/syndrome.h      | 10 ++++++++++
  target/arm/translate.h     |  2 ++
  target/arm/helper.c        |  3 +++
  target/arm/translate-a64.c | 10 ++++++++++
 files changed, 26 insertions(+)
-diff --git a/scripts/kernel-doc b/scripts/kernel-doc
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
-index XXXXXXX..XXXXXXX 100755
+index XXXXXXX..XXXXXXX 100644
---- a/scripts/kernel-doc
+--- a/target/arm/cpu.h
-+++ b/scripts/kernel-doc
++++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ sub output_function_rst(%) {
+@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_A64, PSTATE_ZA, 23, 1)
-     output_highlight_rst($args{'purpose'});
+ FIELD(TBFLAG_A64, SVL, 24, 4)
-     $start = "\n\n**Syntax**\n\n  ``";
+ /* Indicates that SME Streaming mode is active, and SMCR_ELx.FA64 is not. */
-     } else {
+ FIELD(TBFLAG_A64, SME_TRAP_NONSTREAMING, 28, 1)
--    print ".. c:function:: ";
++FIELD(TBFLAG_A64, FGT_ERET, 29, 1)
-+        if ((split(/\./, $sphinx_version))[0] >= 3) {
-+            # Sphinx 3 and later distinguish macros and functions and
+ /*
-+            # complain if you use c:function with something that's not
+  * Helpers for using the above.
-+            # syntactically valid as a function declaration.
+diff --git a/target/arm/syndrome.h b/target/arm/syndrome.h
-+            # We assume that anything with a return type is a function
+index XXXXXXX..XXXXXXX 100644
-+            # and anything without is a macro.
+--- a/target/arm/syndrome.h
-+            if ($args{'functiontype'} ne "") {
++++ b/target/arm/syndrome.h
-+                print ".. c:function:: ";
+@@ -XXX,XX +XXX,XX @@ enum arm_exception_class {
-+            } else {
+     EC_AA64_SMC               = 0x17,
-+                print ".. c:macro:: ";
+     EC_SYSTEMREGISTERTRAP     = 0x18,
-+            }
+     EC_SVEACCESSTRAP          = 0x19,
-+        } else {
++    EC_ERETTRAP               = 0x1a,
-+            # Older Sphinx don't support documenting macros that take
+     EC_SMETRAP                = 0x1d,
-+            # arguments with c:macro, and don't complain about the use
+     EC_INSNABORT              = 0x20,
-+            # of c:function for this.
+     EC_INSNABORT_SAME_EL      = 0x21,
-+            print ".. c:function:: ";
+@@ -XXX,XX +XXX,XX @@ static inline uint32_t syn_sve_access_trap(void)
      return EC_SVEACCESSTRAP << ARM_EL_EC_SHIFT;
  }
 +/*
 + * eret_op is bits [1:0] of the ERET instruction, so:
 + * 0 for ERET, 2 for ERETAA, 3 for ERETAB.
 + */
 +static inline uint32_t syn_erettrap(int eret_op)
 +{
 +    return (EC_ERETTRAP << ARM_EL_EC_SHIFT) | ARM_EL_IL | eret_op;
 +}
 +
  static inline uint32_t syn_smetrap(SMEExceptionType etype, bool is_16bit)
  {
      return (EC_SMETRAP << ARM_EL_EC_SHIFT)
 diff --git a/target/arm/translate.h b/target/arm/translate.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.h
 +++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ typedef struct DisasContext {
      bool mve_no_pred;
      /* True if fine-grained traps are active */
      bool fgt_active;
 +    /* True if fine-grained trap on ERET is enabled */
 +    bool fgt_eret;
      /*
       * >= 0, a copy of PSTATE.BTYPE, which will be 0 without v8.5-BTI.
       *  < 0, set by the current instruction.
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static CPUARMTBFlags rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
      if (arm_fgt_active(env, el)) {
          DP_TBFLAG_ANY(flags, FGT_ACTIVE, 1);
 +        if (FIELD_EX64(env->cp15.fgt_exec[FGTREG_HFGITR], HFGITR_EL2, ERET)) {
 +            DP_TBFLAG_A64(flags, FGT_ERET, 1);
 +        }
      }
-     if ($args{'functiontype'} ne "") {
-     $start .= $args{'functiontype'} . " " . $args{'function'} . " (";
+     if (cpu_isar_feature(aa64_mte, env_archcpu(env))) {
 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-a64.c
 +++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_uncond_b_reg(DisasContext *s, uint32_t insn)
              if (op4 != 0) {
                  goto do_unallocated;
              }
 +            if (s->fgt_eret) {
 +                gen_exception_insn_el(s, 0, EXCP_UDEF, syn_erettrap(op3), 2);
 +                return;
 +            }
              dst = tcg_temp_new_i64();
              tcg_gen_ld_i64(dst, cpu_env,
                             offsetof(CPUARMState, elr_el[s->current_el]));
@@ -XXX,XX +XXX,XX @@ static void disas_uncond_b_reg(DisasContext *s, uint32_t insn)
              if (rn != 0x1f || op4 != 0x1f) {
                  goto do_unallocated;
              }
 +            /* The FGT trap takes precedence over an auth trap. */
 +            if (s->fgt_eret) {
 +                gen_exception_insn_el(s, 0, EXCP_UDEF, syn_erettrap(op3), 2);
 +                return;
 +            }
              dst = tcg_temp_new_i64();
              tcg_gen_ld_i64(dst, cpu_env,
                             offsetof(CPUARMState, elr_el[s->current_el]));
@@ -XXX,XX +XXX,XX @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase,
      dc->align_mem = EX_TBFLAG_ANY(tb_flags, ALIGN_MEM);
      dc->pstate_il = EX_TBFLAG_ANY(tb_flags, PSTATE__IL);
      dc->fgt_active = EX_TBFLAG_ANY(tb_flags, FGT_ACTIVE);
 +    dc->fgt_eret = EX_TBFLAG_A64(tb_flags, FGT_ERET);
      dc->sve_excp_el = EX_TBFLAG_A64(tb_flags, SVEEXC_EL);
      dc->sme_excp_el = EX_TBFLAG_A64(tb_flags, SMEEXC_EL);
      dc->vl = (EX_TBFLAG_A64(tb_flags, VL) + 1) * 16;
 --
-.20.1
+.34.1
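
Editorial sketch, not part of patch 30/33: the syndrome value built by `syn_erettrap()` can be reproduced in standalone C to see what the hypervisor would read back. `EC_ERETTRAP` is taken from the patch; the shift of 26 and the IL bit at position 25 are assumptions based on the ESR_ELx layout (EC in bits [31:26], IL in bit 25) and are not shown in the diff itself. The names carrying a `SKETCH_` or `_sketch` marker are hypothetical.

```c
#include <stdint.h>

#define SKETCH_EC_ERETTRAP  0x1a        /* from the patch */
#define SKETCH_EC_SHIFT     26          /* assumed: EC is ESR_ELx[31:26] */
#define SKETCH_IL           (1u << 25)  /* assumed: IL is ESR_ELx[25] */

/*
 * eret_op is bits [1:0] of the instruction:
 * 0 for ERET, 2 for ERETAA, 3 for ERETAB.
 */
static uint32_t syn_erettrap_sketch(int eret_op)
{
    return ((uint32_t)SKETCH_EC_ERETTRAP << SKETCH_EC_SHIFT)
           | SKETCH_IL
           | (uint32_t)eret_op;
}
```

Under these assumptions a trapped plain ERET reports 0x6a000000, ERETAA reports 0x6a000002 and ERETAB reports 0x6a000003, so the low two bits alone distinguish the three instructions.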

-[PULL 07/26] target/arm: Rename neon_load_reg32 to vfp_load_reg32
+[PULL 31/33] target/arm: Implement the HFGITR_EL2.SVC_EL0 and SVC_EL1 traps
-From: Richard Henderson <richard.henderson@linaro.org>
+Implement the HFGITR_EL2.SVC_EL0 and SVC_EL1 fine-grained traps.
 These trap execution of the SVC instruction from AArch32 and AArch64.
 (As usual, AArch32 can only trap from EL0, as fine-grained traps are
 disabled with an AArch32 EL1.)
-The only uses of this function are for loading VFP
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-single-precision values, and nothing to do with NEON.
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Tested-by: Fuad Tabba <tabba@google.com>
 Message-id: 20230130182459.3309057-22-peter.maydell@linaro.org
 Message-id: 20230127175507.2895013-22-peter.maydell@linaro.org
 ---
  target/arm/cpu.h           |  1 +
  target/arm/translate.h     |  2 ++
  target/arm/helper.c        | 20 ++++++++++++++++++++
  target/arm/translate-a64.c |  9 ++++++++-
  target/arm/translate.c     | 12 +++++++++---
 files changed, 40 insertions(+), 4 deletions(-)
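
Editorial sketch, not part of patch 31/33: the per-EL selection described in the commit message can be shown in standalone C. The function `sketch_fgt_svc` is hypothetical and only mirrors the shape of the `fgt_svc()` helper in this patch; the bit positions 52 and 53 for SVC_EL0 and SVC_EL1 are an assumption taken from the HFGITR_EL2 layout in the Arm architecture manual. Like the helper, it assumes fine-grained traps are already known to be active and that `el` is 0 or 1.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SVC_EL0_BIT 52   /* assumed HFGITR_EL2.SVC_EL0 position */
#define SKETCH_SVC_EL1_BIT 53   /* assumed HFGITR_EL2.SVC_EL1 position */

/* Return true if an SVC executed at exception level el should trap. */
static bool sketch_fgt_svc(uint64_t hfgitr, int el)
{
    int bit = (el == 0) ? SKETCH_SVC_EL0_BIT : SKETCH_SVC_EL1_BIT;

    return (hfgitr >> bit) & 1;
}
```

Keeping the two bits independent means a hypervisor could trap SVC from EL0 only, from EL1 only, or from both.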
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
-Message-id: 20201030022618.785675-8-richard.henderson@linaro.org
+index XXXXXXX..XXXXXXX 100644
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+--- a/target/arm/cpu.h
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
++++ b/target/arm/cpu.h
----
+@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_ANY, FPEXC_EL, 8, 2)
- target/arm/translate.c         |   4 +-
+ FIELD(TBFLAG_ANY, ALIGN_MEM, 10, 1)
- target/arm/translate-vfp.c.inc | 184 ++++++++++++++++-----------------
+ FIELD(TBFLAG_ANY, PSTATE__IL, 11, 1)
-files changed, 94 insertions(+), 94 deletions(-)
+ FIELD(TBFLAG_ANY, FGT_ACTIVE, 12, 1)
++FIELD(TBFLAG_ANY, FGT_SVC, 13, 1)
  /*
   * Bit usage when in AArch32 state, both A- and M-profile.
 diff --git a/target/arm/translate.h b/target/arm/translate.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.h
 +++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ typedef struct DisasContext {
      bool fgt_active;
      /* True if fine-grained trap on ERET is enabled */
      bool fgt_eret;
 +    /* True if fine-grained trap on SVC is enabled */
 +    bool fgt_svc;
      /*
       * >= 0, a copy of PSTATE.BTYPE, which will be 0 without v8.5-BTI.
       *  < 0, set by the current instruction.
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ ARMMMUIdx arm_mmu_idx(CPUARMState *env)
      return arm_mmu_idx_el(env, arm_current_el(env));
  }
 +static inline bool fgt_svc(CPUARMState *env, int el)
 +{
 +    /*
 +     * Assuming fine-grained-traps are active, return true if we
 +     * should be trapping on SVC instructions. Only AArch64 can
 +     * trap on an SVC at EL1, but we don't need to special-case this
 +     * because if this is AArch32 EL1 then arm_fgt_active() is false.
 +     * We also know el is 0 or 1.
 +     */
 +    return el == 0 ?
 +        FIELD_EX64(env->cp15.fgt_exec[FGTREG_HFGITR], HFGITR_EL2, SVC_EL0) :
 +        FIELD_EX64(env->cp15.fgt_exec[FGTREG_HFGITR], HFGITR_EL2, SVC_EL1);
 +}
 +
  static CPUARMTBFlags rebuild_hflags_common(CPUARMState *env, int fp_el,
                                             ARMMMUIdx mmu_idx,
                                             CPUARMTBFlags flags)
@@ -XXX,XX +XXX,XX @@ static CPUARMTBFlags rebuild_hflags_a32(CPUARMState *env, int fp_el,
      if (arm_fgt_active(env, el)) {
          DP_TBFLAG_ANY(flags, FGT_ACTIVE, 1);
 +        if (fgt_svc(env, el)) {
 +            DP_TBFLAG_ANY(flags, FGT_SVC, 1);
 +        }
      }
      if (env->uncached_cpsr & CPSR_IL) {
@@ -XXX,XX +XXX,XX @@ static CPUARMTBFlags rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
          if (FIELD_EX64(env->cp15.fgt_exec[FGTREG_HFGITR], HFGITR_EL2, ERET)) {
              DP_TBFLAG_A64(flags, FGT_ERET, 1);
          }
 +        if (fgt_svc(env, el)) {
 +            DP_TBFLAG_ANY(flags, FGT_SVC, 1);
 +        }
      }
      if (cpu_isar_feature(aa64_mte, env_archcpu(env))) {
 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-a64.c
 +++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_exc(DisasContext *s, uint32_t insn)
      int opc = extract32(insn, 21, 3);
      int op2_ll = extract32(insn, 0, 5);
      int imm16 = extract32(insn, 5, 16);
 +    uint32_t syndrome;
      switch (opc) {
      case 0:
@@ -XXX,XX +XXX,XX @@ static void disas_exc(DisasContext *s, uint32_t insn)
           */
          switch (op2_ll) {
          case 1:                                                     /* SVC */
 +            syndrome = syn_aa64_svc(imm16);
 +            if (s->fgt_svc) {
 +                gen_exception_insn_el(s, 0, EXCP_UDEF, syndrome, 2);
 +                break;
 +            }
              gen_ss_advance(s);
 -            gen_exception_insn(s, 4, EXCP_SWI, syn_aa64_svc(imm16));
 +            gen_exception_insn(s, 4, EXCP_SWI, syndrome);
              break;
          case 2:                                                     /* HVC */
              if (s->current_el == 0) {
@@ -XXX,XX +XXX,XX @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase,
      dc->align_mem = EX_TBFLAG_ANY(tb_flags, ALIGN_MEM);
      dc->pstate_il = EX_TBFLAG_ANY(tb_flags, PSTATE__IL);
      dc->fgt_active = EX_TBFLAG_ANY(tb_flags, FGT_ACTIVE);
 +    dc->fgt_svc = EX_TBFLAG_ANY(tb_flags, FGT_SVC);
      dc->fgt_eret = EX_TBFLAG_A64(tb_flags, FGT_ERET);
      dc->sve_excp_el = EX_TBFLAG_A64(tb_flags, SVEEXC_EL);
      dc->sme_excp_el = EX_TBFLAG_A64(tb_flags, SMEEXC_EL);
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static inline void neon_store_reg64(TCGv_i64 var, int reg)
+@@ -XXX,XX +XXX,XX @@ static bool trans_SVC(DisasContext *s, arg_SVC *a)
-     tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(1, reg));
+         (a->imm == semihost_imm)) {
- }
+         gen_exception_internal_insn(s, EXCP_SEMIHOST);
+     } else {
--static inline void neon_load_reg32(TCGv_i32 var, int reg)
+-        gen_update_pc(s, curr_insn_len(s));
-+static inline void vfp_load_reg32(TCGv_i32 var, int reg)
+-        s->svc_imm = a->imm;
- {
+-        s->base.is_jmp = DISAS_SWI;
-     tcg_gen_ld_i32(var, cpu_env, vfp_reg_offset(false, reg));
++        if (s->fgt_svc) {
- }
++            uint32_t syndrome = syn_aa32_svc(a->imm, s->thumb);
++            gen_exception_insn_el(s, 0, EXCP_UDEF, syndrome, 2);
--static inline void neon_store_reg32(TCGv_i32 var, int reg)
++        } else {
-+static inline void vfp_store_reg32(TCGv_i32 var, int reg)
++            gen_update_pc(s, curr_insn_len(s));
- {
++            s->svc_imm = a->imm;
-     tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
++            s->base.is_jmp = DISAS_SWI;
- }
++        }
 diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-vfp.c.inc
 +++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
          frn = tcg_temp_new_i32();
          frm = tcg_temp_new_i32();
          dest = tcg_temp_new_i32();
 -        neon_load_reg32(frn, rn);
 -        neon_load_reg32(frm, rm);
 +        vfp_load_reg32(frn, rn);
 +        vfp_load_reg32(frm, rm);
          switch (a->cc) {
          case 0: /* eq: Z */
              tcg_gen_movcond_i32(TCG_COND_EQ, dest, cpu_ZF, zero,
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
          if (sz == 1) {
              tcg_gen_andi_i32(dest, dest, 0xffff);
          }
 -        neon_store_reg32(dest, rd);
 +        vfp_store_reg32(dest, rd);
          tcg_temp_free_i32(frn);
          tcg_temp_free_i32(frm);
          tcg_temp_free_i32(dest);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
          TCGv_i32 tcg_res;
          tcg_op = tcg_temp_new_i32();
          tcg_res = tcg_temp_new_i32();
 -        neon_load_reg32(tcg_op, rm);
 +        vfp_load_reg32(tcg_op, rm);
          if (sz == 1) {
              gen_helper_rinth(tcg_res, tcg_op, fpst);
          } else {
              gen_helper_rints(tcg_res, tcg_op, fpst);
          }
 -        neon_store_reg32(tcg_res, rd);
 +        vfp_store_reg32(tcg_res, rd);
          tcg_temp_free_i32(tcg_op);
          tcg_temp_free_i32(tcg_res);
      }
-@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
-             gen_helper_vfp_tould(tcg_res, tcg_double, tcg_shift, fpst);
-         }
-         tcg_gen_extrl_i64_i32(tcg_tmp, tcg_res);
--        neon_store_reg32(tcg_tmp, rd);
-+        vfp_store_reg32(tcg_tmp, rd);
-         tcg_temp_free_i32(tcg_tmp);
-         tcg_temp_free_i64(tcg_res);
-         tcg_temp_free_i64(tcg_double);
-@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
-         TCGv_i32 tcg_single, tcg_res;
-         tcg_single = tcg_temp_new_i32();
-         tcg_res = tcg_temp_new_i32();
--        neon_load_reg32(tcg_single, rm);
-+        vfp_load_reg32(tcg_single, rm);
-         if (sz == 1) {
-             if (is_signed) {
-                 gen_helper_vfp_toslh(tcg_res, tcg_single, tcg_shift, fpst);
-@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
-                 gen_helper_vfp_touls(tcg_res, tcg_single, tcg_shift, fpst);
-             }
-         }
--        neon_store_reg32(tcg_res, rd);
-+        vfp_store_reg32(tcg_res, rd);
-         tcg_temp_free_i32(tcg_res);
-         tcg_temp_free_i32(tcg_single);
-     }
-@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_half(DisasContext *s, arg_VMOV_single *a)
-     if (a->l) {
-         /* VFP to general purpose register */
-         tmp = tcg_temp_new_i32();
--        neon_load_reg32(tmp, a->vn);
-+        vfp_load_reg32(tmp, a->vn);
-         tcg_gen_andi_i32(tmp, tmp, 0xffff);
-         store_reg(s, a->rt, tmp);
-     } else {
-         /* general purpose register to VFP */
-         tmp = load_reg(s, a->rt);
-         tcg_gen_andi_i32(tmp, tmp, 0xffff);
--        neon_store_reg32(tmp, a->vn);
-+        vfp_store_reg32(tmp, a->vn);
-         tcg_temp_free_i32(tmp);
-     }
-@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_single(DisasContext *s, arg_VMOV_single *a)
-     if (a->l) {
-         /* VFP to general purpose register */
-         tmp = tcg_temp_new_i32();
--        neon_load_reg32(tmp, a->vn);
-+        vfp_load_reg32(tmp, a->vn);
-         if (a->rt == 15) {
-             /* Set the 4 flag bits in the CPSR.  */
-             gen_set_nzcv(tmp);
-@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_single(DisasContext *s, arg_VMOV_single *a)
-     } else {
-         /* general purpose register to VFP */
-         tmp = load_reg(s, a->rt);
--        neon_store_reg32(tmp, a->vn);
-+        vfp_store_reg32(tmp, a->vn);
-         tcg_temp_free_i32(tmp);
-     }
-@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_sp(DisasContext *s, arg_VMOV_64_sp *a)
-     if (a->op) {
-         /* fpreg to gpreg */
-         tmp = tcg_temp_new_i32();
--        neon_load_reg32(tmp, a->vm);
-+        vfp_load_reg32(tmp, a->vm);
-         store_reg(s, a->rt, tmp);
-         tmp = tcg_temp_new_i32();
--        neon_load_reg32(tmp, a->vm + 1);
-+        vfp_load_reg32(tmp, a->vm + 1);
-         store_reg(s, a->rt2, tmp);
-     } else {
-         /* gpreg to fpreg */
-         tmp = load_reg(s, a->rt);
--        neon_store_reg32(tmp, a->vm);
-+        vfp_store_reg32(tmp, a->vm);
-         tcg_temp_free_i32(tmp);
-         tmp = load_reg(s, a->rt2);
--        neon_store_reg32(tmp, a->vm + 1);
-+        vfp_store_reg32(tmp, a->vm + 1);
-         tcg_temp_free_i32(tmp);
-     }
-@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_dp(DisasContext *s, arg_VMOV_64_dp *a)
-     if (a->op) {
-         /* fpreg to gpreg */
-         tmp = tcg_temp_new_i32();
--        neon_load_reg32(tmp, a->vm * 2);
-+        vfp_load_reg32(tmp, a->vm * 2);
-         store_reg(s, a->rt, tmp);
-         tmp = tcg_temp_new_i32();
--        neon_load_reg32(tmp, a->vm * 2 + 1);
-+        vfp_load_reg32(tmp, a->vm * 2 + 1);
-         store_reg(s, a->rt2, tmp);
-     } else {
-         /* gpreg to fpreg */
-         tmp = load_reg(s, a->rt);
--        neon_store_reg32(tmp, a->vm * 2);
-+        vfp_store_reg32(tmp, a->vm * 2);
-         tcg_temp_free_i32(tmp);
-         tmp = load_reg(s, a->rt2);
--        neon_store_reg32(tmp, a->vm * 2 + 1);
-+        vfp_store_reg32(tmp, a->vm * 2 + 1);
-         tcg_temp_free_i32(tmp);
-     }
-@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_hp(DisasContext *s, arg_VLDR_VSTR_sp *a)
-     tmp = tcg_temp_new_i32();
-     if (a->l) {
-         gen_aa32_ld16u(s, tmp, addr, get_mem_index(s));
--        neon_store_reg32(tmp, a->vd);
-+        vfp_store_reg32(tmp, a->vd);
-     } else {
--        neon_load_reg32(tmp, a->vd);
-+        vfp_load_reg32(tmp, a->vd);
-         gen_aa32_st16(s, tmp, addr, get_mem_index(s));
-     }
-     tcg_temp_free_i32(tmp);
-@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_sp(DisasContext *s, arg_VLDR_VSTR_sp *a)
-     tmp = tcg_temp_new_i32();
-     if (a->l) {
-         gen_aa32_ld32u(s, tmp, addr, get_mem_index(s));
--        neon_store_reg32(tmp, a->vd);
-+        vfp_store_reg32(tmp, a->vd);
-     } else {
--        neon_load_reg32(tmp, a->vd);
-+        vfp_load_reg32(tmp, a->vd);
-         gen_aa32_st32(s, tmp, addr, get_mem_index(s));
-     }
-     tcg_temp_free_i32(tmp);
-@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_sp(DisasContext *s, arg_VLDM_VSTM_sp *a)
-         if (a->l) {
-             /* load */
-             gen_aa32_ld32u(s, tmp, addr, get_mem_index(s));
--            neon_store_reg32(tmp, a->vd + i);
-+            vfp_store_reg32(tmp, a->vd + i);
-         } else {
-             /* store */
--            neon_load_reg32(tmp, a->vd + i);
-+            vfp_load_reg32(tmp, a->vd + i);
-             gen_aa32_st32(s, tmp, addr, get_mem_index(s));
-         }
-         tcg_gen_addi_i32(addr, addr, offset);
-@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
-     fd = tcg_temp_new_i32();
-     fpst = fpstatus_ptr(FPST_FPCR);
--    neon_load_reg32(f0, vn);
--    neon_load_reg32(f1, vm);
-+    vfp_load_reg32(f0, vn);
-+    vfp_load_reg32(f1, vm);
-     for (;;) {
-         if (reads_vd) {
--            neon_load_reg32(fd, vd);
-+            vfp_load_reg32(fd, vd);
-         }
-         fn(fd, f0, f1, fpst);
--        neon_store_reg32(fd, vd);
-+        vfp_store_reg32(fd, vd);
-         if (veclen == 0) {
-             break;
-@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
-         veclen--;
-         vd = vfp_advance_sreg(vd, delta_d);
-         vn = vfp_advance_sreg(vn, delta_d);
--        neon_load_reg32(f0, vn);
-+        vfp_load_reg32(f0, vn);
-         if (delta_m) {
-             vm = vfp_advance_sreg(vm, delta_m);
--            neon_load_reg32(f1, vm);
-+            vfp_load_reg32(f1, vm);
-         }
-     }
-@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_hp(DisasContext *s, VFPGen3OpSPFn *fn,
-     fd = tcg_temp_new_i32();
-     fpst = fpstatus_ptr(FPST_FPCR_F16);
--    neon_load_reg32(f0, vn);
--    neon_load_reg32(f1, vm);
-+    vfp_load_reg32(f0, vn);
-+    vfp_load_reg32(f1, vm);
-     if (reads_vd) {
--        neon_load_reg32(fd, vd);
-+        vfp_load_reg32(fd, vd);
-     }
-     fn(fd, f0, f1, fpst);
--    neon_store_reg32(fd, vd);
-+    vfp_store_reg32(fd, vd);
-     tcg_temp_free_i32(f0);
-     tcg_temp_free_i32(f1);
-@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
-     f0 = tcg_temp_new_i32();
-     fd = tcg_temp_new_i32();
--    neon_load_reg32(f0, vm);
-+    vfp_load_reg32(f0, vm);
-     for (;;) {
-         fn(fd, f0);
--        neon_store_reg32(fd, vd);
-+        vfp_store_reg32(fd, vd);
-         if (veclen == 0) {
-             break;
-@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
-             /* single source one-many */
-             while (veclen--) {
-                 vd = vfp_advance_sreg(vd, delta_d);
--                neon_store_reg32(fd, vd);
-+                vfp_store_reg32(fd, vd);
-             }
-             break;
-         }
-@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
-         veclen--;
-         vd = vfp_advance_sreg(vd, delta_d);
-         vm = vfp_advance_sreg(vm, delta_m);
--        neon_load_reg32(f0, vm);
-+        vfp_load_reg32(f0, vm);
-     }
-     tcg_temp_free_i32(f0);
-@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_hp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
-     }
-     f0 = tcg_temp_new_i32();
--    neon_load_reg32(f0, vm);
-+    vfp_load_reg32(f0, vm);
-     fn(f0, f0);
--    neon_store_reg32(f0, vd);
-+    vfp_store_reg32(f0, vd);
-     tcg_temp_free_i32(f0);
-     return true;
-@@ -XXX,XX +XXX,XX @@ static bool do_vfm_hp(DisasContext *s, arg_VFMA_sp *a, bool neg_n, bool neg_d)
-     vm = tcg_temp_new_i32();
-     vd = tcg_temp_new_i32();
--    neon_load_reg32(vn, a->vn);
--    neon_load_reg32(vm, a->vm);
-+    vfp_load_reg32(vn, a->vn);
-+    vfp_load_reg32(vm, a->vm);
-     if (neg_n) {
-         /* VFNMS, VFMS */
-         gen_helper_vfp_negh(vn, vn);
-     }
--    neon_load_reg32(vd, a->vd);
-+    vfp_load_reg32(vd, a->vd);
-     if (neg_d) {
-         /* VFNMA, VFNMS */
-         gen_helper_vfp_negh(vd, vd);
-     }
-     fpst = fpstatus_ptr(FPST_FPCR_F16);
-     gen_helper_vfp_muladdh(vd, vn, vm, vd, fpst);
--    neon_store_reg32(vd, a->vd);
-+    vfp_store_reg32(vd, a->vd);
-     tcg_temp_free_ptr(fpst);
-     tcg_temp_free_i32(vn);
-@@ -XXX,XX +XXX,XX @@ static bool do_vfm_sp(DisasContext *s, arg_VFMA_sp *a, bool neg_n, bool neg_d)
-     vm = tcg_temp_new_i32();
-     vd = tcg_temp_new_i32();
--    neon_load_reg32(vn, a->vn);
--    neon_load_reg32(vm, a->vm);
-+    vfp_load_reg32(vn, a->vn);
-+    vfp_load_reg32(vm, a->vm);
-     if (neg_n) {
-         /* VFNMS, VFMS */
-         gen_helper_vfp_negs(vn, vn);
-     }
--    neon_load_reg32(vd, a->vd);
-+    vfp_load_reg32(vd, a->vd);
-     if (neg_d) {
-         /* VFNMA, VFNMS */
-         gen_helper_vfp_negs(vd, vd);
-     }
-     fpst = fpstatus_ptr(FPST_FPCR);
-     gen_helper_vfp_muladds(vd, vn, vm, vd, fpst);
--    neon_store_reg32(vd, a->vd);
-+    vfp_store_reg32(vd, a->vd);
-     tcg_temp_free_ptr(fpst);
-     tcg_temp_free_i32(vn);
-@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_hp(DisasContext *s, arg_VMOV_imm_sp *a)
-     }
-     fd = tcg_const_i32(vfp_expand_imm(MO_16, a->imm));
--    neon_store_reg32(fd, a->vd);
-+    vfp_store_reg32(fd, a->vd);
-     tcg_temp_free_i32(fd);
      return true;
  }
-@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_sp(DisasContext *s, arg_VMOV_imm_sp *a)
+@@ -XXX,XX +XXX,XX @@ static void arm_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
-     fd = tcg_const_i32(vfp_expand_imm(MO_32, a->imm));
+     dc->align_mem = EX_TBFLAG_ANY(tb_flags, ALIGN_MEM);
+     dc->pstate_il = EX_TBFLAG_ANY(tb_flags, PSTATE__IL);
-     for (;;) {
+     dc->fgt_active = EX_TBFLAG_ANY(tb_flags, FGT_ACTIVE);
--        neon_store_reg32(fd, vd);
++    dc->fgt_svc = EX_TBFLAG_ANY(tb_flags, FGT_SVC);
-+        vfp_store_reg32(fd, vd);
+     if (arm_feature(env, ARM_FEATURE_M)) {
-         if (veclen == 0) {
+         dc->vfp_enabled = 1;
              break;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_hp(DisasContext *s, arg_VCMP_sp *a)
      vd = tcg_temp_new_i32();
      vm = tcg_temp_new_i32();
 -    neon_load_reg32(vd, a->vd);
 +    vfp_load_reg32(vd, a->vd);
      if (a->z) {
          tcg_gen_movi_i32(vm, 0);
      } else {
 -        neon_load_reg32(vm, a->vm);
 +        vfp_load_reg32(vm, a->vm);
      }
      if (a->e) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_sp(DisasContext *s, arg_VCMP_sp *a)
      vd = tcg_temp_new_i32();
      vm = tcg_temp_new_i32();
 -    neon_load_reg32(vd, a->vd);
 +    vfp_load_reg32(vd, a->vd);
      if (a->z) {
          tcg_gen_movi_i32(vm, 0);
      } else {
 -        neon_load_reg32(vm, a->vm);
 +        vfp_load_reg32(vm, a->vm);
      }
      if (a->e) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f32_f16(DisasContext *s, arg_VCVT_f32_f16 *a)
      /* The T bit tells us if we want the low or high 16 bits of Vm */
      tcg_gen_ld16u_i32(tmp, cpu_env, vfp_f16_offset(a->vm, a->t));
      gen_helper_vfp_fcvt_f16_to_f32(tmp, tmp, fpst, ahp_mode);
 -    neon_store_reg32(tmp, a->vd);
 +    vfp_store_reg32(tmp, a->vd);
      tcg_temp_free_i32(ahp_mode);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f16_f32(DisasContext *s, arg_VCVT_f16_f32 *a)
      ahp_mode = get_ahp_flag();
      tmp = tcg_temp_new_i32();
 -    neon_load_reg32(tmp, a->vm);
 +    vfp_load_reg32(tmp, a->vm);
      gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp_mode);
      tcg_gen_st16_i32(tmp, cpu_env, vfp_f16_offset(a->vd, a->t));
      tcg_temp_free_i32(ahp_mode);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_hp(DisasContext *s, arg_VRINTR_sp *a)
      }
      tmp = tcg_temp_new_i32();
 -    neon_load_reg32(tmp, a->vm);
 +    vfp_load_reg32(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR_F16);
      gen_helper_rinth(tmp, tmp, fpst);
 -    neon_store_reg32(tmp, a->vd);
 +    vfp_store_reg32(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tmp);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_sp(DisasContext *s, arg_VRINTR_sp *a)
      }
      tmp = tcg_temp_new_i32();
 -    neon_load_reg32(tmp, a->vm);
 +    vfp_load_reg32(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      gen_helper_rints(tmp, tmp, fpst);
 -    neon_store_reg32(tmp, a->vd);
 +    vfp_store_reg32(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tmp);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_hp(DisasContext *s, arg_VRINTZ_sp *a)
      }
      tmp = tcg_temp_new_i32();
 -    neon_load_reg32(tmp, a->vm);
 +    vfp_load_reg32(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR_F16);
      tcg_rmode = tcg_const_i32(float_round_to_zero);
      gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
      gen_helper_rinth(tmp, tmp, fpst);
      gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
 -    neon_store_reg32(tmp, a->vd);
 +    vfp_store_reg32(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tcg_rmode);
      tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_sp(DisasContext *s, arg_VRINTZ_sp *a)
      }
      tmp = tcg_temp_new_i32();
 -    neon_load_reg32(tmp, a->vm);
 +    vfp_load_reg32(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      tcg_rmode = tcg_const_i32(float_round_to_zero);
      gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
      gen_helper_rints(tmp, tmp, fpst);
      gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
 -    neon_store_reg32(tmp, a->vd);
 +    vfp_store_reg32(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tcg_rmode);
      tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_hp(DisasContext *s, arg_VRINTX_sp *a)
      }
      tmp = tcg_temp_new_i32();
 -    neon_load_reg32(tmp, a->vm);
 +    vfp_load_reg32(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR_F16);
      gen_helper_rinth_exact(tmp, tmp, fpst);
 -    neon_store_reg32(tmp, a->vd);
 +    vfp_store_reg32(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tmp);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_sp(DisasContext *s, arg_VRINTX_sp *a)
      }
      tmp = tcg_temp_new_i32();
 -    neon_load_reg32(tmp, a->vm);
 +    vfp_load_reg32(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      gen_helper_rints_exact(tmp, tmp, fpst);
 -    neon_store_reg32(tmp, a->vd);
 +    vfp_store_reg32(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tmp);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp(DisasContext *s, arg_VCVT_sp *a)
      vm = tcg_temp_new_i32();
      vd = tcg_temp_new_i64();
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vm, a->vm);
      gen_helper_vfp_fcvtds(vd, vm, cpu_env);
      neon_store_reg64(vd, a->vd);
      tcg_temp_free_i32(vm);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp(DisasContext *s, arg_VCVT_dp *a)
      vm = tcg_temp_new_i64();
      neon_load_reg64(vm, a->vm);
      gen_helper_vfp_fcvtsd(vd, vm, cpu_env);
 -    neon_store_reg32(vd, a->vd);
 +    vfp_store_reg32(vd, a->vd);
      tcg_temp_free_i32(vd);
      tcg_temp_free_i64(vm);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_hp(DisasContext *s, arg_VCVT_int_sp *a)
      }
      vm = tcg_temp_new_i32();
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vm, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR_F16);
      if (a->s) {
          /* i32 -> f16 */
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_hp(DisasContext *s, arg_VCVT_int_sp *a)
          /* u32 -> f16 */
          gen_helper_vfp_uitoh(vm, vm, fpst);
      }
 -    neon_store_reg32(vm, a->vd);
 +    vfp_store_reg32(vm, a->vd);
      tcg_temp_free_i32(vm);
      tcg_temp_free_ptr(fpst);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_sp(DisasContext *s, arg_VCVT_int_sp *a)
      }
      vm = tcg_temp_new_i32();
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vm, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      if (a->s) {
          /* i32 -> f32 */
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_sp(DisasContext *s, arg_VCVT_int_sp *a)
          /* u32 -> f32 */
          gen_helper_vfp_uitos(vm, vm, fpst);
      }
 -    neon_store_reg32(vm, a->vd);
 +    vfp_store_reg32(vm, a->vd);
      tcg_temp_free_i32(vm);
      tcg_temp_free_ptr(fpst);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_dp(DisasContext *s, arg_VCVT_int_dp *a)
      vm = tcg_temp_new_i32();
      vd = tcg_temp_new_i64();
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vm, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      if (a->s) {
          /* i32 -> f64 */
@@ -XXX,XX +XXX,XX @@ static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
      vd = tcg_temp_new_i32();
      neon_load_reg64(vm, a->vm);
      gen_helper_vjcvt(vd, vm, cpu_env);
 -    neon_store_reg32(vd, a->vd);
 +    vfp_store_reg32(vd, a->vd);
      tcg_temp_free_i64(vm);
      tcg_temp_free_i32(vd);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_hp(DisasContext *s, arg_VCVT_fix_sp *a)
      frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
      vd = tcg_temp_new_i32();
 -    neon_load_reg32(vd, a->vd);
 +    vfp_load_reg32(vd, a->vd);
      fpst = fpstatus_ptr(FPST_FPCR_F16);
      shift = tcg_const_i32(frac_bits);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_hp(DisasContext *s, arg_VCVT_fix_sp *a)
          g_assert_not_reached();
      }
 -    neon_store_reg32(vd, a->vd);
 +    vfp_store_reg32(vd, a->vd);
      tcg_temp_free_i32(vd);
      tcg_temp_free_i32(shift);
      tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_sp(DisasContext *s, arg_VCVT_fix_sp *a)
      frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
      vd = tcg_temp_new_i32();
 -    neon_load_reg32(vd, a->vd);
 +    vfp_load_reg32(vd, a->vd);
      fpst = fpstatus_ptr(FPST_FPCR);
      shift = tcg_const_i32(frac_bits);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_sp(DisasContext *s, arg_VCVT_fix_sp *a)
          g_assert_not_reached();
      }
 -    neon_store_reg32(vd, a->vd);
 +    vfp_store_reg32(vd, a->vd);
      tcg_temp_free_i32(vd);
      tcg_temp_free_i32(shift);
      tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_hp_int(DisasContext *s, arg_VCVT_sp_int *a)
      fpst = fpstatus_ptr(FPST_FPCR_F16);
      vm = tcg_temp_new_i32();
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vm, a->vm);
      if (a->s) {
          if (a->rz) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_hp_int(DisasContext *s, arg_VCVT_sp_int *a)
              gen_helper_vfp_touih(vm, vm, fpst);
          }
      }
 -    neon_store_reg32(vm, a->vd);
 +    vfp_store_reg32(vm, a->vd);
      tcg_temp_free_i32(vm);
      tcg_temp_free_ptr(fpst);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp_int(DisasContext *s, arg_VCVT_sp_int *a)
      fpst = fpstatus_ptr(FPST_FPCR);
      vm = tcg_temp_new_i32();
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vm, a->vm);
      if (a->s) {
          if (a->rz) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp_int(DisasContext *s, arg_VCVT_sp_int *a)
              gen_helper_vfp_touis(vm, vm, fpst);
          }
      }
 -    neon_store_reg32(vm, a->vd);
 +    vfp_store_reg32(vm, a->vd);
      tcg_temp_free_i32(vm);
      tcg_temp_free_ptr(fpst);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp_int(DisasContext *s, arg_VCVT_dp_int *a)
              gen_helper_vfp_touid(vd, vm, fpst);
          }
      }
 -    neon_store_reg32(vd, a->vd);
 +    vfp_store_reg32(vd, a->vd);
      tcg_temp_free_i32(vd);
      tcg_temp_free_i64(vm);
      tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VINS(DisasContext *s, arg_VINS *a)
      /* Insert low half of Vm into high half of Vd */
      rm = tcg_temp_new_i32();
      rd = tcg_temp_new_i32();
 -    neon_load_reg32(rm, a->vm);
 -    neon_load_reg32(rd, a->vd);
 +    vfp_load_reg32(rm, a->vm);
 +    vfp_load_reg32(rd, a->vd);
      tcg_gen_deposit_i32(rd, rd, rm, 16, 16);
 -    neon_store_reg32(rd, a->vd);
 +    vfp_store_reg32(rd, a->vd);
      tcg_temp_free_i32(rm);
      tcg_temp_free_i32(rd);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOVX(DisasContext *s, arg_VINS *a)
      /* Set Vd to high half of Vm */
      rm = tcg_temp_new_i32();
 -    neon_load_reg32(rm, a->vm);
 +    vfp_load_reg32(rm, a->vm);
      tcg_gen_shri_i32(rm, rm, 16);
 -    neon_store_reg32(rm, a->vd);
 +    vfp_store_reg32(rm, a->vd);
      tcg_temp_free_i32(rm);
      return true;
  }
 --
-2.20.1
+2.34.1

-[PULL 02/26] target/arm: Move neon_element_offset to translate.c
+[PULL 32/33] target/arm: Implement MDCR_EL2.TDCC and MDCR_EL3.TDCC traps
-From: Richard Henderson <richard.henderson@linaro.org>
+FEAT_FGT also implements an extra trap bit in the MDCR_EL2 and
 MDCR_EL3 registers: bit TDCC enables trapping of use of the Debug
 Comms Channel registers OSDTRRX_EL1, OSDTRTX_EL1, MDCCSR_EL0,
 MDCCINT_EL0, DBGDTR_EL0, DBGDTRRX_EL0 and DBGDTRTX_EL0 (and their
 AArch32 equivalents).  This trapping is independent of whether
 fine-grained traps are enabled or not.
-This will shortly have users outside of translate-neon.c.inc.
+Implement these extra traps.  (We don't implement DBGDTR_EL0,
 DBGDTRRX_EL0 and DBGDTRTX_EL0.)
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030022618.785675-3-richard.henderson@linaro.org
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Tested-by: Fuad Tabba <tabba@google.com>
+Message-id: 20230130182459.3309057-23-peter.maydell@linaro.org
+Message-id: 20230127175507.2895013-23-peter.maydell@linaro.org
 ---
- target/arm/translate.c          | 20 ++++++++++++++++++++
+ target/arm/debug_helper.c | 35 +++++++++++++++++++++++++++++++----
- target/arm/translate-neon.c.inc | 19 -------------------
+ 1 file changed, 31 insertions(+), 4 deletions(-)
  2 files changed, 20 insertions(+), 19 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/target/arm/debug_helper.c b/target/arm/debug_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/debug_helper.c
-+++ b/target/arm/translate.c
++++ b/target/arm/debug_helper.c
-@@ -XXX,XX +XXX,XX @@ static long neon_full_reg_offset(unsigned reg)
+@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_tda(CPUARMState *env, const ARMCPRegInfo *ri,
-     return offsetof(CPUARMState, vfp.zregs[reg >> 1].d[reg & 1]);
+     return CP_ACCESS_OK;
  }
 +/*
-+ * Return the offset of a 2**SIZE piece of a NEON register, at index ELE,
++ * Check for traps to Debug Comms Channel registers. If FEAT_FGT
-+ * where 0 is the least significant end of the register.
++ * is implemented then these are controlled by MDCR_EL2.TDCC for
 + * EL2 and MDCR_EL3.TDCC for EL3. They are also controlled by
 + * the general debug access trap bits MDCR_EL2.TDA and MDCR_EL3.TDA.
 + */
-+static long neon_element_offset(int reg, int element, MemOp size)
++static CPAccessResult access_tdcc(CPUARMState *env, const ARMCPRegInfo *ri,
 +                                  bool isread)
 +{
-+    int element_size = 1 << size;
++    int el = arm_current_el(env);
-+    int ofs = element * element_size;
++    uint64_t mdcr_el2 = arm_mdcr_el2_eff(env);
-+#ifdef HOST_WORDS_BIGENDIAN
++    bool mdcr_el2_tda = (mdcr_el2 & MDCR_TDA) || (mdcr_el2 & MDCR_TDE) ||
-+    /*
++        (arm_hcr_el2_eff(env) & HCR_TGE);
-+     * Calculate the offset assuming fully little-endian,
++    bool mdcr_el2_tdcc = cpu_isar_feature(aa64_fgt, env_archcpu(env)) &&
-+     * then XOR to account for the order of the 8-byte units.
++                                          (mdcr_el2 & MDCR_TDCC);
-+     */
++    bool mdcr_el3_tdcc = cpu_isar_feature(aa64_fgt, env_archcpu(env)) &&
-+    if (element_size < 8) {
++                                          (env->cp15.mdcr_el3 & MDCR_TDCC);
-+        ofs ^= 8 - element_size;
++
 +    if (el < 2 && (mdcr_el2_tda || mdcr_el2_tdcc)) {
 +        return CP_ACCESS_TRAP_EL2;
 +    }
-+#endif
++    if (el < 3 && ((env->cp15.mdcr_el3 & MDCR_TDA) || mdcr_el3_tdcc)) {
-+    return neon_full_reg_offset(reg) + ofs;
++        return CP_ACCESS_TRAP_EL3;
 +    }
 +    return CP_ACCESS_OK;
 +}
 +
- static inline long vfp_reg_offset(bool dp, unsigned reg)
+ static void oslar_write(CPUARMState *env, const ARMCPRegInfo *ri,
                          uint64_t value)
  {
-     if (dp) {
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo debug_cp_reginfo[] = {
-diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
+      */
-index XXXXXXX..XXXXXXX 100644
+     { .name = "MDCCSR_EL0", .state = ARM_CP_STATE_AA64,
---- a/target/arm/translate-neon.c.inc
+       .opc0 = 2, .opc1 = 3, .crn = 0, .crm = 1, .opc2 = 0,
-+++ b/target/arm/translate-neon.c.inc
+-      .access = PL0_R, .accessfn = access_tda,
-@@ -XXX,XX +XXX,XX @@ static inline int neon_3same_fp_size(DisasContext *s, int x)
++      .access = PL0_R, .accessfn = access_tdcc,
- #include "decode-neon-ls.c.inc"
+       .type = ARM_CP_CONST, .resetvalue = 0 },
- #include "decode-neon-shared.c.inc"
+     /*
+      * OSDTRRX_EL1/OSDTRTX_EL1 are used for save and restore of DBGDTRRX_EL0.
--/* Return the offset of a 2**SIZE piece of a NEON register, at index ELE,
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo debug_cp_reginfo[] = {
-- * where 0 is the least significant end of the register.
+      */
-- */
+     { .name = "OSDTRRX_EL1", .state = ARM_CP_STATE_BOTH, .cp = 14,
--static inline long
+       .opc0 = 2, .opc1 = 0, .crn = 0, .crm = 0, .opc2 = 2,
--neon_element_offset(int reg, int element, MemOp size)
+-      .access = PL1_RW, .accessfn = access_tda,
--{
++      .access = PL1_RW, .accessfn = access_tdcc,
--    int element_size = 1 << size;
+       .type = ARM_CP_CONST, .resetvalue = 0 },
--    int ofs = element * element_size;
+     { .name = "OSDTRTX_EL1", .state = ARM_CP_STATE_BOTH, .cp = 14,
--#ifdef HOST_WORDS_BIGENDIAN
+       .opc0 = 2, .opc1 = 0, .crn = 0, .crm = 3, .opc2 = 2,
--    /* Calculate the offset assuming fully little-endian,
+-      .access = PL1_RW, .accessfn = access_tda,
--     * then XOR to account for the order of the 8-byte units.
++      .access = PL1_RW, .accessfn = access_tdcc,
--     */
+       .type = ARM_CP_CONST, .resetvalue = 0 },
--    if (element_size < 8) {
+     /*
--        ofs ^= 8 - element_size;
+      * OSECCR_EL1 provides a mechanism for an operating system
--    }
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo debug_cp_reginfo[] = {
--#endif
+      */
--    return neon_full_reg_offset(reg) + ofs;
+     { .name = "MDCCINT_EL1", .state = ARM_CP_STATE_BOTH,
--}
+       .cp = 14, .opc0 = 2, .opc1 = 0, .crn = 0, .crm = 2, .opc2 = 0,
--
+-      .access = PL1_RW, .accessfn = access_tda,
- static void neon_load_element(TCGv_i32 var, int reg, int ele, MemOp mop)
++      .access = PL1_RW, .accessfn = access_tdcc,
- {
+       .type = ARM_CP_NOP },
-     long offset = neon_element_offset(reg, ele, mop & MO_SIZE);
+     /*
       * Dummy DBGCLAIM registers.
 --
-2.20.1
+2.34.1
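
As a reading aid for the TDCC patch, the trap decision can be modelled as a
pure function of a few booleans. This is only a sketch under stated
assumptions: DccTrapInputs and dcc_access_check are hypothetical names, and
the fields stand in for the (already effective) register bits that
access_tdcc() reads from the CPU state.

```c
#include <stdbool.h>

typedef enum { ACCESS_OK, ACCESS_TRAP_EL2, ACCESS_TRAP_EL3 } AccessResult;

/* Hypothetical flattened view of the state consulted by the check. */
typedef struct {
    int el;              /* current exception level, 0..3 */
    bool have_fgt;       /* FEAT_FGT implemented */
    bool mdcr_el2_tda;   /* MDCR_EL2.TDA (effective value) */
    bool mdcr_el2_tde;   /* MDCR_EL2.TDE (effective value) */
    bool hcr_tge;        /* HCR_EL2.TGE (effective value) */
    bool mdcr_el2_tdcc;  /* MDCR_EL2.TDCC (effective value) */
    bool mdcr_el3_tda;   /* MDCR_EL3.TDA */
    bool mdcr_el3_tdcc;  /* MDCR_EL3.TDCC */
} DccTrapInputs;

static AccessResult dcc_access_check(const DccTrapInputs *in)
{
    /* The general debug trap applies with or without FEAT_FGT. */
    bool el2_tda = in->mdcr_el2_tda || in->mdcr_el2_tde || in->hcr_tge;
    /* The TDCC bits only take effect when FEAT_FGT is implemented. */
    bool el2_tdcc = in->have_fgt && in->mdcr_el2_tdcc;
    bool el3_tdcc = in->have_fgt && in->mdcr_el3_tdcc;

    /* EL2 traps are checked first, so they win over EL3 traps. */
    if (in->el < 2 && (el2_tda || el2_tdcc)) {
        return ACCESS_TRAP_EL2;
    }
    if (in->el < 3 && (in->mdcr_el3_tda || el3_tdcc)) {
        return ACCESS_TRAP_EL3;
    }
    return ACCESS_OK;
}
```

In this model an access from EL1 with only MDCR_EL2.TDCC set traps to EL2
when FEAT_FGT is present and is allowed when it is not; the same access made
from EL2 can only be trapped by the EL3 bits.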

-[PULL 21/26] target/arm: Get correct MMU index for other-security-state
+[PULL 33/33] target/arm: Enable FEAT_FGT on '-cpu max'
-In arm_v7m_mmu_idx_for_secstate() we get the 'priv' level to pass to
+Update the ID registers for TCG's '-cpu max' to report the
-armv7m_mmu_idx_for_secstate_and_priv() by calling arm_current_el().
+presence of FEAT_FGT Fine-Grained Traps support.
 This is incorrect when the security state being queried is not the
 current one, because arm_current_el() uses the current security state
 to determine which of the banked CONTROL.nPRIV bits to look at.
 The effect was that if (for instance) Secure state was in privileged
 mode but Non-Secure was not then we would return the wrong MMU index.
 The only places where we are using this function in a way that could
 trigger this bug are for the stack loads during a v8M function-return
 and for the instruction fetch of a v8M SG insn.
 Fix the bug by expanding out the M-profile version of the
 arm_current_el() logic inline so it can use the passed in secstate
 rather than env->v7m.secure.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201022164408.13214-1-peter.maydell@linaro.org
+Tested-by: Fuad Tabba <tabba@google.com>
 Message-id: 20230130182459.3309057-24-peter.maydell@linaro.org
 Message-id: 20230127175507.2895013-24-peter.maydell@linaro.org
 ---
- target/arm/m_helper.c | 3 ++-
+ docs/system/arm/emulation.rst | 1 +
- 1 file changed, 2 insertions(+), 1 deletion(-)
+ target/arm/cpu64.c            | 1 +
  2 files changed, 2 insertions(+)
-diff --git a/target/arm/m_helper.c b/target/arm/m_helper.c
+diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/m_helper.c
+--- a/docs/system/arm/emulation.rst
-+++ b/target/arm/m_helper.c
++++ b/docs/system/arm/emulation.rst
-@@ -XXX,XX +XXX,XX @@ ARMMMUIdx arm_v7m_mmu_idx_for_secstate_and_priv(CPUARMState *env,
+@@ -XXX,XX +XXX,XX @@ the following architecture extensions:
- /* Return the MMU index for a v7M CPU in the specified security state */
+ - FEAT_ETS (Enhanced Translation Synchronization)
- ARMMMUIdx arm_v7m_mmu_idx_for_secstate(CPUARMState *env, bool secstate)
+ - FEAT_EVT (Enhanced Virtualization Traps)
- {
+ - FEAT_FCMA (Floating-point complex number instructions)
--    bool priv = arm_current_el(env) != 0;
++- FEAT_FGT (Fine-Grained Traps)
-+    bool priv = arm_v7m_is_handler_mode(env) ||
+ - FEAT_FHM (Floating-point half-precision multiplication instructions)
-+        !(env->v7m.control[secstate] & 1);
+ - FEAT_FP16 (Half-precision floating-point data processing)
+ - FEAT_FRINTTS (Floating-point to integer instructions)
-     return arm_v7m_mmu_idx_for_secstate_and_priv(env, secstate, priv);
+diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
- }
+index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu64.c
 +++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
      t = FIELD_DP64(t, ID_AA64MMFR0, TGRAN16_2, 2); /* 16k stage2 supported */
      t = FIELD_DP64(t, ID_AA64MMFR0, TGRAN64_2, 2); /* 64k stage2 supported */
      t = FIELD_DP64(t, ID_AA64MMFR0, TGRAN4_2, 2);  /*  4k stage2 supported */
 +    t = FIELD_DP64(t, ID_AA64MMFR0, FGT, 1);       /* FEAT_FGT */
      cpu->isar.id_aa64mmfr0 = t;
      t = cpu->isar.id_aa64mmfr1;
 --
-2.20.1
+2.34.1
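
The old-series m_helper.c change interleaved in the comparison above (the
"other-security-state" fix) comes down to which banked CONTROL.nPRIV bit is
consulted. A minimal sketch of that difference follows, assuming a reduced
hypothetical MProfileState in place of CPUARMState; the index convention
(0 = Non-secure, 1 = Secure) follows from the patch indexing control[] with
the boolean secstate.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced model of the M-profile state involved. */
typedef struct {
    bool secure;          /* current security state */
    bool handler_mode;    /* true while executing an exception handler */
    uint32_t control[2];  /* banked CONTROL: [0] Non-secure, [1] Secure; bit 0 is nPRIV */
} MProfileState;

/* Privilege as seen from the *current* security state (the old behaviour). */
static bool priv_current(const MProfileState *s)
{
    return s->handler_mode || !(s->control[s->secure] & 1);
}

/* Privilege for an explicitly requested security state (the fixed behaviour). */
static bool priv_for_secstate(const MProfileState *s, bool secstate)
{
    return s->handler_mode || !(s->control[secstate] & 1);
}
```

With Secure state privileged and Non-secure state unprivileged, a query about
Non-secure state made while running in Secure state gives different answers
from the two functions, which is the mismatch the commit message describes.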

Small pile of bug fixes for rc1. I've included my patches to get
our docs building with Sphinx 3, just for convenience...

-- PMM

The following changes since commit b149dea55cce97cb226683d06af61984a1c11e96:

Merge remote-tracking branch 'remotes/cschoenebeck/tags/pull-9p-20201102' into staging (2020-11-02 10:57:48 +0000)

are available in the Git repository at:

https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20201102

for you to fetch changes up to ffb4fbf90a2f63c9cb33e4bb9f854c79bf04ca4a:

tests/qtest/npcm7xx_rng-test: Disable randomness tests (2020-11-02 16:52:18 +0000)

----------------------------------------------------------------
target-arm queue:
 * target/arm: Fix Neon emulation bugs on big-endian hosts
 * target/arm: fix handling of HCR.FB
 * target/arm: fix LORID_EL1 access check
 * disas/capstone: Fix monitor disassembly of >32 bytes
 * hw/arm/smmuv3: Fix potential integer overflow (CID 1432363)
 * hw/arm/boot: fix SVE for EL3 direct kernel boot
 * hw/display/omap_lcdc: Fix potential NULL pointer dereference
 * hw/display/exynos4210_fimd: Fix potential NULL pointer dereference
 * target/arm: Get correct MMU index for other-security-state
 * configure: Test that gio libs from pkg-config work
 * hw/intc/arm_gicv3_cpuif: Make GIC maintenance interrupts work
 * docs: Fix building with Sphinx 3
 * tests/qtest/npcm7xx_rng-test: Disable randomness tests

----------------------------------------------------------------
AlexChen (2):
      hw/display/omap_lcdc: Fix potential NULL pointer dereference
      hw/display/exynos4210_fimd: Fix potential NULL pointer dereference

Peter Maydell (9):
      target/arm: Fix float16 pairwise Neon ops on big-endian hosts
      target/arm: Fix VUDOT/VSDOT (scalar) on big-endian hosts
      disas/capstone: Fix monitor disassembly of >32 bytes
      target/arm: Get correct MMU index for other-security-state
      configure: Test that gio libs from pkg-config work
      hw/intc/arm_gicv3_cpuif: Make GIC maintenance interrupts work
      scripts/kerneldoc: For Sphinx 3 use c:macro for macros with arguments
      qemu-option-trace.rst.inc: Don't use option:: markup
      tests/qtest/npcm7xx_rng-test: Disable randomness tests

Philippe Mathieu-Daudé (1):
      hw/arm/smmuv3: Fix potential integer overflow (CID 1432363)

Richard Henderson (11):
      target/arm: Introduce neon_full_reg_offset
      target/arm: Move neon_element_offset to translate.c
      target/arm: Use neon_element_offset in neon_load/store_reg
      target/arm: Use neon_element_offset in vfp_reg_offset
      target/arm: Add read/write_neon_element32
      target/arm: Expand read/write_neon_element32 to all MemOp
      target/arm: Rename neon_load_reg32 to vfp_load_reg32
      target/arm: Add read/write_neon_element64
      target/arm: Rename neon_load_reg64 to vfp_load_reg64
      target/arm: Simplify do_long_3d and do_2scalar_long
      target/arm: Improve do_prewiden_3d

Rémi Denis-Courmont (3):
      target/arm: fix handling of HCR.FB
      target/arm: fix LORID_EL1 access check
      hw/arm/boot: fix SVE for EL3 direct kernel boot

[PULL 01/26] target/arm: Introduce neon_full_reg_offset
From: Richard Henderson <richard.henderson@linaro.org>

This function makes it clear that we're talking about the whole
register, and not the 32-bit piece at index 0.  This fixes a bug
when running on a big-endian host.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-2-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
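
For readers following the big-endian reasoning, a standalone sketch of the
offset arithmetic may help. It is not the QEMU code: RegFile, QReg and the
explicit host_big_endian parameter are hypothetical stand-ins (QEMU selects
the behaviour at compile time with HOST_WORDS_BIGENDIAN), and only the XOR
adjustment mirrors neon_element_offset().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the register file: each Q register is two 64-bit units. */
typedef struct { uint64_t d[2]; } QReg;
typedef struct { QReg zregs[16]; } RegFile;

/* Byte offset of the whole 64-bit D register 'reg' (D0..D31). */
static long full_reg_offset(unsigned reg)
{
    assert(reg < 32);
    return (long)(offsetof(RegFile, zregs)
                  + (reg >> 1) * sizeof(QReg)
                  + (reg & 1) * sizeof(uint64_t));
}

/*
 * Byte offset of element 'element' of size (1 << log2_size) bytes,
 * where element 0 is the least significant end of the register.
 */
static long element_offset(unsigned reg, int element, int log2_size,
                           int host_big_endian)
{
    int element_size;
    int ofs;

    assert(log2_size >= 0 && log2_size <= 3);
    assert(element >= 0);
    element_size = 1 << log2_size;
    ofs = element * element_size;
    if (host_big_endian && element_size < 8) {
        /* Compute the little-endian offset, then mirror it within the 8-byte unit. */
        ofs ^= 8 - element_size;
    }
    return full_reg_offset(reg) + ofs;
}
```

In this model, 32-bit element 0 of D0 sits at byte offset 0 on a
little-endian host but at offset 4 on a big-endian one, while the whole
register starts at offset 0 in both cases. That is the distinction between
"the whole register" and "the 32-bit piece at index 0" that the commit
message draws.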
 target/arm/translate.c          |  8 ++++++
 target/arm/translate-neon.c.inc | 44 ++++++++++++++++-----------------
 target/arm/translate-vfp.c.inc  |  2 +-
 3 files changed, 31 insertions(+), 23 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void gen_hlt(DisasContext *s, int imm)
     unallocated_encoding(s);
 }
 
+/*
+ * Return the offset of a "full" NEON Dreg.
+ */
+static long neon_full_reg_offset(unsigned reg)
+{
+    return offsetof(CPUARMState, vfp.zregs[reg >> 1].d[reg & 1]);
+}
+
 static inline long vfp_reg_offset(bool dp, unsigned reg)
 {
     if (dp) {
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c.inc
+++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ neon_element_offset(int reg, int element, MemOp size)
         ofs ^= 8 - element_size;
     }
 #endif
-    return neon_reg_offset(reg, 0) + ofs;
+    return neon_full_reg_offset(reg) + ofs;
 }
 
 static void neon_load_element(TCGv_i32 var, int reg, int ele, MemOp mop)
@@ -XXX,XX +XXX,XX @@ static bool trans_VLD_all_lanes(DisasContext *s, arg_VLD_all_lanes *a)
              * We cannot write 16 bytes at once because the
              * destination is unaligned.
              */
-            tcg_gen_gvec_dup_i32(size, neon_reg_offset(vd, 0),
+            tcg_gen_gvec_dup_i32(size, neon_full_reg_offset(vd),
                                  8, 8, tmp);
-            tcg_gen_gvec_mov(0, neon_reg_offset(vd + 1, 0),
-                             neon_reg_offset(vd, 0), 8, 8);
+            tcg_gen_gvec_mov(0, neon_full_reg_offset(vd + 1),
+                             neon_full_reg_offset(vd), 8, 8);
         } else {
-            tcg_gen_gvec_dup_i32(size, neon_reg_offset(vd, 0),
+            tcg_gen_gvec_dup_i32(size, neon_full_reg_offset(vd),
                                  vec_size, vec_size, tmp);
         }
         tcg_gen_addi_i32(addr, addr, 1 << size);
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDST_single(DisasContext *s, arg_VLDST_single *a)
 static bool do_3same(DisasContext *s, arg_3same *a, GVecGen3Fn fn)
 {
     int vec_size = a->q ? 16 : 8;
-    int rd_ofs = neon_reg_offset(a->vd, 0);
-    int rn_ofs = neon_reg_offset(a->vn, 0);
-    int rm_ofs = neon_reg_offset(a->vm, 0);
+    int rd_ofs = neon_full_reg_offset(a->vd);
+    int rn_ofs = neon_full_reg_offset(a->vn);
+    int rm_ofs = neon_full_reg_offset(a->vm);
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return false;
@@ -XXX,XX +XXX,XX @@ static bool do_vector_2sh(DisasContext *s, arg_2reg_shift *a, GVecGen2iFn *fn)
 {
     /* Handle a 2-reg-shift insn which can be vectorized. */
     int vec_size = a->q ? 16 : 8;
-    int rd_ofs = neon_reg_offset(a->vd, 0);
-    int rm_ofs = neon_reg_offset(a->vm, 0);
+    int rd_ofs = neon_full_reg_offset(a->vd);
+    int rm_ofs = neon_full_reg_offset(a->vm);
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return false;
@@ -XXX,XX +XXX,XX @@ static bool do_fp_2sh(DisasContext *s, arg_2reg_shift *a,
 {
     /* FP operations in 2-reg-and-shift group */
     int vec_size = a->q ? 16 : 8;
-    int rd_ofs = neon_reg_offset(a->vd, 0);
-    int rm_ofs = neon_reg_offset(a->vm, 0);
+    int rd_ofs = neon_full_reg_offset(a->vd);
+    int rm_ofs = neon_full_reg_offset(a->vm);
     TCGv_ptr fpst;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_1reg_imm(DisasContext *s, arg_1reg_imm *a,
         return true;
     }
 
-    reg_ofs = neon_reg_offset(a->vd, 0);
+    reg_ofs = neon_full_reg_offset(a->vd);
     vec_size = a->q ? 16 : 8;
     imm = asimd_imm_const(a->imm, a->cmode, a->op);
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VMULL_P_3d(DisasContext *s, arg_3diff *a)
         return true;
     }
 
-    tcg_gen_gvec_3_ool(neon_reg_offset(a->vd, 0),
-                       neon_reg_offset(a->vn, 0),
-                       neon_reg_offset(a->vm, 0),
+    tcg_gen_gvec_3_ool(neon_full_reg_offset(a->vd),
+                       neon_full_reg_offset(a->vn),
+                       neon_full_reg_offset(a->vm),
                        16, 16, 0, fn_gvec);
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_fp_vec(DisasContext *s, arg_2scalar *a,
 {
     /* Two registers and a scalar, using gvec */
     int vec_size = a->q ? 16 : 8;
-    int rd_ofs = neon_reg_offset(a->vd, 0);
-    int rn_ofs = neon_reg_offset(a->vn, 0);
+    int rd_ofs = neon_full_reg_offset(a->vd);
+    int rn_ofs = neon_full_reg_offset(a->vn);
     int rm_ofs;
     int idx;
     TCGv_ptr fpstatus;
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_fp_vec(DisasContext *s, arg_2scalar *a,
     /* a->vm is M:Vm, which encodes both register and index */
     idx = extract32(a->vm, a->size + 2, 2);
     a->vm = extract32(a->vm, 0, a->size + 2);
-    rm_ofs = neon_reg_offset(a->vm, 0);
+    rm_ofs = neon_full_reg_offset(a->vm);
 
     fpstatus = fpstatus_ptr(a->size == 1 ? FPST_STD_F16 : FPST_STD);
     tcg_gen_gvec_3_ptr(rd_ofs, rn_ofs, rm_ofs, fpstatus,
@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP_scalar(DisasContext *s, arg_VDUP_scalar *a)
         return true;
     }
 
-    tcg_gen_gvec_dup_mem(a->size, neon_reg_offset(a->vd, 0),
+    tcg_gen_gvec_dup_mem(a->size, neon_full_reg_offset(a->vd),
                          neon_element_offset(a->vm, a->index, a->size),
                          a->q ? 16 : 8, a->q ? 16 : 8);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_F32_F16(DisasContext *s, arg_2misc *a)
 static bool do_2misc_vec(DisasContext *s, arg_2misc *a, GVecGen2Fn *fn)
 {
     int vec_size = a->q ? 16 : 8;
-    int rd_ofs = neon_reg_offset(a->vd, 0);
-    int rm_ofs = neon_reg_offset(a->vm, 0);
+    int rd_ofs = neon_full_reg_offset(a->vd);
+    int rm_ofs = neon_full_reg_offset(a->vm);
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return false;
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
     }
 
     tmp = load_reg(s, a->rt);
-    tcg_gen_gvec_dup_i32(size, neon_reg_offset(a->vn, 0),
+    tcg_gen_gvec_dup_i32(size, neon_full_reg_offset(a->vn),
                          vec_size, vec_size, tmp);
     tcg_temp_free_i32(tmp);
 
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

This will shortly have users outside of translate-neon.c.inc.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-3-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c          | 20 ++++++++++++++++++++
 target/arm/translate-neon.c.inc | 19 -------------------
 2 files changed, 20 insertions(+), 19 deletions(-)
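
Note for readers (not part of the commit): a minimal standalone sketch of
the offset arithmetic that neon_element_offset() performs. The function
name element_byte_offset and the host_big_endian parameter are hypothetical;
the parameter stands in for the HOST_WORDS_BIGENDIAN compile-time check, and
the result is relative to the start of the register rather than to
CPUARMState.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the per-register part of neon_element_offset():
 * byte offset of element number `element`, of size (1 << log2_size)
 * bytes, counted from the start of the register in host memory.
 */
static int element_byte_offset(int element, int log2_size,
                               bool host_big_endian)
{
    int element_size = 1 << log2_size;
    int ofs = element * element_size;

    assert(log2_size >= 0 && log2_size <= 3);

    /*
     * Registers are stored as host-endian 8-byte units. Compute the
     * offset as if fully little-endian, then XOR to mirror the element
     * within its 8-byte unit on a big-endian host.
     */
    if (host_big_endian && element_size < 8) {
        ofs ^= 8 - element_size;
    }
    return ofs;
}
```

The XOR only touches the low three bits of the offset, so an element never
leaves its own 8-byte unit; 64-bit elements need no adjustment at all.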

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static long neon_full_reg_offset(unsigned reg)
     return offsetof(CPUARMState, vfp.zregs[reg >> 1].d[reg & 1]);
 }
 
+/*
+ * Return the offset of a 2**SIZE piece of a NEON register, at index ELE,
+ * where 0 is the least significant end of the register.
+ */
+static long neon_element_offset(int reg, int element, MemOp size)
+{
+    int element_size = 1 << size;
+    int ofs = element * element_size;
+#ifdef HOST_WORDS_BIGENDIAN
+    /*
+     * Calculate the offset assuming fully little-endian,
+     * then XOR to account for the order of the 8-byte units.
+     */
+    if (element_size < 8) {
+        ofs ^= 8 - element_size;
+    }
+#endif
+    return neon_full_reg_offset(reg) + ofs;
+}
+
 static inline long vfp_reg_offset(bool dp, unsigned reg)
 {
     if (dp) {
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c.inc
+++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static inline int neon_3same_fp_size(DisasContext *s, int x)
 #include "decode-neon-ls.c.inc"
 #include "decode-neon-shared.c.inc"
 
-/* Return the offset of a 2**SIZE piece of a NEON register, at index ELE,
- * where 0 is the least significant end of the register.
- */
-static inline long
-neon_element_offset(int reg, int element, MemOp size)
-{
-    int element_size = 1 << size;
-    int ofs = element * element_size;
-#ifdef HOST_WORDS_BIGENDIAN
-    /* Calculate the offset assuming fully little-endian,
-     * then XOR to account for the order of the 8-byte units.
-     */
-    if (element_size < 8) {
-        ofs ^= 8 - element_size;
-    }
-#endif
-    return neon_full_reg_offset(reg) + ofs;
-}
-
 static void neon_load_element(TCGv_i32 var, int reg, int ele, MemOp mop)
 {
     long offset = neon_element_offset(reg, ele, mop & MO_SIZE);
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

These are the only users of neon_reg_offset, so remove that.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-4-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline long vfp_reg_offset(bool dp, unsigned reg)
     }
 }
 
-/* Return the offset of a 32-bit piece of a NEON register.
-   zero is the least significant end of the register.  */
-static inline long
-neon_reg_offset (int reg, int n)
-{
-    int sreg;
-    sreg = reg * 2 + n;
-    return vfp_reg_offset(0, sreg);
-}
-
 static TCGv_i32 neon_load_reg(int reg, int pass)
 {
     TCGv_i32 tmp = tcg_temp_new_i32();
-    tcg_gen_ld_i32(tmp, cpu_env, neon_reg_offset(reg, pass));
+    tcg_gen_ld_i32(tmp, cpu_env, neon_element_offset(reg, pass, MO_32));
     return tmp;
 }
 
 static void neon_store_reg(int reg, int pass, TCGv_i32 var)
 {
-    tcg_gen_st_i32(var, cpu_env, neon_reg_offset(reg, pass));
+    tcg_gen_st_i32(var, cpu_env, neon_element_offset(reg, pass, MO_32));
     tcg_temp_free_i32(var);
 }
 
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

This seems a bit more readable than using offsetof CPU_DoubleU.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-5-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)
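
Note for readers (not part of the commit): a small standalone check that the
old CPU_DoubleU-based Sreg offset and the new neon_element_offset()-based one
pick the same bytes. All names below are hypothetical, and MODEL_ZREG_STRIDE
is an assumed value for illustration, not the real size of a zregs[] entry.

```c
#include <assert.h>
#include <stdbool.h>

enum { MODEL_ZREG_STRIDE = 16 };  /* assumption; not QEMU's real stride */

/* Models offsetof(..., vfp.zregs[dreg >> 1].d[dreg & 1]). */
static long model_dreg_offset(unsigned dreg)
{
    return (long)(dreg >> 1) * MODEL_ZREG_STRIDE + (dreg & 1) * 8;
}

/* Old scheme: odd Sregs are the "upper" half, even ones the "lower". */
static long sreg_offset_old(unsigned sreg, bool host_big_endian)
{
    long upper = host_big_endian ? 0 : 4;
    long lower = host_big_endian ? 4 : 0;

    return model_dreg_offset(sreg >> 1) + ((sreg & 1) ? upper : lower);
}

/* New scheme: element (sreg & 1) of size 4 within Dreg (sreg >> 1). */
static long sreg_offset_new(unsigned sreg, bool host_big_endian)
{
    int ofs = (sreg & 1) * 4;

    if (host_big_endian) {
        ofs ^= 8 - 4;
    }
    return model_dreg_offset(sreg >> 1) + ofs;
}

/* Returns true if both schemes agree for S0..S63 on either host order. */
static bool sreg_offsets_agree(void)
{
    for (unsigned sreg = 0; sreg < 64; sreg++) {
        if (sreg_offset_old(sreg, false) != sreg_offset_new(sreg, false) ||
            sreg_offset_old(sreg, true) != sreg_offset_new(sreg, true)) {
            return false;
        }
    }
    return true;
}
```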

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static long neon_element_offset(int reg, int element, MemOp size)
     return neon_full_reg_offset(reg) + ofs;
 }
 
-static inline long vfp_reg_offset(bool dp, unsigned reg)
+/* Return the offset of a VFP Dreg (dp = true) or VFP Sreg (dp = false). */
+static long vfp_reg_offset(bool dp, unsigned reg)
 {
     if (dp) {
-        return offsetof(CPUARMState, vfp.zregs[reg >> 1].d[reg & 1]);
+        return neon_element_offset(reg, 0, MO_64);
     } else {
-        long ofs = offsetof(CPUARMState, vfp.zregs[reg >> 2].d[(reg >> 1) & 1]);
-        if (reg & 1) {
-            ofs += offsetof(CPU_DoubleU, l.upper);
-        } else {
-            ofs += offsetof(CPU_DoubleU, l.lower);
-        }
-        return ofs;
+        return neon_element_offset(reg >> 1, reg & 1, MO_32);
     }
 }
 
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Model these off the aa64 read/write_vec_element functions.
Use them within translate-neon.c.inc.  The new functions do
not allocate or free temps, so this rearranges the calling
code a bit.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-6-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c          |  26 ++++
 target/arm/translate-neon.c.inc | 256 ++++++++++++++++++++------------
 2 files changed, 183 insertions(+), 99 deletions(-)
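
Note for readers (not part of the commit): a rough model of the calling
convention change, in plain C without TCG. Everything here is hypothetical
(ModelPool, model_read_element32, model_write_element32, model_do_2misc_not);
the pool counters merely stand in for tcg_temp_new_i32() and
tcg_temp_free_i32() bookkeeping. The point is that the caller now owns one
temp for the whole loop instead of receiving a fresh one per pass.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for temp allocation bookkeeping. */
typedef struct {
    int live;   /* temps currently allocated */
    int total;  /* temps ever allocated */
} ModelPool;

/* Like read_neon_element32(): fills a caller-provided temp. */
static void model_read_element32(uint32_t *dest, const uint32_t *reg, int ele)
{
    *dest = reg[ele];
}

/* Like write_neon_element32(): stores, but does not free, the temp. */
static void model_write_element32(uint32_t src, uint32_t *reg, int ele)
{
    reg[ele] = src;
}

/* Shape of a do_2misc()-style loop after this patch. */
static void model_do_2misc_not(ModelPool *pool, uint32_t *vd,
                               const uint32_t *vm, int passes)
{
    uint32_t tmp;

    assert(passes >= 0);
    pool->total++;              /* models tmp = tcg_temp_new_i32() */
    pool->live++;

    for (int pass = 0; pass < passes; pass++) {
        model_read_element32(&tmp, vm, pass);
        tmp = ~tmp;             /* the per-element operation */
        model_write_element32(tmp, vd, pass);
    }

    pool->live--;               /* models tcg_temp_free_i32(tmp) */
}
```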

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void neon_store_reg32(TCGv_i32 var, int reg)
     tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
 }
 
+static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp size)
+{
+    long off = neon_element_offset(reg, ele, size);
+
+    switch (size) {
+    case MO_32:
+        tcg_gen_ld_i32(dest, cpu_env, off);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp size)
+{
+    long off = neon_element_offset(reg, ele, size);
+
+    switch (size) {
+    case MO_32:
+        tcg_gen_st_i32(src, cpu_env, off);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
 static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
 {
     TCGv_ptr ret = tcg_temp_new_ptr();
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c.inc
+++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static bool do_3same_pair(DisasContext *s, arg_3same *a, NeonGenTwoOpFn *fn)
      * early. Since Q is 0 there are always just two passes, so instead
      * of a complicated loop over each pass we just unroll.
      */
-    tmp = neon_load_reg(a->vn, 0);
-    tmp2 = neon_load_reg(a->vn, 1);
+    tmp = tcg_temp_new_i32();
+    tmp2 = tcg_temp_new_i32();
+    tmp3 = tcg_temp_new_i32();
+
+    read_neon_element32(tmp, a->vn, 0, MO_32);
+    read_neon_element32(tmp2, a->vn, 1, MO_32);
     fn(tmp, tmp, tmp2);
-    tcg_temp_free_i32(tmp2);
 
-    tmp3 = neon_load_reg(a->vm, 0);
-    tmp2 = neon_load_reg(a->vm, 1);
+    read_neon_element32(tmp3, a->vm, 0, MO_32);
+    read_neon_element32(tmp2, a->vm, 1, MO_32);
     fn(tmp3, tmp3, tmp2);
-    tcg_temp_free_i32(tmp2);
 
-    neon_store_reg(a->vd, 0, tmp);
-    neon_store_reg(a->vd, 1, tmp3);
+    write_neon_element32(tmp, a->vd, 0, MO_32);
+    write_neon_element32(tmp3, a->vd, 1, MO_32);
+
+    tcg_temp_free_i32(tmp);
+    tcg_temp_free_i32(tmp2);
+    tcg_temp_free_i32(tmp3);
     return true;
 }
 
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_32(DisasContext *s, arg_2reg_shift *a,
      * 2-reg-and-shift operations, size < 3 case, where the
      * helper needs to be passed cpu_env.
      */
-    TCGv_i32 constimm;
+    TCGv_i32 constimm, tmp;
     int pass;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_32(DisasContext *s, arg_2reg_shift *a,
      * by immediate using the variable shift operations.
      */
     constimm = tcg_const_i32(dup_const(a->size, a->shift));
+    tmp = tcg_temp_new_i32();
 
     for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
-        TCGv_i32 tmp = neon_load_reg(a->vm, pass);
+        read_neon_element32(tmp, a->vm, pass, MO_32);
         fn(tmp, cpu_env, tmp, constimm);
-        neon_store_reg(a->vd, pass, tmp);
+        write_neon_element32(tmp, a->vd, pass, MO_32);
     }
+    tcg_temp_free_i32(tmp);
     tcg_temp_free_i32(constimm);
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_64(DisasContext *s, arg_2reg_shift *a,
     constimm = tcg_const_i64(-a->shift);
     rm1 = tcg_temp_new_i64();
     rm2 = tcg_temp_new_i64();
+    rd = tcg_temp_new_i32();
 
     /* Load both inputs first to avoid potential overwrite if rm == rd */
     neon_load_reg64(rm1, a->vm);
     neon_load_reg64(rm2, a->vm + 1);
 
     shiftfn(rm1, rm1, constimm);
-    rd = tcg_temp_new_i32();
     narrowfn(rd, cpu_env, rm1);
-    neon_store_reg(a->vd, 0, rd);
+    write_neon_element32(rd, a->vd, 0, MO_32);
 
     shiftfn(rm2, rm2, constimm);
-    rd = tcg_temp_new_i32();
     narrowfn(rd, cpu_env, rm2);
-    neon_store_reg(a->vd, 1, rd);
+    write_neon_element32(rd, a->vd, 1, MO_32);
 
+    tcg_temp_free_i32(rd);
     tcg_temp_free_i64(rm1);
     tcg_temp_free_i64(rm2);
     tcg_temp_free_i64(constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_32(DisasContext *s, arg_2reg_shift *a,
     constimm = tcg_const_i32(imm);
 
     /* Load all inputs first to avoid potential overwrite */
-    rm1 = neon_load_reg(a->vm, 0);
-    rm2 = neon_load_reg(a->vm, 1);
-    rm3 = neon_load_reg(a->vm + 1, 0);
-    rm4 = neon_load_reg(a->vm + 1, 1);
+    rm1 = tcg_temp_new_i32();
+    rm2 = tcg_temp_new_i32();
+    rm3 = tcg_temp_new_i32();
+    rm4 = tcg_temp_new_i32();
+    read_neon_element32(rm1, a->vm, 0, MO_32);
+    read_neon_element32(rm2, a->vm, 1, MO_32);
+    read_neon_element32(rm3, a->vm, 2, MO_32);
+    read_neon_element32(rm4, a->vm, 3, MO_32);
     rtmp = tcg_temp_new_i64();
 
     shiftfn(rm1, rm1, constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_32(DisasContext *s, arg_2reg_shift *a,
     tcg_temp_free_i32(rm2);
 
     narrowfn(rm1, cpu_env, rtmp);
-    neon_store_reg(a->vd, 0, rm1);
+    write_neon_element32(rm1, a->vd, 0, MO_32);
+    tcg_temp_free_i32(rm1);
 
     shiftfn(rm3, rm3, constimm);
     shiftfn(rm4, rm4, constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_32(DisasContext *s, arg_2reg_shift *a,
 
     narrowfn(rm3, cpu_env, rtmp);
     tcg_temp_free_i64(rtmp);
-    neon_store_reg(a->vd, 1, rm3);
+    write_neon_element32(rm3, a->vd, 1, MO_32);
+    tcg_temp_free_i32(rm3);
     return true;
 }
 
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
         widen_mask = dup_const(a->size + 1, widen_mask);
     }
 
-    rm0 = neon_load_reg(a->vm, 0);
-    rm1 = neon_load_reg(a->vm, 1);
+    rm0 = tcg_temp_new_i32();
+    rm1 = tcg_temp_new_i32();
+    read_neon_element32(rm0, a->vm, 0, MO_32);
+    read_neon_element32(rm1, a->vm, 1, MO_32);
     tmp = tcg_temp_new_i64();
 
     widenfn(tmp, rm0);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
     if (src1_wide) {
         neon_load_reg64(rn0_64, a->vn);
     } else {
-        TCGv_i32 tmp = neon_load_reg(a->vn, 0);
+        TCGv_i32 tmp = tcg_temp_new_i32();
+        read_neon_element32(tmp, a->vn, 0, MO_32);
         widenfn(rn0_64, tmp);
         tcg_temp_free_i32(tmp);
     }
-    rm = neon_load_reg(a->vm, 0);
+    rm = tcg_temp_new_i32();
+    read_neon_element32(rm, a->vm, 0, MO_32);
 
     widenfn(rm_64, rm);
     tcg_temp_free_i32(rm);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
     if (src1_wide) {
         neon_load_reg64(rn1_64, a->vn + 1);
     } else {
-        TCGv_i32 tmp = neon_load_reg(a->vn, 1);
+        TCGv_i32 tmp = tcg_temp_new_i32();
+        read_neon_element32(tmp, a->vn, 1, MO_32);
         widenfn(rn1_64, tmp);
         tcg_temp_free_i32(tmp);
     }
-    rm = neon_load_reg(a->vm, 1);
+    rm = tcg_temp_new_i32();
+    read_neon_element32(rm, a->vm, 1, MO_32);
 
     neon_store_reg64(rn0_64, a->vd);
 
@@ -XXX,XX +XXX,XX @@ static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
 
     narrowfn(rd1, rn_64);
 
-    neon_store_reg(a->vd, 0, rd0);
-    neon_store_reg(a->vd, 1, rd1);
+    write_neon_element32(rd0, a->vd, 0, MO_32);
+    write_neon_element32(rd1, a->vd, 1, MO_32);
 
+    tcg_temp_free_i32(rd0);
+    tcg_temp_free_i32(rd1);
     tcg_temp_free_i64(rn_64);
     tcg_temp_free_i64(rm_64);
 
@@ -XXX,XX +XXX,XX @@ static bool do_long_3d(DisasContext *s, arg_3diff *a,
     rd0 = tcg_temp_new_i64();
     rd1 = tcg_temp_new_i64();
 
-    rn = neon_load_reg(a->vn, 0);
-    rm = neon_load_reg(a->vm, 0);
+    rn = tcg_temp_new_i32();
+    rm = tcg_temp_new_i32();
+    read_neon_element32(rn, a->vn, 0, MO_32);
+    read_neon_element32(rm, a->vm, 0, MO_32);
     opfn(rd0, rn, rm);
-    tcg_temp_free_i32(rn);
-    tcg_temp_free_i32(rm);
 
-    rn = neon_load_reg(a->vn, 1);
-    rm = neon_load_reg(a->vm, 1);
+    read_neon_element32(rn, a->vn, 1, MO_32);
+    read_neon_element32(rm, a->vm, 1, MO_32);
     opfn(rd1, rn, rm);
     tcg_temp_free_i32(rn);
     tcg_temp_free_i32(rm);
@@ -XXX,XX +XXX,XX @@ static void gen_neon_dup_high16(TCGv_i32 var)
 
 static inline TCGv_i32 neon_get_scalar(int size, int reg)
 {
-    TCGv_i32 tmp;
-    if (size == 1) {
-        tmp = neon_load_reg(reg & 7, reg >> 4);
+    TCGv_i32 tmp = tcg_temp_new_i32();
+    if (size == MO_16) {
+        read_neon_element32(tmp, reg & 7, reg >> 4, MO_32);
         if (reg & 8) {
             gen_neon_dup_high16(tmp);
         } else {
             gen_neon_dup_low16(tmp);
         }
     } else {
-        tmp = neon_load_reg(reg & 15, reg >> 4);
+        read_neon_element32(tmp, reg & 15, reg >> 4, MO_32);
     }
     return tmp;
 }
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar(DisasContext *s, arg_2scalar *a,
      * perform an accumulation operation of that result into the
      * destination.
      */
-    TCGv_i32 scalar;
+    TCGv_i32 scalar, tmp;
     int pass;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar(DisasContext *s, arg_2scalar *a,
     }
 
     scalar = neon_get_scalar(a->size, a->vm);
+    tmp = tcg_temp_new_i32();
 
     for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
-        TCGv_i32 tmp = neon_load_reg(a->vn, pass);
+        read_neon_element32(tmp, a->vn, pass, MO_32);
         opfn(tmp, tmp, scalar);
         if (accfn) {
-            TCGv_i32 rd = neon_load_reg(a->vd, pass);
+            TCGv_i32 rd = tcg_temp_new_i32();
+            read_neon_element32(rd, a->vd, pass, MO_32);
             accfn(tmp, rd, tmp);
             tcg_temp_free_i32(rd);
         }
-        neon_store_reg(a->vd, pass, tmp);
+        write_neon_element32(tmp, a->vd, pass, MO_32);
     }
+    tcg_temp_free_i32(tmp);
     tcg_temp_free_i32(scalar);
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool do_vqrdmlah_2sc(DisasContext *s, arg_2scalar *a,
      * performs a kind of fused op-then-accumulate using a helper
      * function that takes all of rd, rn and the scalar at once.
      */
-    TCGv_i32 scalar;
+    TCGv_i32 scalar, rn, rd;
     int pass;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_vqrdmlah_2sc(DisasContext *s, arg_2scalar *a,
     }
 
     scalar = neon_get_scalar(a->size, a->vm);
+    rn = tcg_temp_new_i32();
+    rd = tcg_temp_new_i32();
 
     for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
-        TCGv_i32 rn = neon_load_reg(a->vn, pass);
-        TCGv_i32 rd = neon_load_reg(a->vd, pass);
+        read_neon_element32(rn, a->vn, pass, MO_32);
+        read_neon_element32(rd, a->vd, pass, MO_32);
         opfn(rd, cpu_env, rn, scalar, rd);
-        tcg_temp_free_i32(rn);
-        neon_store_reg(a->vd, pass, rd);
+        write_neon_element32(rd, a->vd, pass, MO_32);
     }
+    tcg_temp_free_i32(rn);
+    tcg_temp_free_i32(rd);
     tcg_temp_free_i32(scalar);
 
     return true;
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
     scalar = neon_get_scalar(a->size, a->vm);
 
     /* Load all inputs before writing any outputs, in case of overlap */
-    rn = neon_load_reg(a->vn, 0);
+    rn = tcg_temp_new_i32();
+    read_neon_element32(rn, a->vn, 0, MO_32);
     rn0_64 = tcg_temp_new_i64();
     opfn(rn0_64, rn, scalar);
-    tcg_temp_free_i32(rn);
 
-    rn = neon_load_reg(a->vn, 1);
+    read_neon_element32(rn, a->vn, 1, MO_32);
     rn1_64 = tcg_temp_new_i64();
     opfn(rn1_64, rn, scalar);
     tcg_temp_free_i32(rn);
@@ -XXX,XX +XXX,XX @@ static bool trans_VTBL(DisasContext *s, arg_VTBL *a)
         return false;
     }
     n <<= 3;
+    tmp = tcg_temp_new_i32();
     if (a->op) {
-        tmp = neon_load_reg(a->vd, 0);
+        read_neon_element32(tmp, a->vd, 0, MO_32);
     } else {
-        tmp = tcg_temp_new_i32();
         tcg_gen_movi_i32(tmp, 0);
     }
-    tmp2 = neon_load_reg(a->vm, 0);
+    tmp2 = tcg_temp_new_i32();
+    read_neon_element32(tmp2, a->vm, 0, MO_32);
     ptr1 = vfp_reg_ptr(true, a->vn);
     tmp4 = tcg_const_i32(n);
     gen_helper_neon_tbl(tmp2, tmp2, tmp, ptr1, tmp4);
-    tcg_temp_free_i32(tmp);
+
     if (a->op) {
-        tmp = neon_load_reg(a->vd, 1);
+        read_neon_element32(tmp, a->vd, 1, MO_32);
     } else {
-        tmp = tcg_temp_new_i32();
         tcg_gen_movi_i32(tmp, 0);
     }
-    tmp3 = neon_load_reg(a->vm, 1);
+    tmp3 = tcg_temp_new_i32();
+    read_neon_element32(tmp3, a->vm, 1, MO_32);
     gen_helper_neon_tbl(tmp3, tmp3, tmp, ptr1, tmp4);
+    tcg_temp_free_i32(tmp);
     tcg_temp_free_i32(tmp4);
     tcg_temp_free_ptr(ptr1);
-    neon_store_reg(a->vd, 0, tmp2);
-    neon_store_reg(a->vd, 1, tmp3);
-    tcg_temp_free_i32(tmp);
+
+    write_neon_element32(tmp2, a->vd, 0, MO_32);
+    write_neon_element32(tmp3, a->vd, 1, MO_32);
+    tcg_temp_free_i32(tmp2);
+    tcg_temp_free_i32(tmp3);
     return true;
 }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP_scalar(DisasContext *s, arg_VDUP_scalar *a)
 static bool trans_VREV64(DisasContext *s, arg_VREV64 *a)
 {
     int pass, half;
+    TCGv_i32 tmp[2];
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VREV64(DisasContext *s, arg_VREV64 *a)
         return true;
     }
 
-    for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
-        TCGv_i32 tmp[2];
+    tmp[0] = tcg_temp_new_i32();
+    tmp[1] = tcg_temp_new_i32();
 
+    for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
         for (half = 0; half < 2; half++) {
-            tmp[half] = neon_load_reg(a->vm, pass * 2 + half);
+            read_neon_element32(tmp[half], a->vm, pass * 2 + half, MO_32);
             switch (a->size) {
             case 0:
                 tcg_gen_bswap32_i32(tmp[half], tmp[half]);
@@ -XXX,XX +XXX,XX @@ static bool trans_VREV64(DisasContext *s, arg_VREV64 *a)
                 g_assert_not_reached();
             }
         }
-        neon_store_reg(a->vd, pass * 2, tmp[1]);
-        neon_store_reg(a->vd, pass * 2 + 1, tmp[0]);
+        write_neon_element32(tmp[1], a->vd, pass * 2, MO_32);
+        write_neon_element32(tmp[0], a->vd, pass * 2 + 1, MO_32);
     }
+
+    tcg_temp_free_i32(tmp[0]);
+    tcg_temp_free_i32(tmp[1]);
     return true;
 }
 
@@ -XXX,XX +XXX,XX @@ static bool do_2misc_pairwise(DisasContext *s, arg_2misc *a,
         rm0_64 = tcg_temp_new_i64();
         rm1_64 = tcg_temp_new_i64();
         rd_64 = tcg_temp_new_i64();
-        tmp = neon_load_reg(a->vm, pass * 2);
+
+        tmp = tcg_temp_new_i32();
+        read_neon_element32(tmp, a->vm, pass * 2, MO_32);
         widenfn(rm0_64, tmp);
-        tcg_temp_free_i32(tmp);
-        tmp = neon_load_reg(a->vm, pass * 2 + 1);
+        read_neon_element32(tmp, a->vm, pass * 2 + 1, MO_32);
         widenfn(rm1_64, tmp);
         tcg_temp_free_i32(tmp);
+
         opfn(rd_64, rm0_64, rm1_64);
         tcg_temp_free_i64(rm0_64);
         tcg_temp_free_i64(rm1_64);
@@ -XXX,XX +XXX,XX @@ static bool do_vmovn(DisasContext *s, arg_2misc *a,
     narrowfn(rd0, cpu_env, rm);
     neon_load_reg64(rm, a->vm + 1);
     narrowfn(rd1, cpu_env, rm);
-    neon_store_reg(a->vd, 0, rd0);
-    neon_store_reg(a->vd, 1, rd1);
+    write_neon_element32(rd0, a->vd, 0, MO_32);
+    write_neon_element32(rd1, a->vd, 1, MO_32);
+    tcg_temp_free_i32(rd0);
+    tcg_temp_free_i32(rd1);
     tcg_temp_free_i64(rm);
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL(DisasContext *s, arg_2misc *a)
     }
 
     rd = tcg_temp_new_i64();
+    rm0 = tcg_temp_new_i32();
+    rm1 = tcg_temp_new_i32();
 
-    rm0 = neon_load_reg(a->vm, 0);
-    rm1 = neon_load_reg(a->vm, 1);
+    read_neon_element32(rm0, a->vm, 0, MO_32);
+    read_neon_element32(rm1, a->vm, 1, MO_32);
 
     widenfn(rd, rm0);
     tcg_gen_shli_i64(rd, rd, 8 << a->size);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_F16_F32(DisasContext *s, arg_2misc *a)
 
     fpst = fpstatus_ptr(FPST_STD);
     ahp = get_ahp_flag();
-    tmp = neon_load_reg(a->vm, 0);
+    tmp = tcg_temp_new_i32();
+    read_neon_element32(tmp, a->vm, 0, MO_32);
     gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp);
-    tmp2 = neon_load_reg(a->vm, 1);
+    tmp2 = tcg_temp_new_i32();
+    read_neon_element32(tmp2, a->vm, 1, MO_32);
     gen_helper_vfp_fcvt_f32_to_f16(tmp2, tmp2, fpst, ahp);
     tcg_gen_shli_i32(tmp2, tmp2, 16);
     tcg_gen_or_i32(tmp2, tmp2, tmp);
-    tcg_temp_free_i32(tmp);
-    tmp = neon_load_reg(a->vm, 2);
+    read_neon_element32(tmp, a->vm, 2, MO_32);
     gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp);
-    tmp3 = neon_load_reg(a->vm, 3);
-    neon_store_reg(a->vd, 0, tmp2);
+    tmp3 = tcg_temp_new_i32();
+    read_neon_element32(tmp3, a->vm, 3, MO_32);
+    write_neon_element32(tmp2, a->vd, 0, MO_32);
+    tcg_temp_free_i32(tmp2);
     gen_helper_vfp_fcvt_f32_to_f16(tmp3, tmp3, fpst, ahp);
     tcg_gen_shli_i32(tmp3, tmp3, 16);
     tcg_gen_or_i32(tmp3, tmp3, tmp);
-    neon_store_reg(a->vd, 1, tmp3);
+    write_neon_element32(tmp3, a->vd, 1, MO_32);
+    tcg_temp_free_i32(tmp3);
     tcg_temp_free_i32(tmp);
     tcg_temp_free_i32(ahp);
     tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_F32_F16(DisasContext *s, arg_2misc *a)
     fpst = fpstatus_ptr(FPST_STD);
     ahp = get_ahp_flag();
     tmp3 = tcg_temp_new_i32();
-    tmp = neon_load_reg(a->vm, 0);
-    tmp2 = neon_load_reg(a->vm, 1);
+    tmp2 = tcg_temp_new_i32();
+    tmp = tcg_temp_new_i32();
+    read_neon_element32(tmp, a->vm, 0, MO_32);
+    read_neon_element32(tmp2, a->vm, 1, MO_32);
     tcg_gen_ext16u_i32(tmp3, tmp);
     gen_helper_vfp_fcvt_f16_to_f32(tmp3, tmp3, fpst, ahp);
-    neon_store_reg(a->vd, 0, tmp3);
+    write_neon_element32(tmp3, a->vd, 0, MO_32);
     tcg_gen_shri_i32(tmp, tmp, 16);
     gen_helper_vfp_fcvt_f16_to_f32(tmp, tmp, fpst, ahp);
-    neon_store_reg(a->vd, 1, tmp);
-    tmp3 = tcg_temp_new_i32();
+    write_neon_element32(tmp, a->vd, 1, MO_32);
+    tcg_temp_free_i32(tmp);
     tcg_gen_ext16u_i32(tmp3, tmp2);
     gen_helper_vfp_fcvt_f16_to_f32(tmp3, tmp3, fpst, ahp);
-    neon_store_reg(a->vd, 2, tmp3);
+    write_neon_element32(tmp3, a->vd, 2, MO_32);
+    tcg_temp_free_i32(tmp3);
     tcg_gen_shri_i32(tmp2, tmp2, 16);
     gen_helper_vfp_fcvt_f16_to_f32(tmp2, tmp2, fpst, ahp);
-    neon_store_reg(a->vd, 3, tmp2);
+    write_neon_element32(tmp2, a->vd, 3, MO_32);
+    tcg_temp_free_i32(tmp2);
     tcg_temp_free_i32(ahp);
     tcg_temp_free_ptr(fpst);
 
@@ -XXX,XX +XXX,XX @@ DO_2M_CRYPTO(SHA256SU0, aa32_sha2, 2)
 
 static bool do_2misc(DisasContext *s, arg_2misc *a, NeonGenOneOpFn *fn)
 {
+    TCGv_i32 tmp;
     int pass;
 
     /* Handle a 2-reg-misc operation by iterating 32 bits at a time */
@@ -XXX,XX +XXX,XX @@ static bool do_2misc(DisasContext *s, arg_2misc *a, NeonGenOneOpFn *fn)
         return true;
     }
 
+    tmp = tcg_temp_new_i32();
     for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
-        TCGv_i32 tmp = neon_load_reg(a->vm, pass);
+        read_neon_element32(tmp, a->vm, pass, MO_32);
         fn(tmp, tmp);
-        neon_store_reg(a->vd, pass, tmp);
+        write_neon_element32(tmp, a->vd, pass, MO_32);
     }
+    tcg_temp_free_i32(tmp);
 
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool trans_VTRN(DisasContext *s, arg_2misc *a)
         return true;
     }
 
-    if (a->size == 2) {
+    tmp = tcg_temp_new_i32();
+    tmp2 = tcg_temp_new_i32();
+    if (a->size == MO_32) {
         for (pass = 0; pass < (a->q ? 4 : 2); pass += 2) {
-            tmp = neon_load_reg(a->vm, pass);
-            tmp2 = neon_load_reg(a->vd, pass + 1);
-            neon_store_reg(a->vm, pass, tmp2);
-            neon_store_reg(a->vd, pass + 1, tmp);
+            read_neon_element32(tmp, a->vm, pass, MO_32);
+            read_neon_element32(tmp2, a->vd, pass + 1, MO_32);
+            write_neon_element32(tmp2, a->vm, pass, MO_32);
+            write_neon_element32(tmp, a->vd, pass + 1, MO_32);
         }
     } else {
         for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
-            tmp = neon_load_reg(a->vm, pass);
-            tmp2 = neon_load_reg(a->vd, pass);
-            if (a->size == 0) {
+            read_neon_element32(tmp, a->vm, pass, MO_32);
+            read_neon_element32(tmp2, a->vd, pass, MO_32);
+            if (a->size == MO_8) {
                 gen_neon_trn_u8(tmp, tmp2);
             } else {
                 gen_neon_trn_u16(tmp, tmp2);
             }
-            neon_store_reg(a->vm, pass, tmp2);
-            neon_store_reg(a->vd, pass, tmp);
+            write_neon_element32(tmp2, a->vm, pass, MO_32);
+            write_neon_element32(tmp, a->vd, pass, MO_32);
         }
     }
+    tcg_temp_free_i32(tmp);
+    tcg_temp_free_i32(tmp2);
     return true;
 }
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

We can then use this to improve VMOV (scalar to gp) and
VMOV (gp to scalar) so that we simply perform the memory
operation that we wanted, rather than inserting or
extracting from a 32-bit quantity.

These were the last uses of neon_load/store_reg, so remove them.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-7-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c         | 50 +++++++++++++-----------
 target/arm/translate-vfp.c.inc | 71 +++++-----------------------------
 2 files changed, 37 insertions(+), 84 deletions(-)
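
Note for readers (not part of the commit): a standalone sketch of why a
direct sub-word load can replace the old "load 32 bits, shift, extend"
sequence in VMOV (scalar to gp). The Dreg is modelled as an explicit
little-endian byte array so the example does not depend on the host; all
function names are hypothetical, and only 8-bit and 16-bit elements are
modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend the low `bits` (8 or 16) of v. */
static int32_t sign_extend_small(uint32_t v, int bits)
{
    uint32_t mask = (1u << bits) - 1;
    int32_t r;

    assert(bits == 8 || bits == 16);
    v &= mask;
    r = (int32_t)v;
    if (v & (1u << (bits - 1))) {
        r -= (int32_t)(mask + 1);
    }
    return r;
}

/* Old approach: load the containing 32-bit pass, then shift and extend. */
static int32_t extract_via_shift(const uint8_t dreg_le[8], int index,
                                 int log2_size, bool is_unsigned)
{
    int byte_ofs = index << log2_size;
    int pass = (byte_ofs >> 2) & 1;
    int shift = (byte_ofs & 3) * 8;
    const uint8_t *p = dreg_le + pass * 4;
    uint32_t word = p[0] | ((uint32_t)p[1] << 8) |
                    ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    uint32_t v = (word >> shift) & ((1u << (8 << log2_size)) - 1);

    return is_unsigned ? (int32_t)v : sign_extend_small(v, 8 << log2_size);
}

/* New approach: load exactly the element, with the wanted extension. */
static int32_t extract_via_direct_load(const uint8_t dreg_le[8], int index,
                                       int log2_size, bool is_unsigned)
{
    int ofs = index << log2_size;
    uint32_t v = dreg_le[ofs];

    if (log2_size == 1) {
        v |= (uint32_t)dreg_le[ofs + 1] << 8;
    }
    return is_unsigned ? (int32_t)v : sign_extend_small(v, 8 << log2_size);
}

/* Returns true if both approaches agree for every 8/16-bit element. */
static bool extraction_methods_agree(const uint8_t dreg_le[8])
{
    for (int log2_size = 0; log2_size <= 1; log2_size++) {
        for (int index = 0; index < (8 >> log2_size); index++) {
            for (int u = 0; u <= 1; u++) {
                if (extract_via_shift(dreg_le, index, log2_size, u) !=
                    extract_via_direct_load(dreg_le, index, log2_size, u)) {
                    return false;
                }
            }
        }
    }
    return true;
}
```

In the patch itself the extension is selected by OR-ing MO_SIGN into the
MemOp passed to read_neon_element32() when a->u is clear.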

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static long neon_full_reg_offset(unsigned reg)
  * Return the offset of a 2**SIZE piece of a NEON register, at index ELE,
  * where 0 is the least significant end of the register.
  */
-static long neon_element_offset(int reg, int element, MemOp size)
+static long neon_element_offset(int reg, int element, MemOp memop)
 {
-    int element_size = 1 << size;
+    int element_size = 1 << (memop & MO_SIZE);
     int ofs = element * element_size;
 #ifdef HOST_WORDS_BIGENDIAN
     /*
@@ -XXX,XX +XXX,XX @@ static long vfp_reg_offset(bool dp, unsigned reg)
     }
 }
 
-static TCGv_i32 neon_load_reg(int reg, int pass)
-{
-    TCGv_i32 tmp = tcg_temp_new_i32();
-    tcg_gen_ld_i32(tmp, cpu_env, neon_element_offset(reg, pass, MO_32));
-    return tmp;
-}
-
-static void neon_store_reg(int reg, int pass, TCGv_i32 var)
-{
-    tcg_gen_st_i32(var, cpu_env, neon_element_offset(reg, pass, MO_32));
-    tcg_temp_free_i32(var);
-}
-
 static inline void neon_load_reg64(TCGv_i64 var, int reg)
 {
     tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(1, reg));
@@ -XXX,XX +XXX,XX @@ static inline void neon_store_reg32(TCGv_i32 var, int reg)
     tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
 }
 
-static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp size)
+static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp memop)
 {
-    long off = neon_element_offset(reg, ele, size);
+    long off = neon_element_offset(reg, ele, memop);
 
-    switch (size) {
-    case MO_32:
+    switch (memop) {
+    case MO_SB:
+        tcg_gen_ld8s_i32(dest, cpu_env, off);
+        break;
+    case MO_UB:
+        tcg_gen_ld8u_i32(dest, cpu_env, off);
+        break;
+    case MO_SW:
+        tcg_gen_ld16s_i32(dest, cpu_env, off);
+        break;
+    case MO_UW:
+        tcg_gen_ld16u_i32(dest, cpu_env, off);
+        break;
+    case MO_UL:
+    case MO_SL:
         tcg_gen_ld_i32(dest, cpu_env, off);
         break;
     default:
@@ -XXX,XX +XXX,XX @@ static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp size)
     }
 }
 
-static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp size)
+static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
 {
-    long off = neon_element_offset(reg, ele, size);
+    long off = neon_element_offset(reg, ele, memop);
 
-    switch (size) {
+    switch (memop) {
+    case MO_8:
+        tcg_gen_st8_i32(src, cpu_env, off);
+        break;
+    case MO_16:
+        tcg_gen_st16_i32(src, cpu_env, off);
+        break;
     case MO_32:
         tcg_gen_st_i32(src, cpu_env, off);
         break;
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
 {
     /* VMOV scalar to general purpose register */
     TCGv_i32 tmp;
-    int pass;
-    uint32_t offset;
 
-    /* SIZE == 2 is a VFP instruction; otherwise NEON.  */
-    if (a->size == 2
+    /* SIZE == MO_32 is a VFP instruction; otherwise NEON.  */
+    if (a->size == MO_32
         ? !dc_isar_feature(aa32_fpsp_v2, s)
         : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
         return false;
     }
 
-    offset = a->index << a->size;
-    pass = extract32(offset, 2, 1);
-    offset = extract32(offset, 0, 2) * 8;
-
     if (!vfp_access_check(s)) {
         return true;
     }
 
-    tmp = neon_load_reg(a->vn, pass);
-    switch (a->size) {
-    case 0:
-        if (offset) {
-            tcg_gen_shri_i32(tmp, tmp, offset);
-        }
-        if (a->u) {
-            gen_uxtb(tmp);
-        } else {
-            gen_sxtb(tmp);
-        }
-        break;
-    case 1:
-        if (a->u) {
-            if (offset) {
-                tcg_gen_shri_i32(tmp, tmp, 16);
-            } else {
-                gen_uxth(tmp);
-            }
-        } else {
-            if (offset) {
-                tcg_gen_sari_i32(tmp, tmp, 16);
-            } else {
-                gen_sxth(tmp);
-            }
-        }
-        break;
-    case 2:
-        break;
-    }
+    tmp = tcg_temp_new_i32();
+    read_neon_element32(tmp, a->vn, a->index, a->size | (a->u ? 0 : MO_SIGN));
     store_reg(s, a->rt, tmp);
 
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
 static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
 {
     /* VMOV general purpose register to scalar */
-    TCGv_i32 tmp, tmp2;
-    int pass;
-    uint32_t offset;
+    TCGv_i32 tmp;
 
-    /* SIZE == 2 is a VFP instruction; otherwise NEON.  */
-    if (a->size == 2
+    /* SIZE == MO_32 is a VFP instruction; otherwise NEON.  */
+    if (a->size == MO_32
         ? !dc_isar_feature(aa32_fpsp_v2, s)
         : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
         return false;
     }
 
-    offset = a->index << a->size;
-    pass = extract32(offset, 2, 1);
-    offset = extract32(offset, 0, 2) * 8;
-
     if (!vfp_access_check(s)) {
         return true;
     }
 
     tmp = load_reg(s, a->rt);
-    switch (a->size) {
-    case 0:
-        tmp2 = neon_load_reg(a->vn, pass);
-        tcg_gen_deposit_i32(tmp, tmp2, tmp, offset, 8);
-        tcg_temp_free_i32(tmp2);
-        break;
-    case 1:
-        tmp2 = neon_load_reg(a->vn, pass);
-        tcg_gen_deposit_i32(tmp, tmp2, tmp, offset, 16);
-        tcg_temp_free_i32(tmp2);
-        break;
-    case 2:
-        break;
-    }
-    neon_store_reg(a->vn, pass, tmp);
+    write_neon_element32(tmp, a->vn, a->index, a->size);
+    tcg_temp_free_i32(tmp);
 
     return true;
 }
-- 
2.20.1
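
Reviewer note: the VMOV_to_gp change above replaces a hand-written
"load the 32-bit pass, shift, then extend" sequence with a single
element read whose MemOp carries both the size and the signedness.
The following is a minimal host-side sketch of why the two schemes
should agree. It is not QEMU code: the SK_* constants, extend() and
the two *_vmov_to_gp() functions are hypothetical stand-ins, and the
D register is modelled as a plain uint64_t rather than as TCG ops on
cpu_env.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the MemOp size/sign bits (assumed values). */
enum { SK_8 = 0, SK_16 = 1, SK_32 = 2, SK_SIZE = 3, SK_SIGN = 4 };

/* Zero- or sign-extend the low 'bits' bits of v to 32 bits. */
static uint32_t extend(uint32_t v, int bits, int is_signed)
{
    uint32_t mask = bits == 32 ? UINT32_MAX : (1u << bits) - 1;
    uint32_t sign = 1u << (bits - 1);

    v &= mask;
    return is_signed ? (v ^ sign) - sign : v;
}

/* Old scheme: pick the 32-bit pass, then shift and extend by hand. */
static uint32_t old_vmov_to_gp(uint64_t dreg, int index, int size, int u)
{
    uint32_t offset = (uint32_t)index << size;   /* byte offset in the D reg */
    int pass = (offset >> 2) & 1;                /* which 32-bit half */
    uint32_t shift = (offset & 3) * 8;           /* bit offset inside it */
    uint32_t tmp = (uint32_t)(dreg >> (32 * pass));

    return extend(tmp >> shift, 8 << size, !u);
}

/* New scheme: one element read, with size and sign folded into memop. */
static uint32_t new_vmov_to_gp(uint64_t dreg, int index, int size, int u)
{
    int memop = size | (u ? 0 : SK_SIGN);
    int bits = 8 << (memop & SK_SIZE);
    uint32_t raw = (uint32_t)(dreg >> (index * bits));

    return extend(raw, bits, (memop & SK_SIGN) != 0);
}

/* Returns 1 if both schemes agree for every size, index and signedness. */
static int schemes_agree(uint64_t dreg)
{
    for (int size = SK_8; size <= SK_32; size++) {
        for (int index = 0; index < (8 >> size); index++) {
            for (int u = 0; u <= 1; u++) {
                if (old_vmov_to_gp(dreg, index, size, u) !=
                    new_vmov_to_gp(dreg, index, size, u)) {
                    return 0;
                }
            }
        }
    }
    return 1;
}
```

Because the model works on a value rather than on host memory, it says
nothing about the big-endian host offset adjustment, which is handled
separately inside neon_element_offset() in the patch.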

From: Richard Henderson <richard.henderson@linaro.org>

These functions are only used to load and store VFP
single-precision values, and have nothing to do with NEON.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-8-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c         |   4 +-
 target/arm/translate-vfp.c.inc | 184 ++++++++++++++++-----------------
 2 files changed, 94 insertions(+), 94 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void neon_store_reg64(TCGv_i64 var, int reg)
     tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(1, reg));
 }
 
-static inline void neon_load_reg32(TCGv_i32 var, int reg)
+static inline void vfp_load_reg32(TCGv_i32 var, int reg)
 {
     tcg_gen_ld_i32(var, cpu_env, vfp_reg_offset(false, reg));
 }
 
-static inline void neon_store_reg32(TCGv_i32 var, int reg)
+static inline void vfp_store_reg32(TCGv_i32 var, int reg)
 {
     tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
 }
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
         frn = tcg_temp_new_i32();
         frm = tcg_temp_new_i32();
         dest = tcg_temp_new_i32();
-        neon_load_reg32(frn, rn);
-        neon_load_reg32(frm, rm);
+        vfp_load_reg32(frn, rn);
+        vfp_load_reg32(frm, rm);
         switch (a->cc) {
         case 0: /* eq: Z */
             tcg_gen_movcond_i32(TCG_COND_EQ, dest, cpu_ZF, zero,
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
         if (sz == 1) {
             tcg_gen_andi_i32(dest, dest, 0xffff);
         }
-        neon_store_reg32(dest, rd);
+        vfp_store_reg32(dest, rd);
         tcg_temp_free_i32(frn);
         tcg_temp_free_i32(frm);
         tcg_temp_free_i32(dest);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
         TCGv_i32 tcg_res;
         tcg_op = tcg_temp_new_i32();
         tcg_res = tcg_temp_new_i32();
-        neon_load_reg32(tcg_op, rm);
+        vfp_load_reg32(tcg_op, rm);
         if (sz == 1) {
             gen_helper_rinth(tcg_res, tcg_op, fpst);
         } else {
             gen_helper_rints(tcg_res, tcg_op, fpst);
         }
-        neon_store_reg32(tcg_res, rd);
+        vfp_store_reg32(tcg_res, rd);
         tcg_temp_free_i32(tcg_op);
         tcg_temp_free_i32(tcg_res);
     }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
             gen_helper_vfp_tould(tcg_res, tcg_double, tcg_shift, fpst);
         }
         tcg_gen_extrl_i64_i32(tcg_tmp, tcg_res);
-        neon_store_reg32(tcg_tmp, rd);
+        vfp_store_reg32(tcg_tmp, rd);
         tcg_temp_free_i32(tcg_tmp);
         tcg_temp_free_i64(tcg_res);
         tcg_temp_free_i64(tcg_double);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
         TCGv_i32 tcg_single, tcg_res;
         tcg_single = tcg_temp_new_i32();
         tcg_res = tcg_temp_new_i32();
-        neon_load_reg32(tcg_single, rm);
+        vfp_load_reg32(tcg_single, rm);
         if (sz == 1) {
             if (is_signed) {
                 gen_helper_vfp_toslh(tcg_res, tcg_single, tcg_shift, fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
                 gen_helper_vfp_touls(tcg_res, tcg_single, tcg_shift, fpst);
             }
         }
-        neon_store_reg32(tcg_res, rd);
+        vfp_store_reg32(tcg_res, rd);
         tcg_temp_free_i32(tcg_res);
         tcg_temp_free_i32(tcg_single);
     }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_half(DisasContext *s, arg_VMOV_single *a)
     if (a->l) {
         /* VFP to general purpose register */
         tmp = tcg_temp_new_i32();
-        neon_load_reg32(tmp, a->vn);
+        vfp_load_reg32(tmp, a->vn);
         tcg_gen_andi_i32(tmp, tmp, 0xffff);
         store_reg(s, a->rt, tmp);
     } else {
         /* general purpose register to VFP */
         tmp = load_reg(s, a->rt);
         tcg_gen_andi_i32(tmp, tmp, 0xffff);
-        neon_store_reg32(tmp, a->vn);
+        vfp_store_reg32(tmp, a->vn);
         tcg_temp_free_i32(tmp);
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_single(DisasContext *s, arg_VMOV_single *a)
     if (a->l) {
         /* VFP to general purpose register */
         tmp = tcg_temp_new_i32();
-        neon_load_reg32(tmp, a->vn);
+        vfp_load_reg32(tmp, a->vn);
         if (a->rt == 15) {
             /* Set the 4 flag bits in the CPSR.  */
             gen_set_nzcv(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_single(DisasContext *s, arg_VMOV_single *a)
     } else {
         /* general purpose register to VFP */
         tmp = load_reg(s, a->rt);
-        neon_store_reg32(tmp, a->vn);
+        vfp_store_reg32(tmp, a->vn);
         tcg_temp_free_i32(tmp);
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_sp(DisasContext *s, arg_VMOV_64_sp *a)
     if (a->op) {
         /* fpreg to gpreg */
         tmp = tcg_temp_new_i32();
-        neon_load_reg32(tmp, a->vm);
+        vfp_load_reg32(tmp, a->vm);
         store_reg(s, a->rt, tmp);
         tmp = tcg_temp_new_i32();
-        neon_load_reg32(tmp, a->vm + 1);
+        vfp_load_reg32(tmp, a->vm + 1);
         store_reg(s, a->rt2, tmp);
     } else {
         /* gpreg to fpreg */
         tmp = load_reg(s, a->rt);
-        neon_store_reg32(tmp, a->vm);
+        vfp_store_reg32(tmp, a->vm);
         tcg_temp_free_i32(tmp);
         tmp = load_reg(s, a->rt2);
-        neon_store_reg32(tmp, a->vm + 1);
+        vfp_store_reg32(tmp, a->vm + 1);
         tcg_temp_free_i32(tmp);
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_dp(DisasContext *s, arg_VMOV_64_dp *a)
     if (a->op) {
         /* fpreg to gpreg */
         tmp = tcg_temp_new_i32();
-        neon_load_reg32(tmp, a->vm * 2);
+        vfp_load_reg32(tmp, a->vm * 2);
         store_reg(s, a->rt, tmp);
         tmp = tcg_temp_new_i32();
-        neon_load_reg32(tmp, a->vm * 2 + 1);
+        vfp_load_reg32(tmp, a->vm * 2 + 1);
         store_reg(s, a->rt2, tmp);
     } else {
         /* gpreg to fpreg */
         tmp = load_reg(s, a->rt);
-        neon_store_reg32(tmp, a->vm * 2);
+        vfp_store_reg32(tmp, a->vm * 2);
         tcg_temp_free_i32(tmp);
         tmp = load_reg(s, a->rt2);
-        neon_store_reg32(tmp, a->vm * 2 + 1);
+        vfp_store_reg32(tmp, a->vm * 2 + 1);
         tcg_temp_free_i32(tmp);
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_hp(DisasContext *s, arg_VLDR_VSTR_sp *a)
     tmp = tcg_temp_new_i32();
     if (a->l) {
         gen_aa32_ld16u(s, tmp, addr, get_mem_index(s));
-        neon_store_reg32(tmp, a->vd);
+        vfp_store_reg32(tmp, a->vd);
     } else {
-        neon_load_reg32(tmp, a->vd);
+        vfp_load_reg32(tmp, a->vd);
         gen_aa32_st16(s, tmp, addr, get_mem_index(s));
     }
     tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_sp(DisasContext *s, arg_VLDR_VSTR_sp *a)
     tmp = tcg_temp_new_i32();
     if (a->l) {
         gen_aa32_ld32u(s, tmp, addr, get_mem_index(s));
-        neon_store_reg32(tmp, a->vd);
+        vfp_store_reg32(tmp, a->vd);
     } else {
-        neon_load_reg32(tmp, a->vd);
+        vfp_load_reg32(tmp, a->vd);
         gen_aa32_st32(s, tmp, addr, get_mem_index(s));
     }
     tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_sp(DisasContext *s, arg_VLDM_VSTM_sp *a)
         if (a->l) {
             /* load */
             gen_aa32_ld32u(s, tmp, addr, get_mem_index(s));
-            neon_store_reg32(tmp, a->vd + i);
+            vfp_store_reg32(tmp, a->vd + i);
         } else {
             /* store */
-            neon_load_reg32(tmp, a->vd + i);
+            vfp_load_reg32(tmp, a->vd + i);
             gen_aa32_st32(s, tmp, addr, get_mem_index(s));
         }
         tcg_gen_addi_i32(addr, addr, offset);
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
     fd = tcg_temp_new_i32();
     fpst = fpstatus_ptr(FPST_FPCR);
 
-    neon_load_reg32(f0, vn);
-    neon_load_reg32(f1, vm);
+    vfp_load_reg32(f0, vn);
+    vfp_load_reg32(f1, vm);
 
     for (;;) {
         if (reads_vd) {
-            neon_load_reg32(fd, vd);
+            vfp_load_reg32(fd, vd);
         }
         fn(fd, f0, f1, fpst);
-        neon_store_reg32(fd, vd);
+        vfp_store_reg32(fd, vd);
 
         if (veclen == 0) {
             break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
         veclen--;
         vd = vfp_advance_sreg(vd, delta_d);
         vn = vfp_advance_sreg(vn, delta_d);
-        neon_load_reg32(f0, vn);
+        vfp_load_reg32(f0, vn);
         if (delta_m) {
             vm = vfp_advance_sreg(vm, delta_m);
-            neon_load_reg32(f1, vm);
+            vfp_load_reg32(f1, vm);
         }
     }
 
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_hp(DisasContext *s, VFPGen3OpSPFn *fn,
     fd = tcg_temp_new_i32();
     fpst = fpstatus_ptr(FPST_FPCR_F16);
 
-    neon_load_reg32(f0, vn);
-    neon_load_reg32(f1, vm);
+    vfp_load_reg32(f0, vn);
+    vfp_load_reg32(f1, vm);
 
     if (reads_vd) {
-        neon_load_reg32(fd, vd);
+        vfp_load_reg32(fd, vd);
     }
     fn(fd, f0, f1, fpst);
-    neon_store_reg32(fd, vd);
+    vfp_store_reg32(fd, vd);
 
     tcg_temp_free_i32(f0);
     tcg_temp_free_i32(f1);
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
     f0 = tcg_temp_new_i32();
     fd = tcg_temp_new_i32();
 
-    neon_load_reg32(f0, vm);
+    vfp_load_reg32(f0, vm);
 
     for (;;) {
         fn(fd, f0);
-        neon_store_reg32(fd, vd);
+        vfp_store_reg32(fd, vd);
 
         if (veclen == 0) {
             break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
             /* single source one-many */
             while (veclen--) {
                 vd = vfp_advance_sreg(vd, delta_d);
-                neon_store_reg32(fd, vd);
+                vfp_store_reg32(fd, vd);
             }
             break;
         }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
         veclen--;
         vd = vfp_advance_sreg(vd, delta_d);
         vm = vfp_advance_sreg(vm, delta_m);
-        neon_load_reg32(f0, vm);
+        vfp_load_reg32(f0, vm);
     }
 
     tcg_temp_free_i32(f0);
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_hp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
     }
 
     f0 = tcg_temp_new_i32();
-    neon_load_reg32(f0, vm);
+    vfp_load_reg32(f0, vm);
     fn(f0, f0);
-    neon_store_reg32(f0, vd);
+    vfp_store_reg32(f0, vd);
     tcg_temp_free_i32(f0);
 
     return true;
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_hp(DisasContext *s, arg_VFMA_sp *a, bool neg_n, bool neg_d)
     vm = tcg_temp_new_i32();
     vd = tcg_temp_new_i32();
 
-    neon_load_reg32(vn, a->vn);
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vn, a->vn);
+    vfp_load_reg32(vm, a->vm);
     if (neg_n) {
         /* VFNMS, VFMS */
         gen_helper_vfp_negh(vn, vn);
     }
-    neon_load_reg32(vd, a->vd);
+    vfp_load_reg32(vd, a->vd);
     if (neg_d) {
         /* VFNMA, VFNMS */
         gen_helper_vfp_negh(vd, vd);
     }
     fpst = fpstatus_ptr(FPST_FPCR_F16);
     gen_helper_vfp_muladdh(vd, vn, vm, vd, fpst);
-    neon_store_reg32(vd, a->vd);
+    vfp_store_reg32(vd, a->vd);
 
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(vn);
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_sp(DisasContext *s, arg_VFMA_sp *a, bool neg_n, bool neg_d)
     vm = tcg_temp_new_i32();
     vd = tcg_temp_new_i32();
 
-    neon_load_reg32(vn, a->vn);
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vn, a->vn);
+    vfp_load_reg32(vm, a->vm);
     if (neg_n) {
         /* VFNMS, VFMS */
         gen_helper_vfp_negs(vn, vn);
     }
-    neon_load_reg32(vd, a->vd);
+    vfp_load_reg32(vd, a->vd);
     if (neg_d) {
         /* VFNMA, VFNMS */
         gen_helper_vfp_negs(vd, vd);
     }
     fpst = fpstatus_ptr(FPST_FPCR);
     gen_helper_vfp_muladds(vd, vn, vm, vd, fpst);
-    neon_store_reg32(vd, a->vd);
+    vfp_store_reg32(vd, a->vd);
 
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(vn);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_hp(DisasContext *s, arg_VMOV_imm_sp *a)
     }
 
     fd = tcg_const_i32(vfp_expand_imm(MO_16, a->imm));
-    neon_store_reg32(fd, a->vd);
+    vfp_store_reg32(fd, a->vd);
     tcg_temp_free_i32(fd);
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_sp(DisasContext *s, arg_VMOV_imm_sp *a)
     fd = tcg_const_i32(vfp_expand_imm(MO_32, a->imm));
 
     for (;;) {
-        neon_store_reg32(fd, vd);
+        vfp_store_reg32(fd, vd);
 
         if (veclen == 0) {
             break;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_hp(DisasContext *s, arg_VCMP_sp *a)
     vd = tcg_temp_new_i32();
     vm = tcg_temp_new_i32();
 
-    neon_load_reg32(vd, a->vd);
+    vfp_load_reg32(vd, a->vd);
     if (a->z) {
         tcg_gen_movi_i32(vm, 0);
     } else {
-        neon_load_reg32(vm, a->vm);
+        vfp_load_reg32(vm, a->vm);
     }
 
     if (a->e) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_sp(DisasContext *s, arg_VCMP_sp *a)
     vd = tcg_temp_new_i32();
     vm = tcg_temp_new_i32();
 
-    neon_load_reg32(vd, a->vd);
+    vfp_load_reg32(vd, a->vd);
     if (a->z) {
         tcg_gen_movi_i32(vm, 0);
     } else {
-        neon_load_reg32(vm, a->vm);
+        vfp_load_reg32(vm, a->vm);
     }
 
     if (a->e) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f32_f16(DisasContext *s, arg_VCVT_f32_f16 *a)
     /* The T bit tells us if we want the low or high 16 bits of Vm */
     tcg_gen_ld16u_i32(tmp, cpu_env, vfp_f16_offset(a->vm, a->t));
     gen_helper_vfp_fcvt_f16_to_f32(tmp, tmp, fpst, ahp_mode);
-    neon_store_reg32(tmp, a->vd);
+    vfp_store_reg32(tmp, a->vd);
     tcg_temp_free_i32(ahp_mode);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f16_f32(DisasContext *s, arg_VCVT_f16_f32 *a)
     ahp_mode = get_ahp_flag();
     tmp = tcg_temp_new_i32();
 
-    neon_load_reg32(tmp, a->vm);
+    vfp_load_reg32(tmp, a->vm);
     gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp_mode);
     tcg_gen_st16_i32(tmp, cpu_env, vfp_f16_offset(a->vd, a->t));
     tcg_temp_free_i32(ahp_mode);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_hp(DisasContext *s, arg_VRINTR_sp *a)
     }
 
     tmp = tcg_temp_new_i32();
-    neon_load_reg32(tmp, a->vm);
+    vfp_load_reg32(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR_F16);
     gen_helper_rinth(tmp, tmp, fpst);
-    neon_store_reg32(tmp, a->vd);
+    vfp_store_reg32(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tmp);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_sp(DisasContext *s, arg_VRINTR_sp *a)
     }
 
     tmp = tcg_temp_new_i32();
-    neon_load_reg32(tmp, a->vm);
+    vfp_load_reg32(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     gen_helper_rints(tmp, tmp, fpst);
-    neon_store_reg32(tmp, a->vd);
+    vfp_store_reg32(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tmp);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_hp(DisasContext *s, arg_VRINTZ_sp *a)
     }
 
     tmp = tcg_temp_new_i32();
-    neon_load_reg32(tmp, a->vm);
+    vfp_load_reg32(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR_F16);
     tcg_rmode = tcg_const_i32(float_round_to_zero);
     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
     gen_helper_rinth(tmp, tmp, fpst);
     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
-    neon_store_reg32(tmp, a->vd);
+    vfp_store_reg32(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tcg_rmode);
     tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_sp(DisasContext *s, arg_VRINTZ_sp *a)
     }
 
     tmp = tcg_temp_new_i32();
-    neon_load_reg32(tmp, a->vm);
+    vfp_load_reg32(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     tcg_rmode = tcg_const_i32(float_round_to_zero);
     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
     gen_helper_rints(tmp, tmp, fpst);
     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
-    neon_store_reg32(tmp, a->vd);
+    vfp_store_reg32(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tcg_rmode);
     tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_hp(DisasContext *s, arg_VRINTX_sp *a)
     }
 
     tmp = tcg_temp_new_i32();
-    neon_load_reg32(tmp, a->vm);
+    vfp_load_reg32(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR_F16);
     gen_helper_rinth_exact(tmp, tmp, fpst);
-    neon_store_reg32(tmp, a->vd);
+    vfp_store_reg32(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tmp);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_sp(DisasContext *s, arg_VRINTX_sp *a)
     }
 
     tmp = tcg_temp_new_i32();
-    neon_load_reg32(tmp, a->vm);
+    vfp_load_reg32(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     gen_helper_rints_exact(tmp, tmp, fpst);
-    neon_store_reg32(tmp, a->vd);
+    vfp_store_reg32(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tmp);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp(DisasContext *s, arg_VCVT_sp *a)
 
     vm = tcg_temp_new_i32();
     vd = tcg_temp_new_i64();
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vm, a->vm);
     gen_helper_vfp_fcvtds(vd, vm, cpu_env);
     neon_store_reg64(vd, a->vd);
     tcg_temp_free_i32(vm);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp(DisasContext *s, arg_VCVT_dp *a)
     vm = tcg_temp_new_i64();
     neon_load_reg64(vm, a->vm);
     gen_helper_vfp_fcvtsd(vd, vm, cpu_env);
-    neon_store_reg32(vd, a->vd);
+    vfp_store_reg32(vd, a->vd);
     tcg_temp_free_i32(vd);
     tcg_temp_free_i64(vm);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_hp(DisasContext *s, arg_VCVT_int_sp *a)
     }
 
     vm = tcg_temp_new_i32();
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vm, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR_F16);
     if (a->s) {
         /* i32 -> f16 */
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_hp(DisasContext *s, arg_VCVT_int_sp *a)
         /* u32 -> f16 */
         gen_helper_vfp_uitoh(vm, vm, fpst);
     }
-    neon_store_reg32(vm, a->vd);
+    vfp_store_reg32(vm, a->vd);
     tcg_temp_free_i32(vm);
     tcg_temp_free_ptr(fpst);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_sp(DisasContext *s, arg_VCVT_int_sp *a)
     }
 
     vm = tcg_temp_new_i32();
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vm, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     if (a->s) {
         /* i32 -> f32 */
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_sp(DisasContext *s, arg_VCVT_int_sp *a)
         /* u32 -> f32 */
         gen_helper_vfp_uitos(vm, vm, fpst);
     }
-    neon_store_reg32(vm, a->vd);
+    vfp_store_reg32(vm, a->vd);
     tcg_temp_free_i32(vm);
     tcg_temp_free_ptr(fpst);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_dp(DisasContext *s, arg_VCVT_int_dp *a)
 
     vm = tcg_temp_new_i32();
     vd = tcg_temp_new_i64();
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vm, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     if (a->s) {
         /* i32 -> f64 */
@@ -XXX,XX +XXX,XX @@ static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
     vd = tcg_temp_new_i32();
     neon_load_reg64(vm, a->vm);
     gen_helper_vjcvt(vd, vm, cpu_env);
-    neon_store_reg32(vd, a->vd);
+    vfp_store_reg32(vd, a->vd);
     tcg_temp_free_i64(vm);
     tcg_temp_free_i32(vd);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_hp(DisasContext *s, arg_VCVT_fix_sp *a)
     frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
 
     vd = tcg_temp_new_i32();
-    neon_load_reg32(vd, a->vd);
+    vfp_load_reg32(vd, a->vd);
 
     fpst = fpstatus_ptr(FPST_FPCR_F16);
     shift = tcg_const_i32(frac_bits);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_hp(DisasContext *s, arg_VCVT_fix_sp *a)
         g_assert_not_reached();
     }
 
-    neon_store_reg32(vd, a->vd);
+    vfp_store_reg32(vd, a->vd);
     tcg_temp_free_i32(vd);
     tcg_temp_free_i32(shift);
     tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_sp(DisasContext *s, arg_VCVT_fix_sp *a)
     frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
 
     vd = tcg_temp_new_i32();
-    neon_load_reg32(vd, a->vd);
+    vfp_load_reg32(vd, a->vd);
 
     fpst = fpstatus_ptr(FPST_FPCR);
     shift = tcg_const_i32(frac_bits);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_sp(DisasContext *s, arg_VCVT_fix_sp *a)
         g_assert_not_reached();
     }
 
-    neon_store_reg32(vd, a->vd);
+    vfp_store_reg32(vd, a->vd);
     tcg_temp_free_i32(vd);
     tcg_temp_free_i32(shift);
     tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_hp_int(DisasContext *s, arg_VCVT_sp_int *a)
 
     fpst = fpstatus_ptr(FPST_FPCR_F16);
     vm = tcg_temp_new_i32();
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vm, a->vm);
 
     if (a->s) {
         if (a->rz) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_hp_int(DisasContext *s, arg_VCVT_sp_int *a)
             gen_helper_vfp_touih(vm, vm, fpst);
         }
     }
-    neon_store_reg32(vm, a->vd);
+    vfp_store_reg32(vm, a->vd);
     tcg_temp_free_i32(vm);
     tcg_temp_free_ptr(fpst);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp_int(DisasContext *s, arg_VCVT_sp_int *a)
 
     fpst = fpstatus_ptr(FPST_FPCR);
     vm = tcg_temp_new_i32();
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vm, a->vm);
 
     if (a->s) {
         if (a->rz) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp_int(DisasContext *s, arg_VCVT_sp_int *a)
             gen_helper_vfp_touis(vm, vm, fpst);
         }
     }
-    neon_store_reg32(vm, a->vd);
+    vfp_store_reg32(vm, a->vd);
     tcg_temp_free_i32(vm);
     tcg_temp_free_ptr(fpst);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp_int(DisasContext *s, arg_VCVT_dp_int *a)
             gen_helper_vfp_touid(vd, vm, fpst);
         }
     }
-    neon_store_reg32(vd, a->vd);
+    vfp_store_reg32(vd, a->vd);
     tcg_temp_free_i32(vd);
     tcg_temp_free_i64(vm);
     tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VINS(DisasContext *s, arg_VINS *a)
     /* Insert low half of Vm into high half of Vd */
     rm = tcg_temp_new_i32();
     rd = tcg_temp_new_i32();
-    neon_load_reg32(rm, a->vm);
-    neon_load_reg32(rd, a->vd);
+    vfp_load_reg32(rm, a->vm);
+    vfp_load_reg32(rd, a->vd);
     tcg_gen_deposit_i32(rd, rd, rm, 16, 16);
-    neon_store_reg32(rd, a->vd);
+    vfp_store_reg32(rd, a->vd);
     tcg_temp_free_i32(rm);
     tcg_temp_free_i32(rd);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOVX(DisasContext *s, arg_VINS *a)
 
     /* Set Vd to high half of Vm */
     rm = tcg_temp_new_i32();
-    neon_load_reg32(rm, a->vm);
+    vfp_load_reg32(rm, a->vm);
     tcg_gen_shri_i32(rm, rm, 16);
-    neon_store_reg32(rm, a->vd);
+    vfp_store_reg32(rm, a->vd);
     tcg_temp_free_i32(rm);
     return true;
 }
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Replace all uses of neon_load/store_reg64 within translate-neon.c.inc.
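
Reviewer note: the conversion below rewrites accesses of the form
neon_load_reg64(tmp, reg + pass) as read_neon_element64(tmp, reg,
pass, MO_64), that is, "D register reg + pass" becomes "64-bit element
pass, counted from D register reg". A minimal sketch of that
addressing equivalence follows. It assumes a simplified register file
in which the D registers are stored back to back; the SketchRegFile
type and the sketch_* helpers are hypothetical and do not reproduce
the real CPUARMState layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified model (an assumption): 32 D registers stored contiguously. */
typedef struct {
    uint8_t bytes[32 * 8];
} SketchRegFile;

/* Byte offset of D register 'reg' in the simplified register file. */
static size_t sketch_dreg_offset(int reg)
{
    return (size_t)reg * 8;
}

/* Byte offset of 64-bit element 'ele', counted from D register 'reg'. */
static size_t sketch_element64_offset(int reg, int ele)
{
    return sketch_dreg_offset(reg) + (size_t)ele * 8;
}

/* Old style: address a whole D register by its number. */
static uint64_t sketch_load_reg64(const SketchRegFile *rf, int reg)
{
    uint64_t v;

    memcpy(&v, rf->bytes + sketch_dreg_offset(reg), sizeof(v));
    return v;
}

/* New style: address a 64-bit element relative to a base register. */
static uint64_t sketch_read_element64(const SketchRegFile *rf, int reg, int ele)
{
    uint64_t v;

    memcpy(&v, rf->bytes + sketch_element64_offset(reg, ele), sizeof(v));
    return v;
}

static void sketch_write_element64(SketchRegFile *rf, int reg, int ele,
                                   uint64_t v)
{
    memcpy(rf->bytes + sketch_element64_offset(reg, ele), &v, sizeof(v));
}

/* Write through the element view, read back through the register view. */
static int sketch_views_agree(int reg, int pass, uint64_t value)
{
    SketchRegFile rf;

    memset(&rf, 0, sizeof(rf));
    sketch_write_element64(&rf, reg, pass, value);
    return sketch_load_reg64(&rf, reg + pass) == value &&
           sketch_read_element64(&rf, reg, pass) == value;
}
```

In this model a value written as element 1 of D4 is the same storage
that the old helper would reach as D5, which is the substitution the
patch applies at each call site.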

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-9-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c          | 26 +++++++++
 target/arm/translate-neon.c.inc | 94 ++++++++++++++++-----------------
 2 files changed, 73 insertions(+), 47 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp memop)
     }
 }
 
+static void read_neon_element64(TCGv_i64 dest, int reg, int ele, MemOp memop)
+{
+    long off = neon_element_offset(reg, ele, memop);
+
+    switch (memop) {
+    case MO_Q:
+        tcg_gen_ld_i64(dest, cpu_env, off);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
 static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
 {
     long off = neon_element_offset(reg, ele, memop);
@@ -XXX,XX +XXX,XX @@ static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
     }
 }
 
+static void write_neon_element64(TCGv_i64 src, int reg, int ele, MemOp memop)
+{
+    long off = neon_element_offset(reg, ele, memop);
+
+    switch (memop) {
+    case MO_64:
+        tcg_gen_st_i64(src, cpu_env, off);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
 static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
 {
     TCGv_ptr ret = tcg_temp_new_ptr();
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c.inc
+++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_64(DisasContext *s, arg_2reg_shift *a,
     for (pass = 0; pass < a->q + 1; pass++) {
         TCGv_i64 tmp = tcg_temp_new_i64();
 
-        neon_load_reg64(tmp, a->vm + pass);
+        read_neon_element64(tmp, a->vm, pass, MO_64);
         fn(tmp, cpu_env, tmp, constimm);
-        neon_store_reg64(tmp, a->vd + pass);
+        write_neon_element64(tmp, a->vd, pass, MO_64);
         tcg_temp_free_i64(tmp);
     }
     tcg_temp_free_i64(constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_64(DisasContext *s, arg_2reg_shift *a,
     rd = tcg_temp_new_i32();
 
     /* Load both inputs first to avoid potential overwrite if rm == rd */
-    neon_load_reg64(rm1, a->vm);
-    neon_load_reg64(rm2, a->vm + 1);
+    read_neon_element64(rm1, a->vm, 0, MO_64);
+    read_neon_element64(rm2, a->vm, 1, MO_64);
 
     shiftfn(rm1, rm1, constimm);
     narrowfn(rd, cpu_env, rm1);
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
         tcg_gen_shli_i64(tmp, tmp, a->shift);
         tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
     }
-    neon_store_reg64(tmp, a->vd);
+    write_neon_element64(tmp, a->vd, 0, MO_64);
 
     widenfn(tmp, rm1);
     tcg_temp_free_i32(rm1);
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
         tcg_gen_shli_i64(tmp, tmp, a->shift);
         tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
     }
-    neon_store_reg64(tmp, a->vd + 1);
+    write_neon_element64(tmp, a->vd, 1, MO_64);
     tcg_temp_free_i64(tmp);
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
     rm_64 = tcg_temp_new_i64();
 
     if (src1_wide) {
-        neon_load_reg64(rn0_64, a->vn);
+        read_neon_element64(rn0_64, a->vn, 0, MO_64);
     } else {
         TCGv_i32 tmp = tcg_temp_new_i32();
         read_neon_element32(tmp, a->vn, 0, MO_32);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
      * avoid incorrect results if a narrow input overlaps with the result.
      */
     if (src1_wide) {
-        neon_load_reg64(rn1_64, a->vn + 1);
+        read_neon_element64(rn1_64, a->vn, 1, MO_64);
     } else {
         TCGv_i32 tmp = tcg_temp_new_i32();
         read_neon_element32(tmp, a->vn, 1, MO_32);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
     rm = tcg_temp_new_i32();
     read_neon_element32(rm, a->vm, 1, MO_32);
 
-    neon_store_reg64(rn0_64, a->vd);
+    write_neon_element64(rn0_64, a->vd, 0, MO_64);
 
     widenfn(rm_64, rm);
     tcg_temp_free_i32(rm);
     opfn(rn1_64, rn1_64, rm_64);
-    neon_store_reg64(rn1_64, a->vd + 1);
+    write_neon_element64(rn1_64, a->vd, 1, MO_64);
 
     tcg_temp_free_i64(rn0_64);
     tcg_temp_free_i64(rn1_64);
@@ -XXX,XX +XXX,XX @@ static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
     rd0 = tcg_temp_new_i32();
     rd1 = tcg_temp_new_i32();
 
-    neon_load_reg64(rn_64, a->vn);
-    neon_load_reg64(rm_64, a->vm);
+    read_neon_element64(rn_64, a->vn, 0, MO_64);
+    read_neon_element64(rm_64, a->vm, 0, MO_64);
 
     opfn(rn_64, rn_64, rm_64);
 
     narrowfn(rd0, rn_64);
 
-    neon_load_reg64(rn_64, a->vn + 1);
-    neon_load_reg64(rm_64, a->vm + 1);
+    read_neon_element64(rn_64, a->vn, 1, MO_64);
+    read_neon_element64(rm_64, a->vm, 1, MO_64);
 
     opfn(rn_64, rn_64, rm_64);
 
@@ -XXX,XX +XXX,XX @@ static bool do_long_3d(DisasContext *s, arg_3diff *a,
     /* Don't store results until after all loads: they might overlap */
     if (accfn) {
         tmp = tcg_temp_new_i64();
-        neon_load_reg64(tmp, a->vd);
+        read_neon_element64(tmp, a->vd, 0, MO_64);
         accfn(tmp, tmp, rd0);
-        neon_store_reg64(tmp, a->vd);
-        neon_load_reg64(tmp, a->vd + 1);
+        write_neon_element64(tmp, a->vd, 0, MO_64);
+        read_neon_element64(tmp, a->vd, 1, MO_64);
         accfn(tmp, tmp, rd1);
-        neon_store_reg64(tmp, a->vd + 1);
+        write_neon_element64(tmp, a->vd, 1, MO_64);
         tcg_temp_free_i64(tmp);
     } else {
-        neon_store_reg64(rd0, a->vd);
-        neon_store_reg64(rd1, a->vd + 1);
+        write_neon_element64(rd0, a->vd, 0, MO_64);
+        write_neon_element64(rd1, a->vd, 1, MO_64);
     }
 
     tcg_temp_free_i64(rd0);
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
 
     if (accfn) {
         TCGv_i64 t64 = tcg_temp_new_i64();
-        neon_load_reg64(t64, a->vd);
+        read_neon_element64(t64, a->vd, 0, MO_64);
         accfn(t64, t64, rn0_64);
-        neon_store_reg64(t64, a->vd);
-        neon_load_reg64(t64, a->vd + 1);
+        write_neon_element64(t64, a->vd, 0, MO_64);
+        read_neon_element64(t64, a->vd, 1, MO_64);
         accfn(t64, t64, rn1_64);
-        neon_store_reg64(t64, a->vd + 1);
+        write_neon_element64(t64, a->vd, 1, MO_64);
         tcg_temp_free_i64(t64);
     } else {
-        neon_store_reg64(rn0_64, a->vd);
-        neon_store_reg64(rn1_64, a->vd + 1);
+        write_neon_element64(rn0_64, a->vd, 0, MO_64);
+        write_neon_element64(rn1_64, a->vd, 1, MO_64);
     }
     tcg_temp_free_i64(rn0_64);
     tcg_temp_free_i64(rn1_64);
@@ -XXX,XX +XXX,XX @@ static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
         right = tcg_temp_new_i64();
         dest = tcg_temp_new_i64();
 
-        neon_load_reg64(right, a->vn);
-        neon_load_reg64(left, a->vm);
+        read_neon_element64(right, a->vn, 0, MO_64);
+        read_neon_element64(left, a->vm, 0, MO_64);
         tcg_gen_extract2_i64(dest, right, left, a->imm * 8);
-        neon_store_reg64(dest, a->vd);
+        write_neon_element64(dest, a->vd, 0, MO_64);
 
         tcg_temp_free_i64(left);
         tcg_temp_free_i64(right);
@@ -XXX,XX +XXX,XX @@ static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
         destright = tcg_temp_new_i64();
 
         if (a->imm < 8) {
-            neon_load_reg64(right, a->vn);
-            neon_load_reg64(middle, a->vn + 1);
+            read_neon_element64(right, a->vn, 0, MO_64);
+            read_neon_element64(middle, a->vn, 1, MO_64);
             tcg_gen_extract2_i64(destright, right, middle, a->imm * 8);
-            neon_load_reg64(left, a->vm);
+            read_neon_element64(left, a->vm, 0, MO_64);
             tcg_gen_extract2_i64(destleft, middle, left, a->imm * 8);
         } else {
-            neon_load_reg64(right, a->vn + 1);
-            neon_load_reg64(middle, a->vm);
+            read_neon_element64(right, a->vn, 1, MO_64);
+            read_neon_element64(middle, a->vm, 0, MO_64);
             tcg_gen_extract2_i64(destright, right, middle, (a->imm - 8) * 8);
-            neon_load_reg64(left, a->vm + 1);
+            read_neon_element64(left, a->vm, 1, MO_64);
             tcg_gen_extract2_i64(destleft, middle, left, (a->imm - 8) * 8);
         }
 
-        neon_store_reg64(destright, a->vd);
-        neon_store_reg64(destleft, a->vd + 1);
+        write_neon_element64(destright, a->vd, 0, MO_64);
+        write_neon_element64(destleft, a->vd, 1, MO_64);
 
         tcg_temp_free_i64(destright);
         tcg_temp_free_i64(destleft);
@@ -XXX,XX +XXX,XX @@ static bool do_2misc_pairwise(DisasContext *s, arg_2misc *a,
 
         if (accfn) {
             TCGv_i64 tmp64 = tcg_temp_new_i64();
-            neon_load_reg64(tmp64, a->vd + pass);
+            read_neon_element64(tmp64, a->vd, pass, MO_64);
             accfn(rd_64, tmp64, rd_64);
             tcg_temp_free_i64(tmp64);
         }
-        neon_store_reg64(rd_64, a->vd + pass);
+        write_neon_element64(rd_64, a->vd, pass, MO_64);
         tcg_temp_free_i64(rd_64);
     }
     return true;
@@ -XXX,XX +XXX,XX @@ static bool do_vmovn(DisasContext *s, arg_2misc *a,
     rd0 = tcg_temp_new_i32();
     rd1 = tcg_temp_new_i32();
 
-    neon_load_reg64(rm, a->vm);
+    read_neon_element64(rm, a->vm, 0, MO_64);
     narrowfn(rd0, cpu_env, rm);
-    neon_load_reg64(rm, a->vm + 1);
+    read_neon_element64(rm, a->vm, 1, MO_64);
     narrowfn(rd1, cpu_env, rm);
     write_neon_element32(rd0, a->vd, 0, MO_32);
     write_neon_element32(rd1, a->vd, 1, MO_32);
@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL(DisasContext *s, arg_2misc *a)
 
     widenfn(rd, rm0);
     tcg_gen_shli_i64(rd, rd, 8 << a->size);
-    neon_store_reg64(rd, a->vd);
+    write_neon_element64(rd, a->vd, 0, MO_64);
     widenfn(rd, rm1);
     tcg_gen_shli_i64(rd, rd, 8 << a->size);
-    neon_store_reg64(rd, a->vd + 1);
+    write_neon_element64(rd, a->vd, 1, MO_64);
 
     tcg_temp_free_i64(rd);
     tcg_temp_free_i32(rm0);
@@ -XXX,XX +XXX,XX @@ static bool trans_VSWP(DisasContext *s, arg_2misc *a)
     rm = tcg_temp_new_i64();
     rd = tcg_temp_new_i64();
     for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
-        neon_load_reg64(rm, a->vm + pass);
-        neon_load_reg64(rd, a->vd + pass);
-        neon_store_reg64(rm, a->vd + pass);
-        neon_store_reg64(rd, a->vm + pass);
+        read_neon_element64(rm, a->vm, pass, MO_64);
+        read_neon_element64(rd, a->vd, pass, MO_64);
+        write_neon_element64(rm, a->vd, pass, MO_64);
+        write_neon_element64(rd, a->vm, pass, MO_64);
     }
     tcg_temp_free_i64(rm);
     tcg_temp_free_i64(rd);
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

These functions are used only to load and store VFP
double-precision values and have nothing to do with NEON,
so rename them with a vfp_ prefix.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-10-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c         |  8 ++--
 target/arm/translate-vfp.c.inc | 84 +++++++++++++++++-----------------
 2 files changed, 46 insertions(+), 46 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static long vfp_reg_offset(bool dp, unsigned reg)
     }
 }
 
-static inline void neon_load_reg64(TCGv_i64 var, int reg)
+static inline void vfp_load_reg64(TCGv_i64 var, int reg)
 {
-    tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(1, reg));
+    tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(true, reg));
 }
 
-static inline void neon_store_reg64(TCGv_i64 var, int reg)
+static inline void vfp_store_reg64(TCGv_i64 var, int reg)
 {
-    tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(1, reg));
+    tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(true, reg));
 }
 
 static inline void vfp_load_reg32(TCGv_i32 var, int reg)
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
         tcg_gen_ext_i32_i64(nf, cpu_NF);
         tcg_gen_ext_i32_i64(vf, cpu_VF);
 
-        neon_load_reg64(frn, rn);
-        neon_load_reg64(frm, rm);
+        vfp_load_reg64(frn, rn);
+        vfp_load_reg64(frm, rm);
         switch (a->cc) {
         case 0: /* eq: Z */
             tcg_gen_movcond_i64(TCG_COND_EQ, dest, zf, zero,
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
             tcg_temp_free_i64(tmp);
             break;
         }
-        neon_store_reg64(dest, rd);
+        vfp_store_reg64(dest, rd);
         tcg_temp_free_i64(frn);
         tcg_temp_free_i64(frm);
         tcg_temp_free_i64(dest);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
         TCGv_i64 tcg_res;
         tcg_op = tcg_temp_new_i64();
         tcg_res = tcg_temp_new_i64();
-        neon_load_reg64(tcg_op, rm);
+        vfp_load_reg64(tcg_op, rm);
         gen_helper_rintd(tcg_res, tcg_op, fpst);
-        neon_store_reg64(tcg_res, rd);
+        vfp_store_reg64(tcg_res, rd);
         tcg_temp_free_i64(tcg_op);
         tcg_temp_free_i64(tcg_res);
     } else {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
         tcg_double = tcg_temp_new_i64();
         tcg_res = tcg_temp_new_i64();
         tcg_tmp = tcg_temp_new_i32();
-        neon_load_reg64(tcg_double, rm);
+        vfp_load_reg64(tcg_double, rm);
         if (is_signed) {
             gen_helper_vfp_tosld(tcg_res, tcg_double, tcg_shift, fpst);
         } else {
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_dp *a)
     tmp = tcg_temp_new_i64();
     if (a->l) {
         gen_aa32_ld64(s, tmp, addr, get_mem_index(s));
-        neon_store_reg64(tmp, a->vd);
+        vfp_store_reg64(tmp, a->vd);
     } else {
-        neon_load_reg64(tmp, a->vd);
+        vfp_load_reg64(tmp, a->vd);
         gen_aa32_st64(s, tmp, addr, get_mem_index(s));
     }
     tcg_temp_free_i64(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
         if (a->l) {
             /* load */
             gen_aa32_ld64(s, tmp, addr, get_mem_index(s));
-            neon_store_reg64(tmp, a->vd + i);
+            vfp_store_reg64(tmp, a->vd + i);
         } else {
             /* store */
-            neon_load_reg64(tmp, a->vd + i);
+            vfp_load_reg64(tmp, a->vd + i);
             gen_aa32_st64(s, tmp, addr, get_mem_index(s));
         }
         tcg_gen_addi_i32(addr, addr, offset);
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
     fd = tcg_temp_new_i64();
     fpst = fpstatus_ptr(FPST_FPCR);
 
-    neon_load_reg64(f0, vn);
-    neon_load_reg64(f1, vm);
+    vfp_load_reg64(f0, vn);
+    vfp_load_reg64(f1, vm);
 
     for (;;) {
         if (reads_vd) {
-            neon_load_reg64(fd, vd);
+            vfp_load_reg64(fd, vd);
         }
         fn(fd, f0, f1, fpst);
-        neon_store_reg64(fd, vd);
+        vfp_store_reg64(fd, vd);
 
         if (veclen == 0) {
             break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
         veclen--;
         vd = vfp_advance_dreg(vd, delta_d);
         vn = vfp_advance_dreg(vn, delta_d);
-        neon_load_reg64(f0, vn);
+        vfp_load_reg64(f0, vn);
         if (delta_m) {
             vm = vfp_advance_dreg(vm, delta_m);
-            neon_load_reg64(f1, vm);
+            vfp_load_reg64(f1, vm);
         }
     }
 
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
     f0 = tcg_temp_new_i64();
     fd = tcg_temp_new_i64();
 
-    neon_load_reg64(f0, vm);
+    vfp_load_reg64(f0, vm);
 
     for (;;) {
         fn(fd, f0);
-        neon_store_reg64(fd, vd);
+        vfp_store_reg64(fd, vd);
 
         if (veclen == 0) {
             break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
             /* single source one-many */
             while (veclen--) {
                 vd = vfp_advance_dreg(vd, delta_d);
-                neon_store_reg64(fd, vd);
+                vfp_store_reg64(fd, vd);
             }
             break;
         }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
         veclen--;
         vd = vfp_advance_dreg(vd, delta_d);
         vd = vfp_advance_dreg(vm, delta_m);
-        neon_load_reg64(f0, vm);
+        vfp_load_reg64(f0, vm);
     }
 
     tcg_temp_free_i64(f0);
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_dp(DisasContext *s, arg_VFMA_dp *a, bool neg_n, bool neg_d)
     vm = tcg_temp_new_i64();
     vd = tcg_temp_new_i64();
 
-    neon_load_reg64(vn, a->vn);
-    neon_load_reg64(vm, a->vm);
+    vfp_load_reg64(vn, a->vn);
+    vfp_load_reg64(vm, a->vm);
     if (neg_n) {
         /* VFNMS, VFMS */
         gen_helper_vfp_negd(vn, vn);
     }
-    neon_load_reg64(vd, a->vd);
+    vfp_load_reg64(vd, a->vd);
     if (neg_d) {
         /* VFNMA, VFNMS */
         gen_helper_vfp_negd(vd, vd);
     }
     fpst = fpstatus_ptr(FPST_FPCR);
     gen_helper_vfp_muladdd(vd, vn, vm, vd, fpst);
-    neon_store_reg64(vd, a->vd);
+    vfp_store_reg64(vd, a->vd);
 
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i64(vn);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
     fd = tcg_const_i64(vfp_expand_imm(MO_64, a->imm));
 
     for (;;) {
-        neon_store_reg64(fd, vd);
+        vfp_store_reg64(fd, vd);
 
         if (veclen == 0) {
             break;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_dp(DisasContext *s, arg_VCMP_dp *a)
     vd = tcg_temp_new_i64();
     vm = tcg_temp_new_i64();
 
-    neon_load_reg64(vd, a->vd);
+    vfp_load_reg64(vd, a->vd);
     if (a->z) {
         tcg_gen_movi_i64(vm, 0);
     } else {
-        neon_load_reg64(vm, a->vm);
+        vfp_load_reg64(vm, a->vm);
     }
 
     if (a->e) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f64_f16(DisasContext *s, arg_VCVT_f64_f16 *a)
     tcg_gen_ld16u_i32(tmp, cpu_env, vfp_f16_offset(a->vm, a->t));
     vd = tcg_temp_new_i64();
     gen_helper_vfp_fcvt_f16_to_f64(vd, tmp, fpst, ahp_mode);
-    neon_store_reg64(vd, a->vd);
+    vfp_store_reg64(vd, a->vd);
     tcg_temp_free_i32(ahp_mode);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f16_f64(DisasContext *s, arg_VCVT_f16_f64 *a)
     tmp = tcg_temp_new_i32();
     vm = tcg_temp_new_i64();
 
-    neon_load_reg64(vm, a->vm);
+    vfp_load_reg64(vm, a->vm);
     gen_helper_vfp_fcvt_f64_to_f16(tmp, vm, fpst, ahp_mode);
     tcg_temp_free_i64(vm);
     tcg_gen_st16_i32(tmp, cpu_env, vfp_f16_offset(a->vd, a->t));
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_dp(DisasContext *s, arg_VRINTR_dp *a)
     }
 
     tmp = tcg_temp_new_i64();
-    neon_load_reg64(tmp, a->vm);
+    vfp_load_reg64(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     gen_helper_rintd(tmp, tmp, fpst);
-    neon_store_reg64(tmp, a->vd);
+    vfp_store_reg64(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i64(tmp);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_dp(DisasContext *s, arg_VRINTZ_dp *a)
     }
 
     tmp = tcg_temp_new_i64();
-    neon_load_reg64(tmp, a->vm);
+    vfp_load_reg64(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     tcg_rmode = tcg_const_i32(float_round_to_zero);
     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
     gen_helper_rintd(tmp, tmp, fpst);
     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
-    neon_store_reg64(tmp, a->vd);
+    vfp_store_reg64(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i64(tmp);
     tcg_temp_free_i32(tcg_rmode);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_dp(DisasContext *s, arg_VRINTX_dp *a)
     }
 
     tmp = tcg_temp_new_i64();
-    neon_load_reg64(tmp, a->vm);
+    vfp_load_reg64(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     gen_helper_rintd_exact(tmp, tmp, fpst);
-    neon_store_reg64(tmp, a->vd);
+    vfp_store_reg64(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i64(tmp);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp(DisasContext *s, arg_VCVT_sp *a)
     vd = tcg_temp_new_i64();
     vfp_load_reg32(vm, a->vm);
     gen_helper_vfp_fcvtds(vd, vm, cpu_env);
-    neon_store_reg64(vd, a->vd);
+    vfp_store_reg64(vd, a->vd);
     tcg_temp_free_i32(vm);
     tcg_temp_free_i64(vd);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp(DisasContext *s, arg_VCVT_dp *a)
 
     vd = tcg_temp_new_i32();
     vm = tcg_temp_new_i64();
-    neon_load_reg64(vm, a->vm);
+    vfp_load_reg64(vm, a->vm);
     gen_helper_vfp_fcvtsd(vd, vm, cpu_env);
     vfp_store_reg32(vd, a->vd);
     tcg_temp_free_i32(vd);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_dp(DisasContext *s, arg_VCVT_int_dp *a)
         /* u32 -> f64 */
         gen_helper_vfp_uitod(vd, vm, fpst);
     }
-    neon_store_reg64(vd, a->vd);
+    vfp_store_reg64(vd, a->vd);
     tcg_temp_free_i32(vm);
     tcg_temp_free_i64(vd);
     tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
 
     vm = tcg_temp_new_i64();
     vd = tcg_temp_new_i32();
-    neon_load_reg64(vm, a->vm);
+    vfp_load_reg64(vm, a->vm);
     gen_helper_vjcvt(vd, vm, cpu_env);
     vfp_store_reg32(vd, a->vd);
     tcg_temp_free_i64(vm);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
     frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
 
     vd = tcg_temp_new_i64();
-    neon_load_reg64(vd, a->vd);
+    vfp_load_reg64(vd, a->vd);
 
     fpst = fpstatus_ptr(FPST_FPCR);
     shift = tcg_const_i32(frac_bits);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
         g_assert_not_reached();
     }
 
-    neon_store_reg64(vd, a->vd);
+    vfp_store_reg64(vd, a->vd);
     tcg_temp_free_i64(vd);
     tcg_temp_free_i32(shift);
     tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp_int(DisasContext *s, arg_VCVT_dp_int *a)
     fpst = fpstatus_ptr(FPST_FPCR);
     vm = tcg_temp_new_i64();
     vd = tcg_temp_new_i32();
-    neon_load_reg64(vm, a->vm);
+    vfp_load_reg64(vm, a->vm);
 
     if (a->s) {
         if (a->rz) {
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

In both cases, we can sink the write-back and perform
the accumulate into the normal destination temps.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-11-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-neon.c.inc | 23 +++++++++--------------
 1 file changed, 9 insertions(+), 14 deletions(-)
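
Note for readers: the refactoring can be pictured with a small
host-side model. This is only a sketch, not QEMU code. It assumes plain
uint64_t values in place of TCG temps and a two-element array in place
of the destination Qreg; the names long_op_before, long_op_after and
acc_add are hypothetical. It shows that accumulating into the result
values and doing one write-back gives the same destination contents as
the older two-branch form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accumulate callback: returns acc combined with val. */
typedef uint64_t (*acc_fn)(uint64_t acc, uint64_t val);

static uint64_t acc_add(uint64_t acc, uint64_t val)
{
    return acc + val;
}

/* Old shape: the write-back is duplicated in both branches. */
static void long_op_before(uint64_t vd[2], uint64_t rd0, uint64_t rd1,
                           acc_fn accfn)
{
    if (accfn) {
        uint64_t tmp = vd[0];
        tmp = accfn(tmp, rd0);
        vd[0] = tmp;
        tmp = vd[1];
        tmp = accfn(tmp, rd1);
        vd[1] = tmp;
    } else {
        vd[0] = rd0;
        vd[1] = rd1;
    }
}

/* New shape: accumulate into the result values, then write back once. */
static void long_op_after(uint64_t vd[2], uint64_t rd0, uint64_t rd1,
                          acc_fn accfn)
{
    if (accfn) {
        rd0 = accfn(vd[0], rd0);
        rd1 = accfn(vd[1], rd1);
    }
    vd[0] = rd0;
    vd[1] = rd1;
}

/* Check that both shapes leave the same values in the destination. */
static void check_same_result(acc_fn accfn)
{
    uint64_t a[2] = { 10, 20 };
    uint64_t b[2] = { 10, 20 };

    long_op_before(a, 1, 2, accfn);
    long_op_after(b, 1, 2, accfn);
    assert(a[0] == b[0] && a[1] == b[1]);
}
```

Calling check_same_result(acc_add) and check_same_result(NULL) exercises
the accumulating and non-accumulating paths respectively.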

diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c.inc
+++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static bool do_long_3d(DisasContext *s, arg_3diff *a,
     if (accfn) {
         tmp = tcg_temp_new_i64();
         read_neon_element64(tmp, a->vd, 0, MO_64);
-        accfn(tmp, tmp, rd0);
-        write_neon_element64(tmp, a->vd, 0, MO_64);
+        accfn(rd0, tmp, rd0);
         read_neon_element64(tmp, a->vd, 1, MO_64);
-        accfn(tmp, tmp, rd1);
-        write_neon_element64(tmp, a->vd, 1, MO_64);
+        accfn(rd1, tmp, rd1);
         tcg_temp_free_i64(tmp);
-    } else {
-        write_neon_element64(rd0, a->vd, 0, MO_64);
-        write_neon_element64(rd1, a->vd, 1, MO_64);
     }
 
+    write_neon_element64(rd0, a->vd, 0, MO_64);
+    write_neon_element64(rd1, a->vd, 1, MO_64);
     tcg_temp_free_i64(rd0);
     tcg_temp_free_i64(rd1);
 
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
     if (accfn) {
         TCGv_i64 t64 = tcg_temp_new_i64();
         read_neon_element64(t64, a->vd, 0, MO_64);
-        accfn(t64, t64, rn0_64);
-        write_neon_element64(t64, a->vd, 0, MO_64);
+        accfn(rn0_64, t64, rn0_64);
         read_neon_element64(t64, a->vd, 1, MO_64);
-        accfn(t64, t64, rn1_64);
-        write_neon_element64(t64, a->vd, 1, MO_64);
+        accfn(rn1_64, t64, rn1_64);
         tcg_temp_free_i64(t64);
-    } else {
-        write_neon_element64(rn0_64, a->vd, 0, MO_64);
-        write_neon_element64(rn1_64, a->vd, 1, MO_64);
     }
+
+    write_neon_element64(rn0_64, a->vd, 0, MO_64);
+    write_neon_element64(rn1_64, a->vd, 1, MO_64);
     tcg_temp_free_i64(rn0_64);
     tcg_temp_free_i64(rn1_64);
     return true;
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

We can use proper widening loads to extend 32-bit inputs,
and skip the "widenfn" step.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-12-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c          |  6 +++
 target/arm/translate-neon.c.inc | 66 ++++++++++++++++++---------------
 2 files changed, 43 insertions(+), 29 deletions(-)
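
Note for readers: a minimal host-side sketch of what a widening load
does, assuming a byte buffer in place of the CPU state. The enum
sk_memop and the function sk_read_element64 are hypothetical stand-ins
for the MemOp values and read_neon_element64() touched by this patch,
and the host-endian element offset adjustment is deliberately left out.
The point is only that the 32-bit element is extended to 64 bits as
part of the load, so no separate widening step is needed afterwards.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for MO_UL, MO_SL and MO_Q. */
enum sk_memop { SK_UL, SK_SL, SK_Q };

/* Read element 'ele' of a register image, widening to 64 bits. */
static uint64_t sk_read_element64(const uint8_t *reg, int ele,
                                  enum sk_memop mop)
{
    uint32_t w;
    uint64_t q;

    switch (mop) {
    case SK_UL:
        memcpy(&w, reg + 4 * ele, sizeof(w));
        return (uint64_t)w;                     /* zero-extend */
    case SK_SL:
        memcpy(&w, reg + 4 * ele, sizeof(w));
        return (uint64_t)(int64_t)(int32_t)w;   /* sign-extend */
    default:
        memcpy(&q, reg + 8 * ele, sizeof(q));
        return q;                               /* full 64-bit load */
    }
}

/* Small self-check of the two 32-bit extension modes. */
static void sk_check_widening(void)
{
    uint32_t words[2] = { 0xfffffffeu, 7u };
    uint8_t reg[8];

    memcpy(reg, words, sizeof(reg));
    assert(sk_read_element64(reg, 0, SK_UL) == 0x00000000fffffffeull);
    assert(sk_read_element64(reg, 0, SK_SL) == 0xfffffffffffffffeull);
}
```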

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void read_neon_element64(TCGv_i64 dest, int reg, int ele, MemOp memop)
     long off = neon_element_offset(reg, ele, memop);
 
     switch (memop) {
+    case MO_SL:
+        tcg_gen_ld32s_i64(dest, cpu_env, off);
+        break;
+    case MO_UL:
+        tcg_gen_ld32u_i64(dest, cpu_env, off);
+        break;
     case MO_Q:
         tcg_gen_ld_i64(dest, cpu_env, off);
         break;
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c.inc
+++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_Vimm_1r(DisasContext *s, arg_1reg_imm *a)
 static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
                            NeonGenWidenFn *widenfn,
                            NeonGenTwo64OpFn *opfn,
-                           bool src1_wide)
+                           int src1_mop, int src2_mop)
 {
     /* 3-regs different lengths, prewidening case (VADDL/VSUBL/VAADW/VSUBW) */
     TCGv_i64 rn0_64, rn1_64, rm_64;
-    TCGv_i32 rm;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return false;
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
         return false;
     }
 
-    if (!widenfn || !opfn) {
+    if (!opfn) {
         /* size == 3 case, which is an entirely different insn group */
         return false;
     }
 
-    if ((a->vd & 1) || (src1_wide && (a->vn & 1))) {
+    if ((a->vd & 1) || (src1_mop == MO_Q && (a->vn & 1))) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
     rn1_64 = tcg_temp_new_i64();
     rm_64 = tcg_temp_new_i64();
 
-    if (src1_wide) {
-        read_neon_element64(rn0_64, a->vn, 0, MO_64);
+    if (src1_mop >= 0) {
+        read_neon_element64(rn0_64, a->vn, 0, src1_mop);
     } else {
         TCGv_i32 tmp = tcg_temp_new_i32();
         read_neon_element32(tmp, a->vn, 0, MO_32);
         widenfn(rn0_64, tmp);
         tcg_temp_free_i32(tmp);
     }
-    rm = tcg_temp_new_i32();
-    read_neon_element32(rm, a->vm, 0, MO_32);
+    if (src2_mop >= 0) {
+        read_neon_element64(rm_64, a->vm, 0, src2_mop);
+    } else {
+        TCGv_i32 tmp = tcg_temp_new_i32();
+        read_neon_element32(tmp, a->vm, 0, MO_32);
+        widenfn(rm_64, tmp);
+        tcg_temp_free_i32(tmp);
+    }
 
-    widenfn(rm_64, rm);
-    tcg_temp_free_i32(rm);
     opfn(rn0_64, rn0_64, rm_64);
 
     /*
      * Load second pass inputs before storing the first pass result, to
      * avoid incorrect results if a narrow input overlaps with the result.
      */
-    if (src1_wide) {
-        read_neon_element64(rn1_64, a->vn, 1, MO_64);
+    if (src1_mop >= 0) {
+        read_neon_element64(rn1_64, a->vn, 1, src1_mop);
     } else {
         TCGv_i32 tmp = tcg_temp_new_i32();
         read_neon_element32(tmp, a->vn, 1, MO_32);
         widenfn(rn1_64, tmp);
         tcg_temp_free_i32(tmp);
     }
-    rm = tcg_temp_new_i32();
-    read_neon_element32(rm, a->vm, 1, MO_32);
+    if (src2_mop >= 0) {
+        read_neon_element64(rm_64, a->vm, 1, src2_mop);
+    } else {
+        TCGv_i32 tmp = tcg_temp_new_i32();
+        read_neon_element32(tmp, a->vm, 1, MO_32);
+        widenfn(rm_64, tmp);
+        tcg_temp_free_i32(tmp);
+    }
 
     write_neon_element64(rn0_64, a->vd, 0, MO_64);
 
-    widenfn(rm_64, rm);
-    tcg_temp_free_i32(rm);
     opfn(rn1_64, rn1_64, rm_64);
     write_neon_element64(rn1_64, a->vd, 1, MO_64);
 
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
     return true;
 }
 
-#define DO_PREWIDEN(INSN, S, EXT, OP, SRC1WIDE)                         \
+#define DO_PREWIDEN(INSN, S, OP, SRC1WIDE, SIGN)                        \
     static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a)        \
     {                                                                   \
         static NeonGenWidenFn * const widenfn[] = {                     \
             gen_helper_neon_widen_##S##8,                               \
             gen_helper_neon_widen_##S##16,                              \
-            tcg_gen_##EXT##_i32_i64,                                    \
-            NULL,                                                       \
+            NULL, NULL,                                                 \
         };                                                              \
         static NeonGenTwo64OpFn * const addfn[] = {                     \
             gen_helper_neon_##OP##l_u16,                                \
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
             tcg_gen_##OP##_i64,                                         \
             NULL,                                                       \
         };                                                              \
-        return do_prewiden_3d(s, a, widenfn[a->size],                   \
-                              addfn[a->size], SRC1WIDE);                \
+        int narrow_mop = a->size == MO_32 ? MO_32 | SIGN : -1;          \
+        return do_prewiden_3d(s, a, widenfn[a->size], addfn[a->size],   \
+                              SRC1WIDE ? MO_Q : narrow_mop,             \
+                              narrow_mop);                              \
     }
 
-DO_PREWIDEN(VADDL_S, s, ext, add, false)
-DO_PREWIDEN(VADDL_U, u, extu, add, false)
-DO_PREWIDEN(VSUBL_S, s, ext, sub, false)
-DO_PREWIDEN(VSUBL_U, u, extu, sub, false)
-DO_PREWIDEN(VADDW_S, s, ext, add, true)
-DO_PREWIDEN(VADDW_U, u, extu, add, true)
-DO_PREWIDEN(VSUBW_S, s, ext, sub, true)
-DO_PREWIDEN(VSUBW_U, u, extu, sub, true)
+DO_PREWIDEN(VADDL_S, s, add, false, MO_SIGN)
+DO_PREWIDEN(VADDL_U, u, add, false, 0)
+DO_PREWIDEN(VSUBL_S, s, sub, false, MO_SIGN)
+DO_PREWIDEN(VSUBL_U, u, sub, false, 0)
+DO_PREWIDEN(VADDW_S, s, add, true, MO_SIGN)
+DO_PREWIDEN(VADDW_U, u, add, true, 0)
+DO_PREWIDEN(VSUBW_S, s, sub, true, MO_SIGN)
+DO_PREWIDEN(VSUBW_U, u, sub, true, 0)
 
 static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
                          NeonGenTwo64OpFn *opfn, NeonGenNarrowFn *narrowfn)
-- 
2.20.1

In the neon_padd/pmax/pmin helpers for float16, a cut-and-paste error
meant we were using the H4() address swizzler macro rather than the
H2() which is required for 2-byte data.  This had no effect on
little-endian hosts but meant we put the result data into the
destination Dreg in the wrong order on big-endian hosts.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20201028191712.4910-2-peter.maydell@linaro.org
---
 target/arm/vec_helper.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
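
Note for readers: a small sketch of why the wrong swizzler matters. It
assumes the big-endian-host forms of the macros are XORs of the element
index (x ^ 3 for 2-byte data, x ^ 1 for 4-byte data) and writes them as
the hypothetical functions be_h2() and be_h4(), so the effect can be
seen on any host. Storing four 16-bit results through the 4-byte
swizzler only swaps neighbouring pairs, instead of reversing all four
slots within the 64-bit unit.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed big-endian-host swizzlers, modelled as functions. */
static int be_h2(int i) { return i ^ 3; }  /* 16-bit elements */
static int be_h4(int i) { return i ^ 1; }  /* 32-bit elements */

/* Store four 16-bit results into d[] through the given swizzler. */
static void store4(uint16_t d[4], const uint16_t r[4], int (*swz)(int))
{
    for (int i = 0; i < 4; i++) {
        d[swz(i)] = r[i];
    }
}

/* The two swizzlers place the same results in different slots. */
static void check_swizzle_differs(void)
{
    const uint16_t r[4] = { 0xa0, 0xa1, 0xa2, 0xa3 };
    uint16_t good[4], bad[4];

    store4(good, r, be_h2);
    store4(bad, r, be_h4);
    assert(good[0] == 0xa3 && good[3] == 0xa0);
    assert(bad[0] == 0xa1 && bad[3] == 0xa2);
}
```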

diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ DO_ABA(gvec_uaba_d, uint64_t)
         r2 = float16_##OP(m[H2(0)], m[H2(1)], fpst);                    \
         r3 = float16_##OP(m[H2(2)], m[H2(3)], fpst);                    \
                                                                         \
-        d[H4(0)] = r0;                                                  \
-        d[H4(1)] = r1;                                                  \
-        d[H4(2)] = r2;                                                  \
-        d[H4(3)] = r3;                                                  \
+        d[H2(0)] = r0;                                                  \
+        d[H2(1)] = r1;                                                  \
+        d[H2(2)] = r2;                                                  \
+        d[H2(3)] = r3;                                                  \
     }
 
 DO_NEON_PAIRWISE(neon_padd, add)
-- 
2.20.1

The helper functions for performing the udot/sdot operations against
a scalar were not using an address-swizzling macro when converting
the index of the scalar element into a pointer into the vm array.
This had no effect on little-endian hosts but meant we generated
incorrect results on big-endian hosts.

For these insns, the index selects one group of four 8-bit values,
so each indexed entity is 32 bits wide and H4() is therefore what we want.
(For Neon the only possible input indexes are 0 and 1.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20201028191712.4910-3-peter.maydell@linaro.org
---
 target/arm/vec_helper.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
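
Note for readers: a sketch of the byte offset computed for the indexed
scalar group, assuming the big-endian-host form of the 4-byte swizzler
is index ^ 1. The names be_h4() and m_indexed_offset() are
hypothetical, and the host endianness is passed as a flag so both cases
can be tried on one machine.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed big-endian-host swizzler for 32-bit entities. */
static int be_h4(int i) { return i ^ 1; }

/* Byte offset of the indexed group of four 8-bit values within vm. */
static size_t m_indexed_offset(int index, int host_big_endian)
{
    int idx = host_big_endian ? be_h4(index) : index;

    return (size_t)idx * 4;
}

/* On a big-endian host the two Neon indexes swap places. */
static void check_m_indexed_offset(void)
{
    assert(m_indexed_offset(0, 0) == 0);
    assert(m_indexed_offset(0, 1) == 4);
}
```

Without the swizzle, a big-endian host would read the group at offset 0
for index 0, which under the assumption above holds the other 32-bit
entity.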

diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_sdot_idx_b)(void *vd, void *vn, void *vm, uint32_t desc)
     intptr_t index = simd_data(desc);
     uint32_t *d = vd;
     int8_t *n = vn;
-    int8_t *m_indexed = (int8_t *)vm + index * 4;
+    int8_t *m_indexed = (int8_t *)vm + H4(index) * 4;
 
     /* Notice the special case of opr_sz == 8, from aa64/aa32 advsimd.
      * Otherwise opr_sz is a multiple of 16.
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_udot_idx_b)(void *vd, void *vn, void *vm, uint32_t desc)
     intptr_t index = simd_data(desc);
     uint32_t *d = vd;
     uint8_t *n = vn;
-    uint8_t *m_indexed = (uint8_t *)vm + index * 4;
+    uint8_t *m_indexed = (uint8_t *)vm + H4(index) * 4;
 
     /* Notice the special case of opr_sz == 8, from aa64/aa32 advsimd.
      * Otherwise opr_sz is a multiple of 16.
-- 
2.20.1

From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>

HCR_EL2.FB should be applied when SCR_EL3.NS is set (Non-secure
state), not when it is cleared.

Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)
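
Note for readers: a toy model of the two predicates, not the QEMU
implementation. It assumes a three-field CPU state (struct sk_cpu) and
a simplified sk_hcr_el2_eff() that returns zero in Secure state, i.e.
no Secure EL2 is modelled; all sk_ names are hypothetical. The old
check fired only in Secure EL1, while the new one fires in Non-secure
EL1 with FB set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_HCR_FB (1ull << 9)   /* HCR_EL2.FB is bit 9 */

struct sk_cpu {
    int el;            /* current exception level */
    bool secure;       /* true when in Secure state below EL3 */
    uint64_t hcr_el2;  /* raw HCR_EL2 value */
};

/* Simplified assumption: HCR_EL2 has no effect in Secure state. */
static uint64_t sk_hcr_el2_eff(const struct sk_cpu *c)
{
    return c->secure ? 0 : c->hcr_el2;
}

/* Old predicate: wrongly requires Secure state. */
static bool sk_force_broadcast_old(const struct sk_cpu *c)
{
    return (c->hcr_el2 & SK_HCR_FB) && c->el == 1 && c->secure;
}

/* New predicate: EL1 with the effective FB bit set. */
static bool sk_force_broadcast_new(const struct sk_cpu *c)
{
    return c->el == 1 && (sk_hcr_el2_eff(c) & SK_HCR_FB);
}

/* Non-secure EL1 with FB set is the case the fix is about. */
static void sk_check_ns_el1(void)
{
    struct sk_cpu ns = { 1, false, SK_HCR_FB };

    assert(!sk_force_broadcast_old(&ns));
    assert(sk_force_broadcast_new(&ns));
}
```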

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void tlbimvaa_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
 
 /*
  * Non-IS variants of TLB operations are upgraded to
- * IS versions if we are at NS EL1 and HCR_EL2.FB is set to
+ * IS versions if we are at EL1 and HCR_EL2.FB is effectively set to
  * force broadcast of these operations.
  */
 static bool tlb_force_broadcast(CPUARMState *env)
 {
-    return (env->cp15.hcr_el2 & HCR_FB) &&
-        arm_current_el(env) == 1 && arm_is_secure_below_el3(env);
+    return arm_current_el(env) == 1 && (arm_hcr_el2_eff(env) & HCR_FB);
 }
 
 static void tlbiall_write(CPUARMState *env, const ARMCPRegInfo *ri,
-- 
2.20.1

From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>

Secure mode is not exempt from checking SCR_EL3.TLOR, nor, in the
future, from checking HCR_EL2.TLOR when S-EL2 is enabled.

Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 19 +++++--------------
 1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static uint64_t id_aa64pfr0_read(CPUARMState *env, const ARMCPRegInfo *ri)
 #endif
 
 /* Shared logic between LORID and the rest of the LOR* registers.
- * Secure state has already been delt with.
+ * Secure state exclusion has already been dealt with.
  */
-static CPAccessResult access_lor_ns(CPUARMState *env)
+static CPAccessResult access_lor_ns(CPUARMState *env,
+                                    const ARMCPRegInfo *ri, bool isread)
 {
     int el = arm_current_el(env);
 
@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_lor_ns(CPUARMState *env)
     return CP_ACCESS_OK;
 }
 
-static CPAccessResult access_lorid(CPUARMState *env, const ARMCPRegInfo *ri,
-                                   bool isread)
-{
-    if (arm_is_secure_below_el3(env)) {
-        /* Access ok in secure mode.  */
-        return CP_ACCESS_OK;
-    }
-    return access_lor_ns(env);
-}
-
 static CPAccessResult access_lor_other(CPUARMState *env,
                                        const ARMCPRegInfo *ri, bool isread)
 {
@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_lor_other(CPUARMState *env,
         /* Access denied in secure mode.  */
         return CP_ACCESS_TRAP;
     }
-    return access_lor_ns(env);
+    return access_lor_ns(env, ri, isread);
 }
 
 /*
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo lor_reginfo[] = {
       .type = ARM_CP_CONST, .resetvalue = 0 },
     { .name = "LORID_EL1", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 0, .crn = 10, .crm = 4, .opc2 = 7,
-      .access = PL1_R, .accessfn = access_lorid,
+      .access = PL1_R, .accessfn = access_lor_ns,
       .type = ARM_CP_CONST, .resetvalue = 0 },
     REGINFO_SENTINEL
 };
-- 
2.20.1

If we're using the capstone disassembler, disassembly of a run of
instructions more than 32 bytes long disassembles the wrong data for
instructions beyond the 32 byte mark:

(qemu) xp /16x 0x100
0000000000000100: 0x00000005 0x54410001 0x00000001 0x00001000
0000000000000110: 0x00000000 0x00000004 0x54410002 0x3c000000
0000000000000120: 0x00000000 0x00000004 0x54410009 0x74736574
0000000000000130: 0x00000000 0x00000000 0x00000000 0x00000000
(qemu) xp /16i 0x100
0x00000100: 00000005 andeq r0, r0, r5
0x00000104: 54410001 strbpl r0, [r1], #-1
0x00000108: 00000001 andeq r0, r0, r1
0x0000010c: 00001000 andeq r1, r0, r0
0x00000110: 00000000 andeq r0, r0, r0
0x00000114: 00000004 andeq r0, r0, r4
0x00000118: 54410002 strbpl r0, [r1], #-2
0x0000011c: 3c000000 .byte 0x00, 0x00, 0x00, 0x3c
0x00000120: 54410001 strbpl r0, [r1], #-1
0x00000124: 00000001 andeq r0, r0, r1
0x00000128: 00001000 andeq r1, r0, r0
0x0000012c: 00000000 andeq r0, r0, r0
0x00000130: 00000004 andeq r0, r0, r4
0x00000134: 54410002 strbpl r0, [r1], #-2
0x00000138: 3c000000 .byte 0x00, 0x00, 0x00, 0x3c
0x0000013c: 00000000 andeq r0, r0, r0

Here the disassembly of 0x120..0x13f is using the data that is in
0x104..0x123.

This is caused by passing the wrong value to the read_memory_func().
The intention is that at this point in the loop the 'cap_buf' buffer
already contains 'csize' bytes of data for the instruction at guest
addr 'pc', and we want to read in an extra 'tsize' bytes.  Those
extra bytes are therefore at 'pc + csize', not 'pc'.  On the first
time through the loop 'csize' happens to be zero, so the initial read
of 32 bytes into cap_buf is correct and as long as the disassembly
never needs to read more data we return the correct information.

Use the correct guest address in the call to read_memory_func().
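
As a rough illustration of the addressing rule described above, here is a
minimal sketch in plain C. It is not the QEMU code: guest_mem, read_guest()
and refill() are hypothetical names standing in for guest memory,
read_memory_func() and the buffer top-up step in the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for guest memory; not a QEMU API. */
static uint8_t guest_mem[64];

/* Hypothetical reader: copy 'len' bytes starting at guest address 'addr'. */
static void read_guest(uint64_t addr, uint8_t *dst, size_t len)
{
    assert(addr + len <= sizeof(guest_mem));
    memcpy(dst, guest_mem + addr, len);
}

/*
 * Top up 'buf', which already holds 'have' valid bytes for guest address
 * 'pc', with 'extra' more bytes. The new bytes live at pc + have, so that
 * is the address to read from. Returns the new number of valid bytes.
 */
static size_t refill(uint64_t pc, uint8_t *buf, size_t have, size_t extra)
{
    read_guest(pc + have, buf + have, extra);
    return have + extra;
}
```

With 'have' equal to zero the source address is simply 'pc', which is why
the first read was correct even with the bug.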

Cc: qemu-stable@nongnu.org
Fixes: https://bugs.launchpad.net/qemu/+bug/1900779
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20201022132445.25039-1-peter.maydell@linaro.org
---
 disas/capstone.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/disas/capstone.c b/disas/capstone.c
index XXXXXXX..XXXXXXX 100644
--- a/disas/capstone.c
+++ b/disas/capstone.c
@@ -XXX,XX +XXX,XX @@ bool cap_disas_monitor(disassemble_info *info, uint64_t pc, int count)
 
         /* Make certain that we can make progress.  */
         assert(tsize != 0);
-        info->read_memory_func(pc, cap_buf + csize, tsize, info);
+        info->read_memory_func(pc + csize, cap_buf + csize, tsize, info);
         csize += tsize;
 
         if (cs_disasm_iter(handle, &cbuf, &csize, &pc, insn)) {
-- 
2.20.1

From: Philippe Mathieu-Daudé <philmd@redhat.com>

Use the BIT_ULL() macro to ensure we use 64-bit arithmetic.
This fixes the following Coverity issue (OVERFLOW_BEFORE_WIDEN):

CID 1432363 (#1 of 1): Unintentional integer overflow:

overflow_before_widen:
    Potentially overflowing expression 1 << scale with type int
    (32 bits, signed) is evaluated using 32-bit arithmetic, and
    then used in a context that expects an expression of type
    hwaddr (64 bits, unsigned).
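
The difference between the two forms can be sketched in standalone C. This
is an illustration only: pages_narrow() and pages_wide() are hypothetical
helpers, and the wide version uses a plain 64-bit shift rather than the
BIT_ULL() macro. The narrow version uses unsigned 32-bit types so that the
wraparound is well defined; with a signed int, as in the report above, the
overflow would be undefined behaviour.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Narrow version: the product is computed in 32 bits and only then
 * widened to 64 bits, so high bits are already lost.
 */
static uint64_t pages_narrow(uint32_t num, uint32_t scale)
{
    assert(scale < 32);
    uint32_t product = (num + 1u) * (UINT32_C(1) << scale);
    return product;
}

/* Wide version: force 64-bit arithmetic before shifting and multiplying. */
static uint64_t pages_wide(uint32_t num, uint32_t scale)
{
    assert(scale < 64);
    return ((uint64_t)num + 1) * (UINT64_C(1) << scale);
}
```

For small inputs both helpers agree; they diverge once the product no
longer fits in 32 bits, for example num = 31 with scale = 30.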

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Eric Auger <eric.auger@redhat.com>
Message-id: 20201030144617.1535064-1-philmd@redhat.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/smmuv3.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -XXX,XX +XXX,XX @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/bitops.h"
 #include "hw/irq.h"
 #include "hw/sysbus.h"
 #include "migration/vmstate.h"
@@ -XXX,XX +XXX,XX @@ static void smmuv3_s1_range_inval(SMMUState *s, Cmd *cmd)
         scale = CMD_SCALE(cmd);
         num = CMD_NUM(cmd);
         ttl = CMD_TTL(cmd);
-        num_pages = (num + 1) * (1 << (scale));
+        num_pages = (num + 1) * BIT_ULL(scale);
     }
 
     if (type == SMMU_CMD_TLBI_NH_VA) {
-- 
2.20.1

From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>

When booting a CPU with EL3 using the -kernel flag, set up CPTR_EL3 so
that SVE will not trap to EL3.

Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030151541.11976-1-remi@remlab.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/boot.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -XXX,XX +XXX,XX @@ static void do_cpu_reset(void *opaque)
                     if (cpu_isar_feature(aa64_mte, cpu)) {
                         env->cp15.scr_el3 |= SCR_ATA;
                     }
+                    if (cpu_isar_feature(aa64_sve, cpu)) {
+                        env->cp15.cptr_el[3] |= CPTR_EZ;
+                    }
                     /* AArch64 kernels never boot in secure mode */
                     assert(!info->secure_boot);
                     /* This hook is only supported for AArch32 currently:
-- 
2.20.1

From: AlexChen <alex.chen@huawei.com>

In omap_update_display(), the pointer omap_lcd is dereferenced before
it is checked for validity, which may lead to a NULL pointer dereference.
So move the assignment to surface after the check that omap_lcd is valid,
and move surface_bits_per_pixel(surface) to after the surface assignment.

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: AlexChen <alex.chen@huawei.com>
Message-id: 5F9CDB8A.9000001@huawei.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/display/omap_lcdc.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/hw/display/omap_lcdc.c b/hw/display/omap_lcdc.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/display/omap_lcdc.c
+++ b/hw/display/omap_lcdc.c
@@ -XXX,XX +XXX,XX @@ static void omap_lcd_interrupts(struct omap_lcd_panel_s *s)
 static void omap_update_display(void *opaque)
 {
     struct omap_lcd_panel_s *omap_lcd = (struct omap_lcd_panel_s *) opaque;
-    DisplaySurface *surface = qemu_console_surface(omap_lcd->con);
+    DisplaySurface *surface;
     draw_line_func draw_line;
     int size, height, first, last;
     int width, linesize, step, bpp, frame_offset;
     hwaddr frame_base;
 
-    if (!omap_lcd || omap_lcd->plm == 1 || !omap_lcd->enable ||
-        !surface_bits_per_pixel(surface)) {
+    if (!omap_lcd || omap_lcd->plm == 1 || !omap_lcd->enable) {
+        return;
+    }
+
+    surface = qemu_console_surface(omap_lcd->con);
+    if (!surface_bits_per_pixel(surface)) {
         return;
     }
 
-- 
2.20.1

From: AlexChen <alex.chen@huawei.com>

In exynos4210_fimd_update(), the pointer s is dereferenced before
it is checked for validity, which may lead to a NULL pointer dereference.
So move the assignment to global_width after the check that s is valid.

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 5F9F8D88.9030102@huawei.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/display/exynos4210_fimd.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/display/exynos4210_fimd.c b/hw/display/exynos4210_fimd.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/display/exynos4210_fimd.c
+++ b/hw/display/exynos4210_fimd.c
@@ -XXX,XX +XXX,XX @@ static void exynos4210_fimd_update(void *opaque)
     bool blend = false;
     uint8_t *host_fb_addr;
     bool is_dirty = false;
-    const int global_width = (s->vidtcon[2] & FIMD_VIDTCON2_SIZE_MASK) + 1;
+    int global_width;
 
     if (!s || !s->console || !s->enabled ||
         surface_bits_per_pixel(qemu_console_surface(s->console)) == 0) {
         return;
     }
+
+    global_width = (s->vidtcon[2] & FIMD_VIDTCON2_SIZE_MASK) + 1;
     exynos4210_update_resolution(s);
     surface = qemu_console_surface(s->console);
 
-- 
2.20.1

In arm_v7m_mmu_idx_for_secstate() we get the 'priv' level to pass to
armv7m_mmu_idx_for_secstate_and_priv() by calling arm_current_el().
This is incorrect when the security state being queried is not the
current one, because arm_current_el() uses the current security state
to determine which of the banked CONTROL.nPRIV bits to look at.
The effect was that if (for instance) Secure state was in privileged
mode but Non-Secure was not then we would return the wrong MMU index.

The only places where we are using this function in a way that could
trigger this bug are for the stack loads during a v8M function-return
and for the instruction fetch of a v8M SG insn.

Fix the bug by expanding out the M-profile version of the
arm_current_el() logic inline so it can use the passed in secstate
rather than env->v7m.secure.
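
A cut-down model of the banked state may make the difference clearer. This
is only a sketch: struct mcpu and both helpers are hypothetical and much
simpler than the real CPU state; the only details taken from the patch are
that nPRIV is bit 0 of CONTROL and that CONTROL is indexed by security
state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, cut-down model of the banked M-profile state. */
struct mcpu {
    bool handler_mode;     /* true while running an exception handler */
    uint32_t control[2];   /* CONTROL per security state: [0] NS, [1] S */
    bool secure;           /* current security state */
};

/* nPRIV is bit 0 of CONTROL; when set, thread mode is unprivileged. */
#define CONTROL_NPRIV 1u

/* Privilege derived from the *current* security state (the buggy query). */
static bool priv_current(const struct mcpu *cpu)
{
    return cpu->handler_mode || !(cpu->control[cpu->secure] & CONTROL_NPRIV);
}

/* Privilege derived from the security state the caller asks about. */
static bool priv_for_secstate(const struct mcpu *cpu, bool secstate)
{
    return cpu->handler_mode || !(cpu->control[secstate] & CONTROL_NPRIV);
}
```

With Secure thread mode privileged and Non-Secure thread mode unprivileged,
priv_current() reports "privileged" while running in Secure state, whereas
priv_for_secstate(cpu, false) correctly reports "unprivileged" for the
Non-Secure query.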

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201022164408.13214-1-peter.maydell@linaro.org
---
 target/arm/m_helper.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/arm/m_helper.c b/target/arm/m_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/m_helper.c
+++ b/target/arm/m_helper.c
@@ -XXX,XX +XXX,XX @@ ARMMMUIdx arm_v7m_mmu_idx_for_secstate_and_priv(CPUARMState *env,
 /* Return the MMU index for a v7M CPU in the specified security state */
 ARMMMUIdx arm_v7m_mmu_idx_for_secstate(CPUARMState *env, bool secstate)
 {
-    bool priv = arm_current_el(env) != 0;
+    bool priv = arm_v7m_is_handler_mode(env) ||
+        !(env->v7m.control[secstate] & 1);
 
     return arm_v7m_mmu_idx_for_secstate_and_priv(env, secstate, priv);
 }
-- 
2.20.1

On some hosts (eg Ubuntu Bionic) pkg-config returns a set of
libraries for gio-2.0 which don't actually work when compiling
statically. (Specifically, the returned library string includes
-lmount, but not -lblkid which -lmount depends upon, so linking
fails due to missing symbols.)

Check that the libraries work, and don't enable gio if they don't,
in the same way we do for gnutls.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20200928160402.7961-1-peter.maydell@linaro.org
---
 configure | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index XXXXXXX..XXXXXXX 100755
--- a/configure
+++ b/configure
@@ -XXX,XX +XXX,XX @@ if test "$static" = yes && test "$mingw32" = yes; then
 fi
 
 if $pkg_config --atleast-version=$glib_req_ver gio-2.0; then
-    gio=yes
     gio_cflags=$($pkg_config --cflags gio-2.0)
     gio_libs=$($pkg_config --libs gio-2.0)
     gdbus_codegen=$($pkg_config --variable=gdbus_codegen gio-2.0)
     if [ ! -x "$gdbus_codegen" ]; then
         gdbus_codegen=
     fi
+    # Check that the libraries actually work -- Ubuntu 18.04 ships
+    # with pkg-config --static --libs data for gio-2.0 that is missing
+    # -lblkid and will give a link error.
+    write_c_skeleton
+    if compile_prog "" "$gio_libs" ; then
+        gio=yes
+    else
+        gio=no
+    fi
 else
     gio=no
 fi
-- 
2.20.1

In gicv3_init_cpuif() we copy the ARMCPU gicv3_maintenance_interrupt
into the GICv3CPUState struct's maintenance_irq field.  This will
only work if the board happens to have already wired up the CPU
maintenance IRQ before the GIC was realized.  Unfortunately this is
not the case for the 'virt' board, and so the value that gets copied
is NULL (since a qemu_irq is really a pointer to an IRQState struct
under the hood).  The effect is that the CPU interface code never
actually raises the maintenance interrupt line.

Instead, since the GICv3CPUState has a pointer to the CPUState, make
the dereference at the point where we want to raise the interrupt, to
avoid an implicit requirement on board code to wire things up in a
particular order.
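
The ordering problem can be shown with a small standalone model. All names
here (irq_line, cpu, cpuif and the helper functions) are hypothetical and
are not the QEMU types; a NULL line is modelled as silently dropping the
update, which is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for an IRQ line and the objects that hold it. */
struct irq_line { int level; };
struct cpu { struct irq_line *maint_irq; };  /* wired up by the board */
struct cpuif {
    struct cpu *cpu;
    struct irq_line *cached_irq;             /* snapshot taken at init */
};

static void set_irq(struct irq_line *irq, int level)
{
    if (irq) {              /* a NULL line silently drops the update */
        irq->level = level;
    }
}

/* Early binding: copies whatever the CPU has at init time, maybe NULL. */
static void cpuif_init(struct cpuif *cs, struct cpu *cpu)
{
    cs->cpu = cpu;
    cs->cached_irq = cpu->maint_irq;
}

/* Uses the possibly stale snapshot. */
static void raise_cached(struct cpuif *cs, int level)
{
    set_irq(cs->cached_irq, level);
}

/* Late binding: follows the CPU pointer at the time of use. */
static void raise_late(struct cpuif *cs, int level)
{
    set_irq(cs->cpu->maint_irq, level);
}
```

If the board wires cpu.maint_irq only after cpuif_init() has run,
raise_cached() never reaches the line, while raise_late() does.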

Reported-by: Jose Martins <josemartins90@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20201009153904.28529-1-peter.maydell@linaro.org
Reviewed-by: Luc Michel <luc@lmichel.fr>
---
 include/hw/intc/arm_gicv3_common.h | 1 -
 hw/intc/arm_gicv3_cpuif.c          | 5 ++---
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/include/hw/intc/arm_gicv3_common.h b/include/hw/intc/arm_gicv3_common.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/intc/arm_gicv3_common.h
+++ b/include/hw/intc/arm_gicv3_common.h
@@ -XXX,XX +XXX,XX @@ struct GICv3CPUState {
     qemu_irq parent_fiq;
     qemu_irq parent_virq;
     qemu_irq parent_vfiq;
-    qemu_irq maintenance_irq;
 
     /* Redistributor */
     uint32_t level;                  /* Current IRQ level */
diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/intc/arm_gicv3_cpuif.c
+++ b/hw/intc/arm_gicv3_cpuif.c
@@ -XXX,XX +XXX,XX @@ static void gicv3_cpuif_virt_update(GICv3CPUState *cs)
     int irqlevel = 0;
     int fiqlevel = 0;
     int maintlevel = 0;
+    ARMCPU *cpu = ARM_CPU(cs->cpu);
 
     idx = hppvi_index(cs);
     trace_gicv3_cpuif_virt_update(gicv3_redist_affid(cs), idx);
@@ -XXX,XX +XXX,XX @@ static void gicv3_cpuif_virt_update(GICv3CPUState *cs)
 
     qemu_set_irq(cs->parent_vfiq, fiqlevel);
     qemu_set_irq(cs->parent_virq, irqlevel);
-    qemu_set_irq(cs->maintenance_irq, maintlevel);
+    qemu_set_irq(cpu->gicv3_maintenance_interrupt, maintlevel);
 }
 
 static uint64_t icv_ap_read(CPUARMState *env, const ARMCPRegInfo *ri)
@@ -XXX,XX +XXX,XX @@ void gicv3_init_cpuif(GICv3State *s)
             && cpu->gic_num_lrs) {
             int j;
 
-            cs->maintenance_irq = cpu->gicv3_maintenance_interrupt;
-
             cs->num_list_regs = cpu->gic_num_lrs;
             cs->vpribits = cpu->gic_vpribits;
             cs->vprebits = cpu->gic_vprebits;
-- 
2.20.1

The kerneldoc script currently emits Sphinx markup for a macro with
arguments that uses the c:function directive. This is correct for
Sphinx versions earlier than Sphinx 3, where c:macro doesn't allow
documentation of macros with arguments and c:function is not picky
about the syntax of what it is passed. However, in Sphinx 3 the
c:macro directive was enhanced to support macros with arguments,
and c:function was made more picky about what syntax it accepted.

When kerneldoc is told that it needs to produce output for Sphinx
3 or later, make it emit c:function only for functions and c:macro
for macros with arguments. We assume that anything with a return
type is a function and anything without is a macro.

This fixes the Sphinx error:

/home/petmay01/linaro/qemu-from-laptop/qemu/docs/../include/qom/object.h:155:Error in declarator
If declarator-id with parameters (e.g., 'void f(int arg)'):
  Invalid C declaration: Expected identifier in nested name. [error at 25]
    DECLARE_INSTANCE_CHECKER ( InstanceType,  OBJ_NAME,  TYPENAME)
    -------------------------^
If parenthesis in noptr-declarator (e.g., 'void (*f(int arg))(double)'):
  Error in declarator or parameters
  Invalid C declaration: Expecting "(" in parameters. [error at 39]
    DECLARE_INSTANCE_CHECKER ( InstanceType,  OBJ_NAME,  TYPENAME)
    ---------------------------------------^

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Tested-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20201030174700.7204-2-peter.maydell@linaro.org
---
 scripts/kernel-doc | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/scripts/kernel-doc b/scripts/kernel-doc
index XXXXXXX..XXXXXXX 100755
--- a/scripts/kernel-doc
+++ b/scripts/kernel-doc
@@ -XXX,XX +XXX,XX @@ sub output_function_rst(%) {
 	output_highlight_rst($args{'purpose'});
 	$start = "\n\n**Syntax**\n\n  ``";
     } else {
-	print ".. c:function:: ";
+        if ((split(/\./, $sphinx_version))[0] >= 3) {
+            # Sphinx 3 and later distinguish macros and functions and
+            # complain if you use c:function with something that's not
+            # syntactically valid as a function declaration.
+            # We assume that anything with a return type is a function
+            # and anything without is a macro.
+            if ($args{'functiontype'} ne "") {
+                print ".. c:function:: ";
+            } else {
+                print ".. c:macro:: ";
+            }
+        } else {
+            # Older Sphinx don't support documenting macros that take
+            # arguments with c:macro, and don't complain about the use
+            # of c:function for this.
+            print ".. c:function:: ";
+        }
     }
     if ($args{'functiontype'} ne "") {
 	$start .= $args{'functiontype'} . " " . $args{'function'} . " (";
-- 
2.20.1

Sphinx 3.2 is pickier than earlier versions about the option:: markup,
and complains about our usage in qemu-option-trace.rst:

../../docs/qemu-option-trace.rst.inc:4:Malformed option description
  '[enable=]PATTERN', should look like "opt", "-opt args", "--opt args",
  "/opt args" or "+opt args"

In this file, we're really trying to document the different parts of
the top-level --trace option, which qemu-nbd.rst and qemu-img.rst
have already introduced with an option:: markup.  So it's not right
to use option:: here anyway.  Switch to a different markup
(definition lists) which gives about the same formatted output.

(Unlike option::, this markup doesn't produce index entries; but
at the moment we don't do anything much with indexes anyway, and
in any case I think it doesn't make much sense to have individual
index entries for the sub-parts of the --trace option.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Tested-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20201030174700.7204-3-peter.maydell@linaro.org
---
 docs/qemu-option-trace.rst.inc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/qemu-option-trace.rst.inc b/docs/qemu-option-trace.rst.inc
index XXXXXXX..XXXXXXX 100644
--- a/docs/qemu-option-trace.rst.inc
+++ b/docs/qemu-option-trace.rst.inc
@@ -XXX,XX +XXX,XX @@
 
 Specify tracing options.
 
-.. option:: [enable=]PATTERN
+``[enable=]PATTERN``
 
   Immediately enable events matching *PATTERN*
   (either event name or a globbing pattern).  This option is only
@@ -XXX,XX +XXX,XX @@ Specify tracing options.
 
   Use :option:`-trace help` to print a list of names of trace points.
 
-.. option:: events=FILE
+``events=FILE``
 
   Immediately enable events listed in *FILE*.
   The file must contain one event name (as listed in the ``trace-events-all``
@@ -XXX,XX +XXX,XX @@ Specify tracing options.
   available if QEMU has been compiled with the ``simple``, ``log`` or
   ``ftrace`` tracing backend.
 
-.. option:: file=FILE
+``file=FILE``
 
   Log output traces to *FILE*.
   This option is only available if QEMU has been compiled with
-- 
2.20.1

The randomness tests in the NPCM7xx RNG test fail intermittently
but fairly frequently. On my machine running the test in a loop:
 while QTEST_QEMU_BINARY=./qemu-system-aarch64 ./tests/qtest/npcm7xx_rng-test; do true; done

will fail in less than a minute with an error like:
ERROR:../../tests/qtest/npcm7xx_rng-test.c:256:test_first_byte_runs:
assertion failed (calc_runs_p(buf.l, sizeof(buf) * BITS_PER_BYTE) > 0.01): (0.00286205989 > 0.01)

(Failures have been observed on all 4 of the randomness tests,
not just first_byte_runs.)

It's not clear why these tests are failing like this, but intermittent
failures make CI and merge testing awkward, so disable running them
unless a developer specifically sets QEMU_TEST_FLAKY_RNG_TESTS when
running the test suite, until we work out the cause.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201102152454.8287-1-peter.maydell@linaro.org
Reviewed-by: Havard Skinnemoen <hskinnemoen@google.com>
---
 tests/qtest/npcm7xx_rng-test.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/tests/qtest/npcm7xx_rng-test.c b/tests/qtest/npcm7xx_rng-test.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/qtest/npcm7xx_rng-test.c
+++ b/tests/qtest/npcm7xx_rng-test.c
@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv)
 
     qtest_add_func("npcm7xx_rng/enable_disable", test_enable_disable);
     qtest_add_func("npcm7xx_rng/rosel", test_rosel);
-    qtest_add_func("npcm7xx_rng/continuous/monobit", test_continuous_monobit);
-    qtest_add_func("npcm7xx_rng/continuous/runs", test_continuous_runs);
-    qtest_add_func("npcm7xx_rng/first_byte/monobit", test_first_byte_monobit);
-    qtest_add_func("npcm7xx_rng/first_byte/runs", test_first_byte_runs);
+    /*
+     * These tests fail intermittently; only run them on explicit
+     * request until we figure out why.
+     */
+    if (getenv("QEMU_TEST_FLAKY_RNG_TESTS")) {
+        qtest_add_func("npcm7xx_rng/continuous/monobit", test_continuous_monobit);
+        qtest_add_func("npcm7xx_rng/continuous/runs", test_continuous_runs);
+        qtest_add_func("npcm7xx_rng/first_byte/monobit", test_first_byte_monobit);
+        qtest_add_func("npcm7xx_rng/first_byte/runs", test_first_byte_runs);
+    }
 
     qtest_start("-machine npcm750-evb");
     ret = g_test_run();
-- 
2.20.1

The following changes since commit bf4460a8d9a86f6cfe05d7a7f470c48e3a93d8b2:

Merge tag 'pull-tcg-20230123' of https://gitlab.com/rth7680/qemu into staging (2023-02-03 09:30:45 +0000)

are available in the Git repository at:

https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20230203

for you to fetch changes up to bb18151d8bd9bedc497ee9d4e8d81b39a4e5bbf6:

target/arm: Enable FEAT_FGT on '-cpu max' (2023-02-03 12:59:24 +0000)

----------------------------------------------------------------
target-arm queue:
 * Fix physical address resolution for Stage2
 * pl011: refactoring, implement reset method
 * Support GICv3 with hvf acceleration
 * sbsa-ref: remove cortex-a76 from list of supported cpus
 * Correct syndrome for ATS12NSO* traps at Secure EL1
 * Fix priority of HSTR_EL2 traps vs UNDEFs
 * Implement FEAT_FGT for '-cpu max'

----------------------------------------------------------------
Alexander Graf (3):
      hvf: arm: Add support for GICv3
      hw/arm/virt: Consolidate GIC finalize logic
      hw/arm/virt: Make accels in GIC finalize logic explicit

Evgeny Iakovlev (4):
      hw/char/pl011: refactor FIFO depth handling code
      hw/char/pl011: add post_load hook for backwards-compatibility
      hw/char/pl011: implement a reset method
      hw/char/pl011: better handling of FIFO flags on LCR reset

Marcin Juszkiewicz (1):
      sbsa-ref: remove cortex-a76 from list of supported cpus

Peter Maydell (23):
      target/arm: Name AT_S1E1RP and AT_S1E1WP cpregs correctly
      target/arm: Correct syndrome for ATS12NSO* at Secure EL1
      target/arm: Remove CP_ACCESS_TRAP_UNCATEGORIZED_{EL2, EL3}
      target/arm: Move do_coproc_insn() syndrome calculation earlier
      target/arm: All UNDEF-at-EL0 traps take priority over HSTR_EL2 traps
      target/arm: Make HSTR_EL2 traps take priority over UNDEF-at-EL1
      target/arm: Disable HSTR_EL2 traps if EL2 is not enabled
      target/arm: Define the FEAT_FGT registers
      target/arm: Implement FGT trapping infrastructure
      target/arm: Mark up sysregs for HFGRTR bits 0..11
      target/arm: Mark up sysregs for HFGRTR bits 12..23
      target/arm: Mark up sysregs for HFGRTR bits 24..35
      target/arm: Mark up sysregs for HFGRTR bits 36..63
      target/arm: Mark up sysregs for HDFGRTR bits 0..11
      target/arm: Mark up sysregs for HDFGRTR bits 12..63
      target/arm: Mark up sysregs for HFGITR bits 0..11
      target/arm: Mark up sysregs for HFGITR bits 12..17
      target/arm: Mark up sysregs for HFGITR bits 18..47
      target/arm: Mark up sysregs for HFGITR bits 48..63
      target/arm: Implement the HFGITR_EL2.ERET trap
      target/arm: Implement the HFGITR_EL2.SVC_EL0 and SVC_EL1 traps
      target/arm: Implement MDCR_EL2.TDCC and MDCR_EL3.TDCC traps
      target/arm: Enable FEAT_FGT on '-cpu max'

Richard Henderson (2):
      hw/arm: Use TYPE_ARM_SMMUV3
      target/arm: Fix physical address resolution for Stage2

From: Richard Henderson <richard.henderson@linaro.org>

Use the macro instead of two explicit string literals.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-id: 20230124232059.4017615-1-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/sbsa-ref.c | 3 ++-
 hw/arm/virt.c     | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/sbsa-ref.c
+++ b/hw/arm/sbsa-ref.c
@@ -XXX,XX +XXX,XX @@
 #include "exec/hwaddr.h"
 #include "kvm_arm.h"
 #include "hw/arm/boot.h"
+#include "hw/arm/smmuv3.h"
 #include "hw/block/flash.h"
 #include "hw/boards.h"
 #include "hw/ide/internal.h"
@@ -XXX,XX +XXX,XX @@ static void create_smmu(const SBSAMachineState *sms, PCIBus *bus)
     DeviceState *dev;
     int i;
 
-    dev = qdev_new("arm-smmuv3");
+    dev = qdev_new(TYPE_ARM_SMMUV3);
 
     object_property_set_link(OBJECT(dev), "primary-bus", OBJECT(bus),
                              &error_abort);
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -XXX,XX +XXX,XX @@ static void create_smmu(const VirtMachineState *vms,
         return;
     }
 
-    dev = qdev_new("arm-smmuv3");
+    dev = qdev_new(TYPE_ARM_SMMUV3);
 
     object_property_set_link(OBJECT(dev), "primary-bus", OBJECT(bus),
                              &error_abort);
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Conversion to probe_access_full missed applying the page offset.
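
As a sketch of what "applying the page offset" means, assuming 4 KiB pages
and a page-aligned physical base (both assumptions made only for this
illustration; PAGE_BITS, PAGE_MASK and combine() are hypothetical names):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 4 KiB pages for illustration; the real size is target-specific. */
#define PAGE_BITS 12
#define PAGE_MASK (~((UINT64_C(1) << PAGE_BITS) - 1))

/*
 * Combine the page-aligned physical base of a page with the offset of
 * 'addr' within that page.
 */
static uint64_t combine(uint64_t page_phys, uint64_t addr)
{
    assert((page_phys & ~PAGE_MASK) == 0);
    return page_phys | (addr & ~PAGE_MASK);
}
```

Without the offset term every access within a page would resolve to the
first byte of the physical page.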

Cc: qemu-stable@nongnu.org
Reported-by: Sid Manning <sidneym@quicinc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20230126233134.103193-1-richard.henderson@linaro.org
Fixes: f3639a64f602 ("target/arm: Use softmmu tlbs for page table walking")
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/ptw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
         if (unlikely(flags & TLB_INVALID_MASK)) {
             goto fail;
         }
-        ptw->out_phys = full->phys_addr;
+        ptw->out_phys = full->phys_addr | (addr & ~TARGET_PAGE_MASK);
         ptw->out_rw = full->prot & PAGE_WRITE;
         pte_attrs = full->pte_attrs;
         pte_secure = full->attrs.secure;
-- 
2.34.1

From: Evgeny Iakovlev <eiakovlev@linux.microsoft.com>

PL011 can be in either of 2 modes depending on guest config: FIFO and
single register. The latter can be viewed as a 1-element-deep FIFO.

Current code open-codes a bunch of depth-dependent logic. Refactor FIFO
depth handling code to isolate calculating current FIFO depth.

One functional (albeit guest-invisible) side effect of this change is
that previously we would always increment s->read_pos in the UARTDR read
handler even if the FIFO was disabled; now we limit read_pos so that it
does not exceed the FIFO depth (read_pos itself is reset to 0 if the
user disables the FIFO).
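
The idea of treating single-register mode as a FIFO of depth 1 can be
sketched as a small ring buffer. This is an illustration rather than the
device model: struct rx_fifo and its helpers are hypothetical, flag and
interrupt handling are omitted, and the masking relies on the depth being a
power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FIFO_DEPTH 16u   /* must be a power of two for the mask below */

struct rx_fifo {
    uint32_t data[FIFO_DEPTH];
    unsigned read_pos;
    unsigned read_count;
    bool fifo_enabled;
};

/* Current depth: full FIFO when enabled, otherwise a single slot. */
static unsigned depth(const struct rx_fifo *f)
{
    return f->fifo_enabled ? FIFO_DEPTH : 1u;
}

static bool can_receive(const struct rx_fifo *f)
{
    return f->read_count < depth(f);
}

static void put(struct rx_fifo *f, uint32_t value)
{
    assert(can_receive(f));
    unsigned slot = (f->read_pos + f->read_count) & (depth(f) - 1);
    f->data[slot] = value;
    f->read_count++;
}

static uint32_t get(struct rx_fifo *f)
{
    uint32_t c = f->data[f->read_pos];
    if (f->read_count > 0) {
        f->read_count--;
        f->read_pos = (f->read_pos + 1) & (depth(f) - 1);
    }
    return c;
}
```

With a depth of 1 the mask is 0, so read_pos and the write slot both stay
at index 0, which matches the single-register behaviour described above.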

Signed-off-by: Evgeny Iakovlev <eiakovlev@linux.microsoft.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20230123162304.26254-2-eiakovlev@linux.microsoft.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/char/pl011.h |  5 ++++-
 hw/char/pl011.c         | 30 ++++++++++++++++++------------
 2 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/include/hw/char/pl011.h b/include/hw/char/pl011.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/char/pl011.h
+++ b/include/hw/char/pl011.h
@@ -XXX,XX +XXX,XX @@ OBJECT_DECLARE_SIMPLE_TYPE(PL011State, PL011)
 /* This shares the same struct (and cast macro) as the base pl011 device */
 #define TYPE_PL011_LUMINARY "pl011_luminary"
 
+/* Depth of UART FIFO in bytes, when FIFO mode is enabled (else depth == 1) */
+#define PL011_FIFO_DEPTH 16
+
 struct PL011State {
     SysBusDevice parent_obj;
 
@@ -XXX,XX +XXX,XX @@ struct PL011State {
     uint32_t dmacr;
     uint32_t int_enabled;
     uint32_t int_level;
-    uint32_t read_fifo[16];
+    uint32_t read_fifo[PL011_FIFO_DEPTH];
     uint32_t ilpr;
     uint32_t ibrd;
     uint32_t fbrd;
diff --git a/hw/char/pl011.c b/hw/char/pl011.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/char/pl011.c
+++ b/hw/char/pl011.c
@@ -XXX,XX +XXX,XX @@ static void pl011_update(PL011State *s)
     }
 }
 
+static bool pl011_is_fifo_enabled(PL011State *s)
+{
+    return (s->lcr & 0x10) != 0;
+}
+
+static inline unsigned pl011_get_fifo_depth(PL011State *s)
+{
+    /* Note: FIFO depth is expected to be power-of-2 */
+    return pl011_is_fifo_enabled(s) ? PL011_FIFO_DEPTH : 1;
+}
+
 static uint64_t pl011_read(void *opaque, hwaddr offset,
                            unsigned size)
 {
@@ -XXX,XX +XXX,XX @@ static uint64_t pl011_read(void *opaque, hwaddr offset,
         c = s->read_fifo[s->read_pos];
         if (s->read_count > 0) {
             s->read_count--;
-            if (++s->read_pos == 16)
-                s->read_pos = 0;
+            s->read_pos = (s->read_pos + 1) & (pl011_get_fifo_depth(s) - 1);
         }
         if (s->read_count == 0) {
             s->flags |= PL011_FLAG_RXFE;
@@ -XXX,XX +XXX,XX @@ static int pl011_can_receive(void *opaque)
     PL011State *s = (PL011State *)opaque;
     int r;
 
-    if (s->lcr & 0x10) {
-        r = s->read_count < 16;
-    } else {
-        r = s->read_count < 1;
-    }
+    r = s->read_count < pl011_get_fifo_depth(s);
     trace_pl011_can_receive(s->lcr, s->read_count, r);
     return r;
 }
@@ -XXX,XX +XXX,XX @@ static void pl011_put_fifo(void *opaque, uint32_t value)
 {
     PL011State *s = (PL011State *)opaque;
     int slot;
+    unsigned pipe_depth;
 
-    slot = s->read_pos + s->read_count;
-    if (slot >= 16)
-        slot -= 16;
+    pipe_depth = pl011_get_fifo_depth(s);
+    slot = (s->read_pos + s->read_count) & (pipe_depth - 1);
     s->read_fifo[slot] = value;
     s->read_count++;
     s->flags &= ~PL011_FLAG_RXFE;
     trace_pl011_put_fifo(value, s->read_count);
-    if (!(s->lcr & 0x10) || s->read_count == 16) {
+    if (s->read_count == pipe_depth) {
         trace_pl011_put_fifo_full();
         s->flags |= PL011_FLAG_RXFF;
     }
@@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_pl011 = {
         VMSTATE_UINT32(dmacr, PL011State),
         VMSTATE_UINT32(int_enabled, PL011State),
         VMSTATE_UINT32(int_level, PL011State),
-        VMSTATE_UINT32_ARRAY(read_fifo, PL011State, 16),
+        VMSTATE_UINT32_ARRAY(read_fifo, PL011State, PL011_FIFO_DEPTH),
         VMSTATE_UINT32(ilpr, PL011State),
         VMSTATE_UINT32(ibrd, PL011State),
         VMSTATE_UINT32(fbrd, PL011State),
-- 
2.34.1

From: Evgeny Iakovlev <eiakovlev@linux.microsoft.com>

The previous change slightly modified the way we handle data writes
when the FIFO is disabled. Before it, we kept incrementing read_pos and
stored data at that position, even though a disabled FIFO is only a
single register deep. Since that change we always store data at
position 0.

If the guest disables the FIFO and then proceeds to read data, it will
work out fine, because we still read from the current read_pos before
setting it to 0.

However, to make the code less fragile, introduce a post_load hook for
PL011State and fix up the read FIFO state there when the FIFO is
disabled. Since we are introducing a post_load hook, also do some
sanity checking on the untrusted incoming state.
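
The validate-then-normalise pattern described above can be sketched on
its own. This is an illustration under stated assumptions: DemoState
and demo_post_load() are hypothetical names, the state is reduced to
the fields that matter here, and the explicit negative checks are an
addition of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_FIFO_DEPTH 16   /* hypothetical array size */

/* Hypothetical, reduced copy of the migrated receive state. */
typedef struct DemoState {
    uint32_t read_fifo[DEMO_FIFO_DEPTH];
    int read_pos;
    int read_count;
    bool fifo_enabled;
} DemoState;

/* Returns 0 on success, -1 if the incoming state is inconsistent. */
static int demo_post_load(DemoState *s)
{
    /* Reject indexes that would address memory outside the array. */
    if (s->read_pos < 0 || s->read_pos >= DEMO_FIFO_DEPTH ||
        s->read_count < 0 || s->read_count > DEMO_FIFO_DEPTH) {
        return -1;
    }

    if (!s->fifo_enabled && s->read_count > 0 && s->read_pos > 0) {
        /* Older senders may have left the single character elsewhere. */
        s->read_fifo[0] = s->read_fifo[s->read_pos];
        s->read_pos = 0;
    }

    /* After the fixup, a non-empty disabled FIFO always uses slot 0. */
    assert(s->fifo_enabled || s->read_count == 0 || s->read_pos == 0);
    return 0;
}
```

Returning an error for out-of-range indexes makes the load fail early
instead of letting a corrupt stream drive later array accesses.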

Signed-off-by: Evgeny Iakovlev <eiakovlev@linux.microsoft.com>
Message-id: 20230123162304.26254-3-eiakovlev@linux.microsoft.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/char/pl011.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/hw/char/pl011.c b/hw/char/pl011.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/char/pl011.c
+++ b/hw/char/pl011.c
@@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_pl011_clock = {
     }
 };
 
+static int pl011_post_load(void *opaque, int version_id)
+{
+    PL011State* s = opaque;
+
+    /* Sanity-check input state */
+    if (s->read_pos >= ARRAY_SIZE(s->read_fifo) ||
+        s->read_count > ARRAY_SIZE(s->read_fifo)) {
+        return -1;
+    }
+
+    if (!pl011_is_fifo_enabled(s) && s->read_count > 0 && s->read_pos > 0) {
+        /*
+         * Older versions of PL011 didn't ensure that the single
+         * character in the FIFO in FIFO-disabled mode is in
+         * element 0 of the array; convert to follow the current
+         * code's assumptions.
+         */
+        s->read_fifo[0] = s->read_fifo[s->read_pos];
+        s->read_pos = 0;
+    }
+
+    return 0;
+}
+
 static const VMStateDescription vmstate_pl011 = {
     .name = "pl011",
     .version_id = 2,
     .minimum_version_id = 2,
+    .post_load = pl011_post_load,
     .fields = (VMStateField[]) {
         VMSTATE_UINT32(readbuff, PL011State),
         VMSTATE_UINT32(flags, PL011State),
-- 
2.34.1

From: Evgeny Iakovlev <eiakovlev@linux.microsoft.com>

PL011 currently lacks a reset method. Implement it.

Signed-off-by: Evgeny Iakovlev <eiakovlev@linux.microsoft.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20230123162304.26254-4-eiakovlev@linux.microsoft.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/char/pl011.c | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/hw/char/pl011.c b/hw/char/pl011.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/char/pl011.c
+++ b/hw/char/pl011.c
@@ -XXX,XX +XXX,XX @@ static void pl011_init(Object *obj)
     s->clk = qdev_init_clock_in(DEVICE(obj), "clk", pl011_clock_update, s,
                                 ClockUpdate);
 
-    s->read_trigger = 1;
-    s->ifl = 0x12;
-    s->cr = 0x300;
-    s->flags = 0x90;
-
     s->id = pl011_id_arm;
 }
 
@@ -XXX,XX +XXX,XX @@ static void pl011_realize(DeviceState *dev, Error **errp)
                              pl011_event, NULL, s, NULL, true);
 }
 
+static void pl011_reset(DeviceState *dev)
+{
+    PL011State *s = PL011(dev);
+
+    s->lcr = 0;
+    s->rsr = 0;
+    s->dmacr = 0;
+    s->int_enabled = 0;
+    s->int_level = 0;
+    s->ilpr = 0;
+    s->ibrd = 0;
+    s->fbrd = 0;
+    s->read_pos = 0;
+    s->read_count = 0;
+    s->read_trigger = 1;
+    s->ifl = 0x12;
+    s->cr = 0x300;
+    s->flags = 0x90;
+}
+
 static void pl011_class_init(ObjectClass *oc, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(oc);
 
     dc->realize = pl011_realize;
+    dc->reset = pl011_reset;
     dc->vmsd = &vmstate_pl011;
     device_class_set_props(dc, pl011_properties);
 }
-- 
2.34.1

From: Evgeny Iakovlev <eiakovlev@linux.microsoft.com>

The current FIFO handling code does not reset the RXFE/RXFF flags when
the guest resets the FIFO by writing to the UARTLCR register, although
the internal FIFO state is reset to a read count of 0. The guest-visible
flag update happens only on the next data read or write attempt. As a
result, any guest that expects the RXFE flag to be set (and RXFF to be
cleared) after resetting the FIFO will never see that happen.

Update the flags whenever the FIFO is reset, using a common helper.
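
The expected flag state after a FIFO reset can be sketched as follows.
This is a minimal illustration: the DEMO_* names and the DemoUart type
are hypothetical, and the bit values assume the UARTFR layout from the
PL011 documentation (RXFE bit 4, TXFF bit 5, RXFF bit 6, TXFE bit 7).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed UARTFR bit positions; the DEMO_ names are hypothetical. */
#define DEMO_FLAG_RXFE 0x10u
#define DEMO_FLAG_TXFF 0x20u
#define DEMO_FLAG_RXFF 0x40u
#define DEMO_FLAG_TXFE 0x80u

/* Hypothetical, reduced device state. */
typedef struct DemoUart {
    uint32_t flags;
    int read_pos;
    int read_count;
} DemoUart;

/* Empty the FIFO and make the guest-visible flags agree with that. */
static void demo_reset_fifo(DemoUart *u)
{
    u->read_count = 0;
    u->read_pos = 0;

    /* An empty FIFO is never full, in either direction. */
    u->flags &= ~(DEMO_FLAG_RXFF | DEMO_FLAG_TXFF);
    u->flags |= DEMO_FLAG_RXFE | DEMO_FLAG_TXFE;

    assert(!(u->flags & DEMO_FLAG_RXFF));
}
```

Unrelated flag bits are left untouched; only the four FIFO status bits
are forced to the empty state.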

Signed-off-by: Evgeny Iakovlev <eiakovlev@linux.microsoft.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20230123162304.26254-5-eiakovlev@linux.microsoft.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/char/pl011.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/hw/char/pl011.c b/hw/char/pl011.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/char/pl011.c
+++ b/hw/char/pl011.c
@@ -XXX,XX +XXX,XX @@ static inline unsigned pl011_get_fifo_depth(PL011State *s)
     return pl011_is_fifo_enabled(s) ? PL011_FIFO_DEPTH : 1;
 }
 
+static inline void pl011_reset_fifo(PL011State *s)
+{
+    s->read_count = 0;
+    s->read_pos = 0;
+
+    /* Reset FIFO flags */
+    s->flags &= ~(PL011_FLAG_RXFF | PL011_FLAG_TXFF);
+    s->flags |= PL011_FLAG_RXFE | PL011_FLAG_TXFE;
+}
+
 static uint64_t pl011_read(void *opaque, hwaddr offset,
                            unsigned size)
 {
@@ -XXX,XX +XXX,XX @@ static void pl011_write(void *opaque, hwaddr offset,
     case 11: /* UARTLCR_H */
         /* Reset the FIFO state on FIFO enable or disable */
         if ((s->lcr ^ value) & 0x10) {
-            s->read_count = 0;
-            s->read_pos = 0;
+            pl011_reset_fifo(s);
         }
         if ((s->lcr ^ value) & 0x1) {
             int break_enable = value & 0x1;
@@ -XXX,XX +XXX,XX @@ static void pl011_reset(DeviceState *dev)
     s->ilpr = 0;
     s->ibrd = 0;
     s->fbrd = 0;
-    s->read_pos = 0;
-    s->read_count = 0;
     s->read_trigger = 1;
     s->ifl = 0x12;
     s->cr = 0x300;
-    s->flags = 0x90;
+    s->flags = 0;
+    pl011_reset_fifo(s);
 }
 
 static void pl011_class_init(ObjectClass *oc, void *data)
-- 
2.34.1

From: Alexander Graf <agraf@csgraf.de>

We currently only support GICv2 emulation with HVF. To also support
GICv3, we need to forward accesses to a few system registers to their
respective handler functions.

This patch adds support for HVF to call into the TCG callbacks for GICv3
system register handlers. This is safe because the GICv3 TCG code is generic
as long as we limit ourselves to EL0 and EL1 - which are the only modes
supported by HVF.

To make sure nobody trips over that, we also annotate callbacks that don't
work in HVF mode, such as EL state change hooks.

With GICv3 support in place, we can run with more than 8 vCPUs.
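
The forwarding order used for a trapped read (look up the register
descriptor, run its access check, then use a constant, a read callback
or the raw state field) can be sketched generically. Everything below
is hypothetical and self-contained: DemoEnv, DemoReg, the integer keys
and the sample table are assumptions of this sketch, not QEMU types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical CPU state with a single backing field. */
typedef struct DemoEnv {
    uint64_t pmr;
} DemoEnv;

/* Hypothetical register descriptor. */
typedef struct DemoReg {
    uint32_t key;                 /* encoded register number */
    bool is_const;                /* value is fixed */
    uint64_t const_value;
    bool (*access_ok)(const DemoEnv *env, bool is_read);
    uint64_t (*readfn)(const DemoEnv *env);
    size_t field_offset;          /* raw field, used without readfn */
} DemoReg;

/* Returns false if the register is unknown or the access is denied. */
static bool demo_sysreg_read(const DemoReg *regs, size_t n,
                             const DemoEnv *env, uint32_t key,
                             uint64_t *val)
{
    assert(val != NULL);
    for (size_t i = 0; i < n; i++) {
        const DemoReg *ri = &regs[i];

        if (ri->key != key) {
            continue;
        }
        if (ri->access_ok && !ri->access_ok(env, true)) {
            return false;
        }
        if (ri->is_const) {
            *val = ri->const_value;
        } else if (ri->readfn) {
            *val = ri->readfn(env);
        } else {
            memcpy(val, (const char *)env + ri->field_offset,
                   sizeof(*val));
        }
        return true;
    }
    return false;
}

/* Sample callbacks and table, for illustration only. */
static bool demo_deny(const DemoEnv *env, bool is_read)
{
    (void)env;
    (void)is_read;
    return false;
}

static uint64_t demo_read_low_byte(const DemoEnv *env)
{
    return env->pmr & 0xff;
}

static const DemoReg demo_regs[] = {
    { .key = 1, .is_const = true, .const_value = 0x7 },
    { .key = 2, .readfn = demo_read_low_byte },
    { .key = 3, .field_offset = offsetof(DemoEnv, pmr) },
    { .key = 4, .access_ok = demo_deny, .is_const = true },
};
```

A caller that gets false back would raise an undefined-instruction
exception in the guest, as the handlers in this patch do.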

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Message-id: 20230128224459.70676-1-agraf@csgraf.de
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/intc/arm_gicv3_cpuif.c   |  16 +++-
 target/arm/hvf/hvf.c        | 151 ++++++++++++++++++++++++++++++++++++
 target/arm/hvf/trace-events |   2 +
 3 files changed, 168 insertions(+), 1 deletion(-)

diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/intc/arm_gicv3_cpuif.c
+++ b/hw/intc/arm_gicv3_cpuif.c
@@ -XXX,XX +XXX,XX @@
 #include "hw/irq.h"
 #include "cpu.h"
 #include "target/arm/cpregs.h"
+#include "sysemu/tcg.h"
+#include "sysemu/qtest.h"
 
 /*
  * Special case return value from hppvi_index(); must be larger than
@@ -XXX,XX +XXX,XX @@ void gicv3_init_cpuif(GICv3State *s)
          * which case we'd get the wrong value.
          * So instead we define the regs with no ri->opaque info, and
          * get back to the GICv3CPUState from the CPUARMState.
+         *
+         * These CP regs callbacks can be called from either TCG or HVF code.
          */
         define_arm_cp_regs(cpu, gicv3_cpuif_reginfo);
 
@@ -XXX,XX +XXX,XX @@ void gicv3_init_cpuif(GICv3State *s)
                 define_arm_cp_regs(cpu, gicv3_cpuif_ich_apxr23_reginfo);
             }
         }
-        arm_register_el_change_hook(cpu, gicv3_cpuif_el_change_hook, cs);
+        if (tcg_enabled() || qtest_enabled()) {
+            /*
+             * We can only trap EL changes with TCG. However the GIC interrupt
+             * state only changes on EL changes involving EL2 or EL3, so for
+             * the non-TCG case this is OK, as EL2 and EL3 can't exist.
+             */
+            arm_register_el_change_hook(cpu, gicv3_cpuif_el_change_hook, cs);
+        } else {
+            assert(!arm_feature(&cpu->env, ARM_FEATURE_EL2));
+            assert(!arm_feature(&cpu->env, ARM_FEATURE_EL3));
+        }
     }
 }
diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/hvf/hvf.c
+++ b/target/arm/hvf/hvf.c
@@ -XXX,XX +XXX,XX @@
 #define SYSREG_PMCCNTR_EL0    SYSREG(3, 3, 9, 13, 0)
 #define SYSREG_PMCCFILTR_EL0  SYSREG(3, 3, 14, 15, 7)
 
+#define SYSREG_ICC_AP0R0_EL1     SYSREG(3, 0, 12, 8, 4)
+#define SYSREG_ICC_AP0R1_EL1     SYSREG(3, 0, 12, 8, 5)
+#define SYSREG_ICC_AP0R2_EL1     SYSREG(3, 0, 12, 8, 6)
+#define SYSREG_ICC_AP0R3_EL1     SYSREG(3, 0, 12, 8, 7)
+#define SYSREG_ICC_AP1R0_EL1     SYSREG(3, 0, 12, 9, 0)
+#define SYSREG_ICC_AP1R1_EL1     SYSREG(3, 0, 12, 9, 1)
+#define SYSREG_ICC_AP1R2_EL1     SYSREG(3, 0, 12, 9, 2)
+#define SYSREG_ICC_AP1R3_EL1     SYSREG(3, 0, 12, 9, 3)
+#define SYSREG_ICC_ASGI1R_EL1    SYSREG(3, 0, 12, 11, 6)
+#define SYSREG_ICC_BPR0_EL1      SYSREG(3, 0, 12, 8, 3)
+#define SYSREG_ICC_BPR1_EL1      SYSREG(3, 0, 12, 12, 3)
+#define SYSREG_ICC_CTLR_EL1      SYSREG(3, 0, 12, 12, 4)
+#define SYSREG_ICC_DIR_EL1       SYSREG(3, 0, 12, 11, 1)
+#define SYSREG_ICC_EOIR0_EL1     SYSREG(3, 0, 12, 8, 1)
+#define SYSREG_ICC_EOIR1_EL1     SYSREG(3, 0, 12, 12, 1)
+#define SYSREG_ICC_HPPIR0_EL1    SYSREG(3, 0, 12, 8, 2)
+#define SYSREG_ICC_HPPIR1_EL1    SYSREG(3, 0, 12, 12, 2)
+#define SYSREG_ICC_IAR0_EL1      SYSREG(3, 0, 12, 8, 0)
+#define SYSREG_ICC_IAR1_EL1      SYSREG(3, 0, 12, 12, 0)
+#define SYSREG_ICC_IGRPEN0_EL1   SYSREG(3, 0, 12, 12, 6)
+#define SYSREG_ICC_IGRPEN1_EL1   SYSREG(3, 0, 12, 12, 7)
+#define SYSREG_ICC_PMR_EL1       SYSREG(3, 0, 4, 6, 0)
+#define SYSREG_ICC_RPR_EL1       SYSREG(3, 0, 12, 11, 3)
+#define SYSREG_ICC_SGI0R_EL1     SYSREG(3, 0, 12, 11, 7)
+#define SYSREG_ICC_SGI1R_EL1     SYSREG(3, 0, 12, 11, 5)
+#define SYSREG_ICC_SRE_EL1       SYSREG(3, 0, 12, 12, 5)
+
 #define WFX_IS_WFE (1 << 0)
 
 #define TMR_CTL_ENABLE  (1 << 0)
@@ -XXX,XX +XXX,XX @@ static bool is_id_sysreg(uint32_t reg)
            SYSREG_CRM(reg) < 8;
 }
 
+static uint32_t hvf_reg2cp_reg(uint32_t reg)
+{
+    return ENCODE_AA64_CP_REG(CP_REG_ARM64_SYSREG_CP,
+                              (reg >> SYSREG_CRN_SHIFT) & SYSREG_CRN_MASK,
+                              (reg >> SYSREG_CRM_SHIFT) & SYSREG_CRM_MASK,
+                              (reg >> SYSREG_OP0_SHIFT) & SYSREG_OP0_MASK,
+                              (reg >> SYSREG_OP1_SHIFT) & SYSREG_OP1_MASK,
+                              (reg >> SYSREG_OP2_SHIFT) & SYSREG_OP2_MASK);
+}
+
+static bool hvf_sysreg_read_cp(CPUState *cpu, uint32_t reg, uint64_t *val)
+{
+    ARMCPU *arm_cpu = ARM_CPU(cpu);
+    CPUARMState *env = &arm_cpu->env;
+    const ARMCPRegInfo *ri;
+
+    ri = get_arm_cp_reginfo(arm_cpu->cp_regs, hvf_reg2cp_reg(reg));
+    if (ri) {
+        if (ri->accessfn) {
+            if (ri->accessfn(env, ri, true) != CP_ACCESS_OK) {
+                return false;
+            }
+        }
+        if (ri->type & ARM_CP_CONST) {
+            *val = ri->resetvalue;
+        } else if (ri->readfn) {
+            *val = ri->readfn(env, ri);
+        } else {
+            *val = CPREG_FIELD64(env, ri);
+        }
+        trace_hvf_vgic_read(ri->name, *val);
+        return true;
+    }
+
+    return false;
+}
+
 static int hvf_sysreg_read(CPUState *cpu, uint32_t reg, uint32_t rt)
 {
     ARMCPU *arm_cpu = ARM_CPU(cpu);
@@ -XXX,XX +XXX,XX @@ static int hvf_sysreg_read(CPUState *cpu, uint32_t reg, uint32_t rt)
     case SYSREG_OSDLR_EL1:
         /* Dummy register */
         break;
+    case SYSREG_ICC_AP0R0_EL1:
+    case SYSREG_ICC_AP0R1_EL1:
+    case SYSREG_ICC_AP0R2_EL1:
+    case SYSREG_ICC_AP0R3_EL1:
+    case SYSREG_ICC_AP1R0_EL1:
+    case SYSREG_ICC_AP1R1_EL1:
+    case SYSREG_ICC_AP1R2_EL1:
+    case SYSREG_ICC_AP1R3_EL1:
+    case SYSREG_ICC_ASGI1R_EL1:
+    case SYSREG_ICC_BPR0_EL1:
+    case SYSREG_ICC_BPR1_EL1:
+    case SYSREG_ICC_DIR_EL1:
+    case SYSREG_ICC_EOIR0_EL1:
+    case SYSREG_ICC_EOIR1_EL1:
+    case SYSREG_ICC_HPPIR0_EL1:
+    case SYSREG_ICC_HPPIR1_EL1:
+    case SYSREG_ICC_IAR0_EL1:
+    case SYSREG_ICC_IAR1_EL1:
+    case SYSREG_ICC_IGRPEN0_EL1:
+    case SYSREG_ICC_IGRPEN1_EL1:
+    case SYSREG_ICC_PMR_EL1:
+    case SYSREG_ICC_SGI0R_EL1:
+    case SYSREG_ICC_SGI1R_EL1:
+    case SYSREG_ICC_SRE_EL1:
+    case SYSREG_ICC_CTLR_EL1:
+        /* Call the TCG sysreg handler. This is only safe for GICv3 regs. */
+        if (!hvf_sysreg_read_cp(cpu, reg, &val)) {
+            hvf_raise_exception(cpu, EXCP_UDEF, syn_uncategorized());
+        }
+        break;
     default:
         if (is_id_sysreg(reg)) {
             /* ID system registers read as RES0 */
@@ -XXX,XX +XXX,XX @@ static void pmswinc_write(CPUARMState *env, uint64_t value)
     }
 }
 
+static bool hvf_sysreg_write_cp(CPUState *cpu, uint32_t reg, uint64_t val)
+{
+    ARMCPU *arm_cpu = ARM_CPU(cpu);
+    CPUARMState *env = &arm_cpu->env;
+    const ARMCPRegInfo *ri;
+
+    ri = get_arm_cp_reginfo(arm_cpu->cp_regs, hvf_reg2cp_reg(reg));
+
+    if (ri) {
+        if (ri->accessfn) {
+            if (ri->accessfn(env, ri, false) != CP_ACCESS_OK) {
+                return false;
+            }
+        }
+        if (ri->writefn) {
+            ri->writefn(env, ri, val);
+        } else {
+            CPREG_FIELD64(env, ri) = val;
+        }
+
+        trace_hvf_vgic_write(ri->name, val);
+        return true;
+    }
+
+    return false;
+}
+
 static int hvf_sysreg_write(CPUState *cpu, uint32_t reg, uint64_t val)
 {
     ARMCPU *arm_cpu = ARM_CPU(cpu);
@@ -XXX,XX +XXX,XX @@ static int hvf_sysreg_write(CPUState *cpu, uint32_t reg, uint64_t val)
     case SYSREG_OSDLR_EL1:
         /* Dummy register */
         break;
+    case SYSREG_ICC_AP0R0_EL1:
+    case SYSREG_ICC_AP0R1_EL1:
+    case SYSREG_ICC_AP0R2_EL1:
+    case SYSREG_ICC_AP0R3_EL1:
+    case SYSREG_ICC_AP1R0_EL1:
+    case SYSREG_ICC_AP1R1_EL1:
+    case SYSREG_ICC_AP1R2_EL1:
+    case SYSREG_ICC_AP1R3_EL1:
+    case SYSREG_ICC_ASGI1R_EL1:
+    case SYSREG_ICC_BPR0_EL1:
+    case SYSREG_ICC_BPR1_EL1:
+    case SYSREG_ICC_CTLR_EL1:
+    case SYSREG_ICC_DIR_EL1:
+    case SYSREG_ICC_EOIR0_EL1:
+    case SYSREG_ICC_EOIR1_EL1:
+    case SYSREG_ICC_HPPIR0_EL1:
+    case SYSREG_ICC_HPPIR1_EL1:
+    case SYSREG_ICC_IAR0_EL1:
+    case SYSREG_ICC_IAR1_EL1:
+    case SYSREG_ICC_IGRPEN0_EL1:
+    case SYSREG_ICC_IGRPEN1_EL1:
+    case SYSREG_ICC_PMR_EL1:
+    case SYSREG_ICC_SGI0R_EL1:
+    case SYSREG_ICC_SGI1R_EL1:
+    case SYSREG_ICC_SRE_EL1:
+        /* Call the TCG sysreg handler. This is only safe for GICv3 regs. */
+        if (!hvf_sysreg_write_cp(cpu, reg, val)) {
+            hvf_raise_exception(cpu, EXCP_UDEF, syn_uncategorized());
+        }
+        break;
     default:
         cpu_synchronize_state(cpu);
         trace_hvf_unhandled_sysreg_write(env->pc, reg,
diff --git a/target/arm/hvf/trace-events b/target/arm/hvf/trace-events
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/hvf/trace-events
+++ b/target/arm/hvf/trace-events
@@ -XXX,XX +XXX,XX @@ hvf_unknown_hvc(uint64_t x0) "unknown HVC! 0x%016"PRIx64
 hvf_unknown_smc(uint64_t x0) "unknown SMC! 0x%016"PRIx64
 hvf_exit(uint64_t syndrome, uint32_t ec, uint64_t pc) "exit: 0x%"PRIx64" [ec=0x%x pc=0x%"PRIx64"]"
 hvf_psci_call(uint64_t x0, uint64_t x1, uint64_t x2, uint64_t x3, uint32_t cpuid) "PSCI Call x0=0x%016"PRIx64" x1=0x%016"PRIx64" x2=0x%016"PRIx64" x3=0x%016"PRIx64" cpu=0x%x"
+hvf_vgic_write(const char *name, uint64_t val) "vgic write to %s [val=0x%016"PRIx64"]"
+hvf_vgic_read(const char *name, uint64_t val) "vgic read from %s [val=0x%016"PRIx64"]"
-- 
2.34.1

From: Alexander Graf <agraf@csgraf.de>

Up to now, the finalize_gic_version() code open coded what is essentially
a support bitmap match between host/emulation environment and desired
target GIC type.

This open coding leads to undesirable side effects. For example, a VM with
KVM and -smp 10 will automatically choose GICv3 while the same command
line with TCG will stay on GICv2 and fail the launch.

This patch combines the TCG and KVM matching code paths by making
everything a two-pass process. First, we determine which GIC versions
the current environment is able to support; then we go through a single
state machine to determine which target GIC version that implies.

After this patch, the only user-noticeable changes should be
consolidated error messages as well as TCG -M virt supporting -smp > 8
automatically.
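
The two-pass selection described above can be sketched as a pure
function over a support bitmap. This is an illustration only: the
DEMO_* names are hypothetical, the sketch returns -1 where the real
code reports an error and exits, and it omits the KVM-only "host"
setting.

```c
#include <assert.h>

/* Hypothetical request values; concrete versions equal their number. */
enum {
    DEMO_GIC_NOSEL = 0,
    DEMO_GIC_MAX = 1,
    DEMO_GIC_V2 = 2,
    DEMO_GIC_V3 = 3,
    DEMO_GIC_V4 = 4,
};

#define DEMO_BIT(n) (1u << (n))
#define DEMO_GICV2_MAX_CPUS 8u   /* GICv2 addresses at most 8 CPUs */

/* Returns the chosen version (2..4), or -1 if nothing fits. */
static int demo_pick_gic(int requested, unsigned supported,
                         unsigned max_cpus)
{
    const unsigned known = DEMO_BIT(2) | DEMO_BIT(3) | DEMO_BIT(4);
    int version = requested;

    assert((supported & ~known) == 0);

    /* Pass 1: turn "max" or "no selection" into a concrete version. */
    switch (requested) {
    case DEMO_GIC_MAX:
        if (supported & DEMO_BIT(4)) {
            version = DEMO_GIC_V4;
        } else if (supported & DEMO_BIT(3)) {
            version = DEMO_GIC_V3;
        } else {
            version = DEMO_GIC_V2;
        }
        break;
    case DEMO_GIC_NOSEL:
        if ((supported & DEMO_BIT(2)) &&
            max_cpus <= DEMO_GICV2_MAX_CPUS) {
            version = DEMO_GIC_V2;
        } else if (supported & DEMO_BIT(3)) {
            version = DEMO_GIC_V3;
        } else {
            return -1;
        }
        break;
    default:
        break;
    }

    /* Pass 2: the concrete version must be in the support bitmap. */
    if (version < DEMO_GIC_V2 || version > DEMO_GIC_V4 ||
        !(supported & DEMO_BIT(version))) {
        return -1;
    }
    return version;
}
```

Because the accelerator only contributes the bitmap, the same function
gives the same answer for the same command line on any accelerator
with equal support.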

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Message-id: 20221223090107.98888-2-agraf@csgraf.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/arm/virt.h |  15 ++--
 hw/arm/virt.c         | 198 ++++++++++++++++++++++--------------------
 2 files changed, 112 insertions(+), 101 deletions(-)

diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -XXX,XX +XXX,XX @@ typedef enum VirtMSIControllerType {
 } VirtMSIControllerType;
 
 typedef enum VirtGICType {
-    VIRT_GIC_VERSION_MAX,
-    VIRT_GIC_VERSION_HOST,
-    VIRT_GIC_VERSION_2,
-    VIRT_GIC_VERSION_3,
-    VIRT_GIC_VERSION_4,
+    VIRT_GIC_VERSION_MAX = 0,
+    VIRT_GIC_VERSION_HOST = 1,
+    /* The concrete GIC values have to match the GIC version number */
+    VIRT_GIC_VERSION_2 = 2,
+    VIRT_GIC_VERSION_3 = 3,
+    VIRT_GIC_VERSION_4 = 4,
     VIRT_GIC_VERSION_NOSEL,
 } VirtGICType;
 
+#define VIRT_GIC_VERSION_2_MASK BIT(VIRT_GIC_VERSION_2)
+#define VIRT_GIC_VERSION_3_MASK BIT(VIRT_GIC_VERSION_3)
+#define VIRT_GIC_VERSION_4_MASK BIT(VIRT_GIC_VERSION_4)
+
 struct VirtMachineClass {
     MachineClass parent;
     bool disallow_affinity_adjustment;
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -XXX,XX +XXX,XX @@ static void virt_set_memmap(VirtMachineState *vms, int pa_bits)
     }
 }
 
+static VirtGICType finalize_gic_version_do(const char *accel_name,
+                                           VirtGICType gic_version,
+                                           int gics_supported,
+                                           unsigned int max_cpus)
+{
+    /* Convert host/max/nosel to GIC version number */
+    switch (gic_version) {
+    case VIRT_GIC_VERSION_HOST:
+        if (!kvm_enabled()) {
+            error_report("gic-version=host requires KVM");
+            exit(1);
+        }
+
+        /* For KVM, gic-version=host means gic-version=max */
+        return finalize_gic_version_do(accel_name, VIRT_GIC_VERSION_MAX,
+                                       gics_supported, max_cpus);
+    case VIRT_GIC_VERSION_MAX:
+        if (gics_supported & VIRT_GIC_VERSION_4_MASK) {
+            gic_version = VIRT_GIC_VERSION_4;
+        } else if (gics_supported & VIRT_GIC_VERSION_3_MASK) {
+            gic_version = VIRT_GIC_VERSION_3;
+        } else {
+            gic_version = VIRT_GIC_VERSION_2;
+        }
+        break;
+    case VIRT_GIC_VERSION_NOSEL:
+        if ((gics_supported & VIRT_GIC_VERSION_2_MASK) &&
+            max_cpus <= GIC_NCPU) {
+            gic_version = VIRT_GIC_VERSION_2;
+        } else if (gics_supported & VIRT_GIC_VERSION_3_MASK) {
+            /*
+             * in case the host does not support v2 emulation or
+             * the end-user requested more than 8 VCPUs we now default
+             * to v3. In any case defaulting to v2 would be broken.
+             */
+            gic_version = VIRT_GIC_VERSION_3;
+        } else if (max_cpus > GIC_NCPU) {
+            error_report("%s only supports GICv2 emulation but more than 8 "
+                         "vcpus are requested", accel_name);
+            exit(1);
+        }
+        break;
+    case VIRT_GIC_VERSION_2:
+    case VIRT_GIC_VERSION_3:
+    case VIRT_GIC_VERSION_4:
+        break;
+    }
+
+    /* Check chosen version is effectively supported */
+    switch (gic_version) {
+    case VIRT_GIC_VERSION_2:
+        if (!(gics_supported & VIRT_GIC_VERSION_2_MASK)) {
+            error_report("%s does not support GICv2 emulation", accel_name);
+            exit(1);
+        }
+        break;
+    case VIRT_GIC_VERSION_3:
+        if (!(gics_supported & VIRT_GIC_VERSION_3_MASK)) {
+            error_report("%s does not support GICv3 emulation", accel_name);
+            exit(1);
+        }
+        break;
+    case VIRT_GIC_VERSION_4:
+        if (!(gics_supported & VIRT_GIC_VERSION_4_MASK)) {
+            error_report("%s does not support GICv4 emulation, is virtualization=on?",
+                         accel_name);
+            exit(1);
+        }
+        break;
+    default:
+        error_report("logic error in finalize_gic_version");
+        exit(1);
+        break;
+    }
+
+    return gic_version;
+}
+
 /*
  * finalize_gic_version - Determines the final gic_version
  * according to the gic-version property
@@ -XXX,XX +XXX,XX @@ static void virt_set_memmap(VirtMachineState *vms, int pa_bits)
  */
 static void finalize_gic_version(VirtMachineState *vms)
 {
+    const char *accel_name = current_accel_name();
     unsigned int max_cpus = MACHINE(vms)->smp.max_cpus;
+    int gics_supported = 0;
 
-    if (kvm_enabled()) {
-        int probe_bitmap;
+    /* Determine which GIC versions the current environment supports */
+    if (kvm_enabled() && kvm_irqchip_in_kernel()) {
+        int probe_bitmap = kvm_arm_vgic_probe();
 
-        if (!kvm_irqchip_in_kernel()) {
-            switch (vms->gic_version) {
-            case VIRT_GIC_VERSION_HOST:
-                warn_report(
-                    "gic-version=host not relevant with kernel-irqchip=off "
-                     "as only userspace GICv2 is supported. Using v2 ...");
-                return;
-            case VIRT_GIC_VERSION_MAX:
-            case VIRT_GIC_VERSION_NOSEL:
-                vms->gic_version = VIRT_GIC_VERSION_2;
-                return;
-            case VIRT_GIC_VERSION_2:
-                return;
-            case VIRT_GIC_VERSION_3:
-                error_report(
-                    "gic-version=3 is not supported with kernel-irqchip=off");
-                exit(1);
-            case VIRT_GIC_VERSION_4:
-                error_report(
-                    "gic-version=4 is not supported with kernel-irqchip=off");
-                exit(1);
-            }
-        }
-
-        probe_bitmap = kvm_arm_vgic_probe();
         if (!probe_bitmap) {
             error_report("Unable to determine GIC version supported by host");
             exit(1);
         }
 
-        switch (vms->gic_version) {
-        case VIRT_GIC_VERSION_HOST:
-        case VIRT_GIC_VERSION_MAX:
-            if (probe_bitmap & KVM_ARM_VGIC_V3) {
-                vms->gic_version = VIRT_GIC_VERSION_3;
-            } else {
-                vms->gic_version = VIRT_GIC_VERSION_2;
-            }
-            return;
-        case VIRT_GIC_VERSION_NOSEL:
-            if ((probe_bitmap & KVM_ARM_VGIC_V2) && max_cpus <= GIC_NCPU) {
-                vms->gic_version = VIRT_GIC_VERSION_2;
-            } else if (probe_bitmap & KVM_ARM_VGIC_V3) {
-                /*
-                 * in case the host does not support v2 in-kernel emulation or
-                 * the end-user requested more than 8 VCPUs we now default
-                 * to v3. In any case defaulting to v2 would be broken.
-                 */
-                vms->gic_version = VIRT_GIC_VERSION_3;
-            } else if (max_cpus > GIC_NCPU) {
-                error_report("host only supports in-kernel GICv2 emulation "
-                             "but more than 8 vcpus are requested");
-                exit(1);
-            }
-            break;
-        case VIRT_GIC_VERSION_2:
-        case VIRT_GIC_VERSION_3:
-            break;
-        case VIRT_GIC_VERSION_4:
-            error_report("gic-version=4 is not supported with KVM");
-            exit(1);
+        if (probe_bitmap & KVM_ARM_VGIC_V2) {
+            gics_supported |= VIRT_GIC_VERSION_2_MASK;
         }
-
-        /* Check chosen version is effectively supported by the host */
-        if (vms->gic_version == VIRT_GIC_VERSION_2 &&
-            !(probe_bitmap & KVM_ARM_VGIC_V2)) {
-            error_report("host does not support in-kernel GICv2 emulation");
-            exit(1);
-        } else if (vms->gic_version == VIRT_GIC_VERSION_3 &&
-                   !(probe_bitmap & KVM_ARM_VGIC_V3)) {
-            error_report("host does not support in-kernel GICv3 emulation");
-            exit(1);
+        if (probe_bitmap & KVM_ARM_VGIC_V3) {
+            gics_supported |= VIRT_GIC_VERSION_3_MASK;
         }
-        return;
-    }
-
-    /* TCG mode */
-    switch (vms->gic_version) {
-    case VIRT_GIC_VERSION_NOSEL:
-        vms->gic_version = VIRT_GIC_VERSION_2;
-        break;
-    case VIRT_GIC_VERSION_MAX:
+    } else if (kvm_enabled() && !kvm_irqchip_in_kernel()) {
+        /* KVM w/o kernel irqchip can only deal with GICv2 */
+        gics_supported |= VIRT_GIC_VERSION_2_MASK;
+        accel_name = "KVM with kernel-irqchip=off";
+    } else {
+        gics_supported |= VIRT_GIC_VERSION_2_MASK;
         if (module_object_class_by_name("arm-gicv3")) {
-            /* CONFIG_ARM_GICV3_TCG was set */
+            gics_supported |= VIRT_GIC_VERSION_3_MASK;
             if (vms->virt) {
                 /* GICv4 only makes sense if CPU has EL2 */
-                vms->gic_version = VIRT_GIC_VERSION_4;
-            } else {
-                vms->gic_version = VIRT_GIC_VERSION_3;
+                gics_supported |= VIRT_GIC_VERSION_4_MASK;
             }
-        } else {
-            vms->gic_version = VIRT_GIC_VERSION_2;
         }
-        break;
-    case VIRT_GIC_VERSION_HOST:
-        error_report("gic-version=host requires KVM");
-        exit(1);
-    case VIRT_GIC_VERSION_4:
-        if (!vms->virt) {
-            error_report("gic-version=4 requires virtualization enabled");
-            exit(1);
-        }
-        break;
-    case VIRT_GIC_VERSION_2:
-    case VIRT_GIC_VERSION_3:
-        break;
     }
+
+    /*
+     * Then convert helpers like host/max to concrete GIC versions and ensure
+     * the desired version is supported
+     */
+    vms->gic_version = finalize_gic_version_do(accel_name, vms->gic_version,
+                                               gics_supported, max_cpus);
 }
 
 /*
-- 
2.34.1

From: Alexander Graf <agraf@csgraf.de>

Let's explicitly list all accelerators that we support when trying to
determine the supported set of GIC versions. KVM was already separate,
so the only missing one is HVF, which simply reuses all of TCG's
emulation code and thus has the same compatibility matrix.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20221223090107.98888-3-agraf@csgraf.de
[PMM: Added qtest to the list of accelerators]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/virt.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -XXX,XX +XXX,XX @@
 #include "sysemu/numa.h"
 #include "sysemu/runstate.h"
 #include "sysemu/tpm.h"
+#include "sysemu/tcg.h"
 #include "sysemu/kvm.h"
 #include "sysemu/hvf.h"
+#include "sysemu/qtest.h"
 #include "hw/loader.h"
 #include "qapi/error.h"
 #include "qemu/bitops.h"
@@ -XXX,XX +XXX,XX @@ static void finalize_gic_version(VirtMachineState *vms)
         /* KVM w/o kernel irqchip can only deal with GICv2 */
         gics_supported |= VIRT_GIC_VERSION_2_MASK;
         accel_name = "KVM with kernel-irqchip=off";
-    } else {
+    } else if (tcg_enabled() || hvf_enabled() || qtest_enabled())  {
         gics_supported |= VIRT_GIC_VERSION_2_MASK;
         if (module_object_class_by_name("arm-gicv3")) {
             gics_supported |= VIRT_GIC_VERSION_3_MASK;
@@ -XXX,XX +XXX,XX @@ static void finalize_gic_version(VirtMachineState *vms)
                 gics_supported |= VIRT_GIC_VERSION_4_MASK;
             }
         }
+    } else {
+        error_report("Unsupported accelerator, can not determine GIC support");
+        exit(1);
     }
 
     /*
-- 
2.34.1

The encodings 0,0,C7,C9,0 and 0,0,C7,C9,1 are AT S1E1RP and AT
S1E1WP, but our ARMCPRegInfo definitions for them incorrectly name
them AT S1E1R and AT S1E1W (which are entirely different
instructions).  Fix the names.

(This has no guest-visible effect as the names are for debug purposes
only.)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo vhe_reginfo[] = {
 
 #ifndef CONFIG_USER_ONLY
 static const ARMCPRegInfo ats1e1_reginfo[] = {
-    { .name = "AT_S1E1R", .state = ARM_CP_STATE_AA64,
+    { .name = "AT_S1E1RP", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 9, .opc2 = 0,
       .access = PL1_W, .type = ARM_CP_NO_RAW | ARM_CP_RAISES_EXC,
       .writefn = ats_write64 },
-    { .name = "AT_S1E1W", .state = ARM_CP_STATE_AA64,
+    { .name = "AT_S1E1WP", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 9, .opc2 = 1,
       .access = PL1_W, .type = ARM_CP_NO_RAW | ARM_CP_RAISES_EXC,
       .writefn = ats_write64 },
-- 
2.34.1

The AArch32 ATS12NSO* address translation operations are supposed to
trap to either EL2 or EL3 if they're executed at Secure EL1 (which
can only happen if EL3 is AArch64).  We implement this, but we got
the syndrome value wrong: like other traps to EL2 or EL3 on an
AArch32 cpreg access, they should report the 0x3 syndrome, not the
0x0 'uncategorized' syndrome.  This is clear in the access pseudocode
for these instructions.

Fix the syndrome value for these operations by correcting the
returned value from the ats_access() function.

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static CPAccessResult ats_access(CPUARMState *env, const ARMCPRegInfo *ri,
         if (arm_current_el(env) == 1) {
             if (arm_is_secure_below_el3(env)) {
                 if (env->cp15.scr_el3 & SCR_EEL2) {
-                    return CP_ACCESS_TRAP_UNCATEGORIZED_EL2;
+                    return CP_ACCESS_TRAP_EL2;
                 }
-                return CP_ACCESS_TRAP_UNCATEGORIZED_EL3;
+                return CP_ACCESS_TRAP_EL3;
             }
             return CP_ACCESS_TRAP_UNCATEGORIZED;
         }
-- 
2.34.1

We added the CPAccessResult values CP_ACCESS_TRAP_UNCATEGORIZED_EL2
and CP_ACCESS_TRAP_UNCATEGORIZED_EL3 purely in order to use them in
the ats_access() function, but doing so was incorrect (a bug fixed in
a previous commit).  There aren't any cases where we want an access
function to be able to request a trap to EL2 or EL3 with a zero
syndrome value, so remove these enum values.

As well as cleaning up dead code, the motivation here is that
we'd like to implement fine-grained-trap handling in
helper_access_check_cp_reg(). Although the fine-grained traps
to EL2 are always lower priority than trap-to-same-EL and
higher priority than trap-to-EL3, they are in the middle of
various other kinds of trap-to-EL2. Knowing that in our
implementation a trap-to-EL2 always has the same syndrome (i.e. that
an access function will return CP_ACCESS_TRAP_EL2 and there is no
other kind of trap-to-EL2 enum value) means we don't have to
choose which of two syndrome values to report if the access
would trap to EL2 both because of the fine-grained trap and
because the access function requires it.
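
As a rough illustration of the encoding this relies on, here is a
minimal standalone sketch. It assumes CP_ACCESS_OK is 0,
CP_ACCESS_EL_MASK is 3 and CP_ACCESS_TRAP is (1 << 2); only the
CP_ACCESS_TRAP_UNCATEGORIZED value and the "| 2" / "| 3" pattern are
visible in this patch. All Sketch/sketch_ names are hypothetical and
are not part of the tree.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative copy of the enum as it might look after this change.
 * The OK, EL_MASK and TRAP values are assumptions made for this sketch.
 */
typedef enum SketchCPAccessResult {
    SKETCH_ACCESS_OK = 0,
    SKETCH_ACCESS_EL_MASK = 3,      /* low bits: explicitly requested EL */
    SKETCH_ACCESS_TRAP = (1 << 2),
    SKETCH_ACCESS_TRAP_EL2 = SKETCH_ACCESS_TRAP | 2,
    SKETCH_ACCESS_TRAP_EL3 = SKETCH_ACCESS_TRAP | 3,
    SKETCH_ACCESS_TRAP_UNCATEGORIZED = (2 << 2),
} SketchCPAccessResult;

/* Explicit target EL in the result, or 0 meaning "usual target EL". */
static int sketch_explicit_target_el(SketchCPAccessResult res)
{
    return (int)((unsigned)res & (unsigned)SKETCH_ACCESS_EL_MASK);
}

/* True if the result asks for the zero "uncategorized" syndrome. */
static bool sketch_is_uncategorized(SketchCPAccessResult res)
{
    unsigned kind = (unsigned)res & ~(unsigned)SKETCH_ACCESS_EL_MASK;
    bool uncat = kind == (unsigned)SKETCH_ACCESS_TRAP_UNCATEGORIZED;

    if (uncat) {
        /* With the _EL2/_EL3 variants gone, no target EL can be encoded */
        assert(sketch_explicit_target_el(res) == 0);
    }
    return uncat;
}
```

With only one family of results able to carry a target EL, a caller
that sees a nonzero target EL knows which syndrome applies without
inspecting anything else.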

diff --git a/target/arm/cpregs.h b/target/arm/cpregs.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpregs.h
+++ b/target/arm/cpregs.h
@@ -XXX,XX +XXX,XX @@ typedef enum CPAccessResult {
      * Access fails and results in an exception syndrome 0x0 ("uncategorized").
      * Note that this is not a catch-all case -- the set of cases which may
      * result in this failure is specifically defined by the architecture.
+     * This trap is always to the usual target EL, never directly to a
+     * specified target EL.
      */
     CP_ACCESS_TRAP_UNCATEGORIZED = (2 << 2),
-    CP_ACCESS_TRAP_UNCATEGORIZED_EL2 = CP_ACCESS_TRAP_UNCATEGORIZED | 2,
-    CP_ACCESS_TRAP_UNCATEGORIZED_EL3 = CP_ACCESS_TRAP_UNCATEGORIZED | 3,
 } CPAccessResult;
 
 typedef struct ARMCPRegInfo ARMCPRegInfo;
diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/op_helper.c
+++ b/target/arm/op_helper.c
@@ -XXX,XX +XXX,XX @@ const void *HELPER(access_check_cp_reg)(CPUARMState *env, uint32_t key,
     case CP_ACCESS_TRAP:
         break;
     case CP_ACCESS_TRAP_UNCATEGORIZED:
+        /* Only CP_ACCESS_TRAP traps are direct to a specified EL */
+        assert((res & CP_ACCESS_EL_MASK) == 0);
         if (cpu_isar_feature(aa64_ids, cpu) && isread &&
             arm_cpreg_in_idspace(ri)) {
             /*
-- 
2.34.1

Rearrange the code in do_coproc_insn() so that we calculate the
syndrome value for a potential trap early; we're about to add a
second check that wants this value earlier than where it is currently
determined.

(Specifically, a trap to EL2 because of HSTR_EL2 should take
priority over an UNDEF to EL1, even when the UNDEF is because
the register does not exist at all or because its ri->access
bits non-configurably fail the access. So the check we put in
for HSTR_EL2 trapping at EL1 (which needs the syndrome) is
going to have to be done before the check "is the ARMCPRegInfo
pointer NULL".)

This commit is just code motion; the change to HSTR_EL2
handling that will use the 'syndrome' variable is in a
subsequent commit.

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void do_coproc_insn(DisasContext *s, int cpnum, int is64,
     const ARMCPRegInfo *ri = get_arm_cp_reginfo(s->cp_regs, key);
     TCGv_ptr tcg_ri = NULL;
     bool need_exit_tb;
+    uint32_t syndrome;
+
+    /*
+     * Note that since we are an implementation which takes an
+     * exception on a trapped conditional instruction only if the
+     * instruction passes its condition code check, we can take
+     * advantage of the clause in the ARM ARM that allows us to set
+     * the COND field in the instruction to 0xE in all cases.
+     * We could fish the actual condition out of the insn (ARM)
+     * or the condexec bits (Thumb) but it isn't necessary.
+     */
+    switch (cpnum) {
+    case 14:
+        if (is64) {
+            syndrome = syn_cp14_rrt_trap(1, 0xe, opc1, crm, rt, rt2,
+                                         isread, false);
+        } else {
+            syndrome = syn_cp14_rt_trap(1, 0xe, opc1, opc2, crn, crm,
+                                        rt, isread, false);
+        }
+        break;
+    case 15:
+        if (is64) {
+            syndrome = syn_cp15_rrt_trap(1, 0xe, opc1, crm, rt, rt2,
+                                         isread, false);
+        } else {
+            syndrome = syn_cp15_rt_trap(1, 0xe, opc1, opc2, crn, crm,
+                                        rt, isread, false);
+        }
+        break;
+    default:
+        /*
+         * ARMv8 defines that only coprocessors 14 and 15 exist,
+         * so this can only happen if this is an ARMv7 or earlier CPU,
+         * in which case the syndrome information won't actually be
+         * guest visible.
+         */
+        assert(!arm_dc_feature(s, ARM_FEATURE_V8));
+        syndrome = syn_uncategorized();
+        break;
+    }
 
     if (!ri) {
         /*
@@ -XXX,XX +XXX,XX @@ static void do_coproc_insn(DisasContext *s, int cpnum, int is64,
          * Note that on XScale all cp0..c13 registers do an access check
          * call in order to handle c15_cpar.
          */
-        uint32_t syndrome;
-
-        /*
-         * Note that since we are an implementation which takes an
-         * exception on a trapped conditional instruction only if the
-         * instruction passes its condition code check, we can take
-         * advantage of the clause in the ARM ARM that allows us to set
-         * the COND field in the instruction to 0xE in all cases.
-         * We could fish the actual condition out of the insn (ARM)
-         * or the condexec bits (Thumb) but it isn't necessary.
-         */
-        switch (cpnum) {
-        case 14:
-            if (is64) {
-                syndrome = syn_cp14_rrt_trap(1, 0xe, opc1, crm, rt, rt2,
-                                             isread, false);
-            } else {
-                syndrome = syn_cp14_rt_trap(1, 0xe, opc1, opc2, crn, crm,
-                                            rt, isread, false);
-            }
-            break;
-        case 15:
-            if (is64) {
-                syndrome = syn_cp15_rrt_trap(1, 0xe, opc1, crm, rt, rt2,
-                                             isread, false);
-            } else {
-                syndrome = syn_cp15_rt_trap(1, 0xe, opc1, opc2, crn, crm,
-                                            rt, isread, false);
-            }
-            break;
-        default:
-            /*
-             * ARMv8 defines that only coprocessors 14 and 15 exist,
-             * so this can only happen if this is an ARMv7 or earlier CPU,
-             * in which case the syndrome information won't actually be
-             * guest visible.
-             */
-            assert(!arm_dc_feature(s, ARM_FEATURE_V8));
-            syndrome = syn_uncategorized();
-            break;
-        }
-
         gen_set_condexec(s);
         gen_update_pc(s, 0);
         tcg_ri = tcg_temp_new_ptr();
-- 
2.34.1

The HSTR_EL2 register has a collection of trap bits which allow
trapping to EL2 for AArch32 EL0 or EL1 accesses to coprocessor
registers.  The specification of these bits is that when the bit is
set we should trap
 * EL1 accesses
 * EL0 accesses, if the access is not UNDEFINED when the
   trap bit is 0

In other words, all UNDEF traps from EL0 to EL1 take precedence over
the HSTR_EL2 trap to EL2.  (Since this is all AArch32, the only kind
of trap-to-EL1 is the UNDEF.)

Our implementation doesn't quite get this right -- we check for traps
in the order:
 * no such register
 * ARMCPRegInfo::access bits
 * HSTR_EL2 trap bits
 * ARMCPRegInfo::accessfn

So UNDEFs that happen because of the access bits or because the
register doesn't exist at all correctly take priority over the
HSTR_EL2 trap, but where a register can UNDEF at EL0 because of the
accessfn we are incorrectly always taking the HSTR_EL2 trap.  There
aren't many of these, but one example is the PMCR; if you look at the
access pseudocode for this register you can see that UNDEFs taken
because of the value of PMUSERENR.EN are checked before the HSTR_EL2
bit.

Rearrange helper_access_check_cp_reg() so that we always call the
accessfn, and use its return value if it indicates that the access
traps from EL0 to EL1, rather than continuing to do the HSTR_EL2 check.

The semantics of HSTR_EL2 require that it traps cpreg accesses
to EL2 for:
 * EL1 accesses
 * EL0 accesses, if the access is not UNDEFINED when the
   trap bit is 0

(You can see this in the I_ZFGJP priority ordering, where HSTR_EL2
traps from EL1 to EL2 are priority 12, UNDEFs are priority 13, and
HSTR_EL2 traps from EL0 are priority 15.)

However, we don't get this right for EL1 accesses which UNDEF because
the register doesn't exist at all or because its ri->access bits
non-configurably forbid the access.  At EL1, check for the HSTR_EL2
trap early, before either of these UNDEF reasons.

We have to retain the HSTR_EL2 check in access_check_cp_reg(),
because at EL0 any kind of UNDEF-to-EL1 (including "no such
register", "bad ri->access" and "ri->accessfn returns 'trap to EL1'")
takes precedence over the trap to EL2.  But we only need to do that
check for EL0 now.
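
The priority rules above can be summarised in a small standalone
sketch. This is not the code in the patch: it collapses every UNDEF
reason ("no such register", "bad ri->access", "accessfn returns trap
to EL1") into a single would_undef flag, and ignores other kinds of
trap an accessfn might request. All Sketch/sketch_ names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum SketchOutcome {
    SKETCH_ALLOW,
    SKETCH_UNDEF,
    SKETCH_TRAP_TO_EL2,
} SketchOutcome;

/* T4 and T14 are RES0, so those HSTR_EL2 bits never cause traps. */
static bool sketch_hstr_bit_can_trap(unsigned maskbit)
{
    assert(maskbit < 16);
    return maskbit != 4 && maskbit != 14;
}

/*
 * Decide the outcome of an AArch32 cp15 access from EL0 or EL1.
 * hstr_bit_set: the relevant HSTR_EL2 Tn bit is set and HSTR is active.
 * would_undef:  the access would UNDEF if the Tn bit were 0.
 */
static SketchOutcome sketch_hstr_priority(int el, bool hstr_bit_set,
                                          bool would_undef)
{
    assert(el == 0 || el == 1);

    if (el == 1) {
        /* At EL1 the HSTR_EL2 trap outranks every UNDEF reason */
        if (hstr_bit_set) {
            return SKETCH_TRAP_TO_EL2;
        }
        return would_undef ? SKETCH_UNDEF : SKETCH_ALLOW;
    }

    /* At EL0 any UNDEF-to-EL1 outranks the HSTR_EL2 trap */
    if (would_undef) {
        return SKETCH_UNDEF;
    }
    return hstr_bit_set ? SKETCH_TRAP_TO_EL2 : SKETCH_ALLOW;
}
```

The asymmetry between the two ELs is why the EL1 check can be done
early in generated code while the EL0 check has to stay after the
accessfn call.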

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230130182459.3309057-7-peter.maydell@linaro.org
Message-id: 20230127175507.2895013-7-peter.maydell@linaro.org
---
 target/arm/op_helper.c |  6 +++++-
 target/arm/translate.c | 28 +++++++++++++++++++++++++++-
 2 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/op_helper.c
+++ b/target/arm/op_helper.c
@@ -XXX,XX +XXX,XX @@ const void *HELPER(access_check_cp_reg)(CPUARMState *env, uint32_t key,
         goto fail;
     }
 
-    if (!is_a64(env) && arm_current_el(env) < 2 && ri->cp == 15 &&
+    /*
+     * HSTR_EL2 traps from EL1 are checked earlier, in generated code;
+     * we only need to check here for traps from EL0.
+     */
+    if (!is_a64(env) && arm_current_el(env) == 0 && ri->cp == 15 &&
         (arm_hcr_el2_eff(env) & (HCR_E2H | HCR_TGE)) != (HCR_E2H | HCR_TGE)) {
         uint32_t mask = 1 << ri->crn;
 
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void do_coproc_insn(DisasContext *s, int cpnum, int is64,
         break;
     }
 
+    if (s->hstr_active && cpnum == 15 && s->current_el == 1) {
+        /*
+         * At EL1, check for a HSTR_EL2 trap, which must take precedence
+         * over the UNDEF for "no such register" or the UNDEF for "access
+         * permissions forbid this EL1 access". HSTR_EL2 traps from EL0
+         * only happen if the cpreg doesn't UNDEF at EL0, so we do those in
+         * access_check_cp_reg(), after the checks for whether the access
+         * configurably trapped to EL1.
+         */
+        uint32_t maskbit = is64 ? crm : crn;
+
+        if (maskbit != 4 && maskbit != 14) {
+            /* T4 and T14 are RES0 so never cause traps */
+            TCGv_i32 t;
+            DisasLabel over = gen_disas_label(s);
+
+            t = load_cpu_offset(offsetoflow32(CPUARMState, cp15.hstr_el2));
+            tcg_gen_andi_i32(t, t, 1u << maskbit);
+            tcg_gen_brcondi_i32(TCG_COND_EQ, t, 0, over.label);
+            tcg_temp_free_i32(t);
+
+            gen_exception_insn(s, 0, EXCP_UDEF, syndrome);
+            set_disas_label(s, over);
+        }
+    }
+
     if (!ri) {
         /*
          * Unknown register; this might be a guest error or a QEMU
@@ -XXX,XX +XXX,XX @@ static void do_coproc_insn(DisasContext *s, int cpnum, int is64,
         return;
     }
 
-    if (s->hstr_active || ri->accessfn ||
+    if ((s->hstr_active && s->current_el == 0) || ri->accessfn ||
         (arm_dc_feature(s, ARM_FEATURE_XSCALE) && cpnum < 14)) {
         /*
          * Emit code to perform further access permissions checks at
-- 
2.34.1

The HSTR_EL2 register is not supposed to have an effect unless EL2 is
enabled in the current security state.  We weren't checking for this,
which meant that if the guest set up the HSTR_EL2 register we would
incorrectly trap even for accesses from Secure EL0 and EL1.

Add the missing checks. (Other places where we look at HSTR_EL2
for the not-in-v8A bits TTEE and TJDBX are already checking that
we are in NS EL0 or EL1, so there we already know EL2 is enabled.)
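
For illustration, the resulting condition can be written as a
standalone predicate. This is a sketch only: el2_enabled stands in for
the result of arm_is_el2_enabled(), hcr_el2_eff for the result of
arm_hcr_el2_eff(), and the sketch_/SKETCH_ names are hypothetical. The
bit positions used for HCR_EL2.TGE (27) and HCR_EL2.E2H (34) are the
architectural ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_HCR_TGE (1ULL << 27)
#define SKETCH_HCR_E2H (1ULL << 34)

/*
 * True if HSTR_EL2 traps need to be considered for an AArch32 access
 * at exception level 'el'.
 */
static bool sketch_hstr_active(int el, uint64_t hstr_el2, bool el2_enabled,
                               uint64_t hcr_el2_eff)
{
    const uint64_t host_mode = SKETCH_HCR_E2H | SKETCH_HCR_TGE;

    assert(el >= 0 && el <= 3);

    return el < 2 &&
           hstr_el2 != 0 &&
           el2_enabled &&   /* the check this patch adds */
           (hcr_el2_eff & host_mode) != host_mode;
}
```

Without the el2_enabled term, a nonzero HSTR_EL2 would be treated as
active even in a security state where EL2 is disabled.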

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static CPUARMTBFlags rebuild_hflags_a32(CPUARMState *env, int fp_el,
         DP_TBFLAG_A32(flags, VFPEN, 1);
     }
 
-    if (el < 2 && env->cp15.hstr_el2 &&
+    if (el < 2 && env->cp15.hstr_el2 && arm_is_el2_enabled(env) &&
         (arm_hcr_el2_eff(env) & (HCR_E2H | HCR_TGE)) != (HCR_E2H | HCR_TGE)) {
         DP_TBFLAG_A32(flags, HSTR_ACTIVE, 1);
     }
diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/op_helper.c
+++ b/target/arm/op_helper.c
@@ -XXX,XX +XXX,XX @@ const void *HELPER(access_check_cp_reg)(CPUARMState *env, uint32_t key,
      * we only need to check here for traps from EL0.
      */
     if (!is_a64(env) && arm_current_el(env) == 0 && ri->cp == 15 &&
+        arm_is_el2_enabled(env) &&
         (arm_hcr_el2_eff(env) & (HCR_E2H | HCR_TGE)) != (HCR_E2H | HCR_TGE)) {
         uint32_t mask = 1 << ri->crn;
 
-- 
2.34.1

Define the system registers which are provided by the
FEAT_FGT fine-grained trap architectural feature:
 HFGRTR_EL2, HFGWTR_EL2, HDFGRTR_EL2, HDFGWTR_EL2, HFGITR_EL2

All these registers are sets of single-bit fields, where each bit is
set to trap, and clear to not trap, a particular system register
access.  The R and W register pairs are for system registers,
allowing trapping to be done separately for reads and writes; the I
register is for system instructions, where trapping is on instruction
execution.

The data storage in the CPU state struct is arranged as a set of
arrays rather than separate fields so that when we're looking up the
bits for a system register access we can just index into the array
rather than having to use a switch to select a named struct member.
The later FEAT_FGT2 will add extra elements to these arrays.

The field definitions for the new registers are in cpregs.h because
in practice the code that needs them is code that also needs
the cpregs information; cpu.h is included in a lot more files.
We're also going to add some FGT-specific definitions to cpregs.h
in the next commit.

We do not implement HAFGRTR_EL2, because we don't implement
FEAT_AMUv1.
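
As a rough sketch of why the array layout is convenient, the lookup of
a read-trap bit reduces to an index plus a shift. The struct below
mirrors only the three arrays added to the CPU state by this patch;
SketchFGTState and sketch_fgt_read_bit() are hypothetical names used
for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Indexes into fgt_read[], matching the values in the patch */
enum {
    SKETCH_FGTREG_HFGRTR = 0,
    SKETCH_FGTREG_HDFGRTR = 1,
};

/* Cut-down stand-in for the relevant part of the CPU state */
typedef struct SketchFGTState {
    uint64_t fgt_read[2];   /* HFGRTR, HDFGRTR */
    uint64_t fgt_write[2];  /* HFGWTR, HDFGWTR */
    uint64_t fgt_exec[1];   /* HFGITR */
} SketchFGTState;

/* Fetch one read-trap bit by array index; no switch on register name. */
static bool sketch_fgt_read_bit(const SketchFGTState *st, unsigned idx,
                                unsigned bitpos)
{
    assert(idx < 2 && bitpos < 64);
    return (st->fgt_read[idx] >> bitpos) & 1;
}
```

Adding further registers later would only mean growing the arrays; the
lookup itself stays the same.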

diff --git a/target/arm/cpregs.h b/target/arm/cpregs.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpregs.h
+++ b/target/arm/cpregs.h
@@ -XXX,XX +XXX,XX @@ typedef enum CPAccessResult {
     CP_ACCESS_TRAP_UNCATEGORIZED = (2 << 2),
 } CPAccessResult;
 
+/* Indexes into fgt_read[] */
+#define FGTREG_HFGRTR 0
+#define FGTREG_HDFGRTR 1
+/* Indexes into fgt_write[] */
+#define FGTREG_HFGWTR 0
+#define FGTREG_HDFGWTR 1
+/* Indexes into fgt_exec[] */
+#define FGTREG_HFGITR 0
+
+FIELD(HFGRTR_EL2, AFSR0_EL1, 0, 1)
+FIELD(HFGRTR_EL2, AFSR1_EL1, 1, 1)
+FIELD(HFGRTR_EL2, AIDR_EL1, 2, 1)
+FIELD(HFGRTR_EL2, AMAIR_EL1, 3, 1)
+FIELD(HFGRTR_EL2, APDAKEY, 4, 1)
+FIELD(HFGRTR_EL2, APDBKEY, 5, 1)
+FIELD(HFGRTR_EL2, APGAKEY, 6, 1)
+FIELD(HFGRTR_EL2, APIAKEY, 7, 1)
+FIELD(HFGRTR_EL2, APIBKEY, 8, 1)
+FIELD(HFGRTR_EL2, CCSIDR_EL1, 9, 1)
+FIELD(HFGRTR_EL2, CLIDR_EL1, 10, 1)
+FIELD(HFGRTR_EL2, CONTEXTIDR_EL1, 11, 1)
+FIELD(HFGRTR_EL2, CPACR_EL1, 12, 1)
+FIELD(HFGRTR_EL2, CSSELR_EL1, 13, 1)
+FIELD(HFGRTR_EL2, CTR_EL0, 14, 1)
+FIELD(HFGRTR_EL2, DCZID_EL0, 15, 1)
+FIELD(HFGRTR_EL2, ESR_EL1, 16, 1)
+FIELD(HFGRTR_EL2, FAR_EL1, 17, 1)
+FIELD(HFGRTR_EL2, ISR_EL1, 18, 1)
+FIELD(HFGRTR_EL2, LORC_EL1, 19, 1)
+FIELD(HFGRTR_EL2, LOREA_EL1, 20, 1)
+FIELD(HFGRTR_EL2, LORID_EL1, 21, 1)
+FIELD(HFGRTR_EL2, LORN_EL1, 22, 1)
+FIELD(HFGRTR_EL2, LORSA_EL1, 23, 1)
+FIELD(HFGRTR_EL2, MAIR_EL1, 24, 1)
+FIELD(HFGRTR_EL2, MIDR_EL1, 25, 1)
+FIELD(HFGRTR_EL2, MPIDR_EL1, 26, 1)
+FIELD(HFGRTR_EL2, PAR_EL1, 27, 1)
+FIELD(HFGRTR_EL2, REVIDR_EL1, 28, 1)
+FIELD(HFGRTR_EL2, SCTLR_EL1, 29, 1)
+FIELD(HFGRTR_EL2, SCXTNUM_EL1, 30, 1)
+FIELD(HFGRTR_EL2, SCXTNUM_EL0, 31, 1)
+FIELD(HFGRTR_EL2, TCR_EL1, 32, 1)
+FIELD(HFGRTR_EL2, TPIDR_EL1, 33, 1)
+FIELD(HFGRTR_EL2, TPIDRRO_EL0, 34, 1)
+FIELD(HFGRTR_EL2, TPIDR_EL0, 35, 1)
+FIELD(HFGRTR_EL2, TTBR0_EL1, 36, 1)
+FIELD(HFGRTR_EL2, TTBR1_EL1, 37, 1)
+FIELD(HFGRTR_EL2, VBAR_EL1, 38, 1)
+FIELD(HFGRTR_EL2, ICC_IGRPENN_EL1, 39, 1)
+FIELD(HFGRTR_EL2, ERRIDR_EL1, 40, 1)
+FIELD(HFGRTR_EL2, ERRSELR_EL1, 41, 1)
+FIELD(HFGRTR_EL2, ERXFR_EL1, 42, 1)
+FIELD(HFGRTR_EL2, ERXCTLR_EL1, 43, 1)
+FIELD(HFGRTR_EL2, ERXSTATUS_EL1, 44, 1)
+FIELD(HFGRTR_EL2, ERXMISCN_EL1, 45, 1)
+FIELD(HFGRTR_EL2, ERXPFGF_EL1, 46, 1)
+FIELD(HFGRTR_EL2, ERXPFGCTL_EL1, 47, 1)
+FIELD(HFGRTR_EL2, ERXPFGCDN_EL1, 48, 1)
+FIELD(HFGRTR_EL2, ERXADDR_EL1, 49, 1)
+FIELD(HFGRTR_EL2, NACCDATA_EL1, 50, 1)
+/* 51-53: RES0 */
+FIELD(HFGRTR_EL2, NSMPRI_EL1, 54, 1)
+FIELD(HFGRTR_EL2, NTPIDR2_EL0, 55, 1)
+/* 56-63: RES0 */
+
+/* These match HFGRTR but bits for RO registers are RES0 */
+FIELD(HFGWTR_EL2, AFSR0_EL1, 0, 1)
+FIELD(HFGWTR_EL2, AFSR1_EL1, 1, 1)
+FIELD(HFGWTR_EL2, AMAIR_EL1, 3, 1)
+FIELD(HFGWTR_EL2, APDAKEY, 4, 1)
+FIELD(HFGWTR_EL2, APDBKEY, 5, 1)
+FIELD(HFGWTR_EL2, APGAKEY, 6, 1)
+FIELD(HFGWTR_EL2, APIAKEY, 7, 1)
+FIELD(HFGWTR_EL2, APIBKEY, 8, 1)
+FIELD(HFGWTR_EL2, CONTEXTIDR_EL1, 11, 1)
+FIELD(HFGWTR_EL2, CPACR_EL1, 12, 1)
+FIELD(HFGWTR_EL2, CSSELR_EL1, 13, 1)
+FIELD(HFGWTR_EL2, ESR_EL1, 16, 1)
+FIELD(HFGWTR_EL2, FAR_EL1, 17, 1)
+FIELD(HFGWTR_EL2, LORC_EL1, 19, 1)
+FIELD(HFGWTR_EL2, LOREA_EL1, 20, 1)
+FIELD(HFGWTR_EL2, LORN_EL1, 22, 1)
+FIELD(HFGWTR_EL2, LORSA_EL1, 23, 1)
+FIELD(HFGWTR_EL2, MAIR_EL1, 24, 1)
+FIELD(HFGWTR_EL2, PAR_EL1, 27, 1)
+FIELD(HFGWTR_EL2, SCTLR_EL1, 29, 1)
+FIELD(HFGWTR_EL2, SCXTNUM_EL1, 30, 1)
+FIELD(HFGWTR_EL2, SCXTNUM_EL0, 31, 1)
+FIELD(HFGWTR_EL2, TCR_EL1, 32, 1)
+FIELD(HFGWTR_EL2, TPIDR_EL1, 33, 1)
+FIELD(HFGWTR_EL2, TPIDRRO_EL0, 34, 1)
+FIELD(HFGWTR_EL2, TPIDR_EL0, 35, 1)
+FIELD(HFGWTR_EL2, TTBR0_EL1, 36, 1)
+FIELD(HFGWTR_EL2, TTBR1_EL1, 37, 1)
+FIELD(HFGWTR_EL2, VBAR_EL1, 38, 1)
+FIELD(HFGWTR_EL2, ICC_IGRPENN_EL1, 39, 1)
+FIELD(HFGWTR_EL2, ERRSELR_EL1, 41, 1)
+FIELD(HFGWTR_EL2, ERXCTLR_EL1, 43, 1)
+FIELD(HFGWTR_EL2, ERXSTATUS_EL1, 44, 1)
+FIELD(HFGWTR_EL2, ERXMISCN_EL1, 45, 1)
+FIELD(HFGWTR_EL2, ERXPFGCTL_EL1, 47, 1)
+FIELD(HFGWTR_EL2, ERXPFGCDN_EL1, 48, 1)
+FIELD(HFGWTR_EL2, ERXADDR_EL1, 49, 1)
+FIELD(HFGWTR_EL2, NACCDATA_EL1, 50, 1)
+FIELD(HFGWTR_EL2, NSMPRI_EL1, 54, 1)
+FIELD(HFGWTR_EL2, NTPIDR2_EL0, 55, 1)
+
+FIELD(HFGITR_EL2, ICIALLUIS, 0, 1)
+FIELD(HFGITR_EL2, ICIALLU, 1, 1)
+FIELD(HFGITR_EL2, ICIVAU, 2, 1)
+FIELD(HFGITR_EL2, DCIVAC, 3, 1)
+FIELD(HFGITR_EL2, DCISW, 4, 1)
+FIELD(HFGITR_EL2, DCCSW, 5, 1)
+FIELD(HFGITR_EL2, DCCISW, 6, 1)
+FIELD(HFGITR_EL2, DCCVAU, 7, 1)
+FIELD(HFGITR_EL2, DCCVAP, 8, 1)
+FIELD(HFGITR_EL2, DCCVADP, 9, 1)
+FIELD(HFGITR_EL2, DCCIVAC, 10, 1)
+FIELD(HFGITR_EL2, DCZVA, 11, 1)
+FIELD(HFGITR_EL2, ATS1E1R, 12, 1)
+FIELD(HFGITR_EL2, ATS1E1W, 13, 1)
+FIELD(HFGITR_EL2, ATS1E0R, 14, 1)
+FIELD(HFGITR_EL2, ATS1E0W, 15, 1)
+FIELD(HFGITR_EL2, ATS1E1RP, 16, 1)
+FIELD(HFGITR_EL2, ATS1E1WP, 17, 1)
+FIELD(HFGITR_EL2, TLBIVMALLE1OS, 18, 1)
+FIELD(HFGITR_EL2, TLBIVAE1OS, 19, 1)
+FIELD(HFGITR_EL2, TLBIASIDE1OS, 20, 1)
+FIELD(HFGITR_EL2, TLBIVAAE1OS, 21, 1)
+FIELD(HFGITR_EL2, TLBIVALE1OS, 22, 1)
+FIELD(HFGITR_EL2, TLBIVAALE1OS, 23, 1)
+FIELD(HFGITR_EL2, TLBIRVAE1OS, 24, 1)
+FIELD(HFGITR_EL2, TLBIRVAAE1OS, 25, 1)
+FIELD(HFGITR_EL2, TLBIRVALE1OS, 26, 1)
+FIELD(HFGITR_EL2, TLBIRVAALE1OS, 27, 1)
+FIELD(HFGITR_EL2, TLBIVMALLE1IS, 28, 1)
+FIELD(HFGITR_EL2, TLBIVAE1IS, 29, 1)
+FIELD(HFGITR_EL2, TLBIASIDE1IS, 30, 1)
+FIELD(HFGITR_EL2, TLBIVAAE1IS, 31, 1)
+FIELD(HFGITR_EL2, TLBIVALE1IS, 32, 1)
+FIELD(HFGITR_EL2, TLBIVAALE1IS, 33, 1)
+FIELD(HFGITR_EL2, TLBIRVAE1IS, 34, 1)
+FIELD(HFGITR_EL2, TLBIRVAAE1IS, 35, 1)
+FIELD(HFGITR_EL2, TLBIRVALE1IS, 36, 1)
+FIELD(HFGITR_EL2, TLBIRVAALE1IS, 37, 1)
+FIELD(HFGITR_EL2, TLBIRVAE1, 38, 1)
+FIELD(HFGITR_EL2, TLBIRVAAE1, 39, 1)
+FIELD(HFGITR_EL2, TLBIRVALE1, 40, 1)
+FIELD(HFGITR_EL2, TLBIRVAALE1, 41, 1)
+FIELD(HFGITR_EL2, TLBIVMALLE1, 42, 1)
+FIELD(HFGITR_EL2, TLBIVAE1, 43, 1)
+FIELD(HFGITR_EL2, TLBIASIDE1, 44, 1)
+FIELD(HFGITR_EL2, TLBIVAAE1, 45, 1)
+FIELD(HFGITR_EL2, TLBIVALE1, 46, 1)
+FIELD(HFGITR_EL2, TLBIVAALE1, 47, 1)
+FIELD(HFGITR_EL2, CFPRCTX, 48, 1)
+FIELD(HFGITR_EL2, DVPRCTX, 49, 1)
+FIELD(HFGITR_EL2, CPPRCTX, 50, 1)
+FIELD(HFGITR_EL2, ERET, 51, 1)
+FIELD(HFGITR_EL2, SVC_EL0, 52, 1)
+FIELD(HFGITR_EL2, SVC_EL1, 53, 1)
+FIELD(HFGITR_EL2, DCCVAC, 54, 1)
+FIELD(HFGITR_EL2, NBRBINJ, 55, 1)
+FIELD(HFGITR_EL2, NBRBIALL, 56, 1)
+
+FIELD(HDFGRTR_EL2, DBGBCRN_EL1, 0, 1)
+FIELD(HDFGRTR_EL2, DBGBVRN_EL1, 1, 1)
+FIELD(HDFGRTR_EL2, DBGWCRN_EL1, 2, 1)
+FIELD(HDFGRTR_EL2, DBGWVRN_EL1, 3, 1)
+FIELD(HDFGRTR_EL2, MDSCR_EL1, 4, 1)
+FIELD(HDFGRTR_EL2, DBGCLAIM, 5, 1)
+FIELD(HDFGRTR_EL2, DBGAUTHSTATUS_EL1, 6, 1)
+FIELD(HDFGRTR_EL2, DBGPRCR_EL1, 7, 1)
+/* 8: RES0: OSLAR_EL1 is WO */
+FIELD(HDFGRTR_EL2, OSLSR_EL1, 9, 1)
+FIELD(HDFGRTR_EL2, OSECCR_EL1, 10, 1)
+FIELD(HDFGRTR_EL2, OSDLR_EL1, 11, 1)
+FIELD(HDFGRTR_EL2, PMEVCNTRN_EL0, 12, 1)
+FIELD(HDFGRTR_EL2, PMEVTYPERN_EL0, 13, 1)
+FIELD(HDFGRTR_EL2, PMCCFILTR_EL0, 14, 1)
+FIELD(HDFGRTR_EL2, PMCCNTR_EL0, 15, 1)
+FIELD(HDFGRTR_EL2, PMCNTEN, 16, 1)
+FIELD(HDFGRTR_EL2, PMINTEN, 17, 1)
+FIELD(HDFGRTR_EL2, PMOVS, 18, 1)
+FIELD(HDFGRTR_EL2, PMSELR_EL0, 19, 1)
+/* 20: RES0: PMSWINC_EL0 is WO */
+/* 21: RES0: PMCR_EL0 is WO */
+FIELD(HDFGRTR_EL2, PMMIR_EL1, 22, 1)
+FIELD(HDFGRTR_EL2, PMBLIMITR_EL1, 23, 1)
+FIELD(HDFGRTR_EL2, PMBPTR_EL1, 24, 1)
+FIELD(HDFGRTR_EL2, PMBSR_EL1, 25, 1)
+FIELD(HDFGRTR_EL2, PMSCR_EL1, 26, 1)
+FIELD(HDFGRTR_EL2, PMSEVFR_EL1, 27, 1)
+FIELD(HDFGRTR_EL2, PMSFCR_EL1, 28, 1)
+FIELD(HDFGRTR_EL2, PMSICR_EL1, 29, 1)
+FIELD(HDFGRTR_EL2, PMSIDR_EL1, 30, 1)
+FIELD(HDFGRTR_EL2, PMSIRR_EL1, 31, 1)
+FIELD(HDFGRTR_EL2, PMSLATFR_EL1, 32, 1)
+FIELD(HDFGRTR_EL2, TRC, 33, 1)
+FIELD(HDFGRTR_EL2, TRCAUTHSTATUS, 34, 1)
+FIELD(HDFGRTR_EL2, TRCAUXCTLR, 35, 1)
+FIELD(HDFGRTR_EL2, TRCCLAIM, 36, 1)
+FIELD(HDFGRTR_EL2, TRCCNTVRn, 37, 1)
+/* 38, 39: RES0 */
+FIELD(HDFGRTR_EL2, TRCID, 40, 1)
+FIELD(HDFGRTR_EL2, TRCIMSPECN, 41, 1)
+/* 42: RES0: TRCOSLAR is WO */
+FIELD(HDFGRTR_EL2, TRCOSLSR, 43, 1)
+FIELD(HDFGRTR_EL2, TRCPRGCTLR, 44, 1)
+FIELD(HDFGRTR_EL2, TRCSEQSTR, 45, 1)
+FIELD(HDFGRTR_EL2, TRCSSCSRN, 46, 1)
+FIELD(HDFGRTR_EL2, TRCSTATR, 47, 1)
+FIELD(HDFGRTR_EL2, TRCVICTLR, 48, 1)
+/* 49: RES0: TRFCR_EL1 is WO */
+FIELD(HDFGRTR_EL2, TRBBASER_EL1, 50, 1)
+FIELD(HDFGRTR_EL2, TRBIDR_EL1, 51, 1)
+FIELD(HDFGRTR_EL2, TRBLIMITR_EL1, 52, 1)
+FIELD(HDFGRTR_EL2, TRBMAR_EL1, 53, 1)
+FIELD(HDFGRTR_EL2, TRBPTR_EL1, 54, 1)
+FIELD(HDFGRTR_EL2, TRBSR_EL1, 55, 1)
+FIELD(HDFGRTR_EL2, TRBTRG_EL1, 56, 1)
+FIELD(HDFGRTR_EL2, PMUSERENR_EL0, 57, 1)
+FIELD(HDFGRTR_EL2, PMCEIDN_EL0, 58, 1)
+FIELD(HDFGRTR_EL2, NBRBIDR, 59, 1)
+FIELD(HDFGRTR_EL2, NBRBCTL, 60, 1)
+FIELD(HDFGRTR_EL2, NBRBDATA, 61, 1)
+FIELD(HDFGRTR_EL2, NPMSNEVFR_EL1, 62, 1)
+FIELD(HDFGRTR_EL2, PMBIDR_EL1, 63, 1)
+
+/*
+ * These match HDFGRTR_EL2, but bits for RO registers are RES0.
+ * A few bits are for WO registers, where the HDFGRTR_EL2 bit is RES0.
+ */
+FIELD(HDFGWTR_EL2, DBGBCRN_EL1, 0, 1)
+FIELD(HDFGWTR_EL2, DBGBVRN_EL1, 1, 1)
+FIELD(HDFGWTR_EL2, DBGWCRN_EL1, 2, 1)
+FIELD(HDFGWTR_EL2, DBGWVRN_EL1, 3, 1)
+FIELD(HDFGWTR_EL2, MDSCR_EL1, 4, 1)
+FIELD(HDFGWTR_EL2, DBGCLAIM, 5, 1)
+FIELD(HDFGWTR_EL2, DBGPRCR_EL1, 7, 1)
+FIELD(HDFGWTR_EL2, OSLAR_EL1, 8, 1)
+FIELD(HDFGWTR_EL2, OSLSR_EL1, 9, 1)
+FIELD(HDFGWTR_EL2, OSECCR_EL1, 10, 1)
+FIELD(HDFGWTR_EL2, OSDLR_EL1, 11, 1)
+FIELD(HDFGWTR_EL2, PMEVCNTRN_EL0, 12, 1)
+FIELD(HDFGWTR_EL2, PMEVTYPERN_EL0, 13, 1)
+FIELD(HDFGWTR_EL2, PMCCFILTR_EL0, 14, 1)
+FIELD(HDFGWTR_EL2, PMCCNTR_EL0, 15, 1)
+FIELD(HDFGWTR_EL2, PMCNTEN, 16, 1)
+FIELD(HDFGWTR_EL2, PMINTEN, 17, 1)
+FIELD(HDFGWTR_EL2, PMOVS, 18, 1)
+FIELD(HDFGWTR_EL2, PMSELR_EL0, 19, 1)
+FIELD(HDFGWTR_EL2, PMSWINC_EL0, 20, 1)
+FIELD(HDFGWTR_EL2, PMCR_EL0, 21, 1)
+FIELD(HDFGWTR_EL2, PMBLIMITR_EL1, 23, 1)
+FIELD(HDFGWTR_EL2, PMBPTR_EL1, 24, 1)
+FIELD(HDFGWTR_EL2, PMBSR_EL1, 25, 1)
+FIELD(HDFGWTR_EL2, PMSCR_EL1, 26, 1)
+FIELD(HDFGWTR_EL2, PMSEVFR_EL1, 27, 1)
+FIELD(HDFGWTR_EL2, PMSFCR_EL1, 28, 1)
+FIELD(HDFGWTR_EL2, PMSICR_EL1, 29, 1)
+FIELD(HDFGWTR_EL2, PMSIRR_EL1, 31, 1)
+FIELD(HDFGWTR_EL2, PMSLATFR_EL1, 32, 1)
+FIELD(HDFGWTR_EL2, TRC, 33, 1)
+FIELD(HDFGWTR_EL2, TRCAUXCTLR, 35, 1)
+FIELD(HDFGWTR_EL2, TRCCLAIM, 36, 1)
+FIELD(HDFGWTR_EL2, TRCCNTVRn, 37, 1)
+FIELD(HDFGWTR_EL2, TRCIMSPECN, 41, 1)
+FIELD(HDFGWTR_EL2, TRCOSLAR, 42, 1)
+FIELD(HDFGWTR_EL2, TRCPRGCTLR, 44, 1)
+FIELD(HDFGWTR_EL2, TRCSEQSTR, 45, 1)
+FIELD(HDFGWTR_EL2, TRCSSCSRN, 46, 1)
+FIELD(HDFGWTR_EL2, TRCVICTLR, 48, 1)
+FIELD(HDFGWTR_EL2, TRFCR_EL1, 49, 1)
+FIELD(HDFGWTR_EL2, TRBBASER_EL1, 50, 1)
+FIELD(HDFGWTR_EL2, TRBLIMITR_EL1, 52, 1)
+FIELD(HDFGWTR_EL2, TRBMAR_EL1, 53, 1)
+FIELD(HDFGWTR_EL2, TRBPTR_EL1, 54, 1)
+FIELD(HDFGWTR_EL2, TRBSR_EL1, 55, 1)
+FIELD(HDFGWTR_EL2, TRBTRG_EL1, 56, 1)
+FIELD(HDFGWTR_EL2, PMUSERENR_EL0, 57, 1)
+FIELD(HDFGWTR_EL2, NBRBCTL, 60, 1)
+FIELD(HDFGWTR_EL2, NBRBDATA, 61, 1)
+FIELD(HDFGWTR_EL2, NPMSNEVFR_EL1, 62, 1)
+
 typedef struct ARMCPRegInfo ARMCPRegInfo;
 
 /*
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
         uint64_t disr_el1;
         uint64_t vdisr_el2;
         uint64_t vsesr_el2;
+
+        /*
+         * Fine-Grained Trap registers. We store these as arrays so the
+         * access checking code doesn't have to manually select
+         * HFGRTR_EL2 vs HDFGRTR_EL2 etc when looking up the bit to test.
+         * FEAT_FGT2 will add more elements to these arrays.
+         */
+        uint64_t fgt_read[2]; /* HFGRTR, HDFGRTR */
+        uint64_t fgt_write[2]; /* HFGWTR, HDFGWTR */
+        uint64_t fgt_exec[1]; /* HFGITR */
     } cp15;
 
     struct {
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_tgran64_2(const ARMISARegisters *id)
     return t >= 2 || (t == 0 && isar_feature_aa64_tgran64(id));
 }
 
+static inline bool isar_feature_aa64_fgt(const ARMISARegisters *id)
+{
+    return FIELD_EX64(id->id_aa64mmfr0, ID_AA64MMFR0, FGT) != 0;
+}
+
 static inline bool isar_feature_aa64_ccidx(const ARMISARegisters *id)
 {
     return FIELD_EX64(id->id_aa64mmfr2, ID_AA64MMFR2, CCIDX) != 0;
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void scr_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
         if (cpu_isar_feature(aa64_hcx, cpu)) {
             valid_mask |= SCR_HXEN;
         }
+        if (cpu_isar_feature(aa64_fgt, cpu)) {
+            valid_mask |= SCR_FGTEN;
+        }
     } else {
         valid_mask &= ~(SCR_RW | SCR_ST);
         if (cpu_isar_feature(aa32_ras, cpu)) {
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo scxtnum_reginfo[] = {
       .access = PL3_RW,
       .fieldoffset = offsetof(CPUARMState, scxtnum_el[3]) },
 };
+
+static CPAccessResult access_fgt(CPUARMState *env, const ARMCPRegInfo *ri,
+                                 bool isread)
+{
+    if (arm_current_el(env) == 2 &&
+        arm_feature(env, ARM_FEATURE_EL3) && !(env->cp15.scr_el3 & SCR_FGTEN)) {
+        return CP_ACCESS_TRAP_EL3;
+    }
+    return CP_ACCESS_OK;
+}
+
+static const ARMCPRegInfo fgt_reginfo[] = {
+    { .name = "HFGRTR_EL2", .state = ARM_CP_STATE_AA64,
+      .opc0 = 3, .opc1 = 4, .crn = 1, .crm = 1, .opc2 = 4,
+      .access = PL2_RW, .accessfn = access_fgt,
+      .fieldoffset = offsetof(CPUARMState, cp15.fgt_read[FGTREG_HFGRTR]) },
+    { .name = "HFGWTR_EL2", .state = ARM_CP_STATE_AA64,
+      .opc0 = 3, .opc1 = 4, .crn = 1, .crm = 1, .opc2 = 5,
+      .access = PL2_RW, .accessfn = access_fgt,
+      .fieldoffset = offsetof(CPUARMState, cp15.fgt_write[FGTREG_HFGWTR]) },
+    { .name = "HDFGRTR_EL2", .state = ARM_CP_STATE_AA64,
+      .opc0 = 3, .opc1 = 4, .crn = 3, .crm = 1, .opc2 = 4,
+      .access = PL2_RW, .accessfn = access_fgt,
+      .fieldoffset = offsetof(CPUARMState, cp15.fgt_read[FGTREG_HDFGRTR]) },
+    { .name = "HDFGWTR_EL2", .state = ARM_CP_STATE_AA64,
+      .opc0 = 3, .opc1 = 4, .crn = 3, .crm = 1, .opc2 = 5,
+      .access = PL2_RW, .accessfn = access_fgt,
+      .fieldoffset = offsetof(CPUARMState, cp15.fgt_write[FGTREG_HDFGWTR]) },
+    { .name = "HFGITR_EL2", .state = ARM_CP_STATE_AA64,
+      .opc0 = 3, .opc1 = 4, .crn = 1, .crm = 1, .opc2 = 6,
+      .access = PL2_RW, .accessfn = access_fgt,
+      .fieldoffset = offsetof(CPUARMState, cp15.fgt_exec[FGTREG_HFGITR]) },
+};
 #endif /* TARGET_AARCH64 */
 
 static CPAccessResult access_predinv(CPUARMState *env, const ARMCPRegInfo *ri,
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
     if (cpu_isar_feature(aa64_scxtnum, cpu)) {
         define_arm_cp_regs(cpu, scxtnum_reginfo);
     }
+
+    if (cpu_isar_feature(aa64_fgt, cpu)) {
+        define_arm_cp_regs(cpu, fgt_reginfo);
+    }
 #endif
 
     if (cpu_isar_feature(any_predinv, cpu)) {
-- 
2.34.1

Implement the machinery for fine-grained traps on normal sysregs.
Any sysreg with a fine-grained trap will set the new field to
indicate which FGT register bit it should trap on.

FGT traps only happen when an AArch64 EL2 enables them for
an AArch64 EL1. They therefore are only relevant for AArch32
cpregs when the cpreg can be accessed from EL0. The logic
in access_check_cp_reg() will check this, so it is safe to
add a .fgt marking to an ARM_CP_STATE_BOTH ARMCPRegInfo.

The DO_BIT and DO_REV_BIT macros define enum constants FGT_##bitname
which can be used to specify the FGT bit, eg
   .fgt = FGT_AFSR0_EL1
(We assume that there is no bit name duplication across the FGT
registers, for brevity's sake.)
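
To make the token pasting concrete, here is a minimal standalone
sketch of the same idea. Everything prefixed SK_ or Sk is invented
for illustration, including the register name REGA, the bit names
FOO and NBAR and their positions; only the field layout (bit
position in bits [5:0], array index in [8:6], reversed-sense flag in
bit 9, access type in [12:10]) follows this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field positions, following the FIELD(FGT, ...) layout in this patch. */
#define SK_FGT_BITPOS_SHIFT 0   /* 6 bits */
#define SK_FGT_IDX_SHIFT    6   /* 3 bits */
#define SK_FGT_REV_SHIFT    9   /* 1 bit  */
#define SK_FGT_TYPE_SHIFT   10  /* 3 bits */

/* Made-up bit positions within one made-up trap register. */
#define SK_R_REGA_FOO_SHIFT  3
#define SK_R_REGA_NBAR_SHIFT 40

static_assert(SK_R_REGA_NBAR_SHIFT < (1 << 6),
              "bit position must fit the 6-bit BITPOS field");

/* Paste register and bit names together to build one enum constant. */
#define SK_DO_BIT(REG, BITNAME) \
    SK_FGT_##BITNAME = SK_FGT_##REG | SK_R_##REG##_##BITNAME##_SHIFT

/* Same, but also set the reversed-sense flag. */
#define SK_DO_REV_BIT(REG, BITNAME) \
    SK_FGT_##BITNAME = SK_FGT_##REG | SK_FGT_REV | \
                       SK_R_##REG##_##BITNAME##_SHIFT

typedef enum SkFgtBit {
    SK_FGT_R   = 1 << SK_FGT_TYPE_SHIFT,
    SK_FGT_W   = 2 << SK_FGT_TYPE_SHIFT,
    SK_FGT_RW  = SK_FGT_R | SK_FGT_W,
    SK_FGT_REV = 1 << SK_FGT_REV_SHIFT,

    /* A made-up register at array index 1, checked on reads and writes. */
    SK_FGT_REGA = SK_FGT_RW | (1 << SK_FGT_IDX_SHIFT),

    SK_DO_BIT(REGA, FOO),       /* defines SK_FGT_FOO  */
    SK_DO_REV_BIT(REGA, NBAR),  /* defines SK_FGT_NBAR */
} SkFgtBit;

#undef SK_DO_BIT
#undef SK_DO_REV_BIT

/* Decoders for the packed value. */
static inline unsigned sk_fgt_idx(SkFgtBit v)
{
    return ((unsigned)v >> SK_FGT_IDX_SHIFT) & 0x7;
}

static inline unsigned sk_fgt_bitpos(SkFgtBit v)
{
    return ((unsigned)v >> SK_FGT_BITPOS_SHIFT) & 0x3f;
}

static inline bool sk_fgt_is_rev(SkFgtBit v)
{
    return ((unsigned)v >> SK_FGT_REV_SHIFT) & 1;
}
```

A register definition would then carry a single constant such as
SK_FGT_FOO, and the consumer recovers the array index, bit position
and sense with the decoders.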

Subsequent commits will add the .fgt fields to the relevant register
definitions and define the FGT_nnn values for them.

Note that some of the FGT traps are for instructions that we don't
handle via the cpregs mechanisms (mostly these are instruction traps).
Those we will have to handle separately.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Fuad Tabba <tabba@google.com>
Message-id: 20230130182459.3309057-10-peter.maydell@linaro.org
Message-id: 20230127175507.2895013-10-peter.maydell@linaro.org
---
 target/arm/cpregs.h        | 72 ++++++++++++++++++++++++++++++++++++++
 target/arm/cpu.h           |  1 +
 target/arm/internals.h     | 20 +++++++++++
 target/arm/translate.h     |  2 ++
 target/arm/helper.c        |  9 +++++
 target/arm/op_helper.c     | 30 ++++++++++++++++
 target/arm/translate-a64.c |  3 +-
 target/arm/translate.c     |  2 ++
 8 files changed, 138 insertions(+), 1 deletion(-)

diff --git a/target/arm/cpregs.h b/target/arm/cpregs.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpregs.h
+++ b/target/arm/cpregs.h
@@ -XXX,XX +XXX,XX @@ FIELD(HDFGWTR_EL2, NBRBCTL, 60, 1)
 FIELD(HDFGWTR_EL2, NBRBDATA, 61, 1)
 FIELD(HDFGWTR_EL2, NPMSNEVFR_EL1, 62, 1)
 
+/* Which fine-grained trap bit register to check, if any */
+FIELD(FGT, TYPE, 10, 3)
+FIELD(FGT, REV, 9, 1) /* Is bit sense reversed? */
+FIELD(FGT, IDX, 6, 3) /* Index within a uint64_t[] array */
+FIELD(FGT, BITPOS, 0, 6) /* Bit position within the uint64_t */
+
+/*
+ * Macros to define FGT_##bitname enum constants to use in ARMCPRegInfo::fgt
+ * fields. We assume for brevity's sake that there are no duplicated
+ * bit names across the various FGT registers.
+ */
+#define DO_BIT(REG, BITNAME)                                    \
+    FGT_##BITNAME = FGT_##REG | R_##REG##_EL2_##BITNAME##_SHIFT
+
+/* Some bits have reversed sense, so 0 means trap and 1 means not */
+#define DO_REV_BIT(REG, BITNAME)                                        \
+    FGT_##BITNAME = FGT_##REG | FGT_REV | R_##REG##_EL2_##BITNAME##_SHIFT
+
+typedef enum FGTBit {
+    /*
+     * These bits tell us which register arrays to use:
+     * if FGT_R is set then reads are checked against fgt_read[];
+     * if FGT_W is set then writes are checked against fgt_write[];
+     * if FGT_EXEC is set then all accesses are checked against fgt_exec[].
+     *
+     * For almost all bits in the R/W register pairs, the bit exists in
+     * both registers for a RW register, in HFGRTR/HDFGRTR for a RO register
+     * with the corresponding HFGWTR/HDFGWTR bit being RES0, and vice-versa
+     * for a WO register. There are unfortunately a couple of exceptions
+     * (PMCR_EL0, TRFCR_EL1) where the register being trapped is RW but
+     * the FGT system only allows trapping of writes, not reads.
+     *
+     * Note that we arrange these bits so that a 0 FGTBit means "no trap".
+     */
+    FGT_R = 1 << R_FGT_TYPE_SHIFT,
+    FGT_W = 2 << R_FGT_TYPE_SHIFT,
+    FGT_EXEC = 4 << R_FGT_TYPE_SHIFT,
+    FGT_RW = FGT_R | FGT_W,
+    /* Bit to identify whether trap bit is reversed sense */
+    FGT_REV = R_FGT_REV_MASK,
+
+    /*
+     * If a bit exists in HFGRTR/HDFGRTR then either the register being
+     * trapped is RO or the bit also exists in HFGWTR/HDFGWTR, so we either
+     * want to trap for both reads and writes or else it's harmless to mark
+     * it as trap-on-writes.
+     * If a bit exists only in HFGWTR/HDFGWTR then either the register being
+     * trapped is WO, or else it is one of the two oddball special cases
+     * which are RW but have only a write trap. We mark these as only
+     * FGT_W so we get the right behaviour for those special cases.
+     * (If a bit was added in future that provided only a read trap for an
+     * RW register we'd need to do something special to get the FGT_R bit
+     * only. But this seems unlikely to happen.)
+     *
+     * So for the DO_BIT/DO_REV_BIT macros: use FGT_HFGRTR/FGT_HDFGRTR if
+     * the bit exists in that register. Otherwise use FGT_HFGWTR/FGT_HDFGWTR.
+     */
+    FGT_HFGRTR = FGT_RW | (FGTREG_HFGRTR << R_FGT_IDX_SHIFT),
+    FGT_HFGWTR = FGT_W | (FGTREG_HFGWTR << R_FGT_IDX_SHIFT),
+    FGT_HDFGRTR = FGT_RW | (FGTREG_HDFGRTR << R_FGT_IDX_SHIFT),
+    FGT_HDFGWTR = FGT_W | (FGTREG_HDFGWTR << R_FGT_IDX_SHIFT),
+    FGT_HFGITR = FGT_EXEC | (FGTREG_HFGITR << R_FGT_IDX_SHIFT),
+} FGTBit;
+
+#undef DO_BIT
+#undef DO_REV_BIT
+
 typedef struct ARMCPRegInfo ARMCPRegInfo;
 
 /*
@@ -XXX,XX +XXX,XX @@ struct ARMCPRegInfo {
     CPAccessRights access;
     /* Security state: ARM_CP_SECSTATE_* bits/values */
     CPSecureState secure;
+    /*
+     * Which fine-grained trap register bit to check, if any. This
+     * value encodes both the trap register and bit within it.
+     */
+    FGTBit fgt;
     /*
      * The opaque pointer passed to define_arm_cp_regs_with_opaque() when
      * this register was defined: can be used to hand data through to the
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_ANY, FPEXC_EL, 8, 2)
 /* Memory operations require alignment: SCTLR_ELx.A or CCR.UNALIGN_TRP */
 FIELD(TBFLAG_ANY, ALIGN_MEM, 10, 1)
 FIELD(TBFLAG_ANY, PSTATE__IL, 11, 1)
+FIELD(TBFLAG_ANY, FGT_ACTIVE, 12, 1)
 
 /*
  * Bit usage when in AArch32 state, both A- and M-profile.
diff --git a/target/arm/internals.h b/target/arm/internals.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ static inline uint64_t arm_mdcr_el2_eff(CPUARMState *env)
     ((1 << (1 - 1)) | (1 << (2 - 1)) |                  \
      (1 << (4 - 1)) | (1 << (8 - 1)) | (1 << (16 - 1)))
 
+/*
+ * Return true if it is possible to take a fine-grained-trap to EL2.
+ */
+static inline bool arm_fgt_active(CPUARMState *env, int el)
+{
+    /*
+     * The Arm ARM only requires the "{E2H,TGE} != {1,1}" test for traps
+     * that can affect EL0, but it is harmless to do the test also for
+     * traps on registers that are only accessible at EL1 because if the test
+     * returns true then we can't be executing at EL1 anyway.
+     * FGT traps only happen when EL2 is enabled and EL1 is AArch64;
+     * traps from AArch32 only happen when EL0 is AArch32.
+     */
+    return cpu_isar_feature(aa64_fgt, env_archcpu(env)) &&
+        el < 2 && arm_is_el2_enabled(env) &&
+        arm_el_is_aa64(env, 1) &&
+        (arm_hcr_el2_eff(env) & (HCR_E2H | HCR_TGE)) != (HCR_E2H | HCR_TGE) &&
+        (!arm_feature(env, ARM_FEATURE_EL3) || (env->cp15.scr_el3 & SCR_FGTEN));
+}
+
 #endif
diff --git a/target/arm/translate.h b/target/arm/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ typedef struct DisasContext {
     bool is_nonstreaming;
     /* True if MVE insns are definitely not predicated by VPR or LTPSIZE */
     bool mve_no_pred;
+    /* True if fine-grained traps are active */
+    bool fgt_active;
     /*
      * >= 0, a copy of PSTATE.BTYPE, which will be 0 without v8.5-BTI.
      *  < 0, set by the current instruction.
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static CPUARMTBFlags rebuild_hflags_common(CPUARMState *env, int fp_el,
     if (arm_singlestep_active(env)) {
         DP_TBFLAG_ANY(flags, SS_ACTIVE, 1);
     }
+
     return flags;
 }
 
@@ -XXX,XX +XXX,XX @@ static CPUARMTBFlags rebuild_hflags_a32(CPUARMState *env, int fp_el,
         DP_TBFLAG_A32(flags, HSTR_ACTIVE, 1);
     }
 
+    if (arm_fgt_active(env, el)) {
+        DP_TBFLAG_ANY(flags, FGT_ACTIVE, 1);
+    }
+
     if (env->uncached_cpsr & CPSR_IL) {
         DP_TBFLAG_ANY(flags, PSTATE__IL, 1);
     }
@@ -XXX,XX +XXX,XX @@ static CPUARMTBFlags rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
         DP_TBFLAG_ANY(flags, PSTATE__IL, 1);
     }
 
+    if (arm_fgt_active(env, el)) {
+        DP_TBFLAG_ANY(flags, FGT_ACTIVE, 1);
+    }
+
     if (cpu_isar_feature(aa64_mte, env_archcpu(env))) {
         /*
          * Set MTE_ACTIVE if any access may be Checked, and leave clear
diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/op_helper.c
+++ b/target/arm/op_helper.c
@@ -XXX,XX +XXX,XX @@ const void *HELPER(access_check_cp_reg)(CPUARMState *env, uint32_t key,
         }
     }
 
+    /*
+     * Fine-grained traps also are lower priority than undef-to-EL1,
+     * higher priority than trap-to-EL3, and we don't care about priority
+     * order with other EL2 traps because the syndrome value is the same.
+     */
+    if (arm_fgt_active(env, arm_current_el(env))) {
+        uint64_t trapword = 0;
+        unsigned int idx = FIELD_EX32(ri->fgt, FGT, IDX);
+        unsigned int bitpos = FIELD_EX32(ri->fgt, FGT, BITPOS);
+        bool rev = FIELD_EX32(ri->fgt, FGT, REV);
+        bool trapbit;
+
+        if (ri->fgt & FGT_EXEC) {
+            assert(idx < ARRAY_SIZE(env->cp15.fgt_exec));
+            trapword = env->cp15.fgt_exec[idx];
+        } else if (isread && (ri->fgt & FGT_R)) {
+            assert(idx < ARRAY_SIZE(env->cp15.fgt_read));
+            trapword = env->cp15.fgt_read[idx];
+        } else if (!isread && (ri->fgt & FGT_W)) {
+            assert(idx < ARRAY_SIZE(env->cp15.fgt_write));
+            trapword = env->cp15.fgt_write[idx];
+        }
+
+        trapbit = extract64(trapword, bitpos, 1);
+        if (trapbit != rev) {
+            res = CP_ACCESS_TRAP_EL2;
+            goto fail;
+        }
+    }
+
     if (likely(res == CP_ACCESS_OK)) {
         return ri;
     }
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void handle_sys(DisasContext *s, uint32_t insn, bool isread,
         return;
     }
 
-    if (ri->accessfn) {
+    if (ri->accessfn || (ri->fgt && s->fgt_active)) {
         /* Emit code to perform further access permissions checks at
          * runtime; this may result in an exception.
          */
@@ -XXX,XX +XXX,XX @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase,
     dc->fp_excp_el = EX_TBFLAG_ANY(tb_flags, FPEXC_EL);
     dc->align_mem = EX_TBFLAG_ANY(tb_flags, ALIGN_MEM);
     dc->pstate_il = EX_TBFLAG_ANY(tb_flags, PSTATE__IL);
+    dc->fgt_active = EX_TBFLAG_ANY(tb_flags, FGT_ACTIVE);
     dc->sve_excp_el = EX_TBFLAG_A64(tb_flags, SVEEXC_EL);
     dc->sme_excp_el = EX_TBFLAG_A64(tb_flags, SMEEXC_EL);
     dc->vl = (EX_TBFLAG_A64(tb_flags, VL) + 1) * 16;
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void do_coproc_insn(DisasContext *s, int cpnum, int is64,
     }
 
     if ((s->hstr_active && s->current_el == 0) || ri->accessfn ||
+        (ri->fgt && s->fgt_active) ||
         (arm_dc_feature(s, ARM_FEATURE_XSCALE) && cpnum < 14)) {
         /*
          * Emit code to perform further access permissions checks at
@@ -XXX,XX +XXX,XX @@ static void arm_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
     dc->fp_excp_el = EX_TBFLAG_ANY(tb_flags, FPEXC_EL);
     dc->align_mem = EX_TBFLAG_ANY(tb_flags, ALIGN_MEM);
     dc->pstate_il = EX_TBFLAG_ANY(tb_flags, PSTATE__IL);
+    dc->fgt_active = EX_TBFLAG_ANY(tb_flags, FGT_ACTIVE);
 
     if (arm_feature(env, ARM_FEATURE_M)) {
         dc->vfp_enabled = 1;
-- 
2.34.1

Mark up the sysreg definitions for the registers trapped
by HFGRTR/HFGWTR bits 0..11.

diff --git a/target/arm/cpregs.h b/target/arm/cpregs.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpregs.h
+++ b/target/arm/cpregs.h
@@ -XXX,XX +XXX,XX @@ typedef enum FGTBit {
     FGT_HDFGRTR = FGT_RW | (FGTREG_HDFGRTR << R_FGT_IDX_SHIFT),
     FGT_HDFGWTR = FGT_W | (FGTREG_HDFGWTR << R_FGT_IDX_SHIFT),
     FGT_HFGITR = FGT_EXEC | (FGTREG_HFGITR << R_FGT_IDX_SHIFT),
+
+    /* Trap bits in HFGRTR_EL2 / HFGWTR_EL2, starting from bit 0. */
+    DO_BIT(HFGRTR, AFSR0_EL1),
+    DO_BIT(HFGRTR, AFSR1_EL1),
+    DO_BIT(HFGRTR, AIDR_EL1),
+    DO_BIT(HFGRTR, AMAIR_EL1),
+    DO_BIT(HFGRTR, APDAKEY),
+    DO_BIT(HFGRTR, APDBKEY),
+    DO_BIT(HFGRTR, APGAKEY),
+    DO_BIT(HFGRTR, APIAKEY),
+    DO_BIT(HFGRTR, APIBKEY),
+    DO_BIT(HFGRTR, CCSIDR_EL1),
+    DO_BIT(HFGRTR, CLIDR_EL1),
+    DO_BIT(HFGRTR, CONTEXTIDR_EL1),
 } FGTBit;
 
 #undef DO_BIT
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo cp_reginfo[] = {
     { .name = "CONTEXTIDR_EL1", .state = ARM_CP_STATE_BOTH,
       .opc0 = 3, .opc1 = 0, .crn = 13, .crm = 0, .opc2 = 1,
       .access = PL1_RW, .accessfn = access_tvm_trvm,
+      .fgt = FGT_CONTEXTIDR_EL1,
       .secure = ARM_CP_SECSTATE_NS,
       .fieldoffset = offsetof(CPUARMState, cp15.contextidr_el[1]),
       .resetvalue = 0, .writefn = contextidr_write, .raw_writefn = raw_write, },
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
       .opc0 = 3, .crn = 0, .crm = 0, .opc1 = 1, .opc2 = 0,
       .access = PL1_R,
       .accessfn = access_tid4,
+      .fgt = FGT_CCSIDR_EL1,
       .readfn = ccsidr_read, .type = ARM_CP_NO_RAW },
     { .name = "CSSELR", .state = ARM_CP_STATE_BOTH,
       .opc0 = 3, .crn = 0, .crm = 0, .opc1 = 2, .opc2 = 0,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
       .opc0 = 3, .opc1 = 1, .crn = 0, .crm = 0, .opc2 = 7,
       .access = PL1_R, .type = ARM_CP_CONST,
       .accessfn = access_aa64_tid1,
+      .fgt = FGT_AIDR_EL1,
       .resetvalue = 0 },
     /*
      * Auxiliary fault status registers: these also are IMPDEF, and we
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
     { .name = "AFSR0_EL1", .state = ARM_CP_STATE_BOTH,
       .opc0 = 3, .opc1 = 0, .crn = 5, .crm = 1, .opc2 = 0,
       .access = PL1_RW, .accessfn = access_tvm_trvm,
+      .fgt = FGT_AFSR0_EL1,
       .type = ARM_CP_CONST, .resetvalue = 0 },
     { .name = "AFSR1_EL1", .state = ARM_CP_STATE_BOTH,
       .opc0 = 3, .opc1 = 0, .crn = 5, .crm = 1, .opc2 = 1,
       .access = PL1_RW, .accessfn = access_tvm_trvm,
+      .fgt = FGT_AFSR1_EL1,
       .type = ARM_CP_CONST, .resetvalue = 0 },
     /*
      * MAIR can just read-as-written because we don't implement caches
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo lpae_cp_reginfo[] = {
     { .name = "AMAIR0", .state = ARM_CP_STATE_BOTH,
       .opc0 = 3, .crn = 10, .crm = 3, .opc1 = 0, .opc2 = 0,
       .access = PL1_RW, .accessfn = access_tvm_trvm,
+      .fgt = FGT_AMAIR_EL1,
       .type = ARM_CP_CONST, .resetvalue = 0 },
     /* AMAIR1 is mapped to AMAIR_EL1[63:32] */
     { .name = "AMAIR1", .cp = 15, .crn = 10, .crm = 3, .opc1 = 0, .opc2 = 1,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo pauth_reginfo[] = {
     { .name = "APDAKEYLO_EL1", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 2, .opc2 = 0,
       .access = PL1_RW, .accessfn = access_pauth,
+      .fgt = FGT_APDAKEY,
       .fieldoffset = offsetof(CPUARMState, keys.apda.lo) },
     { .name = "APDAKEYHI_EL1", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 2, .opc2 = 1,
       .access = PL1_RW, .accessfn = access_pauth,
+      .fgt = FGT_APDAKEY,
       .fieldoffset = offsetof(CPUARMState, keys.apda.hi) },
     { .name = "APDBKEYLO_EL1", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 2, .opc2 = 2,
       .access = PL1_RW, .accessfn = access_pauth,
+      .fgt = FGT_APDBKEY,
       .fieldoffset = offsetof(CPUARMState, keys.apdb.lo) },
     { .name = "APDBKEYHI_EL1", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 2, .opc2 = 3,
       .access = PL1_RW, .accessfn = access_pauth,
+      .fgt = FGT_APDBKEY,
       .fieldoffset = offsetof(CPUARMState, keys.apdb.hi) },
     { .name = "APGAKEYLO_EL1", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 3, .opc2 = 0,
       .access = PL1_RW, .accessfn = access_pauth,
+      .fgt = FGT_APGAKEY,
       .fieldoffset = offsetof(CPUARMState, keys.apga.lo) },
     { .name = "APGAKEYHI_EL1", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 3, .opc2 = 1,
       .access = PL1_RW, .accessfn = access_pauth,
+      .fgt = FGT_APGAKEY,
       .fieldoffset = offsetof(CPUARMState, keys.apga.hi) },
     { .name = "APIAKEYLO_EL1", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 1, .opc2 = 0,
       .access = PL1_RW, .accessfn = access_pauth,
+      .fgt = FGT_APIAKEY,
       .fieldoffset = offsetof(CPUARMState, keys.apia.lo) },
     { .name = "APIAKEYHI_EL1", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 1, .opc2 = 1,
       .access = PL1_RW, .accessfn = access_pauth,
+      .fgt = FGT_APIAKEY,
       .fieldoffset = offsetof(CPUARMState, keys.apia.hi) },
     { .name = "APIBKEYLO_EL1", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 1, .opc2 = 2,
       .access = PL1_RW, .accessfn = access_pauth,
+      .fgt = FGT_APIBKEY,
       .fieldoffset = offsetof(CPUARMState, keys.apib.lo) },
     { .name = "APIBKEYHI_EL1", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 1, .opc2 = 3,
       .access = PL1_RW, .accessfn = access_pauth,
+      .fgt = FGT_APIBKEY,
       .fieldoffset = offsetof(CPUARMState, keys.apib.hi) },
 };
 
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
             .opc0 = 3, .crn = 0, .crm = 0, .opc1 = 1, .opc2 = 1,
             .access = PL1_R, .type = ARM_CP_CONST,
             .accessfn = access_tid4,
+            .fgt = FGT_CLIDR_EL1,
             .resetvalue = cpu->clidr
         };
         define_one_arm_cp_reg(cpu, &clidr);
-- 
2.34.1

Mark up the sysreg definitions for the registers trapped
by HFGRTR/HFGWTR bits 12..23.

diff --git a/target/arm/cpregs.h b/target/arm/cpregs.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpregs.h
+++ b/target/arm/cpregs.h
@@ -XXX,XX +XXX,XX @@ typedef enum FGTBit {
     DO_BIT(HFGRTR, CCSIDR_EL1),
     DO_BIT(HFGRTR, CLIDR_EL1),
     DO_BIT(HFGRTR, CONTEXTIDR_EL1),
+    DO_BIT(HFGRTR, CPACR_EL1),
+    DO_BIT(HFGRTR, CSSELR_EL1),
+    DO_BIT(HFGRTR, CTR_EL0),
+    DO_BIT(HFGRTR, DCZID_EL0),
+    DO_BIT(HFGRTR, ESR_EL1),
+    DO_BIT(HFGRTR, FAR_EL1),
+    DO_BIT(HFGRTR, ISR_EL1),
+    DO_BIT(HFGRTR, LORC_EL1),
+    DO_BIT(HFGRTR, LOREA_EL1),
+    DO_BIT(HFGRTR, LORID_EL1),
+    DO_BIT(HFGRTR, LORN_EL1),
+    DO_BIT(HFGRTR, LORSA_EL1),
 } FGTBit;
 
 #undef DO_BIT
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v6_cp_reginfo[] = {
       .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0, },
     { .name = "CPACR", .state = ARM_CP_STATE_BOTH, .opc0 = 3,
       .crn = 1, .crm = 0, .opc1 = 0, .opc2 = 2, .accessfn = cpacr_access,
+      .fgt = FGT_CPACR_EL1,
       .access = PL1_RW, .fieldoffset = offsetof(CPUARMState, cp15.cpacr_el1),
       .resetfn = cpacr_reset, .writefn = cpacr_write, .readfn = cpacr_read },
 };
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
       .opc0 = 3, .crn = 0, .crm = 0, .opc1 = 2, .opc2 = 0,
       .access = PL1_RW,
       .accessfn = access_tid4,
+      .fgt = FGT_CSSELR_EL1,
       .writefn = csselr_write, .resetvalue = 0,
       .bank_fieldoffsets = { offsetof(CPUARMState, cp15.csselr_s),
                              offsetof(CPUARMState, cp15.csselr_ns) } },
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
       .resetfn = arm_cp_reset_ignore },
     { .name = "ISR_EL1", .state = ARM_CP_STATE_BOTH,
       .opc0 = 3, .opc1 = 0, .crn = 12, .crm = 1, .opc2 = 0,
+      .fgt = FGT_ISR_EL1,
       .type = ARM_CP_NO_RAW, .access = PL1_R, .readfn = isr_read },
     /* 32 bit ITLB invalidates */
     { .name = "ITLBIALL", .cp = 15, .opc1 = 0, .crn = 8, .crm = 5, .opc2 = 0,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo vmsa_pmsa_cp_reginfo[] = {
     { .name = "FAR_EL1", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .crn = 6, .crm = 0, .opc1 = 0, .opc2 = 0,
       .access = PL1_RW, .accessfn = access_tvm_trvm,
+      .fgt = FGT_FAR_EL1,
       .fieldoffset = offsetof(CPUARMState, cp15.far_el[1]),
       .resetvalue = 0, },
 };
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo vmsa_cp_reginfo[] = {
     { .name = "ESR_EL1", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .crn = 5, .crm = 2, .opc1 = 0, .opc2 = 0,
       .access = PL1_RW, .accessfn = access_tvm_trvm,
+      .fgt = FGT_ESR_EL1,
       .fieldoffset = offsetof(CPUARMState, cp15.esr_el[1]), .resetvalue = 0, },
     { .name = "TTBR0_EL1", .state = ARM_CP_STATE_BOTH,
       .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 0, .opc2 = 0,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
     { .name = "DCZID_EL0", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 3, .opc2 = 7, .crn = 0, .crm = 0,
       .access = PL0_R, .type = ARM_CP_NO_RAW,
+      .fgt = FGT_DCZID_EL0,
       .readfn = aa64_dczid_read },
     { .name = "DC_ZVA", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 4, .opc2 = 1,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo lor_reginfo[] = {
     { .name = "LORSA_EL1", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 0, .crn = 10, .crm = 4, .opc2 = 0,
       .access = PL1_RW, .accessfn = access_lor_other,
+      .fgt = FGT_LORSA_EL1,
       .type = ARM_CP_CONST, .resetvalue = 0 },
     { .name = "LOREA_EL1", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 0, .crn = 10, .crm = 4, .opc2 = 1,
       .access = PL1_RW, .accessfn = access_lor_other,
+      .fgt = FGT_LOREA_EL1,
       .type = ARM_CP_CONST, .resetvalue = 0 },
     { .name = "LORN_EL1", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 0, .crn = 10, .crm = 4, .opc2 = 2,
       .access = PL1_RW, .accessfn = access_lor_other,
+      .fgt = FGT_LORN_EL1,
       .type = ARM_CP_CONST, .resetvalue = 0 },
     { .name = "LORC_EL1", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 0, .crn = 10, .crm = 4, .opc2 = 3,
       .access = PL1_RW, .accessfn = access_lor_other,
+      .fgt = FGT_LORC_EL1,
       .type = ARM_CP_CONST, .resetvalue = 0 },
     { .name = "LORID_EL1", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 0, .crn = 10, .crm = 4, .opc2 = 7,
       .access = PL1_R, .accessfn = access_lor_ns,
+      .fgt = FGT_LORID_EL1,
       .type = ARM_CP_CONST, .resetvalue = 0 },
 };
 
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
             { .name = "CTR_EL0", .state = ARM_CP_STATE_AA64,
               .opc0 = 3, .opc1 = 3, .opc2 = 1, .crn = 0, .crm = 0,
               .access = PL0_R, .accessfn = ctr_el0_access,
+              .fgt = FGT_CTR_EL0,
               .type = ARM_CP_CONST, .resetvalue = cpu->ctr },
             /* TCMTR and TLBTR exist in v8 but have no 64-bit versions */
             { .name = "TCMTR",
-- 
2.34.1

Mark up the sysreg definitions for the registers trapped
by HFGRTR/HFGWTR bits 24..35.

diff --git a/target/arm/cpregs.h b/target/arm/cpregs.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpregs.h
+++ b/target/arm/cpregs.h
@@ -XXX,XX +XXX,XX @@ typedef enum FGTBit {
     DO_BIT(HFGRTR, LORID_EL1),
     DO_BIT(HFGRTR, LORN_EL1),
     DO_BIT(HFGRTR, LORSA_EL1),
+    DO_BIT(HFGRTR, MAIR_EL1),
+    DO_BIT(HFGRTR, MIDR_EL1),
+    DO_BIT(HFGRTR, MPIDR_EL1),
+    DO_BIT(HFGRTR, PAR_EL1),
+    DO_BIT(HFGRTR, REVIDR_EL1),
+    DO_BIT(HFGRTR, SCTLR_EL1),
+    DO_BIT(HFGRTR, SCXTNUM_EL1),
+    DO_BIT(HFGRTR, SCXTNUM_EL0),
+    DO_BIT(HFGRTR, TCR_EL1),
+    DO_BIT(HFGRTR, TPIDR_EL1),
+    DO_BIT(HFGRTR, TPIDRRO_EL0),
+    DO_BIT(HFGRTR, TPIDR_EL0),
 } FGTBit;
 
 #undef DO_BIT
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
     { .name = "MAIR_EL1", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 0, .crn = 10, .crm = 2, .opc2 = 0,
       .access = PL1_RW, .accessfn = access_tvm_trvm,
+      .fgt = FGT_MAIR_EL1,
       .fieldoffset = offsetof(CPUARMState, cp15.mair_el[1]),
       .resetvalue = 0 },
     { .name = "MAIR_EL3", .state = ARM_CP_STATE_AA64,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v6k_cp_reginfo[] = {
     { .name = "TPIDR_EL0", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 3, .opc2 = 2, .crn = 13, .crm = 0,
       .access = PL0_RW,
+      .fgt = FGT_TPIDR_EL0,
       .fieldoffset = offsetof(CPUARMState, cp15.tpidr_el[0]), .resetvalue = 0 },
     { .name = "TPIDRURW", .cp = 15, .crn = 13, .crm = 0, .opc1 = 0, .opc2 = 2,
       .access = PL0_RW,
+      .fgt = FGT_TPIDR_EL0,
       .bank_fieldoffsets = { offsetoflow32(CPUARMState, cp15.tpidrurw_s),
                              offsetoflow32(CPUARMState, cp15.tpidrurw_ns) },
       .resetfn = arm_cp_reset_ignore },
     { .name = "TPIDRRO_EL0", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 3, .opc2 = 3, .crn = 13, .crm = 0,
       .access = PL0_R | PL1_W,
+      .fgt = FGT_TPIDRRO_EL0,
       .fieldoffset = offsetof(CPUARMState, cp15.tpidrro_el[0]),
       .resetvalue = 0},
     { .name = "TPIDRURO", .cp = 15, .crn = 13, .crm = 0, .opc1 = 0, .opc2 = 3,
       .access = PL0_R | PL1_W,
+      .fgt = FGT_TPIDRRO_EL0,
       .bank_fieldoffsets = { offsetoflow32(CPUARMState, cp15.tpidruro_s),
                              offsetoflow32(CPUARMState, cp15.tpidruro_ns) },
       .resetfn = arm_cp_reset_ignore },
     { .name = "TPIDR_EL1", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 0, .opc2 = 4, .crn = 13, .crm = 0,
       .access = PL1_RW,
+      .fgt = FGT_TPIDR_EL1,
       .fieldoffset = offsetof(CPUARMState, cp15.tpidr_el[1]), .resetvalue = 0 },
     { .name = "TPIDRPRW", .opc1 = 0, .cp = 15, .crn = 13, .crm = 0, .opc2 = 4,
       .access = PL1_RW,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo vmsa_cp_reginfo[] = {
     { .name = "TCR_EL1", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .crn = 2, .crm = 0, .opc1 = 0, .opc2 = 2,
       .access = PL1_RW, .accessfn = access_tvm_trvm,
+      .fgt = FGT_TCR_EL1,
       .writefn = vmsa_tcr_el12_write,
       .raw_writefn = raw_write,
       .resetvalue = 0,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
       .type = ARM_CP_ALIAS,
       .opc0 = 3, .opc1 = 0, .crn = 7, .crm = 4, .opc2 = 0,
       .access = PL1_RW, .resetvalue = 0,
+      .fgt = FGT_PAR_EL1,
       .fieldoffset = offsetof(CPUARMState, cp15.par_el[1]),
       .writefn = par_write },
 #endif
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo scxtnum_reginfo[] = {
     { .name = "SCXTNUM_EL0", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 3, .crn = 13, .crm = 0, .opc2 = 7,
       .access = PL0_RW, .accessfn = access_scxtnum,
+      .fgt = FGT_SCXTNUM_EL0,
       .fieldoffset = offsetof(CPUARMState, scxtnum_el[0]) },
     { .name = "SCXTNUM_EL1", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 0, .crn = 13, .crm = 0, .opc2 = 7,
       .access = PL1_RW, .accessfn = access_scxtnum,
+      .fgt = FGT_SCXTNUM_EL1,
       .fieldoffset = offsetof(CPUARMState, scxtnum_el[1]) },
     { .name = "SCXTNUM_EL2", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 4, .crn = 13, .crm = 0, .opc2 = 7,
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
             { .name = "MIDR_EL1", .state = ARM_CP_STATE_BOTH,
               .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 0, .opc2 = 0,
               .access = PL1_R, .type = ARM_CP_NO_RAW, .resetvalue = cpu->midr,
+              .fgt = FGT_MIDR_EL1,
               .fieldoffset = offsetof(CPUARMState, cp15.c0_cpuid),
               .readfn = midr_read },
             /* crn = 0 op1 = 0 crm = 0 op2 = 7 : AArch32 aliases of MIDR */
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
               .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 0, .opc2 = 6,
               .access = PL1_R,
               .accessfn = access_aa64_tid1,
+              .fgt = FGT_REVIDR_EL1,
               .type = ARM_CP_CONST, .resetvalue = cpu->revidr },
         };
         ARMCPRegInfo id_v8_midr_alias_cp_reginfo = {
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
         ARMCPRegInfo mpidr_cp_reginfo[] = {
             { .name = "MPIDR_EL1", .state = ARM_CP_STATE_BOTH,
               .opc0 = 3, .crn = 0, .crm = 0, .opc1 = 0, .opc2 = 5,
+              .fgt = FGT_MPIDR_EL1,
               .access = PL1_R, .readfn = mpidr_read, .type = ARM_CP_NO_RAW },
         };
 #ifdef CONFIG_USER_ONLY
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
             .name = "SCTLR", .state = ARM_CP_STATE_BOTH,
             .opc0 = 3, .opc1 = 0, .crn = 1, .crm = 0, .opc2 = 0,
             .access = PL1_RW, .accessfn = access_tvm_trvm,
+            .fgt = FGT_SCTLR_EL1,
             .bank_fieldoffsets = { offsetof(CPUARMState, cp15.sctlr_s),
                                    offsetof(CPUARMState, cp15.sctlr_ns) },
             .writefn = sctlr_write, .resetvalue = cpu->reset_sctlr,
-- 
2.34.1

Mark up the sysreg definitions for the registers trapped
by HFGRTR/HFGWTR bits 36..63.

Of these, some correspond to RAS registers which we implement as
always-UNDEF: these don't need any extra handling for FGT because the
UNDEF-to-EL1 always takes priority over any theoretical
FGT-trap-to-EL2.
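
The priority argument can be pictured with a small standalone sketch.
It assumes a lookup that yields NULL for a register we do not list
at all; SkReg, SkResult and sk_access() are hypothetical names, and
the single fgt_bit_set flag stands in for the real trap-bit check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum SkResult {
    SK_ACCESS_OK,
    SK_UNDEF_TO_EL1,
    SK_TRAP_TO_EL2,
} SkResult;

/* Hypothetical register description. */
typedef struct SkReg {
    const char *name;
    bool fgt_bit_set;   /* stand-in for "the relevant trap bit is set" */
} SkReg;

/* ri is NULL when the register is not listed at all. */
static inline SkResult sk_access(const SkReg *ri, bool fgt_active)
{
    if (ri == NULL) {
        /* Unlisted register: UNDEF wins before any FGT check runs. */
        return SK_UNDEF_TO_EL1;
    }
    assert(ri->name != NULL);   /* listed registers carry a name */

    if (fgt_active && ri->fgt_bit_set) {
        return SK_TRAP_TO_EL2;
    }
    return SK_ACCESS_OK;
}
```

Since the unlisted case returns before the trap test, there is
nothing for an .fgt marking to do on such a register.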

Bit 50 (NACCDATA_EL1) is for the ACCDATA_EL1 register which is part
of the FEAT_LS64_ACCDATA feature which we don't yet implement.

diff --git a/target/arm/cpregs.h b/target/arm/cpregs.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpregs.h
+++ b/target/arm/cpregs.h
@@ -XXX,XX +XXX,XX @@ typedef enum FGTBit {
     DO_BIT(HFGRTR, TPIDR_EL1),
     DO_BIT(HFGRTR, TPIDRRO_EL0),
     DO_BIT(HFGRTR, TPIDR_EL0),
+    DO_BIT(HFGRTR, TTBR0_EL1),
+    DO_BIT(HFGRTR, TTBR1_EL1),
+    DO_BIT(HFGRTR, VBAR_EL1),
+    DO_BIT(HFGRTR, ICC_IGRPENN_EL1),
+    DO_BIT(HFGRTR, ERRIDR_EL1),
+    DO_REV_BIT(HFGRTR, NSMPRI_EL1),
+    DO_REV_BIT(HFGRTR, NTPIDR2_EL0),
 } FGTBit;
 
 #undef DO_BIT
diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/intc/arm_gicv3_cpuif.c
+++ b/hw/intc/arm_gicv3_cpuif.c
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo gicv3_cpuif_reginfo[] = {
       .opc0 = 3, .opc1 = 0, .crn = 12, .crm = 12, .opc2 = 6,
       .type = ARM_CP_IO | ARM_CP_NO_RAW,
       .access = PL1_RW, .accessfn = gicv3_fiq_access,
+      .fgt = FGT_ICC_IGRPENN_EL1,
       .readfn = icc_igrpen_read,
       .writefn = icc_igrpen_write,
     },
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo gicv3_cpuif_reginfo[] = {
       .opc0 = 3, .opc1 = 0, .crn = 12, .crm = 12, .opc2 = 7,
       .type = ARM_CP_IO | ARM_CP_NO_RAW,
       .access = PL1_RW, .accessfn = gicv3_irq_access,
+      .fgt = FGT_ICC_IGRPENN_EL1,
       .readfn = icc_igrpen_read,
       .writefn = icc_igrpen_write,
     },
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo vmsa_cp_reginfo[] = {
     { .name = "TTBR0_EL1", .state = ARM_CP_STATE_BOTH,
       .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 0, .opc2 = 0,
       .access = PL1_RW, .accessfn = access_tvm_trvm,
+      .fgt = FGT_TTBR0_EL1,
       .writefn = vmsa_ttbr_write, .resetvalue = 0,
       .bank_fieldoffsets = { offsetof(CPUARMState, cp15.ttbr0_s),
                              offsetof(CPUARMState, cp15.ttbr0_ns) } },
     { .name = "TTBR1_EL1", .state = ARM_CP_STATE_BOTH,
       .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 0, .opc2 = 1,
       .access = PL1_RW, .accessfn = access_tvm_trvm,
+      .fgt = FGT_TTBR1_EL1,
       .writefn = vmsa_ttbr_write, .resetvalue = 0,
       .bank_fieldoffsets = { offsetof(CPUARMState, cp15.ttbr1_s),
                              offsetof(CPUARMState, cp15.ttbr1_ns) } },
@@ -XXX,XX +XXX,XX @@ static void disr_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t val)
  *   ERRSELR_EL1
  * may generate UNDEFINED, which is the effect we get by not
  * listing them at all.
+ *
+ * These registers have fine-grained trap bits, but UNDEF-to-EL1
+ * is higher priority than FGT-to-EL2 so we do not need to list them
+ * in order to check for an FGT.
  */
 static const ARMCPRegInfo minimal_ras_reginfo[] = {
     { .name = "DISR_EL1", .state = ARM_CP_STATE_BOTH,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo minimal_ras_reginfo[] = {
     { .name = "ERRIDR_EL1", .state = ARM_CP_STATE_BOTH,
       .opc0 = 3, .opc1 = 0, .crn = 5, .crm = 3, .opc2 = 0,
       .access = PL1_R, .accessfn = access_terr,
+      .fgt = FGT_ERRIDR_EL1,
       .type = ARM_CP_CONST, .resetvalue = 0 },
     { .name = "VDISR_EL2", .state = ARM_CP_STATE_BOTH,
       .opc0 = 3, .opc1 = 4, .crn = 12, .crm = 1, .opc2 = 1,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo sme_reginfo[] = {
     { .name = "TPIDR2_EL0", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 3, .crn = 13, .crm = 0, .opc2 = 5,
       .access = PL0_RW, .accessfn = access_tpidr2,
+      .fgt = FGT_NTPIDR2_EL0,
       .fieldoffset = offsetof(CPUARMState, cp15.tpidr2_el0) },
     { .name = "SVCR", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 3, .crn = 4, .crm = 2, .opc2 = 2,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo sme_reginfo[] = {
     { .name = "SMPRI_EL1", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 0, .crn = 1, .crm = 2, .opc2 = 4,
       .access = PL1_RW, .accessfn = access_esm,
+      .fgt = FGT_NSMPRI_EL1,
       .type = ARM_CP_CONST, .resetvalue = 0 },
     { .name = "SMPRIMAP_EL2", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 4, .crn = 1, .crm = 2, .opc2 = 5,
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
             { .name = "VBAR", .state = ARM_CP_STATE_BOTH,
               .opc0 = 3, .crn = 12, .crm = 0, .opc1 = 0, .opc2 = 0,
               .access = PL1_RW, .writefn = vbar_write,
+              .fgt = FGT_VBAR_EL1,
               .bank_fieldoffsets = { offsetof(CPUARMState, cp15.vbar_s),
                                      offsetof(CPUARMState, cp15.vbar_ns) },
               .resetvalue = 0 },
-- 
2.34.1

Mark up the sysreg definitions for the registers trapped
by HDFGRTR/HDFGWTR bits 0..11. These cover various
debug-related registers.

diff --git a/target/arm/cpregs.h b/target/arm/cpregs.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpregs.h
+++ b/target/arm/cpregs.h
@@ -XXX,XX +XXX,XX @@ typedef enum FGTBit {
     DO_BIT(HFGRTR, ERRIDR_EL1),
     DO_REV_BIT(HFGRTR, NSMPRI_EL1),
     DO_REV_BIT(HFGRTR, NTPIDR2_EL0),
+
+    /* Trap bits in HDFGRTR_EL2 / HDFGWTR_EL2, starting from bit 0. */
+    DO_BIT(HDFGRTR, DBGBCRN_EL1),
+    DO_BIT(HDFGRTR, DBGBVRN_EL1),
+    DO_BIT(HDFGRTR, DBGWCRN_EL1),
+    DO_BIT(HDFGRTR, DBGWVRN_EL1),
+    DO_BIT(HDFGRTR, MDSCR_EL1),
+    DO_BIT(HDFGRTR, DBGCLAIM),
+    DO_BIT(HDFGWTR, OSLAR_EL1),
+    DO_BIT(HDFGRTR, OSLSR_EL1),
+    DO_BIT(HDFGRTR, OSECCR_EL1),
+    DO_BIT(HDFGRTR, OSDLR_EL1),
 } FGTBit;
 
 #undef DO_BIT
diff --git a/target/arm/debug_helper.c b/target/arm/debug_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/debug_helper.c
+++ b/target/arm/debug_helper.c
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo debug_cp_reginfo[] = {
     { .name = "MDSCR_EL1", .state = ARM_CP_STATE_BOTH,
       .cp = 14, .opc0 = 2, .opc1 = 0, .crn = 0, .crm = 2, .opc2 = 2,
       .access = PL1_RW, .accessfn = access_tda,
+      .fgt = FGT_MDSCR_EL1,
       .fieldoffset = offsetof(CPUARMState, cp15.mdscr_el1),
       .resetvalue = 0 },
     /*
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo debug_cp_reginfo[] = {
     { .name = "OSECCR_EL1", .state = ARM_CP_STATE_BOTH, .cp = 14,
       .opc0 = 2, .opc1 = 0, .crn = 0, .crm = 6, .opc2 = 2,
       .access = PL1_RW, .accessfn = access_tda,
+      .fgt = FGT_OSECCR_EL1,
       .type = ARM_CP_CONST, .resetvalue = 0 },
     /*
      * DBGDSCRint[15,12,5:2] map to MDSCR_EL1[15,12,5:2].  Map all bits as
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo debug_cp_reginfo[] = {
       .cp = 14, .opc0 = 2, .opc1 = 0, .crn = 1, .crm = 0, .opc2 = 4,
       .access = PL1_W, .type = ARM_CP_NO_RAW,
       .accessfn = access_tdosa,
+      .fgt = FGT_OSLAR_EL1,
       .writefn = oslar_write },
     { .name = "OSLSR_EL1", .state = ARM_CP_STATE_BOTH,
       .cp = 14, .opc0 = 2, .opc1 = 0, .crn = 1, .crm = 1, .opc2 = 4,
       .access = PL1_R, .resetvalue = 10,
       .accessfn = access_tdosa,
+      .fgt = FGT_OSLSR_EL1,
       .fieldoffset = offsetof(CPUARMState, cp15.oslsr_el1) },
     /* Dummy OSDLR_EL1: 32-bit Linux will read this */
     { .name = "OSDLR_EL1", .state = ARM_CP_STATE_BOTH,
       .cp = 14, .opc0 = 2, .opc1 = 0, .crn = 1, .crm = 3, .opc2 = 4,
       .access = PL1_RW, .accessfn = access_tdosa,
+      .fgt = FGT_OSDLR_EL1,
       .writefn = osdlr_write,
       .fieldoffset = offsetof(CPUARMState, cp15.osdlr_el1) },
     /*
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo debug_cp_reginfo[] = {
       .cp = 14, .opc0 = 2, .opc1 = 0, .crn = 7, .crm = 8, .opc2 = 6,
       .type = ARM_CP_ALIAS,
       .access = PL1_RW, .accessfn = access_tda,
+      .fgt = FGT_DBGCLAIM,
       .writefn = dbgclaimset_write, .readfn = dbgclaimset_read },
     { .name = "DBGCLAIMCLR_EL1", .state = ARM_CP_STATE_BOTH,
       .cp = 14, .opc0 = 2, .opc1 = 0, .crn = 7, .crm = 9, .opc2 = 6,
       .access = PL1_RW, .accessfn = access_tda,
+      .fgt = FGT_DBGCLAIM,
       .writefn = dbgclaimclr_write, .raw_writefn = raw_write,
       .fieldoffset = offsetof(CPUARMState, cp15.dbgclaim) },
 };
@@ -XXX,XX +XXX,XX @@ void define_debug_regs(ARMCPU *cpu)
             { .name = dbgbvr_el1_name, .state = ARM_CP_STATE_BOTH,
               .cp = 14, .opc0 = 2, .opc1 = 0, .crn = 0, .crm = i, .opc2 = 4,
               .access = PL1_RW, .accessfn = access_tda,
+              .fgt = FGT_DBGBVRN_EL1,
               .fieldoffset = offsetof(CPUARMState, cp15.dbgbvr[i]),
               .writefn = dbgbvr_write, .raw_writefn = raw_write
             },
             { .name = dbgbcr_el1_name, .state = ARM_CP_STATE_BOTH,
               .cp = 14, .opc0 = 2, .opc1 = 0, .crn = 0, .crm = i, .opc2 = 5,
               .access = PL1_RW, .accessfn = access_tda,
+              .fgt = FGT_DBGBCRN_EL1,
               .fieldoffset = offsetof(CPUARMState, cp15.dbgbcr[i]),
               .writefn = dbgbcr_write, .raw_writefn = raw_write
             },
@@ -XXX,XX +XXX,XX @@ void define_debug_regs(ARMCPU *cpu)
             { .name = dbgwvr_el1_name, .state = ARM_CP_STATE_BOTH,
               .cp = 14, .opc0 = 2, .opc1 = 0, .crn = 0, .crm = i, .opc2 = 6,
               .access = PL1_RW, .accessfn = access_tda,
+              .fgt = FGT_DBGWVRN_EL1,
               .fieldoffset = offsetof(CPUARMState, cp15.dbgwvr[i]),
               .writefn = dbgwvr_write, .raw_writefn = raw_write
             },
             { .name = dbgwcr_el1_name, .state = ARM_CP_STATE_BOTH,
               .cp = 14, .opc0 = 2, .opc1 = 0, .crn = 0, .crm = i, .opc2 = 7,
               .access = PL1_RW, .accessfn = access_tda,
+              .fgt = FGT_DBGWCRN_EL1,
               .fieldoffset = offsetof(CPUARMState, cp15.dbgwcr[i]),
               .writefn = dbgwcr_write, .raw_writefn = raw_write
             },
-- 
2.34.1

Mark up the sysreg definitions for the registers trapped
by HDFGRTR/HDFGWTR bits 12..63.

Bits 12..22 and bit 58 are for PMU registers.

The remaining bits in HDFGRTR/HDFGWTR are for traps on
registers that are part of features we don't implement:

Bits 23..32 and 63 : FEAT_SPE
Bits 33..48 : FEAT_ETE
Bits 50..56 : FEAT_TRBE
Bits 59..61 : FEAT_BRBE
Bit 62 : FEAT_SPEv1p2.
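
The bit ranges listed above, together with bits 0..11 from the
preceding patch (debug registers), can be summarised as a lookup.
This is only an illustrative sketch: `hdfg_group` and
`hdfg_bit_group()` are hypothetical names that do not exist in QEMU,
and bits the commit messages do not mention (49 and 57) are reported
as unlisted rather than guessed.

```c
#include <assert.h>

/* Hypothetical grouping of HDFGRTR/HDFGWTR bits, per the commit text. */
enum hdfg_group {
    HDFG_DEBUG,     /* bits 0..11 */
    HDFG_PMU,       /* bits 12..22 and 58 */
    HDFG_SPE,       /* bits 23..32 and 63 */
    HDFG_ETE,       /* bits 33..48 */
    HDFG_TRBE,      /* bits 50..56 */
    HDFG_BRBE,      /* bits 59..61 */
    HDFG_SPEV1P2,   /* bit 62 */
    HDFG_UNLISTED,  /* not mentioned in the commit messages */
};

/* Classify one bit number (0..63) into the groups above. */
static enum hdfg_group hdfg_bit_group(unsigned bit)
{
    assert(bit < 64);
    if (bit <= 11) {
        return HDFG_DEBUG;
    }
    if (bit <= 22 || bit == 58) {
        return HDFG_PMU;
    }
    if (bit <= 32 || bit == 63) {
        return HDFG_SPE;
    }
    if (bit <= 48) {
        return HDFG_ETE;
    }
    if (bit >= 50 && bit <= 56) {
        return HDFG_TRBE;
    }
    if (bit >= 59 && bit <= 61) {
        return HDFG_BRBE;
    }
    if (bit == 62) {
        return HDFG_SPEV1P2;
    }
    return HDFG_UNLISTED;
}
```

Only the first two groups receive .fgt markup in this series; the
others belong to features that are not implemented.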

diff --git a/target/arm/cpregs.h b/target/arm/cpregs.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpregs.h
+++ b/target/arm/cpregs.h
@@ -XXX,XX +XXX,XX @@ typedef enum FGTBit {
     DO_BIT(HDFGRTR, OSLSR_EL1),
     DO_BIT(HDFGRTR, OSECCR_EL1),
     DO_BIT(HDFGRTR, OSDLR_EL1),
+    DO_BIT(HDFGRTR, PMEVCNTRN_EL0),
+    DO_BIT(HDFGRTR, PMEVTYPERN_EL0),
+    DO_BIT(HDFGRTR, PMCCFILTR_EL0),
+    DO_BIT(HDFGRTR, PMCCNTR_EL0),
+    DO_BIT(HDFGRTR, PMCNTEN),
+    DO_BIT(HDFGRTR, PMINTEN),
+    DO_BIT(HDFGRTR, PMOVS),
+    DO_BIT(HDFGRTR, PMSELR_EL0),
+    DO_BIT(HDFGWTR, PMSWINC_EL0),
+    DO_BIT(HDFGWTR, PMCR_EL0),
+    DO_BIT(HDFGRTR, PMMIR_EL1),
+    DO_BIT(HDFGRTR, PMCEIDN_EL0),
 } FGTBit;
 
 #undef DO_BIT
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
       .fieldoffset = offsetoflow32(CPUARMState, cp15.c9_pmcnten),
       .writefn = pmcntenset_write,
       .accessfn = pmreg_access,
+      .fgt = FGT_PMCNTEN,
       .raw_writefn = raw_write },
     { .name = "PMCNTENSET_EL0", .state = ARM_CP_STATE_AA64, .type = ARM_CP_IO,
       .opc0 = 3, .opc1 = 3, .crn = 9, .crm = 12, .opc2 = 1,
       .access = PL0_RW, .accessfn = pmreg_access,
+      .fgt = FGT_PMCNTEN,
       .fieldoffset = offsetof(CPUARMState, cp15.c9_pmcnten), .resetvalue = 0,
       .writefn = pmcntenset_write, .raw_writefn = raw_write },
     { .name = "PMCNTENCLR", .cp = 15, .crn = 9, .crm = 12, .opc1 = 0, .opc2 = 2,
       .access = PL0_RW,
       .fieldoffset = offsetoflow32(CPUARMState, cp15.c9_pmcnten),
       .accessfn = pmreg_access,
+      .fgt = FGT_PMCNTEN,
       .writefn = pmcntenclr_write,
       .type = ARM_CP_ALIAS | ARM_CP_IO },
     { .name = "PMCNTENCLR_EL0", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 3, .crn = 9, .crm = 12, .opc2 = 2,
       .access = PL0_RW, .accessfn = pmreg_access,
+      .fgt = FGT_PMCNTEN,
       .type = ARM_CP_ALIAS | ARM_CP_IO,
       .fieldoffset = offsetof(CPUARMState, cp15.c9_pmcnten),
       .writefn = pmcntenclr_write },
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
       .access = PL0_RW, .type = ARM_CP_IO,
       .fieldoffset = offsetoflow32(CPUARMState, cp15.c9_pmovsr),
       .accessfn = pmreg_access,
+      .fgt = FGT_PMOVS,
       .writefn = pmovsr_write,
       .raw_writefn = raw_write },
     { .name = "PMOVSCLR_EL0", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 3, .crn = 9, .crm = 12, .opc2 = 3,
       .access = PL0_RW, .accessfn = pmreg_access,
+      .fgt = FGT_PMOVS,
       .type = ARM_CP_ALIAS | ARM_CP_IO,
       .fieldoffset = offsetof(CPUARMState, cp15.c9_pmovsr),
       .writefn = pmovsr_write,
       .raw_writefn = raw_write },
     { .name = "PMSWINC", .cp = 15, .crn = 9, .crm = 12, .opc1 = 0, .opc2 = 4,
       .access = PL0_W, .accessfn = pmreg_access_swinc,
+      .fgt = FGT_PMSWINC_EL0,
       .type = ARM_CP_NO_RAW | ARM_CP_IO,
       .writefn = pmswinc_write },
     { .name = "PMSWINC_EL0", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 3, .crn = 9, .crm = 12, .opc2 = 4,
       .access = PL0_W, .accessfn = pmreg_access_swinc,
+      .fgt = FGT_PMSWINC_EL0,
       .type = ARM_CP_NO_RAW | ARM_CP_IO,
       .writefn = pmswinc_write },
     { .name = "PMSELR", .cp = 15, .crn = 9, .crm = 12, .opc1 = 0, .opc2 = 5,
       .access = PL0_RW, .type = ARM_CP_ALIAS,
+      .fgt = FGT_PMSELR_EL0,
       .fieldoffset = offsetoflow32(CPUARMState, cp15.c9_pmselr),
       .accessfn = pmreg_access_selr, .writefn = pmselr_write,
       .raw_writefn = raw_write},
     { .name = "PMSELR_EL0", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 3, .crn = 9, .crm = 12, .opc2 = 5,
       .access = PL0_RW, .accessfn = pmreg_access_selr,
+      .fgt = FGT_PMSELR_EL0,
       .fieldoffset = offsetof(CPUARMState, cp15.c9_pmselr),
       .writefn = pmselr_write, .raw_writefn = raw_write, },
     { .name = "PMCCNTR", .cp = 15, .crn = 9, .crm = 13, .opc1 = 0, .opc2 = 0,
       .access = PL0_RW, .resetvalue = 0, .type = ARM_CP_ALIAS | ARM_CP_IO,
+      .fgt = FGT_PMCCNTR_EL0,
       .readfn = pmccntr_read, .writefn = pmccntr_write32,
       .accessfn = pmreg_access_ccntr },
     { .name = "PMCCNTR_EL0", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 3, .crn = 9, .crm = 13, .opc2 = 0,
       .access = PL0_RW, .accessfn = pmreg_access_ccntr,
+      .fgt = FGT_PMCCNTR_EL0,
       .type = ARM_CP_IO,
       .fieldoffset = offsetof(CPUARMState, cp15.c15_ccnt),
       .readfn = pmccntr_read, .writefn = pmccntr_write,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
     { .name = "PMCCFILTR", .cp = 15, .opc1 = 0, .crn = 14, .crm = 15, .opc2 = 7,
       .writefn = pmccfiltr_write_a32, .readfn = pmccfiltr_read_a32,
       .access = PL0_RW, .accessfn = pmreg_access,
+      .fgt = FGT_PMCCFILTR_EL0,
       .type = ARM_CP_ALIAS | ARM_CP_IO,
       .resetvalue = 0, },
     { .name = "PMCCFILTR_EL0", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 3, .crn = 14, .crm = 15, .opc2 = 7,
       .writefn = pmccfiltr_write, .raw_writefn = raw_write,
       .access = PL0_RW, .accessfn = pmreg_access,
+      .fgt = FGT_PMCCFILTR_EL0,
       .type = ARM_CP_IO,
       .fieldoffset = offsetof(CPUARMState, cp15.pmccfiltr_el0),
       .resetvalue = 0, },
     { .name = "PMXEVTYPER", .cp = 15, .crn = 9, .crm = 13, .opc1 = 0, .opc2 = 1,
       .access = PL0_RW, .type = ARM_CP_NO_RAW | ARM_CP_IO,
       .accessfn = pmreg_access,
+      .fgt = FGT_PMEVTYPERN_EL0,
       .writefn = pmxevtyper_write, .readfn = pmxevtyper_read },
     { .name = "PMXEVTYPER_EL0", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 3, .crn = 9, .crm = 13, .opc2 = 1,
       .access = PL0_RW, .type = ARM_CP_NO_RAW | ARM_CP_IO,
       .accessfn = pmreg_access,
+      .fgt = FGT_PMEVTYPERN_EL0,
       .writefn = pmxevtyper_write, .readfn = pmxevtyper_read },
     { .name = "PMXEVCNTR", .cp = 15, .crn = 9, .crm = 13, .opc1 = 0, .opc2 = 2,
       .access = PL0_RW, .type = ARM_CP_NO_RAW | ARM_CP_IO,
       .accessfn = pmreg_access_xevcntr,
+      .fgt = FGT_PMEVCNTRN_EL0,
       .writefn = pmxevcntr_write, .readfn = pmxevcntr_read },
     { .name = "PMXEVCNTR_EL0", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 3, .crn = 9, .crm = 13, .opc2 = 2,
       .access = PL0_RW, .type = ARM_CP_NO_RAW | ARM_CP_IO,
       .accessfn = pmreg_access_xevcntr,
+      .fgt = FGT_PMEVCNTRN_EL0,
       .writefn = pmxevcntr_write, .readfn = pmxevcntr_read },
     { .name = "PMUSERENR", .cp = 15, .crn = 9, .crm = 14, .opc1 = 0, .opc2 = 0,
       .access = PL0_R | PL1_RW, .accessfn = access_tpm,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
       .writefn = pmuserenr_write, .raw_writefn = raw_write },
     { .name = "PMINTENSET", .cp = 15, .crn = 9, .crm = 14, .opc1 = 0, .opc2 = 1,
       .access = PL1_RW, .accessfn = access_tpm,
+      .fgt = FGT_PMINTEN,
       .type = ARM_CP_ALIAS | ARM_CP_IO,
       .fieldoffset = offsetoflow32(CPUARMState, cp15.c9_pminten),
       .resetvalue = 0,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
     { .name = "PMINTENSET_EL1", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 0, .crn = 9, .crm = 14, .opc2 = 1,
       .access = PL1_RW, .accessfn = access_tpm,
+      .fgt = FGT_PMINTEN,
       .type = ARM_CP_IO,
       .fieldoffset = offsetof(CPUARMState, cp15.c9_pminten),
       .writefn = pmintenset_write, .raw_writefn = raw_write,
       .resetvalue = 0x0 },
     { .name = "PMINTENCLR", .cp = 15, .crn = 9, .crm = 14, .opc1 = 0, .opc2 = 2,
       .access = PL1_RW, .accessfn = access_tpm,
+      .fgt = FGT_PMINTEN,
       .type = ARM_CP_ALIAS | ARM_CP_IO | ARM_CP_NO_RAW,
       .fieldoffset = offsetof(CPUARMState, cp15.c9_pminten),
       .writefn = pmintenclr_write, },
     { .name = "PMINTENCLR_EL1", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 0, .crn = 9, .crm = 14, .opc2 = 2,
       .access = PL1_RW, .accessfn = access_tpm,
+      .fgt = FGT_PMINTEN,
       .type = ARM_CP_ALIAS | ARM_CP_IO | ARM_CP_NO_RAW,
       .fieldoffset = offsetof(CPUARMState, cp15.c9_pminten),
       .writefn = pmintenclr_write },
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo pmovsset_cp_reginfo[] = {
     /* PMOVSSET is not implemented in v7 before v7ve */
     { .name = "PMOVSSET", .cp = 15, .opc1 = 0, .crn = 9, .crm = 14, .opc2 = 3,
       .access = PL0_RW, .accessfn = pmreg_access,
+      .fgt = FGT_PMOVS,
       .type = ARM_CP_ALIAS | ARM_CP_IO,
       .fieldoffset = offsetoflow32(CPUARMState, cp15.c9_pmovsr),
       .writefn = pmovsset_write,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo pmovsset_cp_reginfo[] = {
     { .name = "PMOVSSET_EL0", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 3, .crn = 9, .crm = 14, .opc2 = 3,
       .access = PL0_RW, .accessfn = pmreg_access,
+      .fgt = FGT_PMOVS,
       .type = ARM_CP_ALIAS | ARM_CP_IO,
       .fieldoffset = offsetof(CPUARMState, cp15.c9_pmovsr),
       .writefn = pmovsset_write,
@@ -XXX,XX +XXX,XX @@ static void define_pmu_regs(ARMCPU *cpu)
     ARMCPRegInfo pmcr = {
         .name = "PMCR", .cp = 15, .crn = 9, .crm = 12, .opc1 = 0, .opc2 = 0,
         .access = PL0_RW,
+        .fgt = FGT_PMCR_EL0,
         .type = ARM_CP_IO | ARM_CP_ALIAS,
         .fieldoffset = offsetoflow32(CPUARMState, cp15.c9_pmcr),
         .accessfn = pmreg_access, .writefn = pmcr_write,
@@ -XXX,XX +XXX,XX @@ static void define_pmu_regs(ARMCPU *cpu)
         .name = "PMCR_EL0", .state = ARM_CP_STATE_AA64,
         .opc0 = 3, .opc1 = 3, .crn = 9, .crm = 12, .opc2 = 0,
         .access = PL0_RW, .accessfn = pmreg_access,
+        .fgt = FGT_PMCR_EL0,
         .type = ARM_CP_IO,
         .fieldoffset = offsetof(CPUARMState, cp15.c9_pmcr),
         .resetvalue = cpu->isar.reset_pmcr_el0,
@@ -XXX,XX +XXX,XX @@ static void define_pmu_regs(ARMCPU *cpu)
             { .name = pmevcntr_name, .cp = 15, .crn = 14,
               .crm = 8 | (3 & (i >> 3)), .opc1 = 0, .opc2 = i & 7,
               .access = PL0_RW, .type = ARM_CP_IO | ARM_CP_ALIAS,
+              .fgt = FGT_PMEVCNTRN_EL0,
               .readfn = pmevcntr_readfn, .writefn = pmevcntr_writefn,
               .accessfn = pmreg_access_xevcntr },
             { .name = pmevcntr_el0_name, .state = ARM_CP_STATE_AA64,
               .opc0 = 3, .opc1 = 3, .crn = 14, .crm = 8 | (3 & (i >> 3)),
               .opc2 = i & 7, .access = PL0_RW, .accessfn = pmreg_access_xevcntr,
               .type = ARM_CP_IO,
+              .fgt = FGT_PMEVCNTRN_EL0,
               .readfn = pmevcntr_readfn, .writefn = pmevcntr_writefn,
               .raw_readfn = pmevcntr_rawread,
               .raw_writefn = pmevcntr_rawwrite },
             { .name = pmevtyper_name, .cp = 15, .crn = 14,
               .crm = 12 | (3 & (i >> 3)), .opc1 = 0, .opc2 = i & 7,
               .access = PL0_RW, .type = ARM_CP_IO | ARM_CP_ALIAS,
+              .fgt = FGT_PMEVTYPERN_EL0,
               .readfn = pmevtyper_readfn, .writefn = pmevtyper_writefn,
               .accessfn = pmreg_access },
             { .name = pmevtyper_el0_name, .state = ARM_CP_STATE_AA64,
               .opc0 = 3, .opc1 = 3, .crn = 14, .crm = 12 | (3 & (i >> 3)),
               .opc2 = i & 7, .access = PL0_RW, .accessfn = pmreg_access,
+              .fgt = FGT_PMEVTYPERN_EL0,
               .type = ARM_CP_IO,
               .readfn = pmevtyper_readfn, .writefn = pmevtyper_writefn,
               .raw_writefn = pmevtyper_rawwrite },
@@ -XXX,XX +XXX,XX @@ static void define_pmu_regs(ARMCPU *cpu)
             { .name = "PMCEID2", .state = ARM_CP_STATE_AA32,
               .cp = 15, .opc1 = 0, .crn = 9, .crm = 14, .opc2 = 4,
               .access = PL0_R, .accessfn = pmreg_access, .type = ARM_CP_CONST,
+              .fgt = FGT_PMCEIDN_EL0,
               .resetvalue = extract64(cpu->pmceid0, 32, 32) },
             { .name = "PMCEID3", .state = ARM_CP_STATE_AA32,
               .cp = 15, .opc1 = 0, .crn = 9, .crm = 14, .opc2 = 5,
               .access = PL0_R, .accessfn = pmreg_access, .type = ARM_CP_CONST,
+              .fgt = FGT_PMCEIDN_EL0,
               .resetvalue = extract64(cpu->pmceid1, 32, 32) },
         };
         define_arm_cp_regs(cpu, v81_pmu_regs);
@@ -XXX,XX +XXX,XX @@ static void define_pmu_regs(ARMCPU *cpu)
             .name = "PMMIR_EL1", .state = ARM_CP_STATE_BOTH,
             .opc0 = 3, .opc1 = 0, .crn = 9, .crm = 14, .opc2 = 6,
             .access = PL1_R, .accessfn = pmreg_access, .type = ARM_CP_CONST,
+            .fgt = FGT_PMMIR_EL1,
             .resetvalue = 0
         };
         define_one_arm_cp_reg(cpu, &v84_pmmir);
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
             { .name = "PMCEID0", .state = ARM_CP_STATE_AA32,
               .cp = 15, .opc1 = 0, .crn = 9, .crm = 12, .opc2 = 6,
               .access = PL0_R, .accessfn = pmreg_access, .type = ARM_CP_CONST,
+              .fgt = FGT_PMCEIDN_EL0,
               .resetvalue = extract64(cpu->pmceid0, 0, 32) },
             { .name = "PMCEID0_EL0", .state = ARM_CP_STATE_AA64,
               .opc0 = 3, .opc1 = 3, .crn = 9, .crm = 12, .opc2 = 6,
               .access = PL0_R, .accessfn = pmreg_access, .type = ARM_CP_CONST,
+              .fgt = FGT_PMCEIDN_EL0,
               .resetvalue = cpu->pmceid0 },
             { .name = "PMCEID1", .state = ARM_CP_STATE_AA32,
               .cp = 15, .opc1 = 0, .crn = 9, .crm = 12, .opc2 = 7,
               .access = PL0_R, .accessfn = pmreg_access, .type = ARM_CP_CONST,
+              .fgt = FGT_PMCEIDN_EL0,
               .resetvalue = extract64(cpu->pmceid1, 0, 32) },
             { .name = "PMCEID1_EL0", .state = ARM_CP_STATE_AA64,
               .opc0 = 3, .opc1 = 3, .crn = 9, .crm = 12, .opc2 = 7,
               .access = PL0_R, .accessfn = pmreg_access, .type = ARM_CP_CONST,
+              .fgt = FGT_PMCEIDN_EL0,
               .resetvalue = cpu->pmceid1 },
         };
 #ifdef CONFIG_USER_ONLY
-- 
2.34.1

Mark up the sysreg definitions for the system instructions
trapped by HFGITR bits 0..11. These bits cover various
cache maintenance operations.

diff --git a/target/arm/cpregs.h b/target/arm/cpregs.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpregs.h
+++ b/target/arm/cpregs.h
@@ -XXX,XX +XXX,XX @@ typedef enum FGTBit {
     DO_BIT(HDFGWTR, PMCR_EL0),
     DO_BIT(HDFGRTR, PMMIR_EL1),
     DO_BIT(HDFGRTR, PMCEIDN_EL0),
+
+    /* Trap bits in HFGITR_EL2, starting from bit 0 */
+    DO_BIT(HFGITR, ICIALLUIS),
+    DO_BIT(HFGITR, ICIALLU),
+    DO_BIT(HFGITR, ICIVAU),
+    DO_BIT(HFGITR, DCIVAC),
+    DO_BIT(HFGITR, DCISW),
+    DO_BIT(HFGITR, DCCSW),
+    DO_BIT(HFGITR, DCCISW),
+    DO_BIT(HFGITR, DCCVAU),
+    DO_BIT(HFGITR, DCCVAP),
+    DO_BIT(HFGITR, DCCVADP),
+    DO_BIT(HFGITR, DCCIVAC),
+    DO_BIT(HFGITR, DCZVA),
 } FGTBit;
 
 #undef DO_BIT
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
 #ifndef CONFIG_USER_ONLY
       /* Avoid overhead of an access check that always passes in user-mode */
       .accessfn = aa64_zva_access,
+      .fgt = FGT_DCZVA,
 #endif
     },
     { .name = "CURRENTEL", .state = ARM_CP_STATE_AA64,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
     { .name = "IC_IALLUIS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 1, .opc2 = 0,
       .access = PL1_W, .type = ARM_CP_NOP,
+      .fgt = FGT_ICIALLUIS,
       .accessfn = access_ticab },
     { .name = "IC_IALLU", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 5, .opc2 = 0,
       .access = PL1_W, .type = ARM_CP_NOP,
+      .fgt = FGT_ICIALLU,
       .accessfn = access_tocu },
     { .name = "IC_IVAU", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 5, .opc2 = 1,
       .access = PL0_W, .type = ARM_CP_NOP,
+      .fgt = FGT_ICIVAU,
       .accessfn = access_tocu },
     { .name = "DC_IVAC", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 6, .opc2 = 1,
       .access = PL1_W, .accessfn = aa64_cacheop_poc_access,
+      .fgt = FGT_DCIVAC,
       .type = ARM_CP_NOP },
     { .name = "DC_ISW", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 6, .opc2 = 2,
+      .fgt = FGT_DCISW,
       .access = PL1_W, .accessfn = access_tsw, .type = ARM_CP_NOP },
     { .name = "DC_CVAC", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 10, .opc2 = 1,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
       .accessfn = aa64_cacheop_poc_access },
     { .name = "DC_CSW", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 10, .opc2 = 2,
+      .fgt = FGT_DCCSW,
       .access = PL1_W, .accessfn = access_tsw, .type = ARM_CP_NOP },
     { .name = "DC_CVAU", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 11, .opc2 = 1,
       .access = PL0_W, .type = ARM_CP_NOP,
+      .fgt = FGT_DCCVAU,
       .accessfn = access_tocu },
     { .name = "DC_CIVAC", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 14, .opc2 = 1,
       .access = PL0_W, .type = ARM_CP_NOP,
+      .fgt = FGT_DCCIVAC,
       .accessfn = aa64_cacheop_poc_access },
     { .name = "DC_CISW", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 14, .opc2 = 2,
+      .fgt = FGT_DCCISW,
       .access = PL1_W, .accessfn = access_tsw, .type = ARM_CP_NOP },
     /* TLBI operations */
     { .name = "TLBI_VMALLE1IS", .state = ARM_CP_STATE_AA64,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo dcpop_reg[] = {
     { .name = "DC_CVAP", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 12, .opc2 = 1,
       .access = PL0_W, .type = ARM_CP_NO_RAW | ARM_CP_SUPPRESS_TB_END,
+      .fgt = FGT_DCCVAP,
       .accessfn = aa64_cacheop_poc_access, .writefn = dccvap_writefn },
 };
 
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo dcpodp_reg[] = {
     { .name = "DC_CVADP", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 13, .opc2 = 1,
       .access = PL0_W, .type = ARM_CP_NO_RAW | ARM_CP_SUPPRESS_TB_END,
+      .fgt = FGT_DCCVADP,
       .accessfn = aa64_cacheop_poc_access, .writefn = dccvap_writefn },
 };
 #endif /*CONFIG_USER_ONLY*/
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo mte_reginfo[] = {
     { .name = "DC_IGVAC", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 6, .opc2 = 3,
       .type = ARM_CP_NOP, .access = PL1_W,
+      .fgt = FGT_DCIVAC,
       .accessfn = aa64_cacheop_poc_access },
     { .name = "DC_IGSW", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 6, .opc2 = 4,
+      .fgt = FGT_DCISW,
       .type = ARM_CP_NOP, .access = PL1_W, .accessfn = access_tsw },
     { .name = "DC_IGDVAC", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 6, .opc2 = 5,
       .type = ARM_CP_NOP, .access = PL1_W,
+      .fgt = FGT_DCIVAC,
       .accessfn = aa64_cacheop_poc_access },
     { .name = "DC_IGDSW", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 6, .opc2 = 6,
+      .fgt = FGT_DCISW,
       .type = ARM_CP_NOP, .access = PL1_W, .accessfn = access_tsw },
     { .name = "DC_CGSW", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 10, .opc2 = 4,
+      .fgt = FGT_DCCSW,
       .type = ARM_CP_NOP, .access = PL1_W, .accessfn = access_tsw },
     { .name = "DC_CGDSW", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 10, .opc2 = 6,
+      .fgt = FGT_DCCSW,
       .type = ARM_CP_NOP, .access = PL1_W, .accessfn = access_tsw },
     { .name = "DC_CIGSW", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 14, .opc2 = 4,
+      .fgt = FGT_DCCISW,
       .type = ARM_CP_NOP, .access = PL1_W, .accessfn = access_tsw },
     { .name = "DC_CIGDSW", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 14, .opc2 = 6,
+      .fgt = FGT_DCCISW,
       .type = ARM_CP_NOP, .access = PL1_W, .accessfn = access_tsw },
 };
 
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo mte_el0_cacheop_reginfo[] = {
     { .name = "DC_CGVAP", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 12, .opc2 = 3,
       .type = ARM_CP_NOP, .access = PL0_W,
+      .fgt = FGT_DCCVAP,
       .accessfn = aa64_cacheop_poc_access },
     { .name = "DC_CGDVAP", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 12, .opc2 = 5,
       .type = ARM_CP_NOP, .access = PL0_W,
+      .fgt = FGT_DCCVAP,
       .accessfn = aa64_cacheop_poc_access },
     { .name = "DC_CGVADP", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 13, .opc2 = 3,
       .type = ARM_CP_NOP, .access = PL0_W,
+      .fgt = FGT_DCCVADP,
       .accessfn = aa64_cacheop_poc_access },
     { .name = "DC_CGDVADP", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 13, .opc2 = 5,
       .type = ARM_CP_NOP, .access = PL0_W,
+      .fgt = FGT_DCCVADP,
       .accessfn = aa64_cacheop_poc_access },
     { .name = "DC_CIGVAC", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 14, .opc2 = 3,
       .type = ARM_CP_NOP, .access = PL0_W,
+      .fgt = FGT_DCCIVAC,
       .accessfn = aa64_cacheop_poc_access },
     { .name = "DC_CIGDVAC", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 14, .opc2 = 5,
       .type = ARM_CP_NOP, .access = PL0_W,
+      .fgt = FGT_DCCIVAC,
       .accessfn = aa64_cacheop_poc_access },
     { .name = "DC_GVA", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 4, .opc2 = 3,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo mte_el0_cacheop_reginfo[] = {
 #ifndef CONFIG_USER_ONLY
       /* Avoid overhead of an access check that always passes in user-mode */
       .accessfn = aa64_zva_access,
+      .fgt = FGT_DCZVA,
 #endif
     },
     { .name = "DC_GZVA", .state = ARM_CP_STATE_AA64,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo mte_el0_cacheop_reginfo[] = {
 #ifndef CONFIG_USER_ONLY
       /* Avoid overhead of an access check that always passes in user-mode */
       .accessfn = aa64_zva_access,
+      .fgt = FGT_DCZVA,
 #endif
     },
 };
-- 
2.34.1

Mark up the sysreg definitions for the system instructions
trapped by HFGITR bits 12..17. These bits cover AT address
translation instructions.
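
As a rough illustration of what a single trap bit means, the sketch
below tests one bit of an HFGITR_EL2 value. It is not the check QEMU
performs: `hfgitr_at_bit` and `hfgitr_bit_set()` are hypothetical
names, and the bit numbers assume that the six enum entries added by
this patch are listed in ascending bit order from bit 12.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical bit positions for the AT instruction traps, assuming
 * the enum entries in this patch run in order from bit 12 to bit 17.
 */
enum hfgitr_at_bit {
    HFGITR_ATS1E1R  = 12,
    HFGITR_ATS1E1W  = 13,
    HFGITR_ATS1E0R  = 14,
    HFGITR_ATS1E0W  = 15,
    HFGITR_ATS1E1RP = 16,
    HFGITR_ATS1E1WP = 17,
};

/* Return true if the given trap bit is set in the register value. */
static bool hfgitr_bit_set(uint64_t hfgitr, unsigned bit)
{
    assert(bit < 64);
    return (hfgitr >> bit) & 1;
}
```

For example, a register value of 1ULL << 12 would select the AT S1E1R
trap and none of the other five.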

diff --git a/target/arm/cpregs.h b/target/arm/cpregs.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpregs.h
+++ b/target/arm/cpregs.h
@@ -XXX,XX +XXX,XX @@ typedef enum FGTBit {
     DO_BIT(HFGITR, DCCVADP),
     DO_BIT(HFGITR, DCCIVAC),
     DO_BIT(HFGITR, DCZVA),
+    DO_BIT(HFGITR, ATS1E1R),
+    DO_BIT(HFGITR, ATS1E1W),
+    DO_BIT(HFGITR, ATS1E0R),
+    DO_BIT(HFGITR, ATS1E0W),
+    DO_BIT(HFGITR, ATS1E1RP),
+    DO_BIT(HFGITR, ATS1E1WP),
 } FGTBit;
 
 #undef DO_BIT
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
     { .name = "AT_S1E1R", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 8, .opc2 = 0,
       .access = PL1_W, .type = ARM_CP_NO_RAW | ARM_CP_RAISES_EXC,
+      .fgt = FGT_ATS1E1R,
       .writefn = ats_write64 },
     { .name = "AT_S1E1W", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 8, .opc2 = 1,
       .access = PL1_W, .type = ARM_CP_NO_RAW | ARM_CP_RAISES_EXC,
+      .fgt = FGT_ATS1E1W,
       .writefn = ats_write64 },
     { .name = "AT_S1E0R", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 8, .opc2 = 2,
       .access = PL1_W, .type = ARM_CP_NO_RAW | ARM_CP_RAISES_EXC,
+      .fgt = FGT_ATS1E0R,
       .writefn = ats_write64 },
     { .name = "AT_S1E0W", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 8, .opc2 = 3,
       .access = PL1_W, .type = ARM_CP_NO_RAW | ARM_CP_RAISES_EXC,
+      .fgt = FGT_ATS1E0W,
       .writefn = ats_write64 },
     { .name = "AT_S12E1R", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 4, .crn = 7, .crm = 8, .opc2 = 4,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo ats1e1_reginfo[] = {
     { .name = "AT_S1E1RP", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 9, .opc2 = 0,
       .access = PL1_W, .type = ARM_CP_NO_RAW | ARM_CP_RAISES_EXC,
+      .fgt = FGT_ATS1E1RP,
       .writefn = ats_write64 },
     { .name = "AT_S1E1WP", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 9, .opc2 = 1,
       .access = PL1_W, .type = ARM_CP_NO_RAW | ARM_CP_RAISES_EXC,
+      .fgt = FGT_ATS1E1WP,
       .writefn = ats_write64 },
 };
 
-- 
2.34.1

Mark up the sysreg definitions for the system instructions
trapped by HFGITR bits 18..47. These bits cover TLBI
TLB maintenance instructions.

(If we implemented FEAT_XS we would need to trap some of the
instructions added by that feature using these bits. We don't
implement it yet, so the .fgt markup must be added when we do.)
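
For a quick sanity check of the range, the sketch below builds the
mask for bits 18..47 and counts them; the result (30) matches the
number of DO_BIT() entries this patch adds.  bit_range_mask() and
popcount64() are hypothetical helpers written for this note, not
functions from the QEMU tree.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering bits lo..hi (inclusive) of a 64-bit register. */
static uint64_t bit_range_mask(unsigned lo, unsigned hi)
{
    assert(lo <= hi && hi < 64);
    unsigned len = hi - lo + 1;
    uint64_t ones = (len == 64) ? UINT64_MAX
                                : ((UINT64_C(1) << len) - 1);
    return ones << lo;
}

/* Count set bits without relying on compiler builtins. */
static unsigned popcount64(uint64_t v)
{
    unsigned n = 0;
    while (v) {
        v &= v - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}
```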

diff --git a/target/arm/cpregs.h b/target/arm/cpregs.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpregs.h
+++ b/target/arm/cpregs.h
@@ -XXX,XX +XXX,XX @@ typedef enum FGTBit {
     DO_BIT(HFGITR, ATS1E0W),
     DO_BIT(HFGITR, ATS1E1RP),
     DO_BIT(HFGITR, ATS1E1WP),
+    DO_BIT(HFGITR, TLBIVMALLE1OS),
+    DO_BIT(HFGITR, TLBIVAE1OS),
+    DO_BIT(HFGITR, TLBIASIDE1OS),
+    DO_BIT(HFGITR, TLBIVAAE1OS),
+    DO_BIT(HFGITR, TLBIVALE1OS),
+    DO_BIT(HFGITR, TLBIVAALE1OS),
+    DO_BIT(HFGITR, TLBIRVAE1OS),
+    DO_BIT(HFGITR, TLBIRVAAE1OS),
+    DO_BIT(HFGITR, TLBIRVALE1OS),
+    DO_BIT(HFGITR, TLBIRVAALE1OS),
+    DO_BIT(HFGITR, TLBIVMALLE1IS),
+    DO_BIT(HFGITR, TLBIVAE1IS),
+    DO_BIT(HFGITR, TLBIASIDE1IS),
+    DO_BIT(HFGITR, TLBIVAAE1IS),
+    DO_BIT(HFGITR, TLBIVALE1IS),
+    DO_BIT(HFGITR, TLBIVAALE1IS),
+    DO_BIT(HFGITR, TLBIRVAE1IS),
+    DO_BIT(HFGITR, TLBIRVAAE1IS),
+    DO_BIT(HFGITR, TLBIRVALE1IS),
+    DO_BIT(HFGITR, TLBIRVAALE1IS),
+    DO_BIT(HFGITR, TLBIRVAE1),
+    DO_BIT(HFGITR, TLBIRVAAE1),
+    DO_BIT(HFGITR, TLBIRVALE1),
+    DO_BIT(HFGITR, TLBIRVAALE1),
+    DO_BIT(HFGITR, TLBIVMALLE1),
+    DO_BIT(HFGITR, TLBIVAE1),
+    DO_BIT(HFGITR, TLBIASIDE1),
+    DO_BIT(HFGITR, TLBIVAAE1),
+    DO_BIT(HFGITR, TLBIVALE1),
+    DO_BIT(HFGITR, TLBIVAALE1),
 } FGTBit;
 
 #undef DO_BIT
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
     { .name = "TLBI_VMALLE1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 0,
       .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW,
+      .fgt = FGT_TLBIVMALLE1IS,
       .writefn = tlbi_aa64_vmalle1is_write },
     { .name = "TLBI_VAE1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 1,
       .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW,
+      .fgt = FGT_TLBIVAE1IS,
       .writefn = tlbi_aa64_vae1is_write },
     { .name = "TLBI_ASIDE1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 2,
       .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW,
+      .fgt = FGT_TLBIASIDE1IS,
       .writefn = tlbi_aa64_vmalle1is_write },
     { .name = "TLBI_VAAE1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 3,
       .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW,
+      .fgt = FGT_TLBIVAAE1IS,
       .writefn = tlbi_aa64_vae1is_write },
     { .name = "TLBI_VALE1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 5,
       .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW,
+      .fgt = FGT_TLBIVALE1IS,
       .writefn = tlbi_aa64_vae1is_write },
     { .name = "TLBI_VAALE1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 7,
       .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW,
+      .fgt = FGT_TLBIVAALE1IS,
       .writefn = tlbi_aa64_vae1is_write },
     { .name = "TLBI_VMALLE1", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 0,
       .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
+      .fgt = FGT_TLBIVMALLE1,
       .writefn = tlbi_aa64_vmalle1_write },
     { .name = "TLBI_VAE1", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 1,
       .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
+      .fgt = FGT_TLBIVAE1,
       .writefn = tlbi_aa64_vae1_write },
     { .name = "TLBI_ASIDE1", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 2,
       .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
+      .fgt = FGT_TLBIASIDE1,
       .writefn = tlbi_aa64_vmalle1_write },
     { .name = "TLBI_VAAE1", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 3,
       .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
+      .fgt = FGT_TLBIVAAE1,
       .writefn = tlbi_aa64_vae1_write },
     { .name = "TLBI_VALE1", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 5,
       .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
+      .fgt = FGT_TLBIVALE1,
       .writefn = tlbi_aa64_vae1_write },
     { .name = "TLBI_VAALE1", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 7,
       .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
+      .fgt = FGT_TLBIVAALE1,
       .writefn = tlbi_aa64_vae1_write },
     { .name = "TLBI_IPAS2E1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 1,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo tlbirange_reginfo[] = {
     { .name = "TLBI_RVAE1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 2, .opc2 = 1,
       .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW,
+      .fgt = FGT_TLBIRVAE1IS,
       .writefn = tlbi_aa64_rvae1is_write },
     { .name = "TLBI_RVAAE1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 2, .opc2 = 3,
       .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW,
+      .fgt = FGT_TLBIRVAAE1IS,
       .writefn = tlbi_aa64_rvae1is_write },
    { .name = "TLBI_RVALE1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 2, .opc2 = 5,
       .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW,
+      .fgt = FGT_TLBIRVALE1IS,
       .writefn = tlbi_aa64_rvae1is_write },
     { .name = "TLBI_RVAALE1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 2, .opc2 = 7,
       .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW,
+      .fgt = FGT_TLBIRVAALE1IS,
       .writefn = tlbi_aa64_rvae1is_write },
     { .name = "TLBI_RVAE1OS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 5, .opc2 = 1,
       .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW,
+      .fgt = FGT_TLBIRVAE1OS,
       .writefn = tlbi_aa64_rvae1is_write },
     { .name = "TLBI_RVAAE1OS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 5, .opc2 = 3,
       .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW,
+      .fgt = FGT_TLBIRVAAE1OS,
       .writefn = tlbi_aa64_rvae1is_write },
    { .name = "TLBI_RVALE1OS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 5, .opc2 = 5,
       .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW,
+      .fgt = FGT_TLBIRVALE1OS,
       .writefn = tlbi_aa64_rvae1is_write },
     { .name = "TLBI_RVAALE1OS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 5, .opc2 = 7,
       .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW,
+      .fgt = FGT_TLBIRVAALE1OS,
       .writefn = tlbi_aa64_rvae1is_write },
     { .name = "TLBI_RVAE1", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 6, .opc2 = 1,
       .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
+      .fgt = FGT_TLBIRVAE1,
       .writefn = tlbi_aa64_rvae1_write },
     { .name = "TLBI_RVAAE1", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 6, .opc2 = 3,
       .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
+      .fgt = FGT_TLBIRVAAE1,
       .writefn = tlbi_aa64_rvae1_write },
    { .name = "TLBI_RVALE1", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 6, .opc2 = 5,
       .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
+      .fgt = FGT_TLBIRVALE1,
       .writefn = tlbi_aa64_rvae1_write },
     { .name = "TLBI_RVAALE1", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 6, .opc2 = 7,
       .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
+      .fgt = FGT_TLBIRVAALE1,
       .writefn = tlbi_aa64_rvae1_write },
     { .name = "TLBI_RIPAS2E1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 2,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo tlbios_reginfo[] = {
     { .name = "TLBI_VMALLE1OS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 1, .opc2 = 0,
       .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW,
+      .fgt = FGT_TLBIVMALLE1OS,
       .writefn = tlbi_aa64_vmalle1is_write },
     { .name = "TLBI_VAE1OS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 1, .opc2 = 1,
+      .fgt = FGT_TLBIVAE1OS,
       .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW,
       .writefn = tlbi_aa64_vae1is_write },
     { .name = "TLBI_ASIDE1OS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 1, .opc2 = 2,
       .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW,
+      .fgt = FGT_TLBIASIDE1OS,
       .writefn = tlbi_aa64_vmalle1is_write },
     { .name = "TLBI_VAAE1OS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 1, .opc2 = 3,
       .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW,
+      .fgt = FGT_TLBIVAAE1OS,
       .writefn = tlbi_aa64_vae1is_write },
     { .name = "TLBI_VALE1OS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 1, .opc2 = 5,
       .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW,
+      .fgt = FGT_TLBIVALE1OS,
       .writefn = tlbi_aa64_vae1is_write },
     { .name = "TLBI_VAALE1OS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 1, .opc2 = 7,
       .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW,
+      .fgt = FGT_TLBIVAALE1OS,
       .writefn = tlbi_aa64_vae1is_write },
     { .name = "TLBI_ALLE2OS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 1, .opc2 = 0,
-- 
2.34.1

Mark up the sysreg definitions for the system instructions
trapped by HFGITR bits 48..63.

Some of these bits are for trapping instructions which are
not in the system instruction encoding (i.e. which are
not handled by the ARMCPRegInfo mechanism):
 * ERET, ERETAA, ERETAB
 * SVC

We will have to handle those separately and manually.
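
The split can be pictured as below.  This is only a sketch: the bit
numbers are my reading of the HFGITR_EL2 layout and should be checked
against the Arm ARM, and hfgitr_bit_needs_manual_trap() is a
hypothetical name, not something in the tree.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed HFGITR_EL2 bit numbers for the implemented bits >= 48. */
enum {
    SK_CFPRCTX = 48,
    SK_DVPRCTX = 49,
    SK_CPPRCTX = 50,
    SK_ERET    = 51,
    SK_SVC_EL0 = 52,
    SK_SVC_EL1 = 53,
    SK_DCCVAC  = 54,
};

/*
 * True for bits whose instructions are not in the system instruction
 * encoding space, so cannot be marked up via ARMCPRegInfo.
 */
static bool hfgitr_bit_needs_manual_trap(unsigned bit)
{
    assert(bit >= SK_CFPRCTX && bit <= SK_DCCVAC);
    return bit == SK_ERET || bit == SK_SVC_EL0 || bit == SK_SVC_EL1;
}
```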

diff --git a/target/arm/cpregs.h b/target/arm/cpregs.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpregs.h
+++ b/target/arm/cpregs.h
@@ -XXX,XX +XXX,XX @@ typedef enum FGTBit {
     DO_BIT(HFGITR, TLBIVAAE1),
     DO_BIT(HFGITR, TLBIVALE1),
     DO_BIT(HFGITR, TLBIVAALE1),
+    DO_BIT(HFGITR, CFPRCTX),
+    DO_BIT(HFGITR, DVPRCTX),
+    DO_BIT(HFGITR, CPPRCTX),
+    DO_BIT(HFGITR, DCCVAC),
 } FGTBit;
 
 #undef DO_BIT
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
     { .name = "DC_CVAC", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 10, .opc2 = 1,
       .access = PL0_W, .type = ARM_CP_NOP,
+      .fgt = FGT_DCCVAC,
       .accessfn = aa64_cacheop_poc_access },
     { .name = "DC_CSW", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 10, .opc2 = 2,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo mte_el0_cacheop_reginfo[] = {
     { .name = "DC_CGVAC", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 10, .opc2 = 3,
       .type = ARM_CP_NOP, .access = PL0_W,
+      .fgt = FGT_DCCVAC,
       .accessfn = aa64_cacheop_poc_access },
     { .name = "DC_CGDVAC", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 10, .opc2 = 5,
       .type = ARM_CP_NOP, .access = PL0_W,
+      .fgt = FGT_DCCVAC,
       .accessfn = aa64_cacheop_poc_access },
     { .name = "DC_CGVAP", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 12, .opc2 = 3,
@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_predinv(CPUARMState *env, const ARMCPRegInfo *ri,
 static const ARMCPRegInfo predinv_reginfo[] = {
     { .name = "CFP_RCTX", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 3, .opc2 = 4,
+      .fgt = FGT_CFPRCTX,
       .type = ARM_CP_NOP, .access = PL0_W, .accessfn = access_predinv },
     { .name = "DVP_RCTX", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 3, .opc2 = 5,
+      .fgt = FGT_DVPRCTX,
       .type = ARM_CP_NOP, .access = PL0_W, .accessfn = access_predinv },
     { .name = "CPP_RCTX", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 3, .opc2 = 7,
+      .fgt = FGT_CPPRCTX,
       .type = ARM_CP_NOP, .access = PL0_W, .accessfn = access_predinv },
     /*
      * Note the AArch32 opcodes have a different OPC1.
      */
     { .name = "CFPRCTX", .state = ARM_CP_STATE_AA32,
       .cp = 15, .opc1 = 0, .crn = 7, .crm = 3, .opc2 = 4,
+      .fgt = FGT_CFPRCTX,
       .type = ARM_CP_NOP, .access = PL0_W, .accessfn = access_predinv },
     { .name = "DVPRCTX", .state = ARM_CP_STATE_AA32,
       .cp = 15, .opc1 = 0, .crn = 7, .crm = 3, .opc2 = 5,
+      .fgt = FGT_DVPRCTX,
       .type = ARM_CP_NOP, .access = PL0_W, .accessfn = access_predinv },
     { .name = "CPPRCTX", .state = ARM_CP_STATE_AA32,
       .cp = 15, .opc1 = 0, .crn = 7, .crm = 3, .opc2 = 7,
+      .fgt = FGT_CPPRCTX,
       .type = ARM_CP_NOP, .access = PL0_W, .accessfn = access_predinv },
 };
 
-- 
2.34.1

Implement the HFGITR_EL2.ERET fine-grained trap.  This traps
execution from AArch64 EL1 of ERET, ERETAA and ERETAB.  The trap is
reported with a syndrome value of 0x1a.

The trap must take precedence over a possible pointer-authentication
trap for ERETAA and ERETAB.
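
As a worked example of the syndrome value, here is a standalone
sketch of the same computation syn_erettrap() does in this patch.
The SKETCH_* names are stand-ins for ARM_EL_EC_SHIFT (26) and
ARM_EL_IL (bit 25); they are defined locally only so the snippet
compiles on its own.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_EC_SHIFT    26
#define SKETCH_IL          (UINT32_C(1) << 25)
#define SKETCH_EC_ERETTRAP UINT32_C(0x1a)

/* eret_op is insn bits [1:0]: 0 = ERET, 2 = ERETAA, 3 = ERETAB. */
static uint32_t sketch_syn_erettrap(unsigned eret_op)
{
    assert(eret_op == 0 || eret_op == 2 || eret_op == 3);
    return (SKETCH_EC_ERETTRAP << SKETCH_EC_SHIFT) | SKETCH_IL | eret_op;
}
```

With these definitions a trapped ERET reports 0x6a000000, ERETAA
0x6a000002 and ERETAB 0x6a000003.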

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Fuad Tabba <tabba@google.com>
Message-id: 20230130182459.3309057-21-peter.maydell@linaro.org
Message-id: 20230127175507.2895013-21-peter.maydell@linaro.org
---
 target/arm/cpu.h           |  1 +
 target/arm/syndrome.h      | 10 ++++++++++
 target/arm/translate.h     |  2 ++
 target/arm/helper.c        |  3 +++
 target/arm/translate-a64.c | 10 ++++++++++
 5 files changed, 26 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_A64, PSTATE_ZA, 23, 1)
 FIELD(TBFLAG_A64, SVL, 24, 4)
 /* Indicates that SME Streaming mode is active, and SMCR_ELx.FA64 is not. */
 FIELD(TBFLAG_A64, SME_TRAP_NONSTREAMING, 28, 1)
+FIELD(TBFLAG_A64, FGT_ERET, 29, 1)
 
 /*
  * Helpers for using the above.
diff --git a/target/arm/syndrome.h b/target/arm/syndrome.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/syndrome.h
+++ b/target/arm/syndrome.h
@@ -XXX,XX +XXX,XX @@ enum arm_exception_class {
     EC_AA64_SMC               = 0x17,
     EC_SYSTEMREGISTERTRAP     = 0x18,
     EC_SVEACCESSTRAP          = 0x19,
+    EC_ERETTRAP               = 0x1a,
     EC_SMETRAP                = 0x1d,
     EC_INSNABORT              = 0x20,
     EC_INSNABORT_SAME_EL      = 0x21,
@@ -XXX,XX +XXX,XX @@ static inline uint32_t syn_sve_access_trap(void)
     return EC_SVEACCESSTRAP << ARM_EL_EC_SHIFT;
 }
 
+/*
+ * eret_op is bits [1:0] of the ERET instruction, so:
+ * 0 for ERET, 2 for ERETAA, 3 for ERETAB.
+ */
+static inline uint32_t syn_erettrap(int eret_op)
+{
+    return (EC_ERETTRAP << ARM_EL_EC_SHIFT) | ARM_EL_IL | eret_op;
+}
+
 static inline uint32_t syn_smetrap(SMEExceptionType etype, bool is_16bit)
 {
     return (EC_SMETRAP << ARM_EL_EC_SHIFT)
diff --git a/target/arm/translate.h b/target/arm/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ typedef struct DisasContext {
     bool mve_no_pred;
     /* True if fine-grained traps are active */
     bool fgt_active;
+    /* True if fine-grained trap on ERET is enabled */
+    bool fgt_eret;
     /*
      * >= 0, a copy of PSTATE.BTYPE, which will be 0 without v8.5-BTI.
      *  < 0, set by the current instruction.
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static CPUARMTBFlags rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
 
     if (arm_fgt_active(env, el)) {
         DP_TBFLAG_ANY(flags, FGT_ACTIVE, 1);
+        if (FIELD_EX64(env->cp15.fgt_exec[FGTREG_HFGITR], HFGITR_EL2, ERET)) {
+            DP_TBFLAG_A64(flags, FGT_ERET, 1);
+        }
     }
 
     if (cpu_isar_feature(aa64_mte, env_archcpu(env))) {
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_uncond_b_reg(DisasContext *s, uint32_t insn)
             if (op4 != 0) {
                 goto do_unallocated;
             }
+            if (s->fgt_eret) {
+                gen_exception_insn_el(s, 0, EXCP_UDEF, syn_erettrap(op3), 2);
+                return;
+            }
             dst = tcg_temp_new_i64();
             tcg_gen_ld_i64(dst, cpu_env,
                            offsetof(CPUARMState, elr_el[s->current_el]));
@@ -XXX,XX +XXX,XX @@ static void disas_uncond_b_reg(DisasContext *s, uint32_t insn)
             if (rn != 0x1f || op4 != 0x1f) {
                 goto do_unallocated;
             }
+            /* The FGT trap takes precedence over an auth trap. */
+            if (s->fgt_eret) {
+                gen_exception_insn_el(s, 0, EXCP_UDEF, syn_erettrap(op3), 2);
+                return;
+            }
             dst = tcg_temp_new_i64();
             tcg_gen_ld_i64(dst, cpu_env,
                            offsetof(CPUARMState, elr_el[s->current_el]));
@@ -XXX,XX +XXX,XX @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase,
     dc->align_mem = EX_TBFLAG_ANY(tb_flags, ALIGN_MEM);
     dc->pstate_il = EX_TBFLAG_ANY(tb_flags, PSTATE__IL);
     dc->fgt_active = EX_TBFLAG_ANY(tb_flags, FGT_ACTIVE);
+    dc->fgt_eret = EX_TBFLAG_A64(tb_flags, FGT_ERET);
     dc->sve_excp_el = EX_TBFLAG_A64(tb_flags, SVEEXC_EL);
     dc->sme_excp_el = EX_TBFLAG_A64(tb_flags, SMEEXC_EL);
     dc->vl = (EX_TBFLAG_A64(tb_flags, VL) + 1) * 16;
-- 
2.34.1

Implement the HFGITR_EL2.SVC_EL0 and SVC_EL1 fine-grained traps.
These trap execution of the SVC instruction from AArch32 and AArch64.
(As usual, AArch32 can only trap from EL0, because fine-grained traps
are disabled when EL1 is AArch32.)
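
The bit selection can be sketched on a raw register value as below.
This mirrors the fgt_svc() helper added by this patch but is not the
same code: sketch_fgt_svc() is a hypothetical name, and the bit
numbers 52 (SVC_EL0) and 53 (SVC_EL1) are my reading of the
HFGITR_EL2 layout rather than something stated in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assuming fine-grained traps are active, return true if an SVC
 * executed at the given EL (0 or 1) should trap to EL2.
 */
static bool sketch_fgt_svc(uint64_t hfgitr, int el)
{
    assert(el == 0 || el == 1);
    unsigned bit = (el == 0) ? 52 : 53; /* assumed SVC_EL0 / SVC_EL1 */
    return (hfgitr >> bit) & 1;
}
```

Like the real helper, this does not special-case AArch32 EL1: the
caller is expected to have checked that fine-grained traps are active
first.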

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Fuad Tabba <tabba@google.com>
Message-id: 20230130182459.3309057-22-peter.maydell@linaro.org
Message-id: 20230127175507.2895013-22-peter.maydell@linaro.org
---
 target/arm/cpu.h           |  1 +
 target/arm/translate.h     |  2 ++
 target/arm/helper.c        | 20 ++++++++++++++++++++
 target/arm/translate-a64.c |  9 ++++++++-
 target/arm/translate.c     | 12 +++++++++---
 5 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_ANY, FPEXC_EL, 8, 2)
 FIELD(TBFLAG_ANY, ALIGN_MEM, 10, 1)
 FIELD(TBFLAG_ANY, PSTATE__IL, 11, 1)
 FIELD(TBFLAG_ANY, FGT_ACTIVE, 12, 1)
+FIELD(TBFLAG_ANY, FGT_SVC, 13, 1)
 
 /*
  * Bit usage when in AArch32 state, both A- and M-profile.
diff --git a/target/arm/translate.h b/target/arm/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ typedef struct DisasContext {
     bool fgt_active;
     /* True if fine-grained trap on ERET is enabled */
     bool fgt_eret;
+    /* True if fine-grained trap on SVC is enabled */
+    bool fgt_svc;
     /*
      * >= 0, a copy of PSTATE.BTYPE, which will be 0 without v8.5-BTI.
      *  < 0, set by the current instruction.
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ ARMMMUIdx arm_mmu_idx(CPUARMState *env)
     return arm_mmu_idx_el(env, arm_current_el(env));
 }
 
+static inline bool fgt_svc(CPUARMState *env, int el)
+{
+    /*
+     * Assuming fine-grained-traps are active, return true if we
+     * should be trapping on SVC instructions. Only AArch64 can
+     * trap on an SVC at EL1, but we don't need to special-case this
+     * because if this is AArch32 EL1 then arm_fgt_active() is false.
+     * We also know el is 0 or 1.
+     */
+    return el == 0 ?
+        FIELD_EX64(env->cp15.fgt_exec[FGTREG_HFGITR], HFGITR_EL2, SVC_EL0) :
+        FIELD_EX64(env->cp15.fgt_exec[FGTREG_HFGITR], HFGITR_EL2, SVC_EL1);
+}
+
 static CPUARMTBFlags rebuild_hflags_common(CPUARMState *env, int fp_el,
                                            ARMMMUIdx mmu_idx,
                                            CPUARMTBFlags flags)
@@ -XXX,XX +XXX,XX @@ static CPUARMTBFlags rebuild_hflags_a32(CPUARMState *env, int fp_el,
 
     if (arm_fgt_active(env, el)) {
         DP_TBFLAG_ANY(flags, FGT_ACTIVE, 1);
+        if (fgt_svc(env, el)) {
+            DP_TBFLAG_ANY(flags, FGT_SVC, 1);
+        }
     }
 
     if (env->uncached_cpsr & CPSR_IL) {
@@ -XXX,XX +XXX,XX @@ static CPUARMTBFlags rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
         if (FIELD_EX64(env->cp15.fgt_exec[FGTREG_HFGITR], HFGITR_EL2, ERET)) {
             DP_TBFLAG_A64(flags, FGT_ERET, 1);
         }
+        if (fgt_svc(env, el)) {
+            DP_TBFLAG_ANY(flags, FGT_SVC, 1);
+        }
     }
 
     if (cpu_isar_feature(aa64_mte, env_archcpu(env))) {
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_exc(DisasContext *s, uint32_t insn)
     int opc = extract32(insn, 21, 3);
     int op2_ll = extract32(insn, 0, 5);
     int imm16 = extract32(insn, 5, 16);
+    uint32_t syndrome;
 
     switch (opc) {
     case 0:
@@ -XXX,XX +XXX,XX @@ static void disas_exc(DisasContext *s, uint32_t insn)
          */
         switch (op2_ll) {
         case 1:                                                     /* SVC */
+            syndrome = syn_aa64_svc(imm16);
+            if (s->fgt_svc) {
+                gen_exception_insn_el(s, 0, EXCP_UDEF, syndrome, 2);
+                break;
+            }
             gen_ss_advance(s);
-            gen_exception_insn(s, 4, EXCP_SWI, syn_aa64_svc(imm16));
+            gen_exception_insn(s, 4, EXCP_SWI, syndrome);
             break;
         case 2:                                                     /* HVC */
             if (s->current_el == 0) {
@@ -XXX,XX +XXX,XX @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase,
     dc->align_mem = EX_TBFLAG_ANY(tb_flags, ALIGN_MEM);
     dc->pstate_il = EX_TBFLAG_ANY(tb_flags, PSTATE__IL);
     dc->fgt_active = EX_TBFLAG_ANY(tb_flags, FGT_ACTIVE);
+    dc->fgt_svc = EX_TBFLAG_ANY(tb_flags, FGT_SVC);
     dc->fgt_eret = EX_TBFLAG_A64(tb_flags, FGT_ERET);
     dc->sve_excp_el = EX_TBFLAG_A64(tb_flags, SVEEXC_EL);
     dc->sme_excp_el = EX_TBFLAG_A64(tb_flags, SMEEXC_EL);
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static bool trans_SVC(DisasContext *s, arg_SVC *a)
         (a->imm == semihost_imm)) {
         gen_exception_internal_insn(s, EXCP_SEMIHOST);
     } else {
-        gen_update_pc(s, curr_insn_len(s));
-        s->svc_imm = a->imm;
-        s->base.is_jmp = DISAS_SWI;
+        if (s->fgt_svc) {
+            uint32_t syndrome = syn_aa32_svc(a->imm, s->thumb);
+            gen_exception_insn_el(s, 0, EXCP_UDEF, syndrome, 2);
+        } else {
+            gen_update_pc(s, curr_insn_len(s));
+            s->svc_imm = a->imm;
+            s->base.is_jmp = DISAS_SWI;
+        }
     }
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static void arm_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
     dc->align_mem = EX_TBFLAG_ANY(tb_flags, ALIGN_MEM);
     dc->pstate_il = EX_TBFLAG_ANY(tb_flags, PSTATE__IL);
     dc->fgt_active = EX_TBFLAG_ANY(tb_flags, FGT_ACTIVE);
+    dc->fgt_svc = EX_TBFLAG_ANY(tb_flags, FGT_SVC);
 
     if (arm_feature(env, ARM_FEATURE_M)) {
         dc->vfp_enabled = 1;
-- 
2.34.1

FEAT_FGT also implements an extra trap bit in the MDCR_EL2 and
MDCR_EL3 registers: bit TDCC enables trapping of use of the Debug
Comms Channel registers OSDTRRX_EL1, OSDTRTX_EL1, MDCCSR_EL0,
MDCCINT_EL1, DBGDTR_EL0, DBGDTRRX_EL0 and DBGDTRTX_EL0 (and their
AArch32 equivalents).  This trapping is independent of whether
fine-grained traps are enabled or not.

Implement these extra traps.  (We don't implement DBGDTR_EL0,
DBGDTRRX_EL0 and DBGDTRTX_EL0.)
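
The decision order can be sketched with plain booleans as below.
This is a simplified model of the access_tdcc() function added by
this patch, not a replacement for it: the struct and all sketch_*
names are hypothetical, and el2_tda is assumed to already fold in
MDCR_EL2.TDE and HCR_EL2.TGE the way the real function does via the
effective register values.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    SKETCH_ACCESS_OK,
    SKETCH_ACCESS_TRAP_EL2,
    SKETCH_ACCESS_TRAP_EL3,
} sketch_access;

typedef struct {
    int el;         /* current exception level, 0..3 */
    bool have_fgt;  /* FEAT_FGT implemented */
    bool el2_tda;   /* effective MDCR_EL2.TDA (incl. TDE, HCR_EL2.TGE) */
    bool el2_tdcc;  /* MDCR_EL2.TDCC */
    bool el3_tda;   /* MDCR_EL3.TDA */
    bool el3_tdcc;  /* MDCR_EL3.TDCC */
} sketch_dcc_state;

static sketch_access sketch_access_tdcc(const sketch_dcc_state *s)
{
    assert(s->el >= 0 && s->el <= 3);
    /* The TDCC bits only exist when FEAT_FGT is implemented. */
    bool tdcc2 = s->have_fgt && s->el2_tdcc;
    bool tdcc3 = s->have_fgt && s->el3_tdcc;

    if (s->el < 2 && (s->el2_tda || tdcc2)) {
        return SKETCH_ACCESS_TRAP_EL2;
    }
    if (s->el < 3 && (s->el3_tda || tdcc3)) {
        return SKETCH_ACCESS_TRAP_EL3;
    }
    return SKETCH_ACCESS_OK;
}
```

Note that the EL2 check comes first, so an access from EL1 with both
TDCC bits set traps to EL2 rather than EL3.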

diff --git a/target/arm/debug_helper.c b/target/arm/debug_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/debug_helper.c
+++ b/target/arm/debug_helper.c
@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_tda(CPUARMState *env, const ARMCPRegInfo *ri,
     return CP_ACCESS_OK;
 }
 
+/*
+ * Check for traps to Debug Comms Channel registers. If FEAT_FGT
+ * is implemented then these are controlled by MDCR_EL2.TDCC for
+ * EL2 and MDCR_EL3.TDCC for EL3. They are also controlled by
+ * the general debug access trap bits MDCR_EL2.TDA and MDCR_EL3.TDA.
+ */
+static CPAccessResult access_tdcc(CPUARMState *env, const ARMCPRegInfo *ri,
+                                  bool isread)
+{
+    int el = arm_current_el(env);
+    uint64_t mdcr_el2 = arm_mdcr_el2_eff(env);
+    bool mdcr_el2_tda = (mdcr_el2 & MDCR_TDA) || (mdcr_el2 & MDCR_TDE) ||
+        (arm_hcr_el2_eff(env) & HCR_TGE);
+    bool mdcr_el2_tdcc = cpu_isar_feature(aa64_fgt, env_archcpu(env)) &&
+                                          (mdcr_el2 & MDCR_TDCC);
+    bool mdcr_el3_tdcc = cpu_isar_feature(aa64_fgt, env_archcpu(env)) &&
+                                          (env->cp15.mdcr_el3 & MDCR_TDCC);
+
+    if (el < 2 && (mdcr_el2_tda || mdcr_el2_tdcc)) {
+        return CP_ACCESS_TRAP_EL2;
+    }
+    if (el < 3 && ((env->cp15.mdcr_el3 & MDCR_TDA) || mdcr_el3_tdcc)) {
+        return CP_ACCESS_TRAP_EL3;
+    }
+    return CP_ACCESS_OK;
+}
+
 static void oslar_write(CPUARMState *env, const ARMCPRegInfo *ri,
                         uint64_t value)
 {
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo debug_cp_reginfo[] = {
      */
     { .name = "MDCCSR_EL0", .state = ARM_CP_STATE_AA64,
       .opc0 = 2, .opc1 = 3, .crn = 0, .crm = 1, .opc2 = 0,
-      .access = PL0_R, .accessfn = access_tda,
+      .access = PL0_R, .accessfn = access_tdcc,
       .type = ARM_CP_CONST, .resetvalue = 0 },
     /*
      * OSDTRRX_EL1/OSDTRTX_EL1 are used for save and restore of DBGDTRRX_EL0.
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo debug_cp_reginfo[] = {
      */
     { .name = "OSDTRRX_EL1", .state = ARM_CP_STATE_BOTH, .cp = 14,
       .opc0 = 2, .opc1 = 0, .crn = 0, .crm = 0, .opc2 = 2,
-      .access = PL1_RW, .accessfn = access_tda,
+      .access = PL1_RW, .accessfn = access_tdcc,
       .type = ARM_CP_CONST, .resetvalue = 0 },
     { .name = "OSDTRTX_EL1", .state = ARM_CP_STATE_BOTH, .cp = 14,
       .opc0 = 2, .opc1 = 0, .crn = 0, .crm = 3, .opc2 = 2,
-      .access = PL1_RW, .accessfn = access_tda,
+      .access = PL1_RW, .accessfn = access_tdcc,
       .type = ARM_CP_CONST, .resetvalue = 0 },
     /*
      * OSECCR_EL1 provides a mechanism for an operating system
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo debug_cp_reginfo[] = {
      */
     { .name = "MDCCINT_EL1", .state = ARM_CP_STATE_BOTH,
       .cp = 14, .opc0 = 2, .opc1 = 0, .crn = 0, .crm = 2, .opc2 = 0,
-      .access = PL1_RW, .accessfn = access_tda,
+      .access = PL1_RW, .accessfn = access_tdcc,
       .type = ARM_CP_NOP },
     /*
      * Dummy DBGCLAIM registers.
-- 
2.34.1

Update the ID registers for TCG's '-cpu max' to report the
presence of FEAT_FGT Fine-Grained Traps support.
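
For reference, the ID register change amounts to writing 1 into the
FGT field, which occupies bits [59:56] of ID_AA64MMFR0_EL1.  The
sketch below shows the equivalent bit manipulation; sketch_deposit64()
and sketch_extract64() are local stand-ins written for this note, not
the QEMU FIELD_DP64()/FIELD_EX64() macros.

```c
#include <assert.h>
#include <stdint.h>

/* Replace 'length' bits of 'value' starting at bit 'start'. */
static uint64_t sketch_deposit64(uint64_t value, unsigned start,
                                 unsigned length, uint64_t field)
{
    assert(length > 0 && length < 64 && start + length <= 64);
    uint64_t mask = ((UINT64_C(1) << length) - 1) << start;
    return (value & ~mask) | ((field << start) & mask);
}

/* Read 'length' bits of 'value' starting at bit 'start'. */
static uint64_t sketch_extract64(uint64_t value, unsigned start,
                                 unsigned length)
{
    assert(length > 0 && length < 64 && start + length <= 64);
    return (value >> start) & ((UINT64_C(1) << length) - 1);
}
```

A guest can then detect the feature by reading ID_AA64MMFR0_EL1 and
checking that this field is non-zero.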

diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/arm/emulation.rst
+++ b/docs/system/arm/emulation.rst
@@ -XXX,XX +XXX,XX @@ the following architecture extensions:
 - FEAT_ETS (Enhanced Translation Synchronization)
 - FEAT_EVT (Enhanced Virtualization Traps)
 - FEAT_FCMA (Floating-point complex number instructions)
+- FEAT_FGT (Fine-Grained Traps)
 - FEAT_FHM (Floating-point half-precision multiplication instructions)
 - FEAT_FP16 (Half-precision floating-point data processing)
 - FEAT_FRINTTS (Floating-point to integer instructions)
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
     t = FIELD_DP64(t, ID_AA64MMFR0, TGRAN16_2, 2); /* 16k stage2 supported */
     t = FIELD_DP64(t, ID_AA64MMFR0, TGRAN64_2, 2); /* 64k stage2 supported */
     t = FIELD_DP64(t, ID_AA64MMFR0, TGRAN4_2, 2);  /*  4k stage2 supported */
+    t = FIELD_DP64(t, ID_AA64MMFR0, FGT, 1);       /* FEAT_FGT */
     cpu->isar.id_aa64mmfr0 = t;
 
     t = cpu->isar.id_aa64mmfr1;
-- 
2.34.1